
Health Data ≠ Health Data: Why Claims Data, Patient Records and Registry Data Are Not the Same

Author: Dr Eveline Prochaska, Dresden University of Technology, 29 January 2026

Artificial Intelligence (AI) in medicine is considered a key technology for improving diagnostics, prognostic models and data-driven healthcare. In practice, however, it is less the algorithm than the underlying data source that determines whether AI models are valid, explainable and transferable.

Three types of data play a central role in the field of health data: claims data, electronic health records (EHRs; in Germany the elektronische Patientenakte, ePA) and registry data. They differ fundamentally in how and why they are collected, with direct consequences for the use of AI.

1. Claims Data

Claims data are generated primarily for the reimbursement of medical services and contain structured information on diagnoses, procedures, prescriptions and costs. They often cover very large populations over long periods of time and are considered a classic form of Real-World Data (RWD) [1].

Relevance for AI

  • Very large case numbers → suitable for Machine Learning models at the population level
  • Enable longitudinal analyses over many years
  • Relevant foundation for health services research, health economics and risk stratification [2]

Limitations

  • Limited clinical detail
  • Diagnoses are not necessarily clinically validated
  • No laboratory values, vital signs or clinical findings

For AI in medicine, claims data are particularly suitable for cost, risk and utilisation models, but less suitable for clinical decision support.
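
To make the structure of claims data more concrete, the following minimal Python sketch aggregates billing records into per-person utilisation features of the kind a cost or risk model might use. The record layout (`Claim`, `person_id`, `icd10`, `cost_eur`) and all sample values are illustrative assumptions, not a real payer schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    """Hypothetical, simplified claims record (not a real payer schema)."""
    person_id: str
    service_date: date
    icd10: str        # billing diagnosis code, not clinically validated
    cost_eur: float


def utilisation_features(claims):
    """Aggregate claims into per-person features for cost or risk models."""
    grouped = defaultdict(list)
    for claim in claims:
        grouped[claim.person_id].append(claim)

    features = {}
    for person_id, items in grouped.items():
        dates = [c.service_date for c in items]
        features[person_id] = {
            "n_claims": len(items),
            "total_cost_eur": round(sum(c.cost_eur for c in items), 2),
            "n_distinct_diagnoses": len({c.icd10 for c in items}),
            # Span between first and last claim, in days
            "observation_days": (max(dates) - min(dates)).days,
        }
    return features


# Made-up example data
claims = [
    Claim("p1", date(2020, 1, 10), "E11.9", 120.0),
    Claim("p1", date(2021, 6, 2), "I10", 80.5),
    Claim("p1", date(2023, 3, 15), "E11.9", 310.0),
    Claim("p2", date(2022, 9, 1), "J45.9", 45.0),
]
features = utilisation_features(claims)
```

Note that every feature here is derived from billing events alone. Nothing in the record says how severe the condition is or what a laboratory test showed, which is the limitation described above.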

2. Electronic Health Records (EHRs / ePAs)

Data from electronic health records are generated directly during the clinical care process. They include laboratory values, vital signs, medication histories, imaging data, clinical findings and extensive unstructured text. Today, EHR data form the central foundation of many AI applications in medicine [3].

Relevance for AI / Medical AI

  • Rich clinical context
  • Foundation for Deep Learning, Natural Language Processing (NLP) and image analysis
  • Enables detailed phenotyping and personalised prediction models [4]

Challenges

  • Fragmented data across different institutions
  • Heterogeneous documentation standards
  • Significant effort required for data harmonisation, bias analysis and quality assurance [3]

For clinical AI models, patient records are essential. However, their use requires considerable technical and domain-specific expertise.
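
As a small illustration of the harmonisation effort mentioned above, the sketch below converts glucose results reported in different units to a single unit. It assumes a hypothetical `LabResult` record and handles only two units; real EHR harmonisation also involves code systems, reference ranges and free text, none of which are covered here.

```python
from dataclasses import dataclass

# Glucose has a molar mass of about 180.16 g/mol, so 1 mmol/L = 18.016 mg/dL.
GLUCOSE_MG_DL_PER_MMOL_L = 18.016


@dataclass(frozen=True)
class LabResult:
    """Hypothetical lab record as it might arrive from different institutions."""
    patient_id: str
    analyte: str
    value: float
    unit: str


def harmonise_glucose(result):
    """Return the glucose result in mmol/L; fail loudly on anything unexpected."""
    if result.analyte.strip().lower() != "glucose":
        raise ValueError(f"not a glucose result: {result.analyte!r}")

    unit = result.unit.strip().lower()
    if unit == "mmol/l":
        value = result.value
    elif unit == "mg/dl":
        value = result.value / GLUCOSE_MG_DL_PER_MMOL_L
    else:
        # Guessing a unit would silently corrupt the training data.
        raise ValueError(f"unknown unit: {result.unit!r}")

    return LabResult(result.patient_id, "glucose", round(value, 2), "mmol/L")


# Made-up results from two institutions with different documentation habits
raw = [
    LabResult("p1", "Glucose", 90.0, "mg/dL"),
    LabResult("p2", "glucose", 5.2, "mmol/L"),
]
harmonised = [harmonise_glucose(r) for r in raw]
```

Raising an error on unknown units, instead of passing values through, is a deliberate choice in this sketch: a model trained on silently mixed units would learn artefacts of documentation rather than physiology.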

3. Registry Data

Registry data are collected specifically for clearly defined research questions, for example relating to particular diseases, therapies or medical devices. They follow predefined inclusion criteria and quality standards and are regarded as a particularly valid source of real-world data [5].

Relevance for AI

  • High data quality and structured variables
  • Particularly suitable for outcome analyses and model validation
  • Relevant for regulatory and quality assurance purposes [6]

Limitations

  • Limited case numbers
  • Selective populations
  • Often no complete longitudinal care pathway

For AI applications, registry data are particularly valuable for validating and safeguarding models.
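
One way to picture this validation role: a model trained elsewhere produces risk predictions for registry patients, and these are compared with the outcomes documented in the registry. The sketch below computes two common measures, the area under the ROC curve (discrimination) and the Brier score (overall accuracy of the predicted probabilities), in plain Python. The outcome and prediction values are invented for illustration.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the share of (event, non-event) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("AUC needs at least one event and one non-event")

    correct = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                correct += 1.0
            elif p_event == p_non:
                correct += 0.5
    return correct / (len(events) * len(non_events))


# Made-up registry outcomes (1 = event occurred) and model predictions
registry_outcomes = [1, 0, 1, 0, 0, 1]
model_predictions = [0.8, 0.3, 0.6, 0.7, 0.2, 0.9]

auc = roc_auc(registry_outcomes, model_predictions)
brier = brier_score(registry_outcomes, model_predictions)
```

With only a handful of patients such numbers carry little weight. In practice the limited case numbers and selective populations listed above determine how far a validation result can be generalised.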

 

Figure 1: Graphical comparison of the three health data types presented (AI-generated with the help of ChatGPT)

Why the Data Source Matters for AI

None of these data sources is inherently superior. The real added value comes from combining them:

  • Claims data provide breadth and temporal continuity
  • EHR data provide clinical depth
  • Registry data provide quality and focus

Modern data infrastructures and common data models (e.g. the Sentinel Common Data Model and the OMOP Common Data Model) aim to integrate these data sources in a controlled manner [7].
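
The idea of combining sources can be sketched in a few lines, assuming that all three sources share a pseudonymised person identifier. The function and field names below are hypothetical, and the structure is deliberately simple; it is not the table layout of Sentinel or OMOP.

```python
def link_sources(claims, ehr, registry):
    """Merge per-person records from three sources into one view per pseudonym.

    Each argument maps a pseudonymised person ID to that source's record.
    The result keeps track of which sources contributed to each person.
    """
    linked = {}
    for name, source in (("claims", claims), ("ehr", ehr), ("registry", registry)):
        for pseudonym, record in source.items():
            entry = linked.setdefault(pseudonym, {"sources": []})
            entry["sources"].append(name)
            entry[name] = record
    return linked


def fully_covered(linked):
    """Pseudonyms for which all three sources are available."""
    return sorted(p for p, entry in linked.items() if len(entry["sources"]) == 3)


# Made-up records keyed by a shared pseudonym
claims = {"a1": {"total_cost_eur": 510.5}, "b2": {"total_cost_eur": 45.0}}
ehr = {"a1": {"hba1c_percent": 7.1}, "c3": {"hba1c_percent": 5.4}}
registry = {"a1": {"outcome_12m": "stable"}}

linked = link_sources(claims, ehr, registry)
complete = fully_covered(linked)
```

In this toy example only one of three persons appears in every source. Recording the contributing sources per person makes that coverage gap visible instead of hiding it behind missing values.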

Conclusion

AI in medicine rarely fails because of the algorithm, but often because of unrealistic expectations regarding the data. For research, clinical practice and industry, the key question therefore remains: Which data are suitable for which research question – and with what limitations?

References

  1. Sherman RE et al. Real-World Evidence — What Is It and What Can It Tell Us? N Engl J Med, 2016. DOI: 10.1056/NEJMsb1609216
  2. Berger ML et al. Good practices for real-world data studies of treatment and/or comparative effectiveness. Pharmacoepidemiology and Drug Safety, 2017. DOI: 10.1002/pds.4297
  3. Hersh WR et al. Caveats for the use of operational electronic health record data in comparative effectiveness research. Medical Care, 2013. DOI: 10.1097/MLR.0b013e31829b1dbd
  4. Knevel R et al. From real-world electronic health record data to real-world results using artificial intelligence. Annals of the Rheumatic Diseases, 2023. DOI: 10.1136/ard-2022-222626
  5. Gliklich RE et al. Registries for Evaluating Patient Outcomes: A User’s Guide. AHRQ, 2014. ISBN: 978-1-58763-439-1
  6. Ioakeim-Skoufa I et al. Electronic Health Records: A Gateway to AI-Driven Multimorbidity Solutions—A Comprehensive Systematic Review. Journal of Clinical Medicine, 2025. DOI: 10.3390/jcm14103434
  7. Platt R et al. The FDA Sentinel Initiative — An evolving national resource. N Engl J Med, 2018. DOI: 10.1056/NEJMp1809643
Co-funded by the European Union
This project is co-financed from tax revenues on the basis of the budget adopted by the Saxon State Parliament