Health Data ≠ Health Data: Why Claims Data, Patient Records and Registry Data Are Not the Same

Author: Dr Eveline Prochaska, Dresden University of Technology, 29 January 2026

Artificial Intelligence (AI) in medicine is considered a key technology for improving diagnostics, prognostic models and data-driven healthcare. In practice, however, it is less the algorithm than the underlying data source that determines whether AI models are valid, explainable and transferable.

Three types of health data play a central role: claims data, electronic health records (EHRs; in Germany, the elektronische Patientenakte or ePA) and registry data. They differ fundamentally in how and why they are collected, with direct consequences for the use of AI.

1. Claims Data

Claims data are generated primarily for the reimbursement of medical services and contain structured information on diagnoses, procedures, prescriptions and costs. They often cover very large populations over long periods of time and are considered a classic form of Real-World Data (RWD) [1].

Relevance for AI

Very large case numbers → suitable for Machine Learning models at the population level
Enable longitudinal analyses over many years
Relevant foundation for health services research, health economics and risk stratification [2]

Limitations

Limited clinical detail
Diagnoses are not necessarily clinically validated
No laboratory values, vital signs or clinical findings

For AI in medicine, claims data are particularly suitable for cost, risk and utilisation models, but less suitable for clinical decision support.
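
To make this concrete, the following is a minimal sketch of how claim lines might be turned into inputs for a utilisation model. The field names (patient_id, year, icd10, cost) and all values are assumptions made for illustration; real claims schemas differ by payer and country.

```python
from collections import defaultdict

# Hypothetical claim lines; field names and values are illustrative only.
claims = [
    {"patient_id": "p1", "year": 2022, "icd10": "E11", "cost": 120.0},
    {"patient_id": "p1", "year": 2022, "icd10": "I10", "cost": 80.0},
    {"patient_id": "p1", "year": 2023, "icd10": "E11", "cost": 200.0},
    {"patient_id": "p2", "year": 2023, "icd10": "J45", "cost": 50.0},
]


def utilisation_features(rows):
    """Aggregate claim lines into per-patient, per-year utilisation features."""
    acc = defaultdict(
        lambda: {"n_claims": 0, "total_cost": 0.0, "diagnoses": set()}
    )
    for row in rows:
        key = (row["patient_id"], row["year"])
        acc[key]["n_claims"] += 1
        acc[key]["total_cost"] += row["cost"]
        # Coded diagnoses only: claims carry no lab values or findings.
        acc[key]["diagnoses"].add(row["icd10"])
    return dict(acc)


features = utilisation_features(claims)
```

Every feature here is derived from billing codes and costs, which is why such models describe utilisation and risk well but say little about the clinical state of an individual patient.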

2. Electronic Health Records (EHRs / ePAs)

Data from electronic health records are generated directly during the clinical care process. They include laboratory values, vital signs, medication histories, imaging data, clinical findings and extensive unstructured text. Today, EHR data form the central foundation of many AI applications in medicine [3].

Relevance for AI / Medical AI

Rich clinical context
Foundation for Deep Learning, Natural Language Processing (NLP) and image analysis
Enables detailed phenotyping and personalised prediction models [4]

Challenges

Fragmented data across different institutions
Heterogeneous documentation standards
Significant effort required for data harmonisation, bias analysis and quality assurance [3]

For clinical AI models, patient records are essential. However, their use requires considerable technical and domain-specific expertise.

3. Registry Data

Registry data are collected specifically for clearly defined research questions, for example relating to particular diseases, therapies or medical devices. They follow predefined inclusion criteria and quality standards and are regarded as a particularly valid source of real-world data [5].

Relevance for AI

High data quality and structured variables
Particularly suitable for outcome analyses and model validation
Relevant for regulatory and quality assurance purposes [6]

Limitations

Limited case numbers
Selective populations
Often no complete longitudinal care pathway

For AI applications, registry data are particularly valuable for validating and safeguarding models.

Figure 1: Graphical comparison of the three health data types presented (AI-generated with the help of ChatGPT)

Why the Data Source Matters for AI

None of these data sources is inherently superior. The real added value comes from combining them:

Claims data provide breadth and temporal continuity
EHR data provide clinical depth
Registry data provide quality and focus

Modern data infrastructures and common data models (e.g. Sentinel, OMOP) aim to integrate these data sources in a controlled manner [7].
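
The idea of controlled integration can be sketched as follows, assuming that all three sources share a pseudonymised patient identifier. This is not the OMOP or Sentinel data model; the function link_sources, the field names and all values are hypothetical and only illustrate linkage with explicit provenance.

```python
# All identifiers, field names and values below are hypothetical.
claims = {
    "p1": {"n_claims_5y": 14, "total_cost": 5200.0},
    "p2": {"n_claims_5y": 3, "total_cost": 410.0},
}
ehr = {"p1": {"hba1c_percent": 7.9, "systolic_bp": 148}}
registry = {"p1": {"outcome_validated": True}}


def link_sources(patient_id, **sources):
    """Merge per-source records for one patient, tracking which sources contributed."""
    record = {"patient_id": patient_id, "sources": []}
    for name, table in sources.items():
        row = table.get(patient_id)
        if row is None:
            # Missing coverage is expected, e.g. registries are selective.
            continue
        record["sources"].append(name)
        # Prefix each field with its source name to keep provenance explicit.
        record.update({f"{name}.{key}": value for key, value in row.items()})
    return record


linked_p1 = link_sources("p1", claims=claims, ehr=ehr, registry=registry)
linked_p2 = link_sources("p2", claims=claims, ehr=ehr, registry=registry)
```

In this sketch, patient p1 is covered by all three sources, while p2 appears only in the claims data. Keeping the source name on every field makes such coverage gaps visible instead of silently treating them as missing values.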

Conclusion

AI in medicine rarely fails because of the algorithm, but often because of unrealistic expectations regarding the data. For research, clinical practice and industry, the key question therefore remains: Which data are suitable for which research question – and with what limitations?

References

[1] Sherman RE et al. Real-World Evidence — What Is It and What Can It Tell Us? N Engl J Med, 2016. DOI: 10.1056/NEJMsb1609216
[2] Berger ML et al. Good practices for real-world data studies of treatment and/or comparative effectiveness. Pharmacoepidemiology and Drug Safety, 2017. DOI: 10.1002/pds.4297
[3] Hersh WR et al. Caveats for the use of operational electronic health record data in comparative effectiveness research. Medical Care, 2013. DOI: 10.1097/MLR.0b013e31829b1dbd
[4] Knevel R et al. From real-world electronic health record data to real-world results using artificial intelligence. Annals of the Rheumatic Diseases, 2023. DOI: 10.1136/ard-2022-222626
[5] Gliklich RE et al. Registries for Evaluating Patient Outcomes: A User’s Guide. AHRQ, 2014. ISBN: 978-1-58763-439-1
[6] Ioakeim-Skoufa I et al. Electronic Health Records: A Gateway to AI-Driven Multimorbidity Solutions—A Comprehensive Systematic Review. Journal of Clinical Medicine, 2025. DOI: 10.3390/jcm14103434
[7] Platt R et al. The FDA Sentinel Initiative — An evolving national resource. N Engl J Med, 2018. DOI: 10.1056/NEJMp1809643
