Registry Data for AI in Medicine: How Cancer Registry Data Can Be Used – From Exploration to Model Validation

Based on the article "Health Data ≠ Health Data: Why Claims Data, Patient Records and Registry Data Are Not the Same", published on 29 January 2026

Author: Dr. Eveline Prochaska, Technische Universität Dresden, 29 January 2026

Registry data for AI in medicine provide a central foundation for valid, explainable and trustworthy models. Artificial Intelligence (AI) in medicine is often described as a data-driven key technology. In practice, however, it is less the algorithm than the underlying data on which models are developed, trained and evaluated that determines success [1,2].

Building on our article on claims data, electronic health records (EHRs) and registry data, we present a specific registry data source that plays a central role in AI-in-medicine projects in Germany: krebsdaten.de [1].

Why Registry Data Are Important for AI in Medicine

Registry data differ fundamentally from clinical routine data or claims data. They are collected specifically for defined research questions, follow established documentation and quality standards, and are designed to ensure comparability [4].

For AI applications, this means:

  • Less data noise
  • Clearly defined variables
  • High methodological robustness

Registry data therefore provide a key foundation for explainable, valid and sustainably usable AI models [5,6].

What Is krebsdaten.de?

krebsdaten.de is the public information and analysis portal of the Centre for Cancer Registry Data (ZfKD) at the Robert Koch Institute. It combines data from the population-based cancer registries of all German federal states and makes them available in aggregated form [1].

The platform provides, among other things:

  • Cancer incidence, mortality and prevalence
  • Survival rates and time trends
  • Regional analyses
  • Regular reports such as Cancer in Germany

This makes krebsdaten.de a central reference source for cancer epidemiology in Germany.

What Data Are Available?

The underlying cancer registries collect information including:

  • Tumour type and location
  • Date of disease onset and diagnosis
  • Age and sex groups
  • Survival probabilities

The data are plausibility-checked, harmonised and statistically processed. Personal raw data are not publicly accessible. The focus is deliberately placed on the population level [1,4].

How Can the Data Be Used?

The use of krebsdaten.de is organised at several levels and is aimed at both newcomers and experienced research teams.

Figure 2: Ways to Use Data (AI-generated with ChatGPT)

a) Open Access: Public Database Query

The interactive database allows users to retrieve aggregated data without submitting an application, for example by cancer type, year, age, sex or region.

Suitable for:

  • Initial insights into oncological data
  • Trend and comparative analyses
  • Exploration of AI-related research questions
  • Reference and benchmark data
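
As a minimal sketch of such an exploration, the following Python snippet reads a CSV export of aggregated case counts, derives crude incidence rates per 100,000 population and reports the relative change between the first and the last year. The column names (`year`, `sex`, `cases`, `population`) and all numbers are invented placeholders, not real krebsdaten.de values, and the actual export layout of the database may differ.

```python
import csv
import io

# Hypothetical export; column names and all numbers are invented placeholders,
# not real registry figures.
EXPORT = """year,sex,cases,population
2018,female,200,100000
2018,male,250,100000
2019,female,210,100000
2019,male,240,100000
2020,female,220,100000
2020,male,230,100000
"""

def crude_rates(csv_text):
    """Return {(year, sex): crude incidence rate per 100,000 population}."""
    rates = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["year"]), row["sex"])
        rates[key] = int(row["cases"]) * 100_000 / int(row["population"])
    return rates

def relative_change(rates, sex):
    """Relative change in the crude rate between the first and last year."""
    years = sorted(year for year, s in rates if s == sex)
    first, last = rates[(years[0], sex)], rates[(years[-1], sex)]
    return (last - first) / first

rates = crude_rates(EXPORT)
for sex in ("female", "male"):
    print(f"{sex}: {relative_change(rates, sex):+.1%} between first and last year")
```

Crude rates are used here only to keep the example short; comparisons across regions or long time spans would normally rely on age-standardised rates.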

The open-access query provides a low-barrier entry point, particularly for companies or institutions with little experience in working with registry data.

b) Scientific Use: Research Data by Application

For more in-depth analyses, structured research datasets can be requested from the ZfKD. A clearly defined scientific research question and a methodological concept are required.

AI rarely fails because of the algorithm; more often it fails because of unrealistic expectations regarding data. Registry data help bridge this gap.

Suitable for:

  • Validation of AI models
  • Population-based risk or prognostic models
  • Methodologically demanding research projects
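
One way such a validation could look is a stratum-wise calibration check, in which the model's mean predicted risk in each age group is compared with a corresponding registry-derived reference value. The sketch below is an illustration under assumptions, not a ZfKD procedure; the function name, the age-group labels, the tolerance and all numbers are hypothetical.

```python
from collections import defaultdict

def calibration_by_stratum(predictions, reference, tolerance=0.05):
    """Compare mean predicted risk per stratum with a reference risk.

    predictions: iterable of (stratum, predicted_risk) pairs from a model
    reference:   {stratum: reference_risk}, e.g. derived from registry data
    Returns {stratum: (mean_predicted, reference_risk, within_tolerance)}.
    """
    grouped = defaultdict(list)
    for stratum, risk in predictions:
        grouped[stratum].append(risk)

    report = {}
    for stratum, risks in grouped.items():
        mean_pred = sum(risks) / len(risks)
        ref = reference[stratum]
        report[stratum] = (mean_pred, ref, abs(mean_pred - ref) <= tolerance)
    return report

# Invented placeholder values, not real model output or registry figures.
model_output = [("50-59", 0.10), ("50-59", 0.14), ("70-79", 0.40), ("70-79", 0.50)]
registry_reference = {"50-59": 0.11, "70-79": 0.30}

report = calibration_by_stratum(model_output, registry_reference)
for stratum, (pred, ref, ok) in report.items():
    print(f"{stratum}: predicted {pred:.2f}, reference {ref:.2f}, within tolerance: {ok}")
```

A stratum flagged as outside the tolerance does not prove the model wrong, but it indicates where predictions and population-level reference values diverge and deserve a closer look.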

Data are provided in compliance with data protection requirements in anonymised or strongly pseudonymised form [1,4].

c) Combining Registry Data with Other Data Sources

For many AI-in-medicine applications, the greatest added value arises from combining registry data with other data sources such as EHR data, claims data or imaging data [6,8].

In such multimodal approaches, registry data often serve as a reference, validation or calibration layer, while clinical routine data provide individual-level depth [5,9,10].
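
The reference-layer idea can be sketched with indirect standardisation: stratum-specific reference rates (standing in here for registry data) are applied to the person-years of a clinical cohort to obtain an expected case count, which is then compared with the observed count as a standardised incidence ratio (SIR). The technique is chosen here purely for illustration and is not prescribed by the cited sources; all names and numbers are hypothetical.

```python
def expected_cases(person_years, reference_rates):
    """Expected cases if the cohort had the reference population's rates.

    person_years:    {stratum: person-years observed in the cohort}
    reference_rates: {stratum: cases per 100,000 person-years}
    """
    return sum(py * reference_rates[s] / 100_000 for s, py in person_years.items())

def standardised_incidence_ratio(observed, person_years, reference_rates):
    """SIR = observed cases / expected cases (indirect standardisation)."""
    return observed / expected_cases(person_years, reference_rates)

# Invented placeholder values, not real cohort or registry figures.
cohort_person_years = {("female", "60-69"): 20_000, ("male", "60-69"): 10_000}
reference = {("female", "60-69"): 100.0, ("male", "60-69"): 200.0}

sir = standardised_incidence_ratio(
    observed=50,
    person_years=cohort_person_years,
    reference_rates=reference,
)
print(f"SIR = {sir:.2f}")
```

An SIR above 1 means that more cases were observed in the cohort than the reference rates would predict, which may point to a selected patient population rather than to a property of the model itself.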

This combination is particularly relevant for explainable and responsible AI systems in healthcare [6].

Conclusion

Registry data such as those available through krebsdaten.de make an important contribution to the contextualisation, validation and trustworthiness of AI models in medicine [1,5,6].

References

  1. Robert Koch Institute (Ed.). Cancer in Germany. Centre for Cancer Registry Data (ZfKD), current edition. https://www.krebsdaten.de
  2. Sherman RE et al. Real-World Evidence — What Is It and What Can It Tell Us? New England Journal of Medicine, 2016. DOI: 10.1056/NEJMsb1609216
  3. Berger ML et al. Good practices for real-world data studies of treatment and/or comparative effectiveness. Pharmacoepidemiology and Drug Safety, 2017. DOI: 10.1002/pds.4297
  4. Gliklich RE, Dreyer NA, Leavy MB (Eds.). Registries for Evaluating Patient Outcomes: A User’s Guide. AHRQ, 3rd Edition, 2014. Available at https://effectivehealthcare.ahrq.gov/sites/default/files/pdf/registries-guide-3rd-edition_research.pdf, accessed 30 January 2026
  5. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nature Medicine, 2019. DOI: 10.1038/s41591-018-0300-7
  6. Wiens J et al. Do no harm: a roadmap for responsible machine learning for health care. Nature Medicine, 2019. DOI: 10.1038/s41591-019-0548-6
  7. Hersh WR et al. Caveats for the use of operational electronic health record data in comparative effectiveness research. Medical Care, 2013. DOI: 10.1097/MLR.0b013e31829b1dbd
  8. Esteva A et al. A guide to deep learning in healthcare. Nature Medicine, 2019. DOI: 10.1038/s41591-018-0316-z
  9. Rieke N et al. The future of digital health with federated learning. npj Digital Medicine, 2020. DOI: 10.1038/s41746-020-00323-1
  10. Knevel R et al. From real-world electronic health record data to real-world results using artificial intelligence. Annals of the Rheumatic Diseases, 2023. DOI: 10.1136/ard-2022-222626
Co-funded by the European Union
This project is co-financed from tax revenues on the basis of the budget adopted by the Saxon State Parliament