Multi-cancer risk stratification based on national health data: a retrospective modelling and validation study

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Standard

Multi-cancer risk stratification based on national health data: a retrospective modelling and validation study. / Jung, Alexander W.; Holm, Peter C; Gaurav, Kumar; Hjaltelin, Jessica Xin; Placido, Davide; Mortensen, Laust Hvas; Birney, Ewan; Brunak, Søren; Gerstung, Moritz.

In: The Lancet Digital Health, Vol. 6, No. 6, 2024, pp. e396-e406.

Harvard

Jung, AW, Holm, PC, Gaurav, K, Hjaltelin, JX, Placido, D, Mortensen, LH, Birney, E, Brunak, S & Gerstung, M 2024, 'Multi-cancer risk stratification based on national health data: a retrospective modelling and validation study', The Lancet Digital Health, vol. 6, no. 6, pp. e396-e406. https://doi.org/10.1016/S2589-7500(24)00062-1

APA

Jung, A. W., Holm, P. C., Gaurav, K., Hjaltelin, J. X., Placido, D., Mortensen, L. H., Birney, E., Brunak, S., & Gerstung, M. (2024). Multi-cancer risk stratification based on national health data: a retrospective modelling and validation study. The Lancet Digital Health, 6(6), e396-e406. https://doi.org/10.1016/S2589-7500(24)00062-1

Vancouver

Jung AW, Holm PC, Gaurav K, Hjaltelin JX, Placido D, Mortensen LH et al. Multi-cancer risk stratification based on national health data: a retrospective modelling and validation study. The Lancet Digital Health. 2024;6(6):e396-e406. https://doi.org/10.1016/S2589-7500(24)00062-1

Author

Jung, Alexander W. ; Holm, Peter C ; Gaurav, Kumar ; Hjaltelin, Jessica Xin ; Placido, Davide ; Mortensen, Laust Hvas ; Birney, Ewan ; Brunak, Søren ; Gerstung, Moritz. / Multi-cancer risk stratification based on national health data: a retrospective modelling and validation study. In: The Lancet Digital Health. 2024 ; Vol. 6, No. 6. pp. e396-e406.

Bibtex

@article{f82f4c78240f46b6895802a49765f74c,

title = "Multi-cancer risk stratification based on national health data: a retrospective modelling and validation study",

abstract = "Background: Health care is experiencing a drive towards digitisation, and many countries are implementing national health data resources. Although a range of cancer risk models exists, the utility on a population level for risk stratification across cancer types has not been fully explored. We aimed to close this gap by evaluating pan-cancer risk models built on electronic health records across the Danish population with validation in the UK Biobank. Methods: In this retrospective modelling and validation study, data for model development and internal validation were derived from the following Danish health registries: the Central Person Registry, the Danish National Patient Registry, the death registry, the cancer registry, and full-text medical records from secondary care records in the capital region. The development data included adults aged 16–86 years without previous malignant cancers in the time period from Jan 1, 1995, to Dec 31, 2014. The internal validation period was from Jan 1, 2015, to April 10, 2018, and the data included all adults without a previous indication of cancer aged 16–75 years on Dec 31, 2014. The external validation cohort from the UK Biobank included all adults without a previous indication of cancer aged 50–75 years. We used time-dependent Bayesian Cox hazard models built on the combined medical history of Danish individuals. A set of 1392 covariates from available clinical disease trajectories, text-mined basic health factors, and family histories were used to train predictive models of 20 major cancer types. The models were validated on cancer incidence between 2015 and 2018 across Denmark and on individuals in the UK Biobank. The primary outcomes were discrimination and calibration performance. Findings: From the Danish registries, we included 6 732 553 individuals covering 60 million hospital visits, 90 million diagnoses, and a total of 193 million life-years between Jan 1, 1978, and April 10, 2018. 
Danish registry data covering the period from Jan 1, 2015, to April 10, 2018, were used to internally validate risk models, containing a total of 4 248 491 individuals who remained at risk of a primary malignant cancer diagnosis and 67 401 cancer cases recorded. For the external validation, we evaluated the same time period in the UK Biobank covering 377 004 individuals with 11 486 cancer cases. The predictive performance of the models on Danish data showed good discrimination (concordance index 0·81 [SD 0·08], ranging from 0·66 [95% CI 0·65–0·67] for cervix uteri cancer to 0·91 [0·90–0·92] for liver cancer). Performance was similar on the UK Biobank in a direct transfer when controlling for shifts in the age distribution (concordance index 0·66 [SD 0·08], ranging from 0·55 [95% CI 0·44–0·66] for cervix uteri cancer to 0·78 [0·77–0·79] for lung cancer). Cancer risks were associated, in addition to heritable components, with a broad range of preceding diagnoses and health factors. The best overall performance was seen for cancers of the digestive system (oesophageal, stomach, colorectal, liver, and pancreatic) but also thyroid, kidney, and uterine cancers. Interpretation: Data available in national electronic health databases can be used to approximate cancer risk factors and enable risk predictions in most cancer types. Model predictions generalise between the Danish and UK health-care systems. With the emergence of multi-cancer early detection tests, electronic health record-based risk models could supplement screening efforts. Funding: Novo Nordisk Foundation and the Danish Innovation Foundation.",

author = "Jung, {Alexander W.} and Holm, {Peter C} and Kumar Gaurav and Hjaltelin, {Jessica Xin} and Davide Placido and Mortensen, {Laust Hvas} and Ewan Birney and S{\o}ren Brunak and Moritz Gerstung",

note = "Publisher Copyright: {\textcopyright} 2024 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license",

year = "2024",

doi = "10.1016/S2589-7500(24)00062-1",

language = "English",

volume = "6",

pages = "e396--e406",

journal = "The Lancet Digital Health",

issn = "2589-7500",

publisher = "Elsevier",

number = "6",

}

RIS

TY - JOUR

T1 - Multi-cancer risk stratification based on national health data

T2 - a retrospective modelling and validation study

AU - Jung, Alexander W.

AU - Holm, Peter C

AU - Gaurav, Kumar

AU - Hjaltelin, Jessica Xin

AU - Placido, Davide

AU - Mortensen, Laust Hvas

AU - Birney, Ewan

AU - Brunak, Søren

AU - Gerstung, Moritz

PY - 2024

Y1 - 2024

N2 - Background: Health care is experiencing a drive towards digitisation, and many countries are implementing national health data resources. Although a range of cancer risk models exists, the utility on a population level for risk stratification across cancer types has not been fully explored. We aimed to close this gap by evaluating pan-cancer risk models built on electronic health records across the Danish population with validation in the UK Biobank. Methods: In this retrospective modelling and validation study, data for model development and internal validation were derived from the following Danish health registries: the Central Person Registry, the Danish National Patient Registry, the death registry, the cancer registry, and full-text medical records from secondary care records in the capital region. The development data included adults aged 16–86 years without previous malignant cancers in the time period from Jan 1, 1995, to Dec 31, 2014. The internal validation period was from Jan 1, 2015, to April 10, 2018, and the data included all adults without a previous indication of cancer aged 16–75 years on Dec 31, 2014. The external validation cohort from the UK Biobank included all adults without a previous indication of cancer aged 50–75 years. We used time-dependent Bayesian Cox hazard models built on the combined medical history of Danish individuals. A set of 1392 covariates from available clinical disease trajectories, text-mined basic health factors, and family histories were used to train predictive models of 20 major cancer types. The models were validated on cancer incidence between 2015 and 2018 across Denmark and on individuals in the UK Biobank. The primary outcomes were discrimination and calibration performance. Findings: From the Danish registries, we included 6 732 553 individuals covering 60 million hospital visits, 90 million diagnoses, and a total of 193 million life-years between Jan 1, 1978, and April 10, 2018. 
Danish registry data covering the period from Jan 1, 2015, to April 10, 2018, were used to internally validate risk models, containing a total of 4 248 491 individuals who remained at risk of a primary malignant cancer diagnosis and 67 401 cancer cases recorded. For the external validation, we evaluated the same time period in the UK Biobank covering 377 004 individuals with 11 486 cancer cases. The predictive performance of the models on Danish data showed good discrimination (concordance index 0·81 [SD 0·08], ranging from 0·66 [95% CI 0·65–0·67] for cervix uteri cancer to 0·91 [0·90–0·92] for liver cancer). Performance was similar on the UK Biobank in a direct transfer when controlling for shifts in the age distribution (concordance index 0·66 [SD 0·08], ranging from 0·55 [95% CI 0·44–0·66] for cervix uteri cancer to 0·78 [0·77–0·79] for lung cancer). Cancer risks were associated, in addition to heritable components, with a broad range of preceding diagnoses and health factors. The best overall performance was seen for cancers of the digestive system (oesophageal, stomach, colorectal, liver, and pancreatic) but also thyroid, kidney, and uterine cancers. Interpretation: Data available in national electronic health databases can be used to approximate cancer risk factors and enable risk predictions in most cancer types. Model predictions generalise between the Danish and UK health-care systems. With the emergence of multi-cancer early detection tests, electronic health record-based risk models could supplement screening efforts. Funding: Novo Nordisk Foundation and the Danish Innovation Foundation.

AB - Background: Health care is experiencing a drive towards digitisation, and many countries are implementing national health data resources. Although a range of cancer risk models exists, the utility on a population level for risk stratification across cancer types has not been fully explored. We aimed to close this gap by evaluating pan-cancer risk models built on electronic health records across the Danish population with validation in the UK Biobank. Methods: In this retrospective modelling and validation study, data for model development and internal validation were derived from the following Danish health registries: the Central Person Registry, the Danish National Patient Registry, the death registry, the cancer registry, and full-text medical records from secondary care records in the capital region. The development data included adults aged 16–86 years without previous malignant cancers in the time period from Jan 1, 1995, to Dec 31, 2014. The internal validation period was from Jan 1, 2015, to April 10, 2018, and the data included all adults without a previous indication of cancer aged 16–75 years on Dec 31, 2014. The external validation cohort from the UK Biobank included all adults without a previous indication of cancer aged 50–75 years. We used time-dependent Bayesian Cox hazard models built on the combined medical history of Danish individuals. A set of 1392 covariates from available clinical disease trajectories, text-mined basic health factors, and family histories were used to train predictive models of 20 major cancer types. The models were validated on cancer incidence between 2015 and 2018 across Denmark and on individuals in the UK Biobank. The primary outcomes were discrimination and calibration performance. Findings: From the Danish registries, we included 6 732 553 individuals covering 60 million hospital visits, 90 million diagnoses, and a total of 193 million life-years between Jan 1, 1978, and April 10, 2018. 
Danish registry data covering the period from Jan 1, 2015, to April 10, 2018, were used to internally validate risk models, containing a total of 4 248 491 individuals who remained at risk of a primary malignant cancer diagnosis and 67 401 cancer cases recorded. For the external validation, we evaluated the same time period in the UK Biobank covering 377 004 individuals with 11 486 cancer cases. The predictive performance of the models on Danish data showed good discrimination (concordance index 0·81 [SD 0·08], ranging from 0·66 [95% CI 0·65–0·67] for cervix uteri cancer to 0·91 [0·90–0·92] for liver cancer). Performance was similar on the UK Biobank in a direct transfer when controlling for shifts in the age distribution (concordance index 0·66 [SD 0·08], ranging from 0·55 [95% CI 0·44–0·66] for cervix uteri cancer to 0·78 [0·77–0·79] for lung cancer). Cancer risks were associated, in addition to heritable components, with a broad range of preceding diagnoses and health factors. The best overall performance was seen for cancers of the digestive system (oesophageal, stomach, colorectal, liver, and pancreatic) but also thyroid, kidney, and uterine cancers. Interpretation: Data available in national electronic health databases can be used to approximate cancer risk factors and enable risk predictions in most cancer types. Model predictions generalise between the Danish and UK health-care systems. With the emergence of multi-cancer early detection tests, electronic health record-based risk models could supplement screening efforts. Funding: Novo Nordisk Foundation and the Danish Innovation Foundation.

U2 - 10.1016/S2589-7500(24)00062-1

DO - 10.1016/S2589-7500(24)00062-1

M3 - Journal article

C2 - 38789140

AN - SCOPUS:85193505927

VL - 6

SP - e396

EP - e406

JO - The Lancet Digital Health

JF - The Lancet Digital Health

SN - 2589-7500

IS - 6

ER -

ID: 394711306