
A groundbreaking artificial intelligence (AI) model now predicts the progression of over 1,000 diseases throughout a person’s life. This innovation promises to transform precision prevention, healthcare policy planning, and bias-aware medical innovations. The study, titled “Learning the Natural History of Human Disease with Generative Transformers,” was published in the journal Nature.
Researchers have developed a machine learning model that utilizes large-scale health data to forecast the trajectory of 1,256 distinct ICD-10 level 3 diseases. By analyzing patients’ past medical histories, the model achieves predictive accuracy comparable to existing tools focused on individual diseases. It can simulate future health trajectories for up to two decades, offering insights into personalized health risks and comorbidities.
The Need for Complex Disease Models
Human disease progression is a complex interplay of health, acute illness, and chronic conditions, often appearing as clusters of comorbidities influenced by genetics, lifestyle, and socioeconomic factors. Understanding these patterns is crucial for personalized healthcare, lifestyle guidance, and effective early screening programs. However, traditional algorithms, designed for single diseases, fall short in capturing the complexity of over 1,000 recognized health conditions.
This limitation is particularly pressing as aging populations face rising burdens of illnesses like cancer, diabetes, cardiovascular disease, and dementia. Accurate modeling of disease trajectories is vital for healthcare planning and economic policy. Artificial intelligence, particularly large language models (LLMs), offers a promising solution: these models learn dependencies across sequences, and predicting the next word from the preceding text is closely analogous to predicting the next disease from prior health events.
Inspired by this analogy, researchers have developed transformer-based models for predicting specific conditions, with encouraging early results. Yet, a truly comprehensive model capable of simulating the full spectrum of multimorbidity across time had not been systematically evaluated until now.
Developing a Large-Scale Data Model
The researchers introduced Delphi-2M, a transformer-based model, to predict lifetime disease trajectories. Unlike language models processing words, Delphi-2M worked with diagnostic codes from the tenth revision of the International Classification of Diseases (ICD-10), as well as factors like death, sex, BMI, and lifestyle habits such as smoking and alcohol use.
To address gaps in medical records, the team inserted artificial “no-event” tokens. The model’s vocabulary included disease codes, lifestyle levels, sex, no-event, and padding tokens, totaling around 1,270. Training utilized health records from the UK Biobank, with 402,799 participants for training, 100,639 for validation, and 471,057 for longitudinal testing. To test generalizability, the model was also validated on data from 1.93 million Danish individuals.
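The idea of filling gaps with "no-event" tokens can be illustrated with a minimal sketch. The rule used here (one token for every five years without a recorded event), the token name, and the function name are all assumptions made for illustration; the study's actual insertion scheme may differ.

```python
NO_EVENT = "<no-event>"  # hypothetical token name, not taken from the study


def insert_no_event_tokens(events, max_gap_years=5.0):
    """Fill long gaps in a health record with no-event tokens.

    events: list of (age_in_years, token) pairs sorted by age.
    Returns a new list in which no two consecutive tokens are more than
    max_gap_years apart. The fixed 5-year spacing is an assumed rule.
    """
    if not events:
        return []
    filled = [events[0]]
    for age, token in events[1:]:
        prev_age = filled[-1][0]
        # Step forward through the gap, adding one filler token per step.
        while age - prev_age > max_gap_years:
            prev_age += max_gap_years
            filled.append((prev_age, NO_EVENT))
        filled.append((age, token))
    return filled


# Toy record: sex at birth, hypertension (I10) at 42, diabetes (E11) at 55.5.
record = [(0.0, "sex:female"), (42.0, "I10"), (55.5, "E11")]
filled = insert_no_event_tokens(record)
```

In this toy record, the long stretch before the first diagnosis is represented explicitly, so a sequence model sees "nothing happened" as information rather than as missing input.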
Several modifications tailored the base model to health data: replacing positional encoding with continuous age encoding, adding an output head to predict time-to-next event, and altering attention masks to prevent tokens at the same time point from influencing one another. Delphi-2M could estimate risks for over 1,000 diseases, forecast diagnosis timing, and simulate complete health trajectories.
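Two of these modifications can be sketched in a few lines. The example below is an illustration under stated assumptions, not the study's implementation: the sinusoidal form, the `dim` and `max_period` values, and the choice to let each token attend to itself are all hypothetical.

```python
import math


def age_encoding(age_years, dim=8, max_period=100.0):
    """Sinusoidal features computed from continuous age instead of position.

    dim must be even; dim and max_period are illustrative values.
    """
    features = []
    for i in range(dim // 2):
        freq = 1.0 / (max_period ** (2 * i / dim))
        features.append(math.sin(age_years * freq))
        features.append(math.cos(age_years * freq))
    return features


def attention_mask(ages):
    """mask[i][j] is True when token i may attend to token j.

    A token may attend to strictly earlier tokens and to itself (the latter
    is an assumption). Tokens recorded at the same age cannot see each other.
    """
    n = len(ages)
    return [[(ages[j] < ages[i]) or (i == j) for j in range(n)]
            for i in range(n)]


# Two diagnoses recorded at the same age (52) must not influence each other.
ages = [40.0, 52.0, 52.0, 60.0]
mask = attention_mask(ages)
```

A standard causal mask would let the second token at age 52 see the first one, even though their order within the same visit is arbitrary; comparing ages rather than positions removes that artificial ordering.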
Evaluating the Model’s Performance
Delphi-2M’s performance was evaluated using health data up to age 60 from 63,622 participants in the UK Biobank. The model generated simulated health trajectories, which were then compared with observed outcomes. Predicted disease rates at ages 70 and 75 closely matched observed patterns, indicating that the model captures population-level incidence trends.
While predictive accuracy declined over longer time horizons, from an average AUC of approximately 0.76 to about 0.70 at 10 years, Delphi-2M still outperformed models based only on age and sex. The model effectively distinguished risks across subgroups defined by lifestyle or previous illnesses, supporting its value for personalized risk profiling.
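For readers unfamiliar with the metric, AUC is the probability that a randomly chosen person who developed the disease received a higher predicted risk than a randomly chosen person who did not. The sketch below computes it directly from that definition; the scores and labels are made-up toy values, not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risks; labels: 1 for cases, 0 for controls.
    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example: two cases and three controls.
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)  # 4 of 6 case-control pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of roughly 0.76 and 0.70 in context.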
Delphi-2M could also generate synthetic health trajectories that mirrored real-world disease patterns without duplicating individual records. A model trained solely on this synthetic data retained much of the original’s performance, with AUC falling by only about three percentage points.
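How a trajectory might be sampled step by step can be sketched as follows. This is a simplified illustration that assumes each possible next event has a constant rate until something happens, so that waiting times are exponential and the next event is chosen in proportion to its rate. The function `toy_next_logits` is a hypothetical stand-in for a trained model, and its rates are invented.

```python
import math
import random


def sample_next_event(logits, rng):
    """Draw (token, waiting_time_years) from competing exponential rates.

    logits: dict mapping token -> log of its yearly rate (assumed form).
    """
    rates = {tok: math.exp(value) for tok, value in logits.items()}
    total = sum(rates.values())
    wait = rng.expovariate(total)      # time until any event occurs
    threshold = rng.random() * total   # pick a token proportional to rate
    cumulative = 0.0
    for tok, rate in rates.items():
        cumulative += rate
        if threshold <= cumulative:
            return tok, wait
    return tok, wait  # guard against floating-point rounding


def simulate(start_age, next_logits, rng, max_age=80.0, death_token="death"):
    """Sample one synthetic trajectory from start_age up to max_age."""
    age, history = start_age, []
    while True:
        tok, wait = sample_next_event(next_logits(history, age), rng)
        age += wait
        if age > max_age:
            break
        history.append((age, tok))
        if tok == death_token:
            break
    return history


def toy_next_logits(history, age):
    """Hypothetical model stand-in: fixed yearly rates, ignores its inputs."""
    return {"I10": math.log(0.05), "E11": math.log(0.02),
            "death": math.log(0.01)}


trajectory = simulate(60.0, toy_next_logits, random.Random(0))
```

A real model would return different rates after each sampled event, which is what lets earlier diagnoses shape later ones; repeating the sampling many times per person gives a distribution of possible futures rather than a single forecast.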
To interpret predictions, researchers examined the embedding space, revealing disease clusters consistent with ICD-10 chapters and showing how specific diagnoses shaped outcomes, such as the strong impact of pancreatic cancer on mortality. External validation on Danish data supported generalizability: the average AUC was about 0.67, a modest drop from the UK Biobank results.
Conclusions and Future Implications
The study introduced Delphi-2M, a GPT-based model capable of predicting and simulating the progression of multiple diseases over time. Compared with single-disease or biomarker-based models, Delphi-2M showed strong accuracy in forecasting health risks across more than 1,000 conditions. One exception was diabetes risk, where it performed worse than the single-marker HbA1c approach. When tested on Danish data, the model showed only a modest decline in performance.
Its ability to sample synthetic future trajectories allows estimation of long-term disease burdens and the creation of privacy-preserving datasets. The model also highlighted patterns of comorbidities and the temporal influences of illnesses. For predicting death, it achieved an AUC of about 0.97.
However, several limitations were noted. Predictions reflected biases in UK Biobank data, including healthy volunteer effects, recruitment bias, and missingness patterns. Differences were also seen across ancestry and socioeconomic groups. Importantly, the model captures statistical associations but not causal relationships, which limits its direct clinical use.
Overall, Delphi-2M demonstrates the promise of transformer-based models for personalized risk prediction, healthcare planning, and biomedical research. Future improvements may integrate multimodal data, support clinical decision-making, and aid policy development in aging populations.