Overview: The integration and analysis of electronic health record (EHR) data across institutions present both significant challenges and transformative opportunities for biomedical research. This talk outlines the development and application of federated data networks such as i2b2 and SHRINE, which enable real-time, privacy-preserving queries across distributed EHR systems. These platforms support large-scale cohort discovery and have been instrumental in initiatives like the ENACT network and the COVID-19-focused 4CE consortium. In contrast, centralized repositories like the NIH’s N3C and Epic’s Cosmos database offer comprehensive datasets conducive to machine learning but may obscure site-specific variability.

A critical insight is that perceived data quality issues often stem from the structure and use of EHRs rather than true inaccuracies. For example, diagnosis codes may reflect billing practices rather than confirmed conditions, and laboratory test timing can be more predictive of outcomes than test values themselves. To address these limitations, probabilistic phenotyping using AI and machine learning is employed to infer more accurate disease classifications and patient health status. 

Ultimately, the choice between federated and centralized models should align with specific research goals, and understanding the context of data generation is essential for valid inference. This work underscores the importance of combining informatics infrastructure with clinical insight to harness the full potential of big health data. 

Griffin Weber, MD, PhD, Associate Professor of Medicine and Biomedical Informatics at Beth Israel Deaconess Medical Center and Harvard Medical School

About the Speaker: Griffin M. Weber, MD, PhD, is an Associate Professor of Medicine and Biomedical Informatics at Beth Israel Deaconess Medical Center (BIDMC) and Harvard Medical School (HMS). He is also the Faculty Lead of Harvard’s Clinical and Translational Science Award (CTSA) Informatics Program and Director of the Biomedical Research Informatics Core (BRIC) at BIDMC. Dr. Weber helped develop Informatics for Integrating Biology and the Bedside (i2b2), an open-source platform for query and analysis of clinical data, and the Shared Health Research Information Network (SHRINE), which connects i2b2 and OMOP databases across organizations to form federated research data networks. He also created the Profiles RNS website, which is used by dozens of medical schools and research organizations to create searchable online profiles of their investigators. Dr. Weber is a Fellow of the American College of Medical Informatics (FACMI), in recognition of his contributions to the field of medical informatics.