Pointers for analytics of healthcare data
Recently I've been asked for pointers on analysing complex healthcare data. This is a difficult issue. Healthcare analytics / health informatics / medical informatics / etc. range over a wide area, driven by a wide variety of interests and outcomes, overlapping hugely in some areas and not at all in others. The area is rapidly evolving but there's not a lot of formal, standardised learning. It's in around the same place bioinformatics was 20 years ago. So this is an attempt to sift the wheat from the chaff, and list some useful pointers for those wanting to know more.
This is expected to be a living, evolving document.
What is this about?
- Secondary use of healthcare data, for identifying patient populations, stratification, pharmaceuticals development, scientific research etc.
- Real World Evidence and observational studies
It is explicitly not about:
- Primary use of healthcare data, for direct improvements of patient care
- Healthcare IT
- Hospital software and IT systems
Skills & people
Skill-lists for scientific & technical disciplines often end up as sprawling wish-lists (cf. the NIH list of critical skills for bioinformatics that includes several subjects at degree levels of mastery ...). So the following is given lightly and with the full intention that most is learnt on the job:
- Analytics: scripting (e.g. R or Python), visualisation, data handling
- Background & domain knowledge: comfort with biomedical and healthcare terms
- Database access, some SQL
- Some statistics
- Awareness of standards and terminologies (e.g. CDISC, ontologies, etc.)
People who are usefully good at health informatics often have, or have had, job titles like:
- health informaticians
- biomedical data scientists
- clinicians, pharmacologists etc. who have got into programming
Books & papers
There's some interesting reading out there but you often have to pick out the relevant nuggets amidst material that is intended for hospital staff or administrators:
- O'Reilly has a surprising number of relevant titles:
- Anonymizing Health Data is focused but has good coverage of privacy and governance
- I was on a paper, "Opportunities and obstacles for deep learning in biology and medicine". Although some of it is unashamedly molecular, there is much that is about patients and higher-level data.
- There's a slightly older book from 2016, "Secondary Analysis of Electronic Health Records" that's still quite useful
- The Book of OHDSI, as below.
- Analytics in Healthcare is once again a mixed bag but very recent
ClassCentral has a list of online classes in bioinformatics and healthcare
There are a lot of "health informatics" courses out there, but some are more about making people familiar with the technology and landscape, or talking about the IT plumbing. Some possibly relevant ones in the UK include:
- UEdinburgh https://www.ed.ac.uk/bayes/about-us/our-work/education/workforce-development/courses/health-data-science
- ULeeds MSc Precision Medicine: Genomics & Analytics
- London School of Hygiene and Tropical Medicine MSc Health Data Science
- UCL AI Enabled Healthcare
- U Manchester Health Data Science, Clinical Bioinformatics, Health Informatics
- UCL courses: https://www.ucl.ac.uk/health-informatics/node/787/health-informatics-mscpgdippgcert
- King's College London Applied Statistical Modelling & Health Informatics PgCert
- Bournemouth Digital Health and Artificial Intelligence MSc
- University of West London: Health Informatics
- City & Guilds Health Informatics
- Imperial College Cancer Informatics (MRes), Data Science (Biomedical Research MRes), Health Data Analytics and Machine Learning (MSc)
Much is made of Real World Data, Real World Evidence and Real World Analytics. I'll tell you a secret: these are effectively all the same thing, despite the protests of experts.
AMIA (the American Medical Informatics Association) covers all the various flavours of healthcare and medical informatics, including our use cases, and hosts some excellent courses.
OHDSI (the Observational Health Data Sciences and Informatics program) is heading the standardisation of healthcare data into a common model called OMOP. This looks like becoming the dominant model for federated analysis of healthcare data. Read the Book of OHDSI for more.
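To give a flavour of why a common model matters, here is a minimal sketch of a cohort query against OMOP-shaped tables. The table and column names (person, condition_occurrence, condition_concept_id and so on) follow the OMOP Common Data Model, but the tables are cut down to a few columns, the data is made up, and the helper names (build_toy_omop_db, find_cohort) are hypothetical. The concept ID 201826 is used as a stand-in for type 2 diabetes; check any concept ID against the OMOP vocabulary before relying on it.

```python
import sqlite3


def build_toy_omop_db():
    """Create an in-memory database with two cut-down OMOP-style tables.

    Assumption: only a handful of columns are included; the real CDM
    tables have many more.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            gender_concept_id INTEGER,
            year_of_birth INTEGER
        );
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER,
            condition_concept_id INTEGER,
            condition_start_date TEXT
        );
        """
    )
    # Made-up patients; 8507 and 8532 are the usual male/female concept IDs.
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, 8507, 1950), (2, 8532, 1962), (3, 8532, 1985)],
    )
    # Made-up diagnoses; 999999 is a dummy ID for "some other condition".
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
        [
            (10, 1, 201826, "2015-03-01"),
            (11, 2, 201826, "2018-07-15"),
            (12, 2, 201826, "2019-01-10"),
            (13, 3, 999999, "2020-02-02"),
        ],
    )
    return conn


# One row per patient with the condition, plus their earliest record of it.
COHORT_SQL = """
SELECT p.person_id, p.year_of_birth, MIN(co.condition_start_date) AS first_diagnosis
FROM person p
JOIN condition_occurrence co ON co.person_id = p.person_id
WHERE co.condition_concept_id = ?
GROUP BY p.person_id, p.year_of_birth
ORDER BY p.person_id
"""


def find_cohort(conn, condition_concept_id):
    """Return (person_id, year_of_birth, first_diagnosis) for matching patients."""
    return conn.execute(COHORT_SQL, (condition_concept_id,)).fetchall()


if __name__ == "__main__":
    conn = build_toy_omop_db()
    for row in find_cohort(conn, 201826):
        print(row)
```

The point is that the query only mentions standard table names and concept IDs, so in principle the same SQL could be sent to any site holding OMOP-formatted data, which is what makes federated analysis practical.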