Social determinants of health and natural language processing

August 6th, 2018 / By L. Gordon Moore, MD

Healthcare delivery is under extreme pressure to reduce costs and improve outcomes. One reliable pathway is improving efficiency, and one aspect of efficiency is better matching interventions to the people most likely to benefit from them. Let’s trace a pathway that might lead to improved efficiency in care delivery.

We typically start by identifying people who have a particular disease (diabetes, for example), and we, the healthcare delivery system, work to identify everyone we interact with who has diabetes. We then work to ensure these people get the best diabetes care we can deliver. Disease registries are tools for tracking the delivery of care to people with a specific condition, and we can embed evidence-based guidelines in the registry as a rules engine to track whether care is being delivered effectively.
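To make the registry idea concrete, here is a minimal sketch of a registry entry checked against a small rules table. Everything in it is an assumption for illustration: the `RegistryEntry` fields, the `RULES` table, and especially the day intervals, which are placeholders and not clinical guidance.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class RegistryEntry:
    """One person in a hypothetical diabetes registry."""
    patient_id: str
    last_hba1c_date: Optional[date]
    last_eye_exam_date: Optional[date]


# Hypothetical rules: measure name -> (registry field, maximum days between events).
# The intervals are placeholders for illustration, not clinical guidance.
RULES = {
    "hba1c_test": ("last_hba1c_date", 180),
    "eye_exam": ("last_eye_exam_date", 365),
}


def care_gaps(entry: RegistryEntry, today: date) -> List[str]:
    """Return the measures that are missing or overdue for this person."""
    gaps = []
    for measure, (field_name, max_days) in RULES.items():
        last_done = getattr(entry, field_name)
        if last_done is None or (today - last_done) > timedelta(days=max_days):
            gaps.append(measure)
    return gaps


# Usage with made-up data: the HbA1c test is overdue, the eye exam is not.
entry = RegistryEntry("patient-001", date(2018, 1, 1), date(2018, 3, 1))
overdue = care_gaps(entry, today=date(2018, 8, 6))
```

Keeping the guideline logic in a data table rather than in the code means a change to a guideline changes one row, which is the main appeal of treating the registry as a rules engine.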

As we manage this population of people with diabetes, we find that it is heterogeneous. As Bernstein describes in his study, when we look at the total illness burden of people with diabetes we find immense variability. The figure below, from his study, shows how often populations segmented by total illness burden and severity are likely to end up in the hospital. We improve efficiency by applying the additional information that comes from total illness burden segmentation to case/care management interventions.
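The kind of comparison described above can be sketched as a hospital admission rate per illness-burden segment. This is only an illustration: the segment labels and the sample records are made up, and it does not reproduce the method or the numbers in Bernstein's study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def admission_rate_by_segment(
    patients: Iterable[Tuple[str, bool]]
) -> Dict[str, float]:
    """Compute the share of people admitted to hospital in each segment.

    Each record is (burden_segment, was_admitted); both are hypothetical fields.
    """
    totals = defaultdict(int)
    admitted = defaultdict(int)
    for segment, was_admitted in patients:
        totals[segment] += 1
        if was_admitted:
            admitted[segment] += 1
    return {segment: admitted[segment] / totals[segment] for segment in totals}


# Made-up records for illustration only.
sample = [
    ("low", False),
    ("low", False),
    ("high", True),
    ("high", False),
]
rates = admission_rate_by_segment(sample)
```

A segment with a much higher admission rate is where additional case or care management effort is most likely to pay off.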

We don’t have to stop here. Understanding the total illness burden provides a more sophisticated understanding of a population’s needs, but we also know that disease and total illness burden are only part of the story. Figure 2 below, from the Kaiser Family Foundation, reminds us of the other factors that weigh heavily on medical outcomes, factors that have raised interest in understanding social and other non-medical influences on health.

The difficulty is that social determinants and other non-medical factors are rarely encoded in a form that machines can read. The bulk of encoded data is focused on revenue cycle management. To get these factors into the encoded data stream, we could ask the electronic medical record manufacturers to create more drop-down menus or check-boxes, but the considerable clinician anger over the burden of check-box documentation suggests that we might want to explore other pathways.

Advancements in one area of artificial intelligence, natural language processing (NLP), make it possible for machines to read and make sense of clinician notes. Oreskovic and colleagues used NLP to identify patients in the emergency department more likely to benefit from case management. Their approach demonstrated that NLP can correctly identify these individuals by having computers read clinician notes, allowing the case manager to spend more time working with people rather than hunting through the EMR.
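As a rough illustration of what "reading" a note for social factors can mean, the sketch below flags a few categories by matching phrases in free text. It is deliberately simplistic and is not the method Oreskovic and colleagues used: the category names and phrase lists are assumptions, and it ignores negation ("denies homelessness"), misspellings, and context, all of which a real clinical NLP system has to handle.

```python
import re
from typing import List

# Hypothetical categories and phrase lists, chosen only for illustration.
SDOH_PATTERNS = {
    "housing_instability": re.compile(
        r"\b(homeless|evicted|eviction|unstable housing)\b", re.IGNORECASE
    ),
    "food_insecurity": re.compile(
        r"\b(food insecurity|food pantry|skipping meals)\b", re.IGNORECASE
    ),
    "transportation": re.compile(
        r"\b(no transportation|no ride|missed the bus)\b", re.IGNORECASE
    ),
}


def flag_social_factors(note: str) -> List[str]:
    """Return the sorted category names whose phrases appear in the note."""
    return sorted(
        name for name, pattern in SDOH_PATTERNS.items() if pattern.search(note)
    )


# Made-up note text for illustration only.
note = "Patient reports eviction last month and has been skipping meals."
flags = flag_social_factors(note)
```

Even a crude flag like this shows the appeal: the signal comes from notes clinicians already write, so no additional check-boxes are needed.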

I’ve described a pathway to improving healthcare delivery efficiency by using large data sets (e.g., claims) to understand a person’s total illness burden. With the addition of NLP to identify social and other non-medical factors affecting outcomes, we are able to know much more than a person’s diagnosis. This additional knowledge improves the chances that we bring resources to those most likely to benefit. This is the essential work of improving healthcare efficiency.

L. Gordon Moore, MD, is Senior Medical Director, Clinical Strategy and Value-based Care for 3M Health Information Systems.