Podcast Episode Transcript: Analyzing unstructured data to unmask cognitive impairment

With L. Gordon Moore, MD

Gordon Moore, Doctor of Medicine: Hello. This is Gordon Moore, the host of the 3M Inside Angle podcast. Today I am talking with Dr. Andrea Gilmore-Bykovskyi from the University of Wisconsin School of Nursing. She’s a geriatric nurse and health services investigator at the University of Wisconsin-Madison School of Nursing and Alzheimer’s Disease Research Center. Dr. Gilmore-Bykovskyi studies clinical care delivery and health disparities among people living with and at risk for Alzheimer’s disease and related dementias. Dr. Gilmore-Bykovskyi’s the recipient of a 2018 Paul B. Beeson Emerging Leaders Career Development award focused on developing novel approaches to identifying and engaging patients and caregivers with Alzheimer’s disease from disadvantaged backgrounds in clinical research. Welcome, Dr. Gilmore-Bykovskyi.

Andrea Gilmore-Bykovskyi, Doctor of Philosophy, Registered Nurse: Thank you so much for having me.

Gordon: I’ll tell you where this comes from for me, the idea of a conversation with you. I just popped online, and I saw an article that had come out talking about cognitive impairment and how we can understand the words that clinicians use in documenting aspects of that that can help unmask cognitive impairments. And that’s meaningful to me because I think a lot about how we can use our vast amounts of information to understand things that can help us help our patients get to better outcomes. How do we do that when we use natural language processing and other engines to automate some of that stuff? For those engines to work, we have to understand the meaning. And so I’d like to know about the research that you have done. First, tell me why cognitive impairment. Why is that an important issue to you?

Dr. Gilmore-Bykovskyi: Dementia happens to be a passion of mine. My clinical practice has been focused on caring for people with Alzheimer’s disease and other forms of dementia for many years. One of the real challenges that we have in this field is that a great deal of individuals who are living with dementia currently have not been diagnosed (the estimates range from about 40 to 60 percent of people). Some new data have come out recently to suggest that number might even be higher. We also know that many people who have received a diagnosis don’t fully understand it. It is such a missed opportunity because dementia in particular requires a keen sense of awareness regarding someone’s diagnosis and where exactly they are in their disease continuum in order to provide the best quality care.

Unlike other conditions, dementia’s really heterogenous, so someone with dementia isn’t the same as someone else with dementia. What we know in our care system is that information gets lost and I think this is really frustrating to clinicians and patients. So my interest in dementia is trying to understand how we can support clinicians and patients by bringing to bear information that’s already been collected and things that we already know about someone that might improve recognition of dementia and also provide more continuity as people move across our very fragmented system of care.

Gordon: Before we get to understanding more about how you gain that understanding and clarity, tell me about the importance of dementia as a factor that can impact cost of care delivery, clinical outcomes, things that matter in the context of value-based care and payment model change and measuring quality. I sense that that’s a very important factor. Is that true in your research?

Dr. Gilmore-Bykovskyi: Oh, absolutely. We know that individuals with dementia are about 40 percent more likely to experience a rehospitalization within 30 days of hospital discharge, which is certainly costly. It’s certainly a marker of very poor care, but we also know that dementia in general is a very expensive condition.

Our approach to supporting people with dementia historically has been quite reactive. We’re often coming in when there’s a crisis, connecting somebody to perhaps being admitted to a nursing home, but really, we’re not being thoughtful. We don’t really have a good primary or secondary prevention model for dementia, and I think that’s where a lot of the higher care costs come into play.

Some other key issues to understand clinically are that if you’re not very cognizant of where someone is with their cognitive abilities, it’s essentially impossible to ensure that the adequate supports and clinical services are in place to support management of their other chronic diseases and also support those who are living with and caring for them in meeting their day-to-day needs.

Gordon: It sounds to me that “forewarned is forearmed” is the principle at play here. If I know more upfront, I can do more and therefore reduce bad outcomes. I may not change the dementia, but there are aspects of what happens because of dementia that I might be able to impact.

Dr. Gilmore-Bykovskyi: Absolutely. Things like medications that you would not want to administer to somebody with cognitive impairment, early interventions that you might implement for somebody who is in critical care if you’re aware of the fact that they already have impaired cognition. I think a lot of important work happens in our care system upon discharge from acute care. What types of supports, services and resources are in place for someone when they return home or when they enter a nursing home setting? We know individuals with dementia are really at a disadvantage and are vulnerable to the deficits in communication. If I’m aware of their dementia, I may, for example, provide additional follow-up services. I may schedule that PCP visit a week sooner than I would for somebody who I have confidence knows how to take their medications. There are many things that we should modify with how we deliver care to people with dementia.

Gordon: In that specific context, I think about people who end up in the emergency department because of some medication reaction, and one of those really scary meds is warfarin. And you’ve done some work specifically around warfarin. Tell me more about that.

Dr. Gilmore-Bykovskyi: Deficits in communication at the point of hospital discharge are—I would almost call them overwhelming. There’s no shortage of evidence to support the fact that not only are key aspects of information about certain medications missing—warfarin is one that we’ve looked at—but also aspects of the care plan. With whom should somebody follow up after discharge? Is that person provided with all of the necessary components one would need to successfully complete that follow-up visit?

We know there is protection in terms of negative outcomes in seeing another provider quickly after discharge. Some of the things we have found for warfarin are that key aspects of what someone’s last INR was, for example, of what their INR was in the hospital, and what their current dosage of warfarin is are not really communicated in detail. There are a lot of reasons this happens. No one is out trying to sabotage what happens to someone when they leave the hospital, but assumptions are made about the fact that perhaps a primary care provider will be able to access information from when someone was in the hospital. Perhaps they’ll just be able to ask the caregiver or the patient.

There’s really a lack of awareness about the fact that our systems are inherently fragmented, and there’s limited access to, particularly, the electronic health record information. And because you as a clinician are in the hospital and have access to all of it, we make assumptions about what that next care provider has. The reality, unfortunately, is they often do not have access to a great deal of relevant information. Because of that, they may implement changes in a care plan, or they may change medications without key pieces of information.

Gordon: That alludes, then, to the ability to extract information from unstructured data. Tell me why unstructured data, what’s important about that, and then expand on the work you’ve done to systematically understand that.

Dr. Gilmore-Bykovskyi: Unstructured data essentially refer to narrative text fields within the electronic health record and the comparator to that would be structured data elements. Within the electronic health record, there are many things where there is a finite selection. If you enter someone’s blood pressure, that’s numeric data. There are limits to what types of numbers you can enter into those fields. We have many dropdowns and flow charts and things like that, but narrative data is sort of free-flowing information from clinicians that might detail information about their encounters with patients, their observations about symptoms, their assessment of how someone would’ve responded to certain interventions, etc. We think it’s particularly rich in describing cognitive symptoms and other types of symptoms that often accompany cognitive impairment.

Gordon: As you described the structured versus unstructured, it sounds like the structured is so much more accessible. Why not make all of the data in a medical record structured?

Dr. Gilmore-Bykovskyi: Oh, for so many reasons. It’s hard to know where to start with that one. One challenge that’s very specific to Alzheimer’s and dementia, as I already mentioned, is that it’s greatly underdiagnosed. Even if someone receives a diagnosis—perhaps they’re prescribed what would be referred to as an anti-dementia medication, an acetylcholinesterase inhibitor—they still may not have that diagnosis coded into their medical record. There are many reasons for this, the primary being that it’s a very stigmatizing diagnosis.

When we are pulling from structured data, very often we are wanting to identify specific clinical conditions or risk factors related to those conditions. We’re looking at prior ICD codes, essentially, or prior medications that someone received and it doesn’t really do us any good in the case of dementia because so many individuals are lacking that data. So that’s step one.

Step two is that even for individuals who may have previously received a note in their medical record, maybe in their problem list or perhaps an actual diagnosis that was billed for, we know not all data is transferred across all health systems. When someone enters the hospital, we can’t have full confidence that all of a patient’s data would’ve traveled with them. Perhaps they’re in a different health system. Perhaps they haven’t provided the system with access to integrate their record with records from other settings. That’s one very practical reason specific to dementia.

The other reason is that I think it’s inappropriate to make assumptions that structured, pre-specified data has merits over unstructured data just because it’s technically easier to work with. It certainly is much easier to process, much easier to construct variables from, and much easier to apply to some of the methodologies you referred to, such as natural language processing or machine learning. A lot of people have done that with data fields from those structured data elements and flow sheets.

I think the promise in unstructured data is that it provides us with very detailed information that is very proximal to someone’s real-time clinical symptoms. What we see is that clinicians, when they use these structured data fields, often copy forward from things that they saw documented in other situations. It doesn’t include everything you might want to say, so there are lots of limitations to the structured data element, but certainly benefits. I think in the case of dementia, relying exclusively on those is a really significant missed opportunity.

Gordon: It makes me think about when I would create notes in an electronic medical record. I would be filling out the history, social history, family history, things like that, putting in the medications, putting in a lot of structured data. At the end of the note, sometimes using templates around whatever the clinical scenario was, I would sometimes put in just a couple of lines describing for myself in the future what was most salient and important about that encounter.

That really had nothing to do with all those other structured—yes, it did have something to do with structured data, but there wasn’t a format for me just to freely describe nuanced information that I wanted to carry forward as a raft floating on this lake of stuff that we tend to pack in there for billing or what have you. So there’s that inflexibility of structured data entry that frustrates communication around nuanced information that I think still drives so much rich information into free text, which is then difficult to understand. Yet you’ve been doing work to make meaning and understanding around free text as it pertains to cognitive impairments. Tell me about that.

Dr. Gilmore-Bykovskyi: What we’ve been doing has essentially been mining unstructured text fields in the electronic health records of individuals with Alzheimer’s and related dementias and really focusing on those acute care stays, specifically data that arise from hospitalizations or emergency department visits. The reason being that there’s a lot more frequency and activity in the electronic health record during that time and we know that we have multiple clinical disciplines engaging with patients, so we see data from a vast range of perspectives.

What we’ve been doing is essentially identifying any data elements, terminology, or words and their accompanying structures semantically as they occur in narrative text that we think are reflective of symptoms of cognitive impairment. We also have focused quite a bit on descriptions of behavioral disturbances, which are a core feature of dementia. As someone’s cognition becomes impaired, reasonably, we see changes in personality and behavior also. These are the types of nuances that aren’t built into structured text fields, but certainly, clinicians recognize them, and they have to work with them. That’s why we see them coming up in free text.

Gordon: Give me an example of the kind of thing you notice.

Dr. Gilmore-Bykovskyi: A lot of documentation includes descriptions of very, very vague cognitive deficits, so nothing that really meets our gold standard diagnostic criteria for how we would assess or evaluate cognition, so perhaps describing someone as experiencing sundowning, having an altered mental status or being confused. We see a lot of description vaguely of agitation, of being resistant to some care processes, or being aggressive with some care processes. And these are common features, particularly in moderate to advanced dementia.

If somebody, for example, doesn’t have a diagnosis of dementia within their medical record, but in their acute stay, we see documentation noting that they’re confused and agitated, resisting care, and having difficulty following direction, as a clinician, I think there is more than enough evidence at that point to say, “Something cognitively is not quite right here, and this requires further investigation.” Really, our goal is to integrate and walk across these unstructured data elements to provide clinicians with decision support tools because despite the best efforts of many scientists, we know that clinicians are not very good at assessing cognition.
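To make the idea concrete, here is a minimal sketch of flagging a note for follow-up when several of the phrases mentioned above appear in it. It is not the research team's tool: the term list, the `flag_note` function, and the threshold of three are all assumptions chosen for illustration, and plain pattern matching like this does not handle negation (for example, "not confused").

```python
import re

# Hypothetical term list, drawn from phrases mentioned in the conversation.
IMPAIRMENT_PATTERNS = {
    "sundowning": r"\bsundown(?:ing|s|ed)?\b",
    "altered_mental_status": r"\baltered mental status\b",
    "confused": r"\bconfus(?:ed|ion)\b",
    "agitated": r"\bagitat(?:ed|ion)\b",
    # "resisting care", "resistant to some care", and similar short variants.
    "resisting_care": r"\bresist(?:ing|ant|s|ed)?\b(?:\s+\w+){0,3}?\s+care\b",
    "difficulty_following_directions": r"\bdifficulty following directions?\b",
}


def flag_note(text, threshold=3):
    """Return the matched term labels and whether their count meets the threshold.

    The threshold is an arbitrary illustrative value, not a validated cutoff.
    """
    found = sorted(
        label
        for label, pattern in IMPAIRMENT_PATTERNS.items()
        if re.search(pattern, text, flags=re.IGNORECASE)
    )
    return found, len(found) >= threshold


# Invented example note, not real patient data.
note = "Pt confused and agitated overnight, resisting care, difficulty following direction."
terms, needs_review = flag_note(note)
print(terms, needs_review)
```

A flag here would only mean "this warrants further investigation," in line with the decision-support goal described above, never a diagnosis.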

Gordon: As you describe that, I have this sense of fascinating opportunity. Maybe I’m a primary care clinician at my practice and I’m seeing somebody maybe for a second or third visit, and something’s going on. I may just not clue into the fact that it’s dementia, or I may be worried about applying that diagnosis, and yet the words are getting into the record. I’m imagining now I have an engine that is recognizing those things and says, “Hey, wait a minute. Guess what. These words are starting to add up, and it’s beginning to cross the threshold where you might want to consider stuff.” Is that where you’re going?

Dr. Gilmore-Bykovskyi: Some of our ongoing work is essentially geared toward testing the predictive ability of some of these variables that we’ve been able to identify in the unstructured text fields. Unfortunately, the process for doing this is technically and computationally a little bit challenging.

We work across all of the narrative text data, and we actually turn some of those discrete terms we think are particularly predictive into sort of structured variables, and we essentially create a machine learning algorithm, so to speak. Of course, there are many computational iterations of that to help us learn which data elements across these unstructured text fields are most predictive of identifying the individuals that indeed do have cognitive impairment.
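As a rough illustration of turning discrete terms into structured variables and then asking which ones separate the two groups, the sketch below builds one binary variable per term and computes a smoothed log odds ratio for each. This is a deliberately simple stand-in, not the machine learning approach the team uses; the term list, the notes, and the labels are invented for the example.

```python
import math

# Hypothetical candidate terms (assumption for illustration).
TERMS = ["confused", "agitated", "sundowning", "pleasant"]

# Toy, invented notes with labels: 1 = cognitive impairment, 0 = none.
NOTES = [
    ("confused and agitated overnight", 1),
    ("sundowning noted, confused at dusk", 1),
    ("agitated during care", 1),
    ("pleasant and cooperative", 0),
    ("pleasant, confused briefly after sedation", 0),
    ("resting comfortably", 0),
]


def to_features(text):
    """Turn free text into one binary variable per candidate term."""
    lowered = text.lower()
    return {term: int(term in lowered) for term in TERMS}


def log_odds_ratios(notes):
    """Smoothed log odds ratio of each term appearing in impaired vs. unimpaired notes."""
    scores = {}
    for term in TERMS:
        # 2x2 table cells, each started at 0.5 so no cell is ever zero.
        a = b = c = d = 0.5
        for text, label in notes:
            present = to_features(text)[term]
            if label == 1 and present:
                a += 1  # impaired, term present
            elif label == 1:
                b += 1  # impaired, term absent
            elif present:
                c += 1  # unimpaired, term present
            else:
                d += 1  # unimpaired, term absent
        scores[term] = math.log((a * d) / (b * c))
    return scores


scores = log_odds_ratios(NOTES)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term}: {score:+.2f}")
```

Positive scores mark terms seen more often in the impaired group and negative scores the reverse. With six toy notes the numbers mean nothing clinically; the point is only the shape of the pipeline.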

One of the big challenges with this line of work is that in order to really learn a predictive model, we also need to know which data elements identify individuals without cognitive impairment. So, some of the work my team’s doing right now is identifying complementary data elements that are in unstructured text fields that denote intact cognition, for lack of a better word.

Oftentimes, we see descriptions of how oriented someone is to the situation. Most clinicians will be familiar with this documentation of saying A&O, which stands for alert and oriented. You document times one or two or three, meaning they’re aware of who they are, where they are, and essentially what time it is. There are various iterations of that, so oftentimes, we’ll see documentation of this person is alert and oriented times three or times four. What we want to understand is if, indeed, a predictive element of saying, “Yes, this person, in fact, doesn’t have dementia,” is true because it may not be.
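For readers curious how orientation shorthand might be pulled out of free text, here is a small sketch that extracts the "times N" value from a few common spellings. The pattern and the `orientation_scores` function are assumptions made for illustration; real notes contain many more variants, and, as noted above, whether the extracted value actually predicts anything is an open question.

```python
import re

_WORD_TO_INT = {"one": 1, "two": 2, "three": 3, "four": 4}

# Matches shorthand such as "A&O x3", "AOx4", or "alert and oriented times three".
_AO_PATTERN = re.compile(
    r"\b(?:a\s*&\s*o|ao|alert\s+and\s+oriented)\s*(?:x|times)\s*"
    r"(?P<n>[1-4]|one|two|three|four)\b",
    flags=re.IGNORECASE,
)


def orientation_scores(text):
    """Return every documented orientation level found in the text, in order."""
    scores = []
    for match in _AO_PATTERN.finditer(text):
        raw = match.group("n").lower()
        scores.append(_WORD_TO_INT[raw] if raw in _WORD_TO_INT else int(raw))
    return scores


# Invented example note, not real patient data.
print(orientation_scores("Pt A&O x3 this am. Later alert and oriented times two."))
```

Keeping every occurrence rather than only the latest one preserves the change over time, which matters for a condition that fluctuates.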

I think that’s an important task for this field because unlike diabetes or hypertension, absence of a dementia diagnosis in your health record doesn’t necessarily mean you don’t have dementia. So there is some additional work with the validation that’s required for these phenotype models, which is how we refer to this approach of wanting to identify a condition using electronic health record data.

Gordon: Wow. I’m thinking about the garbage in, garbage out problem. We have these machines that can sum up vast quantities of data and look for relationships, but if we’re putting soft stuff into the input side, we may not really be able to understand what’s in the output. And I’m thinking about all the times as a resident when I was saying A&O times three and wondering, “Oh, gosh. I hope I didn’t mess it up.”

Dr. Gilmore-Bykovskyi: Yeah. What does that really mean?

Gordon: Right. Exactly. What’s your research telling you about that? Do you have definitive statements to say about the value of that statement at this point?

Dr. Gilmore-Bykovskyi: There is some good data looking at documentations surrounding orientation status. A lot of it’s been done by Donna Fick, who is a nurse scientist out at Penn State. The findings around looking at orientation specifically thus far are that nurses in particular do reliably assess and document on that because it’s so easy and quick to assess. Whether or not it’s really meaningful and predictive in the course of dementia has yet to be understood because dementia as a condition inherently waxes and wanes. So I think that second question as to how useful is this specific aspect of cognition in identifying folks with dementia has yet to be answered.

In terms of garbage in, garbage out, that’s a real challenge with the entire field of electronic health record phenotyping. One thing that we have observed in work on other conditions is that if you have a high enough volume of data, using these new-generation analytic and computational strategies such as machine learning and deep learning (as I said, there are a variety of models), some of the limitations of those data are weeded out if the signal is strong enough. I think it’s a buzzword, and everyone thinks, “Machine learning is a computer doing everything.” It’s actually a little bit more complicated than that.

To state it a little bit better, with a significant quantity of data, we find that the predictive signal outweighs the noise in the data. That’s one of the benefits, I think, of big data and electronic health record data, is there’s a lot of information. If you look at an individual perhaps over a period of years and take data from all of their hospital stays and their PCP visits, that’s a lot of information and opportunity for a model to learn and home in on which of these data elements are truly predictive. I think we have seen from other conditions, particularly heart failure and diabetes, there’s been a lot of progress with that. We can see the signal through the noise, so we’re able to overcome, hopefully, some of the garbage in. But I think there’s recognition that this is not pristine data.
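One simple way to see why volume helps is a back-of-the-envelope calculation rather than a machine learning model: for a fixed difference in how often a term appears in two groups, the standard error shrinks with the square root of the sample size, so the same difference stands out more clearly as data accumulate. The 30 percent and 20 percent frequencies below are invented for illustration, and `z_for_difference` is a hypothetical helper name.

```python
import math


def z_for_difference(p_impaired, p_unimpaired, n_per_group):
    """Approximate z statistic for a difference in term frequency
    between two equal-sized groups (unpooled standard error)."""
    standard_error = math.sqrt(
        p_impaired * (1 - p_impaired) / n_per_group
        + p_unimpaired * (1 - p_unimpaired) / n_per_group
    )
    return (p_impaired - p_unimpaired) / standard_error


# Same assumed underlying difference, increasing amounts of data.
for n in (50, 500, 5000):
    print(n, round(z_for_difference(0.30, 0.20, n), 2))
```

Quadrupling the sample doubles the z statistic, which is the arithmetic behind seeing "the signal through the noise." It does not fix systematic problems such as an inaccurate gold standard.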

Gordon: You mention what I think about as a power argument around the ability of machines to get closer to the signal through the noise. Are there some clearly understood denominators around that? Or does it depend on the issue?

Dr. Gilmore-Bykovskyi: That’s a good question. It depends on a lot of things. It depends on the specificity of data. It depends on the clinical condition and how specific and accurate your gold standard is. In dementia, our gold standard isn’t great, so we rely on a series of administrative codes currently to identify our gold standard of knowing whether someone has dementia. We know that even with neuropsychological evaluation our diagnoses sometimes are not as accurate, so that’s certainly a factor.

It also depends analytically on the type of machine learning model you would want to use. Some require really high dimensional data, so the volume and complexity of the data is really important. Others may be a little more agile and more appropriate to smaller amounts of data, but it’s similar to any analytical question. It depends on the model that you’re using, the specificity of the data and what you know about the clinical condition. How confident can you be in that gold standard? In some ways, a lot of the same rules, so to speak, apply as when you think about a standard regression.

Gordon: That makes sense to me. In this context, is the data of a hospital typically large enough or questionable, depending on the question being asked?

Dr. Gilmore-Bykovskyi: I think it depends on the question being asked, and I think also, beyond the computational task, you want to have a strong conceptual understanding as to whether this translates to clinical practice. What is the ultimate intent and application of your algorithm? If it is to work across data from acute care episodes, it needs to be developed in a manner where it just works across those data.

I think it also depends on how robust your model is. How many variables are in it? If you’re going to do the “kitchen sink” approach and say, “These are all the potential candidate variables we’ve identified,” and throw them all in, you need to have a little bit more data. If you’re going to have a more heuristic approach and say, “We think these models clinically are most likely to be appropriate,” perhaps you can get away with a smaller sample. But I think that that counterpart of really paying attention conceptually to ensuring the question makes sense and that it has translatability to practice is the more difficult task.

Gordon: Where do you go from here? What are your next big projects? What are you working on now?

Dr. Gilmore-Bykovskyi: Essentially, what we’re doing now is constructing the necessary data set in order to test some of the candidate variables we’ve identified from unstructured text data. As I said, that really does involve building this cohort of individuals that we can confirm from a gold standard do not have cognitive impairment as an important methodological task on the road toward wanting to identify cognitive impairment.

That is no small undertaking because that means we need to have some in-person assessment and evaluation of people. We really can’t rely on a counterpart gold standard. That’s one of our big jobs right now and then working to learn an optimum model. There has been other work in this area, I should mention, most of which has used structured data elements. We’re hopeful that integrating these unstructured data elements will improve the predictability of some of those existing approaches.

Gordon: That’s terrific. Well, Dr. Gilmore-Bykovskyi, I want to thank you so much for your time today.

Dr. Gilmore-Bykovskyi: You’re so welcome. Thank you so much for inviting me.
