From 3M Health Information Systems
Podcast Episode Transcript: Analyzing human language for social good
Dr. Gordon Moore: Welcome to the Inside Angle podcast. This is your host, Gordon Moore, and today, I’m speaking with Philip Resnik. He is a professor of linguistics at the University of Maryland with a joint appointment at the University of Maryland Institute for Advanced Computer Studies and affiliate appointments in Computer Science. Welcome, Philip.
Philip Resnik: Hi there. Great to be here.
Dr. Moore: Today, I invited you on the podcast because I’m fascinated by your work. You work in computer science and on how language works. And that, to me, is huge and important because it begins to solve what I, and many others, see as major problems with how we make meaning of oceans and oceans of information in health care, and use computers to facilitate that work. But I reflect on the frustration of colleagues when they use the electronic health record and say things like, “This doesn’t work. It takes so much time and so much effort for me to put data into this.”
And then on the other hand, I hear from people who work on quality and measurement who say, “We should get doctors to enter more structured data.” So if you don’t mind, could you start with just sort of the basics of how it is we extract information from electronic health records and how your work intersects with that?
Philip: Sure. One of the most important things to recognize is that language is a fundamental part of how clinicians practice. There is a notion of a clinical narrative. There is a set of ways of doing things that is part of how clinicians are trained. The things that go into clinical dictations are there because clinicians have been trained to identify things that are relevant and to put them there.
One of the challenges with electronic health records, the way that they have evolved, is to a great extent, they were designed by the folks who recognize the importance of doing large-scale analysis of data, and took kind of a shortcut, “Hey, the best way to get the information in a structured form to analyze is to have the clinicians put it in in a structured form.” And that has been a real problem.
And one of the ways that folks in my field have tried to address that is by finding automatic ways to take unstructured information, the language in clinical records, and to extract the relevant structured information, whether that is diagnosis codes, procedure codes, and so forth, or things that are more fine-grained like clinical concepts.
There is an entire body of work in an area that’s known as natural language processing, natural language as opposed to computer languages; in other words, the language that people use. And natural language processing, for quite a long time, has been focused on trying to extract structured information so you could do useful things with it, and that’s now being used on a large scale in the clinical world as well.
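The kind of extraction Philip describes can be illustrated with a toy sketch: matching unstructured text against a small lexicon of known concepts. This is not any particular clinical NLP system; the lexicon, codes, and note text below are invented for illustration, and real systems must also handle negation, abbreviations, misspellings, and ambiguity.

```python
# Toy sketch: pulling structured clinical concepts out of free text
# by matching against a concept lexicon. All entries are invented.
TOY_LEXICON = {
    "chest pain": "R07.9",           # hypothetical diagnosis codes
    "shortness of breath": "R06.02",
    "hypertension": "I10",
}

def extract_concepts(note: str) -> list[tuple[str, str]]:
    """Return (phrase, code) pairs whose phrase appears in the note."""
    text = note.lower()
    return [(phrase, code) for phrase, code in TOY_LEXICON.items()
            if phrase in text]

note = "Patient reports chest pain and shortness of breath; history of hypertension."
print(extract_concepts(note))
```

Real clinical NLP replaces the naive substring match with tokenization, negation detection (“denies chest pain”), and mapping into standardized vocabularies, but the input/output shape is the same: free text in, structured concepts out.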
Dr. Moore: That makes a lot of sense to me because I just observed that a lot of the information in electronic medical records is in the notes of doctors and nurses and others who are writing in a chart, for instance. But I think that natural language processing and computers have been around for a long time, and there’s been a bunch of attempts to pull that information out of unstructured text and make meaning of it. And it seems like it’s still really hard and it’s not easy to do that work. What’s so hard about that?
Philip: Oh, my goodness. Yeah. Language is hard. If you’ve tried texting and having it autocorrect to the wrong thing, or you’ve used Siri or Alexa and had them not “understand” you, you get a sense of the difficulties of language. Language has a great deal of ambiguity and it has an enormous amount of variety. You can say the same thing in lots and lots of different ways, and those are enormous challenges in any setting.
Now, one of the interesting things about medicine is that there is a large body of information sources. These are the important concepts. People have been building repositories of clinical concepts for years, and vocabularies, and lexicons, and so forth. And so these actually provide a handle in a way that everyday language does not. So that provides some hope and some opportunity to—because you kind of know what the structure is that you’re going for in a way that a lot of stuff, like social media analysis, does not.
We are doing an amazing thing just talking to each other right now. And the field is continuing to evolve. It’s really in the last five or ten years that it’s come to sort of public attention and public utility with things like the devices we’re all starting to get used to today, and also things that you’re seeing out of the marketplace in terms of analysis of clinical text on a large scale.
Dr. Moore: That gives me hope to know that we’re seeing continued advances in the ability to understand this unstructured mass of information. Because I see so much frustration on the part of the clinical workforce in data entry, are we at a point now where we can say that we have certain tools that will absolutely improve workflow and make accurate meaning of what people are saying?
Philip: I’m going to give that a qualified yes, and the reason that it’s qualified is that I don’t think anybody should ever bank on technology being perfect. What’s really important is to recognize that the goal is not to have fully automated technology. The goal is to get the job done in the right way; and probably for a lot of people listening, in a cost-effective way. And a lot of the time, the right way to do that is to have an effective combination of what the machine can do and what people can do.
And so workflow is, in fact, a really important part of the process and has value in and of itself. A lot of the time, for example, you can code more consistently when the machine is suggesting codes than if you have coders doing stuff independently from each other because if there’s a gray area they may go off in different directions, whereas if you have the system doing the analysis, it may present something to them in a unified, consistent way that they would agree on. So the human aspect is really quite an important element of the effective use of this kind of technology, in my opinion.
Dr. Moore: Another aspect, and I don’t know if this is something you’ve looked at all, so feel free to take it where you will. But I know that with a bunch of clinicians, they’re using a lot of automated phrases and verbal content that they can then drop into notes to meet certain documentation guidelines and rules. And I hear from a number of my colleagues that when they begin to read the notes of other doctors, it takes them a long time to get down to the kernel of truth and really find out what’s going on. They feel like they’re wading through a sea of words. Is that something that you’re aware of and does that impact the industry much?
Philip: Yeah. I think this is a case of the tail wagging the dog. This is a place where there are various kinds of requirements that are being set, and people are doing workarounds. There’s a lot of copy-paste and that sort of thing. Clinicians are able to read things that other clinicians write when it’s a natural process. They have a way of communicating with each other. And the current health record structures are kind of warping that, and providing incentives to do things in ways that aren’t completely natural.
And so the right thing to do is to find ways to bring more natural human communication back into the process. But in order to do that, you need to use technology effectively, because if you simply go back to just, “Hey, here’s unstructured dictation,” then you lose all of those advantages of all that kind of data analysis downstream that have such value to offer.
Dr. Moore: I want to pivot now to the work that you’ve been doing more recently. I know that you’ve been collaborating with your wife, and I’d like to hear more about that.
Philip: Oh, sure. So I am a computer scientist and linguist. I work on language technology. My wife, Rebecca, is a clinical psychologist. And for us, a lot of dinner table conversation is, “Gee, couldn’t we put this kind of stuff together; the stuff that you do and the stuff that I do in order to do some good in the world?” And so I found myself, a number of years ago, working on technology in the academic setting that was really in the social science domain, trying to connect what people expressed in language to underlying mental states.
Some of this was in political science, for example. Can you look at the way that somebody frames their language and infer whether they’re taking, say, a conservative or a liberal perspective, and so forth? And what we discovered in honest to goodness just dinner table conversation was, “Hey, what if you were to take that same idea of analyzing people’s language, and instead of looking at political science questions, what if you were to apply that to mental health?”
So the question is, to what extent can you look at how somebody uses language and infer important things one would like to know, such as is this person suffering from clinical depression? Is this somebody at risk for suicide? Is this person, who might be schizophrenic, at risk of going downhill? And so forth.
And so the two of us wound up co-founding with a third person, Meg Mitchell, now at Google, essentially a workshop that’s now going into its sixth year, that brings clinicians and technology people like me together into a room in order to look at ways that the technology can help, particularly focused in looking at people’s language, and to some extent, their behavior online as well. So that’s basically something that we’ve found ourselves working on jointly. And we go to conferences and then I’ll give a talk on the technology side. She’ll give a talk on the clinical side, on the flipside. We’ve really been building a community around this stuff.
Dr. Moore: That sounds really logical and fascinating, that you can look at the words a person is using and how they’re putting them together and come to some understanding and conclusions based on that. Is that actually working? Have you guys been able to demonstrate that?
Philip: Yeah. Like I said, this community has really been building. And one of the things that’s particularly interesting about this stuff is that part of what it’s doing is highlighting some of the challenges of traditional diagnostic systems, sort of clear differences between here’s depression, and here’s PTSD, and here are these different conditions.
And within the clinical world, completely independent of this, people are starting to recognize that the boundaries can be fuzzier. Well, if you’re going to take a fuzzier and more nuanced view, then looking at people’s language use, rather than a checklist in the DSM-5, provides something that is a more nuanced source of data for these sorts of things.
And yes, absolutely. There has been some really nice work out there. A collaborator of mine, Glen Coppersmith, and his colleagues, for example, have done some tremendous work on detection of suicide risk on the basis of social media postings with accuracy that is high enough that the conversation now should be and is starting to move from, “How do we do good research on this stuff?” to “How do we actually think about actually deploying it in the real world?”
And one of the interesting things about that, by the way, is the idea that there is more to the world than just what happens clinically. A person has a healthcare encounter every now and then, but there are other sources of evidence, like there’s social media use, that provide information inside what Glen calls “the clinical white space,” all of that information in between the clinical encounters. And gaining leverage from putting those two things together is really an amazing opportunity as long as, of course, you can do all of that in an appropriate and ethical way.
Dr. Moore: Yeah, that’s a tough nut there, I imagine. If we could have signals of somebody who could be on a slippery slope or decompensating in some way where they could be of significant risk to themselves or others, it’ll be great to be able to recognize that. But what’s the boundary, ethically, of tapping into information sources where we haven’t had permission in the past?
Philip: Yeah, there is an evolving community that’s looking at exactly this set of issues. And people, if you think about it, are surprisingly receptive to the idea of using data for social good. You have some awful examples where there has been deception, there’s been sort of large-scale use of data in inappropriate ways, and we have to not let ourselves get tainted by that. And people in organizations need to let themselves recognize that even though there is that kind of misuse, they shouldn’t be scared of finding appropriate ways of using data.
That’s, in some sense, my biggest worry, is that some of the bad stuff has scared people off. But I think we need to focus on appropriate ways of using data for good. And there’s an entire set of guidelines, and standards, and communities right now starting to look at exactly that.
Dr. Moore: Where does someone look for that, if they’re interested?
Philip: That’s a really interesting question. So in terms of the academic community, there is a conference that’s emerged. The acronym is FATML, Fairness, Accountability and Transparency in Machine Learning, because a lot of the data work is done on the basis of machine learning. That would probably be an excellent place to start in terms of a community that is looking at these ethical issues. I would strongly suspect that if you use that as a starting point, it will lead out to the right sorts of people who are looking at these things.
Dr. Moore: That’s great. Thanks. When you were talking about using language to understand potential mental health issues, and I think about a person with schizophrenia, their use of language is really different, and I think that’s one of the hallmarks. And so that seems like it would be an obvious place to go. But then, there are sort of subtle differences between dysthymia and depression and these other, as you mentioned, discrete DSM-5 categories. I begin to wonder if, in the real world, they’re less discrete than we imagined and more on a continuum of feeling, and functionality, and ability.
Philip: For sure. And in fact, there are efforts to try to revisit diagnostic categories. There’s one called HiTOP, and another called RDoC, that many people would be familiar with. But HiTOP, in particular, is starting to take an approach that looks a lot more like a big, huge Venn diagram, with overlapping circles or different pieces of the puzzle, as opposed to crisp, checklist-like diagnostic distinctions.
You mentioned schizophrenia, that’s actually a really good example. Disordered thought, disordered language is actually something that people in my community have been designing models and algorithms to detect with a surprising amount of success and predictive accuracy. I’m, in fact, working with Deanna Kelly, who is at the University of Maryland Medical School on a project trying to put together precisely that kind of approach: using the technological side together with her team’s clinical expertise. And this is exactly the kind of thing we need to be encouraging more often.
Dr. Moore: I’m thinking about the devastating impact of a diagnosis of schizophrenia, but maybe the potential of getting a little bit ahead of it if we begin to see harbingers in use of language. Is that something you’ve explored?
Philip: Yeah. So the way to think about this is that there are various stages at which this kind of technology can be deployed. You can have a screening at the very beginning, with people opting in. Again, this has to be done ethically, but assuming that it is being done appropriately, think about it this way. There are literally over 120 million people in this country who live in federally designated areas where there are shortages of mental health providers. If there’s going to be detection in the first place, it’s going to be in the regular doctor’s office in that 15-minute visit.
If you are able to have a window into what’s going on with them, you’d be able to surface a lot more. So that’s the screening side of it. Then what do you do about this? You need to do some form of analysis in order to do a finer-grained assessment and understand what to do. And then, once somebody is in treatment and an intervention has taken place, you have monitoring. So there are various levels at which this can take place, and the ability to tap into people’s language use as a valuable source of evidence applies at every single one of those levels.
Dr. Moore: Wow, that’s so neat. Hey, I understand that you’ve taken the year off on sabbatical. And I’m curious, what are you doing?
Philip: Let’s not confuse sabbatical with taking the year off. A lot of people think, “Oh, wow. You get to do vacation for a year.” Yeah, no. I actually have an official sabbatical project. I mean, I’m doing lots of things, but the main focus for my sabbatical project is aimed at tackling what I would describe as the data crisis in mental health. There’s a data crisis in healthcare generally because HIPAA was written without downstream, large-scale uses of data in mind. And there’s a lot of fear about finding ways to use this data for research.
And as a result, whereas other fields have these very, very large shared data sets that really advance the state of the engineering and the science, in my field, natural language processing, the data sets that people use and share in health care research are orders of magnitude smaller, and less sharing goes on. And unless you’re working within a very large data setting, say as a researcher inside an organization like Kaiser, or inside a 3M, or other organizations like that, where you have large quantities of data and the appropriate uses of it can be worked out, it’s very hard to do.
So the way that I’m tackling this for mental health: I’m not tackling the HIPAA issue, but what I am tackling is the use of potentially sensitive information, like social media. And the core idea is that I am building a mental health data enclave, basically an environment that is hosted on AWS, in the cloud, Amazon Web Services, and this is with some funding from Amazon as part of their Amazon Machine Learning Research Awards program, which is sort of a program that funds faculty to do pieces of research.
And I’m collaborating with an organization called NORC at the University of Chicago, which has a great deal of experience building large, secure enclaves. And the idea is to build an environment where we don’t just collect data sets and figure out how to disseminate those out to the researchers to work on large-scale shared data, where everybody has to figure out their own way of securing the data, and making sure that all the Is are dotted and the Ts are crossed, and so forth.
The idea is to build a secure cloud enclave where the researchers come to the data. And so they come inside the secure enclave, the data cannot leave. And so they do the work inside the secure sandbox, and now there’s an ability to share what they’re doing, work on shared data sets, work at a scale that they haven’t been able to before. And the goal is to really kick-start progress on mental health language processing research by creating an environment where the researchers can work at the scale that researchers work at in all sorts of other domains, but we just haven’t been able to within the healthcare and the mental health domain.
Dr. Moore: And I think not every listener may understand the need for scale in this environment, so do you want to describe how it is that you build a corpus and understand what is true and then test it?
Philip: Sure. Thank you for pointing that out. It’s an important question. So the word “corpus,” related to the word “body,” in general, refers to a large body of data, particularly language data. So you can have a corpus of social media posts that have been contributed, again, with all of the appropriate IRB, Institutional Review Board, ethical review, and so forth.
For example, you might have a corpus where a large set of data is contributed by people who are suffering from clinical depression, and a corresponding large set is contributed by a control group of people who are not. Just as with any other medical condition, you want one group that is the relevant positives and a control group to compare against.
Then the machine learning approach that has come to dominate language technology is about building computational models that analyze this large data set and learn to identify the predictive features, the aspects of the data, the language that’s being used, which could be something as simple as words and phrases, but could be more complex topics and other features of the data, aspects of the connectivity of people’s thinking in the schizophrenia case.
There are a lot of automatic modeling approaches that one can do to learn which features of the language tend to correlate with the condition and not to correlate with people who don’t have the condition in the control group. And that enables you to build a predictive model. So that if you get somebody who has not been a part of this original data set, you can look at their language use, and then make a prediction as to whether they fall into one category or the other, and this kind of predictive modeling has the potential to be highly accurate.
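A minimal sketch of the predictive-modeling idea Philip describes, using invented example texts and simple smoothed word-count scores in the style of a naive Bayes classifier. This is not the actual modeling used in this research, which is far more sophisticated and trained on vastly more data.

```python
import math
from collections import Counter

# Invented toy data: language samples from a "positive" group and a
# control group. Real studies use thousands of consented samples.
positive = ["i feel hopeless and tired", "everything feels empty and tired"]
control = ["great run this morning", "excited about the game tonight"]

def word_counts(docs):
    return Counter(w for d in docs for w in d.split())

pos_counts, ctl_counts = word_counts(positive), word_counts(control)
pos_total, ctl_total = sum(pos_counts.values()), sum(ctl_counts.values())
vocab = set(pos_counts) | set(ctl_counts)

def log_odds(text):
    """Positive score: language resembles the positive group; negative:
    it resembles the control group. Laplace smoothing handles unseen words."""
    score = 0.0
    for w in text.split():
        p = (pos_counts[w] + 1) / (pos_total + len(vocab))
        c = (ctl_counts[w] + 1) / (ctl_total + len(vocab))
        score += math.log(p / c)
    return score

print(log_odds("so tired and empty"))  # > 0: closer to the positive group
print(log_odds("morning run"))         # < 0: closer to the control group
```

The point of the sketch is the shape of the approach: learn which features of language correlate with each group, then score language from someone outside the original data set. Real models use far richer features (topics, syntax, coherence) and require large corpora to capture the variability of language.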
But if it’s going to work well, it’s got to be based on large quantities of data because you need to be able to capture the variability in language that we were talking about earlier. You need to be able to capture the fact that people say things in different ways. They use language in different ways. And there are going to be lots of things that you only see a small amount of the time that might nonetheless be highly relevant. And therefore, in order to build models that capture that, you have to start with large quantities of data.
That is the basis for language technology that folks are familiar with right now. When you are texting on your phone and it auto-completes to make it easier for you, the reason it’s able to do that is because somebody has analyzed huge quantities of language in order to build a good predictive model of what that next word is going to be, based on what you’ve said so far. We need to be doing exactly that same kind of large-scale work when it comes to the clinical domain.
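The auto-complete analogy can be made concrete with a tiny bigram model. This is a deliberately simplified sketch trained on a few invented sentences; real predictive keyboards and language models are built from enormous corpora, which is exactly the scale argument being made here.

```python
from collections import Counter, defaultdict

# Toy corpus of invented sentences; real systems train on vast text collections.
corpus = [
    "the patient reports chest pain",
    "the patient denies chest pain",
    "the patient reports shortness of breath",
]

# Count how often each word follows each preceding word (bigram counts).
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1

def predict_next(word):
    """Most frequent continuation seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("patient"))  # "reports": seen twice, vs. "denies" once
print(predict_next("chest"))    # "pain"
```

With only three sentences the predictions are trivial; the quality of the prediction grows with the size and variety of the training data, which is why large shared corpora matter so much.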
Dr. Moore: And when you’re talking about large scale, is that 100 patients who have schizophrenia and 100 who do not? What kind of scale are you talking about?
Philip: No. One hundred patients is a great data set in order to validate some work, in order to get an idea of what you’re talking about. But when we’re talking about clinical data sets, typically, you want to be going up several orders of magnitude. Look, when social media companies are trying to figure out what ads to show you, they are working on hundreds of thousands or millions or more data points in order to make their predictions. Ideally, we should be working with data at that scale.
Now, hundreds will get you somewhere, and thousands are enough to really make progress. But ultimately, what you really want to be doing is working in the domain of at least tens or hundreds of thousands of data points. So maybe it’s a smaller number of patients, but for each of them, there are many social media postings, or many instances of a clinical document, and so forth.
That’s the kind of scale that we’re talking about, and in order to do that, we really need to beef up our approach to these problems. We can’t be doing it in the sort of scattered, haphazard way that we have been, hence, the idea of trying to build a centralized repository where it is safe and ethical for people to work with larger quantities of data.
Dr. Moore: So that sounds like a big challenge. And I guess, now I see why you’re going after this large repository of secured data because it’s hard for me to think of there being thousands of people with schizophrenia within a single data set, at a university hospital, for instance. It’s possible, but it’s just a little bit unlikely, so you need to pool data across, and I guess that’s one of the big challenges in health care.
Philip: That’s right. And even if you can’t pool data across, even if you’re working on a data set of, say, 100 patients, which can be a fairly significant study. If I have that data and I can’t share it with a colleague of mine at Johns Hopkins or at Stanford, the field just suffers. And so combining to make larger data sets is a piece of the puzzle. But even the ability to share data without combining into larger data sets is going to be a way to bootstrap progress. And that’s the first part of what we’re going for here in this project.
Dr. Moore: How’s it going so far?
Philip: It’s going really well. I have been lucky enough to connect with NORC at the University of Chicago. It’s an organization that used to be focused only on survey research. NORC stands for National Opinion Research Center. They’re a 70-something-year-old organization, highly trusted; and they, in recent years, have broadened out to larger scale, data-oriented analysis as part of a broader mission of understanding what’s going on, in the same way that surveys do.
And there is a healthcare repository that they have built. They’ve actually built a HIPAA-compliant data enclave, conceived of and run by a fellow named Tim Mulcahy. And he and I have been collaborating on putting together this mental health enclave using their knowledge and body of experience, having built and run a healthcare data enclave for the last 10 or 12 years. And now, what we’re trying to do is apply this to mental health, put it on the cloud, and make it more widely available to academic researchers like me.
Dr. Moore: This is incredibly timely work, and I’m excited to hear that it’s moving forward. Philip Resnik, I want to thank you for your time today.
Philip: It’s been a pleasure.