Detecting severe allergic reactions to drugs, identifying people at risk of Alzheimer's disease, and learning about medical cannabis use may seem unrelated. But all might be advanced by applying natural language processing (NLP) and machine learning to clinicians' written notes.
"When people visit a clinic, their care team documents the visit in clinical notes," said Associate Investigator David Carrell, PhD, who leads NLP research at Kaiser Permanente Washington Health Research Institute (KPWHRI). At Kaiser Permanente Washington alone, health care teams write millions of notes annually into an electronic health record (EHR). The notes contain valuable data for improving care.
NLP algorithms analyze text — a complex task because the same idea may be expressed in many different ways. For example, one doctor might write "difficulty breathing" while another might use the clinical term "dyspnea." Spelling errors, abbreviations, and missing punctuation make the task even harder. Carrell and colleagues develop NLP algorithms to identify notes about conditions or behaviors so they can be included in computer models with applications ranging from monitoring national drug safety to improving primary care. Some models also use machine learning methods, which can better represent the complex relationships between data from clinical notes and a condition or behavior.
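To make the "same idea, many wordings" problem concrete, here is a minimal Python sketch that maps different phrasings in a note to one shared concept. It is an illustration only, not the KPWHRI team's method: the variant list and the function name find_concepts are hypothetical, and real clinical NLP systems also handle misspellings, negation, and context.

```python
import re

# Hypothetical variant list, invented for illustration; not taken from the studies.
CONCEPT_VARIANTS = {
    "dyspnea": [
        "dyspnea",
        "dyspnoea",
        "difficulty breathing",
        "shortness of breath",
        "sob",  # common clinical abbreviation
    ],
}


def find_concepts(note):
    """Return the set of concepts whose variants appear in the note text."""
    # Lowercase and collapse runs of whitespace so multiword variants still match.
    text = re.sub(r"\s+", " ", note.lower())
    found = set()
    for concept, variants in CONCEPT_VARIANTS.items():
        for variant in variants:
            # Word boundaries keep "sob" from matching inside "sobbing".
            if re.search(r"\b" + re.escape(variant) + r"\b", text):
                found.add(concept)
                break
    return found


print(find_concepts("Pt reports difficulty  breathing at night"))
print(find_concepts("Dyspnea on exertion."))
```

Both example notes map to the same concept even though they share no words. This sketch would also flag a note that says "denies dyspnea," which is one reason real systems need more than keyword matching.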
Generally, Carrell explained, the steps of NLP are:
Carrell and colleagues recently published 3 studies showing the versatility of NLP and machine learning.
In the American Journal of Epidemiology, Carrell and colleagues showed how NLP might increase the accuracy of drug safety monitoring by the U.S. Food and Drug Administration (FDA). Medications are the most common cause of fatal anaphylaxis (a serious allergic reaction). The FDA uses automated algorithms, but not yet NLP, to monitor for medication-related anaphylaxis in EHRs.
Anaphylaxis is difficult to identify in text because of varied symptoms including low blood pressure, vomiting, and rash. Anaphylaxis is also rare, so cases in EHR data are scarce.
Carrell and team used NLP and anonymized EHR data from Kaiser Permanente Washington to help create an anaphylaxis word list. They added this dictionary to anaphylaxis-predicting models. When tested using validation data from Kaiser Permanente Northwest, several models improved on the FDA anaphylaxis-identifying algorithms. The methods might also be used to better track other rare and serious conditions, such as emerging infectious diseases.
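One simple way a word list can feed a prediction model is by counting how often each listed term appears in a note and passing those counts to the model as inputs. The Python sketch below illustrates that idea under stated assumptions: the three terms are only the symptoms named in this article, not the study's dictionary, and the function name dictionary_features is hypothetical. The study's actual feature construction may differ.

```python
import re

# Illustrative word list built from the symptoms named in this article.
# This is an assumption for the example, not the dictionary from the study.
WORD_LIST = ["low blood pressure", "vomiting", "rash"]


def dictionary_features(note):
    """Count occurrences of each word-list term in a single clinical note."""
    # Lowercase and collapse whitespace so multiword terms match consistently.
    text = re.sub(r"\s+", " ", note.lower())
    return {
        term: len(re.findall(r"\b" + re.escape(term) + r"\b", text))
        for term in WORD_LIST
    }


note = "Rash on arms. Vomiting x2, low blood pressure. Rash spreading."
print(dictionary_features(note))
```

Each note becomes a small set of numbers that a statistical or machine learning model could combine with other EHR data, such as diagnosis codes.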
KPWHRI coauthors on the study include Kara Cushing-Haugen, Ron Johnson, Vina Graham, David Cronkite, and Jennifer Nelson.
A study in BMC Medical Informatics and Decision Making led by Senior Investigator Rob Penfold, PhD, asked: Can NLP help develop a model to identify patients with mild cognitive impairment (MCI)?
MCI is a decline in memory, thinking, or behavior that is greater than expected with age. A possible sign of future Alzheimer's disease or related dementia, MCI can be detected in primary care. An NLP-based resource might assist clinicians in knowing who could benefit from MCI screening.
Carrell and colleagues used NLP and anonymized Kaiser Permanente Washington EHR data, including from the Adult Changes in Thought (ACT) study, to identify MCI-related concepts from clinical notes. They used the results to develop a machine learning MCI-prediction model. The model's ability to identify people with MCI through MCI-associated concepts was similar to the ability of screening tests to identify people with other conditions such as cancer. This shows the potential of the approach for developing a tool to help clinicians care for patients with possible future Alzheimer's disease and assist health care organizations in planning for members' needs.
KPWHRI coauthors on the study include David Cronkite, Chester Pabiniak, Tammy Dodd, Ashley Glass, Eric Johnson, and Ella Thompson.
In Substance Abuse, the NLP team researched medical cannabis use documented in EHRs for a study led by Assistant Investigator Gwen Lapham, PhD, MPH, MSW. Knowing when and why people use medical cannabis is important for studying its safety and effectiveness for conditions including pain and anxiety.
Anonymized development and validation data came from Kaiser Permanente Washington, which since 2015 has routinely asked adult primary care patients about past-year cannabis use. The reasons for use may be in clinical notes, but challenges in identifying this information include the different terms associated with cannabis.
Despite the difficulties, the study reports that an NLP model identified more than half of the 5.6% of records with documented medical cannabis use. NLP-assisted manual review identified the remainder. The study shows NLP could help obtain data for research and assist clinicians and patients with decision-making about cannabis use.
KPWHRI coauthors on the study include David Cronkite, Mary Shea, Malia Oliver, Casey Luce, Theresa Matson, Jennifer Bobb, Clarissa Hsu, and Katharine Bradley.
The KPWHRI NLP team continues to advance NLP and machine learning in studies applying these methods to cancer, acute pancreatitis, COVID-19, mental health conditions, and substance use disorders. They also continue to develop methods to streamline FDA safety monitoring of medications and medical devices.
By Chris Tachibana