NLP in EHR: A game-changer for unstructured data

“Data, data everywhere, but not an insight to be found.” That is how many clinicians feel about the unstructured data in electronic health records (EHRs). Healthcare providers rely on hundreds of distinct data signals to make decisions during each patient interaction and deliver effective treatment.

As care teams assess, diagnose, and treat patients, they add data to each patient’s EHR, including discrete lab results, qualitative descriptions, transcripts of their opinions and judgments, and more. It’s no wonder that the industry faces a massive challenge: how to harness all of this unstructured clinical data to generate new insights, drive innovation, and improve patient care. This is where natural language processing (NLP) comes in.

Five major applications of NLP in EHR

NLP algorithms can ingest unstructured data, analyze its grammatical structure, determine the meaning of the information, and summarize it. As a result, NLP applied to EHRs can reduce costs while extracting data for in-depth big data analytics. Here’s a look at the major applications of NLP in EHRs.

1. Extracts information from clinical notes

The majority of clinical texts are free-form, feature acronyms and abbreviations, and may have spelling and typing errors.

Example of a mock clinical note

Source: cTAKES

Medical abbreviations are often difficult to decipher because there is no standard dictionary for them. NLP algorithms in EHRs are used to extract critical information such as diagnoses, recommendations, and timelines, and to identify symptoms that the doctor has confirmed are not occurring (negation).
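
To make the abbreviation and negation problem concrete, here is a minimal Python sketch. It assumes a tiny, hypothetical abbreviation dictionary and a handful of trigger words in the spirit of the NegEx algorithm. Production tools such as cTAKES use far richer vocabularies and rules, so treat this only as an illustration of the idea.

```python
import re

# Hypothetical abbreviation dictionary for illustration only; real systems rely
# on curated clinical vocabularies because no single standard list exists.
ABBREVIATIONS = {
    "cp": "chest pain",
    "sob": "shortness of breath",
    "htn": "hypertension",
}

# A few NegEx-style trigger phrases (illustrative, not a complete list).
NEGATION_TRIGGERS = ("no", "denies", "negative for", "without")


def expand_abbreviations(text: str) -> str:
    """Replace whole-word abbreviations with their expansions."""
    return re.sub(
        r"[A-Za-z]+",
        lambda m: ABBREVIATIONS.get(m.group(0).lower(), m.group(0)),
        text,
    )


def find_findings(note: str, terms: list[str]) -> dict[str, bool]:
    """Map each term found in the note to True if negated, False if affirmed.

    A mention counts as negated when a trigger phrase occurs earlier in the
    same sentence. A term is reported as negated only if every mention is.
    """
    results: dict[str, bool] = {}
    text = expand_abbreviations(note).lower()
    for sentence in re.split(r"[.;\n]", text):
        for term in terms:
            pos = sentence.find(term)
            if pos == -1:
                continue
            prefix = sentence[:pos]
            negated = any(
                re.search(rf"\b{re.escape(t)}\b", prefix) for t in NEGATION_TRIGGERS
            )
            results[term] = results.get(term, True) and negated
    return results


note = "Pt reports CP. Denies SOB. HTN controlled."
print(find_findings(note, ["chest pain", "shortness of breath", "hypertension", "fever"]))
# {'chest pain': False, 'shortness of breath': True, 'hypertension': False}
```

Working sentence by sentence keeps a negation cue such as "Denies" from spilling over into the next sentence, and terms that never appear (here, "fever") are simply left out of the result.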

Nuance Communications and Epic, for example, are collaborating to integrate the artificial intelligence (AI) and NLP capabilities of Nuance’s computer-assisted physician documentation (CAPD) tool into the Epic NoteReader EHR module for clinical documentation improvement.

By analyzing relevant patient notes with deep learning and NLP, the Nuance CAPD tool can highlight specific clinical symptoms in an EHR and alert doctors when data is missing or needs clarification. The companies claim that embedding the CAPD tool within Epic will let physicians receive feedback at the point of care as provider organizations work to improve severity-adjusted quality scores and better understand reimbursement and risk adjustment factors to improve care management.

2. Boosts phenotyping capabilities

A phenotype is the observable physical or physiological expression of a specific trait in an organism. These characteristics may relate to physical appearance, biological processes, or behavior. Phenotyping enables clinicians to group patients who share particular characteristics, giving a more focused, in-depth view of the data and making it possible to compare patient cohorts.

Most analysts and physicians currently use structured data for phenotyping since it is simple to extract for analysis. NLP provides analysts with a tool for extracting and analyzing unstructured data (e.g., follow-up appointments, vitals, charges, orders, interactions, and symptoms), which some experts estimate accounts for up to 80% of all patient data available. Access to unstructured data increases the amount of information available for developing phenotypes for patient groups.
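
The point about unstructured data widening the pool of phenotype information can be illustrated with a small sketch. The patient IDs and terms are made up, and the `unstructured` dictionary stands in for the output of some NLP extraction step that is not shown. The function simply selects patients whose combined terms include every term in a required phenotype.

```python
def phenotype_cohort(
    structured: dict[str, set[str]],
    unstructured: dict[str, set[str]],
    required: set[str],
) -> list[str]:
    """Return IDs of patients whose combined terms include every required term.

    `structured` holds terms from coded fields; `unstructured` holds terms an
    NLP pipeline extracted from free-text notes (both hypothetical here).
    """
    cohort = []
    for pid in sorted(structured.keys() | unstructured.keys()):
        terms = structured.get(pid, set()) | unstructured.get(pid, set())
        if required <= terms:  # subset test: all required phenotype terms present
            cohort.append(pid)
    return cohort


structured = {"p1": {"seizures"}, "p2": {"seizures"}, "p3": set()}
unstructured = {"p1": {"hypotonia"}, "p3": {"seizures", "hypotonia"}}
required = {"seizures", "hypotonia"}

print(phenotype_cohort(structured, {}, required))            # []
print(phenotype_cohort(structured, unstructured, required))  # ['p1', 'p3']
```

With structured data alone no patient matches; adding the NLP-extracted terms surfaces two candidates, which is the effect the paragraph above describes.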

For instance, Linguamatics, an NLP text mining platform, has partnered with Stead Family Children’s Hospital in Iowa to use NLP on EHR data to extract phenotype details for patients with suspected genetic disorders. This improved accuracy and also saved considerable time and cost: the NLP technology reviewed data for 700 patients in just 1.2 hours, a task that would have taken over 240 hours manually. It also found an average of 29.1 phenotype terms, compared with 1.9 terms found manually.

3. Finds patient cohorts for clinical trials

Traditional recruitment for clinical trials involves screening patient charts manually, and for rare diseases the pool of eligible patients is even smaller. Using EHRs for e-screening and identifying patient cohorts is more effective: NLP technology in the EHR can quickly match keywords for the trial criteria and the patient’s phenotype, and show researchers the patients who meet their requirements.
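
A simplified version of this e-screening step might look like the following. It assumes NLP has already turned each patient’s notes into a set of concepts; the `Patient` and `TrialCriteria` classes and the example criteria are hypothetical and are not part of any vendor’s product.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    pid: str
    age: int
    # Concepts assumed to have been extracted from the patient's notes by NLP.
    concepts: set[str] = field(default_factory=set)


@dataclass
class TrialCriteria:
    min_age: int
    max_age: int
    include_all: set[str]  # every inclusion concept must be present
    exclude_any: set[str]  # any exclusion concept disqualifies the patient


def screen(patients: list[Patient], criteria: TrialCriteria) -> list[str]:
    """Return IDs of patients who satisfy the age range and concept criteria."""
    eligible = []
    for p in patients:
        if not criteria.min_age <= p.age <= criteria.max_age:
            continue
        if not criteria.include_all <= p.concepts:
            continue
        if criteria.exclude_any & p.concepts:
            continue
        eligible.append(p.pid)
    return eligible


patients = [
    Patient("a", 54, {"heart failure", "reduced ejection fraction"}),
    Patient("b", 67, {"heart failure", "reduced ejection fraction", "dialysis"}),
    Patient("c", 30, {"heart failure", "reduced ejection fraction"}),
]
criteria = TrialCriteria(40, 80, {"heart failure", "reduced ejection fraction"}, {"dialysis"})
print(screen(patients, criteria))  # ['a']
```

Patient "b" is dropped by the exclusion concept and "c" by the age range, so only "a" is shown to researchers.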

Bristol-Myers Squibb (BMS), for example, wanted to learn more about stratifying patients by heart failure risk through a clinical trial. BMS researchers collected EHR and imaging data from approximately 900 patients and used NLP to extract information on about 40 different elements related to patient demographics, clinical outcomes, clinical phenotypes, and other variables such as ejection fraction and left ventricular mass. The researchers then used this information to classify patients into four groups based on clinical and echocardiographic parameters.

Similarly, Premier Applied Sciences® (PAS), the research and analytics branch of healthcare improvement company Premier, has teamed up with Clinithink, a healthcare technology startup, to bring NLP technology to trial sponsors, investigators, life sciences companies, and other research organizations that could benefit from it. NLP is helping their researchers not only identify the best clinical trial candidates but also select participating sites with extensive knowledge of qualifying patients.

4. Visualizes data for chart review

To understand a patient’s major past medical history, clinicians must read through multiple reports, a chart review process that typically requires registered nurse (RN) expertise. To speed up chart review, NLP is used to summarize and visualize the relevant information so clinicians can quickly grasp a patient’s medical history.

For example, Harvard Medical School’s Translational Data Science Center for a Learning Health System (CELEHS) has collaborated with VERITY Bioinformatics researchers to develop the Chart Review Tool Powered by NLP (CHANL). This software is intended to make chart review of narrative text notes from EHRs easier. It can search for many keywords at the same time and also automatically detect other terms and concepts in the notes that are linked to those keywords.
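
The idea of searching for many keywords at once while also catching related terms can be sketched as below. The `RELATED_TERMS` map is a hypothetical, hand-written stand-in; CHANL’s own method for detecting linked concepts is not described here, and this code does not reproduce it.

```python
import re

# Hypothetical keyword -> related-term map. How CHANL itself finds related
# concepts is not described in the source; this is only a stand-in.
RELATED_TERMS = {
    "stroke": {"cva", "cerebrovascular accident", "tia"},
    "diabetes": {"dm", "t2dm", "insulin"},
}


def search_notes(notes: dict[str, str], keywords: list[str]) -> dict[str, set[str]]:
    """Return {note_id: matched terms} for notes mentioning a keyword or related term."""
    terms: set[str] = set()
    for kw in keywords:
        terms.add(kw.lower())
        terms |= RELATED_TERMS.get(kw.lower(), set())
    # One alternation searches for all terms in a single pass; longer terms come
    # first so that multi-word phrases are preferred over shorter overlaps.
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    hits: dict[str, set[str]] = {}
    for note_id, text in notes.items():
        found = {m.group(0).lower() for m in pattern.finditer(text)}
        if found:
            hits[note_id] = found
    return hits


notes = {
    "n1": "History of CVA in 2019.",
    "n2": "T2DM, on insulin.",
    "n3": "No acute issues.",
}
print(search_notes(notes, ["stroke", "diabetes"]))
```

Neither note "n1" nor "n2" contains the literal keywords, yet both are returned because they mention related terms; "n3" matches nothing and is omitted.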

A sample of the CHANL interface

Source: CELEHS

Wolters Kluwer Health has also recently announced a collaboration with Chart Review Accelerator on clinical NLP solutions. To help discover gaps in care rapidly, the new software automatically examines patient charts, extracts clinically essential information such as illnesses, drugs, procedures, allergies, and lab results, and presents it in a visual format.
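
As a rough illustration of turning extracted entities into a reviewable summary, the sketch below groups hypothetical NLP output, given as (date, category, text) tuples, by category and lists the most recent mention first. It produces plain text as a stand-in for a visual display and does not represent how the Wolters Kluwer software works.

```python
from collections import defaultdict
from datetime import date


def summarize_chart(entities: list[tuple[date, str, str]]) -> str:
    """Group (date, category, text) entities into one line per category.

    Entities are listed newest first, and repeated mentions of the same item
    (case-insensitive) keep only the most recent date.
    """
    by_category: dict[str, list[str]] = defaultdict(list)
    seen: set[tuple[str, str]] = set()
    for when, category, text in sorted(entities, key=lambda e: e[0], reverse=True):
        key = (category, text.lower())
        if key in seen:
            continue
        seen.add(key)
        by_category[category].append(f"{text} ({when.isoformat()})")
    return "\n".join(f"{cat}: " + "; ".join(items) for cat, items in sorted(by_category.items()))


entities = [
    (date(2023, 1, 5), "drug", "metformin"),
    (date(2023, 6, 1), "drug", "Metformin"),
    (date(2023, 6, 1), "allergy", "penicillin"),
    (date(2022, 3, 2), "illness", "type 2 diabetes"),
]
print(summarize_chart(entities))
# allergy: penicillin (2023-06-01)
# drug: Metformin (2023-06-01)
# illness: type 2 diabetes (2022-03-02)
```

Collapsing repeated mentions is one simple way to counter the duplication that makes manual chart review slow; a real tool would also need to reconcile conflicting entries.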

5. Improves the quality of health systems

The US federal government and its agencies require all hospitals to report specific outcome measures. One such metric is the adenoma detection rate (ADR): the percentage of screening colonoscopies in which at least one adenoma is found. The current reporting procedure involves paying someone to review a small sample of patient charts, look through the pathology findings, and compute the ADR. NLP can automate and accelerate this procedure, which increases the number of patient charts that can be sampled and allows real-time analysis.
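
Once NLP can tell whether a pathology report affirms an adenoma, the ADR itself is simple arithmetic: the number of screening colonoscopies with at least one adenoma, divided by the total number of screening colonoscopies, times 100. The sketch below assumes one report string per screening colonoscopy and uses deliberately simple keyword and negation rules; it is not a validated quality-reporting tool.

```python
import re

# Simplified rules for illustration: any mention of "adenoma", "adenomas", or
# "adenomatous" counts unless a negation cue precedes it in the same sentence.
ADENOMA = re.compile(r"\badenoma(?:s|tous)?\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:no|negative for|without)\b", re.IGNORECASE)


def report_has_adenoma(report: str) -> bool:
    """True if any sentence in the pathology report affirms an adenoma."""
    for sentence in re.split(r"[.;\n]", report):
        match = ADENOMA.search(sentence)
        if match and not NEGATION.search(sentence[: match.start()]):
            return True
    return False


def adenoma_detection_rate(reports: list[str]) -> float:
    """ADR = 100 * (colonoscopies with >= 1 adenoma) / (screening colonoscopies).

    Assumes one report string per screening colonoscopy.
    """
    if not reports:
        raise ValueError("no colonoscopies to evaluate")
    detected = sum(report_has_adenoma(r) for r in reports)
    return 100.0 * detected / len(reports)


reports = [
    "Tubular adenoma, 4 mm.",
    "Hyperplastic polyp. No adenomatous change.",
    "Two tubular adenomas.",
    "Normal colonic mucosa.",
]
print(adenoma_detection_rate(reports))  # 50.0
```

Because the whole calculation runs over text, it can be applied to every chart rather than a small manual sample, which is the advantage the paragraph above points to.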

For instance, one clinician created a report card that uses NLP to compute ADR automatically. Studies show that when physicians can see measurable results of their performance, they are more likely to change their behavior. In this case, physicians who received feedback on their ADR changed their habits and increased their detection rates. This matters because each 1% increase in ADR has been associated with a 3% decrease in colon cancer mortality.

Challenges and limitations of NLP in EHR

While NLP applications increase the value of healthcare data and show significant potential, several roadblocks still stand in the way of broader adoption and large-scale impact on outcomes.

1. Restrictions regarding sublanguages

One problem for NLP is sublanguage, a subset of a natural language. Medical language is a sublanguage with its own vocabulary and rules, distinct from those of the main language, and NLP systems must understand those rules to extract meaning from it. Social media, for example, is a sublanguage that often expresses meaning through abbreviations and emoticons rather than words. Because of these differences, researchers cannot expect an NLP system trained on newspaper content to extract meaning from social media posts.

Medical language is itself divided into sublanguages; medical blogs and clinical notes, for example, use distinct wording. Because of these distinctions, health systems should not buy an off-the-shelf NLP system designed for one sublanguage and apply it to another. Developers and analysts must tailor NLP systems for use in a given domain (e.g., healthcare), and this process takes time.

2. Problem with data identification

Good, usable data can be extracted only if the data is easily identifiable. When extracting data from EHRs, analysts frequently run into a data entry issue: users often have to type information in manually, which increases their tendency to use shortcuts and build templates.

NLP looks for sentences rather than templates, which makes the data inside templates difficult to work with. Another issue is that copied-and-pasted text spreads more patient data than necessary (note bloat), along with obsolete or erroneous information, throughout health records, making clinician notes less valuable.
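
Copy-forward text is one source of note bloat that is fairly easy to detect mechanically. This sketch assumes each note is a plain-text string with a hypothetical ID and flags sentences that recur verbatim (after case and whitespace normalization) across several notes. The five-word minimum is an arbitrary threshold chosen to skip short stock phrases.

```python
import re
from collections import defaultdict


def find_copied_sentences(
    notes: list[tuple[str, str]], min_words: int = 5
) -> dict[str, list[str]]:
    """Map each normalized sentence found in more than one note to those note IDs.

    Sentences shorter than `min_words` words are ignored, since short phrases
    such as "Patient stable." legitimately repeat.
    """
    occurrences: dict[str, list[str]] = defaultdict(list)
    for note_id, text in notes:
        seen_in_note: set[str] = set()
        # Split after sentence-ending punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            normalized = " ".join(sentence.lower().split()).rstrip(".!?")
            if len(normalized.split()) < min_words or normalized in seen_in_note:
                continue
            seen_in_note.add(normalized)
            occurrences[normalized].append(note_id)
    return {s: ids for s, ids in occurrences.items() if len(ids) > 1}


notes = [
    ("day1", "Patient admitted with community acquired pneumonia. Started on ceftriaxone."),
    ("day2", "Patient admitted with community acquired pneumonia. Afebrile overnight."),
    ("day3", "Patient  admitted with community acquired pneumonia. Ready for discharge."),
]
print(find_copied_sentences(notes))
# {'patient admitted with community acquired pneumonia': ['day1', 'day2', 'day3']}
```

Exact matching after normalization only catches verbatim copies; lightly edited copy-forward text would need a fuzzier similarity measure.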

3. Inability to distinguish linguistic variation

Linguistic variation means there are many ways to say the same thing, for example through derivation (different forms of a word with a similar meaning) and synonymy (different words for the same concept). NLP does not yet reliably handle this variation, which can cause it to misinterpret clinical notes; that can be dangerous when developing treatment plans and planning operative care.
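
A common way to reduce linguistic variation is to normalize terms before comparing them. The sketch below handles synonymy with a small hypothetical lookup table and derivation with a toy suffix-stripping rule. Real clinical systems use large vocabularies such as the UMLS Metathesaurus and proper lexical tools, so this only shows the shape of the approach.

```python
import re

# Hypothetical synonym table mapping surface forms to one preferred term.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "renal failure": "kidney failure",
}

# Toy suffix stripping for derivational variants
# (infected, infection, infectious -> infect). Not a real stemmer.
SUFFIXES = ("ious", "ion", "ed", "ing", "s")


def crude_stem(word: str) -> str:
    if word.endswith("ss"):  # avoid mangling words such as "mass"
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(phrase: str) -> frozenset[str]:
    """Map a phrase to an order-independent set of normalized word stems."""
    text = " ".join(phrase.lower().split())
    text = SYNONYMS.get(text, text)  # synonymy: swap in the preferred term
    return frozenset(crude_stem(w) for w in re.findall(r"[a-z]+", text))


def same_concept(a: str, b: str) -> bool:
    return normalize(a) == normalize(b)


print(same_concept("heart attack", "MI"))                 # True
print(same_concept("infected wound", "wound infection"))  # True
print(same_concept("heart attack", "kidney failure"))     # False
```

Treating a phrase as an unordered set of stems lets "infected wound" and "wound infection" match, at the cost of ignoring word order that sometimes does carry meaning.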

The way forward

Despite its limitations, NLP is opening up new and exciting possibilities for healthcare delivery and patient experience, and helping the industry bring Healthcare 4.0 to life. It won’t be long before advanced NLP coding recognition allows physicians to spend more time with patients while also helping them reach insightful conclusions based on accurate data.

Healthcare 4.0 is fast approaching. However, at a time when the industry is facing economic pressures, new regulatory requirements, accelerated digitization, and a paradigm shift to value-based care, is your organization ready to handle this shift? Netscribes can help you make the most of these opportunities and stay ahead of the competition through our healthcare market research services.

With experience working across more than 57 countries, 86 specialties, and 60 medical conditions and ailments, we couple local expertise with rich insight from carefully verified physicians, allied healthcare professionals, administrators, KOLs, payers, and patients to help you attain a sustainable differential advantage. To learn more, contact us today.
