Solving the Big Data problem in pharma innovation

The effectiveness of AI applications can be undermined by the volumes of unstructured data prevalent in the pharma industry. What can be done to overcome this issue?

We live in an exciting time for the pharmaceutical industry. Cutting-edge technologies like artificial intelligence (AI) and blockchain are making headlines and revolutionising everything from drug discovery to clinical trials. Many of these innovations are built upon the same foundation: Big Data. But a longstanding challenge within Big Data must be overcome in order for technologies like AI to achieve their full potential. That challenge is unstructured data.

Unstructured data and pharmaceutical AI

The need to overcome this challenge can be illustrated by examining the consequences of unstructured data for the effectiveness of AI applications within the pharmaceutical and life science industries.

As I’ve written about in the past, the history of AI can be seen through the lens of three distinct waves. The first wave brought ‘knowledge engineering’ software that enabled efficient solutions to practical challenges. The second wave brought machine learning programs that enabled automated pattern recognition and advanced statistical analysis. We’ve now entered the third wave of AI, which has the power to generate novel hypotheses by analysing massive sets of data.

Third-wave AI has the potential to significantly accelerate the research and development process for new drugs, as companies like Merck & Co and Sanofi have begun to discover. Applications of third-wave AI programs have powered medical discoveries such as the connection between fish oil and Raynaud’s disease.

But third-wave AI applications have also suffered a series of failures in healthcare and pharmaceutical contexts. MD Anderson’s problems with IBM Watson serve as a notable example. In that instance, the problems all started when MD Anderson changed its electronic medical record (EMR) provider, preventing Watson from accessing the data that it needed. This example illustrates the challenge posed by unstructured data and the corresponding need for greater data integrity within life science industries.

Data integrity in life sciences

Many of today’s AI programs depend on good, clean data in order to operate effectively. If access to such data is compromised, the AI program’s ability to conduct analysis and generate hypotheses is undermined.

Data sets within the pharmaceutical and life science industries pose a particular challenge for AI programs because of the unusual density, depth, and diversity of biological data. Because the complexity of biological data renders it incomprehensible to many AI programs, the majority of pharmaceutical research today is carried out manually. Human researchers curate data, generate hypotheses, and perform experiments in much the same way that they have for decades. Lacking automation, the drug discovery, development, and testing process is inefficient, expensive, and often inaccurate.

The inefficiency of this process causes prolonged delays between the completion of an experiment and the publication of its results in scientific journals or databases. These delays have contributed to significant problems with publication bias and inaccuracy in the industry. Even the open-science movement, which is attempting to increase access to not-yet-published clinical research results, depends on manually curated data sets that are usually created by companies with proprietary interests.

Even heavily curated data sets are often too inconsistent to be meaningfully analysed by AI. Take, for example, the challenge posed by abbreviations and acronyms within the pharmaceutical industry. The same abbreviation may carry different meanings depending on its context. ‘Ca’, for instance, could mean ‘cancer’ in one context and ‘calcium’ in another. Most AI depends on accurate and nuanced contextual information, and manually curated data sets often fall short of this mark.
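
To make the ‘Ca’ problem concrete, here is a minimal sketch in Python of how surrounding words can be used to pick a meaning. It is only an illustration: the cue-word lists and the function name disambiguate_ca are assumptions made up for this example, and real systems rely on far richer vocabularies and statistical models than a simple keyword count.

```python
import re

# Hypothetical context cues for each sense of the abbreviation "Ca".
# These word lists are illustrative assumptions, not a curated vocabulary.
SENSE_CUES = {
    "cancer": {"tumour", "tumor", "metastasis", "oncology", "carcinoma", "biopsy"},
    "calcium": {"serum", "channel", "mmol", "bone", "supplement", "ion"},
}


def disambiguate_ca(sentence: str) -> str:
    """Return the most likely sense of 'Ca' in a sentence, or 'unknown'."""
    # Lower-case the sentence and keep only alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    # Score each sense by how many of its cue words appear in the sentence.
    scores = {sense: len(tokens & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # Refuse to guess when no cue is present or when the senses tie.
    if scores[best] == 0 or list(scores.values()).count(scores[best]) > 1:
        return "unknown"
    return best


print(disambiguate_ca("Biopsy confirmed Ca of the breast with metastasis."))
print(disambiguate_ca("Serum Ca was 2.4 mmol/L."))
print(disambiguate_ca("Ca noted in the record."))
```

The third call has no usable context, so the sketch declines to guess. That is the situation a thinly annotated data set leaves an AI program in.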

Overcoming the unstructured data challenge

Fortunately, some of the world’s leading firms have begun to explore two possible ways to overcome these challenges. One approach is to simply improve the state of available data sets. The 2009 HITECH Act modelled this approach by standardising EMR systems to create richer, more comprehensive, and more up-to-date biological data sets. As a result, diverse data from biological patents, clinical trials, academic theses, and other sources can increasingly be analysed by advanced AI programs.

The second way to overcome the unstructured data challenge is simply to build better AI. Recent innovations have brought ‘context normalisation’ AI technology that can process and analyse unstructured, heterogeneous data points using a combination of natural language processing, machine learning, and cutting-edge text analytics. The most advanced of these AI programs are able to utilise disparate, incongruous data to generate novel hypotheses without the need for costly human curation.
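
The basic idea behind normalisation can be sketched in a few lines of Python: mentions of the same drug that are written differently in different sources are mapped to one canonical name so that they can be analysed together. This is a deliberately simplified, dictionary-based stand-in and not the technology described above; the synonym table, the record layout, and the function name normalise_term are all assumptions introduced for illustration.

```python
import re

# Hypothetical synonym table: each surface form maps to one canonical name.
SYNONYMS = {
    "aspirin": "acetylsalicylic acid",
    "asa": "acetylsalicylic acid",
    "acetylsalicylic acid": "acetylsalicylic acid",
    "paracetamol": "paracetamol",
    "acetaminophen": "paracetamol",
}


def normalise_term(raw: str) -> str:
    """Map a raw drug mention to its canonical name, or return it cleaned."""
    # Trim, lower-case, and collapse repeated whitespace before the lookup.
    key = re.sub(r"\s+", " ", raw.strip().lower())
    # Unknown terms pass through in cleaned form rather than being dropped.
    return SYNONYMS.get(key, key)


# Illustrative records from three kinds of source, each spelling the drug differently.
records = [
    {"source": "patent", "drug": "Aspirin"},
    {"source": "trial", "drug": "  ASA "},
    {"source": "thesis", "drug": "Acetylsalicylic  acid"},
]

canonical = {normalise_term(record["drug"]) for record in records}
print(canonical)
```

All three records collapse to a single canonical name. A fixed lookup table like this one still has to be maintained by hand, which is the curation cost that learning-based approaches aim to reduce.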

Innovations like these are allowing researchers to analyse data, generate hypotheses, and conduct conclusive clinical trials at unprecedented levels of speed and accuracy. This is good news for pharmaceutical companies, medical professionals, and consumers alike.

The original article was published on Pharmaphorum.

About the author:

Gunjan Bhardwaj is the founder and CEO of Innoplexus, a leader in AI and analytics as a service for life science industries. With a background at Boston Consulting Group and Ernst & Young, he bridges the worlds of AI, consulting, and life science to drive innovation.

on June 3, 2024
