How Innoplexus uses smart, cutting-edge technology to help its pharma clients find relevant answers hidden in unstructured data
When a major global pharmaceutical company wanted to accelerate the development of a new therapy line and discover how to differentiate its early-stage assets, it approached Innoplexus. That decision could not have been more apt.
Since its inception in 2011, the Indo-German company has been working to help transform the global healthcare market, with a particular focus on medical research, using unique, advanced solutions based on continuous data analytics, artificial intelligence (AI) and machine learning (ML).
“For a few years now, we have been crawling the whole universe of publicly available data and indexing it using our proprietary semantic- and context-related ontology. Whether a drug developer is seeking existing research, a medical researcher is searching for alternative treatments, or a practitioner is attempting to find data on a particular disease, increasing access to relevant information removes roadblocks to discovery and fuels rapid growth. We reduce the noise, help them discover new opportunities and answers previously hidden in the avalanche of structured, unstructured and disconnected data, and increase the efficiency of managing the full picture of relevant data,” says Kumar Anshu, CEO of Innoplexus, India.
The history of AI and its impact on modern medicine
Taking us through the history of AI, Anshu explains that it can be seen through the lens of three distinct waves. “The first wave brought ‘knowledge engineering’ software that enabled efficient solutions to practical challenges. The second brought machine learning programmes that enabled automated pattern recognition and advanced statistical analysis. We’ve now entered the third wave of AI, which has the power to generate novel hypotheses by analysing massive sets of data.”
Third-wave AI has the potential to significantly accelerate the research and development process for new drugs, not only by helping to automate repetitive, lower-level cognitive tasks that once had to be carried out manually, but also by discovering new patterns and, most importantly, generating algorithms to explain them. These programmes normalise the context of disparate data points and generate novel hypotheses faster, and with greater accuracy, than human researchers can.
He further illustrates this with a quote from J.C.R. Licklider’s 1960 paper, “Man-Computer Symbiosis”: “About 85 percent of my ‘thinking’ time was spent getting into a position to think, to make a decision, to learn something I needed to know. Much more time went into finding or obtaining information than into digesting it. [...] Several hours of calculating were required to get the data into comparable form. When they were in comparable form, it took only a few seconds to determine what I needed to know.”
Contemporary AI solutions like those offered by Innoplexus give researchers broader, deeper and more focused access to relevant data, enabling them to arrive at more hypotheses, and more accurate ones, and to test them with unprecedented speed. The result is a significantly faster and less expensive discovery process, with lower risk and more effective results.
Proprietary technology to solve data-driven challenges
Often, it is impossible for researchers to access the vast wealth of medical, research and patient data spread across thousands of sources. The team at Innoplexus says its goal is to democratise this information by bringing all of it onto one easy-to-use platform.
Over two years, Gaurav Tripathi, co-founder and CTO of Innoplexus, and his team experimented with product development and consulted C-level executives across a range of industries. Explaining the company’s proprietary technology, Anshu says, “We started by building a comprehensive data-as-a-service (DaaS) platform known as iPlexus™, condensing the entire digital universe for life sciences to create a previously unrealised resource for the industry. That enabled us to build more continuous analytics applications for specific business use cases that make information more actionable for users.”
According to him, AI and machine learning are critical to making data actionable: “There is an ocean of data out there. When we started looking at market opportunities, we realised that many businesses were not aware of the volume of data available in the public domain, or were spending heavily on manually developed and manually curated products. The biggest challenge is to get access to relevant data and real-time intelligence, and turn it into relevant, intuitive insights. The depth of the data, with its many layers interacting simultaneously, requires machines that can crawl faster and draw connections faster. We created an architecture that scales easily and enables us to increase our capacity to crawl public life sciences content from 1,000 pages per second to 20,000 pages per second. The data is also diverse, ranging from publications to gene sequences to patient records. AI and ML can help us go beyond what a human mind can infer from data, uncovering unknown patterns, hidden networks and undiscovered relationships between biological entities. Delivering these insights can lead to major discoveries.”
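To put the crawl rates Anshu cites in perspective, here is a back-of-the-envelope calculation. It assumes, which the article does not state, that the crawler sustains its peak rate continuously for a full day of 86,400 seconds:

```latex
\begin{aligned}
1{,}000~\text{pages/s} \times 86{,}400~\text{s/day} &= 86{,}400{,}000~\text{pages/day} \\
20{,}000~\text{pages/s} \times 86{,}400~\text{s/day} &= 1{,}728{,}000{,}000~\text{pages/day}
\end{aligned}
```

That is the 20-fold increase described above, from roughly 86 million to roughly 1.7 billion pages a day. These should be read as theoretical upper bounds rather than measured figures.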
Automating curation of life sciences data
Rather than build on existing technology, the team invested time and effort in inventing automated ways of curating life sciences data from sources such as publications and their abstracts, PDFs and web pages. This involved connecting all the concepts and categories in life sciences content and developing systems and ontologies to automate those connections. Innoplexus products use machine learning and deep learning, natural language processing, network analysis, computer vision, and entity normalisation tools and algorithms.
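Entity normalisation, one of the techniques listed above, maps the many surface forms of a concept (brand names, abbreviations, spelling variants) onto a single canonical identifier, so that mentions from different sources can be connected. The sketch below is a deliberately minimal, dictionary-based illustration and not Innoplexus’s method: the tiny synonym table and its identifiers (such as `DRUG:aspirin`) are hypothetical, whereas production systems rely on large curated ontologies and learned models.

```python
import re

# Hypothetical mini-ontology: canonical ID -> known surface forms (synonyms).
ONTOLOGY = {
    "DRUG:aspirin": ["aspirin", "acetylsalicylic acid"],
    "DISEASE:type_2_diabetes": ["type 2 diabetes", "T2D", "type II diabetes mellitus"],
}


def build_lookup(ontology: dict[str, list[str]]) -> dict[str, str]:
    """Invert the ontology into a lower-cased synonym -> canonical ID map."""
    return {syn.lower(): cid for cid, syns in ontology.items() for syn in syns}


def normalise_mentions(text: str, lookup: dict[str, str]) -> list[str]:
    """Return the canonical ID of every synonym found in `text`, in order of appearance.

    Longer synonyms are placed first in the regex alternation so that a long form
    is preferred over any shorter synonym starting at the same position.
    """
    synonyms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, synonyms)) + r")\b", re.IGNORECASE
    )
    return [lookup[m.group(0).lower()] for m in pattern.finditer(text)]


lookup = build_lookup(ONTOLOGY)
text = "Acetylsalicylic acid (aspirin) is studied in T2D and type II diabetes mellitus."
print(normalise_mentions(text, lookup))
```

Four different strings in the sentence collapse onto just two concepts, which is what lets a platform link a publication that says “acetylsalicylic acid” with a forum post that says “aspirin”.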
“One important point about life sciences data is that in many cases it is very dense,” says Anshu. “For example, a single sentence in an extract or publication may rest on 10 papers and 20 years of research. Furthermore, that sentence does not stand alone within the publication or extract itself, and it has different meanings to people working in different areas of the life sciences.”
iPlexus™ automates the process of crawling through billions of data points from thousands of sources, aggregating them according to specific use cases and analysing them for patterns, relations and entities. It then presents the results in an intuitive interface with visualisations. The company calls this process framework CAAV: Crawl, Aggregate, Analyse, Visualise. “The idea is to triangulate information on all known drugs, diseases and therapeutic techniques, while making data exploration more user-friendly. As a result, users don’t need to spend months getting that data or investing in proprietary solutions, as they would with traditional data solutions,” says Anshu. He adds, “Through iPlexus™, we want to facilitate better ways of discovering, exploring and analysing biomedical research.”
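The four CAAV stages can be pictured as a pipeline in which each stage feeds the next. The sketch below is a hypothetical, in-memory illustration of that idea only: the `Document` type, the stage functions and the toy co-occurrence “analysis” are assumptions made for demonstration, and they do not describe how iPlexus™ is actually built. A real crawl stage would fetch pages over the network rather than wrap a hard-coded list.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Document:
    source: str          # hypothetical source label, e.g. "publication" or "patent"
    entities: list[str]  # entities assumed to be already extracted and normalised


def crawl(raw: list[tuple[str, list[str]]]) -> list[Document]:
    """Crawl (stand-in): wrap pre-fetched records as Document objects."""
    return [Document(source, entities) for source, entities in raw]


def aggregate(docs: list[Document], use_case_sources: set[str]) -> list[Document]:
    """Aggregate: keep only the sources relevant to a given use case."""
    return [d for d in docs if d.source in use_case_sources]


def analyse(docs: list[Document]) -> Counter:
    """Analyse: count how often each pair of entities co-occurs in a document."""
    pairs: Counter = Counter()
    for d in docs:
        for a, b in combinations(sorted(set(d.entities)), 2):
            pairs[(a, b)] += 1
    return pairs


def visualise(pairs: Counter, top: int = 3) -> str:
    """Visualise: render the strongest relations as a plain-text bar chart."""
    return "\n".join(f"{a} <-> {b}: {'#' * n}" for (a, b), n in pairs.most_common(top))


# Toy records; the forum post is dropped because this use case ignores forums.
raw = [
    ("publication", ["aspirin", "type 2 diabetes"]),
    ("publication", ["aspirin", "type 2 diabetes", "cox-1"]),
    ("patent", ["aspirin", "cox-1"]),
    ("forum", ["aspirin", "headache"]),
]
pairs = analyse(aggregate(crawl(raw), {"publication", "patent"}))
print(visualise(pairs))
```

Keeping each stage a separate function mirrors the Crawl, Aggregate, Analyse, Visualise split: any stage can be swapped (say, a graph-based analysis in place of simple co-occurrence counts) without touching the others.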
Facilitating access to the right research at the right time
In the case of the pharma major that wanted to identify where and how it could differentiate its early-stage assets, Innoplexus took the following approach. “We did an end-to-end mapping of clinical and pre-clinical activities and a deep dive across the scientific ecosystem, screening public databases for key assets, existing capabilities, pipelines and key resources in the innovation ecosystem. This helped us identify early-stage assets by landscaping competitor activities and related developments, so we could anticipate competitor strategy and find key opportunities and possible risks. We also extracted development information on scientific studies, clinical and pre-clinical activities, key research and market information from public databases, and identified key peripheral capabilities that could be leveraged for a stronger innovation value proposition. This enabled the client to make calculated decisions, minimise risks and identify niche opportunities,” explains Anshu.
The platform holds more than 300 terabytes of crawled and indexed scientific data from 365,000 clinical trials, 200 biological databases, all major patent offices, regulatory agencies and patient forums, including 25 million publications, 20 million patents and 10 million key opinion leaders. These numbers keep growing as the company adds to its database.
Giving more insight into the company’s efforts to democratise research in the sector, Anshu says, “There is a lot of significant research taking place not only in the labs of big pharma players but also at small pharma labs, niche biotech companies, academic institutions and so on. A lot of this gets lost because it is not available through a common source. Information is scattered across different unstructured packages, making it difficult to consume and not readily available to researchers. Our product aims to bridge this gap by collecting, cleaning, categorising and arranging various kinds of information so that it is readily available to the user. It gives researchers confidence that they will not miss any significant information that might affect their research later.”
AI and the future of healthcare
Explaining the role he believes AI will play in transforming the global healthcare market, Anshu says, “For pharma companies, it will set the stage for a new business model within a digital setting, from discovery to commercialisation. We also see new AI-driven drug discovery platforms emerging. Until now, clinical development has been all about setting up trial centres and getting patients to these centres. Targeted therapy presents a paradigm shift, with patients identified by their personal and clinical characteristics so that they can be directed to the right centres. In fact, regulators are embracing AI to explore how drugs can gain market authorisation faster by removing bottlenecks, without compromising patient safety.”