Turing Award laureate Mike Stonebraker and I co-founded Paradigm4 in 2010 to bring technology from Mike’s MIT lab to the commercial science community and transform the way researchers interrogate and analyse large-scale multidimensional scientific data. The aim was to create a software platform that allowed scientists to focus on their science without getting bogged down in data management and computer science details – in turn enabling more efficient hypothesis generation and validation, and delivering insights to advance drug discovery and precision medicine.
Throughout his 40 years working with database management systems, Mike heard from scientists across disciplines, from astrophysics and climatology to computational biology, that traditional approaches to storing, analysing and computing on heterogeneous and highly dimensional data using tables, files and data lakes were inefficient and limiting. Valuable scientific data, along with its metadata, must be curated, versioned, interpretable and accessible so that researchers can do collaborative and reproducible research.
We created a technology (REVEAL™) that is purpose-built to handle large-scale heterogeneous scientific data. Storage is organised around arrays and vectors to enable sophisticated data modelling as well as advanced computation and machine learning. This enables scientists to ask more questions and get more meaningful answers, more quickly.
Translational research is the process of applying ideas, insights and discoveries generated through basic scientific inquiry to the treatment or prevention of human disease. The philosophy of “bench to bedside” underpins the concept of translational medicine, from basic research to patient care.
Streamlining translational research gives scientists the ability to integrate ’omics, clinical, EMR, biomedical imaging, wearables and environmental data to build a rich, systems-level understanding of human biology, disease and health.
We are actively working with leading biopharma companies globally, as well as research institutes. One of our current projects is with Alnylam Pharmaceuticals, to expedite its research by leveraging one of the biggest genetic projects ever undertaken – the UK Biobank. Over 500,000 people have donated their genotypes, phenotypes and medical records. With so much data available on such a large scale, Alnylam’s scientists faced a challenge when it came to extracting meaningful information and making valuable connections that could unlock breakthroughs in scientific research.
The UK Biobank captures genomics, longitudinal medical information and images, so having all that data in one place allows researchers to correlate someone’s traits and the presence or absence of a disease, or even susceptibility to diseases like COVID-19, with their genetic make-up. Alnylam has used our technology to draw on these correlations to investigate causes of disease and identify potential treatments.
The idea of precision medicine – delivering the right drug treatment to the right patient at the right time and at the right dose – underpins current thinking in healthcare practice, and in pharma R&D. However, until single-cell ’omics came along, researchers were looking at an aggregated picture – the ’omics of a tissue system, rather than that of a single cell type. Now, single-cell analysis has become a major focus of interest and is widely seen as the ‘game changer’ – with the potential to take precision medicine to the next level by adding ‘right cell’ into the mix.
We offer biopharmaceutical developers the ability to break through the data wrangling, distributed computing and machine-learning challenges associated with the analysis of large-scale, single-cell datasets. Users can then build a multidimensional understanding of disease biology, scale to handle more samples from patients, with more cells, more features and broader coverage, and readily assess key biological hypotheses for target evaluation, disease progression and precision medicine.
With our platform, data are natively organised into arrays that can easily be queried with scientific languages such as R and Python. The old way of working – opening many files and transforming them into matrices and data frames for use with scientific computing software – is no longer necessary, because the data are natively “science-ready”. For companies that have tens of thousands of data sets, aggregating that data in a usable format is tremendously empowering.
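To make the contrast concrete, here is a minimal sketch in Python using NumPy. It does not use the Paradigm4 API: the function names, gene labels and in-memory "files" are all hypothetical, chosen only to show the difference between stitching per-sample files into a matrix and querying data that already lives in an array.

```python
import numpy as np

# Hypothetical per-sample "files": one comma-separated line of gene values each.
# These strings stand in for real files; nothing here is Paradigm4-specific.
sample_files = ["1.0,0.0,3.5", "0.2,4.1,0.0", "2.2,0.0,1.1"]
genes = ["GENE_A", "GENE_B", "GENE_C"]  # illustrative column labels


def load_per_sample_files(files):
    """Old way: parse every file, then stack the rows into one matrix."""
    rows = [[float(v) for v in text.strip().split(",")] for text in files]
    return np.array(rows)


def query_gene(array, gene_names, gene):
    """Array-native query: slice one gene (column) across all samples."""
    return array[:, gene_names.index(gene)]


# The wrangling step happens once; afterwards every question is a slice.
matrix = load_per_sample_files(sample_files)
result = query_gene(matrix, genes, "GENE_B")
```

In this toy version the parsing step is trivial, but with tens of thousands of data sets it is the step that dominates, which is why having the data already organised as arrays matters.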
Our “Burst Mode” automated elastic computing capability makes it possible for individual scientists to run their own algorithms at any scale without requiring the help of IT or a computer scientist. The software automatically fires up and shuts down hundreds of transient compute workers to execute their task. Any researcher can access the power of hundreds of computers from a laptop.
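The general pattern behind this kind of elastic computing can be sketched with Python's standard library. This is an illustration only, not how Burst Mode is implemented: `run_burst` and `score_chunk` are hypothetical names, and local threads stand in for transient compute workers.

```python
from concurrent.futures import ThreadPoolExecutor


def score_chunk(chunk):
    """Stand-in for a scientist's own algorithm, applied to one chunk."""
    return sum(x * x for x in chunk)


def run_burst(data, n_workers=4, chunk_size=3):
    """Split the data, start workers, combine the partial results.

    The `with` block starts the pool and shuts it down automatically,
    loosely mirroring transient workers that exist only for one task.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(score_chunk, chunks))
    return sum(partials)


total = run_burst(list(range(10)))
```

The caller only supplies the algorithm and the data; starting and stopping the workers is handled for them, which is the point of the design.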
When COVID-19 hit early last year, we partnered with a leading pharma company to identify tissues expressing the key SARS-CoV-2 entry-associated genes. We found they were expressed in multiple tissue types, thus explaining the multi-organ involvement in infected patients observed worldwide during the ongoing pandemic.
The first data sets were from the Human Cell Atlas (HCA) and the COVID-19 Cell Atlas. Questions such as “Where is the receptor for SARS-CoV-2?” or “What are the tissue distribution and cell types that contain COVID-19 receptors?” can be answered in 30 seconds or less, with responses from 30 or more data sets (since expanded to ~100). More advanced questions can now be investigated, such as the causes of complications and sequelae seen in some patients. Rather than organising all of those data, researchers can focus their attention on unlocking answers.
This work has allowed us to support scientists in breaking through the complexities of working with massive single-cell, multi-patient datasets. Accelerating drug and biomarker discovery is a key driver for our customers.
The life science community, as well as more commercially oriented research and development groups in pharma and biotech, understands that it needs leading-edge algorithms and cost-effective, scalable computational platforms to be able to ask and answer questions in seconds instead of weeks and so push forward discovery. Paradigm4 gives them the confidence to make earlier and adaptive change decisions that will shorten development, and provides earlier access to complex, real-time data that can detect efficacy and safety signals sooner. Importantly, working in partnership with these users, we will further improve and develop our capabilities for analysing datasets, benefitting researchers as they continue to strive for better results.