Background
The delivery of informatics systems within lab-based companies has always been a very active space. Traditionally this focused on single systems, or on multiple connected systems covering a specific set of workflows. Most labs now operate at least some form of LIMS, ELN, LES, SDMS, CDS, etc. The current “Big Thing” is how to use those same laboratory informatics systems to deliver data that supports Artificial Intelligence and Machine Learning (AI/ML), both to derive more value from that data and to predict the outcomes of workflow changes aimed at specific goals.
What do you want from AI/ML?
This may seem obvious, but it is an important point to consider. Before embarking on your AI/ML journey, make sure you have some specific and prioritized targets in mind, for example moving to environmentally friendly raw materials, reducing toxicity effects for a class of drugs, or lowering production costs by optimising process conditions. This helps you to:
- Prioritize where to focus your efforts
- Estimate potential costs and expected benefits
- Know who in your organisation will need to be involved
- Know who will be impacted by the program
- Understand where funding is coming from
Holding workshops with stakeholders to capture and prioritize areas of focus is a useful activity in mapping out your AI/ML journey and gathering all these attributes. It will also help you to decide where to invest in preparing your data and lab applications for the journey ahead.
Is Digital Transformation a Necessary Step to AI/ML?
We are often asked if it is necessary to replace lab spreadsheets with a LIMS or ELN before using AI/ML, and the answer is “no”. Working from spreadsheets may require more effort to normalize your data sets and make them comparable and consistent, but there are no absolutes when it comes to your data.
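As a minimal sketch of what that normalization can look like in practice (the file names, column headings, and term mappings below are purely illustrative, not taken from any particular system), the following Python snippet harmonizes spreadsheet exports from two labs onto one canonical schema:

```python
import pandas as pd

# Hypothetical mappings from the terms used in each lab's spreadsheets
# to a single canonical vocabulary (all names here are illustrative).
COLUMN_MAP = {
    "Samp ID": "sample_id", "SampleID": "sample_id",
    "Test": "analyte", "Analyte": "analyte",
    "Result (mg/L)": "result_mg_per_l", "Conc mg/L": "result_mg_per_l",
}
ANALYTE_MAP = {"Na": "sodium", "SODIUM": "sodium", "Sodium": "sodium"}

def normalize_sheet(path: str) -> pd.DataFrame:
    """Load one spreadsheet export and map it onto the canonical schema."""
    df = pd.read_excel(path)
    df = df.rename(columns=COLUMN_MAP)  # harmonize column names
    # Harmonize analyte terminology, leaving unmapped terms unchanged.
    df["analyte"] = df["analyte"].map(ANALYTE_MAP).fillna(df["analyte"])
    return df[["sample_id", "analyte", "result_mg_per_l"]]

# Stack the normalized exports into one comparable, consistent data set.
combined = pd.concat(
    [normalize_sheet(p) for p in ("lab_a.xlsx", "lab_b.xlsx")],
    ignore_index=True,
)
```

The mapping tables grow as you discover more local variations, which is exactly the extra effort referred to above, but nothing here requires a LIMS or ELN to be in place first.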
Over recent years, Digital Transformation (DT) has become an area of increasing focus for businesses. DT demands a much wider scope than traditional laboratory systems projects, with greater depth of functionality, increased benefits, and consequently a longer delivery time frame for the associated programme of changes. It feels like many lab-based companies and their lab informatics teams are only just starting to understand the challenges associated with DT.
Given that AI/ML works best with more data, you may want to include digital transformation as part of your AI/ML program. An incremental, parallel delivery of DT and AI/ML coverage is often a pragmatic approach, allowing the scope of your AI/ML models to grow as your DT program delivers more data dimensions.
Is My Data Consistent and Interoperable?
For most organisations this is a difficult question to answer without analysing the state of your informatics landscape. Unless you are a green-field facility with an unlimited budget, the chances are low that your scientific informatics platforms were carefully and consistently implemented from day zero: systems evolve and get upgraded, replaced, and migrated; systems from multiple vendors are used side by side; applications have been developed in house; workflows are modified. All of this leads to inconsistencies in terminology and gaps in data elements. Mapping out the data flows and manual transitions between systems is a great way to expose these gaps and to scope the work needed to consolidate your reference data sets. A good question to ask repeatedly during this process is ‘Which is the source of truth for this data element…?’
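To make that gap-mapping exercise concrete, here is a small hypothetical sketch. The system names and field inventories are invented; the useful pattern is comparing what each producing system actually records against the data elements your model will eventually need:

```python
# Invented inventory of which data elements each producing system records.
system_fields = {
    "LIMS": {"sample_id", "result", "unit", "analyst"},
    "ELN":  {"sample_id", "result", "method_version"},
    "CDS":  {"sample_id", "raw_chromatogram", "result"},
}

# Data elements the AI/ML model will eventually need from every source.
required = {"sample_id", "result", "unit", "method_version"}

# Flag, per system, the elements that are missing or held elsewhere --
# each gap prompts the question: which system is the source of truth?
for system, fields in sorted(system_fields.items()):
    gaps = sorted(required - fields)
    print(f"{system}: missing {gaps if gaps else 'nothing'}")
```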
Is data perfection necessary before I develop my central data repository?
In short, no it isn’t. The central data repository (CDR) relies on being able to access data from each of the producing systems that will be included in the AI/ML modelling. Data will be transported to the data layer using either a native application tool or an external extract-transform-load (ETL) process, usually after a particular workflow has completed in the source application, for example data review and authorization. This means you will need a range of approaches to get the data you need into the CDR, according to the capabilities, data types, and underlying technology of each source application. It therefore makes sense to start the CDR connectivity for each producing system when its data is ready to be used, i.e. consistent, interoperable, and comprehensive enough to contribute to the model.
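As an illustrative sketch only (the table names, column names, and status value below are assumptions, not a real LIMS interface), a minimal extract-and-load step might look like this, pulling only records whose review-and-authorization workflow has completed in the source application:

```python
import sqlite3

def etl_authorized_results(source_db: str, cdr_db: str) -> int:
    """Copy authorized results from a source system into a CDR staging table.

    All schema details are hypothetical; a real integration would use the
    source application's native export tool or API where one exists.
    """
    src = sqlite3.connect(source_db)
    cdr = sqlite3.connect(cdr_db)
    cdr.execute(
        """CREATE TABLE IF NOT EXISTS cdr_results
           (sample_id TEXT, analyte TEXT, value REAL, unit TEXT)"""
    )
    # Extract: only rows whose workflow has completed in the source app.
    rows = src.execute(
        "SELECT sample_id, analyte, value, unit FROM results "
        "WHERE status = 'AUTHORIZED'"
    ).fetchall()
    # Load: append to the CDR; any transformation/normalization goes here.
    cdr.executemany("INSERT INTO cdr_results VALUES (?, ?, ?, ?)", rows)
    cdr.commit()
    src.close()
    cdr.close()
    return len(rows)
```

Each producing system would get its own variant of this step, built when that system’s data is ready to contribute.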
Need more information?
If you’d like to read a more in-depth discussion of the issues highlighted in this blog, we have published a white paper on the subject: “Machine Learning and Lab Informatics – Where to Start?”
Send us an email at info@scimcon.com if you’d like help navigating the AI/ML journey; we’d be happy to discuss your individual needs.