Best Practices for CDMOs Dealing with Data Mayhem
It would be easy to say that the pandemic has driven dramatic growth for CDMO companies, as more pharmaceutical firms seek to outsource. Really though, the trend started much earlier, with many CDMOs having put their massive expansion plans in place a year or more before the pandemic took hold.
The true drivers of growth, instead, likely come down to cost and scale -- the potential savings that pharmaceutical companies can realize from outsourcing their drug manufacturing processes and the flexibility a CDMO can provide to those companies when launching a new product. These pre-existing market forces then combined with factors related to the pandemic to further grow business for CDMOs.
Regardless of the market drivers in play, the challenge for CDMOs remains the same. It was one thing to manage, say, two or three clients running across four production lines, but with business booming, those same four lines might support closer to eight clients. One product might have a line booked from 11 am to 2 pm and then another runs from 2 pm to 4 pm, and so on.
And therein lies the problem – aside from the cross-contamination challenges that come into play, all those quick turnarounds will cause data mayhem. How do you compartmentalize and partition each line’s data sets? How do you provide real-time data access to each customer separately? And how do you do all that while ensuring the high levels of data integrity required for safety and compliance?
The initial answer involves compartmentalizing the various data into the data historian, and then giving clients access via VPN. However, as clients -- with their different processes -- continue to multiply, even that answer begins to change. Let’s say you have five or more companies on a single line; at some point, your data architecture will strain to accommodate the challenge.
If you continue to do things the way you always have, this accommodation can only be made by hiring more data scientists and building data model after data model. That’s simply not sustainable in a world where data scientists are scarce and typical data models can take two to three months to build. Instead, a truly new approach will be required, one that relies more on automated workflows and less on manual interaction.
Keeping the status quo, in other words, isn’t a viable path for contract manufacturers that intend to be agile. And be agile they must in a competitive landscape in which organizations aren’t selling a product, but their facility and technology. Those that can offer a more high-tech facility will be more attractive to clients, many of whom have distanced themselves from conventional, offline testing (chromatography) and moved to more modern inline and online testing (near-infrared or Raman spectroscopy).
So how can CDMOs best position themselves for rapid growth, especially with clients that have embraced a quality-by-design, PAT manufacturing model? Through our work with several clients, we’ve identified four best practices that organizations can follow to automate their data processes to handle the mayhem that comes when their clients multiply but their production lines remain comparatively static.
Assess Process, Not Just Tech
Assessment is an obvious first step in the journey toward automation, but less obvious is what, exactly, needs to be assessed. Organizations often overlook processes in favor of pure technology assessments. This is especially troubling as process is where most of the inefficiency resides.
For example, we’ve seen contract manufacturers with well-built data architectures, but for which it can take three to six months to provide employees or clients access to a specific data table. Usually, bottlenecks like these are the product of GMP adherence and the complications of effectively safeguarding each client’s IP. In other words, the challenge isn’t in data collection, but in parsing that data out by date, product, line and schedule.
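To make the parsing problem concrete, here is a minimal sketch of partitioning historian readings by client, using each client’s booked time window on a line. All names, schedules, and data structures are hypothetical illustrations, not any specific historian’s API:

```python
from datetime import datetime

# Hypothetical line bookings: each client owns a line for a time window.
bookings = [
    {"client": "client_a", "line": "line_1",
     "start": datetime(2023, 5, 1, 11, 0), "end": datetime(2023, 5, 1, 14, 0)},
    {"client": "client_b", "line": "line_1",
     "start": datetime(2023, 5, 1, 14, 0), "end": datetime(2023, 5, 1, 16, 0)},
]

# Hypothetical time-stamped sensor readings pulled from the historian.
readings = [
    {"line": "line_1", "timestamp": datetime(2023, 5, 1, 12, 30), "value": 7.1},
    {"line": "line_1", "timestamp": datetime(2023, 5, 1, 15, 15), "value": 6.8},
]

def partition_by_client(readings, bookings):
    """Assign each reading to the client whose booking window covers it."""
    partitioned = {}
    for r in readings:
        for b in bookings:
            if r["line"] == b["line"] and b["start"] <= r["timestamp"] < b["end"]:
                partitioned.setdefault(b["client"], []).append(r)
                break
    return partitioned

by_client = partition_by_client(readings, bookings)
```

In practice the booking schedule itself changes per campaign, which is why each "simple" partitioning rule tends to pull validation and change control along with it.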
In particular, though, it comes down to validation – everything, after all, has to go through a change process. We’ve found it to be the number one determining factor in how long a project will take from start to finish. Consider the data architecture example. All the information needed has likely been collected in the data lake, but perhaps a certain piece of data has different naming conventions in one table than in another. Logic dictates that a “simple” fix needs to be made, but such a fix would trigger a change control that could touch a multitude of different documents. That could turn into a full-blown project in and of itself. Testing and validation procedures, then, are good places to start your automation journey.
Think in Terms of Pipelines
Just as you cannot clone human data scientists, you cannot clone their data integration efforts. So, the goal of any automation push should be to create end-to-end pipelines with reproducible deployment, integration, machine learning and AI capabilities. In other words, instead of proofs of concept and one-offs, your eye should be on scale from the beginning. For example, an anomaly detection algorithm that identifies if a pump or filter press is not working properly or will fail soon is immensely valuable. However, that value is blunted if the algorithm’s deployment to your thousands of pumps is not also automated.
To use our example above, if every pump is configured in your database with a different set of attributes, the organization will have no choice but to treat every related project as a one-off. It’s only once you’ve contextualized your assets, and defined the relationships between them, that you can seamlessly multiply your data models to enable automation at scale.
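One way to picture asset contextualization is as a shared template: if every pump exposes the same standardized signal names, a single model definition can be bound to each instance automatically instead of as a one-off. The sketch below assumes hypothetical tag names and a made-up model identifier purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class PumpAsset:
    """Illustrative asset template: every pump exposes the same attributes."""
    asset_id: str
    line: str
    # Standardized signal name -> historian tag mapping (hypothetical tags).
    tags: dict = field(default_factory=dict)

# Signals the anomaly model expects on every pump, regardless of vendor.
REQUIRED_SIGNALS = {"discharge_pressure", "vibration", "motor_current"}

def bind_model(asset: PumpAsset) -> dict:
    """Deploy the model only if the asset exposes all required signals."""
    missing = REQUIRED_SIGNALS - asset.tags.keys()
    if missing:
        raise ValueError(f"{asset.asset_id}: missing signals {sorted(missing)}")
    return {"model": "pump_anomaly_v1", "asset": asset.asset_id,
            "inputs": {s: asset.tags[s] for s in sorted(REQUIRED_SIGNALS)}}

pumps = [
    PumpAsset("P-101", "line_1", {"discharge_pressure": "L1.P101.PT",
                                  "vibration": "L1.P101.VT",
                                  "motor_current": "L1.P101.IT"}),
]
deployments = [bind_model(p) for p in pumps]
```

The design choice here is that the template, not each project team, owns the attribute names; scaling to thousands of pumps then becomes a loop over contextualized assets rather than a series of bespoke integrations.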
This requires mindset change more than technology or process change. Data scientists can think in terms of pipelines, of course, but to truly realize the art of the possible here, this new mentality needs to be supported with formal data governance driven from the top down.
The organization needs to coalesce around the right data contextualization and the right metadata so that, later on, models can be deployed automatically. Consider this mindset change one that shifts from equipment-specific modeling to process-specific modeling.
Prepare to Feed the Beast Behind the Dashboards
You’ve deployed your pipelines and successfully validated them, established proper safeguards and introduced mechanisms to scale them up across different facilities. These are inherently good things, but the real work is just beginning.
Continuously pulling data from a database is a difficult task for any technology platform – issues related to bandwidth, memory and CPU often follow. So, consider your future state. Can your current hardware infrastructure scale to meet the demands of advanced data modeling – hundreds or even thousands of models, not just a few proofs of concept?
Push, Don’t Pull
Handling the power of data automation is one thing -- harnessing that power is another. We do a lot of work installing and supporting operations data management platforms, for example. When we talk with people who work in the field, they often indicate that they’re only able to use 3-10% of the data these systems provide. They’ve put together a 16- or 32-CPU monster, only to work it at a fraction of its capacity.
Consider: People in operations rarely have time to look at operational dashboards. And even if they did, looking at one or two dashboards is, simultaneously, a limited view of all the data available and an overwhelming amount of data for an individual to process and determine what’s most actionable.
When it comes to communicating data, then, focus on communicating insights through a pre-existing channel with high adoption like Slack or Teams. Better yet, go beyond alerts by automatically triggering workflows. For example, if a predictive model flags a pump as nearing the end of its lifespan, it could automatically generate a maintenance workflow to first order and then install a replacement. The difference between this and a flashing red alert light on a dashboard is massive. It represents the first step toward proactive maintenance -- predicting failure and scheduling maintenance to prevent it -- when the alternative is stalled product lines containing millions in material.
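The alert-to-workflow pattern can be sketched in a few lines. This is a simplified illustration with invented asset IDs and thresholds; a real implementation would call a CMMS and a chat platform’s API rather than return plain dictionaries and strings:

```python
def handle_prediction(asset_id: str, remaining_days: int, threshold: int = 30):
    """If predicted remaining life falls below the threshold, open a work
    order and emit a chat notification instead of lighting a dashboard."""
    if remaining_days >= threshold:
        return None  # healthy: no action, no alert fatigue
    work_order = {
        "asset": asset_id,
        "action": "order_and_install_replacement",
        "priority": "high" if remaining_days < 7 else "normal",
    }
    notification = (f"{asset_id}: predicted failure in ~{remaining_days} days; "
                    f"work order opened ({work_order['priority']} priority)")
    return work_order, notification

result = handle_prediction("P-101", remaining_days=12)
```

The point is that the model’s output lands as an actionable record in a system people already use, rather than as one more signal on a dashboard nobody has time to watch.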
We’ve found these best practices to be effective guideposts for CDMOs as they navigate expansion and data transformation. For many, though, overcoming the challenges of data mayhem is just a first step. The data integration best practices above not only help organizations tame their data, but build the foundation needed to introduce digital twins into their manufacturing operations.
When biopharma companies work with a CDMO, they need a high degree of confidence in the manufacturer’s ability to meet their own high standards and those of the FDA. And when the CDMO market landscape is viewed through this lens, it’s clear that a digital twin can be a competitive differentiator.
When underpinned by modern methods of spectroscopy and supported by the high levels of data integrity described above, digital twin technology can help CDMOs and their clients achieve the manufacturing consistency required for faster batch delivery to customers. Nothing can provide CDMOs’ biopharma clients with higher levels of confidence.
iSpeak Blog posts provide an opportunity for the dissemination of ideas and opinions on topics impacting the pharmaceutical industry. Ideas and opinions expressed in iSpeak Blog posts are those of the author(s) and publication thereof does not imply endorsement by ISPE.