iSpeak Blog

Applying Large Language Models (LLMs) in GMP Regulated Biotech IT Compliance: A Case Study

Wenqi Kong

The rational and regulated application of LLMs in GMP biotech IT compliance work can effectively eliminate the risks of information disclosure.

From a practical perspective, the adoption of such intelligent technologies has delivered tangible value and concrete benefits to daily work, such as documentation translation, presentation of ideas, and project management.

This case study presents an example of how an LLM-assisted Python script was applied to streamline routine IT compliance activities, demonstrating a practical use case for integrating LLM capabilities into day-to-day operational workflows.

The rapid development of artificial intelligence (AI) and LLMs has drawn divergent reactions: some people express anxiety and panic over technological change, some embrace the innovation with enthusiasm, and others remain indifferent to it. Initially, there was a period of perplexity, as the profound changes introduced by novel technologies require adequate time for exploration, validation, and practical implementation. For a cross-area engineer working between the quality department and information technology, the quality side of the role is governed by rigorous standards of professionalism, data integrity, and confidentiality within a GMP-oriented framework. In contrast, the IT function is inherently oriented toward exploration and innovation. Consequently, applying LLMs in a balanced and compliant way, reconciling quality requirements with technological advancement, requires extensive exploratory research and iterative practice.

Because of the strict access control requirements that pharmaceutical regulations such as US Food and Drug Administration (US FDA) 21 CFR Part 11, European Union (EU) GMP, and National Medical Products Administration (NMPA) GMP place on quality control (QC) laboratory software, periodic reviews of user permissions in laboratory instrument software are required.

One of the primary responsibilities of an IT compliance engineer is to conduct periodic reviews to verify whether user access privileges in laboratory instrument software (e.g., Waters Empower, Thermo Scientific™ Chromeleon, and Agilent CDS) are consistent with approved permission applications. Traditionally, this verification relied on manual cross-checking and required a secondary reviewer. With the continuous growth in user volume, this manual process consumed a substantial amount of working time, ranging from approximately half an hour to nearly an entire workday. To reduce this workload, a Python script developed with LLM support was implemented to complete the review once user lists have been exported from the QC system and the account management system.

Stage I

At the beginning, examples of exported role and group lists were uploaded directly to general LLM platforms. However, the models demonstrated ambiguous recognition—particularly for users with similar spellings—and frequent errors in identifying user accounts and permission configurations, resulting in unsatisfactory verification outcomes. Furthermore, this approach introduced significant risks related to confidential information leakage.

Stage II

When the “code generation” capability emerged in LLMs, an exploratory attempt was undertaken to assess whether non-professional programmers could leverage LLMs to develop practical, automated, and reliable data processing tools. At the preliminary stage, the prompt merely instructed the model to generate a program for comparing account and permission consistency between two files. The resulting script was only able to perform data matching based on assumed logic and could not accommodate inconsistent headers or disordered data structures in the source files, leading to poor verification accuracy.
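The script generated at this stage is not reproduced in this post. As a purely hypothetical illustration of how "assumed logic" can fail on disordered data, the sketch below compares two small lists row by row and then by account name. The function names and sample accounts are invented for the example.

```python
def naive_positional_diff(rows_a, rows_b):
    """Compare row i of one list with row i of the other (order-sensitive)."""
    return [(a, b) for a, b in zip(rows_a, rows_b) if a != b]


def keyed_diff(rows_a, rows_b):
    """Compare permissions by account name, ignoring row order."""
    perms_a = dict(rows_a)
    perms_b = dict(rows_b)
    return {
        account
        for account in perms_a.keys() | perms_b.keys()
        if perms_a.get(account) != perms_b.get(account)
    }


# Hypothetical sample data: same content, different row order.
approved = [("jsmith", "Analyst"), ("mlee", "Reviewer")]
system = [("mlee", "Reviewer"), ("jsmith", "Analyst")]

print(len(naive_positional_diff(approved, system)))  # 2 false mismatches
print(keyed_diff(approved, system))                  # set(): no real mismatch
```

The positional comparison reports every row as a mismatch simply because the exports are sorted differently, which is one plausible reason a loosely specified script might produce unreliable results.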

Stage III

In the subsequent optimization phase, the precise target data columns in both files were explicitly defined, and preparatory data processing was performed to standardize the input structures. The LLM was then instructed to generate Python code to compare account and permission consistency between the two files. This iteration significantly reduced manual effort and improved overall operational efficiency.
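A minimal sketch of what naming the target columns explicitly might look like is shown below. It assumes the preprocessed files are CSV text and uses only the Python standard library. The column names, sample rows, and the `load_permissions` helper are hypothetical and are not taken from the actual script.

```python
import csv
import io

# Hypothetical header names; real exports will use their own.
APPROVED_COLUMNS = {"account": "Account", "permission": "Approved Role"}
SYSTEM_COLUMNS = {"account": "User Name", "permission": "User Type"}


def load_permissions(csv_text, columns):
    """Read only the two named columns and return {account: permission}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[columns["account"]].strip(): row[columns["permission"]].strip()
        for row in reader
    }


# Hypothetical sample data with extra columns and a different column order.
approved_csv = "Account,Department,Approved Role\njsmith,QC,Analyst\nmlee,QC,Reviewer\n"
system_csv = "User Type,User Name,Last Login\nAdministrator,mlee,2024-01-05\nAnalyst,jsmith,2024-01-03\n"

approved_perms = load_permissions(approved_csv, APPROVED_COLUMNS)
system_perms = load_permissions(system_csv, SYSTEM_COLUMNS)

# (account, permission) pairs that appear in only one of the two files.
differences = set(approved_perms.items()) ^ set(system_perms.items())
for account, permission in sorted(differences):
    print(account, permission)
```

Because the columns are looked up by name, extra columns and a different column order in either export do not affect the comparison.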

Stage IV

To further reduce the time required for manual preprocessing, the LLM was prompted with a defined role as a professional Python developer and provided with detailed, structured requirements. These included a four-step program with a clear logical architecture, comprehensive code explanations, saving intermediate data sheets at each step, and real-time execution feedback (such as the volume of processed data and number of user accounts). This approach fully standardized the workflow for guiding the LLM in generating the Python script.

  • By explicitly defining the exact data positions required within each file, the LLM was guided to extract valid structured data from the exported lists, while systematically filtering out irrelevant headers and footer information.
  • Core data fields, including user accounts and access permissions, were standardized and written to intermediate worksheets. Non‑value‑adding rows and columns—such as file title lines containing supplier identifiers—were removed to ensure clean and structured datasets for subsequent processing.
  • The two datasets were consolidated and cross‑validated using the approved permission application form as the authoritative reference for user access and permission control.
  • A structured logic was applied to match user accounts and verify privilege consistency, yielding standardized and interpretable verification outputs.
  • Accounts present in the approved application record but absent from the system are flagged as “Account Not Created.”
  • Accounts present in both datasets with mismatched access privileges are flagged as “Inconsistent Permission Configuration.”
  • Accounts present in the system but not documented in the approved application record are flagged as “Account Created.”
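The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script used in practice: it assumes CSV-style exports in which a blank line separates the data from footer text, it uses only the Python standard library, and every function name, column name, and sample record is hypothetical. The three flag texts come from the list above, while the "Consistent" label for matching accounts is an added assumption.

```python
import csv
import io

# Flag texts are taken from the post; everything else is hypothetical.
NOT_CREATED = "Account Not Created"
INCONSISTENT = "Inconsistent Permission Configuration"
UNDOCUMENTED = "Account Created"
CONSISTENT = "Consistent"  # assumed label for matching accounts


def extract_table(raw_text, header_marker):
    """Step 1: keep the header row and data rows, dropping title lines above
    the header and everything from the first blank line after it."""
    lines = raw_text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith(header_marker))
    table = []
    for line in lines[start:]:
        if not line.strip():
            break  # assumed: a blank line separates data from footer text
        table.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(table))))


def standardize(rows, account_col, permission_col, out_path=None):
    """Step 2: reduce rows to {account: permission}; optionally save them."""
    data = {row[account_col].strip(): row[permission_col].strip() for row in rows}
    if out_path is not None:
        with open(out_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["account", "permission"])
            writer.writerows(sorted(data.items()))
    print(f"standardized {len(data)} accounts from {len(rows)} rows")
    return data


def consolidate(approved, system):
    """Step 3: one entry per account with (approved, actual) permissions."""
    return {
        account: (approved.get(account), system.get(account))
        for account in sorted(approved.keys() | system.keys())
    }


def classify(merged):
    """Step 4: flag each account, treating the approved record as authoritative."""
    result = {}
    for account, (wanted, actual) in merged.items():
        if actual is None:
            result[account] = NOT_CREATED
        elif wanted is None:
            result[account] = UNDOCUMENTED
        elif wanted != actual:
            result[account] = INCONSISTENT
        else:
            result[account] = CONSISTENT
    return result


# Hypothetical exports with title lines and footer text around the data.
APPROVED_EXPORT = """Access Request Register
Generated by: account management system

Account,Approved Role
jsmith,Analyst
mlee,Reviewer
akhan,Analyst

End of report
"""

SYSTEM_EXPORT = """Vendor CDS User Report
User Name,User Type
jsmith,Analyst
mlee,Administrator
tchan,Analyst

Page 1 of 1
"""

approved = standardize(extract_table(APPROVED_EXPORT, "Account,"), "Account", "Approved Role")
system = standardize(extract_table(SYSTEM_EXPORT, "User Name,"), "User Name", "User Type")
report = classify(consolidate(approved, system))
for account, flag in report.items():
    print(f"{account}: {flag}")
```

Passing a file path as `out_path` writes the standardized data to an intermediate CSV file, which mirrors the idea of saving intermediate sheets for later debugging. The printed row and account counts give a simple form of execution feedback.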

After the generated code was transferred into Visual Studio, the script was executed and further refined. Iterative debugging, guided by the intermediate output worksheets and targeted correction of logical issues (e.g., incorrect column mapping and incomplete data extraction), ultimately produced a fully functional automated verification script.

Consequently, activities that previously required hours of manual effort can now be completed within approximately one minute through automated script execution, significantly improving the efficiency and accuracy of access reviews.

This solution was initially applied to individual systems on a case-by-case basis. In subsequent phases of routine work, the approach is intended to be further evaluated and iteratively expanded to automate privilege matrix verification across multiple laboratory platforms, such as user group and role configurations within Empower. Additionally, future experimentation will focus on a more intelligent automated solution that integrates directly with the native export modules of Empower and the account application system, enabling scheduled and unattended background verification.

The long-term objective is to achieve real-time data synchronization between laboratory system access privileges and approved request data, which will require further in-depth experimentation as well as underlying system integration. It is anticipated that this practical exploration may serve as a reference for practitioners seeking to apply AI and LLMs rationally in the digital era, reduce technological uncertainty, and support the responsible and innovative use of intelligent tools in compliance governance.


Disclaimer

iSpeak blog posts provide an opportunity for the dissemination of ideas and opinions on topics impacting the pharmaceutical industry. Ideas and opinions expressed in iSpeak blog posts are those of the author(s) and publication thereof does not imply endorsement by ISPE.
