iSpeak Blog

Data Integrity and Large-Scale Systems

Mark E. Newton

This blog post discusses the relationship between large systems that manage records at the multi-batch, multi-site level of operation and the data integrity of those records. Several aspects of that relationship are specific to this type of environment:

  1. Issues common to small, standalone systems (no regular backup, limited security, the ability to use the operating system to delete or move data files) no longer apply. These enterprise-capable systems have more extensive security controls and network capabilities. In addition, roles are usually customized to permit or disallow specific activities, and individual actions can be added to or removed from a role, a capability that may not exist in smaller applications.
  2. These systems are often not the source system for data. Instead, they hold summary data that has been transferred from elsewhere. Proper data review therefore cannot be done here; it requires returning to the original system.
  3. These systems require administrator-level access to delete or modify data outside the normal transactions of the application. This is a good thing, because it can be used to enforce change management procedures for data changes. Such changes are high-risk from an integrity perspective, and they require thoughtful, strict control in both design and execution.
  4. These systems often manage records through statuses (states), which provide lifecycle management of those records.
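To make points 1 and 3 concrete, here is a minimal Python sketch of a role to which individual actions can be added or removed, plus a gate that allows a direct data change outside normal transactions only for an administrator-type role and only when a change-control reference is supplied. The `Role` class, the action names (`admin_modify`, `enter_results`, `review`, `release`) and the `CC-001` change-control ID are all hypothetical; real enterprise systems implement this through their own configuration, not through code like this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Role:
    """A role modeled as a named set of permitted actions (hypothetical)."""
    name: str
    actions: set[str] = field(default_factory=set)

    def grant(self, action: str) -> None:
        # Add an individual action to the role.
        self.actions.add(action)

    def revoke(self, action: str) -> None:
        # Remove an individual action; no error if it was never granted.
        self.actions.discard(action)

    def permits(self, action: str) -> bool:
        return action in self.actions


def direct_data_change(role: Role, change_control_id: str | None) -> bool:
    """Allow a data change outside normal application transactions only if the
    role holds the hypothetical 'admin_modify' action AND a change-control
    reference is supplied."""
    return role.permits("admin_modify") and bool(change_control_id)


analyst = Role("QC Analyst", {"enter_results"})
admin = Role("System Administrator", {"admin_modify"})
print(direct_data_change(analyst, "CC-001"))  # False: role lacks the action
print(direct_data_change(admin, None))        # False: no change control
print(direct_data_change(admin, "CC-001"))    # True

reviewer = Role("QC Reviewer", {"review"})
reviewer.grant("release")    # add an action to the role...
reviewer.revoke("release")   # ...and remove it again
print(reviewer.permits("release"))  # False
```

Requiring both conditions mirrors the idea that administrator access alone should not be enough; the change must also be traceable to a controlled procedure.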

A Couple of Relevant Data Integrity Truths

Some truths about data integrity are learned by experience. First, data integrity, when lost, is most often lost at the point of data collection (the original record). If the audit trail was not enabled, you cannot enable it later and recover the original user and date/time. If the clock is wrong, you cannot correct the date/time a week later (you can try, but it carries so many downside risks that a wise organization will not). To get quality data, the collecting system must be working correctly at the moment of collection. There is no going back in time to fix the error; you can document it, but not fix it. That leads to a second truth learned by experience: data integrity, once lost, cannot be restored. It can only be mitigated with other data.

Why do these truths matter? Large-scale systems mostly store records that are transferred to them. They have superior controls (checksums, encryption and audit trails), so data cannot be changed or deleted inside them without a means of detecting it. But if the original system allows people to modify or delete results, then the data transferred into the large-scale system already has poor integrity when it is received. Overall, data integrity is only as good as its weakest link, and that link is typically the source system. This is why auditors and inspectors focus on the source systems where original data is created.
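The weakest-link argument can be illustrated with a small sketch. Assuming the large-scale system stores a SHA-256 checksum of each record when it is received (the record fields and the canonical-JSON hashing are illustrative assumptions, not any particular product's method), a later change inside that system is detectable, but an edit made in the source system before transfer is not, because the checksum is computed over the already-altered record.

```python
import hashlib
import json


def record_checksum(record: dict) -> str:
    # Serialize with sorted keys so identical content always hashes identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical result as originally measured in the source system.
original = {"test_id": "T-100", "result": 98.7, "analyst": "jdoe"}

# The source system allowed an edit BEFORE transfer.
received = dict(original, result=101.2)

# The large-scale system records a checksum of what it received.
stored_checksum = record_checksum(received)

# A later change inside the large-scale system is detected...
tampered = dict(received, result=99.0)
print(record_checksum(tampered) == stored_checksum)  # False: change detected

# ...but the pre-transfer edit is invisible: the received record verifies.
print(record_checksum(received) == stored_checksum)  # True
```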

Hiding Data In Plain Sight

One unique feature of large-scale systems is the ability to hide data in plain sight. This is accomplished using statuses. Here is an example:

Business Action and Associated Sample Statuses for a Fictional LIMS System

Action | Status
--- | ---
Test record created | Created
Sample pulled and labelled | In Transit
Sample received by QC Lab | In Lab
QC Analyst starts entering data | In Process
QC Analyst marks testing complete | Complete
QC Reviewer performs review | Reviewed
QC Supervisor reviews and releases test | Released
Submitter decides to remove test | Withdrawn
QC Reviewer or Supervisor has an issue they need to resolve before testing can continue | Suspended

The above is one example to illustrate how a LIMS might operate; each system has its own statuses for the lifecycle of records. So how are records hidden? In this example, the terminal (final) status for a test is either “Released” or “Withdrawn.” Every other status means the test is still in the middle of the process. By default, test result reports show only tests in Released status. But what if someone started a test, stopped it and requested a new one? The first record is still out there (unless the submitter withdraws it). So as a data integrity auditor, look for those records in the middle. Two pieces of information are needed to do this: (a) a list of all test statuses in the system (which should be in a user manual or a standard operating procedure document, also known as an SOP); and (b) a report of tests in one of the intermediate statuses (pick one). This uncovers tests in the middle of the process. If the report also shows the date the status was applied to each record, it becomes possible to identify tests that were not taken to conclusion (a terminal status) in a timely manner.
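The search for records stuck in the middle can be expressed as a simple filter. This is a minimal sketch assuming the report gives each test's current status and the date that status was applied; the `TestRecord` fields, the sample data and the 30-day `max_days` threshold are hypothetical, since what counts as “timely” is for the site's own procedures to define.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Terminal statuses in the fictional LIMS example; all others are intermediate.
TERMINAL = {"Released", "Withdrawn"}


@dataclass
class TestRecord:
    test_id: str
    status: str
    status_date: date  # date the current status was applied (assumed in report)


def stalled_tests(records: list[TestRecord], status: str, as_of: date,
                  max_days: int = 30) -> list[TestRecord]:
    """Return records in one intermediate status for more than max_days.

    max_days is a hypothetical threshold; "timely" is defined by site SOPs.
    """
    if status in TERMINAL:
        raise ValueError(f"{status!r} is terminal; pick an intermediate status")
    return [r for r in records
            if r.status == status and (as_of - r.status_date).days > max_days]


records = [
    TestRecord("T-001", "Released", date(2024, 1, 10)),
    TestRecord("T-002", "In Process", date(2024, 1, 5)),
    TestRecord("T-003", "In Process", date(2024, 3, 1)),
]
stalled = stalled_tests(records, "In Process", as_of=date(2024, 3, 15))
print([r.test_id for r in stalled])  # ['T-002']
```

Running the same check once per intermediate status covers the whole middle of the lifecycle.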

If the company declares in an SOP that it does not use a specific status in the system, ask for a report of records in that “we never use it” status. You may be surprised how often records turn up.
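Checking for records in a “we never use it” status amounts to comparing the SOP's declared list against the statuses actually present. A minimal sketch, assuming the report can be exported as (test ID, status) pairs; the function name, the sample records and the choice of “Suspended” as the declared-unused status are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def records_in_unused_statuses(records: list[tuple[str, str]],
                               declared_unused: set[str]) -> dict[str, list[str]]:
    """Group test IDs by status, keeping only statuses the SOP says are unused."""
    found: dict[str, list[str]] = defaultdict(list)
    for test_id, status in records:
        if status in declared_unused:
            found[status].append(test_id)
    return dict(found)


# Hypothetical export of (test ID, current status) pairs.
records = [
    ("T-001", "Released"),
    ("T-002", "Suspended"),
    ("T-003", "Suspended"),
    ("T-004", "In Process"),
]
found = records_in_unused_statuses(records, declared_unused={"Suspended"})
print(found)  # {'Suspended': ['T-002', 'T-003']}
```

An empty result supports the SOP's claim; any non-empty result is a finding worth following up.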

One final idea: ask for a count of records in each status. The counts can show which status to investigate in detail. For example, if a count reveals 100 suspended records, a deeper look at the specific records, the length of time they have spent in that status and the reason(s) is warranted.
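Counting records per status and flagging the intermediate statuses with unusually large counts might look like the sketch below. The review `threshold` and the sample counts are hypothetical, chosen to mirror the 100-suspended-records example; in practice the threshold reflects the auditor's judgment about what is unusual for that system.

```python
from __future__ import annotations

from collections import Counter

# Terminal statuses in the fictional LIMS example.
TERMINAL = {"Released", "Withdrawn"}


def status_counts(statuses: list[str]) -> Counter:
    """Count records per status (one list entry per record)."""
    return Counter(statuses)


def statuses_to_investigate(counts: Counter, threshold: int) -> list[str]:
    """Intermediate statuses with at least `threshold` records, largest first."""
    return [s for s, n in counts.most_common()
            if s not in TERMINAL and n >= threshold]


# Hypothetical current statuses for 620 test records.
statuses = (["Released"] * 500 + ["Suspended"] * 100
            + ["In Process"] * 12 + ["Withdrawn"] * 8)
counts = status_counts(statuses)
print(counts["Suspended"])                            # 100
print(statuses_to_investigate(counts, threshold=50))  # ['Suspended']
```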

Without understanding how statuses map the lifecycle of data, you might accept a report that provides data only for the most common outcome (“Released” in the above example) rather than looking for the unusual statuses where issues might hide, such as “In Process” or “Suspended.”

Conclusion

In summary, statuses are a powerful tool when inspecting large electronic systems. The key is understanding how to use them to see the workflow and to decide where additional information might provide useful insight into business behavior.

Disclaimer:

iSpeak Blog posts provide an opportunity for the dissemination of ideas and opinions on topics impacting the pharmaceutical industry. Ideas and opinions expressed in iSpeak Blog posts are those of the author(s) and publication thereof does not imply endorsement by ISPE.
