Data Quality and Data Integrity: What is the Difference?
People often struggle to separate data integrity from data quality. A search of the internet only reinforces the confusion, so let’s compare and contrast the terms.
Data Integrity-Defined: Wikipedia defines data integrity as “maintaining and assuring the accuracy and consistency of data over its entire life-cycle” [1]. The FDA introduced the acronym “ALCOA” [2] to describe the attributes of integrity; the term “ALCOA+” [3] adds four more attributes.
The GAMP Data Integrity SIG uses ALCOA+ to guide its activities.
Data Quality-Defined: GAMP does not define data quality. AHIMA [4] says, “Data quality ensures clear understanding of the meaning, context, and intent of the data.” AHIMA also lists the components of data quality as: Accessibility, Accuracy, Consistency, Comprehensiveness, Currency, Granularity, Definition, Relevancy, and Timeliness.
Data Integrity in Practice: Look at ALCOA+. Its attributes describe the desired state for original (raw) data collection and storage on a permanent medium. ALCOA+ assures that collected data is genuine and has an audit trail to expose any data changes to a reviewer. ALCOA+ permits people to return to that original data in the future and verify that correct decisions were made. Data integrity proves that the original value is trustworthy.
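The audit-trail idea described above can be illustrated with a minimal Python sketch. It assumes a single numeric observation; the class and field names (`AuditedValue`, `AuditEntry`, `recorded_by`, and so on) are hypothetical and do not come from ALCOA+ or any GAMP guidance. The point is only that the original value is never overwritten and every change records who made it, when, and why.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class AuditEntry:
    """One immutable record in the trail (field names are illustrative)."""
    recorded_at: datetime  # when the value was recorded
    recorded_by: str       # who recorded it
    value: float           # the value as recorded at that moment
    reason: str            # why the value was entered or changed


class AuditedValue:
    """Holds an original observation plus every later amendment."""

    def __init__(self, value: float, user: str) -> None:
        first = AuditEntry(datetime.now(timezone.utc), user, value, "original entry")
        self._history: List[AuditEntry] = [first]

    def amend(self, value: float, user: str, reason: str) -> None:
        """Append a change; earlier entries are never modified or removed."""
        if not reason.strip():
            raise ValueError("a change must state its reason")
        self._history.append(
            AuditEntry(datetime.now(timezone.utc), user, value, reason)
        )

    @property
    def original(self) -> AuditEntry:
        return self._history[0]

    @property
    def current(self) -> AuditEntry:
        return self._history[-1]

    def trail(self) -> Tuple[AuditEntry, ...]:
        """Read-only copy of the full history for a reviewer."""
        return tuple(self._history)
```

A reviewer could call `trail()` to see every change and `original` to return to the first recorded value. A production system would also need to protect the stored history itself, which this sketch does not attempt.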
Data Quality in Practice: A database (or physical vault) of carefully collected and preserved, trustworthy data is useless if users cannot access and organize that information to make business decisions. This is the domain of data quality. Data quality establishes the standards for entering metadata that will be joined to the original data to form records, tables, and views. Calculations, derivations, trends, and other transformations permit evaluation of a batch of drug product, then the manufacturing process, then the manufacturing site, then the whole manufacturing operation. Large-scale decisions go beyond any single observation, but they are invalid if systematic errors are present in the original observations (e.g., a testing instrument is out of calibration). Wide-ranging decisions about global operations depend on two fundamentals: (1) the underlying data must be trustworthy, and (2) all relevant data must be in the dataset. Failure of either could result in invalid conclusions and improper decisions.
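The two fundamentals above can be expressed as a simple pre-decision check. The following Python sketch is illustrative only: it assumes each record is a dictionary with hypothetical `site` and `instrument` keys, and that the lists of expected sites and out-of-calibration instruments are supplied from elsewhere.

```python
from typing import Dict, Iterable, List, Set

Record = Dict[str, object]


def missing_sites(records: Iterable[Record], expected_sites: Set[str]) -> Set[str]:
    """Return the expected sites that contributed no records (completeness)."""
    seen = {str(r["site"]) for r in records}
    return expected_sites - seen


def untrusted_records(records: Iterable[Record],
                      out_of_calibration: Set[str]) -> List[Record]:
    """Return records produced on instruments known to be out of calibration."""
    return [r for r in records if r["instrument"] in out_of_calibration]


def fit_for_decision(records: Iterable[Record],
                     expected_sites: Set[str],
                     out_of_calibration: Set[str]) -> bool:
    """True only if the dataset is complete and no record is suspect."""
    rows = list(records)  # materialize so the data can be scanned twice
    return (not missing_sites(rows, expected_sites)
            and not untrusted_records(rows, out_of_calibration))
```

For example, a dataset with results from only two of three expected sites would fail the check even if every individual value were trustworthy.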
Weaving Integrity and Quality: Both data integrity and data quality are the result of well-designed and executed organizational practices. While both data integrity and data quality are desirable, neither describes the whole set of regulations, principles, and activities that govern data and information throughout the data lifecycle. We might consider quality and integrity as similar, but in fact they are independent: it is possible to have integrity without quality, and quality without integrity.
To illustrate their independence, consider two cases:
Case 1: Data Integrity Without Quality
The laboratory has new equipment, great training, and an excellent quality culture. Independent audits give it glowing reports. It creates data with high integrity; you can trust its data. In contrast, each manufacturing site has its own electronic batch record system (EBRS). Each site has a different standard for describing materials, procedures, methods, and the like. No two sites describe their processes identically for the same product, even though all groups electronically submit data to the same lab. Quality would like to assess the manufacturing capability of product XY across the nine global sites where it is manufactured. Because of the site-centric data descriptions, IT has to create a different query for each site and then combine the results to provide the Quality unit with the data required for the assessment. The site-centric model made this project both more complex and more expensive.
Case 2: Data Quality Without Integrity
The manufacturer has a single, global EBRS installation with strict data management practices that ensure database attributes are defined only once. Validated reports are available for routine operations. However, each of the nine manufacturing sites uses a local contract laboratory to conduct in-process and release testing. These contract labs keep data on their local systems, entering only the final reportable value in the global EBRS using a secured network connection in the lab.
The manufacturer conducts a cursory review of SOPs and deviations at the contract firm every two years. If deviations occur at the contract firm, the firm is responsible for investigating and closing them, but it receives no payment for deviation activities; it is paid for the number of test results it provides to the manufacturer. This business arrangement gives the contract firm ample motivation to take shortcuts in practices, conduct superficial investigations, and release test results with inadequate review, while the manufacturer has few chances to detect the poor integrity of the underlying data. The manufacturer can create the necessary reports quickly and efficiently, but those reports could lead to incorrect conclusions about capability because the data in them cannot be trusted.
Quality and/or Integrity? Data integrity focuses on providing a value that users can trust. Data quality focuses on providing attributes around data values (context, metadata) so that values can be sorted, searched, and filtered efficiently, with confidence that the complete data set is included. For firms to survive, the choice is not quality or integrity: both are necessary.
Special thanks to Bob McDowall, Principal of McDowall Consulting, for reviewing this article.
- 1. http://en.wikipedia.org/wiki/Data_integrity
- 2. “ALCOA” is an acronym for the terms Attributable, Legible, Contemporaneous, Original, Accurate. See S.W. Woollen, Data Quality and the Origin of ALCOA, in Newsletter of the Southern Regional Chapter Society of Quality Assurance, Summer 2010
- 3. ALCOA+ adds Complete, Consistent, Enduring, Available. See GCP Inspectors Working Group (GCP IWG) - Reflection paper on expectations for electronic source data and data transcribed to electronic data collection tools in clinical trials EMA/INS/GCP/454280/2010, 09 June 2010
- 4. American Health Information Management Association (AHIMA). http://www.ahima.org