Data Quality Management of Real-World Data and Evidence
This article presents a model for promoting data quality of real-world data and evidence (RWD/RWE) and GxP processes with references to ISO 8000 for data quality management, making it a common language between regulated and nonregulated organizations.
Ensuring data quality across different sources and systems is a major challenge when using RWD/RWE. Several definitions of data quality have been provided by authorities and international standards. Many of these definitions involve the notion of “fit for purpose,” which provides common ground on which they can be compared or equated.
The ISO 8000 series specifies an approach that addresses data quality in line with the requirements of the ISO 9000 series, which is a well-known international standard for quality management systems (QMSs). The ISPE GAMP® 5 Guide, 2nd edition involves a “life cycle approach within a QMS” and also refers to the ISO 9000 series. Because both the international standard and the guide are based on the QMS of ISO 9000, ISO 8000 is compatible with both regulated and nonregulated organizations.
We developed a conceptual model that references ISO 8000 for data quality management, making it a common language between regulated and nonregulated organizations to manage data quality in both RWD/RWE sources and GxP processes.
Background
In recent years, the potential use of RWD and/or RWE in medical product development to support regulatory decision-making has been discussed among medical product developers and regulators. This began with the 21st Century Cures Act,1 which has been enforced by the US Food and Drug Administration (FDA) since 2016 to accelerate the development of new drugs and medicinal products using cutting-edge technology. The law provides a framework for the use of RWE for regulatory purposes. Following the framework, regulatory authorities have published guidance on the use of RWD/RWE for regulatory decision-making in Japan, the United States, and the European Union.2, 3, 4, 5, 6 Several clinical studies have been conducted with RWD.7, 8
Today, for example, many RWD databases are available in Japan9 and are established by providers such as private companies, academic societies, and health authorities. The data sources are electronic medical records from medical institutions and health insurance claims information from insurance companies. However, using the data from the electronic medical records and health insurance claims for clinical trials is not the original intended purpose for the source data. This means that the RWD databases do not always fit the sponsors’ intended purposes and can introduce a gap in the expected data quality. Therefore, researchers conducting clinical studies using RWD must evaluate the quality of the dataset.10
According to Good Clinical Practice (GCP), a trial sponsor is responsible for validating the electronic trial data system when handling the trial data with the system.11 Computerized system validation (CSV) and monitoring plans tailored to data integrity risks are GCP requirements. However, many RWD sources do not fall under GxP regulations, and ensuring data quality is a significant challenge when using RWD.12 Furthermore, data integrity and data quality are distinct concepts.13, 14, 15 In addition, the Medicines and Healthcare products Regulatory Agency (MHRA)16 and Organisation for Economic Co-operation and Development (OECD)17 state that the controls required for data integrity do not guarantee data quality.
To respond to this challenge, we wanted to clarify how to ensure the quality of RWD where providers do not rely on the CSV framework or data integrity controls when collecting RWD. We also aimed to develop reliable and trustworthy RWE from sources of RWD that may not have been validated according to standards accepted in the pharmaceutical industry.
Methods
Investigation of Regulatory Rules
We investigated the regulatory requirements related to RWD/RWE, focusing on CSV and data quality in Japan, the United States, and the European Union, using the websites of the Pharmaceuticals and Medical Devices Agency (PMDA),18 US FDA,19 and European Medicines Agency (EMA).20 We reviewed the relevant laws and guidelines focusing on the following points: definition of data quality and/or integrity, quality control and assurance of RWD, RWD/RWE acceptance for regulatory application, requirements for CSV, and so forth.
Investigation of Industry Standards Regarding Data Quality
First, we investigated the definitions of data quality in GxP regulations and guidelines and in international standards. GxP regulations and guidelines include the MHRA,16 OECD,17 and EMA.21 The international standard used was ISO 8000-2.22 Second, we reviewed the ISPE GAMP® 5: A Risk-Based Approach to Compliant GxP Computerized Systems (Second Edition),23 ISPE GAMP® Guide: Records and Data Integrity,24 ISPE GAMP® Good Practice Guide: Validation and Compliance of Computerized GCP Systems and Data – Good eClinical Practice (Second Edition),12 and standards issued by the International Organization for Standardization (ISO) on data quality (ISO 8000-1, ISO 8000-2, ISO 8000-8, ISO 8000-61, ISO/TS 8000-65)22, 25, 26, 27, 28 and quality management systems (ISO 9001).29
Conceptual Model for Ensuring RWD/RWE Quality
Based on these reviews, we designed a conceptual model that can ensure the quality of RWD/RWE by considering the following:
- The ISPE GAMP® 5 Guide Second Edition and ISPE GAMP® RDI Guide cover the GxP process but not the source of RWD/RWE
- The ISPE GAMP® 5 Guide Second Edition follows a “life cycle approach within a QMS”23
- The ISO 9000 series is a well-known international standard of QMS
- The ISO 8000 series is an international standard for data quality based on the ISO 9000 series
Studies using RWD include registration trials, postmarketing surveillance, and clinical research. In this study, we focused on registration trials that use RWD.
Results
Regulatory Rules on CSV and Data Quality Related to RWD/RWE
CSV is mandatory when conducting clinical studies according to the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) E6 (R3) guideline. As stated in Section 3.16 Data and Records under Part 3.16.1 Data Handling, “the sponsor should ensure that data acquisition tools are fit for purpose and designed to capture the information required by the protocol. They should be validated and ready for use prior to their required use in the trial.”11 In the definition of “computerised systems validation” found in the glossary, CSV should be performed through a risk-based approach: “the approach to validation should be based on a risk assessment that takes into consideration the intended use of the system and the potential of the system to affect trial participant protection and the reliability of trial results.”11
Similarly, for the systems used in daily medical practice, such as electronic medical record systems, the “Guideline on Safety Management of Medical Information Systems” in Japan30 requires documenting the intended use of medical information systems. However, it does not explicitly require validating that the systems fulfill requirements for the intended use. In Japan, the Ministry of Health, Labour and Welfare (MHLW) has clarified data handling when using RWD, registries, or medical information databases. Data input into clinical report forms or electronic data capture (EDC) systems must be reviewed by a physician.4 It is also necessary to educate and train the related personnel. Access management for the system and data, as well as audit trails, are required. The CSV of the system for processing data used in regulatory applications is mandatory, although the electronic medical records, claims, and diagnosis procedure combination systems are beyond the scope of the CSV.
Table 1: Definitions of data quality in GxP regulations and guidelines and in international standards.

| Defined by | Definition |
|---|---|
| MHRA (ref. 16) | The assurance that data produced is exactly what was intended to be produced and fit for its intended purpose. This incorporates ALCOA. |
| OECD (ref. 17) | Data quality is the assurance that the data produced is generated according to applicable standards and fit for intended purpose. Data quality is assured by appropriate study design that accurately and scientifically addresses the experimental question and hypotheses being studied and by the availability of adequate resources. Data quality affects the value and overall acceptability of the data in regard to decision-making or onward use. |
| EMA (ref. 21) | Data quality is defined as “fitness for purpose for users’ needs in relation to health research, policy making, and regulation and that the data reflect the reality, which they aim to represent.” |
| ISO 8000-2 (ref. 22) | Data quality is the degree to which a set of inherent characteristics of data fulfills requirements. Data specification is a set of requirements covering the characteristics of data being fit for one or more particular purposes. (Note: Through these two definitions of ISO 8000-2, it can be said that data quality involves “fit for purpose” when requirements are fulfilled based on data specification.) |
| ISPE GAMP® 5 Guide Second Edition (ref. 23) | The guide does not define data quality. |
| ISPE GAMP® RDI Guide (ref. 24) | The guide does not define data quality but emphasizes the importance of a data governance framework that includes data quality management. |

Documents and materials submitted to the US FDA as regulatory applications that include RWD/RWE should comply with the Code of Federal Regulations, Title 21 Part 11 (Part 11).31 It is also stated in the Guidance for Industry:32
“Sponsors should ensure that the interoperability of EHR [electronic health record] and EDC systems (e.g., involving the automated electronic transmission of relevant EHR data to the EDC system) functions in the manner intended in a consistent and repeatable fashion and that the data are transmitted accurately, consistently, and completely. The sponsor’s quality management plan (e.g., standard operating procedures, software development life cycle model, change control procedures) should address the interoperability of the EHR and EDC system and the automated electronic transmission of EHR data elements to the EDC system. Sponsors should ensure that software updates to the sponsor’s EDC systems do not affect the integrity and security of EHR data transmitted to the sponsor’s EDC systems. In addition, as part of the quality management plan, FDA encourages sponsors to periodically check a subset of the extracted data for accuracy, consistency, and completeness with the EHR source data and make appropriate changes to the interoperable system when problems with the automated data transfer are identified.”
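The periodic check of a subset of extracted data described in the quoted guidance can be illustrated with a short sketch. This is a minimal illustration, not a method prescribed by the guidance: the function name `compare_sample`, the record layout (dictionaries keyed by a patient identifier), the field names, and the sampling approach are all assumptions made for the example.

```python
import random

def compare_sample(ehr_records, edc_records, fields, sample_size, seed=0):
    """Compare a random subset of EDC records against their EHR source records.

    Hypothetical helper: both inputs are dicts mapping a record ID to a dict
    of field values. Returns a list of (record_id, field, finding) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn for review
    ids = sorted(ehr_records)
    sample = sorted(rng.sample(ids, min(sample_size, len(ids))))
    findings = []
    for record_id in sample:
        source = ehr_records[record_id]
        target = edc_records.get(record_id)
        if target is None:
            # Completeness problem at record level
            findings.append((record_id, None, "record missing from EDC"))
            continue
        for field in fields:
            source_value = source.get(field)
            target_value = target.get(field)
            if source_value is not None and target_value is None:
                # Completeness problem at field level
                findings.append((record_id, field, "value missing from EDC"))
            elif source_value != target_value:
                # Accuracy/consistency problem
                findings.append((record_id, field, "value differs from EHR source"))
    return findings

# Illustrative data only (not taken from any real system)
ehr_source = {
    "P001": {"sex": "F", "hba1c": 6.8},
    "P002": {"sex": "M", "hba1c": 7.4},
    "P003": {"sex": "F", "hba1c": 5.9},
}
edc_extract = {
    "P001": {"sex": "F", "hba1c": 6.8},
    "P002": {"sex": "M", "hba1c": 4.7},  # digits transposed during transfer
}

for finding in compare_sample(ehr_source, edc_extract, ["sex", "hba1c"], sample_size=3):
    print(finding)
```

In practice, the sample size, the sampling frequency, and the handling of findings would be defined in the sponsor's quality management plan; the sketch only shows the comparison step.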
However, Part 11 does not apply to electronic medical records.33 Medical records follow medical regulations and do not fall under predicate rules for GxP systems. In contrast, the MHRA stated that system validation is necessary for EHR systems.34
Industry Standards for Data Quality
There are several definitions of data quality in both GxP regulations and international standards (see Table 1). All of them are comparable in terms of “fit for purpose,” which allows the definitions in the regulations and in the standard to be mapped to one another. The ISO 8000 series is an international standard for data quality built on the fundamental concepts and principles of the ISO 9000 series, which is a widely adopted QMS across industries.
The ISO 8000 series defines which data characteristics are relevant to data quality, specifies requirements applicable to those characteristics, and provides guidelines for improving data quality. The standard promotes the adoption of a process approach, and these processes follow the fundamental structure of the Plan‑Do‑Check‑Act (PDCA) cycle as adopted in ISO 9001.29
ISO 8000 uses the following three categories to measure data quality:26
- Syntactic quality, which is the degree to which data conforms to its specified syntax, i.e., requirements stated by the metadata
- Semantic quality, which is the degree to which data corresponds to what it represents
- Pragmatic quality, which is the degree to which data is found suitable and worthwhile for a particular purpose
Syntactic and semantic quality are measured through a verification process, whereas pragmatic quality is measured through a validation process, where verification and validation are defined by ISO 9000.
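The three categories can be made concrete with a small sketch. It is a simplified illustration under stated assumptions, not the measurement method of ISO 8000-8: the data specification `SPEC`, the field names, the use of a trusted reference record as a stand-in for "what the data represents," and the completeness threshold used for the pragmatic check are all hypothetical.

```python
import re

# Hypothetical data specification: field name -> required syntax
SPEC = {
    "patient_id": re.compile(r"P\d{3}"),
    "visit_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "hba1c_percent": re.compile(r"\d{1,2}\.\d"),
}

def syntactic_issues(record):
    """Verification: fields whose values do not conform to the specified syntax."""
    return [
        field for field, pattern in SPEC.items()
        if not pattern.fullmatch(str(record.get(field, "")))
    ]

def semantic_issues(record, reference):
    """Verification: fields whose values do not correspond to what they represent.

    Assumption: a trusted reference record (e.g., the source chart) stands in
    for the real-world fact being described.
    """
    return [field for field in SPEC if record.get(field) != reference.get(field)]

def pragmatic_ok(records, required_fields, min_complete_fraction):
    """Validation: is the dataset suitable for a stated purpose?

    Assumption: the purpose is expressed as a minimum fraction of records
    in which all required fields are populated.
    """
    if not records:
        return False
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    return complete / len(records) >= min_complete_fraction

# Illustrative records only
wrong_format = {"patient_id": "P001", "visit_date": "2024/03/08", "hba1c_percent": "6.8"}
wrong_value = {"patient_id": "P002", "visit_date": "2024-03-08", "hba1c_percent": "4.7"}
chart_value = {"patient_id": "P002", "visit_date": "2024-03-08", "hba1c_percent": "7.4"}

print(syntactic_issues(wrong_format))             # date does not match the specified syntax
print(syntactic_issues(wrong_value))              # well-formed, so no syntactic issue
print(semantic_issues(wrong_value, chart_value))  # well-formed but does not match the chart
```

The second record shows why the categories are kept separate: a value can pass every syntactic check and still misrepresent the underlying fact, and a dataset free of both kinds of issue can still be unsuitable for a particular study purpose.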
These three categories help break down data quality challenges that arise from a combination of factors. For example, errors in the format of the data interfaced between the source system of the RWD and GxP systems pose a problem in terms of syntactic quality, and selection of inappropriate data as a surrogate variable can be an issue in terms of meaning or relevance (a semantic issue). Either can result in a pragmatic quality issue.
In addition, the ISO 8000 series refers to the ISO/IEC 25000 series, known as SQuaRE (software product quality requirements and evaluation), which provides a practical view of data quality in the context of individual systems. This view complements the ISO 8000 series, which addresses broader considerations, including how organizations can maintain data quality across multiple systems and when data crosses organizational boundaries.22
Conceptual Model for Ensuring Quality of RWD/RWE
We designed a conceptual model to ensure the quality of RWD, as shown in Figure 1. Where GxP regulations are applied, the ISPE GAMP® 5 Guide Second Edition, the ISPE GAMP® RDI Guide, and applicable GAMP® Good Practice Guides are referenced by the regulated companies for their GxP compliance. Although these GAMP guides contribute to data quality management, attention must be paid to the fact that data integrity controls do not guarantee data quality, as mentioned by MHRA, OECD, and one of the GAMP® Good Practice Guides.12, 16, 17 In addition, if GxP regulations are not applied, the guides may not be referred to even though the data are used for RWD by regulated companies. To explicitly cover data quality aspects for both GxP-regulated and nonregulated data, ISO 8000 can be used to build a data QMS for both regulated companies and suppliers of RWD.

Discussion
The US FDA has established a framework for the RWE Program to evaluate the potential use of RWE to support approval of a new indication for an already approved drug.5 In addition, the US FDA has issued guidelines on the use of registry data in regulatory decision-making.35 These documents state that sponsors should consider whether the data is fit for use by assessing its relevance and reliability before using it in regulatory decision-making.
The EMA and the Heads of Medicines Agencies have issued guidelines to establish a data quality framework (DQF) that provides a standardized approach to the use of RWD for regulatory purposes.21 The DQF states that already collected data should be assessed to determine whether the data is fit for decision-making. The MHLW has issued guidelines for the use of registry data for regulatory applications.2 The guidelines relate to the appropriateness of using registry data. Sponsors should consider using registry data appropriately, as the use of an unsuitable registry may lead to incorrect interpretations or conclusions regarding the development of drugs, medical devices, or regenerative medicine products.
However, these regulatory documents do not mention CSV for RWD source systems. The US FDA’s guidance requires processes and procedures for registries to ensure data quality of the registries, but not CSV for them. The MHLW states in its administrative communication that if data is transferred electronically from the electronic medical record (EMR) to the EDC, the transfer program should be validated.4 To the best of our knowledge, only the MHRA requires CSV for EMR systems.
As seen previously, there are differences in regulations between regions and regulators: the MHRA requires CSV of the EHR system, whereas neither the FDA nor the MHLW does. The EMA says that the sponsor should determine during site selection whether electronic medical record systems deployed at investigators/institutions are fit for purpose.6 Therefore, the data quality and integrity of EHR/RWD data must be assessed before it is used in clinical trials because this data is recorded for purposes other than clinical trials.
Even where CSV and/or data integrity controls are not required, ensuring the quality of the RWD is essential for achieving the intended purpose. However, nonregulated organizations, including suppliers of RWD, may not refer to the GAMP guides such as the ISPE GAMP® 5 Guide Second Edition and ISPE GAMP® RDI Guide, which are applicable to regulated companies. Even if the guides are referenced, it should be noted that the controls for data integrity do not guarantee data quality.16, 17
We designed a conceptual model to ensure the quality of the RWD by referring to ISO 8000. Because ISPE GAMP® 5 Guide Second Edition has a “life cycle approach within a QMS”23 and ISO 9000 is widely adopted across industries, it is expected that ISO 8000 (which is based on ISO 9000) is compatible with both regulated and nonregulated organizations. In addition, it is useful for both types of organizations to communicate with each other using ISO 8000 as a common language when developing approaches to ensure data quality.
Generally, organizations apply these international standards to themselves at a level that is feasible for them. Therefore, even if an organization has applied ISO 8000 internally, that does not automatically mean the organization has a data QMS that is sufficient for other organizations. Also, there is no official certification scheme available for ISO 8000 so far, although certification to ISO 9001 is widely available.
However, these international standards help establish defined terminology and a general framework for data quality, which provides a way to assess external data sources for regulated companies. For example, the three categories used in ISO 8000 to measure data quality—syntactic, semantic, and pragmatic—provide an analytical view for data quality challenges. Another example is ISO/TS 8000-65,28 which provides a questionnaire that consists of 53 questions to evaluate an organization’s data quality management implementation.
This study has three limitations. First, it was based on currently available regulatory requirements, guidelines, and laws issued by health authorities in Japan, the United States, the European Union, and the United Kingdom. The findings should be revisited when new guidelines are issued. Although further investigation in other regions, such as other parts of Asia, may provide additional insights, the regulations in the jurisdictions reviewed are expected to provide sufficient information to generate a valuable understanding of the current state of RWD/RWE usage.
Second, we presented a conceptual model showing that ISO 8000 can be a common language for both regulated and nonregulated companies to ensure the data quality of RWD, but we did not present practical instructions for applying ISO 8000. We intend to design more pragmatic guidance on applying ISO 8000 as our next step. Third, our focus was on how to ensure the quality of data in RWD, and we did not investigate how to design clinical studies using RWD from a therapeutic-area-specific perspective. However, these conclusions can be generalized to many therapeutic areas.
Conclusion
EHR systems are typically not validated, and regulatory agencies typically do not expect EHR systems to be validated. EHR data is primarily recorded for purposes other than clinical trials. Therefore, the data quality and integrity of EHR and RWD/RWE data must be assessed before it is used in clinical trials.
ISO 8000, based on ISO 9000, is an industry standard that helps define terminology and provide a general framework for data quality. In general, it may be easier to use data from organizations that adhere to ISO 8000 because ISO 9000 has been considered in the creation of GAMP guidance, but a more in-depth assessment is required to ensure that the data is fit for the intended purpose.
Acknowledgments
We thank Mr. Takumi Fukusaki for helpful comments on information management. We thank the ISPE Japan Affiliate for supporting us in refining the English. This study was presented in part during the poster session at the 15th Annual Meeting of the Japan Society of Clinical Trials and Research on 8 March 2024, in Osaka, Japan, and at the GAMP Japan Seminar on 18 October 2024, in Tokyo, Japan.