Streamlining Real-World Data Access: FAIR Data Practices at Bayer
Discover how Bayer empowers scientists by providing seamless access to Real-World Data (RWD) that supports their research inquiries using FAIR data products:
· Datasets are registered and discoverable in a FAIR data catalog.
· Terms of use are represented with a community standard.

Overview
Bayer’s research teams that generate real-world evidence operate independently, which leads to decentralized RWD acquisition, scattered data repositories, and a lack of coordination. This fragmentation results in missed opportunities for data sharing and shared infrastructure, and it incurs unnecessary costs. As a result, colleagues spend a significant amount of time locating, understanding, and accessing RWD.
Additionally, the teams that purchase data sources are responsible for compliant use of the data. They must navigate complex license agreements that vary by dataset, which complicates both data reuse and compliance.
To resolve these issues, we applied FAIR data and data product principles to storing and cataloging RWD. Our RWD store provides essential services, processes, and infrastructure, detailed in the following sections.
Process
1) Register in a Data Catalog
New datasets are registered in a data catalog. The data catalog is built to provide Global, Unique, Permanent, Resolvable Identifiers (GUPRIs). We use COLID (https://bayer-group.github.io/COLID-Documentation/#/), an open-source data catalog developed at Bayer (see References and Resources).
2) Add metadata for discoverability
We enhance registered resources with metadata to improve discoverability. The metadata enhancement includes resource names, coding systems, dataset dimensions, therapeutic areas, and other relevant aspects, following a real-world data-specific ontology developed with our scientists.
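As a rough illustration of how such metadata supports discovery, the sketch below models a catalog entry and a simple filter over it. Everything in it is an assumption made for illustration: the `DatasetRecord` fields, the `find_datasets` helper, the `pid.example.org` identifiers, and the sample datasets are invented and do not reflect COLID's actual schema or Bayer's ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalog entry; field names are illustrative only."""
    pid: str                      # persistent identifier (placeholder URI)
    name: str                     # resource name
    coding_systems: frozenset     # e.g. diagnosis coding systems used
    therapeutic_areas: frozenset  # areas the dataset is relevant to
    patient_count: int            # one example of a dataset dimension


def find_datasets(records, *, therapeutic_area=None, coding_system=None):
    """Return the records that match every filter that was supplied."""
    matches = []
    for record in records:
        if therapeutic_area and therapeutic_area not in record.therapeutic_areas:
            continue
        if coding_system and coding_system not in record.coding_systems:
            continue
        matches.append(record)
    return matches


# Invented sample entries, not real datasets.
catalog = [
    DatasetRecord(
        pid="https://pid.example.org/rwd/claims-a",
        name="Example claims dataset A",
        coding_systems=frozenset({"ICD-10"}),
        therapeutic_areas=frozenset({"cardiology"}),
        patient_count=120_000,
    ),
    DatasetRecord(
        pid="https://pid.example.org/rwd/ehr-b",
        name="Example EHR dataset B",
        coding_systems=frozenset({"ICD-10", "SNOMED CT"}),
        therapeutic_areas=frozenset({"oncology", "cardiology"}),
        patient_count=45_000,
    ),
]

hits = find_datasets(catalog, therapeutic_area="oncology", coding_system="SNOMED CT")
print([record.name for record in hits])
```

In a real catalog the filter values would come from the ontology's controlled vocabulary rather than free-text strings, so that a search for one therapeutic area or coding system matches consistently across datasets.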
3) Annotate data policy terms
Additional metadata extracted from dataset license contracts describes permitted, prohibited, or required actions when working with the dataset. We utilize the Open Digital Rights Language (ODRL) along with an action taxonomy to provide a unified representation of diverse contracts.
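To make the idea concrete, here is a minimal sketch of one contract expressed as an ODRL policy in JSON-LD, with a small lookup on top. The policy content, the `pid.example.org` identifiers, the `ex:` actions, the `BROADER` taxonomy, and the `check` helper are all hypothetical; only the ODRL terms themselves (`permission`, `prohibition`, `obligation`, and the actions `derive`, `distribute`, `attribute`) come from the standard. The lookup is deliberately simplified: a prohibition always overrides a permission, and ODRL constraints and conflict strategies are not handled.

```python
import json

ODRL_CONTEXT = "http://www.w3.org/ns/odrl.jsonld"
DATASET = "https://pid.example.org/rwd/claims-a"  # placeholder identifier

# Policy terms as they might be extracted from a license contract (invented).
policy = {
    "@context": ODRL_CONTEXT,
    "@type": "Set",
    "uid": "https://pid.example.org/policy/claims-a",
    "permission": [{"target": DATASET, "action": "derive"}],
    "prohibition": [{"target": DATASET, "action": "distribute"}],
    "obligation": [{"target": DATASET, "action": "attribute"}],
}

# Hypothetical in-house action taxonomy: specific action -> broader action.
BROADER = {
    "ex:train-model": "derive",
    "ex:share-with-vendor": "distribute",
}


def with_broader(action):
    """Yield the action followed by each of its broader actions."""
    while action is not None:
        yield action
        action = BROADER.get(action)


def rule_actions(policy, rule_type, target):
    """Collect the actions of one rule type that apply to the target."""
    return {
        rule["action"]
        for rule in policy.get(rule_type, [])
        if rule.get("target") == target
    }


def check(policy, action, target):
    """Simplified lookup: a prohibition overrides a permission."""
    chain = set(with_broader(action))
    if chain & rule_actions(policy, "prohibition", target):
        return "prohibited"
    if chain & rule_actions(policy, "permission", target):
        return "permitted"
    return "not stated"


print(json.dumps(policy, indent=2))
for requested in ("ex:train-model", "ex:share-with-vendor", "archive"):
    print(requested, "->", check(policy, requested, DATASET))
```

Mapping contract-specific wording onto a shared action taxonomy is what lets very different contracts be queried the same way. An action the policy does not mention is reported as "not stated" rather than assumed to be allowed, which would be the point at which a data steward is consulted.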
4) Assign Data Stewards
Each resource is assigned a data steward who is responsible for managing authorizations and ensuring the accuracy of metadata.
Outcomes
This initiative enhances transparency regarding available datasets, leading to several key benefits:
- Increased return on investment through the repurposing of datasets for unforeseen secondary uses.
- Improved strategic planning for data investments as dataset transparency rises, allowing colleagues to collaborate.
- A professionalized dataset offering that encourages sharing due to clearly defined processes and established infrastructure.
Publishing relevant metadata enables scientists to easily find and access datasets pertinent to their research. Clear data policy descriptions spell out permissible actions, reducing the need for consultation with data stewards or legal teams and improving compliance with policies.
Overall, these advancements significantly shorten time-to-insight, the duration from research question to actionable evidence.
References and Resources
- COLID: https://bayer-group.github.io/COLID-Documentation/#/
- Corporate Linked Data – short: COLID – is a technical solution for corporate environments that provides a metadata repository for corporate assets based upon semantic models. It was developed by Bayer colleagues and made open source.
- ODRL: https://www.w3.org/TR/odrl-model/
- The Open Digital Rights Language (ODRL) is a policy expression language that provides a flexible and interoperable information model, vocabulary, and encoding mechanisms for representing statements about the usage of content and services.
At a Glance
Team
4 Data Engineers
Timeline
Set up: 6 months, now running in the fifth year
Benefits and Deliverables
- Enhanced transparency improves dataset visibility, adherence to policies, and strategic planning of dataset acquisitions.
- Higher return on investment through repurposing datasets for new use cases.
- A FAIR data catalog with assigned Data Stewards ensures accurate metadata management and effective governance.
Authors
- Alexandra Grebe de Barron
- Marius Michaelis
- Matthias Jurisch
Top Tips
- Be Pragmatic: Focus on practical solutions that offer real utility.
- Use Community Standards: Start by modeling examples with relevant established standards.
- Engage Users Early: Conduct user research with early prototypes throughout development.