Data Management Platform to Securely Consolidate Data across 15 Subsidiaries of a Biotech Company
About our Client
The Client is a biopharmaceutical company that combines natural science and cutting-edge technologies to develop alternative treatments for mental disorders. It leverages data-driven drug development and digital therapeutics to innovate and personalize mental healthcare globally.
Decentralized Scientific Data Was Holding Back Research
The Client’s company comprises 15 subsidiaries that operate across the globe. Since its start in 2018, the Client has accumulated a vast amount of scientific data, such as research findings, preclinical reports, trial protocols, and clinical trial reports. However, because the data was stored and managed in a decentralized way, it was hard for the subsidiaries’ clinical scientists and managers to leverage its full value.
Defining the Scope of a Data Consolidation Solution
Initially, the Client turned to ScienceSoft to build a solution that would consolidate the research data from globally dispersed sources into a data warehouse with analytics capabilities. Relying on 17 years of experience in data warehousing and first-hand expertise from 20+ healthcare data analytics projects, ScienceSoft’s team analyzed the Client’s goals and the overall maturity of its IT ecosystem. The experts concluded that most of the subsidiaries’ current systems could not be integrated with a DWH within the required deadline due to their technical and organizational isolation.
With that in mind, the Client decided to start by implementing a data management platform that would centralize the subsidiaries’ data and enable its easy navigation and presentation. Since all of the Client’s subsidiaries are legally independent entities, strictly defined user access rights were also highly important to the Client.
To meet the Client’s priorities and tight deadlines, ScienceSoft suggested dividing the project into two steps:
- Developing an MVP that would enable the most critical capabilities: centralized data storage, role-based access, and ML-powered data search.
- Investigating the subsidiaries’ systems and data to prepare the solution to be evolved into a full-scale DWH that would enable the ingestion of unstructured data types, ML-based analytics, and direct integrations with the subsidiaries’ data sources.
An AWS Data Management Platform as a Foundation for a Data Warehouse
ScienceSoft’s team delivered the MVP design in 6 weeks. The designed solution is a data management platform that stores each subsidiary’s data in a dedicated folder, enables keyword-based search across all folders, and allows data access with restrictions defined by the data owner.
Data ingestion
The proposed MVP can ingest three data formats: DOCX, PDF, and CSV. Users manually prepare and upload files to the cloud data storage (Amazon S3) via the Secure File Transfer Protocol (SFTP). Each of the 15 subsidiaries has a dedicated folder in the centralized repository. The storage also supports data versioning, enabling users to upload newer versions of files while still having access to their previous iterations.
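The ingestion rules above (three accepted formats, one dedicated folder per subsidiary) can be illustrated with a minimal Python sketch. The folder naming scheme and the function name build_object_key are assumptions made for illustration and are not taken from the delivered design.

```python
from pathlib import PurePosixPath

# The three formats the MVP is described as ingesting.
ALLOWED_EXTENSIONS = {".docx", ".pdf", ".csv"}


def build_object_key(subsidiary: str, filename: str) -> str:
    """Return a storage key that places the file in its subsidiary's folder.

    Hypothetical helper: the real folder layout is not specified in the source.
    """
    # Keep only the file name, dropping any client-side directories.
    name = PurePosixPath(filename).name
    ext = PurePosixPath(name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or 'none'}")

    # Assumed convention: lowercase folder name with hyphens instead of spaces.
    folder = subsidiary.strip().lower().replace(" ", "-")
    if not folder:
        raise ValueError("subsidiary name is required")
    return f"{folder}/{name}"
```

For example, build_object_key("Subsidiary 01", "reports/trial_protocol.PDF") yields "subsidiary-01/trial_protocol.PDF", while a .txt file is rejected before upload.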
Data search
Users can perform keyword-based searches with filters across all folders in the storage. The system runs an AWS Lambda function to find documents that contain the given keyword. Amazon QuickSight provides a dashboard table that features the keyword-containing files and links to them.
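As a rough sketch of the search step, the Python handler below follows the standard AWS Lambda handler signature and returns the keys of documents that contain a keyword, optionally filtered by folder. The in-memory DOCUMENTS mapping, the event fields, and the sample file names are all hypothetical stand-ins; how the designed system extracts and reads document text is not described in the source.

```python
# Hypothetical stand-in for text already extracted from stored files.
DOCUMENTS = {
    "subsidiary-01/protocol.pdf": "Phase II trial protocol for compound A",
    "subsidiary-02/findings.docx": "Preclinical findings on compound B",
    "subsidiary-02/report.csv": "site,patients\nBerlin,42",
}


def find_documents(documents, keyword, folder=None):
    """Return sorted keys of documents whose text contains the keyword."""
    needle = keyword.casefold()
    if not needle:
        return []
    hits = []
    for key, text in documents.items():
        # Optional filter: restrict the search to one subsidiary folder.
        if folder is not None and not key.startswith(folder + "/"):
            continue
        if needle in text.casefold():
            hits.append(key)
    return sorted(hits)


def lambda_handler(event, context):
    """Lambda-style entry point; the event shape here is an assumption."""
    matches = find_documents(DOCUMENTS, event["keyword"], event.get("folder"))
    return {"keyword": event["keyword"], "matches": matches}
```

The returned list of matching keys is the kind of result a dashboard table could display as links to the files.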
Security
Data access is restricted by row-level security policies defining which records are revealed to any user or user group. Each folder’s owner determines the access rights.
The stored data is encrypted at rest and in transit. Encryption is also applied to the administrative, system, and user action logs.
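The row-level access model described above can be pictured as a rule table that folder owners maintain, with each rule granting one user group visibility into one folder. The sketch below is a simplified Python illustration under that assumption; the AccessRule type, the group names, and the row layout are hypothetical and do not represent the platform's actual policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRule:
    """One owner-defined grant: a user group may see rows from a folder."""
    group: str
    folder: str


def visible_rows(rows, rules, user_groups):
    """Keep only the rows whose folder is granted to one of the user's groups."""
    allowed = {rule.folder for rule in rules if rule.group in user_groups}
    return [row for row in rows if row["folder"] in allowed]


# Hypothetical rules and search results, for illustration only.
RULES = [
    AccessRule(group="sub01-scientists", folder="subsidiary-01"),
    AccessRule(group="sub02-managers", folder="subsidiary-02"),
]
ROWS = [
    {"folder": "subsidiary-01", "file": "protocol.pdf"},
    {"folder": "subsidiary-02", "file": "findings.docx"},
]
```

With these rules, a user in the sub01-scientists group sees only the subsidiary-01 row, and a user with no matching group sees nothing, which reflects the default-deny behavior that separate legal entities would typically require.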
System availability and fault tolerance
The system has recovery and backup mechanisms to enable fast recovery from an unplanned event or primary data failure. The solution can recover from any single point of failure automatically.
Evolution capabilities
The data platform layers are designed with regionally distributed processing and encryption to support future GDPR and HIPAA compliance.
ScienceSoft also provided the Client with an architecture design to upgrade the data management platform to a data warehouse in the future. The proposed DWH architecture enables support for unstructured data (e.g., images), data processing (e.g., slicing and dicing or data segmentation), advanced data visualization techniques (e.g., three-dimensional graphs), and ML-powered analytics capabilities (e.g., forecasting, anomaly alerting).
Cross-Subsidiary Data Availability with Granular Security
Within six weeks, the Client received a comprehensive MVP design for a data management platform. Once implemented, it will allow scientists and managers from geographically dispersed subsidiary companies to get immediate access to valuable clinical and non-clinical data, which will streamline new drug development and improve the treatment of mental disorders.
ScienceSoft provided a roadmap to MVP implementation, complete with detailed guidelines on how to set up and manage the required tools and systems.
The delivered solution design enables data access control across 15 legally independent entities and is ready to be expanded with advanced data processing, visualization, and analytics capabilities.
Technologies and Tools
SFTP, Amazon S3, AWS Lambda, Amazon QuickSight