
Data Management Solution to Consolidate Data of a 50-Year-Long Heart Disease Study
About Our Client
The Client is a US university with a nearly 200-year history and a strong background in interdisciplinary research.
Need to Consolidate Longitudinal Study Data on 20K Participants
For 50 years, the Client has worked on a study that aims to identify dependencies between heart diseases and various factors, including lifestyle habits and accompanying conditions. The university has continuously monitored about 20,000 participants, regularly documenting different measurements and findings (e.g., blood pressure and EKG parameters, lifestyle changes). All the data was stored in disparate Excel and CSV files where multiple participant parameters (e.g., sex, age, weight, vitals) were documented according to different coding systems. The Client wanted to structure this data and build a BI solution that would allow researchers to quickly find the needed information, spot trends and dependencies, and generate reports automatically (e.g., for internal use or sharing with external organizations). Trusting ScienceSoft’s experience in healthcare and data analytics as well as its proven track record of delivering solutions for research universities, the Client turned to us to carry out the initiative.
Unifying Study Data Across 40,000 Documented Parameters
ScienceSoft assembled a team of a business analyst, two data engineers, and a project manager. We communicated with research team leads and statisticians to elicit the requirements for the future system. The Client wanted a centralized data collection, storage, and management solution accessible via Power BI. The system was to accumulate all current and historical research data, ensure data accuracy and consistency through detailed mapping, and enable multifaceted data analysis. The Client stressed that its employees had to be trained to use Power BI. It also requested that the system be built on Microsoft technologies, as the university’s IT infrastructure was already running on them.
After requirement engineering, we studied the provided research files. Millions of data entries across 40,000 parameters were stored in disparate files and coded according to different codebooks, so the same parameters (e.g., gender, blood pressure measurements) could have different codes in different files. Such data could not make up the basis for a comprehensive database that would power the target solution.
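The coding mismatch described above can be illustrated with a minimal Python sketch. It is not the project's actual mapping logic: the file names, column names, and code values below are all invented for illustration, and a real codebook for 40,000 parameters would be stored as data rather than hard-coded.

```python
# Hypothetical per-file codebooks. Each one maps raw column names to a
# unified parameter name and raw coded values to a unified value.
# All file names, columns, and codes here are invented for illustration.
CODEBOOKS = {
    "visit_1975.csv": {
        "columns": {"SEX": "sex", "SBP": "systolic_bp_mmhg"},
        "values": {"sex": {"1": "male", "2": "female"}},
    },
    "visit_1998.csv": {
        "columns": {"gender": "sex", "bp_sys": "systolic_bp_mmhg"},
        "values": {"sex": {"M": "male", "F": "female"}},
    },
}


def harmonize(source_file, row):
    """Translate one raw row into unified parameter names and values."""
    codebook = CODEBOOKS[source_file]
    unified = {}
    for raw_name, raw_value in row.items():
        name = codebook["columns"].get(raw_name)
        if name is None:
            continue  # column not covered by the codebook; skipped in this sketch
        # Values without a coded mapping (e.g., numeric readings) pass through.
        value_map = codebook["values"].get(name, {})
        unified[name] = value_map.get(raw_value, raw_value)
    return unified


old_row = harmonize("visit_1975.csv", {"SEX": "2", "SBP": "128"})
new_row = harmonize("visit_1998.csv", {"gender": "F", "bp_sys": "131"})
print(old_row)
print(new_row)
```

Both rows end up with the same parameter names and the same coding for sex, which is the property a unified database depends on.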
Taking into account the volume of data to be standardized and the associated data mapping efforts, the Client and ScienceSoft decided to start by creating a database that would store all the research data in a unified, highly structured way suitable for analytics and reporting. ScienceSoft was also to configure Microsoft Power BI web and desktop apps and provide the Client with training and user manuals on how to work with the solution. It was agreed that in the future, ScienceSoft would be tasked with delivering a more advanced system with automated loading, cleaning, and standardization of new data, as well as custom analytics features.
ScienceSoft studied the provided research files and the corresponding codebooks, created unified codes for all parameters, and mapped out the database structure. Next, we loaded the standardized data into the database (Microsoft SQL Server + Azure Data Factory) behind the Power BI web and desktop apps and configured native role-based access control mechanisms.
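As a rough illustration of what storing data "in a unified, highly structured way" can look like, the sketch below keeps one table of unified parameter codes and one table of measurements that reference them. The schema, table names, and sample rows are assumptions made for this example, not the Client's actual database design, and Python's built-in sqlite3 module stands in for Microsoft SQL Server so that the example is self-contained.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real SQL Server instance.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per unified parameter, one row per measurement.
conn.executescript("""
CREATE TABLE parameter (
    code       TEXT PRIMARY KEY,
    definition TEXT NOT NULL
);
CREATE TABLE measurement (
    participant_id INTEGER NOT NULL,
    parameter_code TEXT NOT NULL REFERENCES parameter(code),
    visit_date     TEXT NOT NULL,
    value          TEXT NOT NULL
);
""")

# Invented sample data for a single participant.
conn.executemany(
    "INSERT INTO parameter VALUES (?, ?)",
    [
        ("sex", "Participant sex"),
        ("systolic_bp_mmhg", "Systolic blood pressure, mmHg"),
    ],
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        (1, "sex", "1975-03-02", "female"),
        (1, "systolic_bp_mmhg", "1975-03-02", "128"),
        (1, "systolic_bp_mmhg", "1998-06-11", "131"),
    ],
)

# Because every file now uses the same parameter code, one query covers
# readings that originally lived in differently coded source files.
rows = conn.execute(
    """
    SELECT participant_id, visit_date, CAST(value AS INTEGER)
    FROM measurement
    WHERE parameter_code = 'systolic_bp_mmhg'
    ORDER BY visit_date
    """
).fetchall()
print(rows)
```

Keeping parameter definitions in their own table is also what would let a definition or description be edited in one place without touching the stored measurements.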
ScienceSoft also created an exhaustive user manual that, among other things, included instructions on granting access to new users and working with Power Apps. With ample training materials, non-IT users can easily navigate the solution and apply changes; for example, they can independently edit the mapped parameters (e.g., definition, description, or code).
Standardized Data Enabling Streamlined Data Search and Reporting
With ScienceSoft’s assistance, the Client received a Power BI data management solution that consolidates research data gathered over 50 years. We standardized the coding for 40,000 parameters and created a database that stores millions of data entries (e.g., demographic parameters, lifestyle habits, vital measurements) in a highly structured way. The Client also received an exhaustive Power BI manual, including instructions on self-service updates of parameters in the database. The solution allowed the Client to explore data in search of insights, reduce the time spent on manual data search and report creation, and enhance collaboration among university employees thanks to centralized access to unified data. As of April 2025, the Client plans to continue cooperating with ScienceSoft to upgrade the solution to a more comprehensive data management system with automated data upload, built-in cleaning mechanisms, and custom analytics capabilities.
Technologies and Tools
Microsoft Power BI, Microsoft Power Apps, Microsoft SQL Server, Azure Data Factory, Azure Key Vault.