Data Analytics Solution to Explore Equipment Testing Data across 300 Parameters
About Our Client
The Client is a leading producer of fuel cell stacks for commercial and specialized vehicles, power units, and vessels.
Manual Data Analysis Was Inefficient and Time-Consuming
The Client has hundreds of testing equipment units that validate the performance and safety of fuel cell stacks across nearly 300 parameters. Prior to the project, the testing results were stored in different formats and locations and analyzed manually. This approach was inefficient and time-consuming, so the Client decided to build a reliable analytics solution that would enable centralized storage, visualization, and analysis of the testing results.
Defining Database Structure and Ensuring Accurate Data Transformation
ScienceSoft assigned a solution architect and two data engineers to the project. The team started with an audit of the testing results files (CSV tables) to understand the data specifics and define the structure of the future consolidated database. Based on the results, the team defined the following database structure: test metadata, test results, and calculated metrics (e.g., averages and standard deviations).
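For illustration, such a structure could be sketched as three related tables. Below is a minimal Python/SQLAlchemy sketch; the table and column names are assumptions for demonstration, not the schema actually delivered to the Client.

```python
from sqlalchemy import (
    MetaData, Table, Column, Integer, String, Float, DateTime, ForeignKey
)

metadata = MetaData()

# Test metadata: one row per test run (names and columns are illustrative).
tests = Table(
    "tests", metadata,
    Column("test_id", Integer, primary_key=True),
    Column("equipment_unit", String(64)),
    Column("test_type", String(64)),
    Column("started_at", DateTime),
)

# Raw readings captured during a test, keyed to the parent test.
test_results = Table(
    "test_results", metadata,
    Column("result_id", Integer, primary_key=True),
    Column("test_id", Integer, ForeignKey("tests.test_id")),
    Column("recorded_at", DateTime),
    Column("parameter", String(128)),
    Column("value", Float),
)

# Derived metrics such as averages and standard deviations per parameter.
calculated_metrics = Table(
    "calculated_metrics", metadata,
    Column("metric_id", Integer, primary_key=True),
    Column("test_id", Integer, ForeignKey("tests.test_id")),
    Column("parameter", String(128)),
    Column("avg_value", Float),
    Column("stddev_value", Float),
)
```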
The raw CSV files include test-specific parameters and vary in structure (e.g., in the number of columns and the number of rows in the column header), which could become an obstacle to accurate data consolidation in the database. It was crucial to prevent data integration errors; at the same time, ScienceSoft’s engineers wanted to avoid describing each possible CSV table structure and its transformation rules. Done manually, this would delay the solution delivery and create additional effort for the Client in the future, whenever new tests were added and had to be described the same way.
To solve this problem, ScienceSoft’s data engineers created several setup file formats that reflect the most frequent structures of the source CSV tables. The team programmed the analytics solution to compare each processed file against the setup formats and distribute the data based on the comparison results. This approach ensures correct data distribution even when a source table has a nonstandard structure.
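The matching logic itself is not described in detail here, but the idea can be sketched in Python roughly as follows. The setup format names, header-row counts, and required columns below are hypothetical examples, not the Client’s actual formats.

```python
import csv
from typing import Optional

# Hypothetical setup formats: each describes one known CSV layout by the
# number of header rows and the columns expected in the last header row.
SETUP_FORMATS = {
    "single_header": {"header_rows": 1,
                      "required_columns": {"timestamp", "cell_voltage"}},
    "double_header": {"header_rows": 2,
                      "required_columns": {"timestamp", "anode_pressure"}},
}


def detect_format(path: str) -> Optional[str]:
    """Return the first setup format whose required columns all appear in
    the file's header, or None if the structure is nonstandard."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    for name, fmt in SETUP_FORMATS.items():
        header_index = fmt["header_rows"] - 1
        if header_index >= len(rows):
            continue
        columns = {c.strip().lower() for c in rows[header_index]}
        if fmt["required_columns"] <= columns:
            return name
    return None
```

Once a format is detected, its transformation rules tell the solution which columns map to which database fields; files that match no known format can be flagged for review instead of being loaded incorrectly.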
Centralized Data Storage and Processing Mechanisms
ScienceSoft’s team designed and implemented an Azure-based solution that supports the following data management steps:
Raw data ingestion and storage
Test CSV files from the testing systems are ingested into the raw data storage (Azure Blob Storage). A dedicated trigger adds a message about each new file to the message queue.
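One common way to implement such a trigger is an Azure Functions blob trigger paired with a queue output binding. The sketch below uses the Azure Functions Python programming model; the container, queue, and connection setting names are placeholders, not the Client’s actual configuration.

```python
import json

import azure.functions as func

app = func.FunctionApp()


# Fires whenever a new CSV lands in the raw storage container and places a
# message describing it into the processing queue (names are hypothetical).
@app.blob_trigger(arg_name="blob", path="raw-test-results/{name}",
                  connection="StorageConnection")
@app.queue_output(arg_name="msg", queue_name="new-test-files",
                  connection="StorageConnection")
def register_new_file(blob: func.InputStream, msg: func.Out[str]) -> None:
    msg.set(json.dumps({"blob_name": blob.name, "size_bytes": blob.length}))
```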
Data processing and uploading to the database
The database management system (Microsoft SQL Server) checks the raw storage for new files every five minutes. Once a new file is detected, it is uploaded to the database (Azure SQL Database) through an ETL (extract-transform-load) job and removed from the queue. An Azure Functions app compares the incoming table structure to the setup formats and distributes the data across the database.
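A schematic sketch of the transform-and-load step is given below. In the delivered solution this logic is split between SQL Server jobs and Azure Functions; the Python example only illustrates the general reshaping and insertion pattern, and the connection string, table, and column names are assumptions.

```python
import pandas as pd
import pyodbc

# Placeholder connection string for the Azure SQL database.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:example-server.database.windows.net,1433;"
    "Database=testing_results;Uid=etl_user;Pwd=<secret>;Encrypt=yes;"
)


def load_results(csv_path: str, test_id: int) -> None:
    """Parse a CSV that matched a setup format and insert its readings into
    the consolidated results table (assumes a 'timestamp' column)."""
    frame = pd.read_csv(csv_path)
    # Reshape wide parameter columns into (timestamp, parameter, value) rows.
    long = frame.melt(id_vars=["timestamp"], var_name="parameter",
                      value_name="value").dropna()
    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        cursor.fast_executemany = True
        cursor.executemany(
            "INSERT INTO test_results (test_id, recorded_at, parameter, value) "
            "VALUES (?, ?, ?, ?)",
            [(test_id, r.timestamp, r.parameter, float(r.value))
             for r in long.itertuples(index=False)],
        )
        conn.commit()
```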
Data access
The Client’s employees can access the consolidated testing data and use it in the following ways:
- Get self-service analytics reports by accessing the database via Excel or Power BI.
- Access the centralized data storage to manually explore the source files.
The Client can grant its employees access to the database via SQL Server Management Studio.
Building Interactive Power BI Reports for Test Monitoring
ScienceSoft’s data engineers created Microsoft Power BI reports with dynamic visuals for different types of tests. During each test, the equipment records multiple readings every 5 seconds. The reports illustrate these readings (e.g., cell voltage, anode- and cathode-related values) in interactive graphs. When users change the time interval, the graphs adjust to show the readings within the selected range: from any particular five-second interval to the entire test period.
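The filtering itself is handled by the Power BI visuals, but the underlying idea, narrowing a 5-second reading stream to any selected window, can be illustrated with a small pandas analogue. The parameter name and sample values below are invented for demonstration.

```python
import numpy as np
import pandas as pd

# Illustrative readings: one value every 5 seconds for a single parameter.
readings = pd.DataFrame({
    "recorded_at": pd.date_range("2024-01-01 08:00:00", periods=720, freq="5s"),
    "cell_voltage": 0.65 + 0.01 * np.sin(np.linspace(0, 12, 720)),
})


def slice_readings(frame: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Return the readings within the selected time range, mirroring how the
    report graphs narrow down to any interval of the test."""
    mask = frame["recorded_at"].between(pd.Timestamp(start), pd.Timestamp(end))
    return frame.loc[mask]


window = slice_readings(readings, "2024-01-01 08:10:00", "2024-01-01 08:20:00")
print(window["cell_voltage"].agg(["mean", "std"]))
```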
Automated Analytics and Complete View of Testing Results
The Client was fully satisfied with ScienceSoft’s services. Within six months, it received a reliable analytics solution that consolidates disparate, multi-source data into a single source of truth. The solution eliminated manual work for the Client’s employees and accelerated data analysis. The Client also received dynamic Power BI reports with interactive visuals that enable analysis of testing results at any required level of granularity.
Technologies and Tools
Azure, Azure Blob Storage, Azure Functions, Azure SQL Database, Microsoft SQL Server, Microsoft Power BI.