A Data Warehouse to Achieve Reporting Consistency and Enhance Power BI Performance
About Our Client
Our Client is a distributor of an internationally recognized bottled water beverage with over two decades of experience in the market. The company is dedicated to reducing its impact on the environment minimizing the use of plastic in its packaging in recent years.
Data Silos Led to Reporting Discrepancies and Subpar Power BI Performance
The Client used Microsoft Power BI to build reports on sales, marketing, and corporate finance management. Each of the 30 reports had its own data sources and separate rules for data collection, transformation, and aggregation, as well as for KPI calculation. Such an approach resulted in data discrepancies and overlaps across reports (e.g., the same KPIs were calculated according to different logic or based on different datasets). Moreover, there were significant lags in report execution time, as Power BI had to connect to six systems of different nature (e.g., cloud and on-premises databases, external data sources) and execute all data transformations from cleansing to aggregation.
To get rid of data silos and solve the described issues, the Client wanted to implement centralized data storage as a single point of reliable and consistent data and turned to ScienceSoft to carry out the initiative.
Designing Data and Analytics Architecture
ScienceSoft assembled a team of a project manager, a business analyst, a data engineer, a solution architect, and a DevOps engineer. We studied the provided documentation and conducted observation sessions and workshops with the Client’s stakeholders to learn about the Company’s organizational structure, reporting processes, and expectations. Collaborating with the Client’s team, ScienceSoft explored the company’s data sources and the roles they played for each report.
Based on the findings acquired throughout the discovery phase, ScienceSoft prepared a solution vision document and the corresponding architecture design. Since the Client’s infrastructure was based on Microsoft services (e.g., Azure SQL Database as cloud data storage), ScienceSoft used Microsoft technologies in the system design. Such an approach would allow the company to reduce the costs associated with purchasing licenses from other vendors and ensure smooth integration of the new solution into the existing infrastructure. At this stage of the project, we also calculated the total cost of ownership (ToC) for the target solution.
Implementing an Azure-Based Data Warehouse With an OLAP Cube
After receiving the Client’s approval, ScienceSoft implemented the designed data storage and reporting solution. The key elements of the solution include ETL pipelines and a data warehouse (DWH) that was engineered based on Azure SQL Database. The ETL pipelines clean, deduplicate, and validate data from all the data sources and upload it into the data warehouse (DWH). The DWH contains an OLAP cube with in-built KPI calculation logic and data aggregation patterns, which enables the delivery of the same reports as in the old solution but with improved consistency and accuracy. With the cube, business users can also build ad hoc reports, while the IT team can create new reports much faster.
To enhance data discoverability and governance, ScienceSoft supplied the solution with a metadata catalog that acts as a centralized repository of data definitions and lineage. The catalog streamlines the processes that are responsible for data consistency and accuracy.
We also implemented row-level security based on Azure Active Directory. That would allow the Client to control access to reports’ data across six user groups at a granular level and thus ensure sensitive data protection. After the solution was implemented, ScienceSoft held a knowledge transfer session with the Client’s team and handed over all the technical documentation to them.
New DWH Driving Consistent KPI Calculation and Faster Report Execution
The Client received a centralized data warehouse that streamlined the company’s previous reporting processes which were inefficient due to data silos and resulted in inconsistent calculation logic across 30 reports. The single point of reliable information securely consolidates the data across on-premises and cloud data sources and ensures consistent KPI calculation without duplication, redundancy, and other discrepancies. Thanks to the newly added OLAP cube, business users can continue getting the reports they used to have in the old solution but enjoy higher accuracy. The IT team can benefit from the cube as well, since it enables faster creation of reports from scratch. The Client also achieved higher report execution speed since Power BI uses single-source structured data instead of connecting to several separate data sources.
Technologies and Tools
Azure SQL Database, Azure Active Directory.