Big Data Consulting to Improve the Performance of Apache Cassandra Database
About the Client
The Client is a European decentralized energy company partnering with hundreds of distributed electricity producers nationwide.
Facing a Decline in Energy Analytics App Performance
To balance energy supply and demand and avoid over- and underproduction, the Client performs multiple time-series calculations related to the energy volume produced and stored by its partners.
The Client’s analytics application automatically queries the Apache Cassandra database for 10,000+ energy production and consumption values every 5 minutes (for short-term monitoring) and every 24 hours (for daily reports). The Client’s employees also use the database to perform ad hoc analytics.
As more partners joined the Client’s energy network, the data load increased, and the app no longer returned calculation results at the required speed. The Client’s IT team identified low-performing Java functions and attributed the issue to incorrect Cassandra configurations. The Client needed a professional Cassandra consultancy to confirm these assumptions and improve database performance.
Detecting Inefficiencies in Cassandra Configurations
ScienceSoft appointed a DevOps engineer and a senior data engineer to the project. The DevOps engineer checked the configurations of the application and its infrastructure and made sure there were no issues at that level. Meanwhile, the data engineer examined the database and spotted several Cassandra inefficiencies that could be causing poor app performance:
- The energy consumption and production readings were grouped by 5-minute intervals, leading to additional calculations whenever the Client needed to check combined values for a particular hour or 24 hours.
- The excessive partition size (over 100 MB) meant that queries had to read through large volumes of data in search of a single small value, resulting in slower analytics output.
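The scale of both inefficiencies can be illustrated with some back-of-the-envelope arithmetic. The sketch below is only an illustration: the 5-minute interval comes from the case described above, while the row size (400 bytes), the three-year retention period, and the one-partition-per-producer layout are hypothetical assumptions, since the actual schema is not disclosed here.

```python
from datetime import timedelta

# Reading granularity taken from the case study.
INTERVAL = timedelta(minutes=5)


def readings_per_window(window: timedelta, interval: timedelta = INTERVAL) -> int:
    """Number of interval readings that must be combined to cover one window."""
    return window // interval


def estimated_partition_mb(rows: int, bytes_per_row: int) -> float:
    """Rough partition size in MB, ignoring compression and storage overhead."""
    return rows * bytes_per_row / 1_000_000


per_hour = readings_per_window(timedelta(hours=1))
per_day = readings_per_window(timedelta(days=1))

# Hypothetical assumptions: one partition holds all readings of a single
# producer for three years, and each stored row takes about 400 bytes.
ASSUMED_ROW_BYTES = 400
ASSUMED_YEARS = 3
rows_in_partition = per_day * 365 * ASSUMED_YEARS
size_mb = estimated_partition_mb(rows_in_partition, ASSUMED_ROW_BYTES)

print(f"{per_hour} readings per hour, {per_day} readings per day")
print(f"about {size_mb:.0f} MB per partition under the assumed layout")
```

Under these assumed figures, every hourly value requires combining 12 readings, every daily value requires 288, and a single partition grows past the 100 MB mark.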
Cassandra Optimization Measures
ScienceSoft’s data engineer provided the Client with several database optimization recommendations that would help increase the analytics performance:
- Adding an extra table field (in the format year–month–day–hour) that would allow the application to immediately get energy consumption and production data for a certain hour instead of performing the calculation based on multiple 5-minute timestamps.
- Reducing the partition size to the optimal range of 10 MB to 100 MB to ensure the queries address smaller, easy-to-process data chunks.
The data engineer also updated the calculation query code to match the new table structure.
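A minimal sketch of how such an hour field might be derived and used is shown below. It is one possible reading of the recommendations, not the schema actually delivered: the table name `energy_readings`, the column names, the monthly partition bucket, and the helper functions `hour_bucket` and `group_by_hour` are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical CQL table: the partition key (producer_id, year_month) bounds
# partition growth, and hour_bucket as the first clustering column lets a
# query select one hour of readings directly.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS energy_readings (
    producer_id  text,
    year_month   text,
    hour_bucket  text,
    reading_ts   timestamp,
    produced_kwh double,
    PRIMARY KEY ((producer_id, year_month), hour_bucket, reading_ts)
);
"""

# Hypothetical query for one hour of one producer's readings.
SELECT_HOUR_CQL = (
    "SELECT reading_ts, produced_kwh FROM energy_readings "
    "WHERE producer_id = ? AND year_month = ? AND hour_bucket = ?"
)


def hour_bucket(ts: datetime) -> str:
    """Format a timezone-aware timestamp as year-month-day-hour in UTC."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d-%H")


def group_by_hour(readings):
    """Sum (timestamp, kWh) readings per hour bucket."""
    totals = defaultdict(float)
    for ts, kwh in readings:
        totals[hour_bucket(ts)] += kwh
    return dict(totals)


# Twelve made-up 5-minute readings of 1.5 kWh each, covering 14:00 to 14:55 UTC.
start = datetime(2024, 3, 15, 14, 0, tzinfo=timezone.utc)
sample = [(start + timedelta(minutes=5 * i), 1.5) for i in range(12)]
hourly_totals = group_by_hour(sample)
print(hourly_totals)
```

With a field like this stored on every row, the application can filter on a single hour value instead of enumerating twelve 5-minute timestamps. Whether the hour field should be a clustering column, as assumed here, or part of the partition key depends on the data volume per producer and the target partition size.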
Increasing Query Return Speed
Within just five days, the Client received an expert audit of its application infrastructure and Cassandra database, complete with recommendations on how to optimize the database structure. Once implemented, the changes will allow the Client to increase the performance of its energy analytics application and receive timely calculations, which is critical for informed decision-making.
Technologies and Tools
Apache Cassandra, Cassandra Query Language (CQL).