Apache NiFi Managed Support to Ensure 99% System Stability and 10x Faster Data Processing
About Our Client
The Client is an American biotechnology corporation with 10,000+ employees.
Challenge
The Client had been running a big data solution for laboratory data, supported by an in-house team. However, the team lacked the resources to support Apache NiFi, a data processing tool in the Client’s big data ecosystem. When the Client turned to ScienceSoft, they had a large backlog of Apache NiFi configuration, support, and enhancement tasks, and the whole big data ecosystem was suffering from delays in data transfer and processing caused by bugs in the code.
Solution
To solve the Client’s challenge, ScienceSoft dedicated a managed IT support team consisting of a Project Manager, a Senior Open Source Engineer, and a Data Engineer.
ScienceSoft started by analyzing the Client’s IT infrastructure in general and the Apache NiFi data pipelines in particular. During the discovery, our team found that there was no IT infrastructure monitoring in place. To recommend a fitting monitoring tool to the Client, we compared the strengths and weaknesses of several options. Based on this analysis, our team prepared a comprehensive monitoring solution consisting of Netdata, Prometheus, and Grafana, and tuned Apache NiFi so that it could send its data to the monitoring solution.
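As a rough illustration of what such monitoring makes possible, the sketch below parses metrics in the Prometheus text exposition format and flags samples above a limit. It is not the Client's actual setup: the metric name nifi_queued_flowfiles, the label values, and the threshold are hypothetical and chosen only for the example.

```python
# Minimal sketch: parse Prometheus text-format metrics and flag high values.
# The metric name and labels below are hypothetical, not taken from the project.

def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name, rest = line.split("{", 1)
            labels, value_part = rest.rsplit("}", 1)
        else:
            name, value_part = line.split(None, 1)
            labels = ""
        # The value is the first token; an optional timestamp may follow it.
        value = float(value_part.split()[0])
        samples.append((name.strip(), labels, value))
    return samples


def over_threshold(samples, metric, limit):
    """Return the samples of the given metric whose value exceeds the limit."""
    return [s for s in samples if s[0] == metric and s[2] > limit]


SAMPLE = """\
# HELP nifi_queued_flowfiles Hypothetical queue depth per connection
# TYPE nifi_queued_flowfiles gauge
nifi_queued_flowfiles{connection="ingest"} 120
nifi_queued_flowfiles{connection="export"} 15000
"""

if __name__ == "__main__":
    for name, labels, value in over_threshold(parse_metrics(SAMPLE), "nifi_queued_flowfiles", 10000):
        print(f"backlog alert: {name}{{{labels}}} = {value}")
```

In a real deployment, a check like this would normally be expressed as a Prometheus alerting rule or a Grafana alert rather than a script. The sketch only shows the kind of signal that becomes available once the pipeline exposes its metrics.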
To deal with the backlog of Apache NiFi configuration, support, and enhancement tasks, ScienceSoft’s team established effective collaboration with the Client’s in-house team. The in-house team allocated the majority of tasks across sprints, while ScienceSoft’s team took part in prioritization and could add high-priority tasks to a sprint. Every two weeks, we met with the Client to present the results of our work in the sprint.
To ensure fast delivery, ScienceSoft’s team introduced CI/CD pipelines for the NiFi system using the NiFi Registry API.
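A minimal sketch of one step such a pipeline might include is shown below: asking NiFi Registry for the versions of a flow and deciding whether a newer one should be promoted. It assumes an unsecured Registry at the default local address. The bucket and flow identifiers, the helper names, and the promotion rule are hypothetical and do not describe the pipelines built in this project.

```python
# Sketch of a CI/CD check against the NiFi Registry REST API.
# Assumptions: Registry reachable without authentication at REGISTRY_URL;
# bucket/flow IDs and the promotion rule are illustrative only.
import json
from urllib.request import urlopen

REGISTRY_URL = "http://localhost:18080/nifi-registry-api"  # assumed address


def http_get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def latest_flow_version(bucket_id, flow_id, fetch=http_get_json, base_url=REGISTRY_URL):
    """Return the highest version number stored for a flow, or None if empty."""
    url = f"{base_url}/buckets/{bucket_id}/flows/{flow_id}/versions"
    versions = fetch(url)  # list of snapshot metadata objects
    if not versions:
        return None
    return max(v["version"] for v in versions)


def needs_promotion(deployed_version, bucket_id, flow_id, fetch=http_get_json, base_url=REGISTRY_URL):
    """True when the Registry holds a newer version than the deployed one."""
    latest = latest_flow_version(bucket_id, flow_id, fetch=fetch, base_url=base_url)
    return latest is not None and latest > deployed_version
```

The fetch parameter is injected so that the logic can be exercised in a pipeline test without a running Registry. A real pipeline would also need authentication and a step that applies the new version to the target NiFi instance.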
Overall, about 50% of the effort went into development and code enhancement, and the other 50% into support tasks. Now that the development tasks are finished, the Client wants to shift completely to support and maintenance.
After observing how the Client uses Apache NiFi, ScienceSoft suggested an option better suited to their needs: Apache Airflow. The Client accepted this recommendation and added the transition from Apache NiFi to Apache Airflow to the backlog.
Results
At the project closing stage, the stabilized big data solution was able to process several queries up to 10 times faster than before. System stability and the share of successfully processed data increased from 50% to 99% thanks to the enhancements implemented by our data engineers.
Technologies and Tools
Communication and collaboration tools: Slack, Google Meet.
Databases: MySQL.
Big data: Apache NiFi, Apache Kafka, Apache ZooKeeper.
Monitoring tools: Netdata, Prometheus, Grafana.
DevOps: Chef.
Ticketing system: Jira.