Big Data Consulting for Garan to Drive 100x Faster Data Processing
Summary
ScienceSoft performed an audit of an enterprise-wide big data reporting solution for a global apparel manufacturer and provided recommendations that helped significantly boost the system’s performance.
About Garan
Garan is a US-based manufacturer and distributor of branded and private-label clothing for children and adults. Founded in 1941, the company has over 4,000 employees worldwide and owns GARANIMALS® — a private brand of clothes for babies and toddlers designed to promote kids’ confidence through easy mixing and matching of items. In 2002, Garan became a wholly-owned subsidiary of Berkshire Hathaway, Inc.
Garan had an on-premises reporting solution based on Microsoft SQL Server. The company relied on it to get daily sales reports. As Garan’s business grew, the solution could not accommodate the increasing data volume, so the company upgraded the system with HDFS, Apache Spark, and Apache Hive technologies. However, the data processing speed remained subpar, which caused delays in the delivery of business-critical insights.
Garan needed the solution to process more than 300 GB of ORC files every day to provide up-to-date sales insights. The compound dataset included billions of historical records and kept growing with daily updates, and the company expected the number of records to double within the next two years.
To address the pressing data processing issues and find a future-proof solution to the challenge, Garan was looking for an IT vendor with extensive experience in big data to review the reporting system and offer actionable improvement recommendations.
Big Data Solution Audit and Consultation Sessions
ScienceSoft held several interviews with Garan’s stakeholders to understand the company’s reporting needs, learn about the configurations of the big data solution, and assess the attempted remediation measures.
ScienceSoft concluded that Garan’s system needed a revamp to enable timely reporting and accommodate future data volume growth. However, given the business-critical nature of the reporting solution, we suggested optimizing the existing system first and postponing the full revamp to a later stage.
For six weeks, ScienceSoft held sessions with five IT specialists from Garan’s team. Our experts advised them on the optimal remediation steps, gathered feedback on the achieved results, and provided instructions on further relevant measures. We also held multiple Q&A and knowledge-sharing sessions with Garan’s team.
Using ScienceSoft’s advice, Garan’s team achieved the following improvements:
- Implemented efficient partitioning and partition pruning of database tables, which increased the execution speed of BI queries performed using Hive.
- Replaced MapReduce with a properly configured Tez execution engine for Hive processing. The engine ensured a more efficient allocation of memory and CPU resources across different tasks.
- Moved part of the logic to the Spark SQL cluster to enable ACID transformations.
- Fine-tuned Hive and Spark configurations.
- Used MSCK REPAIR, small-file compaction with Spark, and bucketing to optimize data storage for high-performance read operations.
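To make the measures above more concrete, here is a minimal Python sketch that builds the kind of HiveQL statements they usually involve: a date-partitioned, bucketed ORC table, a switch to the Tez engine, a metastore repair, and a report query that filters on the partition column so that Hive can prune partitions. The table name, columns, and bucket count are hypothetical, since the case study does not disclose Garan's schema or settings, and the statements only illustrate the general approach rather than the actual changes made.

```python
from datetime import date

# Hypothetical names and values: the case study does not disclose Garan's schema.
TABLE = "sales_facts"
PARTITION_COLUMN = "sale_date"
BUCKET_COLUMN = "store_id"
BUCKET_COUNT = 32  # assumed; tune to data volume and cluster size


def create_table_ddl():
    """DDL for an ORC table partitioned by date and bucketed by store."""
    return (
        f"CREATE TABLE IF NOT EXISTS {TABLE} (\n"
        f"  order_id BIGINT,\n"
        f"  {BUCKET_COLUMN} INT,\n"
        f"  amount DECIMAL(12,2)\n"
        f")\n"
        f"PARTITIONED BY ({PARTITION_COLUMN} STRING)\n"
        f"CLUSTERED BY ({BUCKET_COLUMN}) INTO {BUCKET_COUNT} BUCKETS\n"
        f"STORED AS ORC"
    )


def session_settings():
    """Session-level setting that runs Hive queries on Tez instead of MapReduce."""
    return ["SET hive.execution.engine=tez"]


def repair_statement():
    """Registers partitions that exist in HDFS but are missing from the metastore."""
    return f"MSCK REPAIR TABLE {TABLE}"


def daily_report_query(day):
    """Filtering on the partition column lets Hive read only that day's partition."""
    return (
        f"SELECT {BUCKET_COLUMN}, SUM(amount) AS total_sales\n"
        f"FROM {TABLE}\n"
        f"WHERE {PARTITION_COLUMN} = '{day.isoformat()}'\n"
        f"GROUP BY {BUCKET_COLUMN}"
    )


if __name__ == "__main__":
    statements = session_settings() + [
        create_table_ddl(),
        repair_statement(),
        daily_report_query(date(2024, 9, 1)),
    ]
    for statement in statements:
        print(statement + ";\n")
```

The generated statements could be run through any Hive client. Small-file compaction is not shown here; with Spark, it is commonly done by reading a partition and rewriting it as fewer, larger files.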
Garan also received a detailed report on the pros and cons of further improving the current on-premises solution vs. building a new cloud-based system.
Kenric Smith, IT Director at Garan Inc., says:
Garan's operations largely depend on timely analytical insights, so when the performance of our big data reporting solution decreased dramatically, we needed to fix the problem as quickly as possible. ScienceSoft's consulting on Hadoop and Spark made a tremendous difference. The changes we made on their advice helped our data processing speed drop from hours to minutes.
We particularly appreciated that ScienceSoft understood the time-sensitive nature of the project and didn’t try to revamp our entire system. Instead, they offered immediate fixes that brought instant results. We're also grateful for the in-depth Q&A sessions they held with our IT team and the exhaustive consultations on further improving our solution. As we prepare to migrate to the cloud, we know that we've already found an IT vendor that is both technically savvy and highly respectful of clients' goals.
Key Outcomes for Garan
- A full audit of the business-critical reporting solution, completed within six weeks.
- 100x faster big data processing for timely sales insights and forecasts.
- The foundation for a gradual solution revamp without business disruptions. As of September 2024, Garan is using the improved reporting system and collaborating with ScienceSoft on developing a cloud-based big data analytics solution using Microsoft Fabric.
Technologies and Tools
HDFS, Apache Spark, Apache Hive, PySpark, Python, Microsoft SQL Server, T-SQL.