MVP of a Data Catalog Product That Helped a Swiss Startup Attract Its First Client
About Our Customer
The Customer is a Swiss startup developing innovative data management solutions.
Need for Data Catalog Development Skills to Attract a Potential Client
The Customer had an idea for a data catalog platform that would streamline enterprise-wide data search thanks to efficient metadata management and automated access control. The startup already had a potential client who liked the described idea but wanted to see it in action. Since the Customer lacked in-house resources to develop the product, it turned to ScienceSoft, trusting our 35 years of experience in data management and analytics.
DataHub as a Future-Proof Option for the Data Catalog Product
ScienceSoft’s team conducted several interviews with the Customer to understand its vision for the future data catalog platform. The Customer wanted it to be a SaaS product that would store its clients’ metadata, enable conventional and AI-powered search across enterprise data objects, and automate user access management.
As the next step, we studied the needs of the Customer’s potential client. The client is a Swiss medical center that wants to enable smooth data search across its 50+ research papers and BI reports for more than 2,500 employees.
In light of these findings, ScienceSoft suggested customizing an open-source data catalog platform to build the product. Such an approach would allow the Customer to avoid costly custom development and roll out the product faster while enabling all required software capabilities.
ScienceSoft compared several open-source platforms and chose DataHub as the best fit for the following reasons:
- High scalability to enable stable performance even within complex data environments.
- Broad customization capabilities.
- Native integration with a wide range of data management tools (e.g., Amazon S3, Azure Synapse Analytics, MySQL, Google Cloud Storage, Apache Kafka) to ensure smooth data integration for the startup’s future clients.
- Ease of integration with AI tools to enable AI-powered data search.
- Best cost-to-performance ratio.
Since the Customer was committed to presenting the platform to its client as soon as possible, it was agreed that ScienceSoft would develop a minimum viable product (MVP). The MVP would feature native DataHub capabilities and a custom-built UI tailored to the needs of the medical center. As for the general-purpose UI and AI-based data search, the Customer planned to add them in a full-featured product version later on.
Data Catalog Product MVP With a Custom UI and Single Sign-On (SSO)
ScienceSoft deployed a DataHub-based data catalog platform on an AWS Lightsail instance. The project was carried out in a Kubernetes environment, with a Goharbor instance used to store and maintain deployment configurations. To enhance security and streamline the user experience, we implemented single sign-on based on Microsoft Azure Active Directory, aligned with the data environment and user roles of the Customer’s client. The team also provided the Customer with a detailed user guide covering login, data upload, search, and other common tasks.
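As an illustration of the SSO setup, here is a minimal sketch that assumes the integration goes through DataHub's documented OIDC support for the frontend service; the case study itself does not state which mechanism was used. The function name and all argument values are hypothetical, while the environment variable names and the Azure AD discovery URL pattern follow the public DataHub and Microsoft documentation.

```python
from __future__ import annotations


def datahub_oidc_env(
    tenant_id: str, client_id: str, client_secret: str, base_url: str
) -> dict[str, str]:
    """Build environment variables for the DataHub frontend (hypothetical helper).

    Assumption: SSO is wired through OIDC against an Azure AD tenant.
    """
    # Azure AD publishes its OIDC metadata per tenant at this well-known path.
    discovery_uri = (
        f"https://login.microsoftonline.com/{tenant_id}"
        "/v2.0/.well-known/openid-configuration"
    )
    return {
        "AUTH_OIDC_ENABLED": "true",
        "AUTH_OIDC_CLIENT_ID": client_id,
        "AUTH_OIDC_CLIENT_SECRET": client_secret,
        "AUTH_OIDC_DISCOVERY_URI": discovery_uri,
        # Public URL of the catalog UI; the redirect URI is derived from it.
        "AUTH_OIDC_BASE_URL": base_url.rstrip("/"),
    }
```

In a Kubernetes deployment, values like these would typically be passed to the frontend container through a secret rather than hard-coded.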
The data catalog works as follows:
- Users upload their databases to the data catalog platform.
- The platform automatically gathers and structures the relevant metadata (e.g., file owner name, research domain).
- Data stewards review the automatically gathered metadata and adjust it manually if needed (e.g., adding or removing metadata, creating new table columns).
- The system uses the metadata to perform the search.
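The workflow above can be sketched as a small in-memory model. This is a simplified illustration rather than the DataHub implementation: the Catalog and DataObject classes, their fields, and the substring-matching search are all hypothetical stand-ins for the platform's metadata store and search index.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataObject:
    """One catalog entry; field names are hypothetical."""
    name: str
    owner: str
    domain: str
    tags: set[str] = field(default_factory=set)


class Catalog:
    def __init__(self) -> None:
        self._objects: dict[str, DataObject] = {}

    def ingest(self, name: str, raw: dict[str, str]) -> DataObject:
        """Register an uploaded object and structure its gathered metadata."""
        obj = DataObject(
            name=name,
            owner=raw.get("owner", "unknown"),
            domain=raw.get("domain", "unclassified"),
        )
        self._objects[name] = obj
        return obj

    def adjust(self, name, *, add_tags=(), remove_tags=(), domain=None) -> None:
        """Let a data steward correct the automatically gathered metadata."""
        obj = self._objects[name]
        obj.tags |= set(add_tags)
        obj.tags -= set(remove_tags)
        if domain is not None:
            obj.domain = domain

    def search(self, query: str) -> list[str]:
        """Match the query against metadata only, not the underlying data."""
        q = query.lower()
        hits = []
        for obj in self._objects.values():
            haystack = [obj.name, obj.owner, obj.domain, *obj.tags]
            if any(q in value.lower() for value in haystack):
                hits.append(obj.name)
        return sorted(hits)
```

Because search runs over metadata alone, a steward's correction (for example, adding a missing tag) immediately changes which objects a query returns.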
ScienceSoft built a custom UI in accordance with the needs of the Customer’s client (e.g., research domains to be reflected in the navigation pane). Users type a search request, and the system returns the relevant data objects. The UI features a button for sending an automated access request to the object’s owner. After users open a data object, they can perform data searches within it, modify tags, leave comments, rate data quality, or navigate the domain tree to which the data object belongs.
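The access request button can be pictured with the following sketch. It only illustrates the idea of routing a request to the object's owner; the AccessRequest type, the status values, and both helper functions are hypothetical and do not reflect the product's actual data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AccessRequest:
    """A request sent to a data object's owner (hypothetical structure)."""
    requester: str
    object_name: str
    owner: str
    status: str = "pending"


def build_access_request(
    requester: str, object_name: str, owners: dict[str, str]
) -> AccessRequest:
    """Look up the owner from catalog metadata and address the request to them."""
    owner = owners[object_name]  # raises KeyError for an unknown object
    if owner == requester:
        raise ValueError("owners already have access to their own objects")
    return AccessRequest(requester=requester, object_name=object_name, owner=owner)


def decide(request: AccessRequest, *, approve: bool) -> AccessRequest:
    """Return a copy of the request carrying the owner's decision."""
    return replace(request, status="approved" if approve else "denied")
```

The user never has to know who owns an object: the owner is resolved from the same metadata that powers the search.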
Data Catalog Product Helped the Startup Win Its First Client
In just eight weeks, the Customer received an MVP of a data catalog platform that streamlines enterprise-wide data search and access. The product is built on DataHub, an open-source metadata management platform, which allowed the Customer to reduce development costs and launch the product faster. To help the Customer win its first potential client, a Swiss medical center, ScienceSoft implemented tailored features and UI elements requested by the organization. ScienceSoft also delivered a detailed software guide to help the new users adopt the product smoothly.
After trying out the MVP, the Customer’s client was very satisfied with the product and purchased the software subscription while also expressing interest in the full-featured version of the data catalog. As of June 2024, the Customer is planning to upgrade the product with AI-powered search capabilities and build a universal UI that would be attractive to diverse potential clients. The tech stack suggested by ScienceSoft is designed to keep scalability and customization limitations from holding back the product as it evolves.
Technologies and Tools
DataHub, Java, React.js, AWS Lightsail, Microsoft Azure Active Directory, Goharbor.