500+ Patch Pairs to Train an ML-Based Tool for Code Vulnerability Detection and Remediation
About Our Client
The Client is a renowned provider of enterprise security solutions.
Lack of Senior JavaScript and TypeScript Developers Experienced in Secure Coding
The Client was evolving its AI/ML-powered vulnerability detection tool for secure source code development. The tool is trained to identify the reported vulnerabilities found in open-source projects and enterprise solutions from the world’s leading IT companies. The system pins down the exact location of each vulnerability in the code, highlights the contributing function or statement, prioritizes issues, and provides detailed resolution guides.
The Client needed to augment its team with a secure coding expert experienced in handling 25 selected non-trivial vulnerabilities in JavaScript and TypeScript code. The Client was therefore looking for a trustworthy vendor that could promptly provide a suitable specialist to build quality data sets for ML model training and help speed up the product's evolution.
Rapid Team Augmentation with a Secure Coding Expert
Trusting ScienceSoft’s solid experience in creating secure software solutions, the Client turned to us to strengthen its team with a skilled secure coding expert. To meet the Client’s needs, ScienceSoft allocated a full-time consultant with ~15 years of experience and a strong track record in JavaScript and TypeScript projects.
Our consultant started work two weeks after the Client’s initial request. His role on the project was to prepare large code data sets for the two required programming languages, JavaScript and TypeScript. These data sets served to train the Client’s ML models to identify the 25 agreed complex vulnerabilities.
For each case, our secure coding expert wrote two versions of the code: one containing the vulnerability and one with the vulnerability fixed. The vulnerable code served as the input for the ML models, and the corrected code served as the expected output. Overall, the data sets contained 514 patch pairs (261 for JavaScript code and 253 for TypeScript code).
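As an illustration only, a single patch pair could be represented as in the TypeScript sketch below. The actual data set format used on the project is not described here, so the `PatchPair` shape, the field names, the `toTrainingExample` and `countByLanguage` helpers, and the path traversal sample are all hypothetical and are not taken from the Client's 25 vulnerabilities.

```typescript
// Hypothetical record shape: the real data set format is not described in this case study.
type Language = "javascript" | "typescript";

interface PatchPair {
  language: Language;
  vulnerability: string;  // illustrative label, not one of the Client's actual categories
  vulnerableCode: string; // fed to the model as input
  patchedCode: string;    // expected model output
}

// Illustrative pair: a path traversal issue and one common way to fix it.
const examplePair: PatchPair = {
  language: "typescript",
  vulnerability: "path-traversal",
  vulnerableCode: [
    "function resolveUpload(baseDir: string, userInput: string): string {",
    "  return path.join(baseDir, userInput);",
    "}",
  ].join("\n"),
  patchedCode: [
    "function resolveUpload(baseDir: string, userInput: string): string {",
    "  const root = path.resolve(baseDir);",
    "  const target = path.resolve(root, userInput);",
    "  if (!target.startsWith(root + path.sep)) {",
    "    throw new Error('Path escapes the upload directory');",
    "  }",
    "  return target;",
    "}",
  ].join("\n"),
};

// Maps a pair to the input/output form described above.
function toTrainingExample(pair: PatchPair): { input: string; output: string } {
  return { input: pair.vulnerableCode, output: pair.patchedCode };
}

// Counts pairs per language, e.g. to check the JavaScript/TypeScript split of a data set.
function countByLanguage(pairs: PatchPair[]): Record<Language, number> {
  const counts: Record<Language, number> = { javascript: 0, typescript: 0 };
  for (const pair of pairs) {
    counts[pair.language] += 1;
  }
  return counts;
}
```

Keeping the vulnerable and the patched version in one record makes it straightforward to derive input/output examples and to verify the per-language totals of a data set.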
ML Data Sets to Find and Patch 25 New Vulnerabilities
In three months, the Client received two high-quality ML data sets: one for JavaScript code and one for TypeScript code. With these, the Client was able to train its tool to identify 25 new vulnerabilities and offer 514 different fixes to resolve them, providing its users with a significantly more powerful security solution.
Technologies and Tools
JavaScript, TypeScript