ScienceSoft’s Approach to Application Performance Management
Since 1989, ScienceSoft has been delivering high-performing and stable applications that are easy to maintain and scale on demand. Our performance management practices are directly informed by software engineering know-how, real-world testing, and ongoing feedback from both users and stakeholders.
3 Pillars of Our Approach to Application Performance Management
APM is a collaborative effort between development and operations teams, QA engineers, and business stakeholders. In this process, product owners define KPIs aligned with business needs and work with developers to prioritize performance improvements, using insights from both QA and operations teams.
We tailor APM strategies to the software type and scale, domain specifics, project budget, and short- and long-term business objectives. This ensures that performance management aligns with operational priorities without overspending.
To maximize impact and minimize costs in APM, we focus on high-value activities: writing efficient code, detecting errors early, and closely monitoring core functions and high-traffic areas. We also implement canary releases or blue-green deployments and set dynamic alert thresholds.
How We Monitor, Troubleshoot, and Enhance Application Performance
To ensure lasting speed and efficiency of your applications, we follow a proactive, layered strategy that combines:
Continuous real-time system performance monitoring
- By implementing solutions like Datadog, New Relic, and Dynatrace, along with custom tools and plugins, we monitor performance across the entire technology stack — from the front end, back end, and service mesh to the data storage and the underlying infrastructure. Among key metrics are response time, request throughput, error rates, and system resource utilization (CPU, memory, I/O).
- For network-heavy applications, we monitor network latency and throughput to detect bottlenecks and optimize traffic flow.
- For cloud services, we configure tools like Amazon CloudWatch or Azure Monitor to keep an eye on instance health, scaling activities, and service availability.
- We monitor external dependencies (e.g., payment gateways, external APIs, or third-party software) to detect potential performance bottlenecks originating outside our application. For example, we can simulate transactions to ensure that third parties respond as expected. If these third-party services slow down or fail, alerts signal the team to investigate.
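In simplified form, such a third-party probe could look like the Python sketch below. It measures the response time of a hypothetical payment gateway health endpoint and publishes it as a custom CloudWatch metric; the endpoint URL, metric names, and namespace are illustrative assumptions, and the code assumes boto3 with AWS credentials already configured.

```python
import time
import boto3      # assumes AWS credentials are configured in the environment
import requests

cloudwatch = boto3.client("cloudwatch")

# Hypothetical third-party endpoint; replace with the dependency being monitored.
PAYMENT_GATEWAY_HEALTH_URL = "https://api.example-payments.com/health"

def probe_dependency():
    """Measure the dependency's response time and publish it as a custom metric."""
    start = time.monotonic()
    try:
        response = requests.get(PAYMENT_GATEWAY_HEALTH_URL, timeout=5)
        latency_ms = (time.monotonic() - start) * 1000
        ok = response.status_code == 200
    except requests.RequestException:
        latency_ms = 5000.0  # treat timeouts and connection errors as worst-case latency
        ok = False

    cloudwatch.put_metric_data(
        Namespace="ThirdPartyDependencies",
        MetricData=[
            {"MetricName": "PaymentGatewayLatency", "Value": latency_ms, "Unit": "Milliseconds"},
            {"MetricName": "PaymentGatewayAvailable", "Value": 1.0 if ok else 0.0, "Unit": "Count"},
        ],
    )

if __name__ == "__main__":
    probe_dependency()
```

An alarm on a metric like PaymentGatewayAvailable would then notify the team whenever the dependency slows down or fails.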
Weighing the gains in visibility against any negative impact on performance
While logging and monitoring tools provide essential visibility into system health, they also introduce additional system load. This load can consume valuable network bandwidth, increase CPU and memory usage, and slow down responses, which is especially critical in high-performance or real-time applications. To counter this, we strategically place instrumentation only on the most crucial paths, ensuring minimal impact on system performance while still gathering the necessary data.
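One common way to keep this overhead in check is trace sampling. The sketch below uses the OpenTelemetry Python SDK to trace only a fraction of requests and instrument only a business-critical path; the service and span names and the 10% sampling ratio are hypothetical.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Sample only 10% of traces to keep instrumentation overhead low;
# child spans follow the sampling decision made for their parent.
provider = TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.1)))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # hypothetical service name

def process_checkout(order):
    # Instrument only the critical path (e.g., checkout), not every helper function.
    with tracer.start_as_current_span("process_checkout"):
        ...  # critical-path logic goes here
```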
Synthetic and real user monitoring
- We set up synthetic transactions that simulate user interactions, allowing us to test performance from multiple global locations in a controlled manner, identify issues before users are impacted, and refine performance baselines (see the sketch after this list).
- We leverage real user monitoring (RUM) and record user sessions to collect data from user interactions and understand the actual user experience.
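A synthetic transaction can be as simple as a scripted user journey with per-step latency budgets. The sketch below is illustrative only; the base URL, paths, and budget values are hypothetical stand-ins for a real application and its baselines.

```python
import time
import requests

BASE_URL = "https://app.example.com"  # hypothetical application under test
LATENCY_BUDGETS_MS = {"home": 1000, "search": 1500, "product": 1200}  # illustrative budgets

def timed_get(session, path):
    start = time.monotonic()
    response = session.get(BASE_URL + path, timeout=10)
    elapsed_ms = (time.monotonic() - start) * 1000
    return response.status_code, elapsed_ms

def run_synthetic_journey():
    """Walk through a typical user journey and report steps that exceed their budget."""
    violations = []
    with requests.Session() as session:
        steps = [("home", "/"), ("search", "/search?q=shoes"), ("product", "/products/123")]
        for step, path in steps:
            status, elapsed_ms = timed_get(session, path)
            if status != 200 or elapsed_ms > LATENCY_BUDGETS_MS[step]:
                violations.append((step, status, round(elapsed_ms)))
    return violations

if __name__ == "__main__":
    print(run_synthetic_journey())
```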
Automated alerts and anomaly detection
- We establish alerts for critical KPIs like application latency, transaction errors, and database connection failures to prevent performance degradation from impacting user experience.
- We use custom alerts and anomaly detection thresholds to immediately flag potential issues based on historical performance trends. These automated alerts help detect and react to unusual patterns, such as spikes in error rates or sudden increases in resource usage. We design alerting systems to minimize false positives, understanding that if the number of “spam” alerts is overwhelming, teams may start ignoring them, potentially missing critical issues.
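As a simplified illustration of anomaly detection against historical trends, the sketch below flags a metric sample that deviates too far from its recent history. Production setups typically rely on the anomaly detection built into tools like Datadog or Dynatrace; the sample values and three-sigma rule here are illustrative.

```python
from statistics import mean, stdev

def is_anomalous(history, latest, sigmas=3.0):
    """Flag a sample that deviates more than `sigmas` standard deviations
    from its recent history (e.g., the last hour of error-rate readings)."""
    if len(history) < 10:          # not enough data to judge
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) > sigmas * sigma

# Illustrative use: error rates (%) sampled once per minute.
recent_error_rates = [0.4, 0.5, 0.3, 0.6, 0.4, 0.5, 0.4, 0.3, 0.5, 0.4, 0.6]
print(is_anomalous(recent_error_rates, latest=2.8))   # True: sudden spike
print(is_anomalous(recent_error_rates, latest=0.5))   # False: within the normal range
```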
Why we keep alerting thresholds dynamic
We adapt thresholds to variable system loads to ensure that our teams focus on actual performance issues that impact the UX or system reliability. For example, during off-peak hours, system resource usage might naturally decrease, so we adjust thresholds to a higher tolerance, reducing unnecessary alerts. Conversely, during peak hours or high-traffic events (e.g., Black Friday sales), we tighten thresholds to catch small performance dips that would otherwise go unnoticed.
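In its simplest form, a dynamic threshold can be derived from the current traffic profile. The sketch below is purely illustrative; the baseline value, multipliers, and off-peak window are hypothetical and would be tuned per application.

```python
from datetime import datetime, timezone

# Illustrative baseline: alert when p95 latency exceeds 800 ms under normal load.
BASE_P95_LATENCY_MS = 800

def current_latency_threshold_ms(now=None, high_traffic_event=False):
    """Return the alerting threshold adjusted for the current traffic profile."""
    now = now or datetime.now(timezone.utc)
    if high_traffic_event:             # e.g., Black Friday: catch even small dips
        return BASE_P95_LATENCY_MS * 0.75
    if 1 <= now.hour < 6:              # off-peak hours: tolerate more variance
        return BASE_P95_LATENCY_MS * 1.5
    return BASE_P95_LATENCY_MS

# Example: during a flagged high-traffic event, the threshold tightens to 600 ms.
print(current_latency_threshold_ms(high_traffic_event=True))
```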
Diagnostics and troubleshooting
- We leverage distributed tracing to identify the root causes of slowdowns or bottlenecks in complex, microservices-based architectures. This helps pinpoint which specific services or dependencies need attention (see the tracing sketch after this list).
- We implement log management solutions, such as ELK Stack (Elasticsearch, Logstash, Kibana) or Splunk, which aggregate and analyze logs across the stack, enabling efficient diagnostics and higher visibility.
- We implement automated corrective actions to ensure systems can recover from common failures on their own. For instance, we configure load balancers to reroute traffic away from failing instances and use auto-scaling groups to adjust resource allocation based on demand.
- We set clear response and resolution times for handling performance issues and create comprehensive runbooks with step-by-step guides for diagnosing and resolving typical problems. Additionally, we establish on-call schedules, ensuring that skilled personnel are available to promptly address any performance incidents.
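To show what distributed tracing looks like at the code level, here is a minimal sketch with the OpenTelemetry Python SDK. The service and span names are hypothetical, and the console exporter stands in for a real tracing backend (in practice, an OTLP exporter pointed at the chosen tool would be used).

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("order-service")  # hypothetical service name

def place_order(order_id):
    # Nested spans make it visible which step of the request is slow.
    with tracer.start_as_current_span("place_order") as span:
        span.set_attribute("order.id", order_id)
        with tracer.start_as_current_span("reserve_inventory"):
            ...  # call to the inventory service
        with tracer.start_as_current_span("charge_payment"):
            ...  # call to the payment provider

place_order("A-1001")
```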
Capacity management and optimization
- By regularly reviewing infrastructure utilization data, we ensure that resources aren’t under- or over-allocated (see the sketch after this list). This includes scaling compute instances, optimizing storage, and tuning network settings as necessary.
- We schedule periodic performance reviews to analyze resource consumption trends, enabling informed decisions on infrastructure scaling to ensure efficiency and cost control. This is especially important for the cloud, where misconfigured and inappropriate services can not only lead to suboptimal performance but also quickly escalate cloud costs.
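As a simplified example of such a utilization review, the sketch below pulls a week of CPU statistics for an EC2 instance via boto3 and flags instances that look over- or under-provisioned. The instance ID and sizing thresholds are hypothetical, and the code assumes configured AWS credentials.

```python
from datetime import datetime, timedelta, timezone
import boto3  # assumes AWS credentials are configured in the environment

cloudwatch = boto3.client("cloudwatch")

def weekly_avg_cpu(instance_id):
    """Average CPU utilization of an EC2 instance over the last 7 days."""
    end = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=7),
        EndTime=end,
        Period=3600,                 # one datapoint per hour
        Statistics=["Average"],
    )
    datapoints = stats["Datapoints"]
    return sum(dp["Average"] for dp in datapoints) / len(datapoints) if datapoints else None

# Illustrative sizing check: flag instances that may need resizing.
for instance in ["i-0123456789abcdef0"]:          # hypothetical instance ID
    avg = weekly_avg_cpu(instance)
    if avg is not None and (avg < 15 or avg > 80):
        print(f"{instance}: review sizing, weekly average CPU is {avg:.1f}%")
```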
Disaster recovery
- We develop tailored disaster recovery strategies with clear recovery time and recovery point objectives (RTO and RPO).
- We conduct quarterly or semiannual disaster drills, including unannounced tests, to validate migration and switchover procedures.
- We maintain detailed logs of all recovery activities for ongoing audits and process improvements.
Continuous improvement and feedback loops
- We continuously optimize code and configurations based on monitoring insights and evolving usage patterns.
- We keep third-party libraries and dependencies updated to benefit from performance improvements in their new versions.
- We create tailored dashboards and reports for key stakeholder groups to keep them informed about the performance and health of their applications, past incidents, performance considerations, trade-offs, and optimizations.
Sample Performance Benchmarks We Set Based on App Type
Web portals
- Page load time: under 1 sec.
- Time to First Byte (TTFB): under 200 ms.
- Requests per second (RPS): from 100 to 1,000 RPS for standard apps; up to 100,000 RPS for high-traffic applications.
- Error rate: below 1% (0.1% or lower for critical applications).
- Database query performance: under 100 ms; under 1 ms for frequently used queries.
Ecommerce applications
- Page load time: under 2 sec even during high traffic; checkout pages under 1.5 sec.
- Transaction processing time: under 3 sec.
- Search query response time: under 1 sec, even with large product databases.
- Uptime: 99.9% or higher.
Financial applications
- Transaction processing time: under 10 ms for high-frequency trading apps; under 350 ms for real-time payment processing solutions.
- Latency: under 0.5 ms for high-frequency trading apps; under 50 ms for real-time payment processing solutions.
- Transactions per second (TPS): up to 1.5M TPS.
- Data consistency and accuracy: 100% data integrity.
- Uptime: 99.99% or higher.
ERP systems
- Transaction response time: critical transactions (e.g., order processing, invoice generation) respond within 1–2 sec.
- Concurrent user access: handling hundreds to thousands of concurrent users.
- Data processing throughput: millions of records per hour.
- Report generation: standard reports are generated within 2–5 sec; complex reports within 30 sec.
How We Engineer Applications for Optimal Performance
We integrate APM into every stage of our software development lifecycle, from initial planning and design through deployment.
Gathering app performance requirements
We engage all key stakeholder groups to gather accurate and achievable performance requirements:
- We interview business teams about expected response times, user experience goals, throughput targets, and overall business objectives.
- With technical stakeholders, we discuss scalability, concurrency levels, resource utilization, and the technical feasibility of meeting these performance metrics.
- To establish realistic and competitive performance standards, we also analyze industry benchmarks and evaluate competitors.
- We align performance requirements with budget limitations, ensuring optimal speed and availability without overspending. When necessary, we refine performance targets or evaluate alternative approaches to achieve the best results without overstepping financial boundaries.
Finally, we document all performance-related requirements alongside functional ones and establish clear KPIs to support accountability and transparency.
Designing a high-performing app architecture
We architect applications to align with the chosen performance requirements by:
- Segmenting the app into services, components, and layers and ensuring optimal communication methods and separation of concerns between them. While microservices are often praised for enabling high-performance, scalable systems, we recognize that a well-crafted monolithic architecture can sometimes be a more fitting solution, offering minimal latency and the highest data integrity (e.g., for banking apps).
- Selecting high-performing technologies like ASP.NET, Spring Boot, Node.js, and fast databases such as Redis and PostgreSQL. Sometimes, we combine several technologies for more efficiency, e.g., by employing Go or C++ for specific performance-intensive tasks or multiple types of databases to optimize different parts of the app.
- Planning caching mechanisms at various layers — client-side, server-side, and database level — to reduce load times and server processing demands (see the caching sketch after this list).
- Balancing security and performance. Security features — such as encryption, authentication, and access control — often add processing time, use up CPU cycles, and increase memory consumption. We solve this by limiting encryption to critical areas, using lightweight protocols for frequent authentications, and implementing less intrusive monitoring during low-risk periods.
- Planning the infrastructure for peak load handling by leveraging automatic cloud resource scaling and failover strategies.
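As one example of a server-side caching layer, here is a minimal read-through cache sketch using Redis via the redis-py client. The key format, TTL, and connection settings are illustrative assumptions, and `load_from_db` stands in for whatever expensive data access the cache protects.

```python
import json
import redis  # assumes a Redis instance is reachable at the given host and port

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 300  # illustrative: keep product data for 5 minutes

def get_product(product_id, load_from_db):
    """Read-through cache: serve from Redis when possible, fall back to the database."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    product = load_from_db(product_id)                 # expensive database call
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(product))
    return product
```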
At the outset, we create prototypes and proofs of concept (PoCs) to validate performance assumptions, using the insights gained to refine both architecture and design pattern choices.
We then map the application’s architecture and dependencies, visualizing component interactions and analyzing the potential impacts of changes or failures on performance. With this data, our project managers and solution architects identify performance risks and develop mitigation strategies.
Programming for maximum efficiency
The main way to secure stable app performance is by writing efficient code. At ScienceSoft, we achieve this via:
- Following established coding standards and practices to deliver clean and concise code.
- Selecting business logic algorithms that meet core business objectives with maximum efficiency, considering execution speed, memory usage, and scalability.
- Applying precise thread management. For example, we can use synchronization mechanisms to ensure that only one thread accesses a shared resource at a time. To avoid deadlocks, we use techniques like lock hierarchies or timeout locks (see the sketches after this list).
- Utilizing resources efficiently. We optimize database queries, minimize network requests, reduce memory usage, and ensure timely disposal of objects and connection closure to prevent leaks.
- Designing integrations in a way that minimizes performance impacts. For connections with internal and external services, we employ strategies such as request batching, caching, asynchronous processing, load balancing, data compression, and rate limiting and throttling.
- Implementing circuit breakers, retries, and fallbacks to handle unexpected failures.
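To make the thread-management point concrete, here is a minimal Python sketch that combines a fixed lock-acquisition order with timeouts so a stuck thread fails fast instead of deadlocking; the transfer scenario and the two-second timeout are purely illustrative.

```python
import threading

# Always acquire locks in a fixed order (accounts before ledger) to avoid deadlocks,
# and use timeouts so a blocked thread fails fast instead of waiting forever.
accounts_lock = threading.Lock()
ledger_lock = threading.Lock()

def transfer(accounts, ledger, src, dst, amount, timeout=2.0):
    if not accounts_lock.acquire(timeout=timeout):
        raise TimeoutError("could not lock accounts")
    try:
        if not ledger_lock.acquire(timeout=timeout):
            raise TimeoutError("could not lock ledger")
        try:
            accounts[src] -= amount
            accounts[dst] += amount
            ledger.append((src, dst, amount))
        finally:
            ledger_lock.release()
    finally:
        accounts_lock.release()
```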
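And as a simplified illustration of the circuit breaker pattern from the last list item, the sketch below stops calling a failing dependency for a cool-down period and serves a fallback instead. In practice, a mature resilience library or a service-mesh feature would typically be used rather than hand-rolled code; the thresholds here are hypothetical.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after repeated failures, short-circuit calls to a
    dependency for a cool-down period and return a fallback value instead."""

    def __init__(self, max_failures=3, reset_after_s=30.0):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, fallback=None, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                return fallback                      # circuit open: skip the call entirely
            self.opened_at = None                    # cool-down elapsed: try again
            self.failures = 0
        try:
            result = func(*args, **kwargs)
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback

# Illustrative use with a hypothetical recommendations service:
# breaker = CircuitBreaker()
# items = breaker.call(fetch_recommendations, user_id, fallback=[])
```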
Performance testing
To ensure that applications meet performance expectations, we employ a comprehensive suite of load, stress, spike, endurance, and performance regression tests. To get realistic results, we populate test databases with data that mirrors production scenarios and use environments that closely replicate production settings, including hardware, software configurations, and network conditions.
Test automation is our primary approach for performance checks; however, we also utilize manual testing where automation is not feasible or when a more exploratory approach is needed. We integrate automated performance tests into CI/CD pipelines for continuous monitoring and early detection of issues and create reusable performance test scripts to enable ongoing performance monitoring even after the software is rolled out.
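For illustration, an automated load test can be as compact as the Locust sketch below; the endpoints, task weights, and think times are hypothetical and would be tailored to the application and load profile under test.

```python
# Minimal Locust load test (run with: locust -f loadtest.py --host https://staging.example.com)
from locust import HttpUser, task, between

class ShopUser(HttpUser):
    wait_time = between(1, 3)   # simulated think time between user actions

    @task(3)
    def browse_catalog(self):
        self.client.get("/products")          # hypothetical endpoint

    @task(1)
    def view_product(self):
        self.client.get("/products/123")      # hypothetical endpoint
```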
Performance-focused deployment strategies
After the app is launched, we maintain a seamless user experience while safely deploying new code into production thanks to the following strategies:
- Dark launches
New features are deployed to production but kept invisible to end users. Developers observe the feature’s impact on system performance without risking user experience.
- Canary releases
New features are gradually rolled out to small subsets of users to monitor performance and detect issues before broader deployment.
- Feature flags
Flagged features can be turned on or off instantly, without redeployment, which lets us test the performance of specific components under varying loads and quickly disable features that negatively impact performance (see the sketch after this list).
- Blue-green deployments
We maintain two identical environments. At any given time, only one of them (“blue”) is live and serving users, while the other one (“green”) is idle. A new version of the app is deployed to the “green” environment for testing. If testing is successful, traffic is switched from the “blue” environment to the “green” one.
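To illustrate the feature flag mechanism referenced above, here is a minimal Python sketch that gates a new code path behind an environment-based flag. Real projects would more often use a dedicated flag management service with runtime updates; the flag and function names are hypothetical.

```python
import os

def feature_enabled(name, default=False):
    """Read a feature flag from the environment. A production setup would more likely
    use a flag service that supports runtime updates and per-user targeting."""
    return os.getenv(f"FEATURE_{name.upper()}", str(default)).lower() in ("1", "true", "yes")

def render_legacy_recommendations(product_id):
    return f"legacy recommendations for {product_id}"

def render_new_recommendations(product_id):
    return f"new recommendations engine for {product_id}"

def product_page(product_id):
    # The new code path can be switched on for a test and off again instantly,
    # without redeploying the application.
    if feature_enabled("new_recommendations"):
        return render_new_recommendations(product_id)
    return render_legacy_recommendations(product_id)

print(product_page("sku-123"))
```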
Regardless of the deployment strategy, we enable automated rollbacks to quickly revert to previous stable versions if performance issues are detected.