May 14, 2026

High Availability in Production: What Running at Scale Actually Requires

I lead the Onboarding, Identity and Access Platform at Moniepoint. Every session on Moniepoint Personal and Business Banking products, across Web, Mobile, POS, and USSD, starts with a call to one of our services, which makes high availability and scalability the problems I think about every day. This article is about how we approach them.

Measuring Availability and Scalability

Two properties define whether a platform survives production:

Availability is the percentage of time a system is operational and accessible to users.
Scalability is the system's ability to handle increasing load (more transactions, more users, more data) without performance degradation.

For a platform serving over 10 million businesses and individuals, a 0.1% availability dip is not a line on a dashboard. It is thousands of transactions that did not go through, and thousands of customers who lost trust in that moment.

How we measure availability

We target 99.99% uptime on all critical services. To measure this, every microservice exposes a health endpoint, and we register that endpoint in GCP Uptime Checks. The uptime check polls it from multiple regions, and we alert on any sustained failure.
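To make the target concrete, it helps to convert the percentage into a downtime budget. The sketch below is plain arithmetic, not a description of our tooling; the function name and the 30-day and 365-day windows are illustrative choices.

```python
# Downtime budget implied by an availability target (pure arithmetic).
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_pct, period_days):
    """Minutes of downtime allowed in `period_days` at the given availability target."""
    return (1 - availability_pct / 100) * period_days * MINUTES_PER_DAY


for target in (99.9, 99.99):
    monthly = downtime_budget_minutes(target, 30)
    yearly = downtime_budget_minutes(target, 365)
    print(f"{target}% -> {monthly:.2f} min per 30 days, {yearly:.2f} min per 365 days")
```

At 99.99%, the budget is about 4.3 minutes per 30 days, or roughly 52.6 minutes per year, which is why detection and recovery speed matter so much.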

From these checks and the incidents they surface, we derive two numbers that drive every incident review:

MTTD (Mean Time to Detect): How long it takes our monitors to catch an anomaly after it starts, ideally before a customer notices.
MTTR (Mean Time to Resolve): How fast we restore service once an incident is detected.

Figure 1: Uptime trajectories for a Tier 1 microservice.
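As a minimal sketch of how MTTD and MTTR can be computed from incident records: the `Incident` fields and the two sample incidents below are made up for illustration and are not real incident data. MTTR is measured from detection to restoration, matching the definition above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started_at: datetime   # when the fault actually began
    detected_at: datetime  # when a monitor or alert first fired
    resolved_at: datetime  # when service was restored


def mean(deltas):
    """Average of a non-empty list of timedeltas."""
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    return mean([i.detected_at - i.started_at for i in incidents])


def mttr(incidents):
    return mean([i.resolved_at - i.detected_at for i in incidents])


# Illustrative records only.
incidents = [
    Incident(datetime(2026, 1, 5, 10, 0), datetime(2026, 1, 5, 10, 2), datetime(2026, 1, 5, 10, 20)),
    Incident(datetime(2026, 2, 9, 14, 0), datetime(2026, 2, 9, 14, 4), datetime(2026, 2, 9, 14, 16)),
]

print("MTTD:", mttd(incidents))
print("MTTR:", mttr(incidents))
```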

Tip: Default Spring Boot Actuator health checks are unpredictable in production. They pull in dependencies (DB, cache, message broker) whose transient failures can unnecessarily flap a service out of rotation. We always implement a custom health check we fully control, so we can fine-tune the logic separately for startup, readiness, and liveness probes.
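One way to keep transient dependency failures from flapping a service is to separate the probes and require several consecutive failures before reporting "not ready". The sketch below is framework-free Python rather than Spring Boot code, and it is an assumption about what such logic could look like, not our actual implementation; `ReadinessProbe`, `liveness`, and the threshold of 3 are hypothetical.

```python
from typing import Callable


class ReadinessProbe:
    """Reports not-ready only after `threshold` consecutive failed dependency checks."""

    def __init__(self, check: Callable[[], bool], threshold: int = 3):
        self.check = check
        self.threshold = threshold
        self.consecutive_failures = 0

    def is_ready(self) -> bool:
        try:
            ok = self.check()
        except Exception:
            ok = False  # a check that throws counts as one failure
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        return self.consecutive_failures < self.threshold


def liveness() -> bool:
    # Liveness deliberately ignores dependencies: restarting the process
    # does not fix a database or cache that is down.
    return True


# Simulated dependency check results: one blip, then a sustained failure.
results = iter([True, False, False, True, False, False, False])
probe = ReadinessProbe(lambda: next(results))
states = [probe.is_ready() for _ in range(7)]
print(states)
```

In this run the short blip never takes the service out of rotation; only the third consecutive failure does.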

How we measure scalability

Scalability is harder to measure passively. We measure it by running load tests against expected and projected traffic, and tracking latency (p95, p99) and error rate as throughput increases. A service is "scalable" for us when we can show the latency curve stays flat up to 2x our peak traffic.

Figure 2: Latency consistency relative to increased throughput.
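The "flat up to 2x peak" criterion can be expressed as a simple check over load test results. This is a sketch under stated assumptions: percentiles use the nearest-rank method, "flat" is taken to mean p99 stays within 20% of the lowest-load baseline, and the traffic and latency figures are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def is_flat(p99_by_rps, peak_rps, tolerance=1.2):
    """True if the test reached 2x peak and p99 never exceeded tolerance * baseline."""
    loads = sorted(r for r in p99_by_rps if r <= 2 * peak_rps)
    if not loads or loads[-1] < 2 * peak_rps:
        return False  # the load test never reached 2x peak, so nothing is proven
    baseline = p99_by_rps[loads[0]]
    return all(p99_by_rps[r] <= tolerance * baseline for r in loads)


latencies_ms = list(range(1, 101))  # stand-in for measured request latencies
print(percentile(latencies_ms, 95), percentile(latencies_ms, 99))

# Hypothetical p99 latency (ms) observed at each requests-per-second step.
print(is_flat({500: 40.0, 1000: 42.0, 2000: 45.0}, peak_rps=1000))
print(is_flat({500: 40.0, 1000: 42.0, 2000: 90.0}, peak_rps=1000))
```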

Key Principles

High availability and scalability start at design time. A poorly architected or poorly configured system will cause sleepless nights in production. To achieve both, we adhere to the key principles below:

1. Build For Failure

Services must be stateless and horizontally scalable, and observability (metrics, logs, alerts) must be live at launch rather than added later. Failure modes are a design input, so we define expected failure scenarios during the RFC stage and build the metrics, logs, and alerts to cover them before the service ships.

2. Configure For Scale

Default configurations will not survive real throughput. Every team documents the non-functional requirements of their service (expected throughput, latency SLOs), and sizes every queue, connection pool, thread pool, and timeout from back-of-envelope calculations grounded in those numbers. A useful reference on this is Queuing Theory for Software Engineers; you can also check out this video by our VP of Research & Development: Moniepoint Research Talks 7.0: The Mathematics of Software Engineering.

Figure 3: Configuration for Redis connection pool.
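A typical back-of-envelope calculation for a connection pool uses Little's Law: the average number of connections in use equals the request rate multiplied by the time each request holds a connection. The sketch below assumes hypothetical numbers (2,000 requests per second, 5 ms hold time) and a hypothetical 1.5x headroom factor; it does not reproduce the configuration in the figure.

```python
import math


def pool_size(peak_rps, hold_time_ms, headroom=1.5):
    """Little's Law (L = lambda * W) with a safety margin for bursts."""
    in_use = peak_rps * hold_time_ms / 1000  # average connections busy at peak
    return math.ceil(in_use * headroom)


# 2000 req/s * 0.005 s = 10 connections busy on average; 1.5x headroom gives 15.
print(pool_size(peak_rps=2000, hold_time_ms=5))
```

The same formula applies to thread pools and queue depths: substitute the time a request occupies a thread or waits in the queue.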

3. Apply Resiliency Patterns

A retry storm will take down a healthy service faster than the original incident. Every inter-service call on our platform uses circuit breakers to prevent cascading failures, exponential backoff with jitter on retries, and fallbacks when a dependency degrades. Applying the patterns is only half the job; they must also be configured properly. The same discipline from Principle 2 applies: circuit breaker thresholds, retry counts, backoff windows, and fallback timeouts all need to be sized from the service's real throughput and latency profile, not left at library defaults.

Figure 4: Circuit Breaker configuration for a dependency.
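To illustrate two of these patterns together, here is a minimal, framework-free sketch of exponential backoff with full jitter and a consecutive-failure circuit breaker. It is not the library or configuration we run in production; the class, parameter names, and default values are hypothetical and would need to be sized from real traffic as described above.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=5.0):
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


class CircuitBreaker:
    """Opens after N consecutive failures; allows one trial call after a timeout."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fail fast without touching the dependency
            # timeout elapsed: half-open, let this one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


# Demo with a fake clock so the behaviour is deterministic.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
calls = []


def flaky():
    calls.append(1)
    raise ConnectionError("dependency down")


for _ in range(4):
    breaker.call(flaky, fallback=lambda: "cached")
print("calls that reached the dependency:", len(calls))
```

In the demo, only the first two calls reach the failing dependency; the breaker then serves the fallback until the reset timeout elapses.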

4. Prove It

Performance claims need evidence, and intuition is not evidence. Every non-trivial change must pass automated tests (unit, integration, and E2E) before being merged, and every new service must pass a documented load test before going to production.

Figure 5: Analysis of load test outcomes for a microservice.

Conclusion

Building highly available and scalable systems is difficult, and the target moves every quarter. The principles do not. Measure everything. Design before you build. Size your configs to the real load. Defend against retry storms. Contain failures when they happen.

Our goal is simple: deliver a seamless experience to over 10 million businesses and individuals, and hit 99.99% availability while doing it. We are constantly learning, constantly measuring, and constantly refining the way we build.

If you'd like to help us meet this goal, check our careers page for open roles.
