SRE Golden Signals: How to Apply Google's Monitoring Framework to Your API in 2026
In 2016, Google's Site Reliability Engineering book introduced a framework that fundamentally changed how the industry thinks about monitoring. The "four golden signals" — latency, traffic, errors, and saturation — give any team a principled answer to the question "what should we be monitoring?"
Ten years later, the framework is more relevant than ever. But most teams that have read the SRE book are not applying the golden signals in practice. They're monitoring things that are easy to instrument (server CPU, memory, HTTP status codes) rather than things that matter for user-facing reliability.
This guide is the practical implementation playbook: how to instrument all four golden signals for a production API, what alerts to set, and what you learn from each signal that the others won't tell you.
What Are the 4 Golden Signals?
The four golden signals are:
- Latency — how long it takes to service a request (and critically: the latency of failed requests vs successful requests)
- Traffic — how much demand is being placed on your system
- Errors — the rate of requests that are failing (explicitly or implicitly)
- Saturation — how "full" your service is, with a focus on the resource that's closest to capacity
Google's insight was that these four signals, measured together, give you enough information to detect virtually any production problem. They're not the only things to monitor — but they're the minimum viable monitoring set for a service you care about.
Signal 1: Latency
What to measure
Latency has two important dimensions:
- Latency of successful requests (your 200 OK responses)
- Latency of failed requests (your 4XX and 5XX responses)
The SRE book specifically calls out the importance of measuring both. A service that fails fast (returns 500 in 2ms) is behaving very differently from one that fails slow (returns 500 after 28 seconds of timeout). The difference matters for user experience and for diagnosing the root cause.
The percentile trap
Averages hide the tail, which makes them useless for latency monitoring. A service with a P50 of 150ms and a P99 of 12 seconds can still report an average of a few hundred milliseconds, which looks "fine", while the slowest 1% of users are experiencing a broken service.
Always instrument latency as percentiles:
- P50 (median): What the typical user experiences
- P95: What the upper-normal user experiences
- P99: What your slowest 1% experience
- P99.9: Useful for catching timeout storms and edge cases
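To see how an average can mask the tail, here is a minimal sketch that computes the mean and nearest-rank percentiles over a synthetic sample. The `percentile` and `mean` helpers and the sample data are illustrative assumptions; in production a metrics library or backend would normally compute percentiles for you.

```javascript
// Nearest-rank percentile over an array of latency samples (milliseconds).
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

function mean(samples) {
  return samples.reduce((sum, v) => sum + v, 0) / samples.length;
}

// Synthetic sample (assumption): 980 fast requests and 20 that hit a 12-second timeout.
const latenciesMs = [
  ...Array(980).fill(150),
  ...Array(20).fill(12000),
];

const summary = {
  mean: mean(latenciesMs),
  p50: percentile(latenciesMs, 50),
  p95: percentile(latenciesMs, 95),
  p99: percentile(latenciesMs, 99),
};
console.log(summary);
```

In this sample the mean, P50, and P95 all stay under 400ms, and only the P99 exposes the 12-second requests.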
Practical implementation
For a Node.js Express API:
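const express = require('express');
const responseTime = require('response-time');
const client = require('prom-client'); // or any metrics library

const app = express();

// Buckets are in seconds. The 30s bucket keeps slow timeouts (over 10s) measurable.
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30]
});

// response-time reports the elapsed time in milliseconds.
app.use(responseTime((req, res, time) => {
  httpRequestDuration
    // Prefer the route pattern; raw paths containing IDs inflate label cardinality.
    .labels(req.method, req.route?.path || req.path, res.statusCode)
    .observe(time / 1000); // convert ms to seconds
}));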
Alert thresholds
Set alerts based on your current P99 baseline, not on arbitrary values:
Alert: P99 latency > 3x baseline for 5 consecutive minutes
Critical: P99 latency > 10x baseline for 2 consecutive minutes
Alert: P99 latency of FAILED requests > 10 seconds (timeout cascade warning)
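The two baseline-relative rules above can be sketched as a small evaluation function. This is an illustration under stated assumptions, not a specific alerting tool's API: it assumes you already have one P99 value per minute (oldest first) and a known baseline, and the names `sustainedBreach` and `classifyLatency` are hypothetical.

```javascript
// True when the most recent `minutes` samples all exceed baselineMs * multiplier.
// `p99PerMinute` holds one P99 value (ms) per minute, oldest first.
function sustainedBreach(p99PerMinute, baselineMs, multiplier, minutes) {
  if (p99PerMinute.length < minutes) return false; // not enough data yet
  const threshold = baselineMs * multiplier;
  return p99PerMinute.slice(-minutes).every((v) => v > threshold);
}

// Applies the critical rule (10x for 2 minutes) before the alert rule (3x for 5 minutes).
function classifyLatency(p99PerMinute, baselineMs) {
  if (sustainedBreach(p99PerMinute, baselineMs, 10, 2)) return 'critical';
  if (sustainedBreach(p99PerMinute, baselineMs, 3, 5)) return 'alert';
  return 'ok';
}
```

Requiring every sample in the window to breach is what "consecutive minutes" means here: a single healthy minute resets the condition.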
For external API monitoring without custom instrumentation, PingSLA's API Deep-Scan measures response time from multiple global regions and reports TTFB breakdown (DNS, TCP, TLS, server processing separately).
Signal 2: Traffic
What to measure
Traffic means requests per second (RPS) — but the useful granularity is requests per minute with a 1-minute rolling window, broken down by:
- Endpoint (your checkout endpoint vs your health check endpoint have very different traffic patterns)
- Request type (reads vs writes)
- Authenticated vs unauthenticated
Traffic monitoring answers a different question than latency or errors: is this change in behavior expected? A 50% traffic drop might mean your site is down. It might mean your marketing campaign ended. Traffic monitoring gives you the context to interpret other signals.
The two traffic alerts that actually matter
1. Traffic drop alert:
Alert: RPS drops by more than 40% from the same 1-hour window 7 days ago, sustained for 10+ minutes
Traffic drops are often more valuable than error rate alerts for detecting frontend failures. If users can't load your site at all, your server receives zero requests — your error rate stays at 0% because there are no requests to fail. Traffic monitoring catches this.
2. Traffic spike alert:
Alert: RPS increases by more than 5x baseline for 3+ consecutive minutes
Spike detection gives you advance warning of saturation before it causes errors.
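Both traffic rules compare current RPS against a reference, so they fit in a few lines. The sketch below assumes you can supply per-minute RPS arrays (oldest first) for the current window and for the same window 7 days ago; the function names and default parameters are illustrative, taken from the thresholds above.

```javascript
// True when every one of the last `minutes` samples is more than `maxDrop`
// (as a fraction) below the matching sample from the same window last week.
function sustainedDrop(currentPerMin, lastWeekPerMin, minutes = 10, maxDrop = 0.4) {
  if (currentPerMin.length < minutes || lastWeekPerMin.length < minutes) return false;
  const current = currentPerMin.slice(-minutes);
  const reference = lastWeekPerMin.slice(-minutes);
  return current.every((rps, i) => {
    if (reference[i] <= 0) return false; // no reference traffic to compare against
    return (reference[i] - rps) / reference[i] > maxDrop;
  });
}

// True when every one of the last `minutes` samples exceeds baseline * multiplier.
function sustainedSpike(currentPerMin, baselineRps, minutes = 3, multiplier = 5) {
  if (currentPerMin.length < minutes || baselineRps <= 0) return false;
  return currentPerMin.slice(-minutes).every((rps) => rps > baselineRps * multiplier);
}
```

Comparing against the same window a week earlier, rather than the previous hour, keeps normal daily and weekly cycles from triggering the drop alert.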
External traffic monitoring without APM
If you don't have an APM setup, PingSLA's synthetic flow monitoring provides a proxy traffic signal: your scheduled monitor fires at regular intervals, and if the response pattern changes (slower responses, higher error rates), it's an early indicator of traffic-related degradation even before real user traffic shows the problem.
Signal 3: Errors
What counts as an error?
This is more complex than it appears. The SRE book distinguishes:
Explicit failures:
- HTTP 500 Internal Server Error
- HTTP 503 Service Unavailable
- HTTP 429 Too Many Requests (from the client's perspective, their request failed)
Implicit failures:
- HTTP 200 with an error body ({"error": "not found"})
- HTTP 200 with an empty response where content is expected
- HTTP 200 with malformed or incorrect data
- HTTP 200 that takes 28 seconds to respond (success, but too slow to be useful)
Most monitoring tools only catch explicit failures. Implicit failures, the 200s that are wrong, are what most production monitoring misses, and they can cost revenue silently.
Error rate calculation
Error rate = (error requests / total requests) × 100
Set alerts on error rate, not absolute error count. A 2% error rate on a high-traffic service can generate more absolute errors than a 50% error rate on a low-traffic service, but the 50% rate is the catastrophic one.
Alert: Error rate > 1% for 5 consecutive minutes
Critical: Error rate > 5% for 2 consecutive minutes
Alert: Error rate of /checkout endpoint > 0.1% (checkout errors deserve lower thresholds)
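As a rough illustration of the formula, the sketch below counts both explicit and implicit failures before dividing by total requests. The response shape (`status`, `body`, `durationMs`) and the 10-second slowness cutoff are assumptions made for the example; adapt the classification to what "wrong" means for your API.

```javascript
// Decides whether a single response counts as an error.
// slowThresholdMs is an assumed cutoff for "success, but too slow to be useful".
function isErrorResponse({ status, body, durationMs }, slowThresholdMs = 10000) {
  if (status >= 500 || status === 429) return true; // explicit failures
  if (status === 200) {
    if (body == null || body === '') return true; // empty where content is expected
    if (typeof body === 'object' && 'error' in body) return true; // error body
    if (durationMs > slowThresholdMs) return true; // too slow to be useful
  }
  return false;
}

// Error rate = (error requests / total requests) x 100
function errorRatePct(responses) {
  if (responses.length === 0) return 0;
  const errors = responses.filter((r) => isErrorResponse(r)).length;
  return (errors / responses.length) * 100;
}
```

A status-code-only check would report 25% for a batch like one 500, one 200 with an error body, and two healthy 200s; counting the implicit failure gives 50%.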
Implicit error detection with body validation
For implicit failure detection on critical APIs, use schema validation to catch 200s with wrong content.
PingSLA's Schema Validator and API Deep-Scan tools can validate API response bodies against expected schemas, catching:
- Missing required fields
- Type mismatches (string returned where integer expected)
- Empty arrays where populated results are expected
- Malformed JSON
This is the difference between monitoring "is the API responding?" and monitoring "is the API responding correctly?"
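If you want to run the same kind of check in your own code, a hand-rolled validator is enough to show the idea. This is a minimal sketch, not how any particular tool implements it: the `expectedSchema` shape, the field names, and the `validateBody` helper are all hypothetical, and a real project would more likely use a JSON Schema library.

```javascript
// Hypothetical expected shape for a search endpoint response.
const expectedSchema = {
  total: 'number',
  results: 'array',
};

// Returns a list of problems found in a raw response body; an empty list means it passed.
function validateBody(rawBody, schema, nonEmptyArrays = []) {
  let parsed;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return ['malformed JSON'];
  }
  if (parsed === null || typeof parsed !== 'object') return ['body is not a JSON object'];

  const problems = [];
  for (const [field, type] of Object.entries(schema)) {
    const value = parsed[field];
    if (value === undefined) {
      problems.push(`missing field: ${field}`);
      continue;
    }
    const actual = Array.isArray(value) ? 'array' : typeof value;
    if (actual !== type) {
      problems.push(`type mismatch: ${field} is ${actual}, expected ${type}`);
    } else if (type === 'array' && nonEmptyArrays.includes(field) && value.length === 0) {
      problems.push(`empty array: ${field}`);
    }
  }
  return problems;
}
```

Calling `validateBody(body, expectedSchema, ['results'])` on each monitored 200 response lets you count any non-empty result as an implicit error.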
Signal 4: Saturation
What to measure
Saturation is about the resource that's closest to the limit — not all resources equally. Common saturation resources:
- CPU: Useful for compute-bound workloads. Less useful for I/O-bound APIs.
- Memory: Critical for detecting memory leaks. High memory doesn't always mean problems.
- Database connection pool: The most common saturation bottleneck in web APIs. When exhausted, requests queue or fail.
- Thread pool / event loop lag: For Node.js, event loop delay is the saturation signal, not CPU.
- Disk I/O: For log-heavy or file-processing services.
- External API rate limits: If your service calls third-party APIs, their rate limits are your saturation ceiling.
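For the Node.js case in the list above, event loop delay can be sampled with the built-in `perf_hooks` module. The sketch below is one possible wiring, assuming a 100ms mean threshold checked once per minute; the `startEventLoopMonitor` name and the `onLag` callback are illustrative.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// The histogram reports values in nanoseconds.
const NS_PER_MS = 1e6;

// Starts sampling event loop delay and calls onLag when the mean over one
// interval exceeds thresholdMs. Returns a function that stops the monitor.
function startEventLoopMonitor({ thresholdMs = 100, intervalMs = 60000, onLag }) {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();

  const timer = setInterval(() => {
    const meanMs = histogram.mean / NS_PER_MS;
    const p99Ms = histogram.percentile(99) / NS_PER_MS;
    if (meanMs > thresholdMs) onLag({ meanMs, p99Ms });
    histogram.reset(); // start a fresh measurement window
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring

  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

In a service you would call `startEventLoopMonitor({ onLag: (lag) => console.warn('event loop lag', lag) })` at startup and forward the values to your metrics system.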
The saturation signal that most teams miss
Database connection pool exhaustion is the most common and most dangerous saturation failure mode in production APIs, and it's systematically undermonitored.
When your connection pool exhausts:
- New requests queue waiting for a connection
- Latency spikes (P99 goes from 150ms to 15 seconds)
- Eventually request timeouts trigger
- Error rate spikes
- Service appears completely down to users
In a sequence like this, the pool can be exhausted well before the error rate spikes, for example 10 minutes earlier. Monitoring pool utilization would give you that 10-minute head start.
For PostgreSQL:
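-- Client connections (idle or active) as a percentage of max_connections.
-- backend_type requires PostgreSQL 10+; the filter excludes background processes.
SELECT count(*) AS active_connections,
       max_conn,
       (count(*) / max_conn::float * 100) AS utilization_pct
FROM pg_stat_activity,
     (SELECT setting::int AS max_conn FROM pg_settings WHERE name = 'max_connections') mc
WHERE backend_type = 'client backend'
GROUP BY max_conn;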
Alert when utilization exceeds 70%.
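The query above measures connections on the database server. The application's own pool can fill up first, so it is worth checking on that side too. The sketch below assumes a pool object exposing `totalCount`, `idleCount`, and `waitingCount`, which is the shape the node-postgres `Pool` provides; the helper names, and the choice to treat any queued request as a warning, are assumptions.

```javascript
// `pool` is any object with totalCount, idleCount and waitingCount.
// `max` is the configured maximum pool size.
function poolUtilization(pool, max) {
  const inUse = pool.totalCount - pool.idleCount;
  return {
    inUse,
    waiting: pool.waitingCount,
    utilizationPct: (inUse * 100) / max,
  };
}

// Fires above the utilization threshold, or as soon as requests start queuing.
function poolAlert(pool, max, thresholdPct = 70) {
  const { utilizationPct, waiting } = poolUtilization(pool, max);
  return utilizationPct > thresholdPct || waiting > 0;
}
```

Sampling `poolUtilization` on a timer and exporting the result as a gauge gives you the leading indicator described in this section.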
Saturation leading indicators
The golden signals framework uses saturation as a leading indicator — a signal that predicts future problems before they manifest as errors. Good saturation alerts fire at 70–80% utilization, giving you time to act before the service degrades at 95%+.
Alert: DB connection pool utilization > 70% for 5+ minutes
Alert: Event loop delay (Node.js) > 100ms average over 1 minute
Alert: Memory utilization > 80% (with growing trend over 30 minutes)
Critical: Any resource utilization > 90%
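The memory rule combines a threshold with a trend, which is slightly harder to express than a plain threshold. One way to sketch it, assuming one utilization sample per minute and a least-squares slope as the definition of "growing", is shown below; the function names are hypothetical.

```javascript
// Least-squares slope of evenly spaced samples (units per sample interval).
function slope(values) {
  const n = values.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = values.reduce((sum, v) => sum + v, 0) / n;
  let num = 0;
  let den = 0;
  values.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  return num / den;
}

// Alerts only when the latest sample is above the threshold AND the
// last `windowMinutes` samples show an upward trend.
function memoryAlert(utilizationPctPerMinute, thresholdPct = 80, windowMinutes = 30) {
  const recent = utilizationPctPerMinute.slice(-windowMinutes);
  if (recent.length < windowMinutes) return false; // not enough history yet
  const latest = recent[recent.length - 1];
  return latest > thresholdPct && slope(recent) > 0;
}
```

Requiring a positive slope keeps a service that sits steadily at 85% from paging anyone, while a service climbing through 80% does.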
Putting All Four Signals Together: The SLO View
The four golden signals become most useful when combined into Service Level Objectives (SLOs):
Example SLO for a checkout API:
| Signal | SLI (what we measure) | SLO (what we commit to) |
|---|---|---|
| Latency | P99 response time, successful requests | 99% of successful requests complete in < 500ms |
| Traffic | Request rate to /api/checkout | Able to serve up to 500 RPS without degradation |
| Errors | Error rate on /api/checkout | Error rate < 0.5% over any 5-minute window |
| Saturation | DB connection pool utilization | Connection pool utilization < 70% during normal load |
Defining SLOs against each golden signal gives you a structured way to prioritize reliability work: which SLO are you closest to breaching? Which breach would have the largest user impact?
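To answer "which SLO are you closest to breaching?", one simple approach is to express each current measurement as a percentage of its limit and sort. The sketch below uses the limits from the example table; the `current` values are made up for illustration, and `rankByHeadroom` is a hypothetical helper.

```javascript
// Limits come from the example SLO table; current values are invented sample data.
const slos = [
  { signal: 'latency', current: 420, limit: 500 },   // P99 in ms
  { signal: 'traffic', current: 300, limit: 500 },   // RPS against tested capacity
  { signal: 'errors', current: 0.2, limit: 0.5 },    // error rate in %
  { signal: 'saturation', current: 63, limit: 70 },  // pool utilization in %
];

// Sorts SLOs so the one using the largest share of its limit comes first.
function rankByHeadroom(entries) {
  return entries
    .map((e) => ({ ...e, usedPct: (e.current / e.limit) * 100 }))
    .sort((a, b) => b.usedPct - a.usedPct);
}

const ranked = rankByHeadroom(slos);
console.log(ranked.map((e) => e.signal));
```

With this sample data, saturation is nearest its limit and errors are furthest from it, which suggests where reliability work should go first.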
External vs Internal Monitoring
The golden signals framework assumes you're instrumenting your own services internally (metrics, traces, logs). But not every team has the infrastructure for full APM.
External monitoring — where a third party runs health checks against your API from outside — complements internal monitoring in important ways:
| Internal monitoring (APM) | External monitoring (PingSLA) |
|---|---|
| Sees your server's perspective | Sees your users' perspective |
| Detects server-side errors instantly | Detects failures that only appear externally |
| Requires instrumentation code | Zero code — checks your live URLs |
| Blind to network/CDN issues | Catches CDN misconfigurations |
| Can't simulate user journeys | Synthetic flows simulate real user behavior |
The ideal monitoring stack uses both: internal APM for deep server-side observability, external synthetic monitoring for user-perspective health and flow validation.
Starting With Golden Signals Today
If you're starting from zero monitoring, the priority order is:
1. Error rate (today): Set up HTTP monitoring on your critical endpoints. Alert on 5XX rate. Takes 5 minutes with PingSLA.
2. Latency (this week): Add response time thresholds to your HTTP monitors. Alert at 3x baseline. PingSLA reports TTFB breakdown.
3. Traffic (this week): Review your application logs or analytics for baseline RPS. Set traffic drop alerts.
4. Saturation (next month): Instrument database connection pool utilization. Set 70% alert.
For latency and errors combined, the API Deep-Scan tool gives you a free one-time view of your API's health across all four signal dimensions from multiple global regions.
Summary
The four golden signals — latency, traffic, errors, saturation — give you a principled framework for deciding what to monitor. Most teams over-monitor things that are easy to measure (server CPU) and under-monitor things that matter (P99 latency, implicit errors, connection pool saturation, traffic drops).
Starting with the golden signals and gradually tightening your SLO definitions is more valuable than adding more monitoring tools. The goal is fewer, higher-signal alerts — not more dashboards.
Test your API across all four golden signal dimensions with the free API Deep-Scan. No signup required.