
SRE Golden Signals: How to Apply Google's Monitoring Framework to Your API in 2026

PingSLA Team · 11 min read


In 2016, Google's Site Reliability Engineering book introduced a framework that fundamentally changed how the industry thinks about monitoring. The "four golden signals" — latency, traffic, errors, and saturation — give any team a principled answer to the question "what should we be monitoring?"

Ten years later, the framework is more relevant than ever. But many teams that have read the SRE book are not applying the golden signals in practice. They monitor what is easy to instrument (server CPU, memory, HTTP status codes) rather than what matters for user-facing reliability.

This guide is the practical implementation playbook: how to instrument all four golden signals for a production API, what alerts to set, and what you learn from each signal that the others won't tell you.

What Are the 4 Golden Signals?

The four golden signals are:

  1. Latency — how long it takes to service a request (and critically: the latency of failed requests vs successful requests)
  2. Traffic — how much demand is being placed on your system
  3. Errors — the rate of requests that are failing (explicitly or implicitly)
  4. Saturation — how "full" your service is, with a focus on the resource that's closest to capacity

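Google's insight was that these four signals, measured together, cover most production problems that users notice. As the SRE book puts it, if you measure all four and page a human when one of them is problematic (or, for saturation, nearly problematic), your service will be at least decently covered by monitoring. They're not the only things to monitor, but they're the minimum viable monitoring set for a service you care about.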

Signal 1: Latency

What to measure

Latency has two important dimensions:

  • Latency of successful requests (your 200 OK responses)
  • Latency of failed requests (your 4XX and 5XX responses)

The SRE book specifically calls out the importance of measuring both. A service that fails fast (returns 500 in 2ms) is behaving very differently from one that fails slow (returns 500 after 28 seconds of timeout). Mixing the two also skews your numbers: a burst of fast 500s drags overall latency down, so the service looks faster just as it starts failing. The difference matters for user experience and for diagnosing the root cause.

The percentile trap

Averages hide the tail, which makes them a poor fit for latency monitoring. A service with P50 latency of 150ms and P99 latency of 12 seconds can still show a healthy-looking average, while its P99 customers, the slowest 1%, are experiencing a broken service.

Always instrument latency as percentiles:

  • P50 (median): What the typical user experiences
  • P95: What the upper-normal user experiences
  • P99: What your slowest 1% experience
  • P99.9: Useful for catching timeout storms and edge cases
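To see the percentile trap in numbers, here is a minimal JavaScript sketch that computes nearest-rank percentiles from raw latency samples. The traffic mix is hypothetical (985 requests at 150 ms and 15 stuck at 12 seconds), and real systems usually estimate percentiles from histogram buckets rather than sorting every sample:

```javascript
// Nearest-rank percentile: the smallest sample such that at least p of all
// samples are <= it. `p` is a fraction in (0, 1], e.g. 0.99 for P99.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b); // numeric, not lexicographic
  const rank = Math.ceil(p * sorted.length); // 1-based rank
  return sorted[Math.max(0, rank - 1)];
}

function latencySummary(samplesMs) {
  const mean = samplesMs.reduce((sum, x) => sum + x, 0) / samplesMs.length;
  return {
    mean,
    p50: percentile(samplesMs, 0.5),
    p95: percentile(samplesMs, 0.95),
    p99: percentile(samplesMs, 0.99),
    p999: percentile(samplesMs, 0.999),
  };
}

// Hypothetical traffic: 985 requests at 150 ms, 15 requests stuck at 12 s.
const samples = [...Array(985).fill(150), ...Array(15).fill(12000)];
console.log(latencySummary(samples));
```

The mean comes out at about 328 ms, which looks healthy, while P99 is 12,000 ms. Only the percentile view shows that some users are waiting 12 seconds.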

Practical implementation

For a Node.js Express API:

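const express = require('express');
const responseTime = require('response-time');
const client = require('prom-client'); // or any metrics library

const app = express();

// Request duration histogram; bucket boundaries are in seconds
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
});

// response-time passes `time` in milliseconds; convert to seconds.
// Label with the matched route template (e.g. /users/:id) rather than the
// raw path, so unmatched URLs can't create unbounded label values.
app.use(responseTime((req, res, time) => {
  httpRequestDuration
    .labels(req.method, req.route?.path || 'unmatched', res.statusCode)
    .observe(time / 1000);
}));

// Expose metrics for Prometheus to scrape
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});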

Alert thresholds

Set alerts based on your current P99 baseline, not on arbitrary values:

Alert: P99 latency > 3x baseline for 5 consecutive minutes
Critical: P99 latency > 10x baseline for 2 consecutive minutes
Alert: P99 latency of FAILED requests > 10 seconds (timeout cascade warning)
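The alert rules above depend on how long a condition has held, not on a single bad minute. A minimal sketch of that evaluation, assuming you already have one P99 value per minute (newest last) and a baseline you chose yourself; `trailingRun` and `latencyAlertLevel` are hypothetical names:

```javascript
// Count how many of the most recent values satisfy `predicate` in a row.
function trailingRun(values, predicate) {
  let run = 0;
  for (let i = values.length - 1; i >= 0 && predicate(values[i]); i--) run++;
  return run;
}

// p99PerMinute / failedP99PerMinute: per-minute P99 in ms, newest last.
// baselineMs: your normal P99, measured from your own history.
function latencyAlertLevel(p99PerMinute, baselineMs, failedP99PerMinute = []) {
  if (trailingRun(p99PerMinute, (v) => v > 10 * baselineMs) >= 2) return 'critical';
  if (trailingRun(p99PerMinute, (v) => v > 3 * baselineMs) >= 5) return 'alert';
  // Timeout-cascade warning: P99 of failed requests above 10 seconds.
  const lastFailed = failedP99PerMinute[failedP99PerMinute.length - 1];
  if (lastFailed !== undefined && lastFailed > 10_000) return 'alert';
  return 'ok';
}
```

In Prometheus, the same "sustained for N minutes" idea is usually expressed with an alerting rule's `for:` duration.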

For external API monitoring without custom instrumentation, PingSLA's API Deep-Scan measures response time from multiple global regions and breaks time to first byte (TTFB) into DNS lookup, TCP connect, TLS handshake, and server processing.

Signal 2: Traffic

What to measure

Traffic is usually expressed as requests per second (RPS), but for alerting it is most useful as a rate over a rolling 1-minute window, broken down by:

  • Endpoint (your checkout endpoint vs your health check endpoint have very different traffic patterns)
  • Request type (reads vs writes)
  • Authenticated vs unauthenticated

Traffic monitoring answers a different question than latency or errors: is this change in behavior expected? A 50% traffic drop might mean your site is down. It might mean your marketing campaign ended. Traffic monitoring gives you the context to interpret other signals.

The two traffic alerts that actually matter

1. Traffic drop alert:

Alert: RPS drops by more than 40% from the same 1-hour window 7 days ago, sustained for 10+ minutes

Traffic drops are often more valuable than error rate alerts for detecting frontend failures. If users can't load your site at all, your server receives zero requests — your error rate stays at 0% because there are no requests to fail. Traffic monitoring catches this.

2. Traffic spike alert:

Alert: RPS increases by more than 5x baseline for 3+ consecutive minutes

Spike detection gives you advance warning of saturation before it causes errors.
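Both traffic alerts reduce to comparing recent per-minute RPS against a reference and checking how long the condition has held. A minimal sketch, assuming you can fetch per-minute RPS for the current period and for the same hour one week earlier; the function names are hypothetical:

```javascript
// Count how many of the most recent values satisfy `predicate` in a row.
function trailingRun(values, predicate) {
  let run = 0;
  for (let i = values.length - 1; i >= 0 && predicate(values[i]); i--) run++;
  return run;
}

// Drop: RPS more than 40% below the same hour 7 days ago, for 10+ minutes.
// currentRps and sameHourLastWeek are per-minute RPS series, newest last.
function trafficDrop(currentRps, sameHourLastWeek) {
  const reference =
    sameHourLastWeek.reduce((sum, v) => sum + v, 0) / sameHourLastWeek.length;
  return trailingRun(currentRps, (v) => v < 0.6 * reference) >= 10;
}

// Spike: RPS above 5x the baseline for 3+ consecutive minutes.
function trafficSpike(currentRps, baselineRps) {
  return trailingRun(currentRps, (v) => v > 5 * baselineRps) >= 3;
}
```

Because a complete outage produces zero requests, `trafficDrop` fires in exactly the case where an error-rate alert stays silent.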

External traffic monitoring without APM

If you don't have an APM setup, PingSLA's synthetic flow monitoring gives you an indirect traffic signal. It does not count your real traffic, but your scheduled monitor fires at regular intervals, and if its response pattern changes (slower responses, higher error rates), that can be an early indicator of traffic-related degradation before real users report the problem.
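At its core, a synthetic check is a timed request fired on a schedule. A minimal sketch for Node.js 18 or later (which ships a global `fetch`); `probeOnce` is a hypothetical helper, not how PingSLA implements its monitors:

```javascript
// One timed GET against `url`. Reads the whole body so the timing covers
// the full response, and gives up after `timeoutMs`.
async function probeOnce(url, timeoutMs = 5000) {
  const start = performance.now();
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    await res.arrayBuffer();
    return { ok: res.ok, status: res.status, ms: performance.now() - start };
  } catch (err) {
    // Network errors and timeouts count as failures with no HTTP status.
    return { ok: false, status: null, ms: performance.now() - start, error: err.name };
  }
}

// Run it on a schedule and feed the results into your latency and error alerts:
// setInterval(async () => console.log(await probeOnce('https://example.com/health')), 60_000);
```

Recording `ms` and `ok` for every run gives you a steady external baseline, so a shift in either stands out even when real traffic is low.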

Signal 3: Errors

What counts as an error?

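This is more complex than it appears. The SRE book distinguishes requests that fail explicitly, implicitly (a success response with the wrong content), and by policy (for example, if you committed to one-second response times, any slower request counts as an error):

Explicit failures:

  • HTTP 500 Internal Server Error
  • HTTP 503 Service Unavailable
  • HTTP 429 Too Many Requests (from the client's perspective, their request failed)

Implicit failures:

  • HTTP 200 with an error body ({"error": "not found"})
  • HTTP 200 with an empty response where content is expected
  • HTTP 200 with malformed or incorrect data

Failures by policy:

  • HTTP 200 that takes 28 seconds to respond (it "succeeded," but too slowly to be useful, so it counts as an error)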

Checks that look only at status codes catch explicit failures and nothing else. Implicit failures, 200s that are wrong, slip past them, which is why they can cost revenue silently: nothing alerts until customers complain.
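A sketch of how a checker might sort responses into these buckets. It assumes JSON APIs where an empty payload is never valid and where a top-level `error` field signals failure; it also covers the SRE book's third category, failure by policy, for a 2xx that breaches a latency objective you pick (`latencyObjectiveMs` is an assumed parameter):

```javascript
// Classify one HTTP response. `body` is the raw response text.
function classifyResponse({ status, body, latencyMs }, latencyObjectiveMs = 1000) {
  if (status >= 500 || status === 429) return 'explicit';
  if (status >= 200 && status < 300) {
    let parsed;
    try {
      parsed = JSON.parse(body);
    } catch {
      return 'implicit'; // malformed or empty JSON behind a 2xx
    }
    if (parsed === null || (typeof parsed === 'object' && Object.keys(parsed).length === 0)) {
      return 'implicit'; // empty payload where content is expected
    }
    if (typeof parsed === 'object' && 'error' in parsed) return 'implicit'; // error body
    if (latencyMs > latencyObjectiveMs) return 'policy'; // correct but too slow
    return 'ok';
  }
  return 'other'; // e.g. most 4xx: decide per endpoint whether they count
}
```

Whether other 4xx responses count against you is a per-endpoint decision; a 404 from a typo'd URL is usually the client's problem, while a 404 on a product page your own site links to is not.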

Error rate calculation

Error rate = (error requests / total requests) × 100

Set alerts on error rate, not absolute error count. A 2% error rate on a high-traffic service can generate more absolute errors than a 50% error rate on a low-traffic service, but the 50% rate is the catastrophic one.
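The formula and the high-traffic versus low-traffic comparison, as a small sketch. Returning `null` for zero traffic is a design choice, not a standard: a service that receives no requests has no meaningful error rate, which is exactly the gap a traffic-drop alert covers. The request counts are hypothetical:

```javascript
// Error rate in percent: (error requests / total requests) x 100.
// Returns null when there was no traffic, since 0 requests is not "0% errors".
function errorRate(errorRequests, totalRequests) {
  if (totalRequests === 0) return null;
  return (errorRequests * 100) / totalRequests;
}

// Hypothetical services, one minute of traffic each:
const busy = { total: 10_000, errors: 200 }; // 200 failed requests
const quiet = { total: 100, errors: 50 };    // only 50 failed requests
console.log(errorRate(busy.errors, busy.total));   // 2
console.log(errorRate(quiet.errors, quiet.total)); // 50
```

The busy service produces four times as many failed requests, yet only the quiet one is failing half its users.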

Alert: Error rate > 1% for 5 consecutive minutes
Critical: Error rate > 5% for 2 consecutive minutes
Alert: Error rate of /checkout endpoint > 0.1% (checkout errors deserve lower thresholds)

Implicit error detection with body validation

For implicit failure detection on critical APIs, use schema validation to catch 200s with the wrong content.

PingSLA's Schema Validator and API Deep-Scan tools can validate API response bodies against expected schemas, catching:

  • Missing required fields
  • Type mismatches (string returned where integer expected)
  • Empty arrays where populated results are expected
  • Malformed JSON

This is the difference between monitoring "is the API responding?" and monitoring "is the API responding correctly?"
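If you want the same kind of check in your own tests or probes, a dependency-free sketch might look like this. The mini-schema format (field name mapped to a `typeof` name such as `'string'`, or to `'integer'`, or to `'array'` for a non-empty array) is invented for illustration; for real APIs a JSON Schema validator such as Ajv is the more robust choice:

```javascript
// Validate a response body against a hypothetical mini-schema.
// Returns a list of problems; an empty list means the body passed.
function validateBody(text, schema) {
  let data;
  try {
    data = JSON.parse(text);
  } catch {
    return ['malformed JSON'];
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    return ['expected a JSON object'];
  }
  const problems = [];
  for (const [field, type] of Object.entries(schema)) {
    const value = data[field];
    if (value === undefined) {
      problems.push(`missing field: ${field}`);
    } else if (type === 'array') {
      if (!Array.isArray(value) || value.length === 0) problems.push(`empty or non-array: ${field}`);
    } else if (type === 'integer') {
      if (!Number.isInteger(value)) problems.push(`not an integer: ${field}`);
    } else if (typeof value !== type) {
      problems.push(`expected ${type}: ${field}`);
    }
  }
  return problems;
}
```

Returning every problem, rather than stopping at the first, makes alert messages more useful when a deploy breaks several fields at once.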

Signal 4: Saturation

What to measure

Saturation is about the resource that's closest to the limit — not all resources equally. Common saturation resources:

  • CPU: Useful for compute-bound workloads. Less useful for I/O-bound APIs.
  • Memory: Critical for detecting memory leaks. High memory doesn't always mean problems.
  • Database connection pool: The most common saturation bottleneck in web APIs. When exhausted, requests queue or fail.
  • Thread pool / event loop lag: For Node.js, event loop delay, not CPU, is the saturation signal to watch.
  • Disk I/O: For log-heavy or file-processing services.
  • External API rate limits: If your service calls third-party APIs, their rate limits are your saturation ceiling.
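Because saturation is defined by the resource closest to its limit, one way to operationalize it is to express every resource as a used/limit ratio and report the worst one. A sketch with a hypothetical snapshot (the resource names and numbers are illustrative):

```javascript
// Return the resource with the highest used/limit ratio, or null if none.
function mostSaturated(resources) {
  let worst = null;
  for (const r of resources) {
    const utilization = r.used / r.limit;
    if (worst === null || utilization > worst.utilization) {
      worst = { name: r.name, utilization };
    }
  }
  return worst;
}

// Hypothetical snapshot of one API instance:
const snapshot = [
  { name: 'cpu', used: 0.35, limit: 1 },           // 35% of one core
  { name: 'memory', used: 900, limit: 2048 },      // MB
  { name: 'db_pool', used: 18, limit: 20 },        // connections in use
  { name: 'vendor_api', used: 4200, limit: 5000 }, // calls per hour vs rate limit
];
console.log(mostSaturated(snapshot)); // { name: 'db_pool', utilization: 0.9 }
```

CPU looks relaxed at 35%, but the connection pool is at 90%: that pool, not CPU, is this instance's saturation signal.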

The saturation signal that most teams miss

Database connection pool exhaustion is one of the most common and most dangerous saturation failure modes in production APIs, and it's often undermonitored.

When your connection pool exhausts:

  1. New requests queue waiting for a connection
  2. Latency spikes (P99 goes from 150ms to 15 seconds)
  3. Eventually request timeouts trigger
  4. Error rate spikes
  5. Service appears completely down to users

In this sequence, pool exhaustion starts well before the error rate spikes. If it precedes the spike by 10 minutes, monitoring pool utilization gives you a 10-minute head start.

For PostgreSQL:

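-- Connection-slot utilization on the server (backend_type needs PostgreSQL 10+).
-- Idle connections are counted too: they still occupy a slot.
SELECT count(*)                                 AS open_connections,
       mc.max_conn,
       round(100.0 * count(*) / mc.max_conn, 1) AS utilization_pct
FROM pg_stat_activity,
     (SELECT setting::int AS max_conn
        FROM pg_settings
       WHERE name = 'max_connections') mc
WHERE backend_type = 'client backend'
GROUP BY mc.max_conn;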

Alert when utilization exceeds 70%.
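The query above measures server-side connection slots. Your application's own pool can run dry long before PostgreSQL reaches `max_connections`, so it is worth watching too. node-postgres, for example, exposes `totalCount`, `idleCount`, and `waitingCount` on its `Pool`; the sketch below assumes a stats object with those fields plus the pool size you configured (`notify` in the comment is hypothetical):

```javascript
// Application-side pool utilization from pool counters.
// `max` is the pool size you configured.
function poolUtilization({ totalCount, idleCount, waitingCount }, max) {
  const inUse = totalCount - idleCount;
  return {
    utilizationPct: (inUse * 100) / max,
    waiting: waitingCount, // > 0 means requests are already queuing
  };
}

function poolAlert(stats, max, thresholdPct = 70) {
  const { utilizationPct, waiting } = poolUtilization(stats, max);
  return utilizationPct > thresholdPct || waiting > 0;
}

// With a real pool you might sample periodically, e.g.:
// setInterval(() => poolAlert(pool, 20) && notify(), 10_000);
```

Treating any waiting request as an alert is deliberately strict: queued requests mean the pool is already the bottleneck, whatever the percentage says.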

Saturation leading indicators

The golden signals framework uses saturation as a leading indicator — a signal that predicts future problems before they manifest as errors. Good saturation alerts fire at 70–80% utilization, giving you time to act before the service degrades at 95%+.

Alert: DB connection pool utilization > 70% for 5+ minutes
Alert: Event loop delay (Node.js) > 100ms average over 1 minute
Alert: Memory utilization > 80% (with growing trend over 30 minutes)
Critical: Any resource utilization > 90%
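The memory rule combines a level with a trend. A minimal sketch, assuming one memory-utilization sample per minute (newest last) and using a least-squares slope over the last 30 samples as the "growing trend" test; `slope` and `memoryAlert` are hypothetical names:

```javascript
// Least-squares slope of evenly spaced samples, in units per sample.
function slope(values) {
  const n = values.length;
  const meanX = (n - 1) / 2;
  const meanY = values.reduce((s, v) => s + v, 0) / n;
  let num = 0;
  let den = 0;
  values.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  return den === 0 ? 0 : num / den;
}

// memoryPct: one utilization percentage per minute, newest last.
function memoryAlert(memoryPct) {
  const last30 = memoryPct.slice(-30);
  const current = last30[last30.length - 1];
  if (current > 90) return 'critical';
  if (current > 80 && slope(last30) > 0) return 'alert';
  return 'ok';
}
```

A slope threshold of exactly zero will fire on noise; in practice you might require a minimum growth rate before alerting.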

Putting All Four Signals Together: The SLO View

The four golden signals become most useful when combined into Service Level Objectives (SLOs):

Example SLO for a checkout API:

| Signal | SLI (what we measure) | SLO (what we commit to) |
|---|---|---|
| Latency | P99 response time, successful requests | 99% of successful requests complete in < 500 ms |
| Traffic | Request rate to /api/checkout | Able to serve up to 500 RPS without degradation |
| Errors | Error rate on /api/checkout | Error rate < 0.5% over any 5-minute window |
| Saturation | DB connection pool utilization | Connection pool utilization < 70% during normal load |

Defining SLOs against each golden signal gives you a structured way to prioritize reliability work: which SLO are you closest to breaching? Which breach would have the largest user impact?
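One common way to answer "which SLO are you closest to breaching?" is an error budget: the share of requests the SLO allows to fail, and how much of that allowance is left. A sketch for the checkout error-rate SLO, treating "error rate < 0.5%" as a 99.5% success target; the request counts and the `errorBudget` name are hypothetical:

```javascript
// Error budget for a request-based SLO (sloTarget = success fraction).
function errorBudget(sloTarget, totalRequests, failedRequests) {
  const allowedFailures = (1 - sloTarget) * totalRequests;
  return {
    allowedFailures,
    // 1 = untouched budget, 0 = exhausted, below 0 = SLO breached
    remainingFraction: 1 - failedRequests / allowedFailures,
  };
}

// Hypothetical window: 1,000,000 checkout requests, 2,000 failures.
const b = errorBudget(0.995, 1_000_000, 2_000);
console.log(Math.round(b.allowedFailures), b.remainingFraction.toFixed(2)); // 5000 0.60
```

Comparing `remainingFraction` across SLOs gives a simple ranking of where reliability work is most urgent.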

External vs Internal Monitoring

The golden signals framework assumes you're instrumenting your own services internally (metrics, traces, logs). But not every team has the infrastructure for full APM.

External monitoring — where a third party runs health checks against your API from outside — complements internal monitoring in important ways:

| Internal monitoring (APM) | External monitoring (PingSLA) |
|---|---|
| Sees your server's perspective | Sees your users' perspective |
| Detects server-side errors instantly | Detects failures that only appear externally |
| Requires instrumentation code | Zero code: checks your live URLs |
| Blind to network/CDN issues | Catches CDN misconfigurations |
| Can't simulate user journeys | Synthetic flows simulate real user behavior |

The ideal monitoring stack uses both: internal APM for deep server-side observability, external synthetic monitoring for user-perspective health and flow validation.

Starting With Golden Signals Today

If you're starting from zero monitoring, the priority order is:

  1. Error rate (today): Set up HTTP monitoring on your critical endpoints. Alert on 5XX rate. Takes 5 minutes with PingSLA.

  2. Latency (this week): Add response time thresholds to your HTTP monitors. Alert at 3x baseline. PingSLA reports TTFB breakdown.

  3. Traffic (this week): Review your application logs or analytics for baseline RPS. Set traffic drop alerts.

  4. Saturation (next month): Instrument database connection pool utilization. Set 70% alert.
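For the traffic step, if you have access logs but no metrics, you can estimate a baseline from the logs themselves. A sketch assuming Common or Combined Log Format timestamps (`[10/Oct/2025:13:55:36 +0000]`); the hypothetical `baselineRps` counts requests per minute and takes the median minute to damp one-off spikes:

```javascript
// Estimate baseline RPS from Common/Combined Log Format lines, e.g.
// 203.0.113.9 - - [10/Oct/2025:13:55:36 +0000] "GET /api/orders HTTP/1.1" 200 512
function baselineRps(lines) {
  const perMinute = new Map();
  for (const line of lines) {
    // Capture "dd/Mon/yyyy:HH:MM"; ignore seconds and timezone.
    const m = line.match(/\[(\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}):\d{2} [+-]\d{4}\]/);
    if (!m) continue; // skip lines that are not in the expected format
    perMinute.set(m[1], (perMinute.get(m[1]) ?? 0) + 1);
  }
  const counts = [...perMinute.values()].sort((a, b) => a - b);
  if (counts.length === 0) return 0;
  const mid = Math.floor(counts.length / 2);
  const median = counts.length % 2 ? counts[mid] : (counts[mid - 1] + counts[mid]) / 2;
  return median / 60; // requests per minute -> requests per second
}
```

Minutes with no requests never appear in the log, so this estimate runs high for quiet services; for those, dividing total requests by the full length of the period is more honest.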

For a quick external check of latency and errors, the API Deep-Scan tool gives you a free one-time view of your API's health from multiple global regions.

Summary

The four golden signals — latency, traffic, errors, saturation — give you a principled framework for deciding what to monitor. Most teams over-monitor things that are easy to measure (server CPU) and under-monitor things that matter (P99 latency, implicit errors, connection pool saturation, traffic drops).

Starting with the golden signals and gradually tightening your SLO definitions is more valuable than adding more monitoring tools. The goal is fewer, higher-signal alerts — not more dashboards.


Test your API across all four golden signal dimensions with the free API Deep-Scan. No signup required.
