The Monitoring Checklist Every SaaS Team Should Run Before Deploying
Too many deploy failures are caught by customers first. You push at 3 PM, the team celebrates a green pipeline, and by 4 PM the first support ticket arrives: "Something broke." The root cause is rarely missing tests; it is missing monitoring during the deployment window.
Pipelines test code in isolation. Monitoring tests code in production, with real users, real infrastructure, and all the third-party dependencies that staging environments don't replicate. This checklist covers what to verify before, during, and after every production deploy to close the gap.
Why Deploy Failures Hit Users First
Automated tests cover your code paths. They don't cover:
- CDN cache state after a deploy
- Third-party service behavior with your new code
- Real user latency from global locations
- Browser-rendered JavaScript behavior at scale
- Downstream API responses to your updated request format
The deploy itself often changes the infrastructure state in ways that tests don't model: cache invalidation, connection pool restarts, CDN propagation delays. The 5-minute window after a deploy is when these emergent failures appear — and the monitoring that catches them needs to be already running, not spun up reactively.
Phase 1: Pre-Deploy Checklist
Run these checks before deploying to production. They establish a baseline and confirm your current state is healthy — so any degradation post-deploy is immediately visible.
1. Run the Infrastructure Audit
Go to pingsla.com/tools/infra-audit and run a full infrastructure scan. This gives you a baseline for SSL certificate status, DNS records, security headers, and performance metrics. Save the report. If something changes post-deploy, you'll need this baseline to confirm the deploy caused it.
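If you want the saved report to be machine-comparable, one minimal approach is to flatten it into key/value pairs and diff the pre-deploy and post-deploy versions. The flat report format and key names below (such as "dns.A") are assumptions for illustration, not the Infrastructure Audit's actual export format.

```python
import json
from typing import Any


def diff_reports(baseline: dict[str, Any], current: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {check_name: (baseline_value, current_value)} for every check that differs.

    A check present on only one side shows up with None on the other side.
    """
    changes: dict[str, tuple[Any, Any]] = {}
    for key in sorted(baseline.keys() | current.keys()):
        before, after = baseline.get(key), current.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


def save_baseline(report: dict[str, Any], path: str) -> None:
    """Persist the pre-deploy report (hypothetical flat format) for later diffing."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, sort_keys=True)


def load_baseline(path: str) -> dict[str, Any]:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

An empty diff after the deploy means the audited surface is unchanged; anything listed is a candidate for "the deploy caused it."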
2. Record your TTFB baseline from all regions
Run the Latency Detector and record the current TTFB from each probe region. Post-deploy TTFB should be within 10% of baseline. A 30% TTFB increase confined to a specific region typically points to cache invalidation issues or CDN misconfiguration.
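The 10% and 30% thresholds translate directly into a per-region comparison. This sketch assumes you have already collected TTFB in milliseconds per probe region; the region names are hypothetical, and the intermediate "watch" band between the two thresholds is this example's own choice.

```python
def classify_ttfb(baseline_ms: dict[str, float], current_ms: dict[str, float],
                  tolerance: float = 0.10, alert: float = 0.30) -> dict[str, str]:
    """Label each baseline region 'ok', 'watch', 'alert', or 'missing'.

    ok:      no more than `tolerance` (default 10%) slower than baseline
    watch:   slower than tolerance but below the alert threshold
    alert:   at least `alert` (default 30%) slower than baseline
    missing: the region has no post-deploy measurement
    """
    labels: dict[str, str] = {}
    for region, base in baseline_ms.items():
        now = current_ms.get(region)
        if now is None:
            labels[region] = "missing"
            continue
        change = (now - base) / base  # positive means slower than baseline
        if change >= alert:
            labels[region] = "alert"
        elif change > tolerance:
            labels[region] = "watch"
        else:
            labels[region] = "ok"
    return labels
```

Reporting a region as "missing" rather than skipping it keeps a failed probe from looking like a pass.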
3. Confirm all flow monitors are passing
If you have PingSLA flow monitors configured for login, checkout, and your primary API, verify they're all passing before you touch production. You don't want to deploy into a pre-existing incident.
4. Check SSL certificate expiry
Run the SSL + DNS Hunter on your domain. Never deploy with an SSL certificate expiring in under 14 days. Some deploy processes (particularly those involving load balancers or CDN configurations) can inadvertently trigger certificate re-issuance or reset — having an already-expiring certificate in this window is a compounding risk.
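To script the 14-day rule outside the SSL + DNS Hunter, Python's standard ssl module can read the certificate's notAfter field over a verified connection. This is a standard-library sketch; the function names are illustrative, and "example.com" in the usage line is a placeholder.

```python
import socket
import ssl
import time
from typing import Optional

# The checklist's rule: do not deploy with fewer than 14 days of validity left.
MIN_DAYS_BEFORE_DEPLOY = 14


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days from `now` (epoch seconds; defaults to the current time) until `not_after`.

    `not_after` uses the format getpeercert() returns, e.g. 'Jan 20 00:00:00 2030 GMT'.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86_400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the leaf certificate's notAfter over a verified TLS connection."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def safe_to_deploy(not_after: str, now: Optional[float] = None) -> bool:
    return days_until_expiry(not_after, now) >= MIN_DAYS_BEFORE_DEPLOY
```

Usage: safe_to_deploy(fetch_not_after("example.com")) returns False when fewer than 14 days remain.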
5. Verify webhook delivery is working
If your deploy touches any webhook-related code (event handlers, webhook endpoint routes, signature validation), run the Webhook Checker to confirm current webhook delivery is healthy before you change anything.
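When a deploy touches signature validation, a quick regression test is to sign a fixture payload with a test secret and confirm your verification logic still accepts it. The sketch below assumes an HMAC-SHA256 hex digest over the raw request body, a scheme many providers use; the header name, encoding, and any timestamp prefix vary by provider, so treat this as a generic illustration rather than any specific provider's scheme.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (an assumed, common scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Constant-time comparison against the signature the sender attached."""
    return hmac.compare_digest(sign(secret, body), received_signature)
```

A common regression this catches is verifying against re-serialized JSON instead of the raw bytes: json.dumps(json.loads(body)) inserts spaces after separators, so its signature no longer matches.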
Phase 2: During Deploy Checklist
1. Enable enhanced monitoring for the deploy window
In PingSLA, temporarily reduce your critical monitor check intervals to 30 seconds (if available on your plan) during the deploy window. More frequent checks mean faster detection if something goes wrong.
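Why shorter intervals help: a failure is noticed at most one interval after it begins, so dropping from 60 to 30 seconds halves the worst-case detection delay. The loop below is a hypothetical stand-in for a monitor running through the 30-minute deploy window; check is any function that returns True on success, and sleep is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional


def watch_window(check: Callable[[], bool], interval_s: float = 30, window_s: float = 1800,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Run `check` every `interval_s` seconds for `window_s` seconds.

    Returns the elapsed time of the first failing check, or None if all passed.
    Elapsed time counts scheduled intervals only; it ignores how long each check takes.
    """
    elapsed = 0.0
    while elapsed <= window_s:
        if not check():
            return elapsed
        sleep(interval_s)
        elapsed += interval_s
    return None
```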
2. Watch flow monitor results in real time
Open the PingSLA dashboard and keep the monitor results panel visible during the deploy. You want live feedback on whether login and checkout flows are passing as your deploy propagates.
3. Set up a temporary WhatsApp alert for the window
If your normal alert channel is email, temporarily add a WhatsApp alert for the 30-minute deploy window. You want instant notification during the period of highest failure risk.
4. Note the exact deploy completion time
Record the exact timestamp when the deploy completes. This is your T+0 reference for evaluating any post-deploy monitoring changes.
Phase 3: Post-Deploy Checklist
Run these checks immediately after deploy completes and repeat at T+15 minutes.
1. Run the Latency Detector — compare to pre-deploy baseline
TTFB should be within 10% of your pre-deploy baseline. A significant increase in specific regions indicates either CDN cache invalidation still in progress (this usually resolves within 5–15 minutes) or a performance regression in your application code. If the increase persists at the T+15 check, treat it as a regression.
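Because regional slowdowns early in the window are often cache warm-up, it helps to factor the time since T+0 into the decision. A minimal sketch, assuming the upper end of the 5–15 minute range as the cut-off (an assumption; tune it to your CDN):

```python
def interpret_regional_slowdown(change: float, minutes_since_deploy: float,
                                tolerance: float = 0.10, warmup_minutes: float = 15) -> str:
    """Interpret one region's TTFB change, where 0.25 means 25% slower than baseline.

    Inside the assumed warm-up window, elevated TTFB is treated as probable cache
    invalidation and rechecked; after it, as a likely application regression.
    """
    if change <= tolerance:
        return "ok"
    if minutes_since_deploy < warmup_minutes:
        return "recheck: cache likely still warming"
    return "investigate: possible application regression"
```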
2. Run the Checkout Defender — must pass from all regions
This is the non-negotiable check. A checkout that fails post-deploy is an emergency, regardless of how green every other metric looks. Go to pingsla.com/tools/checkout-defender and verify that the payment flow passes from at least 3 regions, including a mobile viewport.
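The pass criteria here (every run green, at least 3 distinct regions, at least one mobile viewport) can be expressed as a small gate. The CheckoutResult shape is hypothetical; map whatever your checkout check returns onto it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckoutResult:
    region: str
    viewport: str  # hypothetical labels, e.g. "desktop" or "mobile"
    passed: bool


def checkout_gate(results: list[CheckoutResult], min_regions: int = 3) -> list[str]:
    """Return the reasons the post-deploy checkout gate fails; an empty list means pass."""
    problems = [f"failed: {r.region}/{r.viewport}" for r in results if not r.passed]
    regions = {r.region for r in results}
    if len(regions) < min_regions:
        problems.append(f"only {len(regions)} region(s) tested, need {min_regions}")
    if not any(r.viewport == "mobile" for r in results):
        problems.append("no mobile viewport run")
    return problems
```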
3. Run the Login Validator
Authentication failures post-deploy typically indicate a session token format change, cookie configuration regression, or OAuth callback URL mismatch. Run the Login Validator to confirm login is working from multiple regions.
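One of the causes named above, a cookie configuration regression, can be caught by inspecting the Set-Cookie header your login endpoint returns. This sketch uses the standard library's http.cookies parser; the cookie name "session" and the required attributes (Secure, HttpOnly, SameSite) are assumptions about a typical session cookie, not requirements of the Login Validator.

```python
from http.cookies import SimpleCookie

# Flags this sketch assumes a session cookie should carry; adjust to your app.
REQUIRED_FLAGS = ("secure", "httponly")


def session_cookie_problems(set_cookie_header: str, cookie_name: str = "session") -> list[str]:
    """Check one Set-Cookie header value for common post-deploy cookie regressions."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    if cookie_name not in jar:
        return [f"cookie {cookie_name!r} not set"]
    morsel = jar[cookie_name]
    # Unset Morsel attributes default to "", so a falsy value means the flag is absent.
    problems = [f"missing {flag}" for flag in REQUIRED_FLAGS if not morsel[flag]]
    if not morsel["samesite"]:
        problems.append("missing samesite")
    return problems
```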
4. Run the Schema Validator on your primary API
If your deploy touches API response formats, run the Schema Validator to confirm the response structure matches your expected schema. This catches accidental breaking changes before client applications fail.
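As a local stand-in when a full schema check is not wired up, a minimal sketch is to compare a response against the fields and types clients depend on. The EXPECTED shape is hypothetical; for real contracts, a JSON Schema validator is the more complete option. Extra fields are deliberately not flagged, since additions are usually non-breaking.

```python
import json
from typing import Any

# Hypothetical shape of a primary API response; replace with your own contract.
EXPECTED = {"id": int, "email": str, "plan": str, "active": bool}


def schema_problems(payload: Any, expected: dict[str, type]) -> list[str]:
    """Flag missing fields and type changes between a JSON object and the expected shape."""
    if not isinstance(payload, dict):
        return [f"expected an object, got {type(payload).__name__}"]
    problems = []
    for field, kind in expected.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, kind) or (kind is int and isinstance(value, bool)):
            problems.append(f"{field}: expected {kind.__name__}, got {type(value).__name__}")
    return problems
```

Usage: schema_problems(json.loads(response_body), EXPECTED) returns an empty list when the response still matches.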
5. Check error rate in logs for 15 minutes
Monitor your application error rate for 15 minutes post-deploy. After a healthy deploy, the error rate should return to baseline within about 5 minutes as caches warm up. Elevation sustained beyond 5 minutes warrants investigation.
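The 5-minute grace rule can be applied to per-minute error rates counted from T+0. This is a sketch: the 20% tolerance above baseline is an assumed value, and error_rates is assumed to come from your own logging or APM export.

```python
def minutes_needing_attention(error_rates: list[float], baseline: float,
                              grace_minutes: int = 5, tolerance: float = 0.20) -> list[int]:
    """error_rates[i] is the error rate during minute i after T+0 (0.012 means 1.2%).

    Elevation inside the grace period is expected while caches warm up; any later
    minute above baseline * (1 + tolerance) is returned for investigation.
    """
    ceiling = baseline * (1 + tolerance)
    return [minute for minute, rate in enumerate(error_rates)
            if minute >= grace_minutes and rate > ceiling]
```

An empty list means the deploy settled within the grace period; a non-empty list is the "sustained elevation" case.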
6. Verify SSL certificate is still valid
Occasionally, deploy processes involving load balancer or CDN reconfiguration reset SSL certificates. Run a quick SSL check at T+15 to confirm the certificate is still valid and the chain is intact.
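A scripted T+15 check can confirm both that the chain still verifies and whether the certificate itself changed. This sketch uses Python's ssl module with default verification, so a broken chain or hostname mismatch surfaces as ssl.SSLCertVerificationError. Comparing the SHA-256 fingerprint with one recorded before the deploy shows whether the certificate was replaced; an unexpected change is worth a look, though a legitimate renewal also changes it. Network errors such as timeouts are not caught and will propagate.

```python
import hashlib
import socket
import ssl


def cert_fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as colon-separated hex."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def check_tls(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Verified handshake; returns validity, fingerprint, and expiry of the leaf cert."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                der = tls.getpeercert(binary_form=True)
                return {"valid": True, "fingerprint": cert_fingerprint(der),
                        "not_after": tls.getpeercert()["notAfter"]}
    except ssl.SSLCertVerificationError as exc:
        return {"valid": False, "error": exc.verify_message}
```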
Automate the Post-Deploy Check with GitHub Actions
The most reliable version of this checklist is one that runs automatically on every deploy:
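```yaml
# .github/workflows/post-deploy-check.yml
name: Post-Deploy Health Check

on:
  deployment_status:

jobs:
  health-check:
    # Only run when the deployment reports success
    if: github.event.deployment_status.state == 'success'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout flow check
        env:
          CHECKOUT_URL: ${{ secrets.PRODUCTION_CHECKOUT_URL }}
        run: |
          # --fail makes curl exit non-zero on an HTTP error status, which fails the step
          curl -sS --fail -o /dev/null -w "%{http_code}\n" \
            "https://tools.pingsla.com/api/tools/checkout/test" \
            -H "Content-Type: application/json" \
            -d "{\"url\":\"${CHECKOUT_URL}\"}"

      - name: Latency check
        env:
          SITE_URL: ${{ secrets.PRODUCTION_URL }}
        run: |
          RESULT=$(curl -sS --fail "https://tools.pingsla.com/api/tools/latency/detect" \
            -H "Content-Type: application/json" \
            -d "{\"url\":\"${SITE_URL}\"}")
          echo "Latency check result: $RESULT"

      - name: Notify on failure
        if: failure()
        env:
          WEBHOOK_URL: ${{ secrets.PINGSLA_WEBHOOK_URL }}
          DEPLOY_ENV: ${{ github.event.deployment.environment }}
        run: |
          curl -sS -X POST "$WEBHOOK_URL" \
            -H "Content-Type: application/json" \
            -d "{\"message\":\"Post-deploy health check failed for ${DEPLOY_ENV}\"}"
```

This GitHub Actions workflow runs the health checks automatically after every successful deployment. Because both requests use curl's --fail flag, an HTTP error from either tool endpoint fails its step, and the final step then fires a webhook to PingSLA for immediate alerting. The steps do not parse the returned results, so a check that reports a failure inside a successful HTTP response needs additional parsing to trip the alert. Passing secrets and event fields through env rather than interpolating them directly into the script is the safer pattern for run steps.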
What to Monitor, How Often, and at What Threshold
| Monitor Type | Check Interval | Alert Threshold | Priority |
|---|---|---|---|
| Checkout flow (all regions) | 1 minute | Any failure | Critical |
| Login flow | 1 minute | Any failure | Critical |
| Primary API health | 30 seconds | >500ms or non-200 | Critical |
| SSL certificate expiry | Daily | <30 days = warn, <7 days = critical | High |
| TTFB baseline (key pages) | 5 minutes | >30% increase vs baseline | Medium |
| API schema validation | 5 minutes | Any schema mismatch | Medium |
| Webhook delivery | 10 minutes | Any delivery failure | Medium |
| DNS records | Hourly | Any change | Medium |
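The table can also live in code, so alert rules are versioned alongside the deploy pipeline. A sketch encoding the table's intervals and two of its thresholds; the MonitorSpec structure and monitor names are hypothetical, not a PingSLA configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MonitorSpec:
    name: str
    interval_s: int
    priority: str


# The monitoring table, encoded as data (intervals in seconds).
MONITORS = [
    MonitorSpec("checkout-flow", 60, "critical"),
    MonitorSpec("login-flow", 60, "critical"),
    MonitorSpec("primary-api-health", 30, "critical"),
    MonitorSpec("ssl-expiry", 86_400, "high"),
    MonitorSpec("ttfb-baseline", 300, "medium"),
    MonitorSpec("api-schema", 300, "medium"),
    MonitorSpec("webhook-delivery", 600, "medium"),
    MonitorSpec("dns-records", 3_600, "medium"),
]


def api_health_alert(status: int, latency_ms: float) -> bool:
    """Primary API health alerts on a non-200 status or latency above 500 ms."""
    return status != 200 or latency_ms > 500


def ssl_expiry_level(days_remaining: float) -> Optional[str]:
    """Under 7 days is critical, under 30 days is a warning, otherwise no alert."""
    if days_remaining < 7:
        return "critical"
    if days_remaining < 30:
        return "warn"
    return None
```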
- How do I run a PingSLA API check as part of my CI/CD pipeline?
- PingSLA's API supports programmatic triggering of tool checks via REST endpoints. The free tool endpoints at tools.pingsla.com/api/tools/* can be called from any CI/CD environment with a standard HTTP request. For continuous monitoring with alert integration, use PingSLA's authenticated API (available on Starter plan and above) which supports webhook callbacks when checks fail.
- What is the most commonly missed post-deploy check?
- The mobile viewport checkout check. Teams often test their checkout on desktop browsers and miss the mobile-specific failures that can affect 60-70% of their traffic. Adding a mobile viewport test to the post-deploy checklist, specifically checking that the payment widget renders on a 375px viewport, catches the failures that cost the most revenue.
- How do I prevent a bad deploy from affecting users if the post-deploy check fails?
- Automated rollback triggered by a failed post-deploy check is the gold standard. In GitHub Actions, you can trigger a rollback workflow when a health check step fails. In PingSLA, configure a webhook that triggers your deployment platform's rollback API when a critical monitor fails within 5 minutes of a deploy. This reduces the blast radius of a bad deploy from hours (when discovered by customers) to minutes (when caught by automated monitoring).
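The rollback rule described above reduces to a small decision function. This sketch assumes failures arrive with a priority and a timestamp; the MonitorFailure type is hypothetical, and the actual rollback call depends on your deployment platform, so it is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MonitorFailure:
    monitor: str
    priority: str  # hypothetical levels: "critical", "high", "medium"
    at: datetime


def should_roll_back(deployed_at: datetime, failures: list[MonitorFailure],
                     window: timedelta = timedelta(minutes=5)) -> bool:
    """True if any critical monitor failed within `window` after the deploy completed (T+0)."""
    return any(
        f.priority == "critical" and deployed_at <= f.at <= deployed_at + window
        for f in failures
    )
```

Failures that started before T+0 are ignored here, since they point to a pre-existing incident rather than the deploy.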