502/504 Errors on VPS: Where to Look First (Nginx, App, or Database?)

A practical decision tree for identifying whether gateway errors come from the reverse proxy, the application runtime, or database pressure.

502 and 504 errors are often treated as “Nginx problems.” In reality, Nginx is usually where failure becomes visible, not where it starts.

To recover quickly, work through the fault domains in a fixed order instead of restarting services at random.

Fast fault-domain decision tree

  1. Can Nginx reach the upstream process at all?
  2. Is the upstream process alive but slow?
  3. Is the upstream blocked on a dependency (DB/cache/API)?

Each answer narrows the search to one layer, which avoids expensive guesswork.
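
The three questions can be written down as a small function so the order of checks stays fixed under pressure. This is a minimal sketch, not a prescribed tool: the Observations fields and the returned labels are hypothetical names, and the answers are assumed to come from the manual checks described in the sections that follow.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """Answers to the three questions, gathered by hand or by tooling."""
    upstream_reachable: bool   # question 1: can a connection to the app be opened?
    upstream_slow: bool        # question 2: is the app alive but responding slowly?
    dependency_degraded: bool  # question 3: are DB/cache/API calls slow or failing?


def first_layer_to_inspect(obs: Observations) -> str:
    """Walk the questions in order and name the layer to look at first."""
    if not obs.upstream_reachable:
        return "app process or socket/port configuration"
    if obs.upstream_slow and obs.dependency_degraded:
        return "dependency (DB/cache/API)"
    if obs.upstream_slow:
        return "app runtime (workers, queues, GC)"
    # Nothing conclusive yet: keep collecting evidence before changing anything.
    return "inconclusive"
```

For example, first_layer_to_inspect(Observations(True, True, False)) points at the app runtime, because the process answers but slowly and no dependency looks degraded.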

Nginx checks (2 minutes)

Review:

  • error log timestamps around the spike window
  • upstream connection failures: refused connections and timeouts
  • recent config or deploy changes

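If the log shows “connect() failed,” Nginx could not open a connection to the upstream at all, which typically surfaces as a 502: suspect a stopped app process or a wrong port or socket path. If it shows “upstream timed out,” the connection was made but no response arrived in time, which typically surfaces as a 504: app or dependency latency is the likely cause.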
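
Counting the two message types during the spike window gives a quick first split between "cannot connect" and "connected but slow". The sketch below assumes only that the error log lines contain the two phrases quoted above; the function names and labels are illustrative, and other upstream messages fall into "other".

```python
import re
from collections import Counter

# The two phrases named in the text; anything else is counted as "other".
PATTERNS = {
    "connect_failed": re.compile(r"connect\(\) failed"),
    "upstream_timed_out": re.compile(r"upstream timed out"),
}


def classify_line(line: str) -> str:
    """Return the label of the first pattern found in the line."""
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return "other"


def summarize(lines) -> Counter:
    """Count how many lines fall into each class."""
    return Counter(classify_line(line) for line in lines)
```

A typical use is to open the error log, pass the file object to summarize(), and compare the counts. Filtering the lines to the spike window first keeps old errors from skewing the picture.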

App runtime checks (5 minutes)

Inspect:

  • process liveness and restart loops
  • thread/worker saturation
  • queue backlog
  • GC pauses (language-dependent)

A healthy process list with unhealthy response times often means dependency wait, not compute shortage.
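
One way to tell waiting from computing is to compare wall-clock time with CPU time for the same piece of work. The sketch below does this with Python's time module. It assumes the app is written in Python and that the work can be wrapped in a callable; wait_fraction is a hypothetical helper name. time.process_time() counts CPU time for the whole process, so in a multi-threaded app the result is only approximate.

```python
import time


def wait_fraction(fn) -> float:
    """Run fn once and return the share of wall-clock time NOT spent on CPU.

    A value near 1.0 suggests the work was mostly waiting (I/O, locks,
    a slow dependency); a value near 0.0 suggests it was compute-bound.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    if wall <= 0:
        return 0.0
    # Clamp to [0, 1]: CPU time can exceed wall time when several threads run.
    return min(1.0, max(0.0, 1.0 - cpu / wall))
```

A handler that spends most of its time in a database call will score close to 1.0 here, which matches the "alive but waiting" pattern rather than a shortage of CPU.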

Dependency checks (DB/cache/external API)

Measure:

  • DB connection pool exhaustion
  • slow query spikes
  • cache timeout rates
  • third-party API latency/error rate

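Gateway errors can cascade from a single slow downstream system: requests waiting on it hold app workers, new requests queue behind them, and Nginx eventually times out.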
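
If the app does not already export per-dependency latency, a small timing wrapper around each call is often enough to see which one is slow. This is a sketch under stated assumptions: DependencyTimer and the dependency names are hypothetical, the app is assumed to be Python, and a real service would normally send these numbers to its metrics system instead of keeping them in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class DependencyTimer:
    """Record call durations and error counts per named dependency."""

    def __init__(self):
        self.samples = defaultdict(list)  # name -> durations in seconds
        self.errors = defaultdict(int)    # name -> number of failed calls

    @contextmanager
    def track(self, name: str):
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.errors[name] += 1
            raise
        finally:
            # Record the duration whether the call succeeded or failed.
            self.samples[name].append(time.perf_counter() - start)

    def slowest(self):
        """Return (name, worst_duration) for the slowest single call seen."""
        if not self.samples:
            return None
        worst = ((name, max(d)) for name, d in self.samples.items())
        return max(worst, key=lambda item: item[1])
```

Usage is a with block around each outbound call, for example with timer.track("db"): followed by the query. During an incident, slowest() names the dependency to measure next.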

Stabilization order

When user impact is active:

  1. protect critical endpoints
  2. reduce expensive background load
  3. increase timeouts only while the root cause is being fixed (not as a permanent band-aid)
  4. roll back a recent risky change when the evidence supports it

Blindly increasing all timeouts can turn fast-fail incidents into slow-fail incidents.
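
The arithmetic behind this warning follows from Little's law: the average number of requests in flight equals the arrival rate multiplied by the time each request is held. The sketch below uses a hypothetical rate of 50 requests per second and assumes the worst case, where the upstream is fully stalled and every request waits out the whole timeout.

```python
def requests_held(arrival_rate_per_s: float, seconds_held: float) -> float:
    """Little's law: average in-flight requests = arrival rate x time held."""
    return arrival_rate_per_s * seconds_held


rate = 50.0  # hypothetical arrival rate, requests per second

for timeout_s in (5.0, 60.0):
    held = requests_held(rate, timeout_s)
    print(f"timeout {timeout_s:.0f}s: about {held:.0f} requests held open")
```

With a 5-second timeout, about 250 requests are held open at once; with a 60-second timeout, about 3000. Each of those occupies a connection and usually a worker, so the longer timeout spreads the stall to endpoints that were otherwise healthy.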

Prevent repeat incidents

  • Define SLO-based alerts for upstream latency, not only status codes.
  • Track deployment annotations in dashboards.
  • Keep runbooks per service with known bottleneck patterns.

You want responders to choose from known failure classes, not invent a new process during an outage.
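
For the first item in the list, a latency-based alert usually compares a high percentile of upstream response time against a threshold. The sketch below uses the nearest-rank percentile method. The function names, the 95th percentile default, and any threshold passed in are assumptions for illustration; in practice this logic usually lives in the monitoring system and not in application code.

```python
import math


def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile of a list of numbers; pct is in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_slo_breached(samples, threshold_s: float, pct: float = 95.0) -> bool:
    """True when the chosen percentile of latency exceeds the threshold."""
    return percentile(samples, pct) > threshold_s
```

With 100 samples where 10 take 2.0 seconds and the rest take 0.1 seconds, the 95th percentile is 2.0 seconds, so a 0.5-second threshold is breached even if every response still returns a 200 status.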

Final takeaway

502/504 troubleshooting is fastest when you treat proxy, app, and dependency as separate layers. The error code is only the symptom; recovery speed depends on how quickly you isolate the true layer under stress.
