502/504 Errors on VPS: Where to Look First (Nginx, App, or Database?)
A practical decision tree to identify whether gateway errors come from reverse proxy, application runtime, or database pressure.
502 and 504 errors are often treated as “Nginx problems.” In reality, Nginx is usually where failure becomes visible, not where it starts.
To recover quickly, use a fault-domain sequence instead of random restarts.
Fast fault-domain decision tree
- Can Nginx reach the upstream process at all?
- Is the upstream process alive but slow?
- Is the upstream blocked on a dependency (DB/cache/API)?
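The triage order above can be sketched as a small shell function. This is a sketch, not a diagnostic tool: the three yes/no inputs are assumptions that you would produce with your own probes (e.g. a curl against the upstream socket, a process check, a DB ping).

```shell
# Map the three triage answers to a fault domain (sketch).
# Inputs are "yes"/"no" strings produced by your own probes --
# e.g. curl to the upstream socket, a process check, a DB ping.
fault_domain() {
  reachable="$1"; slow="$2"; dep_blocked="$3"
  if [ "$reachable" = "no" ]; then
    echo "nginx-to-app"   # typically a 502: upstream unreachable
  elif [ "$dep_blocked" = "yes" ]; then
    echo "dependency"     # typically a 504: app waiting on DB/cache/API
  elif [ "$slow" = "yes" ]; then
    echo "app-runtime"    # typically a 504: the app itself is slow
  else
    echo "inconclusive"   # go back to the logs
  fi
}
```

Note the ordering: dependency blockage is checked before raw slowness, because a blocked app also looks slow from the proxy's side.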
This sequence avoids expensive guesswork.
Nginx checks (2 minutes)
Review:
- error log timestamps around spike window
- upstream connection/refused/timeouts
- recent config or deploy changes
If the error log shows “connect() failed,” suspect the app process or socket path. If it shows “upstream timed out,” app or dependency latency is the likely cause.
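Those two messages are the standard nginx upstream error strings, so a quick triage pass over the error log can be mechanized. A minimal sketch; the labels are assumptions, not nginx terminology:

```shell
# Rough classifier for nginx error-log lines (sketch).
# The matched strings are nginx's standard upstream error messages;
# the output labels are this article's triage categories.
classify_error_line() {
  case "$1" in
    *"connect() failed"*)   echo "check-app-process-or-socket" ;;
    *"upstream timed out"*) echo "check-app-or-dependency-latency" ;;
    *)                      echo "other" ;;
  esac
}
```

In practice you would pipe the spike window through it, e.g. `grep upstream /var/log/nginx/error.log | while read -r line; do classify_error_line "$line"; done | sort | uniq -c`.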
App runtime checks (5 minutes)
Inspect:
- process liveness and restart loops
- thread/worker saturation
- queue backlog
- GC pauses (language-dependent)
A healthy process list with unhealthy response times often means dependency wait, not compute shortage.
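Worker saturation is the easiest of these to turn into a number. A sketch, assuming you can read busy/total worker counts from your app server's status endpoint (gunicorn/uwsgi stats, a PHP-FPM status page, etc.); the 90% threshold is an assumption to tune:

```shell
# Flag worker saturation from busy/total counts (sketch).
# The 90% threshold is an assumption; counts would come from your
# app server's status endpoint.
workers_saturated() {
  busy="$1"; total="$2"
  # integer math: busy*100/total >= 90
  if [ $(( busy * 100 / total )) -ge 90 ]; then
    echo "saturated"
  else
    echo "ok"
  fi
}
```

If workers are saturated but CPU is idle, that is the “dependency wait, not compute shortage” signature described above.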
Dependency checks (DB/cache/external API)
Measure:
- DB connection pool exhaustion
- slow query spikes
- cache timeout rates
- third-party API latency/error rate
Gateway errors can cascade from one slow downstream system.
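Pool exhaustion follows the same pattern as the worker check. A sketch, assuming you can read in-use vs. maximum connections from your pool metrics (on PostgreSQL, for example, counting `pg_stat_activity` rows against `max_connections`); the 95% threshold is an assumption:

```shell
# Check connection-pool headroom (sketch; the 95% threshold is an
# assumption). in_use and max would come from your pool metrics,
# e.g. pg_stat_activity counts vs max_connections on PostgreSQL.
pool_exhausted() {
  in_use="$1"; max="$2"
  if [ $(( in_use * 100 / max )) -ge 95 ]; then
    echo "exhausted"
  else
    echo "headroom"
  fi
}
```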
Stabilization order
When user impact is active:
- protect critical endpoints
- reduce expensive background load
- increase timeouts only while the root cause is being fixed (not as a permanent band-aid)
- rollback recent risky change when evidence supports it
Blindly increasing all timeouts can turn fast-fail incidents into slow-fail incidents.
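If you do raise a timeout temporarily, scope it to the affected route rather than globally, so the rest of the site keeps failing fast. A sketch in nginx config; the `/reports` path and the `app` upstream name are assumptions for illustration:

```nginx
# Temporary, scoped timeout bump while the slow query is being fixed.
# Only this location waits longer; everything else keeps the defaults.
location /reports {
    proxy_pass http://app;
    proxy_read_timeout 120s;   # nginx default is 60s
}
```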
Prevent repeat incidents
- Define SLO-based alerts for upstream latency, not only status codes.
- Track deployment annotations in dashboards.
- Keep runbooks per service with known bottleneck patterns.
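An SLO-based latency alert of the kind suggested above might look like the following Prometheus rule. This is a sketch: the metric name `nginx_upstream_response_seconds_bucket` and the 1s/p99 threshold are assumptions that depend on your exporter and your SLO.

```yaml
# Sketch of an upstream-latency alert in Prometheus rule syntax.
groups:
  - name: upstream-latency
    rules:
      - alert: UpstreamLatencyHigh
        expr: |
          histogram_quantile(0.99,
            rate(nginx_upstream_response_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "p99 upstream latency above 1s for 5 minutes"
```

Alerting on upstream latency fires before the 504s do, which is the point: the status code is a lagging indicator.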
You want responders to choose from known failure classes, not invent a new process mid-outage.
Final takeaway
502/504 troubleshooting is fastest when you treat proxy, app, and dependency as separate layers. The error code is only the symptom; recovery speed depends on how quickly you isolate the true layer under stress.