VPS Not Responding? A 15-Minute Triage Checklist Before You Reboot
A practical first-response checklist to diagnose an unresponsive VPS without making recovery harder.
When a VPS appears down, the fastest action often feels like a reboot. That works sometimes, but it can also destroy useful evidence and extend downtime if the root cause is still present after boot.
This checklist is built for the first 15 minutes of response, when clarity matters more than speed theater.
Minute 0-2: Confirm scope, not panic
Start by answering two questions:
- Is this host-level downtime or only one service path?
- Is impact global or region/user-segment specific?
Check an external probe first, then confirm from a second network. A local office DNS issue has fooled many teams into restarting healthy servers.
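A minimal version of this check, assuming a hypothetical hostname `app.example.com` and origin IP `203.0.113.10` (substitute your own), compares DNS answers and then probes the origin with DNS taken out of the equation:

```shell
# Resolve via the local resolver, then via a public one.
# A mismatch points at DNS, not at the server itself.
dig +short app.example.com
dig +short app.example.com @1.1.1.1

# Probe HTTP with the IP pinned, so a local DNS problem can
# neither mask nor fake an origin outage.
curl -sS -o /dev/null --max-time 5 \
  -w 'HTTP %{http_code} in %{time_total}s\n' \
  --resolve app.example.com:443:203.0.113.10 \
  https://app.example.com/
```

If the pinned-IP probe succeeds while normal resolution fails, you are looking at a DNS or resolver problem, not a down host.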
Minute 2-5: Test control plane access
Try access methods in this order:
- Provider console / serial console
- SSH from a known-good bastion
- Internal service checks from adjacent hosts (if available)
If SSH fails but provider console works, your issue is likely network, firewall, SSH daemon, or auth path. If even console is unstable, suspect host pressure or hypervisor-side events.
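A time-boxed, verbose SSH probe narrows this down quickly; where it fails tells you which layer to suspect (host IP `203.0.113.10` is a placeholder):

```shell
# BatchMode prevents interactive hangs; ConnectTimeout bounds the wait.
# - TCP timeout          -> network path or firewall
# - "Connection refused" -> sshd down (or wrong port)
# - banner then auth err -> keys, PAM, or account state
ssh -vvv -o ConnectTimeout=5 -o BatchMode=yes admin@203.0.113.10 true

# Check the raw TCP path separately from the SSH protocol itself.
nc -vz -w 5 203.0.113.10 22
```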
Minute 5-8: Capture quick host state
If you can reach the shell, record:
```shell
uptime
free -m
df -h
top -b -n1 | head -40
dmesg | tail -100
```
You are looking for obvious pressure signatures: full disk, swap thrash, runaway CPU, OOM kills, or filesystem errors.
Do not spend this window on deep forensics. Gather enough to decide the next safe move.
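If you want a slightly faster pass over the same signals, a sketch of a read-only scan that surfaces only the pressure signatures named above:

```shell
# Filesystems at or above 90% capacity ($5 is the Use% column).
df -h | awk '$5+0 >= 90'

# Swap usage: heavy swap on a loaded box suggests memory thrash.
free -m | awk '/Swap/'

# Kernel-side red flags: OOM kills, I/O and filesystem errors.
# (May need root; stderr is suppressed if not.)
dmesg -T 2>/dev/null | grep -iE 'oom|out of memory|i/o error|ext4|xfs' | tail -20
```

Everything here is non-destructive, so it is safe even on a host you suspect is about to fall over.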
Minute 8-11: Check edge and routing basics
A surprising number of “dead server” incidents are path issues:
- Expired DNS records after migration
- Broken security group / ACL updates
- MTU mismatch after tunnel or provider changes
- Origin-only firewall rules accidentally blocking edge traffic
Validate the path from client edge to origin, not just process health on the box.
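A sketch of path validation from a client-side vantage point, again using placeholder values `app.example.com` and `203.0.113.10`:

```shell
# MTU probe: 1472 bytes of ICMP payload + 28 bytes of headers = a
# 1500-byte frame. If this fails but smaller sizes pass, suspect an
# MTU mismatch from a tunnel or provider change. (-M do forbids
# fragmentation; Linux ping.)
ping -c 3 -M do -s 1472 203.0.113.10

# Is the port reachable at all, independent of the application?
nc -vz -w 5 203.0.113.10 443

# Compare the edge path vs the origin directly. A 200 from the origin
# but errors via the edge suggests origin-only firewall rules are
# blocking your CDN or proxy.
curl -sS -o /dev/null -w 'edge:   %{http_code}\n' https://app.example.com/
curl -sS -o /dev/null -w 'origin: %{http_code}\n' \
  --resolve app.example.com:443:203.0.113.10 https://app.example.com/
```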
Minute 11-13: Decide stabilize vs reboot
Reboot only when at least one of the first two conditions holds, and the third is already satisfied:
- The kernel or host is clearly wedged and cannot be recovered in place
- A service restart cannot proceed because of stuck system state
- You have captured enough diagnostic context for the postmortem
If you can safely restart only the impacted service, do that first.
Minute 13-15: Communicate and assign ownership
Before the next action, publish:
- Current impact statement
- What has been verified
- Next technical action
- Next update time
This keeps incident comms aligned and prevents duplicated blind actions.
Final note
The goal of early triage is not to be heroic. It is to avoid making a bad event worse. A structured 15-minute routine will outperform ad-hoc intuition almost every time.