Monitoring & Telemetry
Purpose
The monitor runs every 60 seconds inside the gateway process. It collects server-level and per-slice telemetry, performs health checks, auto-restarts crashed containers, manages the slice pool (assigns waiting users and provisions new slices), and prunes old telemetry data.
Architecture
Monitor Loop (every 60 seconds)
- Server telemetry: Collects RAM, swap, CPU, and disk usage from system commands. Stores results in `gateway.server_telemetry`.
- Slice pool management: Assigns available slices to users without one. If the available count is < 1, auto-provisions a new slice.
- Per-slice health checks: For each assigned or available slice:
  - Checks that the Docker container is running (auto-restarts it if not)
  - Calls `/api/health` and `/api/status/state` on the slice
  - Collects RAM + CPU from bulk `docker stats --no-stream`
  - Collects disk usage from `du -sb /data/slices/{N}/`
  - Stores results in `gateway.telemetry`
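The per-slice steps can be sketched as a single function with its side effects injected, which keeps the control flow easy to test. This is a minimal illustration, not the gateway's actual code: `check_slice` and its callable parameters (`is_running`, `start`, `probe`) are hypothetical names, and probing the HTTP endpoints right after a restart is an assumption the description above does not confirm.

```python
from typing import Any, Callable, Dict


def check_slice(
    slice_id: int,
    is_running: Callable[[int], bool],   # hypothetical: wraps a Docker "is running" lookup
    start: Callable[[int], Any],         # hypothetical: wraps the container start command
    probe: Callable[[int, str], Any],    # hypothetical: HTTP GET against the slice
) -> Dict[str, Any]:
    """Run one health-check pass for a single slice."""
    restarted = False
    if not is_running(slice_id):
        # Auto-restart a container that is not running.
        start(slice_id)
        restarted = True
    # Assumption: the endpoints are probed even when the container was just restarted.
    return {
        "slice_id": slice_id,
        "restarted": restarted,
        "health": probe(slice_id, "/api/health"),
        "state": probe(slice_id, "/api/status/state"),
    }
```

In a real loop the returned record would be combined with the RAM, CPU, and disk figures before being stored; here it only captures the restart decision and the two probe results.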
Data Collection Methods
| Metric | Source | Frequency |
|---|---|---|
| Server RAM | `free -b` | 60s |
| Server swap | `free -b` | 60s |
| Server CPU | `top -bn1` | 60s |
| Server disk | `df -B1 /` | 60s |
| Container RAM | `docker stats --no-stream` (bulk) | 60s |
| Container CPU | `docker stats --no-stream` (bulk) | 60s |
| Slice disk | `du -sb /data/slices/{N}/` | 60s |
| WhatsApp state | `curl /api/status/state` | 60s |
| Slice health | `curl /api/health` | 60s |
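As an illustration of how one of these sources might be turned into numbers, the sketch below parses the text printed by `free -b` into total and used bytes for RAM and swap. It assumes the standard procps layout with English `Mem:` and `Swap:` row labels; the function name `parse_free_bytes` and the sample output are made up for the example and are not taken from the gateway.

```python
from typing import Dict

# Illustrative sample only; real values come from running `free -b`.
SAMPLE_FREE_OUTPUT = """\
               total        used        free      shared  buff/cache   available
Mem:      8000000000  3000000000  1000000000   100000000  4000000000  4500000000
Swap:     2000000000   500000000  1500000000
"""


def parse_free_bytes(output: str) -> Dict[str, Dict[str, int]]:
    """Extract total and used bytes for the Mem and Swap rows."""
    result: Dict[str, Dict[str, int]] = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        label = fields[0].rstrip(":").lower()
        if label in ("mem", "swap"):
            # Column order after the label is: total, used, free, ...
            result[label] = {"total": int(fields[1]), "used": int(fields[2])}
    return result
```

Calling `parse_free_bytes(SAMPLE_FREE_OUTPUT)` yields one entry for `mem` and one for `swap`; the header row is skipped because its first field is neither label.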
Auto-Recovery
| Condition | Action |
|---|---|
| Container not running | docker start wank-slice-{N} |
| User has no slice + one available | Automatic assignment |
| Available slices < 1 | Auto-provision new slice |
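The three rules in this table can be expressed as a pure planning function that returns the actions to take without executing them. This is a sketch under stated assumptions: `plan_recovery` and the action tuples are hypothetical, and checking the available count after assignments have been made (rather than before) is a guess about ordering that the table does not settle.

```python
from typing import Dict, List, Tuple


def plan_recovery(
    running: Dict[int, bool],    # slice number -> is its container running?
    waiting_users: List[str],    # users that currently have no slice
    available: List[int],        # slice numbers not yet assigned
) -> List[Tuple]:
    """Return the recovery actions implied by the current state."""
    actions: List[Tuple] = []
    # Rule 1: restart any container that is not running.
    for slice_id, is_running in sorted(running.items()):
        if not is_running:
            actions.append(("start", f"wank-slice-{slice_id}"))
    # Rule 2: hand available slices to waiting users, first come first served.
    free = list(available)
    for user in waiting_users:
        if not free:
            break
        actions.append(("assign", user, free.pop(0)))
    # Rule 3 (assumed to run after assignment): keep at least one slice available.
    if len(free) < 1:
        actions.append(("provision",))
    return actions
```

For example, with slice 2 stopped, one waiting user, and slice 3 available, the plan restarts slice 2, assigns slice 3, and then requests a new slice because the pool is empty again.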
Telemetry Retention
- Pruned every 1 hour
- Both `gateway.telemetry` and `gateway.server_telemetry` are kept for 7 days
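The retention rule amounts to a cutoff timestamp of "now minus 7 days". The sketch below applies that cutoff to in-memory rows; the `recorded_at` field name is hypothetical, and treating a row that is exactly 7 days old as still retained is an assumption. An actual prune would more likely be a single delete statement against each table.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

RETENTION = timedelta(days=7)


def prune_cutoff(now: datetime) -> datetime:
    """Rows recorded before this instant are eligible for pruning."""
    return now - RETENTION


def prune_rows(rows: List[Dict[str, Any]], now: datetime) -> List[Dict[str, Any]]:
    """Return only the rows that fall inside the retention window."""
    cutoff = prune_cutoff(now)
    # `recorded_at` is a hypothetical timestamp field used for illustration.
    return [row for row in rows if row["recorded_at"] >= cutoff]
```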
Startup Timing
- Monitor starts 30 seconds after gateway boot, giving services time to settle
- Then runs every 60 seconds
- Telemetry prune runs every 1 hour
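To make the timing concrete, the sketch below lists the seconds-since-boot at which a periodic task would fire, given its first run and its interval. The constant and function names are illustrative. The offset of the first prune run is not stated in this section, so only the monitor schedule is computed.

```python
from itertools import count, islice
from typing import List

MONITOR_DELAY_S = 30      # first monitor run, seconds after gateway boot
MONITOR_INTERVAL_S = 60   # gap between monitor runs
PRUNE_INTERVAL_S = 3600   # gap between telemetry prune runs


def first_ticks(first: int, interval: int, n: int) -> List[int]:
    """Return the first n firing times (seconds since boot) of a periodic task."""
    return list(islice(count(first, interval), n))


# The monitor completes this many cycles between two consecutive prunes.
MONITOR_RUNS_PER_PRUNE = PRUNE_INTERVAL_S // MONITOR_INTERVAL_S
```

With these values, `first_ticks(MONITOR_DELAY_S, MONITOR_INTERVAL_S, 4)` gives 30, 90, 150, and 210 seconds, and the monitor runs 60 times per prune interval.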
Status
Active and collecting data continuously since deployment. Telemetry feeds the admin dashboard sparklines and capacity planning panel.