60-second health loop with auto-recovery, per-slice and server telemetry, and 7-day retention

Monitoring & Telemetry

Purpose

The monitor runs every 60 seconds inside the gateway process. On each pass it collects server-level and per-slice telemetry, performs health checks, restarts containers that are not running, and manages the slice pool (assigning waiting users and provisioning new slices). Once an hour it also prunes telemetry older than 7 days.

Architecture

Monitor Loop (every 60 seconds)

  1. Server telemetry: Collects RAM, swap, CPU, and disk usage from system commands (free -b, top -bn1, df -B1 /) and stores the results in gateway.server_telemetry.
  2. Slice pool management: Assigns available slices to users who do not have one. If the available count drops below 1, auto-provisions a new slice.
  3. Per-slice health checks: For each assigned or available slice:
    • Checks that the Docker container is running (starts it if not)
    • Calls /api/health and /api/status/state on the slice
    • Reads RAM and CPU from a single bulk docker stats --no-stream call
    • Measures disk usage with du -sb /data/slices/{N}/
    • Stores the results in gateway.telemetry

Data Collection Methods

Metric           Source                            Frequency
Server RAM       free -b                           60s
Server swap      free -b                           60s
Server CPU       top -bn1                          60s
Server disk      df -B1 /                          60s
Container RAM    docker stats --no-stream (bulk)   60s
Container CPU    docker stats --no-stream (bulk)   60s
Slice disk       du -sb /data/slices/{N}/          60s
WhatsApp state   curl /api/status/state            60s
Slice health     curl /api/health                  60s
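
As an illustration of how the server RAM and swap rows might be derived, here is a minimal sketch that parses the text printed by free -b. The function name parse_free and the returned dictionary layout are assumptions made for this example, and the sample figures are invented.

```python
def parse_free(output):
    """Extract total/used/free byte counts from `free -b` output.

    Returns a dict like {"mem": {...}, "swap": {...}}; rows that are
    not the Mem: or Swap: lines (such as the header) are ignored.
    """
    result = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        label = fields[0].rstrip(":").lower()
        if label in ("mem", "swap"):
            total, used, free = (int(value) for value in fields[1:4])
            result[label] = {"total": total, "used": used, "free": free}
    return result


# Invented sample in the layout printed by `free -b`.
SAMPLE = """\
               total        used        free      shared  buff/cache   available
Mem:      8589934592  3221225472  1073741824    12345678  4294967296  5000000000
Swap:     2147483648           0  2147483648
"""

if __name__ == "__main__":
    print(parse_free(SAMPLE))
```

In a real collector the text would come from running the command, for example with subprocess.run(["free", "-b"], capture_output=True, text=True).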

Auto-Recovery

Condition                           Action
Container not running               docker start wank-slice-{N}
User has no slice + one available   Automatic assignment
Available slices < 1                Auto-provision new slice
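
The recovery rules above can be read as a small decision function. The sketch below is one possible interpretation rather than the gateway's implementation: plan_recovery and its action tuples are hypothetical, and it assumes the "available slices < 1" check is evaluated after waiting users have been assigned.

```python
def plan_recovery(container_running, waiting_users, available_slices):
    """Return the list of recovery actions for one monitor pass.

    container_running: {slice_number: bool} from the Docker check
    waiting_users:     user ids that currently have no slice
    available_slices:  slice numbers not yet assigned to anyone
    """
    actions = []

    # Rule 1: start any container that is not running.
    for n in sorted(container_running):
        if not container_running[n]:
            actions.append(("docker_start", f"wank-slice-{n}"))

    # Rule 2: hand available slices to waiting users, first come first served.
    free = list(available_slices)
    for user in waiting_users:
        if not free:
            break
        actions.append(("assign", user, free.pop(0)))

    # Rule 3: keep at least one slice available (assumed to be checked
    # after assignment, so a spare is provisioned once the pool is empty).
    if len(free) < 1:
        actions.append(("provision",))

    return actions
```

Returning a plan instead of executing commands directly keeps the decision logic easy to test; a caller would then map each tuple to the real action, such as running docker start.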

Telemetry Retention

  • Pruned every 1 hour
  • Both gateway.telemetry and gateway.server_telemetry kept for 7 days
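
A 7-day prune amounts to deleting rows older than a cutoff. The following sketch uses Python's built-in sqlite3 purely as a stand-in, since the source does not say which database backs gateway.telemetry; the unqualified table names, the recorded_at column (epoch seconds), and the prune function are all assumptions.

```python
import sqlite3

RETENTION_SECONDS = 7 * 24 * 60 * 60  # 7 days


def prune(conn, now, tables=("telemetry", "server_telemetry")):
    """Delete rows older than the retention window; return rows removed.

    `now` is the current time in epoch seconds. Table names come from a
    fixed tuple, never from user input, so interpolating them is safe here.
    """
    cutoff = now - RETENTION_SECONDS
    deleted = 0
    for table in tables:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE recorded_at < ?", (cutoff,)
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted
```

Rows stamped exactly at the cutoff are kept, so the effective window is "7 days or newer".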

Startup Timing

  • Monitor starts 30 seconds after gateway boot, to let services settle
  • Then runs every 60 seconds
  • Telemetry prune runs every 1 hour
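
The timing above could be arranged as a single asyncio task, sketched below. This is an assumption-laden illustration: run_monitor, its parameters, and the choice to run the first prune one hour after the first pass are not taken from the source, which does not describe how the gateway schedules its work.

```python
import asyncio

START_DELAY_S = 30       # wait after gateway boot
TICK_INTERVAL_S = 60     # monitor pass interval
PRUNE_INTERVAL_S = 3600  # telemetry prune interval


async def run_monitor(tick, prune, *, sleep=asyncio.sleep, max_ticks=None):
    """Run `tick` every 60 s after a 30 s delay, and `prune` hourly.

    `sleep` and `max_ticks` exist only so the loop can be exercised in
    tests without waiting; production use would leave both at defaults.
    """
    await sleep(START_DELAY_S)
    since_prune = 0
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        await tick()
        ticks += 1
        since_prune += TICK_INTERVAL_S
        if since_prune >= PRUNE_INTERVAL_S:
            await prune()
            since_prune = 0
        await sleep(TICK_INTERVAL_S)
```

Because the loop sleeps a fixed 60 seconds after each pass, the real period is 60 seconds plus the time the pass itself takes; a deadline-based sleep would be needed for a strict 60-second cadence.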

Status

The monitor is active and has been collecting data continuously since deployment. Telemetry feeds the admin dashboard sparklines and the capacity planning panel.