Replaced standby pool of 3 slices with on-demand provisioning via POST /api/connect

On-Demand Slice Provisioning

Purpose

Replaced the pre-created standby pool of Docker slices (MIN_AVAILABLE=3) with on-demand provisioning. Slices are now created only when a user clicks “Connect WhatsApp” – no idle containers, no orphan cleanup issues, no wasted resources.

Why

The old approach pre-created 3 standby slices at all times via ensureSlicePool(). This caused:

  • Orphan containers consuming memory when not assigned
  • Stale state when containers sat idle then got assigned
  • Port conflicts and production incidents from containers on wrong ports
  • Unnecessary subscription-based cleanup code (cleanupExpiredSlices)
  • Docker container startup plus Chrome launch takes only ~7 seconds, making the standby pool's latency benefit negligible

Architecture Change

Before:

  1. Gateway starts -> ensureSlicePool() creates 3 containers with status='available'
  2. User registers -> assignSliceToUser() picks an available slice
  3. Monitor runs ensureSlicePool() every health check cycle to maintain pool
  4. cleanupExpiredSlices() runs on timer to destroy expired subscription slices

After:

  1. User registers -> no slice created, slice_id = NULL
  2. User clicks “Connect WhatsApp” -> POST /api/connect (new gateway endpoint)
  3. Gateway provisions slice on-demand via provisionSlice(userId) (~7s)
  4. Slice starts, QR code flows to browser via SSE
  5. If QR times out (3 codes expire) -> slice auto-destroyed, user can try again

New Gateway Endpoint: POST /api/connect

File: gateway/src/connect.ts

  • Uses requireSession middleware (validates session cookie, no slice needed)
  • Per-user provisioning lock (in-memory Set) prevents double-click
  • If user already has a slice: checks if container is running, forwards initialize
  • If no slice: calls provisionSlice(userId), forwards initialize to new slice
  • Returns { status: 'provisioned' | 'initializing', port, sliceId }
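The per-user lock above can be sketched as a small helper around an in-memory Set; the name withProvisioningLock and its exact shape are illustrative assumptions, not the shipped connect.ts code:

```typescript
// Sketch of a per-user provisioning lock (assumed helper, not actual connect.ts).
const provisioning = new Set<string>();

// Runs fn at most once per user at a time; a concurrent call for the same
// user returns null, so a double-click cannot start two container provisions.
export async function withProvisioningLock<T>(
  userId: string,
  fn: () => Promise<T>,
): Promise<T | null> {
  if (provisioning.has(userId)) return null; // already provisioning this user
  provisioning.add(userId);
  try {
    return await fn();
  } finally {
    provisioning.delete(userId); // always release, even if fn throws
  }
}
```

Because the Set is mutated synchronously before the first await, two requests arriving back-to-back on the same event loop still serialize correctly.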

Removed Code

  Function                File            Reason
  ensureSlicePool()       provisioner.ts  No standby pool needed
  assignSliceToUser()     provisioner.ts  Slice assigned during provisioning
  cleanupExpiredSlices()  monitor.ts      No subscription-based lifecycle
  MIN_AVAILABLE constant  provisioner.ts  No pool size concept

Slice Lifecycle

No Slice -> [Connect click] -> Provisioning (~7s) -> QR Showing -> QR Scanned -> Loading Chats -> Ready
                                                      | (3 codes expire)
                                                    QR Timeout -> Slice Destroyed -> No Slice
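The lifecycle above can be modeled as a small transition table; the state names and the canTransition helper are illustrative assumptions, not the shipped implementation:

```typescript
// Assumed model of the slice lifecycle diagram; names are illustrative.
type SliceState =
  | "no_slice" | "provisioning" | "qr_showing"
  | "qr_scanned" | "loading_chats" | "ready" | "destroyed";

// Allowed next states for each state.
const transitions: Record<SliceState, SliceState[]> = {
  no_slice:      ["provisioning"],            // Connect click
  provisioning:  ["qr_showing"],              // container + Chrome up (~7s)
  qr_showing:    ["qr_scanned", "destroyed"], // scan, or 3 QR codes expire
  qr_scanned:    ["loading_chats"],
  loading_chats: ["ready"],
  ready:         [],
  destroyed:     ["no_slice"],                // user can try again
};

export function canTransition(from: SliceState, to: SliceState): boolean {
  return transitions[from].includes(to);
}
```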

Auto-Destruction on QR Timeout

When the SSE connector receives wa:loading_failed with reason whatsapp_logout, or wa:error with code QR_TIMEOUT:

  1. Event broadcast to browser (user sees timeout screen)
  2. 2-second delay
  3. User unlinked: UPDATE gateway.users SET slice_id = NULL
  4. destroySlice(sliceId) – stops container, drops DB, removes data
  5. Connector destroyed, removed from map
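The five steps can be sketched as one async function with its dependencies injected to make the ordering explicit; all names here (teardownOnQrTimeout, the TeardownDeps fields) are assumptions rather than the actual sse.ts code:

```typescript
// Assumed sketch of the QR-timeout teardown sequence; not the shipped sse.ts.
interface TeardownDeps {
  broadcast(event: string): void;               // 1. push event to browser via SSE
  unlinkUser(userId: string): Promise<void>;    // 3. UPDATE ... SET slice_id = NULL
  destroySlice(sliceId: string): Promise<void>; // 4. stop container, drop DB, remove data
  removeConnector(userId: string): void;        // 5. drop connector from the map
  delayMs?: number;                             // defaults to the 2-second delay
}

export async function teardownOnQrTimeout(
  userId: string,
  sliceId: string,
  deps: TeardownDeps,
): Promise<void> {
  deps.broadcast("wa:error:QR_TIMEOUT");                        // user sees timeout screen
  await new Promise((r) => setTimeout(r, deps.delayMs ?? 2000)); // 2-second delay
  await deps.unlinkUser(userId);   // unlink first so nothing routes to a dying slice
  await deps.destroySlice(sliceId);
  deps.removeConnector(userId);
}
```

Unlinking before destruction matters: once slice_id is NULL, no request can be proxied to the container mid-teardown.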

Startup Cleanup

verifyAvailablePristine() runs at gateway startup:

  1. Destroys all slices with status='available' (should never exist)
  2. Lists running Docker containers matching wank-slice-*
  3. Compares against gateway.slices table
  4. Destroys any orphan containers not tracked in DB
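The orphan detection in steps 2-4 can be sketched as a pure function over the two lists; the wank-slice-* naming comes from this doc, while findOrphanContainers itself is an illustrative name:

```typescript
// Assumed sketch: given running container names (docker ps) and the slice
// IDs tracked in gateway.slices, return the containers to destroy.
export function findOrphanContainers(
  runningContainers: string[],
  trackedSliceIds: Set<string>,
): string[] {
  return runningContainers.filter((name) => {
    const m = name.match(/^wank-slice-(.+)$/);
    if (!m) return false;              // not a slice container; leave it alone
    return !trackedSliceIds.has(m[1]); // running but not in the DB -> orphan
  });
}
```

Keeping this pure makes the startup check easy to test without Docker: only the caller touches docker ps and the database.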

Gateway Files Changed

  File            Changes
  connect.ts      New file – POST /api/connect handler
  provisioner.ts  Removed ensureSlicePool, assignSliceToUser, MIN_AVAILABLE
  monitor.ts      Removed ensureSlicePool call, cleanupExpiredSlices
  auth.ts         Registration no longer assigns a slice
  proxy.ts        Added requireSession middleware (session-only, no slice required)
  sse.ts          Handles no-slice SSE, QR_TIMEOUT destruction, wa:contact_phase relay
  index.ts        New /api/connect route before catch-all proxy

Status

Complete and deployed. All new users provision on-demand. No standby pool.