On-Demand Slice Provisioning
Purpose
Replaced the pre-created standby pool of Docker slices (MIN_AVAILABLE=3) with on-demand provisioning. Slices are now created only when a user clicks “Connect WhatsApp” – no idle containers, no orphan cleanup issues, no wasted resources.
Why
The old approach pre-created 3 standby slices at all times via ensureSlicePool(). This caused:
- Orphan containers consuming memory when not assigned
- Stale state when containers sat idle then got assigned
- Port conflicts and production incidents from containers on wrong ports
- Unnecessary subscription-based cleanup code (`cleanupExpiredSlices`)
- Docker container startup + Chrome is only ~7 seconds, making the pool pointless
Architecture Change
Before:
- Gateway starts -> `ensureSlicePool()` creates 3 containers with `status='available'`
- User registers -> `assignSliceToUser()` picks an available slice
- Monitor runs `ensureSlicePool()` every health check cycle to maintain the pool
- `cleanupExpiredSlices()` runs on a timer to destroy expired subscription slices
After:
- User registers -> no slice created, `slice_id = NULL`
- User clicks “Connect WhatsApp” -> `POST /api/connect` (new gateway endpoint)
- Gateway provisions slice on-demand via `provisionSlice(userId)` (~7s)
- Slice starts, QR code flows to browser via SSE
- If QR times out (3 codes expire) -> slice auto-destroyed, user can try again
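The browser side of this flow can be sketched as a small helper (illustrative, not the shipped frontend): the click handler POSTs `/api/connect` with the session cookie, and the app would then open an SSE stream to receive QR codes. The fetch implementation is injected so the helper can be exercised outside a browser; the response shape is the one `/api/connect` returns per this doc.

```typescript
// Response shape from the POST /api/connect description in this doc.
export type ConnectResponse = {
  status: "provisioned" | "initializing";
  port: number;
  sliceId: string;
};

// doFetch is injected (browser fetch in production) so the helper is testable.
export async function startConnect(doFetch: typeof fetch): Promise<ConnectResponse> {
  const res = await doFetch("/api/connect", {
    method: "POST",
    credentials: "include", // session cookie for requireSession
  });
  if (!res.ok) throw new Error(`connect failed: HTTP ${res.status}`);
  return (await res.json()) as ConnectResponse;
}
```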
New Gateway Endpoint: POST /api/connect
File: gateway/src/connect.ts
- Uses `requireSession` middleware (validates session cookie, no slice needed)
- Per-user provisioning lock (in-memory Set) prevents double-click
- If user already has a slice: checks if the container is running, forwards initialize
- If no slice: calls `provisionSlice(userId)`, forwards initialize to the new slice
- Returns `{ status: 'provisioned' | 'initializing', port, sliceId }`
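The per-user lock can be sketched as below, decoupled from Express/HTTP so it runs standalone. This is a minimal sketch, not the shipped `connect.ts`: `provisionSlice` is a stub here (the real one starts a Docker container, ~7s), and the `already_provisioning` result name is illustrative.

```typescript
const provisioning = new Set<string>(); // user IDs with a provision in flight

interface Slice {
  sliceId: string;
  port: number;
}

// Stub: the real implementation launches a container and records it in the DB.
async function provisionSlice(userId: string): Promise<Slice> {
  return { sliceId: `slice-${userId}`, port: 40000 };
}

type ConnectResult =
  | { status: "provisioned"; sliceId: string; port: number }
  | { status: "already_provisioning" };

export async function connect(userId: string): Promise<ConnectResult> {
  if (provisioning.has(userId)) {
    // Double-click guard: a provision for this user is already in flight.
    return { status: "already_provisioning" };
  }
  provisioning.add(userId);
  try {
    const slice = await provisionSlice(userId);
    return { status: "provisioned", ...slice };
  } finally {
    provisioning.delete(userId); // release the lock even if provisioning throws
  }
}
```

Using a plain in-memory `Set` is enough here because a single gateway process handles all connects; the `finally` ensures a failed provision never leaves the user permanently locked.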
Removed Code
| Function | File | Reason |
|---|---|---|
| `ensureSlicePool()` | `provisioner.ts` | No standby pool needed |
| `assignSliceToUser()` | `provisioner.ts` | Slice assigned during provisioning |
| `cleanupExpiredSlices()` | `monitor.ts` | No subscription-based lifecycle |
| `MIN_AVAILABLE` constant | `provisioner.ts` | No pool size concept |
Slice Lifecycle
```
No Slice -> [Connect click] -> Provisioning (~7s) -> QR Showing -> QR Scanned -> Loading Chats -> Ready
                                                         |
                                                  (3 codes expire)
                                                         v
                                 QR Timeout -> Slice Destroyed -> No Slice
```
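The lifecycle above can be modeled as a tiny transition table; the state names paraphrase the diagram and this is a documentation sketch, not actual gateway code:

```typescript
type SliceState =
  | "no_slice"
  | "provisioning"
  | "qr_showing"
  | "qr_scanned"
  | "loading_chats"
  | "ready"
  | "qr_timeout";

// Allowed transitions, mirroring the diagram above.
const transitions: Record<SliceState, SliceState[]> = {
  no_slice: ["provisioning"],               // user clicks "Connect WhatsApp"
  provisioning: ["qr_showing"],             // container + Chrome up (~7s)
  qr_showing: ["qr_scanned", "qr_timeout"], // scan, or 3 codes expire
  qr_scanned: ["loading_chats"],
  loading_chats: ["ready"],
  ready: [],
  qr_timeout: ["no_slice"],                 // slice destroyed, user can retry
};

export function canTransition(from: SliceState, to: SliceState): boolean {
  return transitions[from].includes(to);
}
```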
Auto-Destruction on QR Timeout
When `wa:loading_failed` with reason `whatsapp_logout` or `wa:error` with code `QR_TIMEOUT` is received by the SSE connector:
- Event broadcast to browser (user sees timeout screen)
- 2-second delay
- User unlinked: `UPDATE gateway.users SET slice_id = NULL`
- `destroySlice(sliceId)` – stops container, drops DB, removes data
- Connector destroyed, removed from map
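The teardown order above can be sketched as follows. The DB/Docker calls are stubs that only record what the real code would do; the event names match this doc, but the handler shape and everything else is illustrative, not the shipped `sse.ts`.

```typescript
const DESTROY_DELAY_MS = 2000; // the 2-second delay before teardown

const teardownLog: string[] = [];
const connectors = new Map<string, object>(); // sliceId -> SSE connector

interface SliceEvent { type: string; reason?: string; code?: string }

// Terminal events per this doc: logout-flavored loading failure or QR timeout.
function isTerminal(ev: SliceEvent): boolean {
  return (
    (ev.type === "wa:loading_failed" && ev.reason === "whatsapp_logout") ||
    (ev.type === "wa:error" && ev.code === "QR_TIMEOUT")
  );
}

function broadcast(userId: string, ev: SliceEvent): void {
  teardownLog.push(`broadcast:${ev.type}`); // browser shows timeout screen
}

async function unlinkUser(userId: string): Promise<void> {
  // Real code: UPDATE gateway.users SET slice_id = NULL
  teardownLog.push(`unlink:${userId}`);
}

async function destroySlice(sliceId: string): Promise<void> {
  // Real code: stop container, drop slice DB, remove data
  teardownLog.push(`destroy:${sliceId}`);
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

export async function onSliceEvent(userId: string, sliceId: string, ev: SliceEvent) {
  broadcast(userId, ev);          // 1. user sees the timeout screen
  if (!isTerminal(ev)) return;    // non-terminal events: relay only
  await sleep(DESTROY_DELAY_MS);  // 2. grace period
  await unlinkUser(userId);       // 3. slice_id = NULL
  await destroySlice(sliceId);    // 4. container + data gone
  connectors.delete(sliceId);     // 5. connector removed from map
}
```

Broadcasting before the delay matters: the user sees the timeout screen while the slice is still being torn down, rather than a dead SSE stream.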
Startup Cleanup
`verifyAvailablePristine()` runs at gateway startup:
- Destroys all slices with `status='available'` (should never exist)
- Lists running Docker containers matching `wank-slice-*`
- Compares against the `gateway.slices` table
- Destroys any orphan containers not tracked in the DB
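The orphan-detection step can be sketched as a pure function, with Docker and DB access replaced by plain inputs. The `wank-slice-` prefix and `gateway.slices` table are from this doc; the helper itself and its parameter names are illustrative.

```typescript
const SLICE_PREFIX = "wank-slice-";

export function findOrphans(
  runningContainers: string[],   // container names from docker ps
  trackedSliceIds: Set<string>,  // slice IDs from gateway.slices
): string[] {
  return runningContainers
    // Only consider our own containers; ignore unrelated services.
    .filter((name) => name.startsWith(SLICE_PREFIX))
    // Orphan = running container whose slice ID the DB does not track.
    .filter((name) => !trackedSliceIds.has(name.slice(SLICE_PREFIX.length)));
}
```

Keeping this as a pure comparison makes the startup path easy to test: the only side effect (destroying the returned orphans) stays in the caller.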
Gateway Files Changed
| File | Changes |
|---|---|
| connect.ts | New file – POST /api/connect handler |
| provisioner.ts | Removed ensureSlicePool, assignSliceToUser, MIN_AVAILABLE |
| monitor.ts | Removed ensureSlicePool call, cleanupExpiredSlices |
| auth.ts | Registration no longer assigns a slice |
| proxy.ts | Added requireSession middleware (session-only, no slice required) |
| sse.ts | Handles no-slice SSE, QR_TIMEOUT destruction, wa:contact_phase relay |
| index.ts | New /api/connect route before catch-all proxy |
Status
Complete and deployed. All new users provision on-demand. No standby pool.