DocHub
How the automated documentation validator works, what it checks, and what it produces

Daily Agent

Purpose

The daily agent (src/scripts/daily-agent.ts, 217 lines) scans all documentation manifests, identifies gaps and issues, and produces a JSON report. It optionally calls the Claude API for deeper analysis. It runs via cron at 3:00 AM UTC.

What It Checks

Schema Coverage

For each subproject, it reads the schema_coverage map from _subproject.yaml and counts how many of the 8 sections are marked true. Missing purpose or architecture sections are flagged as high severity; all others are medium.
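
The coverage rule above can be sketched roughly as follows. This is a minimal illustration, not the code in daily-agent.ts: the function name, the Issue shape, and the assumption that the schema_coverage map lists every expected section (so its size can serve as the denominator) are all hypothetical.

```typescript
// Hypothetical sketch of the schema coverage check; names are illustrative only.
interface MissingSchemaIssue {
  type: "missing_schema";
  severity: "high" | "medium";
  location: string; // "project/subproject"
  message: string;
}

function checkSchemaCoverage(
  location: string,
  schemaCoverage: Record<string, boolean>,
): { coveragePct: number; issues: MissingSchemaIssue[] } {
  const sections = Object.keys(schemaCoverage);
  const missing = sections.filter((section) => !schemaCoverage[section]);

  // Assumption: the map lists all expected sections (8 in the real schema),
  // so the number of keys is used as the denominator.
  const covered = sections.length - missing.length;
  const coveragePct =
    sections.length === 0 ? 0 : Math.round((covered / sections.length) * 100);

  const issues = missing.map((section): MissingSchemaIssue => ({
    type: "missing_schema",
    // Missing purpose or architecture is high severity; everything else is medium.
    severity:
      section === "purpose" || section === "architecture" ? "high" : "medium",
    location,
    message: `Section "${section}" is not marked as covered`,
  }));

  return { coveragePct, issues };
}
```

Only purpose and architecture are section names taken from the text; any other keys passed in are whatever the subproject's _subproject.yaml declares.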

Empty Subprojects

Any subproject with zero .md files is flagged as high severity.
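
As a small sketch of this check, assuming the agent already has the list of file names for a subproject (the function and type names here are hypothetical):

```typescript
// Hypothetical sketch: flag a subproject that contains no .md files.
interface EmptySubprojectIssue {
  type: "empty_subproject";
  severity: "high";
  location: string; // "project/subproject"
  message: string;
}

function checkEmptySubproject(
  location: string,
  fileNames: string[],
): EmptySubprojectIssue | null {
  // Count only Markdown files; manifests such as _subproject.yaml do not count.
  const docCount = fileNames.filter((file) => /\.md$/.test(file)).length;
  if (docCount > 0) {
    return null;
  }
  return {
    type: "empty_subproject",
    severity: "high",
    location,
    message: "Subproject has no .md files",
  };
}
```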

Broken Cross-References

For each document’s references array, the agent checks that the target subproject (format: project/subproject) actually exists. Broken references are medium severity.
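
One way to picture the reference check, assuming the agent holds a list of every known project/subproject identifier. The DocManifest shape and function name are illustrative and may not match the real manifests.

```typescript
// Hypothetical sketch of the cross-reference check.
interface DocManifest {
  location: string;     // e.g. "project/subproject/page"
  references: string[]; // each entry expected in "project/subproject" form
}

interface BrokenReferenceIssue {
  type: "broken_reference";
  severity: "medium";
  location: string;
  message: string;
}

function findBrokenReferences(
  docs: DocManifest[],
  knownSubprojects: string[],
): BrokenReferenceIssue[] {
  const issues: BrokenReferenceIssue[] = [];
  for (const doc of docs) {
    for (const ref of doc.references) {
      // A reference is broken when its target is not a known subproject.
      if (knownSubprojects.indexOf(ref) === -1) {
        issues.push({
          type: "broken_reference",
          severity: "medium",
          location: doc.location,
          message: `Reference to "${ref}" does not match any subproject`,
        });
      }
    }
  }
  return issues;
}
```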

What It Does NOT Check

  • Whether content actually covers the declared schema sections
  • Prose quality or completeness
  • Stale documents (the stale_doc issue type is declared in the report interface but not currently implemented)
  • Markdown syntax errors

Report Structure

interface AgentReport {
  generated_at: string;
  summary: {
    total_projects: number;
    total_subprojects: number;
    total_documents: number;
    avg_schema_coverage: number;  // 0-100
  };
  projects: Array<{
    id: string;
    name: string;
    subprojects: Array<{
      id: string;
      name: string;
      doc_count: number;
      schema_coverage: Record<string, boolean>;
      coverage_pct: number;       // 0-100
    }>;
  }>;
  issues: Array<{
    type: 'missing_schema' | 'broken_reference' | 'stale_doc' | 'empty_subproject';
    severity: 'high' | 'medium' | 'low';
    location: string;             // project/subproject or project/subproject/page
    message: string;
  }>;
  recommendations: string[];
  ai_analysis?: string;           // Claude API response (if ANTHROPIC_API_KEY set)
}
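
As an illustration of how a consumer might read the issues array, the sketch below tallies issues per severity. It redeclares only the issue fields it needs, and the function name is hypothetical rather than part of DocHub.

```typescript
// Hypothetical consumer of a report's issues array.
type Severity = "high" | "medium" | "low";

interface ReportIssue {
  type: string;
  severity: Severity;
  location: string;
  message: string;
}

function countBySeverity(issues: ReportIssue[]): Record<Severity, number> {
  const counts: Record<Severity, number> = { high: 0, medium: 0, low: 0 };
  for (const issue of issues) {
    counts[issue.severity] += 1;
  }
  return counts;
}
```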

Claude API Integration

If ANTHROPIC_API_KEY is set and there are documents to analyze, the agent calls Claude Sonnet with the full report JSON and asks for:

  1. Overall documentation health assessment
  2. Most critical gaps identified
  3. Priority recommendations

The response is stored in the ai_analysis field. The agent is read-only — it never modifies documentation, only reports on it.
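
The gating condition and the request can be sketched as below. The prompt wording is an assumption (the source does not show the real prompt), the function names are hypothetical, and the actual API call is deliberately left out.

```typescript
// Hypothetical sketch: decide whether to run AI analysis and build the prompt text.
const ANALYSIS_REQUESTS = [
  "Overall documentation health assessment",
  "Most critical gaps identified",
  "Priority recommendations",
];

function shouldRunAiAnalysis(
  apiKey: string | undefined,
  totalDocuments: number,
): boolean {
  // Both conditions from the text: a key is set and there is something to analyze.
  return Boolean(apiKey) && totalDocuments > 0;
}

function buildAnalysisPrompt(report: object): string {
  const requests = ANALYSIS_REQUESTS
    .map((item, index) => `${index + 1}. ${item}`)
    .join("\n");
  // Assumed wording; the full report JSON is embedded as described above.
  return [
    "Here is a documentation validation report as JSON:",
    JSON.stringify(report, null, 2),
    "Please provide:",
    requests,
  ].join("\n\n");
}
```

Whatever text the model returns would then be assigned to ai_analysis before the report is written.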

Recommendations Engine

The agent generates simple rule-based recommendations:

  • Zero documents: “Start by adding content to the content/ directory”
  • Average coverage below 50%: “Prioritize filling in purpose and architecture sections”
  • High-severity issues exist: “{N} high-severity issues found. Address these first.”
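
These three rules could look roughly like this. The input shape and function name are hypothetical, and the sketch assumes the rules are evaluated independently, so more than one recommendation can be emitted for the same report.

```typescript
// Hypothetical sketch of the rule-based recommendations.
interface RecommendationInputs {
  totalDocuments: number;
  avgSchemaCoverage: number; // 0-100
  highSeverityCount: number;
}

function buildRecommendations(inputs: RecommendationInputs): string[] {
  const recommendations: string[] = [];
  if (inputs.totalDocuments === 0) {
    recommendations.push("Start by adding content to the content/ directory");
  }
  if (inputs.avgSchemaCoverage < 50) {
    recommendations.push("Prioritize filling in purpose and architecture sections");
  }
  if (inputs.highSeverityCount > 0) {
    recommendations.push(
      `${inputs.highSeverityCount} high-severity issues found. Address these first.`,
    );
  }
  return recommendations;
}
```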

Output

Reports are saved as reports/YYYY-MM-DD.json. The latest report is served by GET /api/report/latest.
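
A minimal sketch of the file naming, assuming the date in the file name is the UTC date of the run (the source does not state which time zone is used; reportPathFor is a hypothetical helper):

```typescript
// Hypothetical helper: derive the report path for a given run time.
function reportPathFor(runTime: Date): string {
  // toISOString() is always UTC, e.g. "2024-03-05T03:00:00.000Z";
  // the first 10 characters are the YYYY-MM-DD part.
  const day = runTime.toISOString().slice(0, 10);
  return `reports/${day}.json`;
}
```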

Cron Execution

The daily run is orchestrated by scripts/daily-run.sh:

# 1. git pull origin main
# 2. npm ci --production
# 3. npx tsc
# 4. node dist/scripts/build-site.js     # rebuild manifests
# 5. node dist/scripts/daily-agent.js    # run validation
# 6. pm2 restart dochub                  # restart server

Cron entry (set up by deploy/setup-droplet.sh):

0 3 * * * cd /home/chas-watkins/code/DocHub && bash scripts/daily-run.sh >> /var/log/dochub/cron.log 2>&1

Variables

Variable            Required   Description
CONTENT_DIR         No         Path to content directory (default ./content)
ANTHROPIC_API_KEY   No         Enables Claude API analysis in reports
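
For illustration, these variables might be read as follows. AgentConfig and loadConfig are hypothetical names, and the environment is passed in as a plain object so the sketch stays independent of Node's process global.

```typescript
// Hypothetical sketch of reading the two environment variables.
interface AgentConfig {
  contentDir: string;
  anthropicApiKey: string | undefined;
}

function loadConfig(env: Record<string, string | undefined>): AgentConfig {
  return {
    // Fall back to the documented default when CONTENT_DIR is unset or empty.
    contentDir: env["CONTENT_DIR"] || "./content",
    // Leaving this undefined disables the optional Claude analysis.
    anthropicApiKey: env["ANTHROPIC_API_KEY"] || undefined,
  };
}
```

In a Node script this would typically be called as loadConfig(process.env).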