How the automated documentation validator works, what it checks, and what it produces

Daily Agent

Purpose

The daily agent (src/scripts/daily-agent.ts, 217 lines) scans all documentation manifests, identifies gaps and issues, and produces a JSON report. It optionally calls the Claude API for deeper analysis. It runs via cron at 3:00 AM UTC.

What It Checks

Schema Coverage

For each subproject, the agent reads the schema_coverage map from _subproject.yaml and counts how many of the 8 sections are marked true. A missing purpose or architecture section is flagged as high severity; any other missing section is flagged as medium.
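The coverage rule can be sketched roughly as follows. This is a minimal illustration, not the code in daily-agent.ts: the names TOTAL_SECTIONS, HIGH_PRIORITY, and checkCoverage are hypothetical, and rounding the percentage to a whole number is an assumption.

```typescript
// Sketch only: all names here are illustrative, not taken from daily-agent.ts.
type Severity = "high" | "medium" | "low";

interface CoverageIssue {
  type: "missing_schema";
  severity: Severity;
  location: string; // "project/subproject"
  message: string;
}

// The document states there are 8 schema sections per subproject.
const TOTAL_SECTIONS = 8;
// Sections whose absence is reported as high severity.
const HIGH_PRIORITY = new Set(["purpose", "architecture"]);

function checkCoverage(
  location: string,
  coverage: Record<string, boolean>,
): { coveragePct: number; issues: CoverageIssue[] } {
  const entries = Object.entries(coverage);
  const covered = entries.filter(([, present]) => present).length;

  // One issue per section that is explicitly marked false.
  const issues: CoverageIssue[] = entries
    .filter(([, present]) => !present)
    .map(([section]) => ({
      type: "missing_schema" as const,
      severity: HIGH_PRIORITY.has(section) ? ("high" as const) : ("medium" as const),
      location,
      message: `Missing ${section} section`,
    }));

  // Assumption: the percentage is rounded to a whole number.
  return { coveragePct: Math.round((covered / TOTAL_SECTIONS) * 100), issues };
}
```

Note that this sketch only reports sections present in the map with a false value; how the real agent treats sections absent from the map is not stated here.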

Empty Subprojects

Any subproject with zero .md files is flagged as high severity.
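A possible shape for this check, assuming the file names of each subproject have already been listed (the SubprojectFiles type and findEmptySubprojects function are hypothetical names used for illustration):

```typescript
// Sketch only: types and function names are illustrative.
interface SubprojectFiles {
  location: string; // "project/subproject"
  files: string[];  // file names found in the subproject directory
}

interface EmptySubprojectIssue {
  type: "empty_subproject";
  severity: "high";
  location: string;
  message: string;
}

function findEmptySubprojects(subprojects: SubprojectFiles[]): EmptySubprojectIssue[] {
  return subprojects
    // Keep only subprojects that contain no file ending in ".md".
    .filter((sp) => !sp.files.some((name) => name.endsWith(".md")))
    .map((sp) => ({
      type: "empty_subproject" as const,
      severity: "high" as const,
      location: sp.location,
      message: "Subproject has no .md files",
    }));
}
```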

Broken Cross-References

For each document’s references array, the agent checks that the target subproject (format: project/subproject) actually exists. Broken references are medium severity.
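The reference check amounts to a set lookup. A minimal sketch, assuming the set of known project/subproject identifiers has been collected from the manifests beforehand (DocRefs and findBrokenReferences are hypothetical names):

```typescript
// Sketch only: types and function names are illustrative.
interface DocRefs {
  location: string;     // "project/subproject/page"
  references: string[]; // targets in "project/subproject" format
}

interface BrokenReferenceIssue {
  type: "broken_reference";
  severity: "medium";
  location: string;
  message: string;
}

function findBrokenReferences(
  docs: DocRefs[],
  knownSubprojects: Set<string>,
): BrokenReferenceIssue[] {
  const issues: BrokenReferenceIssue[] = [];
  for (const doc of docs) {
    for (const target of doc.references) {
      // A reference is broken when its target subproject is unknown.
      if (!knownSubprojects.has(target)) {
        issues.push({
          type: "broken_reference",
          severity: "medium",
          location: doc.location,
          message: `Reference to unknown subproject: ${target}`,
        });
      }
    }
  }
  return issues;
}
```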

What It Does NOT Check

Whether content actually covers the declared schema sections
Prose quality or completeness
Stale documents (the stale_doc issue type is declared in the report interface but not currently implemented)
Markdown syntax errors

Report Structure

interface AgentReport {
  generated_at: string;
  summary: {
    total_projects: number;
    total_subprojects: number;
    total_documents: number;
    avg_schema_coverage: number;  // 0-100
  };
  projects: Array<{
    id: string;
    name: string;
    subprojects: Array<{
      id: string;
      name: string;
      doc_count: number;
      schema_coverage: Record<string, boolean>;
      coverage_pct: number;       // 0-100
    }>;
  }>;
  issues: Array<{
    type: 'missing_schema' | 'broken_reference' | 'stale_doc' | 'empty_subproject';
    severity: 'high' | 'medium' | 'low';
    location: string;             // project/subproject or project/subproject/page
    message: string;
  }>;
  recommendations: string[];
  ai_analysis?: string;           // Claude API response (if ANTHROPIC_API_KEY set)
}

Claude API Integration

If ANTHROPIC_API_KEY is set and there are documents to analyze, the agent calls Claude Sonnet with the full report JSON and asks for:

Overall documentation health assessment
Most critical gaps identified
Priority recommendations

The response is stored in the ai_analysis field. The agent is read-only: it never modifies documentation, only reports on it.
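As a rough sketch of how such a request might be assembled for the Anthropic Messages API: the function name buildAnalysisRequest, the prompt wording, and the max_tokens value are assumptions, and the model identifier is left to the caller because the exact Sonnet model is not specified here.

```typescript
// Sketch only: prompt text, max_tokens, and all names are assumptions.
interface ReportLike {
  summary: { total_documents: number };
}

interface AnalysisRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildAnalysisRequest(
  report: ReportLike,
  apiKey: string | undefined,
  model: string, // caller supplies the Claude Sonnet model identifier
): AnalysisRequest | null {
  // Skip the call when no key is configured or there is nothing to analyze.
  if (!apiKey || report.summary.total_documents === 0) return null;

  const prompt = [
    "Here is a documentation health report as JSON.",
    "Please provide: (1) an overall documentation health assessment,",
    "(2) the most critical gaps, and (3) priority recommendations.",
    "",
    JSON.stringify(report, null, 2),
  ].join("\n");

  return {
    url: "https://api.anthropic.com/v1/messages",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model,
      max_tokens: 1024,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```

The returned object can be passed to an HTTP client as a POST request; the text of the first content block in the response would then be stored in ai_analysis.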

Recommendations Engine

The agent generates simple rule-based recommendations:

Zero documents: “Start by adding content to the content/ directory”
Average coverage below 50%: “Prioritize filling in purpose and architecture sections”
High-severity issues exist: “{N} high-severity issues found. Address these first.”
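These three rules could be expressed as a small pure function. The sketch below assumes the rules are evaluated independently, so more than one recommendation can appear in a report; buildRecommendations is a hypothetical name.

```typescript
// Sketch only: assumes each rule is checked independently of the others.
function buildRecommendations(
  totalDocuments: number,
  avgSchemaCoverage: number, // 0-100
  highSeverityCount: number,
): string[] {
  const recommendations: string[] = [];

  if (totalDocuments === 0) {
    recommendations.push("Start by adding content to the content/ directory");
  }
  if (avgSchemaCoverage < 50) {
    recommendations.push("Prioritize filling in purpose and architecture sections");
  }
  if (highSeverityCount > 0) {
    recommendations.push(
      `${highSeverityCount} high-severity issues found. Address these first.`,
    );
  }
  return recommendations;
}
```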

Output

Reports are saved as reports/YYYY-MM-DD.json. The latest report is served by GET /api/report/latest.
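A minimal sketch of deriving the report file name from a date. It assumes the date is taken in UTC, which would match the 3:00 AM UTC schedule; reportPath is a hypothetical helper name.

```typescript
// Sketch only: assumes the YYYY-MM-DD part is the UTC calendar date.
function reportPath(date: Date): string {
  // toISOString() always returns UTC, e.g. "2024-03-05T03:00:00.000Z".
  const day = date.toISOString().slice(0, 10);
  return `reports/${day}.json`;
}
```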

Cron Execution

The daily run is orchestrated by scripts/daily-run.sh:

# 1. git pull origin main
# 2. npm ci --production
# 3. npx tsc
# 4. node dist/scripts/build-site.js     # rebuild manifests
# 5. node dist/scripts/daily-agent.js    # run validation
# 6. pm2 restart dochub                  # restart server

Cron entry (set up by deploy/setup-droplet.sh):

0 3 * * * cd /home/chas-watkins/code/DocHub && bash scripts/daily-run.sh >> /var/log/dochub/cron.log 2>&1

Variables

Variable	Required	Description
`CONTENT_DIR`	No	Path to content directory (default `./content`)
`ANTHROPIC_API_KEY`	No	Enables Claude API analysis in reports
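One way these variables might be read, shown as a sketch: loadConfig and AgentConfig are hypothetical names, and treating an empty ANTHROPIC_API_KEY as unset is an assumption.

```typescript
// Sketch only: names and the empty-string handling are assumptions.
interface AgentConfig {
  contentDir: string;
  anthropicApiKey?: string; // undefined disables Claude API analysis
}

function loadConfig(env: Record<string, string | undefined>): AgentConfig {
  return {
    // Fall back to the documented default when CONTENT_DIR is not set.
    contentDir: env.CONTENT_DIR || "./content",
    // Treat an empty value the same as an unset one.
    anthropicApiKey: env.ANTHROPIC_API_KEY || undefined,
  };
}
```

In a Node.js script this would typically be called as loadConfig(process.env).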