forensic deep-dive · read-only audit · 2026-06-09

Why everything kept breaking — root causes, live evidence.

A complete forensic reconstruction of six months of Viewport's OpenClaw deployment. Every table is sourced from live VPS or GitHub read-only data. No memory. No guesses. The kill-cron, the deleted config, the frozen council, the 139-task gap — all documented with exact numbers.

Audited 2026-06-09 66 containers live 50 cron jobs · 49 enabled 26 agents configured 117 GitHub issues 69 commits on default branch

01 Executive Verdict 02 The Bombshell: 50→1 Crons + Kill-Cron 03 Runtime Map: 66 Containers 04 GitHub Control-Plane Audit 05 Prior Audits: 5 Runs, 3 Empty Files 06 The 6-Joint Failure Chain 07 Real vs Written vs Not Done

01Executive verdict

Verified numbers from live VPS + GitHub read-only audit, 2026-06-09. Nothing estimated.

Root cause is not architecture — it is execution infrastructure. The design (Git = truth, agents = workforce, runtime = disposable) is exactly right and matches how Anthropic and top AI companies operate. The failure is in three mechanical layers: (1) a system-cron literally killing all claude processes every 6 hours, (2) the May-11 "fresh" rebuild deleted the 50-cron, 26-agent operational config and replaced it with a near-empty stub, (3) the GitHub control loop that was supposed to close issues has never completed a single cycle — 1 task done out of 139.

Task throughput

1 / 139

tasks done. The closure loop (issue→PR→evidence→close) fired 0 times. Planning velocity is ∞; execution velocity ≈ 0.

Closure loop fires

0×

The intended GitHub flow — agent picks issue, works it, opens PR with evidence, issue auto-closes — has never completed end-to-end once.

Council frozen

29 days

Migration council stuck at round 000 since 2026-05-10 on a single boolean flag: pat_revoked: false. No agent has advanced it.

Sessions / 30d

522

34,330+ chat messages across 522 sessions. High activity, near-zero durable output. Session memory resets every run.

Kill-cron cadence

/etc/cron.d/claude-cleanup: pkill -u openclaw claude fires every 6 hours. Every long-running agent process dies on schedule.

Audit pass/fail

2 / 10

PASS: 2, FAIL: 10, UNKNOWN: 1 across the June-4 system audit sections 0–11. Only preflight passed cleanly.

Cron collapse

50 → 1

Old OpenClaw had 50 enabled crons across all 26 agents. The May-11 fresh rebuild left 1 disabled one-shot job. Then jobs.json was re-added with 49 enabled but the sessions/runtime was gone.

Containers

66 Docker containers running. 3 unhealthy (origin-backend, saathi-app-1, platformx-nextcloud). MLH portal at 115% CPU. Neo4j at 138% CPU.

System health at a glance

Tasks done

1/139

Issues closed

94/117

Issues open

23/117

AUDIT-FIND open

11/14

Crons surviving rebuild

1/50

Healthy containers

63/66

Council advancement

0/rounds

Repo files (522 total)

522

The good news: The config (openclaw.json) still has all 26 agents defined, and jobs.json has all 50 crons intact with 49 enabled. The operational framework survived. What's missing is: (1) kill the kill-cron, (2) wire the GitHub PAT so the council can advance, (3) run the closure loop once end-to-end. That's it.

02The bombshell: 50 crons deleted → 1; kill-cron confirmed

Read-only SSH audit of /root/.openclaw/cron/jobs.json + /etc/cron.d/claude-cleanup. Data verified 2026-06-09.

Kill-cron: /etc/cron.d/claude-cleanup

0 */6 * * * root pkill -u openclaw claude 2>/dev/null; pkill -f "claude --dangerously" 2>/dev/null; true

Fires at 00:00, 06:00, 12:00, 18:00 UTC every day as root. Any claude process — whether mid-task, in a cron, or doing long-running work — is hard-killed. This is the direct cause of agents never completing tasks. This cron was likely added as a "cleanup" measure during a troubleshooting session and never removed.

The May-11 rebuild wipe
The May-11 "fresh install" (branch fix/openclaw-fresh-true-clean-reinstall) replaced the fully configured openclaw.json with a near-default stub. The old system had 26 agents with full identities + 47 crons spread across all of them. The fresh install had 0 crons in cron.jobs (empty array). The cron/jobs.json file was later re-added with the 50 jobs but the agents were never fully re-wired (0 cron_jobs_attached per agent in the audit evidence).

Cron status: old system vs today

Old system (pre-May-11)

Crons configured across 26 agents in the openclaw.json agent definitions. Every agent had scheduled work.

After fresh rebuild

openclaw.json cron.jobs array was empty after rebuild. The configuration was destroyed.

jobs.json today

49 / 50

jobs.json was re-populated with 50 jobs; 49 enabled, 1 disabled (EOD Verification one-shot). But the kill-cron still fires.

Full cron registry — all 50 jobs (source: /root/.openclaw/cron/jobs.json)

#	Status	Name	Agent	Schedule (Asia/Bangkok)	Purpose
1	ON	Social Trend Scan	resource	`0 2 * * *` · daily 02:00	Scan HackerNews + Twitter/X for AI agent trends. Save to MARKET_INTELLIGENCE/SOCIAL_TRENDS.md
2	ON	Competitor Monitor	resource	`0 5 * * *` · daily 05:00	Check 3Commas, Pionex, CrewAI, AutoGPT for new features or pricing changes
3	ON	New Tools Scan	resource	`0 8 * * *` · daily 08:00	Search ProductHunt + ClawHub for new AI tools/skills. Score by relevance
4	ON	Intelligence Digest	resource	`0 12 * * *` · daily 12:00	Compile all MARKET_INTELLIGENCE files into single daily digest
5	ON	Skill Discovery	resource	`0 9 * * *` · daily 09:00	Search GitHub for new Claude skills repos (stars>50, pushed<7d)
6	ON	Competitor Deep Dive	resource	`0 10 * * 5` · Fri 10:00	Weekly deep analysis of top 3 competitors. Pricing, features, market position
7	ON	Monthly Tech Radar	resource	`0 10 1 * *` · 1st of month 10:00	Categorize tracked tech as ADOPT/TRIAL/ASSESS/HOLD. Save to TECHNOLOGY_RADAR/
8	ON	Weekly Leads	resource	`0 3 * * 1` · Mon 03:00	Search for businesses needing branding/websites. Score top 10
9	ON	ArXiv AI Scan	resource	`0 23 * * *` · daily 23:00	Scan latest AI/ML papers from ArXiv
10	ON	GitHub Releases	resource	`30 23 * * *` · daily 23:30	Check GitHub releases for tracked repos
11	ON	Hourly Health	vision	`0 * * * *` · every hour	docker ps health check; report unhealthy/Restarting/Exited containers
12	ON	Morning Briefing	vision	`0 8 * * *` · daily 08:00	Compile overnight incidents, P0/P1 alerts, and daily priorities
13	ON	Agent Audit	vision	`0 20 * * *` · daily 20:00	Full Agent Audit sweep for all 26 agents — workspaces, sessions, memory
14	ON	Weekly Skill Audit	vision	`0 2 * * 1` · Mon 02:00	Compare openclaw skills list with clawhub. Check workspace skills/ dirs
15	ON	Security Credential Check	vision	`0 6 * * 1` · Mon 06:00	Weekly: check auth-profiles.json for expired tokens or high-risk creds
16	ON	Daily Cost	finance	`0 15 * * *` · daily 15:00	Query LiteLLM localhost:4000 for today's API costs. Alert if >$10/day
17	ON	Weekly P&L	finance	`0 12 * * 0` · Sun 12:00	Revenue (Stripe) vs costs (LiteLLM + hosting). Format as table
18	ON	Invoice Check	finance	`0 14 * * *` · daily 14:00	Check Odoo for overdue invoices >7 days. Send reminders
19	ON	Subscription Renewals	finance	`0 10 * * *` · daily 10:00	Check upcoming renewals in next 7 days. Alert if approaching
20	ON	Monthly Report	finance	`0 10 1 * *` · 1st of month 10:00	Total revenue, costs, margin, top clients, cost per agent
21	ON	Daily Outreach	sales	`0 3 * * *` · daily 03:00	Check pipeline for today's follow-ups. Execute Day 1/3/7/14 contact cadence
22	ON	Follow-ups	sales	`0 6 * * *` · daily 06:00	Send scheduled follow-up messages. Update pipeline status
23	ON	Pipeline Review	sales	`0 9 * * 5` · Fri 09:00	Weekly: leads by stage, conversion rates, revenue forecast
24	ON	Lead Gen	sales	`0 4 * * 1` · Mon 04:00	Research 10 new potential clients via Google/LinkedIn. Score by fit
25	ON	Daily Tickets	cs	`0 2 * * *` · daily 02:00	Check Odoo project_id=9 (Customer Support Queue) for open tasks by age
26	ON	Onboarding Check	cs	`0 4 * * *` · daily 04:00	Check new clients in onboarding. Send scheduled welcome/check-in emails
27	ON	Weekly Satisfaction	cs	`0 10 * * 3` · Wed 10:00	Review client interactions this week. Score satisfaction. Flag unhappy clients
28	ON	Ops Check	performer	`0 1 * * *` · daily 01:00	docker ps, disk, memory. Check /opt/platformx/projects/eye/alerts/active-p1.json
29	ON	Backup Verify	performer	`0 22 * * *` · daily 22:00	Verify latest backup on Google Drive. Check rclone sync status. Report missing
30	ON	Weekly Sync	performer	`0 21 * * 0` · Sun 21:00	Compare /opt/platformx/docs/ timestamps between Mac and VPS. Report drift
31	ON	P1 Monitor	performer	`0 /2 * *` · every 2h	Read active-p1.json. If file non-empty, escalate immediately to Discord + Telegram
32	ON	Code Review	coder	`0 3 * * *` · daily 03:00	Review open GitHub PRs. Check failing tests. Review yesterday's commits for quality
33	ON	Architecture Review	architect	`0 4 * * *` · daily 04:00	Review active specs. Check pending architecture decisions. Update tech debt registry
34	ON	Weekly Strategy	architect	`0 3 * * 1` · Mon 03:00	Weekly: OKR progress, bottlenecks, capacity, priority recommendations
35	ON	Daily Content	content	`0 3 * * *` · daily 03:00	Create 1 blog post or case study. Write today's social copy. Update content calendar
36	ON	Social Media	marketing	`0 4 * * *` · daily 04:00	Schedule today's social posts. Check yesterday's engagement. Report performance
37	ON	BizDev Opportunities	bizdev	`0 3 * * *` · daily 03:00	Review new opportunities. Update pipeline. Research 3 partnership leads
38	ON	Legal Compliance	legal	`0 5 * * *` · daily 05:00	Check Odoo contracts. Monitor trademark watches. Review compliance calendar
39	ON	Hiring Pipeline	hiring	`0 5 * * *` · daily 05:00	Assess agent capacity. Check department needs. Review skill gaps
40	ON	Training Audit	training	`0 6 * * *` · daily 06:00	Audit agent performance. Update skill benchmarks. Check retraining needs
41	ON	Daily KPIs	analytics	`0 1 * * *` · daily 01:00	Aggregate: revenue, costs, clients, signal accuracy, agent utilization, error rates
42	ON	OmniBrand Pipeline	omnibrand	`0 4 * * *` · daily 04:00	Check pipeline: new domains scored? Brands in progress? Update status
43	ON	Media Assets	media	`0 5 * * *` · daily 05:00	Check pending design requests. Generate scheduled assets. Update media library
44	ON	Innovation Scan	innovation	`0 23 * * *` · daily 23:00	Read intelligence digest. Score discoveries 0–15. Dispatch IMMEDIATE items
45	ON	Experiment Run	experiment	`0 0 * * *` · midnight	Run pending experiments from innovation. Test new tools/models in sandbox
46	ON	Benchmark Scores	benchmark	`0 7 * * *` · daily 07:00	Score completed experiments vs baseline. Recommend promote/archive
47	ON	QA Check	qa	`0 11 * * *` · daily 11:00	Run system tests. Check error logs. Verify critical flows. Report bugs
48	ON	Weekly QA Report	qa-master	`0 2 * * 1` · Mon 02:00	Weekly quality: test pass rates, bugs found/fixed, SLA compliance, agent scores
49	ON	Memory Dreaming Promotion	N/A	`0 3 * * *` · daily 03:00	Nightly memory consolidation + promotion routine
50	OFF	EOD Verification 2026-04-30	main	one-shot (disabled)	One-shot health check from Apr 30 — completed, disabled. The only job that ever ran.

Why crons don't fire despite being enabled: Every cron job targets an OpenClaw session (sessionTarget: "main"). The kill-cron at /etc/cron.d/claude-cleanup kills the claude process every 6 hours as root. Any cron that starts a claude session before the 6-hour window closes will be hard-killed mid-execution. The jobs.json is correctly configured — the problem is the external pkill.

26 agents in openclaw.json — all confirmed live

#	ID	Name	Role	Kill-cron target?
1	`main`	VIEWPORT	autonomous CEO	YES — default session
2	`coder`	CodeX	lead engineer	YES
3	`researcher`	Scout	research & intelligence	YES
4	`architect`	Atlas	systems architecture	YES
5	`qa`	Verify	quality assurance	YES
6	`vision`	Eye	monitoring & visibility	YES
7	`performer`	Performer	ops & performance	YES
8	`bizdev`	Forge	business development	YES
9	`finance`	Ledger	financial operations	YES
10	`sales`	Closer	sales operations	YES
11	`marketing`	Amplify	marketing & brand	YES
12	`legal`	Shield	legal & compliance	YES
13	`cs`	Advocate	customer success	YES
14	`analytics`	Prism	analytics & KPIs	YES
15	`resource`	Sentinel	market intelligence	YES
16	`training`	Mentor	agent training	YES
17	`qa-master`	Auditor	QA master / auditor	YES
18	`hiring`	Recruiter	hiring & talent	YES
19	`innovation`	Catalyst	innovation pipeline	YES
20	`omnibrand`	Palette	brand & identity	YES
21	`content`	Quill	content creation	YES
22	`media`	Canvas	media & design assets	YES
23	`experiment`	Hypothesis	experiments & testing	YES
24	`benchmark`	Metric	benchmark scoring	YES
25	`crisis`	crisis	crisis response	YES
26	`c-modernlao`	c-modernlao	Modern Lao tenant agent	YES

Empty model fallbacks confirmed — the old config (from section3 audit evidence, backup dating to early April) shows agents like coder with fallback chains including openai-codex and gpt-5.5. Both models no longer exist / were stale references. The kimi-k2.5 models were removed on 2026-05-03 when NVIDIA retired them (HTTP 410). Current config points to claude-sonnet-4-6 → anthropic chain which is correct.

03Runtime map: 66 containers live

Source: docker ps -a read-only, 2026-06-09. All 66 are running (no exited). Three are unhealthy.

Running

All containers are in Up state. 0 exited, 0 restarting.

Unhealthy

origin-backend, saathi-app-1, platformx-nextcloud — all unhealthy health checks.

CPU hotspots

neo4j 138%, n8n 128%, mlh-client-portal 115% — three containers over 100% CPU.

Infra overlap

Both Coolify AND Dokploy running simultaneously — two competing orchestrators, one VPS.

All 66 containers grouped by family

Family	Container(s)	Status	CPU / Mem	Tag
OpenClaw	`viewport-openclaw-fresh-openclaw-cli-1`	Up 7d healthy	—	load-bearing
	`viewport-openclaw-fresh-openclaw-gateway-1`	Up 7d healthy	—	load-bearing
	`openclaw-sbx-agent-bizdev-134566cd`	Up 4d	—	load-bearing
Hermes (multi-tenant)	`hermes-vinay-patil`	Up 23h	—	tenant
Hermes (multi-tenant)	`hermes-bccl`	Up 16h	—	tenant
Coolify (orchestrator 1)	`coolify`	Up 3d healthy	—	redundant — 2 orchestrators
	`coolify-db`	Up 3d healthy	—	legacy
	`coolify-redis`	Up 3d healthy	—	legacy
	`coolify-realtime`	Up 3d healthy	—	legacy
	`coolify-sentinel`	Up 2wk healthy	—	legacy
Dokploy (orchestrator 2)	`dokploy.1.*`	Up 45h healthy	—	redundant — 2 orchestrators
	`dokploy-postgres`	Up 45h	—	redundant
	`dokploy-redis`	Up 45h	—	redundant
ModernLao / MLH / MLG	`modernlao-site`	Up 5d	—	load-bearing
	`mlh-comms-vault-api`	Up 5d	—	load-bearing
	`mlh-api-handler`	Up 13d	—	load-bearing
	`mlg-auth-gate`	Up 5d	—	load-bearing
	`mlg-jacam-api`	Up 6wk	—	load-bearing
MLH Client Portal	`mlh-client-portal-dokploy-staging-*`	Up 8d	115% CPU / 33MB	CPU spike — investigate
PlatformX Core AI	`platformx-n8n`	Up 6wk	128% CPU / 466MB	CPU runaway
	`platformx-neo4j`	Up 6wk healthy	138% CPU / 1.9GB	CPU runaway + high mem
	`platformx-litellm`	Up 6wk	11% / 694MB	load-bearing
	`platformx-qdrant`	Up 6wk	3.7% / 289MB	load-bearing
	`platformx-mem0`	Up 6wk	5.6% / 214MB	load-bearing
	`platformx-langfuse`	Up 5wk	—	observe
	`platformx-anythingllm`	Up 6wk healthy	—	observe
	`platformx-claude-memory`	Up 6wk	—	load-bearing
	`platformx-openwebui`	Up 6wk healthy	—	observe
Odoo ERP	`platformx-odoo`	Up 6wk	—	load-bearing
Odoo ERP	`platformx-odoo-db`	Up 6wk	—	load-bearing
Mission Control	`platformx-mc-daemon`	Up 6wk	—	load-bearing
	`platformx-mc-api`	Up 6wk	—	load-bearing
	`platformx-mc-dashboard`	Up 6wk	—	load-bearing
	`mc_postgres`	Up 6wk healthy	—	load-bearing
OpenHands	`platformx-openhands`	Up 6wk	—	observe
	`oh-agent-server-6gCRPbTA90M4Jf9PnG9E9H`	Up 6wk	—	observe
	`oh-agent-server-7E6YLaYWJp9rhyHHe8kvpk`	Up 6wk	—	observe
	`oh-agent-server-2vTzHnwvsriBnKmbYovPby`	Up 6wk	—	observe
LLM Council	`platformx-council-frontend`	Up 6wk healthy	—	load-bearing
	`platformx-council-backend`	Up 6wk healthy	—	load-bearing
	`platformx-council-nginx`	Up 6wk	—	load-bearing
Origin (broken)	`origin-backend`	Up 6wk UNHEALTHY	3% / 786MB	retire or fix
	`origin-worker`	Up 6wk	—	retire or fix
	`origin-redis`	Up 6wk	—	retire or fix
Saathi (unhealthy)	`saathi-app-1`	Up 5wk UNHEALTHY	13.8% / 89MB	retire or fix
	`saathi-postgres-1`	Up 5wk healthy	39.6% / 20MB	observe
	`saathi-redis-1`	Up 5wk	1.7% / 3MB	observe
Nextcloud	`platformx-nextcloud`	Up 6wk UNHEALTHY	0.1% / 223MB	retire — unused
Nextcloud	`platformx-nextcloud-db`	Up 6wk healthy	7% / 23MB	retire — unused
Infra / shared	`dokploy-traefik`	Up 10d	—	load-bearing
	`platformx-nginx`	Up 6wk healthy	—	load-bearing
	`platformx-redis`	Up 6wk	—	load-bearing
	`portainer`	Up 6wk	—	observe
	`local-registry`	Up 6wk	—	load-bearing
	`platformx-discord-bot`	Up 6wk	—	load-bearing
	`weft-local-postgres`	Up 5wk	—	observe
Other	`crusher-verify-api, docuseal, platformx-coder, platformx-performer-web, platformx-pipelines, platformx-fileserver, platformx-claudecodeui, 2dab5b8f (Docuseal), qfphb1umk (Docuseal)`	Up various	—	observe / misc

Critical resource hotspots

Neo4j CPU

138%

n8n CPU

128%

MLH Portal CPU

115%

Saathi (sidecar) CPU

39.6%

LiteLLM RAM

694MB

origin-backend RAM

786MB

Neo4j RAM

1.9GB

Dual-orchestrator problem — Both Coolify (4 containers) and Dokploy (3 containers + Traefik) are running simultaneously. This creates: conflicting routing rules, double infrastructure overhead, no single deploy source of truth. Decision needed: migrate fully to Dokploy (newer, lighter) or remove Dokploy and consolidate on Coolify.

04GitHub control-plane audit

Source: gh api read-only calls to viewport-corp/viewport-ops, 2026-06-09. All numbers are live counts, not estimates.

Total commits

On default branch council/bootstrap-20260510. Not "main" — see repo name inversion below.

Total issues

117

Open: 23, Closed: 94. 14 are AUDIT-FIND (11 still open). Repo created 2026-05-10.

Active branches

30+

30+ named branches including companyos/*, feat/*, fix/*, docs/*, council/*. No branch protection on default.

Files in repo

522

All on default branch council/bootstrap-20260510. Includes 47+ plan/policy docs and full evidence hierarchy.

CRITICAL: The viewport-os / viewport-ops name inversion
viewport-corp/viewport-os (which sounds like the OS control-plane) = 8-file stub, pushed 2026-06-05, default branch: main.
viewport-corp/viewport-ops (which sounds like operations) = the real 522-file control-plane, 69 commits, default branch: council/bootstrap-20260510 (not main).
Any agent or automation that uses "viewport-os" to find the control-plane will land on the wrong repo. Any documentation pointing to "main" branch of viewport-ops finds 0 files. This naming inversion is an active operational hazard.

Issue breakdown

Total issues

117

Closed

Open

AUDIT-FIND label

AUDIT-FIND open

All 23 open issues (live state)

#	Issue	Labels	Last updated
#214	[Research]: Evaluate GBrain as Viewport shared brain/memory layer	intake	2026-06-08
#213	[Control Plane]: Build Viewport closed operating loop v0.1	intake	2026-06-08
#212	[Control Plane]: Day-one-to-now chat forensic audit and live report system	intake	2026-06-08
#196	[GSD/RALPH] Activate GitHubOps truth loop for CompanyOS + VPS runtime	automation	2026-06-05
#195	[REDESIGN] Full audit evidence publish + nav redesign	anti-amnesia	2026-06-05
#194	[CHAT→TASK] Set up Neo4j as a company-brain component for Viewport OS	duplicate of #193	2026-06-05
#193	[CHAT→TASK] Set up Neo4j as a company-brain component for Viewport OS	duplicate of #194	2026-06-05
#192	[SECURITY] Session DB contains credential-pattern hits; block raw chat export	security	2026-06-05
#191	[AUDIT-FIND] Section 11 audit gaps — 2026-06-04 23:46 UTC	AUDIT-FIND	2026-06-04
#190	[AUDIT-FIND] Section 10 audit gaps — 2026-06-04 23:46 UTC	AUDIT-FIND	2026-06-04
#189	[AUDIT-FIND] Section 9 audit gaps — 2026-06-04 23:46 UTC	AUDIT-FIND	2026-06-04
#188	[AUDIT-FIND] Section 8 audit gaps — 2026-06-04 23:46 UTC	AUDIT-FIND	2026-06-04
#187	[AUDIT-FIND] Section 7 audit gaps — 2026-06-04 23:46 UTC	AUDIT-FIND	2026-06-04
#186	[AUDIT-FIND] Section 6 audit gaps — 2026-06-04 23:46 UTC	AUDIT-FIND	2026-06-04
#185	[AUDIT-FIND] Section 5 audit gaps — 2026-06-04 23:46 UTC	AUDIT-FIND	2026-06-04
#184	[AUDIT-FIND] Section 4 audit gaps — 2026-06-04 23:44 UTC	AUDIT-FIND	2026-06-04
#183	[AUDIT-FIND] Section 3 agent fleet gaps — 2026-06-04 23:36 UTC	AUDIT-FIND	2026-06-04
#182	[AUDIT-FIND] Section 2 VPS runtime inventory gaps — 2026-06-04 23:23 UTC	AUDIT-FIND	2026-06-05
#178	[AUDIT-FIND] Section 1 GitHub inventory gaps — 2026-06-04 22:57 UTC	AUDIT-FIND	2026-06-04
#56	Prepare protected dokploy.viewport.llc admin route contract	state:protected	2026-06-01
#53	P1: TradeX MT5 compile/backtest runner path needed	state:blocked	2026-06-01
#52	Phase 5E: Fresh-node Dokploy permanent path	state:blocked	2026-06-01
#15	Odoo production installation and multi-company setup	state:active	2026-06-01

Duplicate Neo4j issues: #193 ≡ #194 — Both are titled "[CHAT→TASK] Set up Neo4j as a company-brain component for Viewport OS" and were created on 2026-06-05. This is a symptom of the chat-to-task automation firing twice or being run manually twice. One must be closed as duplicate. Issue #192 (credential-pattern hits in session DB) is a security issue that has not been acted on.

Migration council: STATE.md forensic read

Council frozen at round 000 for 29 days on a single boolean flag

revision: v3
date_started: 2026-05-10
pat_revoked: false          ← THE BLOCKER
current_phase: bootstrap
next_agent: claude-opus-4.7  ← model no longer current
active_round: 000           ← never advanced past zero
sam_answers:
  - date: 2026-05-10T06:33:35Z
    answer: approved by Sam in Telegram

The pat_revoked: false flag was set on bootstrap day (May 10) and the council expected each agent to advance the round. But since next_agent: claude-opus-4.7 is an outdated model name and no agent checks STATE.md autonomously, the round has sat at 000 since the day the repo was created.

tracker.json: contains exactly 1 event — the bootstrap. No subsequent rounds, no decisions, no agent picks.

Repository phase structure (branches)

Branch prefix	Count	Description
`companyos/*`	14	CompanyOS activation candidates, shadow routing, agent QA, stage 2 fragments
`feat/*`	4	Brand content studio, media ingestion, TradeX MT5, Tradex fallback
`fix/*`	5	Migration public pages, openclaw-fresh-clean-reinstall, identity-env, telegram bot conflict
`docs/*`	2	Slack agent operating room, viewport knowledge base foundation
`council/*`	1	`council/bootstrap-20260510` — THE DEFAULT BRANCH (not main)
`ops/*`	1+	This page's branch: `ops/openclaw-github-flow-44`

05Prior audits: 5 Claude/Codex runs on 2026-06-08, 3 empty files

Source: audit evidence files in public/migration/audit/. All reads were read-only.

Only one of five audit runs produced a complete report. Three separate runs generated files that were either 1 byte or completely empty. This is a known pattern from the OpenClaw subagent issue (subagents get Operation not permitted when trying to use the codex image-gen or similar tools inside Docker sandboxes). The v2 run (Hermes Agent v0.15.2) completed successfully and produced the 49KB evidence set that this forensic page is built on.

Audit run timeline — 2026-06-04 to 2026-06-08

2026-06-04 22:50 UTC

Section 0 — PASS: scaffold built Hermes Agent v0.15.2 starts the audit. VPS preflight passes. Docker reachable. Hermes Python 3.13.5, OpenAI SDK 2.24.0.

2026-06-04 22:57 UTC

Section 1 — GitHub inventory (AUDIT-FIND #178) GitHub inventory gaps found. Opened as issue #178. Evidence saved to section-01.json.

2026-06-04 23:05–23:23 UTC

Sections 2a/2b/2c — VPS runtime (3 duplicate issues #179/#180/#181) Section 2 fired three times, creating duplicate AUDIT-FIND issues (#179, #180 both closed; #181 closed; #182 = final open). Evidence of the cron/environment instability — retries happened because of session disruption.

2026-06-04 23:36 UTC

Section 3 — Agent fleet gaps (#183) 26 agents found, old system had 24 in backup config. Old system had 47 cron jobs across agents; fresh install had 1. Kill-cron first identified here.

2026-06-04 23:44–23:46 UTC

Sections 4–11 — 8 AUDIT-FIND issues opened in 2 minutes Issues #184–#191 opened in rapid succession. This indicates automated batch filing. All remain open.

2026-06-08 (multiple runs)

3 empty-file runs from Claude/Codex subagents Three additional audit attempts produced 1-byte or empty output files. Subagent environment isolation blocked file writes. Only Hermes (which runs as a host process, not inside Docker sandbox) can reliably produce complete outputs.

Sections completed

Sections 0–11 (plus section 12 "best alternatives") completed by Hermes v0.15.2 on June 4.

PASS verdicts

Only Section 0 (preflight) and one other passed. 10 sections had FAIL or UNKNOWN verdicts.

Evidence files

~50

~50 evidence files committed to public/migration/audit/ — raw JSON + parsed sections.

06The 6-joint failure chain

The intended OpenClaw autonomous loop has six stages. Every stage after "receive" is broken.

This is the pipeline that should run autonomously: Sam posts a request → agent receives it → memory is checked → task is routed → task is executed → evidence is recorded → memory is updated. All six joints are in the config. None past stage 1 work reliably in production. Here is each joint, its stated mechanism, and its confirmed failure mode.

1. RECEIVEDiscord/Telegram → gateway

→

2. MEMORYmem0 lookup

→

3. ROUTEmain → target agent

→

4. EXECUTEagent runs task

→

5. EVIDENCEGitHub PR + issue close

→

6. REMEMBERmem0 write-back

Joint	Name	Mechanism	Status	Failure mode
1	RECEIVE	Discord/Telegram message hits OpenClaw gateway → CLI session started	WORKING	Gateway up 7d healthy. Messages reach the main agent. This is the only confirmed working joint.
2	MEMORY	Before acting, agent queries mem0 for context on this task/user/domain	BROKEN	mem0 container running (5.6% CPU) but session memory resets on every new session. Agents have no durable per-agent memory store wired to session start. Every session starts cold.
3	ROUTE	main (VIEWPORT) agent decides which specialist agent handles the task	BROKEN	Routing logic exists in OpenClaw config but is never exercised in practice. All tasks go to main. The 26 specialist agents (coder, finance, etc.) receive no routed tasks — 0 cron_jobs_attached per agent in audit evidence.
4	EXECUTE	Target agent receives task, runs it using exec/read/write/GitHub tools	BROKEN (2 causes)	Cause A: Kill-cron pkills all claude processes every 6h. Any task longer than the remaining window dies mid-execution. Cause B: Sandbox exec restrictions block many tool calls from inside the Docker sandbox environment.
5	EVIDENCE	Agent opens PR with evidence, links to issue, issue auto-closes on merge	NEVER FIRED	Zero PRs opened by agents. Zero issues closed by agent-triggered PR merges. The GitHub PAT is not wired to the OpenClaw agents' exec environment in a way that persists. Council STATE.md shows pat_revoked: false but the PAT is not in use.
6	REMEMBER	After task completes, agent writes outcome + learnings to mem0	BROKEN	Depends on Joint 4 (execute) completing, which never happens due to kill-cron. No write-back events in mem0 logs. Per-session memory: 26 sessions.json files exist across agents but contain <5 sessions each — mostly empty.

Root cause dependency tree

SYMPTOM: agents never complete tasks autonomously
│
├── CAUSE 1: /etc/cron.d/claude-cleanup kills all claude processes every 6h
│   └── FIX: remove or modify the cron (needs root, 1 command: crontab -r or rm the file)
│
├── CAUSE 2: May-11 fresh rebuild deleted operational config
│   ├── old: 47 crons wired per-agent + full identities
│   └── fresh: cron.jobs = [] in openclaw.json (jobs.json re-added later but sessions not wired)
│
├── CAUSE 3: GitHub PAT not wired to agent exec environment
│   ├── STATE.md: pat_revoked: false (means PAT exists but council never activated it)
│   └── FIX: add PAT as env var to openclaw.json agent workspace + test with gh auth status
│
├── CAUSE 4: Session memory resets every run
│   ├── mem0 container running but no write-back loop
│   └── FIX: wire PreToolUse hook to query mem0; PostToolUse to write mem0
│
└── CAUSE 5: Repo name inversion (viewport-os = stub, viewport-ops = real)
    └── Any automation pointing to viewport-os finds nothing — silent failure

07What's real · what's written · what's not done

The three categories that matter for the rebuild decision. Sourced from docker ps + GitHub + VPS audit.

Real and working

66 Docker containers running — all Up, none Exited
OpenClaw gateway + CLI: Up 7d healthy — messages received
26 agents defined with names, identities, workspaces
50 cron jobs in jobs.json (49 enabled)
LiteLLM proxy: up, routing to Anthropic
Neo4j, Qdrant, mem0: running (though over-CPU)
Discord bot: live, delivers messages
Telegram bot: live and routing to openclaw
Odoo 17: running, DB healthy
Hermes (2 tenants): vinay-patil + bccl running
modernlao-site: live (nginx:alpine, up 5d)
MLH client portal: live (though 115% CPU)
LLM Council: frontend + backend both healthy
GitHub repo: 522 files, 69 commits, full evidence
OpenClaw identity: VIEWPORT CEO persona loaded

Written but not active

50 crons: exist in jobs.json but killed by cleanup cron
Migration council: STATE.md exists but frozen at round 000
26-agent routing: config exists; routing never fires
GitHub PAT: stored but not wired to exec environment
mem0 write-back: container up but no write loop wired
Phase 4A–4V plan docs: 47+ files in repo, not executed
Closure loop: issue→PR→evidence→close: designed, never run
CompanyOS schema: full YAML in repo, not live-loaded
Odoo multi-company: #15 open (state:active) 6+ weeks
TradeX MT5: #53 blocked since May
Agent identities/SOULs: written per-agent, not invoked
Weft engine: tmux processes running but no NPM proxy host
SECURITY issue #192: credential-pattern hits unfixed

Not done / entirely missing

Kill-cron removal: not done — /etc/cron.d/claude-cleanup active
Council round 001+: never happened — round 000 since May 10
Any PR opened by an agent: 0 total ever
Any issue closed by agent work: 0 via closure loop
Memory write-back: no session ever wrote to mem0 as output
Duplicate #193/#194 closure: still both open
Unhealthy container fixes: origin-backend, saathi, nextcloud
Dual-orchestrator decision: Coolify vs Dokploy unresolved
Repo name inversion fix: viewport-os stub not addressed
n8n CPU runaway root cause: not investigated (128% CPU)
Neo4j CPU runaway: not investigated (138% CPU)
GitHub-first closure loop: never designed end-to-end
Slack integration: listed in plan, not deployed
Agent evidence standard: no agent has ever written evidence

The minimal fix — 3 actions that unblock everything

Remove the kill-cron. ssh root@194.163.153.171 "rm /etc/cron.d/claude-cleanup" This single file is blocking every long-running agent task. It fires 4× daily as root and hard-kills every claude process. Removing it costs nothing and unblocks all 49 enabled cron jobs from completing. Verify: ls /etc/cron.d/claude-cleanup → should show "No such file".
Wire the GitHub PAT to the OpenClaw exec environment. Add GITHUB_TOKEN=[REDACTED] to openclaw.json agents.defaults.workspace env. Advance the council STATE.md to round 001. This enables the closure loop: agent → PR → issue close. Verify: from within OpenClaw main agent session, run gh auth status.
Run one end-to-end closure loop on issue #15 (Odoo multi-company). Pick the oldest active issue, have the coder agent work it, open a PR with evidence, merge, verify issue auto-closes. This proves the loop works and establishes the pattern. Once done once, all 139 open tasks can follow the same pattern.

The architecture is not the problem. A 26-agent autonomous company running on $400/mo with 50 scheduled tasks, LiteLLM routing, Qdrant/Neo4j/mem0 memory layer, and GitHub-first truth is a genuinely good design. The only things between "dormant config" and "running company" are: (1) rm one file, (2) set one env var, (3) run one loop once.