# Semantic Batching and Output Chunking Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. **All dispatched subagents must use `model="opus"`** (project convention).

**Goal:** Replace count-based file-analyzer batching with Louvain semantic batching (Phase 1.5), and add defensive output chunking in file-analyzer (60 nodes / 120 edges per part). Together these stop `/understand` from hitting Bedrock OPUS output caps and improve cross-batch semantic edge coverage. One PR.

**Architecture:** Add `compute-batches.mjs` (Phase 1.5), which runs Louvain on the import graph from `scan-result.json` and writes `batches.json` containing pre-built `batchImportData` + `neighborMap` (paths + exported symbols). file-analyzer reads `neighborMap` so it can confidently emit cross-batch edges, and it self-splits its output into `batch--part-.json` when it exceeds the thresholds. The `merge-batch-graphs.py` glob already accepts multi-part naming, so the glob itself needs no change; the script only gains a multi-part stderr summary and a missing-part warning.

**Tech Stack:** Node.js ≥22 + pnpm ≥10, `graphology` + `graphology-communities-louvain` (new deps), `@understand-anything/core` TreeSitterPlugin (existing), Vitest for `.mjs` tests, Python `unittest` for `merge-batch-graphs.py` tests.

**Source spec:** [`docs/superpowers/specs/2026-05-24-semantic-batching-and-output-chunking-design.md`](../specs/2026-05-24-semantic-batching-and-output-chunking-design.md)

**Branch:** `feat/semantic-batching-and-output-chunking` (already created).
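The output-chunking rule in the Goal can be pictured as a small pure function. The sketch below is hypothetical: `chunkBatchOutput` and the `part`/`totalParts` fields are illustrative names, and the real protocol is the one written into `file-analyzer.md`. It assumes the merge step unions every part of a batch before resolving edges, so an edge does not have to sit in the same part as its endpoint nodes.

```javascript
// Hypothetical sketch: split one batch's analyzer output into parts that each
// stay within the plan's caps (60 nodes / 120 edges per part).
const MAX_NODES_PER_PART = 60;
const MAX_EDGES_PER_PART = 120;

function chunkBatchOutput(nodes, edges) {
  // The part count is driven by whichever cap is exceeded first (minimum 1 part).
  const partCount = Math.max(
    1,
    Math.ceil(nodes.length / MAX_NODES_PER_PART),
    Math.ceil(edges.length / MAX_EDGES_PER_PART),
  );
  const parts = [];
  for (let i = 0; i < partCount; i++) {
    parts.push({
      part: i + 1,
      totalParts: partCount, // lets a merger detect a missing part
      nodes: nodes.slice(i * MAX_NODES_PER_PART, (i + 1) * MAX_NODES_PER_PART),
      edges: edges.slice(i * MAX_EDGES_PER_PART, (i + 1) * MAX_EDGES_PER_PART),
    });
  }
  return parts;
}

// 130 nodes and 50 edges: the node cap forces 3 parts.
const nodes = Array.from({ length: 130 }, (_, i) => ({ id: `n${i}` }));
const edges = Array.from({ length: 50 }, (_, i) => ({ source: `n${i}`, target: `n${i + 1}` }));
const parts = chunkBatchOutput(nodes, edges);
console.log(parts.map(p => [p.nodes.length, p.edges.length])); // [ [ 60, 50 ], [ 60, 0 ], [ 10, 0 ] ]
```

Carrying `totalParts` in every part is one way a merger could emit the missing-part warning the plan calls for; the actual field names are up to the file-analyzer protocol.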
---

## File Structure

**Create:**

- `understand-anything-plugin/skills/understand/compute-batches.mjs` — Phase 1.5 script
- `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs` — Vitest unit tests
- `understand-anything-plugin/skills/understand/test/fixtures/scan-result-3-cliques.json` — synthetic test fixture (3 disjoint import cliques)
- `understand-anything-plugin/skills/understand/test/fixtures/scan-result-large-community.json` — synthetic test fixture (40-node complete graph)
- `understand-anything-plugin/skills/understand/test/fixtures/scan-result-non-code.json` — synthetic test fixture (Dockerfile/CI/SQL groups)

**Modify:**

- `understand-anything-plugin/package.json` — add `graphology` + `graphology-communities-louvain` to `dependencies`
- `understand-anything-plugin/skills/understand/SKILL.md` — insert Phase 1.5; replace Phase 2 ANALYZE batching prose; replace Incremental update path
- `understand-anything-plugin/agents/file-analyzer.md` — add Cross-batch context (neighborMap) section; replace Writing Results with multi-part protocol
- `understand-anything-plugin/skills/understand/merge-batch-graphs.py` — multi-part stderr summary + missing-part warning
- `understand-anything-plugin/skills/understand/test_merge_batch_graphs.py` — new `TestMultiPart` class
- `understand-anything-plugin/package.json`, `understand-anything-plugin/.claude-plugin/plugin.json`, `.claude-plugin/plugin.json`, `.cursor-plugin/plugin.json`, `.copilot-plugin/plugin.json` — version bump (Task 16)

---

## Task 1: Add graphology dependencies

**Files:**

- Modify: `understand-anything-plugin/package.json`

- [ ] **Step 1: Add deps to package.json**

Edit the `dependencies` block of `understand-anything-plugin/package.json`:

```json
{
  "name": "@understand-anything/skill",
  "version": "2.7.4",
  "type": "module",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "scripts": {
    "build": "tsc",
    "test": "vitest run"
  },
  "dependencies": {
    "@understand-anything/core": "workspace:*",
    "graphology": "^0.26.0",
    "graphology-communities-louvain": "^2.0.2"
  },
  "devDependencies": {
    "@types/node": "^22.0.0",
    "typescript": "^5.7.0",
    "vitest": "^3.1.0"
  }
}
```

- [ ] **Step 2: Install**

Run from the repo root:

```bash
pnpm install
```

Expected: the lockfile updates with graphology + graphology-communities-louvain; no other version churn.

- [ ] **Step 3: Smoke-test that the imports work**

Run from `understand-anything-plugin/`:

```bash
node -e "import('graphology').then(m => { const G = m.default; const g = new G({type:'undirected'}); g.addNode('a'); g.addNode('b'); g.addEdge('a','b'); console.log('graphology ok, edges:', g.size); })"
node -e "Promise.all([import('graphology'), import('graphology-communities-louvain')]).then(([G,L]) => { const g = new G.default({type:'undirected'}); ['a','b','c'].forEach(n => g.addNode(n)); g.addEdge('a','b'); g.addEdge('b','c'); console.log('louvain ok:', JSON.stringify(L.default(g))); })"
```

Expected: prints `graphology ok, edges: 1` and `louvain ok: {...}` with community ids assigned.

- [ ] **Step 4: Commit**

```bash
git add understand-anything-plugin/package.json pnpm-lock.yaml
git commit -m "deps: add graphology + graphology-communities-louvain"
```

---

## Task 2: Prototype compute-batches.mjs (load + Louvain print)

This is the **feasibility prototype**: the spec gates the size-enforcement design on what real community sizes look like. Build the skeleton, then run it against a synthetic fixture (and, optionally, against a real `scan-result.json` from this repo if one exists) before adding more code.
**Files:**

- Create: `understand-anything-plugin/skills/understand/compute-batches.mjs`
- Create: `understand-anything-plugin/skills/understand/test/fixtures/scan-result-3-cliques.json`

- [ ] **Step 1: Create test fixture (3 disjoint import cliques)**

Create `understand-anything-plugin/skills/understand/test/fixtures/scan-result-3-cliques.json`:

```json
{
  "name": "fixture-3-cliques",
  "description": "Three disjoint import cliques for Louvain testing",
  "languages": ["typescript"],
  "frameworks": [],
  "files": [
    {"path": "src/auth/login.ts", "language": "typescript", "sizeLines": 50, "fileCategory": "code"},
    {"path": "src/auth/session.ts", "language": "typescript", "sizeLines": 40, "fileCategory": "code"},
    {"path": "src/auth/tokens.ts", "language": "typescript", "sizeLines": 60, "fileCategory": "code"},
    {"path": "src/api/handlers.ts", "language": "typescript", "sizeLines": 80, "fileCategory": "code"},
    {"path": "src/api/middleware.ts", "language": "typescript", "sizeLines": 30, "fileCategory": "code"},
    {"path": "src/api/routes.ts", "language": "typescript", "sizeLines": 45, "fileCategory": "code"},
    {"path": "src/db/users.ts", "language": "typescript", "sizeLines": 70, "fileCategory": "code"},
    {"path": "src/db/queries.ts", "language": "typescript", "sizeLines": 55, "fileCategory": "code"},
    {"path": "src/db/migrations.ts", "language": "typescript", "sizeLines": 35, "fileCategory": "code"}
  ],
  "totalFiles": 9,
  "filteredByIgnore": 0,
  "estimatedComplexity": "small",
  "importMap": {
    "src/auth/login.ts": ["src/auth/session.ts", "src/auth/tokens.ts"],
    "src/auth/session.ts": ["src/auth/tokens.ts"],
    "src/auth/tokens.ts": [],
    "src/api/handlers.ts": ["src/api/middleware.ts", "src/api/routes.ts"],
    "src/api/middleware.ts": ["src/api/routes.ts"],
    "src/api/routes.ts": [],
    "src/db/users.ts": ["src/db/queries.ts", "src/db/migrations.ts"],
    "src/db/queries.ts": ["src/db/migrations.ts"],
    "src/db/migrations.ts": []
  }
}
```

- [ ] **Step 2: Write skeleton compute-batches.mjs (Louvain only, no neighborMap, no exports, no fallback)**

Create `understand-anything-plugin/skills/understand/compute-batches.mjs`:

```javascript
#!/usr/bin/env node
/**
 * compute-batches.mjs — Phase 1.5 of /understand
 *
 * Reads scan-result.json, runs Louvain community detection on the import
 * graph, and writes batches.json containing batches + neighborMap.
 *
 * Usage:
 *   node compute-batches.mjs [--changed-files=]
 *
 * Input:  /.understand-anything/intermediate/scan-result.json
 * Output: /.understand-anything/intermediate/batches.json
 */
import { readFileSync, writeFileSync, existsSync, realpathSync } from 'node:fs';
import { dirname, resolve, join } from 'node:path';
import { fileURLToPath } from 'node:url';
import Graph from 'graphology';
import louvain from 'graphology-communities-louvain';

// ── Skeleton main: load → Louvain → print sizes ───────────────────────────
async function main() {
  const projectRoot = process.argv[2];
  if (!projectRoot) {
    process.stderr.write('Usage: node compute-batches.mjs [--changed-files=]\n');
    process.exit(1);
  }

  const scanPath = join(projectRoot, '.understand-anything', 'intermediate', 'scan-result.json');
  if (!existsSync(scanPath)) {
    process.stderr.write(`Error: scan-result.json not found at ${scanPath}\n`);
    process.exit(1);
  }

  const scan = JSON.parse(readFileSync(scanPath, 'utf-8'));
  const codeFiles = (scan.files || []).filter(f => f.fileCategory === 'code');
  const importMap = scan.importMap || {};
  process.stderr.write(`Loaded ${scan.files.length} files (${codeFiles.length} code).\n`);

  // Build undirected import graph (code files only; no self-loops, no duplicate edges)
  const g = new Graph({ type: 'undirected', allowSelfLoops: false });
  for (const f of codeFiles) g.addNode(f.path);
  for (const [src, targets] of Object.entries(importMap)) {
    if (!g.hasNode(src)) continue;
    for (const tgt of targets) {
      if (!g.hasNode(tgt) || src === tgt || g.hasEdge(src, tgt)) continue;
      g.addEdge(src, tgt);
    }
  }

  // Run Louvain
  const communities = louvain(g); // { nodeId: communityId }

  // Print size distribution
  const sizeByCommunity = new Map();
  for (const [, cid] of Object.entries(communities)) {
    sizeByCommunity.set(cid, (sizeByCommunity.get(cid) || 0) + 1);
  }
  const sizes = [...sizeByCommunity.values()].sort((a, b) => b - a);
  process.stderr.write(
    `Louvain produced ${sizes.length} communities. Size distribution: [${sizes.join(', ')}]\n`,
  );
  process.stderr.write(
    `Max community size: ${sizes[0] ?? 0}, min: ${sizes.at(-1) ?? 0}, ` +
      `>35: ${sizes.filter(s => s > 35).length}, <5: ${sizes.filter(s => s < 5).length}\n`,
  );
}

// CLI entry guard (mirrors extract-structure.mjs pattern)
function isCliEntry() {
  if (!process.argv[1]) return false;
  try {
    return realpathSync(fileURLToPath(import.meta.url)) === realpathSync(process.argv[1]);
  } catch {
    return false;
  }
}

if (isCliEntry()) {
  try {
    await main();
  } catch (err) {
    process.stderr.write(`compute-batches.mjs failed: ${err.message}\n${err.stack}\n`);
    process.exit(1);
  }
}
```

- [ ] **Step 3: Run skeleton against the fixture**

Create a temporary scratch directory with the fixture in the expected layout:

```bash
mkdir -p /tmp/ua-prototype/.understand-anything/intermediate
cp understand-anything-plugin/skills/understand/test/fixtures/scan-result-3-cliques.json \
  /tmp/ua-prototype/.understand-anything/intermediate/scan-result.json
node understand-anything-plugin/skills/understand/compute-batches.mjs /tmp/ua-prototype
```

Expected stderr:

```
Loaded 9 files (9 code).
Louvain produced 3 communities. Size distribution: [3, 3, 3]
Max community size: 3, min: 3, >35: 0, <5: 3
```

(All 9 files split into three communities of 3, one per clique. All three fall under the min=5 threshold; that is expected for this fixture, and the plan accepts undersized communities rather than merging them.)
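Why the fixture yields exactly three communities: in the standard Louvain formulation a node only moves into a neighbouring community, so no community can span two connected components of the import graph. The sketch below uses hypothetical helpers that are not part of `compute-batches.mjs`. It rebuilds the same de-duplicated undirected edge set the skeleton feeds to graphology (code files only, no self-loops, `A→B` and `B→A` collapsed) and counts components with a small union-find.

```javascript
// Hypothetical helpers for reasoning about the prototype output.
function buildUndirectedEdges(codePaths, importMap) {
  const nodes = new Set(codePaths);
  const seen = new Set();
  const edges = [];
  for (const [src, targets] of Object.entries(importMap)) {
    if (!nodes.has(src)) continue; // only code files become graph nodes
    for (const tgt of targets) {
      if (!nodes.has(tgt) || src === tgt) continue; // skip non-code targets and self-loops
      const [a, b] = src < tgt ? [src, tgt] : [tgt, src]; // canonical order for an undirected edge
      const key = JSON.stringify([a, b]);
      if (seen.has(key)) continue; // A→B and B→A collapse into one edge
      seen.add(key);
      edges.push([a, b]);
    }
  }
  return edges;
}

// Connected-component sizes (largest first) via union-find with path halving.
function componentSizes(codePaths, edges) {
  const parent = new Map(codePaths.map(p => [p, p]));
  const find = (x) => {
    while (parent.get(x) !== x) {
      parent.set(x, parent.get(parent.get(x)));
      x = parent.get(x);
    }
    return x;
  };
  for (const [a, b] of edges) {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) parent.set(ra, rb);
  }
  const sizes = new Map();
  for (const p of codePaths) {
    const root = find(p);
    sizes.set(root, (sizes.get(root) || 0) + 1);
  }
  return [...sizes.values()].sort((x, y) => y - x);
}

// importMap of the 3-cliques fixture
const importMap = {
  'src/auth/login.ts': ['src/auth/session.ts', 'src/auth/tokens.ts'],
  'src/auth/session.ts': ['src/auth/tokens.ts'],
  'src/auth/tokens.ts': [],
  'src/api/handlers.ts': ['src/api/middleware.ts', 'src/api/routes.ts'],
  'src/api/middleware.ts': ['src/api/routes.ts'],
  'src/api/routes.ts': [],
  'src/db/users.ts': ['src/db/queries.ts', 'src/db/migrations.ts'],
  'src/db/queries.ts': ['src/db/migrations.ts'],
  'src/db/migrations.ts': [],
};
const codePaths = Object.keys(importMap);
const edges = buildUndirectedEdges(codePaths, importMap);
console.log(edges.length, componentSizes(codePaths, edges)); // 9 [ 3, 3, 3 ]
```

Nine edges in three components of 3 is an upper bound on what Louvain can return here, which is consistent with the `[3, 3, 3]` distribution expected above. A real scan with one giant component gives no such bound, which is why Step 4 checks real community sizes.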
- [ ] **Step 4: (Optional) Run against this repo's scan-result.json if it exists**

```bash
if [ -f .understand-anything/intermediate/scan-result.json ]; then
  node understand-anything-plugin/skills/understand/compute-batches.mjs "$(pwd)"
else
  echo "No real scan-result.json — skipping (fixture run is sufficient for prototype)."
fi
```

Record the output. If the real-repo run shows any community larger than 35 files, implement an edge-betweenness split in Task 4. Otherwise, Task 4 can be a minimal defensive split (deterministic alphabetical chunking of any oversized community).

- [ ] **Step 5: Commit skeleton**

```bash
git add understand-anything-plugin/skills/understand/compute-batches.mjs \
  understand-anything-plugin/skills/understand/test/fixtures/scan-result-3-cliques.json
git commit -m "feat(compute-batches): skeleton — Louvain on import graph (prototype)"
```

---

## Task 3: Write Vitest harness + first Louvain unit test

**Files:**

- Create: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs`

- [ ] **Step 1: Write failing test (Louvain produces 3 batches for 3 cliques)**

Create `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs`:

```javascript
import { describe, it, expect, beforeEach } from 'vitest';
import { mkdtempSync, mkdirSync, writeFileSync, readFileSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join, resolve } from 'node:path';
import { spawnSync } from 'node:child_process';
import { fileURLToPath } from 'node:url';

const __dirname = dirname(fileURLToPath(import.meta.url));
const SCRIPT = resolve(__dirname, 'compute-batches.mjs');
const FIXTURES = resolve(__dirname, 'test/fixtures');

// Run the script with the same Node binary that is running Vitest
function runScript(projectRoot, extraArgs = []) {
  return spawnSync(process.execPath, [SCRIPT, projectRoot, ...extraArgs], {
    encoding: 'utf-8',
  });
}

// Copy a fixture into a fresh temp project at the path compute-batches.mjs expects
function setupProject(fixtureName) {
  const root = mkdtempSync(join(tmpdir(), 'ua-cb-test-'));
  mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true });
  const fixturePath = join(FIXTURES, fixtureName);
  const dest = join(root, '.understand-anything', 'intermediate', 'scan-result.json');
  writeFileSync(dest, readFileSync(fixturePath, 'utf-8'));
  return root;
}

function readBatches(projectRoot) {
  const p = join(projectRoot, '.understand-anything', 'intermediate', 'batches.json');
  return JSON.parse(readFileSync(p, 'utf-8'));
}

describe('compute-batches.mjs — Louvain basic', () => {
  let projectRoot;
  beforeEach(() => {
    projectRoot = setupProject('scan-result-3-cliques.json');
  });

  it('produces 3 batches for 3 disjoint cliques', () => {
    const result = runScript(projectRoot);
    expect(result.status).toBe(0);
    const batches = readBatches(projectRoot);
    expect(batches.algorithm).toBe('louvain');
    expect(batches.totalFiles).toBe(9);
    expect(batches.batches.length).toBe(3);
    // Each batch should contain exactly one clique (3 files)
    for (const b of batches.batches) {
      expect(b.files.length).toBe(3);
      const dirs = new Set(b.files.map(f => f.path.split('/')[1]));
      expect(dirs.size).toBe(1); // all files in the batch share src//
    }
  });
});
```

- [ ] **Step 2: Run test, expect FAIL**

```bash
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "Louvain basic"
```

Expected: FAIL. The Task 2 skeleton only prints to stderr and never writes `batches.json`, so `readBatches` throws ENOENT.

- [ ] **Step 3: Make skeleton write batches.json**

Replace the trailing `process.stderr.write(...)` lines of `main()` in `compute-batches.mjs` with the full minimal-batches output.
Replace everything from `// Print size distribution` to the end of `main()`:

```javascript
  // Group files by community id, sorted by largest first for stable assignment
  const filesByCommunity = new Map();
  for (const [path, cid] of Object.entries(communities)) {
    if (!filesByCommunity.has(cid)) filesByCommunity.set(cid, []);
    filesByCommunity.get(cid).push(path);
  }

  // Sort communities by size desc, then by min-path asc for determinism
  const sortedCommunities = [...filesByCommunity.entries()]
    .sort((a, b) => {
      if (b[1].length !== a[1].length) return b[1].length - a[1].length;
      const minA = [...a[1]].sort()[0];
      const minB = [...b[1]].sort()[0];
      return minA.localeCompare(minB);
    });

  // Build per-batch file list with full file metadata from scan
  const fileMetaByPath = new Map(scan.files.map(f => [f.path, f]));
  const batches = sortedCommunities.map(([, paths], idx) => ({
    batchIndex: idx + 1,
    files: paths.sort().map(p => fileMetaByPath.get(p)),
    batchImportData: {},
    neighborMap: {},
  }));

  const output = {
    schemaVersion: 1,
    algorithm: 'louvain',
    totalFiles: scan.files.length,
    totalBatches: batches.length,
    batches,
  };
  const outPath = join(projectRoot, '.understand-anything', 'intermediate', 'batches.json');
  writeFileSync(outPath, JSON.stringify(output, null, 2), 'utf-8');
  process.stderr.write(`Wrote ${batches.length} batches to ${outPath}\n`);
```

- [ ] **Step 4: Run test, expect PASS**

```bash
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "Louvain basic"
```

Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add understand-anything-plugin/skills/understand/compute-batches.mjs \
  understand-anything-plugin/skills/understand/test_compute_batches.test.mjs
git commit -m "feat(compute-batches): emit batches.json with code communities"
```

---

## Task 4: Size enforcement — split oversized communities

If the Task 2 prototype run showed any community larger than 35 files, implement an edge-betweenness split. Otherwise, implement a minimal deterministic split as a defensive guard. Step 4 below uses alphabetical chunking, because a connected-component split cannot break up a single densely connected community such as the 40-file clique fixture.

**Files:**

- Modify: `understand-anything-plugin/skills/understand/compute-batches.mjs`
- Modify: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs`
- Create: `understand-anything-plugin/skills/understand/test/fixtures/scan-result-large-community.json`

- [ ] **Step 1: Create large-community fixture (40-node complete graph in one community)**

Create `understand-anything-plugin/skills/understand/test/fixtures/scan-result-large-community.json`. Build it programmatically once and commit the JSON:

```bash
node -e "
const files = [];
const importMap = {};
for (let i = 0; i < 40; i++) {
  const p = 'src/big/f' + i + '.ts';
  files.push({ path: p, language: 'typescript', sizeLines: 50, fileCategory: 'code' });
  importMap[p] = [];
  // Every file imports every other — guarantees a single community of 40
  for (let j = 0; j < 40; j++) if (i !== j) importMap[p].push('src/big/f' + j + '.ts');
}
const out = {
  name: 'fixture-large-community',
  description: '40 files all importing each other — one community over the max=35 cap',
  languages: ['typescript'],
  frameworks: [],
  files,
  totalFiles: 40,
  filteredByIgnore: 0,
  estimatedComplexity: 'moderate',
  importMap,
};
console.log(JSON.stringify(out, null, 2));
" > understand-anything-plugin/skills/understand/test/fixtures/scan-result-large-community.json
```

- [ ] **Step 2: Write failing test (large community splits to ≤ 35)**

Append to `test_compute_batches.test.mjs`:

```javascript
describe('compute-batches.mjs — size enforcement', () => {
  it('splits a 40-node clique into batches ≤ 35', () => {
    const root = setupProject('scan-result-large-community.json');
    const result = runScript(root);
    expect(result.status).toBe(0);
    const batches = readBatches(root);
    expect(batches.totalFiles).toBe(40);
    for (const b of batches.batches) {
      expect(b.files.length).toBeLessThanOrEqual(35);
    }
    // Sum of all batch file counts equals total files
    const sum = batches.batches.reduce((acc, b) => acc + b.files.length, 0);
    expect(sum).toBe(40);
    // Warning was emitted to stderr
    expect(result.stderr).toMatch(/Warning: compute-batches: community size 40 > max 35/);
  });
});
```

- [ ] **Step 3: Run test, expect FAIL**

```bash
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "size enforcement"
```

Expected: FAIL. At least one batch has 40 files, and no warning is emitted.

- [ ] **Step 4: Implement alphabetical-chunking split + warning**

In `compute-batches.mjs`, replace the grouping block that follows `const communities = louvain(g);` (added in Task 3) with grouping plus size enforcement:

```javascript
  // Group files by community id
  const filesByCommunity = new Map();
  for (const [path, cid] of Object.entries(communities)) {
    if (!filesByCommunity.has(cid)) filesByCommunity.set(cid, []);
    filesByCommunity.get(cid).push(path);
  }

  // Size enforcement: split any community > MAX_COMMUNITY_SIZE.
  // Strategy: deterministic alphabetical chunking within the oversize community.
  // Edge-betweenness would be more modularity-aware but adds dependency surface;
  // alphabetical chunking is deterministic, locality-preserving for co-located
  // files, and bounded by the cap. Each sub-community gets a fresh synthetic id.
  const MAX_COMMUNITY_SIZE = 35;
  const splitCommunities = new Map();
  let nextSyntheticId = 0;
  for (const [cid, paths] of filesByCommunity) {
    if (paths.length <= MAX_COMMUNITY_SIZE) {
      splitCommunities.set(cid, paths);
      continue;
    }
    process.stderr.write(
      `Warning: compute-batches: community size ${paths.length} > max ${MAX_COMMUNITY_SIZE} ` +
        `— splitting via alphabetical chunking — modularity may decrease\n`,
    );
    const sorted = [...paths].sort();
    const parts = Math.ceil(paths.length / MAX_COMMUNITY_SIZE);
    const perPart = Math.ceil(paths.length / parts); // balanced parts, e.g. 40 files → 20 + 20
    for (let i = 0; i < parts; i++) {
      const slice = sorted.slice(i * perPart, (i + 1) * perPart);
      const synthId = `__split_${cid}_${nextSyntheticId++}`;
      splitCommunities.set(synthId, slice);
    }
  }
```

Then update the `sortedCommunities` line to read from `splitCommunities` instead of `filesByCommunity`:

```javascript
  const sortedCommunities = [...splitCommunities.entries()]
```

- [ ] **Step 5: Run test, expect PASS**

```bash
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "size enforcement"
```

Expected: PASS. The 40 files split into 2 batches of 20 each, and the warning is emitted.

- [ ] **Step 6: Run prior test too, expect still PASS**

```bash
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs
```

Expected: all tests PASS.
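The Step 4 split is balanced rather than greedy. It first picks the fewest parts that respect the cap (`ceil(n / 35)`), then spreads files evenly across them, which is why the 40-file fixture becomes 20 + 20 rather than 35 + 5. Because `perPart` never exceeds 35 and `(parts - 1) * 35 < n`, no part comes out empty. The standalone restatement below (the `chunkAlphabetically` name is hypothetical) can be used to work out expected batch sizes for future fixtures.

```javascript
// Hypothetical standalone version of the Task 4 split, for checking expected sizes.
const MAX_COMMUNITY_SIZE = 35;

function chunkAlphabetically(paths, max = MAX_COMMUNITY_SIZE) {
  if (paths.length <= max) return [paths];
  const sorted = [...paths].sort();
  const parts = Math.ceil(sorted.length / max); // fewest parts that respect the cap
  const perPart = Math.ceil(sorted.length / parts); // spread files evenly across them
  const out = [];
  for (let i = 0; i < parts; i++) {
    out.push(sorted.slice(i * perPart, (i + 1) * perPart));
  }
  return out;
}

// Zero-padded names so alphabetical order matches numeric order
const files = n => Array.from({ length: n }, (_, i) => `src/big/f${String(i).padStart(3, '0')}.ts`);
console.log(chunkAlphabetically(files(40)).map(c => c.length)); // [ 20, 20 ]
console.log(chunkAlphabetically(files(71)).map(c => c.length)); // [ 24, 24, 23 ]
```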
- [ ] **Step 7: Commit**

```bash
git add understand-anything-plugin/skills/understand/compute-batches.mjs \
  understand-anything-plugin/skills/understand/test_compute_batches.test.mjs \
  understand-anything-plugin/skills/understand/test/fixtures/scan-result-large-community.json
git commit -m "feat(compute-batches): split communities > 35 with visible warning"
```

---

## Task 5: Exports extraction via TreeSitterPlugin

**Files:**

- Modify: `understand-anything-plugin/skills/understand/compute-batches.mjs`
- Modify: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs`

- [ ] **Step 1: Write failing test (exports populated on real TS files)**

Add a fixture-on-disk test that writes real source files and points the scan result at them. Append to `test_compute_batches.test.mjs`:

```javascript
describe('compute-batches.mjs — exports extraction', () => {
  it('populates exports for code files via tree-sitter', () => {
    const root = mkdtempSync(join(tmpdir(), 'ua-cb-exp-'));
    mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true });
    mkdirSync(join(root, 'src'), { recursive: true });
    writeFileSync(join(root, 'src', 'a.ts'),
      'export function greet(name: string) { return "hi " + name; }\n' +
      'export class Greeter { greet(n: string) { return "hi " + n; } }\n');
    writeFileSync(join(root, 'src', 'b.ts'),
      'import { greet } from "./a";\nexport const helper = () => greet("world");\n');
    const scan = {
      name: 'exports-test',
      description: '',
      languages: ['typescript'],
      frameworks: [],
      files: [
        { path: 'src/a.ts', language: 'typescript', sizeLines: 2, fileCategory: 'code' },
        { path: 'src/b.ts', language: 'typescript', sizeLines: 2, fileCategory: 'code' },
      ],
      totalFiles: 2,
      filteredByIgnore: 0,
      estimatedComplexity: 'small',
      importMap: { 'src/a.ts': [], 'src/b.ts': ['src/a.ts'] },
    };
    writeFileSync(
      join(root, '.understand-anything', 'intermediate', 'scan-result.json'),
      JSON.stringify(scan));

    const result = runScript(root);
    expect(result.status).toBe(0);
    const batches = readBatches(root);

    // batches.json doesn't store exports directly; they live in neighborMap.
    // To expose the script's internal exports map, the implementation also
    // emits an `exportsByPath` debug field (see Step 3), which this test inspects.
    expect(batches.exportsByPath).toBeDefined();
    expect(batches.exportsByPath['src/a.ts']).toEqual(
      expect.arrayContaining(['greet', 'Greeter']));
    expect(batches.exportsByPath['src/b.ts']).toEqual(
      expect.arrayContaining(['helper']));
  });
});
```

(The `exportsByPath` debug field is kept in the output so later tasks can inspect exports without going through `neighborMap`. It is emitted in the script output but not consumed by Phase 2; it is a side channel for testing and observability.)

- [ ] **Step 2: Run test, expect FAIL**

```bash
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "exports extraction"
```

Expected: FAIL, because `batches.exportsByPath` is undefined.

- [ ] **Step 3: Add TreeSitterPlugin loader + exports loop**

In `compute-batches.mjs`, add the core import dance at the top of the file (after the existing imports):

```javascript
import { createRequire } from 'node:module';
import { pathToFileURL } from 'node:url';

const __filename = fileURLToPath(import.meta.url);
const PLUGIN_ROOT = resolve(dirname(__filename), '../..');
const require = createRequire(resolve(PLUGIN_ROOT, 'package.json'));
let core;
try {
  core = await import(pathToFileURL(require.resolve('@understand-anything/core')).href);
} catch {
  core = await import(pathToFileURL(resolve(PLUGIN_ROOT, 'packages/core/dist/index.js')).href);
}
const { TreeSitterPlugin, PluginRegistry, builtinLanguageConfigs, registerAllParsers } = core;
```

Then add an `extractExports(projectRoot, codeFiles)` function before `main()`:

```javascript
/**
 * For each code file, returns its top-level exported symbol names (functions,
 * classes, exported consts). Per-file errors are swallowed into [] with a
 * visible warning so a single bad file does not abort batching.
 *
 * Returns Map.
 */
async function extractExports(projectRoot, codeFiles) {
  const tsConfigs = builtinLanguageConfigs.filter(c => c.treeSitter);
  const tsPlugin = new TreeSitterPlugin(tsConfigs);
  await tsPlugin.init();
  const registry = new PluginRegistry();
  registry.register(tsPlugin);
  registerAllParsers(registry);

  const exportsByPath = new Map();
  for (const file of codeFiles) {
    const abs = join(projectRoot, file.path);
    let content;
    try {
      content = readFileSync(abs, 'utf-8');
    } catch (err) {
      process.stderr.write(
        `Warning: compute-batches: exports extraction failed for ${file.path} ` +
          `(read error: ${err.message}) — symbols=[] in neighborMap — ` +
          `cross-batch edges to this file limited to file-level\n`,
      );
      exportsByPath.set(file.path, []);
      continue;
    }
    try {
      const analysis = registry.analyzeFile(file.path, content);
      const names = (analysis?.exports || []).map(e => e.name).filter(Boolean);
      exportsByPath.set(file.path, names);
    } catch (err) {
      process.stderr.write(
        `Warning: compute-batches: exports extraction failed for ${file.path} ` +
          `(${err.message}) — symbols=[] in neighborMap — ` +
          `cross-batch edges to this file limited to file-level\n`,
      );
      exportsByPath.set(file.path, []);
    }
  }
  return exportsByPath;
}
```

In `main()`, after building `codeFiles` and before running Louvain, call:

```javascript
  const exportsByPath = await extractExports(projectRoot, codeFiles);
```

In the output object, attach the debug field:

```javascript
  const output = {
    schemaVersion: 1,
    algorithm: 'louvain',
    totalFiles: scan.files.length,
    totalBatches: batches.length,
    exportsByPath: Object.fromEntries(exportsByPath),
    batches,
  };
```

- [ ] **Step 4: Run test, expect PASS**

```bash
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "exports extraction"
```

Expected: PASS.
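`exportsByPath` is the raw material for the still-empty `neighborMap` placeholder. The exact `neighborMap` schema is defined in the design spec rather than in this section, so the sketch below assumes a hypothetical shape, `{ [neighborPath]: { batchIndex, symbols } }`. It covers every file in another batch that a file in this batch imports or is imported by. `buildNeighborMap` and its parameters are illustrative names, not part of the plan's code.

```javascript
// Hypothetical: the neighborMap shape used here is an assumption, not the spec's schema.
function buildNeighborMap(batchPaths, importMap, batchOfPath, exportsByPath) {
  const inBatch = new Set(batchPaths);
  const neighbors = {};
  const add = (path) => {
    // Skip files in this batch, files already recorded, and files that are in no batch.
    if (inBatch.has(path) || Object.hasOwn(neighbors, path) || !batchOfPath.has(path)) return;
    neighbors[path] = {
      batchIndex: batchOfPath.get(path),
      symbols: exportsByPath.get(path) ?? [], // [] if extraction failed or the file was never parsed
    };
  };
  for (const [src, targets] of Object.entries(importMap)) {
    for (const tgt of targets) {
      if (inBatch.has(src)) add(tgt); // this batch imports a file in another batch
      if (inBatch.has(tgt)) add(src); // a file in another batch imports this batch
    }
  }
  return neighbors;
}

const batchOfPath = new Map([['src/a.ts', 1], ['src/b.ts', 1], ['src/c.ts', 2], ['src/d.ts', 3]]);
const importMap = { 'src/a.ts': ['src/b.ts', 'src/c.ts'], 'src/d.ts': ['src/a.ts'], 'src/c.ts': [] };
const exportsByPath = new Map([['src/c.ts', ['helper']], ['src/d.ts', []]]);
console.log(buildNeighborMap(['src/a.ts', 'src/b.ts'], importMap, batchOfPath, exportsByPath));
```

For batch 1 (`a.ts`, `b.ts`) this yields an entry for `src/c.ts` (batch 2, symbols `['helper']`, reached through an outgoing import) and one for `src/d.ts` (batch 3, no symbols, reached through an incoming import). A file whose extraction failed still appears, just with `symbols: []`, matching the file-level-only fallback that the Step 3 warning describes.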
- [ ] **Step 5: Run all tests, expect still PASS** ```bash pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs ``` Expected: all PASS. - [ ] **Step 6: Commit** ```bash git add understand-anything-plugin/skills/understand/compute-batches.mjs \ understand-anything-plugin/skills/understand/test_compute_batches.test.mjs git commit -m "feat(compute-batches): extract top-level exports via TreeSitter, warn on failure" ``` --- ## Task 6: Non-code batching (Groups A-E) **Files:** - Modify: `understand-anything-plugin/skills/understand/compute-batches.mjs` - Modify: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs` - Create: `understand-anything-plugin/skills/understand/test/fixtures/scan-result-non-code.json` - [ ] **Step 1: Create non-code fixture** Create `understand-anything-plugin/skills/understand/test/fixtures/scan-result-non-code.json`: ```json { "name": "fixture-non-code", "description": "Mix of non-code files exercising Groups A-E", "languages": ["typescript", "dockerfile", "yaml", "sql", "markdown"], "frameworks": [], "files": [ {"path": "src/index.ts", "language": "typescript", "sizeLines": 10, "fileCategory": "code"}, {"path": "Dockerfile", "language": "dockerfile", "sizeLines": 20, "fileCategory": "infra"}, {"path": "docker-compose.yml", "language": "yaml", "sizeLines": 15, "fileCategory": "infra"}, {"path": ".dockerignore", "language": "config", "sizeLines": 5, "fileCategory": "config"}, {"path": "services/api/Dockerfile", "language": "dockerfile", "sizeLines": 18, "fileCategory": "infra"}, {"path": "services/api/docker-compose.yml", "language": "yaml", "sizeLines": 12, "fileCategory": "infra"}, {"path": ".github/workflows/ci.yml", "language": "yaml", "sizeLines": 30, "fileCategory": "infra"}, {"path": ".github/workflows/deploy.yml", "language": "yaml", "sizeLines": 25, "fileCategory": "infra"}, {"path": "migrations/001_init.sql", "language": "sql", "sizeLines": 40, "fileCategory": 
"data"}, {"path": "migrations/002_users.sql", "language": "sql", "sizeLines": 20, "fileCategory": "data"}, {"path": "docs/getting-started.md", "language": "markdown", "sizeLines": 100, "fileCategory": "docs"}, {"path": "README.md", "language": "markdown", "sizeLines": 200, "fileCategory": "docs"} ], "totalFiles": 12, "filteredByIgnore": 0, "estimatedComplexity": "small", "importMap": { "src/index.ts": [], "Dockerfile": [], "docker-compose.yml": [], ".dockerignore": [], "services/api/Dockerfile": [], "services/api/docker-compose.yml": [], ".github/workflows/ci.yml": [], ".github/workflows/deploy.yml": [], "migrations/001_init.sql": [], "migrations/002_users.sql": [], "docs/getting-started.md": [], "README.md": [] } } ``` - [ ] **Step 2: Write failing tests for each non-code group** Append to `test_compute_batches.test.mjs`: ```javascript describe('compute-batches.mjs — non-code grouping', () => { let root; let batches; beforeEach(() => { root = setupProject('scan-result-non-code.json'); const result = runScript(root); expect(result.status).toBe(0); batches = readBatches(root); }); it('Group A: bundles Dockerfile cluster per directory', () => { // Root-level cluster: Dockerfile + docker-compose.yml + .dockerignore → one batch const rootDockerBatch = batches.batches.find(b => b.files.some(f => f.path === 'Dockerfile')); expect(rootDockerBatch).toBeDefined(); const paths = rootDockerBatch.files.map(f => f.path).sort(); expect(paths).toEqual(['.dockerignore', 'Dockerfile', 'docker-compose.yml']); // services/api cluster is a separate batch const apiDockerBatch = batches.batches.find(b => b.files.some(f => f.path === 'services/api/Dockerfile')); expect(apiDockerBatch).toBeDefined(); expect(apiDockerBatch).not.toBe(rootDockerBatch); expect(apiDockerBatch.files.map(f => f.path).sort()).toEqual([ 'services/api/Dockerfile', 'services/api/docker-compose.yml', ]); }); it('Group B: .github/workflows/* all in one batch', () => { const wfBatch = batches.batches.find(b => 
b.files.some(f => f.path.startsWith('.github/workflows/'))); expect(wfBatch).toBeDefined(); const wfPaths = wfBatch.files.map(f => f.path).filter(p => p.startsWith('.github/workflows/')); expect(wfPaths.sort()).toEqual([ '.github/workflows/ci.yml', '.github/workflows/deploy.yml', ]); }); it('Group D: SQL migrations under migrations/ in one batch', () => { const migBatch = batches.batches.find(b => b.files.some(f => f.path.startsWith('migrations/'))); expect(migBatch).toBeDefined(); const migPaths = migBatch.files.map(f => f.path).filter(p => p.startsWith('migrations/')); expect(migPaths.sort()).toEqual([ 'migrations/001_init.sql', 'migrations/002_users.sql', ]); }); it('non-code batch indices follow code batches', () => { const codeBatches = batches.batches.filter(b => b.files.every(f => f.fileCategory === 'code')); const nonCodeBatches = batches.batches.filter(b => b.files.some(f => f.fileCategory !== 'code')); expect(codeBatches.length).toBeGreaterThan(0); expect(nonCodeBatches.length).toBeGreaterThan(0); const maxCodeIdx = Math.max(...codeBatches.map(b => b.batchIndex)); const minNonCodeIdx = Math.min(...nonCodeBatches.map(b => b.batchIndex)); expect(minNonCodeIdx).toBeGreaterThan(maxCodeIdx); }); }); ``` - [ ] **Step 3: Run tests, expect FAIL** ```bash pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "non-code grouping" ``` Expected: FAIL on all four (non-code files currently end up nowhere — they're not in `codeFiles`, not in any batch). - [ ] **Step 4: Implement non-code grouping** In `compute-batches.mjs`, add a `buildNonCodeBatches(nonCodeFiles, startIndex)` function before `main()`: ```javascript /** * Build batches for non-code files per Groups A-E in the design spec. * Returns Array<{ files: FileMeta[] }> (without batchIndex — caller assigns). 
*/ function buildNonCodeBatches(nonCodeFiles) { const byPath = new Map(nonCodeFiles.map(f => [f.path, f])); const consumed = new Set(); const groups = []; const dirOf = p => p.includes('/') ? p.slice(0, p.lastIndexOf('/')) : ''; const baseOf = p => p.includes('/') ? p.slice(p.lastIndexOf('/') + 1) : p; // Group A: per-directory Dockerfile clusters. const dirsWithDockerfile = new Set( [...byPath.keys()] .filter(p => baseOf(p) === 'Dockerfile') .map(dirOf), ); for (const dir of dirsWithDockerfile) { const inDir = [...byPath.keys()].filter(p => dirOf(p) === dir); const cluster = inDir.filter(p => { const b = baseOf(p); return b === 'Dockerfile' || b === '.dockerignore' || b.startsWith('docker-compose.'); }); if (cluster.length) { groups.push({ files: cluster.map(p => byPath.get(p)) }); cluster.forEach(p => consumed.add(p)); } } // Group B: .github/workflows/* const ghWorkflows = [...byPath.keys()].filter( p => p.startsWith('.github/workflows/') && (p.endsWith('.yml') || p.endsWith('.yaml')), ).filter(p => !consumed.has(p)); if (ghWorkflows.length) { groups.push({ files: ghWorkflows.map(p => byPath.get(p)) }); ghWorkflows.forEach(p => consumed.add(p)); } // Group C: .gitlab-ci.yml + .circleci/* const ciFiles = [...byPath.keys()].filter( p => (p === '.gitlab-ci.yml' || p.startsWith('.circleci/')) && !consumed.has(p), ); if (ciFiles.length) { groups.push({ files: ciFiles.map(p => byPath.get(p)) }); ciFiles.forEach(p => consumed.add(p)); } // Group D: SQL migrations per migrations/ or migration/ directory const migrationDirs = new Set( [...byPath.keys()] .filter(p => p.endsWith('.sql')) .map(dirOf) .filter(d => /(^|\/)migrations?$/.test(d)), ); for (const dir of migrationDirs) { const sqls = [...byPath.keys()] .filter(p => dirOf(p) === dir && p.endsWith('.sql') && !consumed.has(p)) .sort(); if (sqls.length) { groups.push({ files: sqls.map(p => byPath.get(p)) }); sqls.forEach(p => consumed.add(p)); } } // Group E: all remaining grouped by immediate parent dir, max 20 per 
batch const remainingByDir = new Map(); for (const p of [...byPath.keys()].sort()) { if (consumed.has(p)) continue; const dir = dirOf(p); if (!remainingByDir.has(dir)) remainingByDir.set(dir, []); remainingByDir.get(dir).push(p); } const MAX_E = 20; for (const [, paths] of remainingByDir) { for (let i = 0; i < paths.length; i += MAX_E) { const slice = paths.slice(i, i + MAX_E); groups.push({ files: slice.map(p => byPath.get(p)) }); } } return groups; } ``` In `main()`, after `const codeFiles = ...` add: ```javascript const nonCodeFiles = (scan.files || []).filter(f => f.fileCategory !== 'code'); ``` After the `sortedCommunities`/batches construction for code, build non-code batches and append: ```javascript // Assign code batchIndex first const codeBatchObjs = sortedCommunities.map(([, paths], idx) => ({ batchIndex: idx + 1, files: paths.sort().map(p => fileMetaByPath.get(p)), batchImportData: {}, neighborMap: {}, })); // Append non-code batches after code const nonCodeGroups = buildNonCodeBatches(nonCodeFiles); const nonCodeBatchObjs = nonCodeGroups.map((g, i) => ({ batchIndex: codeBatchObjs.length + i + 1, files: g.files, batchImportData: {}, neighborMap: {}, })); const batches = [...codeBatchObjs, ...nonCodeBatchObjs]; ``` (Remove the old `const batches = sortedCommunities.map(...)` line — it's been replaced.) - [ ] **Step 5: Run tests, expect PASS** ```bash pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs ``` Expected: all PASS. 
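To sanity-check the Group A and Group D membership rules in isolation, the predicates can be exercised on plain path strings. This is a hypothetical standalone sketch: `isDockerClusterFile`, `isMigrationDir` and `dockerClusters` are illustrative names, not functions in `compute-batches.mjs`. It mirrors the `dirOf`/`baseOf` helpers and the `/(^|\/)migrations?$/` regex from `buildNonCodeBatches`, but omits the `consumed` bookkeeping.

```javascript
// Hypothetical standalone copies of the path helpers used by buildNonCodeBatches.
const dirOf = p => (p.includes('/') ? p.slice(0, p.lastIndexOf('/')) : '');
const baseOf = p => (p.includes('/') ? p.slice(p.lastIndexOf('/') + 1) : p);

// Group A membership test: Dockerfile, .dockerignore, or docker-compose.*.
function isDockerClusterFile(p) {
  const b = baseOf(p);
  return b === 'Dockerfile' || b === '.dockerignore' || b.startsWith('docker-compose.');
}

// Group D directory test: a directory named migrations/ or migration/ at any depth.
const isMigrationDir = d => /(^|\/)migrations?$/.test(d);

// Group A: one cluster per directory that contains a Dockerfile.
function dockerClusters(paths) {
  const dirs = new Set(paths.filter(p => baseOf(p) === 'Dockerfile').map(dirOf));
  const clusters = {};
  for (const dir of [...dirs].sort()) {
    clusters[dir] = paths.filter(p => dirOf(p) === dir && isDockerClusterFile(p)).sort();
  }
  return clusters;
}

const dockerSample = [
  'Dockerfile', 'docker-compose.yml', '.dockerignore', 'README.md',
  'services/api/Dockerfile', 'services/api/docker-compose.yml',
];
console.log(dockerClusters(dockerSample));
console.log(['migrations', 'db/migration', 'src/migrations_old', 'mymigrations'].map(isMigrationDir));
```

Note that `README.md` in the root directory is not pulled into the root Docker cluster; it falls through to Group E.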
- [ ] **Step 6: Commit** ```bash git add understand-anything-plugin/skills/understand/compute-batches.mjs \ understand-anything-plugin/skills/understand/test_compute_batches.test.mjs \ understand-anything-plugin/skills/understand/test/fixtures/scan-result-non-code.json git commit -m "feat(compute-batches): non-code grouping Groups A-E" ``` --- ## Task 7: batchImportData + neighborMap **Files:** - Modify: `understand-anything-plugin/skills/understand/compute-batches.mjs` - Modify: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs` - [ ] **Step 1: Write failing tests (batchImportData populated, neighborMap correct, excludes same-batch)** Append to `test_compute_batches.test.mjs`: ```javascript describe('compute-batches.mjs — neighborMap + batchImportData', () => { let batches; let batchOf; // path → batchIndex beforeEach(() => { const root = setupProject('scan-result-3-cliques.json'); const result = runScript(root); expect(result.status).toBe(0); batches = readBatches(root); batchOf = new Map(); for (const b of batches.batches) { for (const f of b.files) batchOf.set(f.path, b.batchIndex); } }); it('batchImportData mirrors scan importMap per batch', () => { for (const b of batches.batches) { for (const f of b.files) { expect(b.batchImportData[f.path]).toBeDefined(); // each file's batchImportData should be an array (possibly empty) expect(Array.isArray(b.batchImportData[f.path])).toBe(true); } } // src/auth/login.ts imports src/auth/session.ts and src/auth/tokens.ts const loginBatch = batches.batches.find(b => b.files.some(f => f.path === 'src/auth/login.ts')); expect(loginBatch.batchImportData['src/auth/login.ts'].sort()).toEqual([ 'src/auth/session.ts', 'src/auth/tokens.ts', ]); }); it('neighborMap excludes same-batch files', () => { // The fixture's three cliques each go into one batch — all imports are // intra-batch, so no neighbor map should reference any same-batch file. 
for (const b of batches.batches) { const sameBatchPaths = new Set(b.files.map(f => f.path)); for (const [file, neighbors] of Object.entries(b.neighborMap)) { for (const n of neighbors) { expect(sameBatchPaths.has(n.path)).toBe(false); } } } }); it('neighborMap entries carry symbols when target has exports', () => { // For a custom case where two cliques cross-import each other, ensure // the neighborMap entry includes the target's exported symbol names. // Build a custom fixture inline. const root = mkdtempSync(join(tmpdir(), 'ua-cb-nbr-')); mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true }); mkdirSync(join(root, 'src'), { recursive: true }); writeFileSync(join(root, 'src', 'a.ts'), 'export function findUser(id: string) { return null; }\nexport class User {}\n'); writeFileSync(join(root, 'src', 'b.ts'), 'import { findUser } from "./a";\nexport const wrap = () => findUser("x");\n'); // To force a/b into different batches, add a third unrelated clique that // dominates one community; here we just rely on small graph behavior. const scan = { name: 't', description: '', languages: ['typescript'], frameworks: [], files: [ { path: 'src/a.ts', language: 'typescript', sizeLines: 2, fileCategory: 'code' }, { path: 'src/b.ts', language: 'typescript', sizeLines: 2, fileCategory: 'code' }, ], totalFiles: 2, filteredByIgnore: 0, estimatedComplexity: 'small', importMap: { 'src/a.ts': [], 'src/b.ts': ['src/a.ts'] }, }; writeFileSync( join(root, '.understand-anything', 'intermediate', 'scan-result.json'), JSON.stringify(scan)); const result = runScript(root); expect(result.status).toBe(0); const out = readBatches(root); // If Louvain puts a and b in the same community, this test is degenerate. // We just assert: for every cross-batch neighbor entry that points to a.ts, // the symbols list includes findUser and User. 
for (const b of out.batches) { for (const [, neighbors] of Object.entries(b.neighborMap)) { for (const n of neighbors) { if (n.path === 'src/a.ts') { expect(n.symbols).toEqual(expect.arrayContaining(['findUser', 'User'])); } } } } }); }); ``` - [ ] **Step 2: Run tests, expect FAIL** ```bash pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "neighborMap" ``` Expected: FAIL on `batchImportData mirrors scan importMap per batch`, because `batchImportData` is still an empty `{}` on every batch. The two neighborMap tests pass vacuously against the still-empty `neighborMap` and only become meaningful after Step 3. - [ ] **Step 3: Implement batchImportData + neighborMap construction** In `compute-batches.mjs`, before the final `output = {...}` write, add a populate step. Replace the `codeBatchObjs` + `nonCodeBatchObjs` construction with the following: ```javascript // Helper: lookup batchIndex by path (any batch — code or non-code) // Build it after we know batch assignments. function buildBatchOfMap(allBatches) { const m = new Map(); for (const b of allBatches) { for (const f of b.files) m.set(f.path, b.batchIndex); } return m; } // First-pass: assemble files-only batches const codeBatchObjsBare = sortedCommunities.map(([, paths], idx) => ({ batchIndex: idx + 1, files: paths.sort().map(p => fileMetaByPath.get(p)), })); const nonCodeGroups = buildNonCodeBatches(nonCodeFiles); const nonCodeBatchObjsBare = nonCodeGroups.map((g, i) => ({ batchIndex: codeBatchObjsBare.length + i + 1, files: g.files, })); const bareBatches = [...codeBatchObjsBare, ...nonCodeBatchObjsBare]; const batchOf = buildBatchOfMap(bareBatches); // Build reverse import map: target → [sources that import target] const reverseImportMap = new Map(); for (const [src, targets] of Object.entries(importMap)) { for (const tgt of targets) { if (!reverseImportMap.has(tgt)) reverseImportMap.set(tgt, []); reverseImportMap.get(tgt).push(src); } } // Compute neighbor degree (number of import relations) per path, used for // truncation when neighborMap[file] has > MAX_NEIGHBORS entries.
const NEIGHBOR_DEGREE = new Map(); for (const f of codeFiles) { const outDeg = (importMap[f.path] || []).length; const inDeg = (reverseImportMap.get(f.path) || []).length; NEIGHBOR_DEGREE.set(f.path, outDeg + inDeg); } const MAX_NEIGHBORS = 50; // Second-pass: enrich each batch with batchImportData + neighborMap const batches = bareBatches.map(b => { const batchPaths = new Set(b.files.map(f => f.path)); const batchImportData = {}; const neighborMap = {}; for (const f of b.files) { batchImportData[f.path] = (importMap[f.path] || []).slice(); // 1-hop neighbors: imports out + imported-by in, excluding same batch. const outNeighbors = importMap[f.path] || []; const inNeighbors = reverseImportMap.get(f.path) || []; const all = new Set([...outNeighbors, ...inNeighbors]); const filtered = [...all].filter(p => batchOf.has(p) && !batchPaths.has(p)); let kept = filtered.map(p => ({ path: p, batchIndex: batchOf.get(p), symbols: exportsByPath.get(p) || [], })); if (kept.length > MAX_NEIGHBORS) { const original = kept.length; kept.sort((a, b2) => (NEIGHBOR_DEGREE.get(b2.path) || 0) - (NEIGHBOR_DEGREE.get(a.path) || 0)); kept = kept.slice(0, MAX_NEIGHBORS); process.stderr.write( `Warning: compute-batches: neighborMap for ${f.path} truncated from ` + `${original} to top ${MAX_NEIGHBORS} (by neighbor degree)\n`, ); } if (kept.length) neighborMap[f.path] = kept; } return { batchIndex: b.batchIndex, files: b.files, batchImportData, neighborMap }; }); ``` - [ ] **Step 4: Run tests, expect PASS** ```bash pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs ``` Expected: all PASS. 
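The two-pass neighborMap construction can be reduced to a small pure function for experimentation. A minimal sketch, assuming batches are given as `{ batchIndex, files: string[] }` (plain paths instead of `FileMeta` objects) and `exportsByPath` as a plain object; `buildNeighborMap` is a hypothetical name, and degree-based truncation is deliberately left out.

```javascript
// Hypothetical pure version of the neighborMap pass (no truncation).
// batches: Array<{ batchIndex: number, files: string[] }>
// importMap: { [path]: string[] }   exportsByPath: { [path]: string[] }
function buildNeighborMap(batches, importMap, exportsByPath) {
  // path -> batchIndex
  const batchOf = new Map();
  for (const b of batches) for (const p of b.files) batchOf.set(p, b.batchIndex);

  // target -> sources that import it
  const reverse = new Map();
  for (const [src, targets] of Object.entries(importMap)) {
    for (const tgt of targets) {
      if (!reverse.has(tgt)) reverse.set(tgt, []);
      reverse.get(tgt).push(src);
    }
  }

  const result = {};
  for (const b of batches) {
    const same = new Set(b.files);
    const neighborMap = {};
    for (const p of b.files) {
      // 1-hop neighbours in both directions, deduplicated, other batches only.
      const all = new Set([...(importMap[p] || []), ...(reverse.get(p) || [])]);
      const kept = [...all]
        .filter(n => batchOf.has(n) && !same.has(n))
        .map(n => ({ path: n, batchIndex: batchOf.get(n), symbols: exportsByPath[n] || [] }));
      if (kept.length) neighborMap[p] = kept;
    }
    result[b.batchIndex] = neighborMap;
  }
  return result;
}

const nm = buildNeighborMap(
  [{ batchIndex: 1, files: ['src/a.ts'] }, { batchIndex: 2, files: ['src/b.ts', 'src/c.ts'] }],
  { 'src/a.ts': [], 'src/b.ts': ['src/a.ts', 'src/c.ts'], 'src/c.ts': [] },
  { 'src/a.ts': ['findUser', 'User'] },
);
console.log(JSON.stringify(nm, null, 2));
```

Here `src/b.ts` sees `src/a.ts` (batch 1) with its exported symbols, `src/a.ts` sees its importer `src/b.ts`, and `src/c.ts` gets no entry because its only neighbour is in its own batch.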
- [ ] **Step 5: Add neighborMap truncation test** Append: ```javascript describe('compute-batches.mjs — neighborMap truncation', () => { it('truncates and warns when neighbors > 50', () => { const root = mkdtempSync(join(tmpdir(), 'ua-cb-trunc-')); mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true }); // hub.ts imported by 120 other files. Size enforcement caps hub's own batch at 35 files, // so at least 120 - 34 = 86 importers land in other batches (> 50, so truncation must fire). const files = [{ path: 'src/hub.ts', language: 'typescript', sizeLines: 1, fileCategory: 'code' }]; const importMap = { 'src/hub.ts': [] }; for (let i = 0; i < 120; i++) { const p = `src/leaf${i}.ts`; files.push({ path: p, language: 'typescript', sizeLines: 1, fileCategory: 'code' }); importMap[p] = ['src/hub.ts']; } const scan = { name: 't', description: '', languages: ['typescript'], frameworks: [], files, totalFiles: files.length, filteredByIgnore: 0, estimatedComplexity: 'moderate', importMap, }; writeFileSync( join(root, '.understand-anything', 'intermediate', 'scan-result.json'), JSON.stringify(scan)); const result = runScript(root); expect(result.status).toBe(0); expect(result.stderr).toMatch(/neighborMap for src\/hub\.ts truncated from \d+ to top 50/); const out = readBatches(root); // Find hub.ts and confirm its neighbor list capped at 50 (in whichever batch it landed) for (const b of out.batches) { const nbrs = b.neighborMap['src/hub.ts']; if (nbrs) expect(nbrs.length).toBeLessThanOrEqual(50); } }); }); ``` - [ ] **Step 6: Run tests, expect PASS** ```bash pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs ``` Expected: all PASS.
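The truncation rule keeps the `MAX_NEIGHBORS` neighbours with the highest total import degree (outgoing plus incoming). A hypothetical standalone sketch of that ranking (`truncateByDegree` is an illustrative name), using a limit of 2 so the effect is visible. Unlike the script, which only computes degrees for code files, this sketch counts degree for every path in `importMap`. Since `Array.prototype.sort` is stable (guaranteed since ES2019), equal-degree neighbours keep their original relative order.

```javascript
// Hypothetical helper mirroring the neighborMap truncation step: rank
// neighbours by import degree (outgoing + incoming) and keep the top `max`.
function truncateByDegree(neighbors, importMap, max) {
  const degree = new Map();
  for (const [src, targets] of Object.entries(importMap)) {
    degree.set(src, (degree.get(src) || 0) + targets.length); // outgoing
    for (const t of targets) degree.set(t, (degree.get(t) || 0) + 1); // incoming
  }
  if (neighbors.length <= max) return { kept: neighbors, truncatedFrom: null };
  // Stable sort: equal-degree neighbours keep their input order.
  const kept = [...neighbors]
    .sort((a, b) => (degree.get(b.path) || 0) - (degree.get(a.path) || 0))
    .slice(0, max);
  return { kept, truncatedFrom: neighbors.length };
}

const degreeImportMap = {
  'x.ts': ['hub.ts', 'util.ts'],
  'y.ts': ['hub.ts'],
  'z.ts': ['util.ts', 'hub.ts'],
  'hub.ts': [], 'util.ts': [], 'lonely.ts': [],
};
const candidateNeighbors = ['lonely.ts', 'util.ts', 'hub.ts', 'y.ts'].map(path => ({ path }));
const { kept, truncatedFrom } = truncateByDegree(candidateNeighbors, degreeImportMap, 2);
console.log(kept.map(n => n.path), truncatedFrom); // [ 'hub.ts', 'util.ts' ] 4
```

`hub.ts` (degree 3) and `util.ts` (degree 2) survive, while `y.ts` (1) and `lonely.ts` (0) are dropped.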
- [ ] **Step 7: Commit** ```bash git add understand-anything-plugin/skills/understand/compute-batches.mjs \ understand-anything-plugin/skills/understand/test_compute_batches.test.mjs git commit -m "feat(compute-batches): batchImportData + neighborMap with truncation warning" ``` --- ## Task 8: Fallback path + Louvain warning **Files:** - Modify: `understand-anything-plugin/skills/understand/compute-batches.mjs` - Modify: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs` - [ ] **Step 1: Write failing test (Louvain crash → fallback, warning emitted, batches still valid)** Append to `test_compute_batches.test.mjs`: ```javascript describe('compute-batches.mjs — fallback', () => { it('falls back to count-based when Louvain throws (env-injected mock)', () => { // We can't easily monkey-patch louvain mid-script in Vitest because the // script runs in a subprocess. Instead, set an env var the script honors: // UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW=1 → script throws inside its // Louvain branch, exercising the fallback path. 
const root = setupProject('scan-result-3-cliques.json'); const result = spawnSync('node', [SCRIPT, root], { encoding: 'utf-8', env: { ...process.env, UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW: '1' } }, ); expect(result.status).toBe(0); expect(result.stderr).toMatch( /Warning: compute-batches: Louvain failed.*falling back to count-based grouping/); const out = readBatches(root); expect(out.algorithm).toBe('count-fallback'); expect(out.totalFiles).toBe(9); // Count-based: 12 files per batch → all 9 fit in one batch const codeBatchFileCount = out.batches .filter(b => b.files.every(f => f.fileCategory === 'code')) .reduce((sum, b) => sum + b.files.length, 0); expect(codeBatchFileCount).toBe(9); }); }); ``` - [ ] **Step 2: Run test, expect FAIL** ```bash pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "fallback" ``` Expected: FAIL — no fallback path exists; script crashes or produces `algorithm: "louvain"`. - [ ] **Step 3: Implement fallback** In `compute-batches.mjs`, refactor the Louvain section into a function and wrap it in try/catch. **Exact boundary:** the block to replace **starts** at `const g = new Graph({ type: 'undirected', allowSelfLoops: false });` and **ends** at the closing brace of the `for (const [cid, paths] of filesByCommunity) { ... }` size-enforcement loop (the loop introduced in Task 4 step 4). Do NOT replace the `const sortedCommunities = [...splitCommunities.entries()] ...` line that follows — it stays as-is and continues to work because the replacement still produces `splitCommunities`. Add a `runLouvain(codeFiles, importMap)` function before `main()`: ```javascript /** * Returns Map via Louvain. May throw — caller must catch * and fall back if it does. Honors UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW=1 * to allow tests to exercise the fallback path.
*/ function runLouvain(codeFiles, importMap) { if (process.env.UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW === '1') { throw new Error('forced throw for test'); } const g = new Graph({ type: 'undirected', allowSelfLoops: false }); for (const f of codeFiles) g.addNode(f.path); for (const [src, targets] of Object.entries(importMap)) { if (!g.hasNode(src)) continue; for (const tgt of targets) { if (!g.hasNode(tgt) || src === tgt || g.hasEdge(src, tgt)) continue; g.addEdge(src, tgt); } } const cs = louvain(g); // { nodeId: communityId } return new Map(Object.entries(cs)); } /** * Returns Map via alphabetical chunking of 12 files per * batch. Deterministic, used as fallback when Louvain fails. */ function countBasedAssignment(codeFiles, batchSize = 12) { const out = new Map(); const sorted = [...codeFiles].map(f => f.path).sort(); for (let i = 0; i < sorted.length; i++) { out.set(sorted[i], `count_${Math.floor(i / batchSize)}`); } return out; } ``` In `main()`, replace the Louvain call + size-enforcement block with: ```javascript let algorithm = 'louvain'; let perFileCommunity; try { perFileCommunity = runLouvain(codeFiles, importMap); } catch (err) { process.stderr.write( `Warning: compute-batches: Louvain failed (${err.message}) ` + `— falling back to count-based grouping (12 files/batch) ` + `— module semantic boundaries lost\n`, ); perFileCommunity = countBasedAssignment(codeFiles, 12); algorithm = 'count-fallback'; } // Group files by community id const filesByCommunity = new Map(); for (const [path, cid] of perFileCommunity) { if (!filesByCommunity.has(cid)) filesByCommunity.set(cid, []); filesByCommunity.get(cid).push(path); } // Size enforcement only on louvain output. count-fallback already chunked. 
const MAX_COMMUNITY_SIZE = 35; const splitCommunities = new Map(); let nextSyntheticId = 0; if (algorithm === 'louvain') { for (const [cid, paths] of filesByCommunity) { if (paths.length <= MAX_COMMUNITY_SIZE) { splitCommunities.set(cid, paths); continue; } process.stderr.write( `Warning: compute-batches: community size ${paths.length} > max ${MAX_COMMUNITY_SIZE} ` + `— splitting via alphabetical chunking — modularity may decrease\n`, ); const sorted = [...paths].sort(); const parts = Math.ceil(paths.length / MAX_COMMUNITY_SIZE); const perPart = Math.ceil(paths.length / parts); for (let i = 0; i < parts; i++) { const slice = sorted.slice(i * perPart, (i + 1) * perPart); const synthId = `__split_${cid}_${nextSyntheticId++}`; splitCommunities.set(synthId, slice); } } } else { for (const [cid, paths] of filesByCommunity) splitCommunities.set(cid, paths); } ``` And update the output object's `algorithm` field: ```javascript const output = { schemaVersion: 1, algorithm, totalFiles: scan.files.length, totalBatches: batches.length, exportsByPath: Object.fromEntries(exportsByPath), batches, }; ``` - [ ] **Step 4: Run tests, expect PASS** ```bash pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs ``` Expected: all PASS including new fallback test. 
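The size-enforcement split divides an oversized community into `parts = ceil(n / 35)` alphabetical slices of `perPart = ceil(n / parts)` files. The slices therefore come out near-equal rather than as full 35-file batches plus a small remainder. A hypothetical helper (`splitSizes` is an illustrative name, not part of `compute-batches.mjs`) that reproduces only the slice sizes makes the arithmetic easy to check:

```javascript
// Hypothetical helper: slice sizes produced by the alphabetical split of an
// oversized Louvain community (mirrors the parts/perPart arithmetic above).
function splitSizes(n, max = 35) {
  if (n <= max) return [n];
  const parts = Math.ceil(n / max);
  const perPart = Math.ceil(n / parts);
  const sizes = [];
  for (let i = 0; i < parts; i++) {
    // Same bounds as sorted.slice(i * perPart, (i + 1) * perPart).
    sizes.push(Math.min(n, (i + 1) * perPart) - i * perPart);
  }
  return sizes;
}

console.log(splitSizes(50)); // [ 25, 25 ]
console.log(splitSizes(61)); // [ 31, 30 ]
console.log(splitSizes(71)); // [ 24, 24, 23 ]
```

No slice is ever empty. `parts = ceil(n / 35)` implies `n > 35 * (parts - 1)`, and `perPart` never exceeds 35, so the last slice always starts before index `n`.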
- [ ] **Step 5: Commit** ```bash git add understand-anything-plugin/skills/understand/compute-batches.mjs \ understand-anything-plugin/skills/understand/test_compute_batches.test.mjs git commit -m "feat(compute-batches): count-based fallback with visible warning" ``` --- ## Task 9: --changed-files mode **Files:** - Modify: `understand-anything-plugin/skills/understand/compute-batches.mjs` - Modify: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs` - [ ] **Step 1: Write failing test** Append: ```javascript describe('compute-batches.mjs — --changed-files', () => { it('emits only batches containing changed files', () => { const root = setupProject('scan-result-3-cliques.json'); const changedPath = join(root, 'changed.txt'); // Only the auth clique is changed writeFileSync(changedPath, ['src/auth/login.ts', 'src/auth/tokens.ts'].join('\n')); const result = runScript(root, [`--changed-files=${changedPath}`]); expect(result.status).toBe(0); const out = readBatches(root); // Auth files are in batches; other cliques' batches must be omitted const allPaths = out.batches.flatMap(b => b.files.map(f => f.path)); expect(allPaths).toContain('src/auth/login.ts'); expect(allPaths).toContain('src/auth/tokens.ts'); expect(allPaths).not.toContain('src/api/handlers.ts'); expect(allPaths).not.toContain('src/db/users.ts'); // neighborMap may still reference unchanged files (with their full-graph batchIndex) const loginBatch = out.batches.find(b => b.files.some(f => f.path === 'src/auth/login.ts')); // No assertion on neighborMap content here — the auth clique is fully // changed, so neighborMap entries may be empty. The point is the script // doesn't crash and only emits relevant batches. 
expect(loginBatch).toBeDefined(); }); }); ``` - [ ] **Step 2: Run test, expect FAIL** ```bash pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "changed-files" ``` Expected: FAIL — flag is unrecognized; output contains all batches. - [ ] **Step 3: Implement --changed-files filtering** In `compute-batches.mjs`, at the start of `main()`, after reading `projectRoot`: ```javascript let changedFiles = null; for (const arg of process.argv.slice(3)) { const m = arg.match(/^--changed-files=(.+)$/); if (m) { const p = m[1]; const lines = readFileSync(p, 'utf-8') .split('\n') .map(s => s.trim()) .filter(Boolean); changedFiles = new Set(lines); } } ``` Just before writing the output (after `batches` is assembled), filter the batches and replace the existing `const output = { ... }` block from Task 8 with the version below (declaring `output` twice would be a `SyntaxError`): ```javascript let finalBatches = batches; if (changedFiles) { finalBatches = batches.filter(b => b.files.some(f => changedFiles.has(f.path))); // batchIndex on filtered batches retains the full-graph assignment // (the design says neighborMap should still reference unchanged files' // full-graph batchIndex). No renumbering. } const output = { schemaVersion: 1, algorithm, totalFiles: scan.files.length, totalBatches: finalBatches.length, exportsByPath: Object.fromEntries(exportsByPath), batches: finalBatches, }; ``` - [ ] **Step 4: Run test, expect PASS** ```bash pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs ``` Expected: all PASS.
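The `--changed-files` contract can be expressed as a pure function over the file's text: one path per line, blank lines ignored, batches kept whole, and no renumbering. A hypothetical sketch; `filterByChangedFiles` is not a function in `compute-batches.mjs`, and batches are simplified to `{ batchIndex, files: [{ path }] }`.

```javascript
// Hypothetical pure equivalent of the --changed-files filter.
function filterByChangedFiles(batches, changedFilesText) {
  const changed = new Set(
    // trim() also strips the trailing '\r' left by CRLF line endings.
    changedFilesText.split('\n').map(s => s.trim()).filter(Boolean),
  );
  // Keep a batch whole if any of its files changed; batchIndex is not renumbered.
  return batches.filter(b => b.files.some(f => changed.has(f.path)));
}

const sampleBatches = [
  { batchIndex: 1, files: [{ path: 'src/api/handlers.ts' }] },
  { batchIndex: 2, files: [{ path: 'src/auth/login.ts' }, { path: 'src/auth/tokens.ts' }] },
  { batchIndex: 3, files: [{ path: 'src/db/users.ts' }] },
];
const keptBatches = filterByChangedFiles(sampleBatches, 'src/auth/login.ts\r\n\nsrc/auth/tokens.ts\n');
console.log(keptBatches.map(b => b.batchIndex)); // [ 2 ]
```

Keeping the original `batchIndex` means the surviving batch still agrees with the full-graph indices referenced by other batches' `neighborMap` entries.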
- [ ] **Step 5: Commit** ```bash git add understand-anything-plugin/skills/understand/compute-batches.mjs \ understand-anything-plugin/skills/understand/test_compute_batches.test.mjs git commit -m "feat(compute-batches): --changed-files mode for incremental updates" ``` --- ## Task 10: file-analyzer.md — add Cross-batch context (neighborMap) section **Files:** - Modify: `understand-anything-plugin/agents/file-analyzer.md` - [ ] **Step 1: Insert the new section** In `understand-anything-plugin/agents/file-analyzer.md`, find the existing line: ``` ### Step 1 — Prepare the input JSON ``` (This is at approximately line 32.) After Step 1's closing code block (the bash heredoc that ends with `ENDJSON`), and **before** `### Step 2 — Execute the bundled extraction script`, insert a new sub-section. Use the Edit tool: Old text (the boundary between Step 1 and Step 2): ``` ENDJSON ``` ### Step 2 — Execute the bundled extraction script ``` New text: ``` ENDJSON ``` ### Cross-batch context (neighborMap) Your dispatch prompt includes a `neighborMap` — for each file in your batch, it lists project-internal neighbors in OTHER batches (files that import yours or that you import), with their exported symbols. Use neighborMap as a confidence boost for cross-batch edges (`calls`, `related`, `inherits`, `implements` to nodes outside your batch): - If your source clearly references a symbol that appears in some `neighbor.symbols`, emit the edge to `function::` or `class::` with confidence. - If your source references a cross-batch symbol that is NOT in neighborMap (the project-scanner may not have extracted it), you may still emit the edge if you saw it explicitly in the imported file's surface — but prefer matching neighborMap symbols when available. - Imports continue to use `batchImportData` (fully resolved), not neighborMap. The merge script's dangling-edge dropper is the safety net for genuinely unresolvable targets. 
### Step 2 — Execute the bundled extraction script ``` - [ ] **Step 2: Verify the section was inserted correctly** ```bash grep -n "Cross-batch context (neighborMap)" understand-anything-plugin/agents/file-analyzer.md grep -n "Step 1 — Prepare the input JSON" understand-anything-plugin/agents/file-analyzer.md grep -n "Step 2 — Execute the bundled extraction script" understand-anything-plugin/agents/file-analyzer.md ``` Expected: all three lines exist, and the Cross-batch context line number is between Step 1's and Step 2's line numbers. - [ ] **Step 3: Commit** ```bash git add understand-anything-plugin/agents/file-analyzer.md git commit -m "docs(file-analyzer): add Cross-batch context (neighborMap) section" ``` --- ## Task 11: file-analyzer.md — replace Writing Results with multi-part protocol **Files:** - Modify: `understand-anything-plugin/agents/file-analyzer.md` - [ ] **Step 1: Replace the Writing Results section** In `understand-anything-plugin/agents/file-analyzer.md`, find the existing block (at approximately lines 467-475): Old text: ``` ## Writing Results After producing the JSON: 1. Write the JSON to: `/.understand-anything/intermediate/batch-.json` 2. The project root and batch index will be provided in your prompt. 3. Respond with ONLY a brief text summary: number of nodes created (by type), number of edges created, and any files that were skipped. Do NOT include the full JSON in your text response. ``` New text: ``` ## Writing Results — single or multi-part **Step A — Compute totals.** ``` nodeCount = nodes.length edgeCount = edges.length ``` **Step B — Decide split.** - If `nodeCount ≤ 60` AND `edgeCount ≤ 120`: write ONE file to `.understand-anything/intermediate/batch-.json`. Done. Skip to Step F. - Otherwise: `parts = ceil(max(nodeCount / 60, edgeCount / 120))`. **Step C — Partition.** Sort files in your batch alphabetically by path. Chunk them sequentially into `parts` groups of size `ceil(N / parts)`, where `N` is the number of files in your batch.
For each part: - All nodes whose `filePath` is in this part's files (for non-file nodes like `module`/`concept`, use the file they belong to). - All edges whose `source` is in this part's nodes (target may be anywhere — same part, different part of same batch, different batch). **Step D — Write each part.** Write part `k` (1-indexed) to `.understand-anything/intermediate/batch--part-.json`. Each part is a valid GraphFragment: `{ "nodes": [...], "edges": [...] }`. **Step E — Self-validate.** For each file written, verify: - Valid JSON. - `nodes` array exists and is well-formed. - For every edge: `source` and `target` both appear as either (a) a node `id` in this part's nodes, OR (b) a `file:` reference where `` is in `neighborMap` or `batchImportData`, OR (c) a `function::` / `class::` reference where `` is in some `neighbor.symbols`, OR (d) a node `id` written to another part of this same batch (Step C allows edge targets there). If validation fails on a part, do NOT silently rebuild. Respond with an explicit error stating which part failed, which edge(s) failed validation, and why. The dispatching session can then retry. **Step F — Respond.** Respond with ONLY a brief text summary: parts written (1 or more), total nodes/edges across all parts, any files skipped. Do NOT include JSON content in the response. ``` - [ ] **Step 2: Verify** ```bash grep -n "Writing Results — single or multi-part" understand-anything-plugin/agents/file-analyzer.md grep -n "Step A — Compute totals" understand-anything-plugin/agents/file-analyzer.md grep -n "Step F — Respond" understand-anything-plugin/agents/file-analyzer.md # Confirm old prose is gone: ! grep -n "After producing the JSON:" understand-anything-plugin/agents/file-analyzer.md ``` Expected: the first three `grep` commands each print a match, and the final `! grep` exits 0 because `grep` itself finds no match (exit 1) for the removed prose.
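Steps B and C are deterministic, so they can be sketched as a pure function for checking the partition rules. A minimal sketch under stated assumptions: every node carries a `filePath` (the protocol's "file they belong to" rule for `module`/`concept` nodes is approximated by that field), and edges reference node `id`s. `partitionOutput` and the `function:<path>:fnN` ids are illustrative; the file-analyzer itself follows the prose protocol rather than running this code.

```javascript
const MAX_NODES_PER_PART = 60;
const MAX_EDGES_PER_PART = 120;

// Hypothetical sketch of Steps B and C of the multi-part output protocol.
// batchFilePaths: string[]; nodes: Array<{ id, filePath }>; edges: Array<{ source, target, type }>
function partitionOutput(batchFilePaths, nodes, edges) {
  // Step B: one file when both thresholds hold.
  if (nodes.length <= MAX_NODES_PER_PART && edges.length <= MAX_EDGES_PER_PART) {
    return [{ nodes, edges }];
  }
  const parts = Math.ceil(
    Math.max(nodes.length / MAX_NODES_PER_PART, edges.length / MAX_EDGES_PER_PART),
  );

  // Step C: files sorted by path, chunked sequentially into groups of ceil(N / parts).
  const files = [...batchFilePaths].sort();
  const chunk = Math.ceil(files.length / parts);
  const partOfFile = new Map(files.map((p, i) => [p, Math.floor(i / chunk)]));

  const out = Array.from({ length: parts }, () => ({ nodes: [], edges: [] }));
  const partOfNode = new Map();
  for (const n of nodes) {
    const k = partOfFile.get(n.filePath);
    if (k === undefined) throw new Error(`node ${n.id}: filePath not in this batch`);
    out[k].nodes.push(n);
    partOfNode.set(n.id, k);
  }
  // Each edge goes with its source node; the target may be in any part or batch.
  for (const e of edges) {
    const k = partOfNode.get(e.source);
    if (k === undefined) throw new Error(`edge source ${e.source} is not a node in this batch`);
    out[k].edges.push(e);
  }
  return out;
}

// Illustrative data: 3 files x 30 nodes = 90 nodes and 90 edges, so 2 parts.
const demoFiles = ['src/c.ts', 'src/a.ts', 'src/b.ts'];
const demoNodes = [];
for (const f of demoFiles) {
  for (let i = 0; i < 30; i++) demoNodes.push({ id: `function:${f}:fn${i}`, filePath: f });
}
const demoEdges = demoNodes.map((n, i) => ({
  source: n.id, target: demoNodes[(i + 1) % demoNodes.length].id, type: 'calls',
}));
const demoParts = partitionOutput(demoFiles, demoNodes, demoEdges);
console.log(demoParts.map(p => [p.nodes.length, p.edges.length])); // [ [ 60, 60 ], [ 30, 30 ] ]
```

Two consequences of the file-granular split are worth keeping in mind. A single file whose own nodes exceed 60 still lands in one part. Ceil-sized chunks can also leave trailing parts empty: for example, 5 files split into 4 parts gives chunks of 2, 2, 1 and 0 files.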
- [ ] **Step 3: Commit** ```bash git add understand-anything-plugin/agents/file-analyzer.md git commit -m "docs(file-analyzer): replace Writing Results with multi-part output protocol" ``` --- ## Task 12: SKILL.md — Phase 1.5 + Phase 2 rewrite + Incremental path rewrite **Files:** - Modify: `understand-anything-plugin/skills/understand/SKILL.md` - [ ] **Step 1: Insert Phase 1.5 after Phase 1** In `understand-anything-plugin/skills/understand/SKILL.md`, find the line: ``` ## Phase 2 — ANALYZE ``` (At approximately line 278.) Immediately before that line, insert the Phase 1.5 block. The boundary is the `---` separator above `## Phase 2 — ANALYZE`. Use the Edit tool to replace: Old text (the separator + Phase 2 header): ``` --- ## Phase 2 — ANALYZE ``` New text: ``` --- ## Phase 1.5 — BATCH Report: `[Phase 1.5/7] Computing semantic batches...` Run the bundled batching script: ```bash node /compute-batches.mjs $PROJECT_ROOT ``` Reads `.understand-anything/intermediate/scan-result.json`, writes `.understand-anything/intermediate/batches.json`. Capture stderr. Append any line starting with `Warning:` to `$PHASE_WARNINGS` for the final report. If the script exits non-zero, the failure is hard — relay the full stderr to the user as a Phase 1.5 failure. Do not attempt to recover; the script's internal fallback (count-based) already handles recoverable issues. A non-zero exit means a fundamental problem (missing input file, malformed JSON, etc.). --- ## Phase 2 — ANALYZE ``` - [ ] **Step 2: Replace Phase 2 ANALYZE Full analysis path** In SKILL.md, find the block starting `### Full analysis path` (at approximately line 280) and ending just before `### Incremental update path`. 
Old text (the entire Full analysis path section — multi-paragraph; use Edit to replace from `### Full analysis path` through the line `Include the script's warnings in \`$PHASE_WARNINGS\` for the reviewer.`): ``` ### Full analysis path Batch the file list from Phase 1 into groups of **20-30 files each** (aim for ~25 files per batch for balanced sizes). **Batching strategy for non-code files:** - Group related non-code files together in the same batch when possible: - Dockerfile + docker-compose.yml + .dockerignore → same batch - SQL migration files → same batch (ordered by filename) - CI/CD config files (.github/workflows/*) → same batch - Documentation files (docs/*.md) → same batch - This allows the file-analyzer to create cross-file edges (e.g., docker-compose `depends_on` Dockerfile) - Non-code files can be mixed with code files in the same batch if batch sizes are small - Each file's `fileCategory` from Phase 1 must be included in the batch file list After batching, report the plan to the user: > `[Phase 2/7] Analyzing files — files in batches (up to 5 concurrent)...` For each batch, dispatch a subagent using the `file-analyzer` agent definition (at `agents/file-analyzer.md`). Run up to **5 subagents concurrently** using parallel dispatch. Append the following additional context: > **Additional context from main session:** > > Project: `` — `` > Languages: `` > > $LANGUAGE_DIRECTIVE Before dispatching each batch, construct `batchImportData` from `$IMPORT_MAP`: ```json batchImportData = {} for each file in this batch: batchImportData[file.path] = $IMPORT_MAP[file.path] ?? [] ``` Fill in batch-specific parameters below and dispatch: > Analyze these files and produce GraphNode and GraphEdge objects. 
> Project root: `$PROJECT_ROOT` > Project: `` > Languages: `` > Batch: `/` > Skill directory (for bundled scripts): `` > Write output to: `$PROJECT_ROOT/.understand-anything/intermediate/batch-.json` > > Pre-resolved import data for this batch (use this for all import edge creation — do NOT re-resolve imports from source): > ```json > > ``` > > Files to analyze in this batch (every entry MUST be passed through to `batchFiles` with all four fields — `path`, `language`, `sizeLines`, `fileCategory`): > 1. `` ( lines, language: ``, fileCategory: ``) > 2. `` ( lines, language: ``, fileCategory: ``) > ... After ALL batches complete, report to the user: `Phase 2 complete. All batches analyzed.` Run the merge-and-normalize script bundled with this skill (located next to this SKILL.md file — use the skill directory path, not the project root): ```bash python /merge-batch-graphs.py $PROJECT_ROOT ``` This script reads all `batch-*.json` files from `$PROJECT_ROOT/.understand-anything/intermediate/`, then in one pass: - Combines all nodes and edges across batches - Normalizes node IDs (strips double prefixes, project-name prefixes, adds missing prefixes) - Normalizes complexity values (`low`→`simple`, `medium`→`moderate`, `high`→`complex`, etc.) - Rewrites edge references to match corrected node IDs - Deduplicates nodes by ID (keeps last occurrence) and edges by `(source, target, type)` - Drops dangling edges referencing missing nodes - Logs all corrections and dropped items to stderr The merge script also runs a `tested_by` linker that canonicalizes test-coverage edges in two passes. **Pass 1** walks LLM-emitted `tested_by` edges and flips inverted ones in place (the LLM systematically emits `test → production` because it sees the import only when analyzing the test file); semantically broken edges (test↔test, prod↔prod, orphan endpoints) are dropped. 
**Pass 2** supplements with path-convention pairings (`X.ts` ↔ `X.test.ts`, JS/TS `__tests__/` and `/test/` walk-out, Python in-package `tests/`, Go `_test.go` sibling, Maven/Gradle `src/test/...` ↔ `src/main/...`, .NET `/tests/` ↔ `/src/...` and `.Tests/` ↔ `/`). Production nodes that end up sourcing any `tested_by` edge get a `"tested"` tag. All resulting edges run `production → test`.

Output: `$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json`

Include the script's warnings in `$PHASE_WARNINGS` for the reviewer.
```

New text:

```
### Full analysis path

Load `.understand-anything/intermediate/batches.json` (produced by Phase 1.5). Iterate the `batches[]` array. Report: `[Phase 2/7] Analyzing files — files in batches (up to 5 concurrent)...`

For each batch, dispatch a subagent using the `file-analyzer` agent definition (at `agents/file-analyzer.md`). Run up to **5 subagents concurrently**. Append the following additional context:

> **Additional context from main session:**
>
> Project: `` — ``
> Languages: ``
>
> $LANGUAGE_DIRECTIVE

Dispatch prompt template (fill in batch-specific values from `batches.json[i]`):

> Analyze these files and produce GraphNode and GraphEdge objects.
> Project root: `$PROJECT_ROOT`
> Project: ``
> Languages: ``
> Batch: `/`
> Skill directory (for bundled scripts): ``
> Output: write to `$PROJECT_ROOT/.understand-anything/intermediate/batch-.json` (single-file mode) OR `batch--part-.json` (split mode, per Step B of your output protocol).
>
> Pre-resolved import data for this batch (use directly — do NOT re-resolve imports from source):
> ```json
>
> ```
>
> Cross-batch neighbors with their exported symbols (confidence boost for cross-batch edges):
> ```json
>
> ```
>
> Files to analyze in this batch (every entry MUST be passed through to `batchFiles` with all four fields — `path`, `language`, `sizeLines`, `fileCategory`):
> 1. `` ( lines, language: ``, fileCategory: ``)
> 2. `` ( lines, language: ``, fileCategory: ``)
> ...

After ALL batches complete, report to the user: `Phase 2 complete. All batches analyzed.`

Run the merge-and-normalize script bundled with this skill:

```bash
python /merge-batch-graphs.py $PROJECT_ROOT
```

This script reads all `batch-*.json` files (including `batch--part-.json` produced by file-analyzers that split their output) from `$PROJECT_ROOT/.understand-anything/intermediate/`, then in one pass:
- Combines all nodes and edges across batches
- Normalizes node IDs (strips double prefixes, project-name prefixes, adds missing prefixes)
- Normalizes complexity values (`low`→`simple`, `medium`→`moderate`, `high`→`complex`, etc.)
- Rewrites edge references to match corrected node IDs
- Deduplicates nodes by ID (keeps last occurrence) and edges by `(source, target, type)`
- Drops dangling edges referencing missing nodes
- Logs all corrections and dropped items to stderr

The merge script also runs a `tested_by` linker that canonicalizes test-coverage edges in two passes. **Pass 1** walks LLM-emitted `tested_by` edges and flips inverted ones in place; semantically broken edges (test↔test, prod↔prod, orphan endpoints) are dropped. **Pass 2** supplements with path-convention pairings. Production nodes that end up sourcing any `tested_by` edge get a `"tested"` tag. All resulting edges run `production → test`.

Output: `$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json`

Include the script's warnings in `$PHASE_WARNINGS` for the reviewer.
```

- [ ] **Step 3: Replace Incremental update path**

Find:

```
### Incremental update path

Use the changed files list from Phase 0. Batch and dispatch file-analyzer subagents using the same process as above (20-30 files per batch, up to 5 concurrent, with batchImportData constructed from $IMPORT_MAP), but only for changed files.

After batches complete:
1. Remove old nodes whose `filePath` matches any changed file from the existing graph
2. Remove old edges whose `source` or `target` references a removed node
3. Write the pruned existing nodes/edges as `batch-existing.json` in the intermediate directory
4. Run the same merge script — it will combine `batch-existing.json` with the fresh `batch-*.json` files:
   ```bash
   python /merge-batch-graphs.py $PROJECT_ROOT
   ```
```

Replace with:

```
### Incremental update path

Write the changed-files list (one path per line) to a temp file, creating the temp directory first in case it does not exist yet:

```bash
mkdir -p $PROJECT_ROOT/.understand-anything/tmp
git diff ..HEAD --name-only > $PROJECT_ROOT/.understand-anything/tmp/changed-files.txt
```

Run compute-batches with `--changed-files`:

```bash
node /compute-batches.mjs $PROJECT_ROOT \
  --changed-files=$PROJECT_ROOT/.understand-anything/tmp/changed-files.txt
```

This produces a `batches.json` that contains only the batches with changed files. Its neighborMap entries can still reference unchanged files (with their full-graph batchIndex), so cross-batch edges remain emittable.

Then dispatch file-analyzer subagents per the same template as the full path.

After batches complete:
1. Remove old nodes whose `filePath` matches any changed file from the existing graph
2. Remove old edges whose `source` or `target` references a removed node
3. Write the pruned existing nodes/edges as `batch-existing.json` in the intermediate directory
4. Run the same merge script — it will combine `batch-existing.json` with the fresh `batch-*.json` files:
   ```bash
   python /merge-batch-graphs.py $PROJECT_ROOT
   ```
```

- [ ] **Step 4: Verify**

```bash
grep -n "Phase 1.5 — BATCH" understand-anything-plugin/skills/understand/SKILL.md
grep -n "Load \`.understand-anything/intermediate/batches.json\`" understand-anything-plugin/skills/understand/SKILL.md
grep -n "compute-batches.mjs" understand-anything-plugin/skills/understand/SKILL.md
# Confirm old prose is gone (each command should print "OK: ... absent"):
if grep -q "groups of \*\*20-30 files each\*\*" understand-anything-plugin/skills/understand/SKILL.md; then echo "FAIL: old batching prose still present"; else echo "OK: old batching prose absent"; fi
if grep -qF "Dockerfile + docker-compose.yml + .dockerignore → same batch" understand-anything-plugin/skills/understand/SKILL.md; then echo "FAIL: old non-code prose still present"; else echo "OK: old non-code prose absent"; fi
```

Expected: each of the first three greps prints at least one match (`compute-batches.mjs` should appear at least 3 times — Phase 1.5 + Incremental); both check commands print "OK: ... absent".

- [ ] **Step 5: Commit**

```bash
git add understand-anything-plugin/skills/understand/SKILL.md
git commit -m "feat(understand): introduce Phase 1.5 (compute-batches) and rewrite Phase 2 prose"
```

---

## Task 13: merge-batch-graphs.py — multi-part stderr report + missing-part warning

**Files:**
- Modify: `understand-anything-plugin/skills/understand/merge-batch-graphs.py`

- [ ] **Step 1: Replace the "Found N batch files:" report**

In `merge-batch-graphs.py`, find the block at approximately line 1026:

Old text:

```python
print(f"Found {len(batch_files)} batch files:", file=sys.stderr)
```

New text:

```python
# Group by logical batch index so the report distinguishes single-batch
# files from multi-part file-analyzer outputs.
# (`re` and `sys` are expected to be imported already at the top of the script.)
from collections import defaultdict as _dd

by_batch = _dd(list)
for f in batch_files:
    m = re.match(r"batch-(\d+)(?:-part-(\d+))?\.json", f.name)
    if m:
        by_batch[int(m.group(1))].append((f.name, int(m.group(2)) if m.group(2) else None))

logical_count = len(by_batch)
multi_part = sum(1 for entries in by_batch.values() if len(entries) > 1)
print(
    f"Found {len(batch_files)} batch files "
    f"({logical_count} logical batches, {multi_part} multi-part):",
    file=sys.stderr,
)

# Missing-part detection: for any logical batch with at least one -part- file, the
# set of part numbers MUST be contiguous starting at 1.
# Gaps suggest a truncated write — emit a visible warning so the user can investigate.
for idx, entries in by_batch.items():
    part_nums = [p for (_n, p) in entries if p is not None]
    if not part_nums:
        continue
    present = set(part_nums)
    expected = set(range(1, max(part_nums) + 1))
    missing = sorted(expected - present)
    if missing:
        print(
            f"Warning: merge: batch {idx} has parts {sorted(present)} but "
            f"missing part {missing} — possible truncated write — "
            f"affected nodes/edges may be lost",
            file=sys.stderr,
        )
```

- [ ] **Step 2: Verify the file still parses**

```bash
python3 -c "import ast; ast.parse(open('understand-anything-plugin/skills/understand/merge-batch-graphs.py').read())" && echo "OK"
```

Expected: prints `OK`.

- [ ] **Step 3: Smoke-test the existing test suite still passes**

```bash
cd understand-anything-plugin/skills/understand && python3 -m unittest test_merge_batch_graphs.py -v 2>&1 | tail -20
```

Expected: all existing tests pass (we haven't broken anything).

- [ ] **Step 4: Commit**

```bash
git add understand-anything-plugin/skills/understand/merge-batch-graphs.py
git commit -m "feat(merge-batch-graphs): multi-part aware stderr report + missing-part warning"
```

---

## Task 14: merge-batch-graphs.py — multi-part unit tests

**Files:**
- Modify: `understand-anything-plugin/skills/understand/test_merge_batch_graphs.py`

- [ ] **Step 1: Append TestMultiPart class**

Append to `understand-anything-plugin/skills/understand/test_merge_batch_graphs.py`:

```python
# ── Multi-part batch handling ─────────────────────────────────────────────


class TestMultiPart(unittest.TestCase):
    """End-to-end tests for batch--part-.json input handling.

    These tests invoke merge-batch-graphs.py as a subprocess in a temp
    directory so we exercise the full path: glob → load → merge → write.
    """

    def setUp(self) -> None:
        import tempfile

        self.tmp = Path(tempfile.mkdtemp(prefix="ua-mbg-"))
        self.intermediate = self.tmp / ".understand-anything" / "intermediate"
        self.intermediate.mkdir(parents=True, exist_ok=True)

    def tearDown(self) -> None:
        import shutil

        shutil.rmtree(self.tmp, ignore_errors=True)

    def _write_batch(self, name: str, nodes: list, edges: list) -> None:
        import json as _j

        (self.intermediate / name).write_text(
            _j.dumps({"nodes": nodes, "edges": edges}),
            encoding="utf-8",
        )

    def _run_merge(self) -> tuple[int, str, dict]:
        import json as _j
        import subprocess
        import sys

        # Run the script with the same interpreter that runs the tests,
        # not whichever `python3` happens to be first on PATH.
        result = subprocess.run(
            [sys.executable, str(_MODULE_PATH), str(self.tmp)],
            capture_output=True,
            text=True,
        )
        out_path = self.intermediate / "assembled-graph.json"
        assembled = _j.loads(out_path.read_text()) if out_path.exists() else {}
        return result.returncode, result.stderr, assembled

    def test_two_parts_of_one_logical_batch_merge(self) -> None:
        self._write_batch("batch-1-part-1.json",
                          [_file_node("src/a.ts")],
                          [{"source": "file:src/a.ts", "target": "file:src/b.ts",
                            "type": "imports", "direction": "forward", "weight": 0.7}])
        self._write_batch("batch-1-part-2.json",
                          [_file_node("src/b.ts")], [])
        rc, _stderr, assembled = self._run_merge()
        self.assertEqual(rc, 0)
        node_ids = {n["id"] for n in assembled["nodes"]}
        self.assertEqual(node_ids, {"file:src/a.ts", "file:src/b.ts"})
        # Cross-part edge survived
        edge_keys = {(e["source"], e["target"], e["type"]) for e in assembled["edges"]}
        self.assertIn(
            ("file:src/a.ts", "file:src/b.ts", "imports"), edge_keys)

    def test_three_parts_of_one_logical_batch_merge(self) -> None:
        for k, path in enumerate(["src/a.ts", "src/b.ts", "src/c.ts"], start=1):
            self._write_batch(f"batch-1-part-{k}.json", [_file_node(path)], [])
        rc, _stderr, assembled = self._run_merge()
        self.assertEqual(rc, 0)
        node_ids = {n["id"] for n in assembled["nodes"]}
        self.assertEqual(node_ids, {"file:src/a.ts", "file:src/b.ts", "file:src/c.ts"})

    def test_malformed_part_is_skipped_with_warning(self) -> None:
        (self.intermediate / "batch-1-part-1.json").write_text(
            "{ this is not valid json", encoding="utf-8")
        self._write_batch("batch-1-part-2.json",
                          [_file_node("src/b.ts")], [])
        rc, stderr, assembled = self._run_merge()
        self.assertEqual(rc, 0)
        # The skip warning is from existing load_batch logic
        self.assertIn("skipping batch-1-part-1.json", stderr)
        # part-2 content still made it in
        node_ids = {n["id"] for n in assembled["nodes"]}
        self.assertEqual(node_ids, {"file:src/b.ts"})

    def test_mixed_single_and_multi_part(self) -> None:
        self._write_batch("batch-1.json", [_file_node("src/single.ts")], [])
        self._write_batch("batch-2-part-1.json", [_file_node("src/multi-a.ts")], [])
        self._write_batch("batch-2-part-2.json", [_file_node("src/multi-b.ts")], [])
        self._write_batch("batch-3.json", [_file_node("src/another-single.ts")], [])
        rc, _stderr, assembled = self._run_merge()
        self.assertEqual(rc, 0)
        node_ids = {n["id"] for n in assembled["nodes"]}
        self.assertEqual(node_ids, {
            "file:src/single.ts",
            "file:src/multi-a.ts",
            "file:src/multi-b.ts",
            "file:src/another-single.ts",
        })

    def test_missing_part_emits_warning(self) -> None:
        # parts {2, 3} present, part-1 missing
        self._write_batch("batch-1-part-2.json", [_file_node("src/b.ts")], [])
        self._write_batch("batch-1-part-3.json", [_file_node("src/c.ts")], [])
        rc, stderr, assembled = self._run_merge()
        self.assertEqual(rc, 0)
        self.assertRegex(stderr,
                         r"Warning: merge: batch 1 has parts \[2, 3\] but "
                         r"missing part \[1\] — possible truncated write")

    def test_stderr_report_format(self) -> None:
        self._write_batch("batch-1.json", [_file_node("src/a.ts")], [])
        self._write_batch("batch-2-part-1.json", [_file_node("src/b.ts")], [])
        self._write_batch("batch-2-part-2.json", [_file_node("src/c.ts")], [])
        rc, stderr, _assembled = self._run_merge()
        self.assertEqual(rc, 0)
        # 3 files on disk, 2 logical batches, 1 multi-part
        self.assertIn(
            "Found 3 batch files (2 logical batches, 1 multi-part)", stderr)
```

- [ ] **Step 2: Run tests, expect PASS**

```bash
cd understand-anything-plugin/skills/understand && python3 -m unittest test_merge_batch_graphs.TestMultiPart -v
```

Expected: all 6 tests PASS.

- [ ] **Step 3: Run full test suite**

```bash
cd understand-anything-plugin/skills/understand && python3 -m unittest test_merge_batch_graphs -v 2>&1 | tail -5
```

Expected: all tests PASS (pre-existing + new).

- [ ] **Step 4: Commit**

```bash
git add understand-anything-plugin/skills/understand/test_merge_batch_graphs.py
git commit -m "test(merge-batch-graphs): TestMultiPart for batch-i-part-k handling"
```

---

## Task 15: Integration acceptance gate (manual)

This task is a **gated manual checklist** — execute interactively, mark each item, do not auto-merge without all green.

**Files:** none (this is a verification step)

- [ ] **Step 1: Install + build clean**

```bash
pnpm install
pnpm --filter @understand-anything/core build
pnpm --filter @understand-anything/skill build
```

Expected: all succeed.

- [ ] **Step 2: Sync local plugin into Claude Code's plugin cache for testing**

Per the "Testing Local Plugin Changes" section of the project's CLAUDE.md.
From repo root:

```bash
INSTALLED_VERSION=$(ls ~/.claude/plugins/cache/understand-anything/understand-anything/ | head -1)
echo "Installed version: $INSTALLED_VERSION"
# ${INSTALLED_VERSION:?...} aborts the command if nothing is installed,
# so rm -rf can never fall through to the parent cache directory.
rm -rf ~/.claude/plugins/cache/understand-anything/understand-anything/${INSTALLED_VERSION:?no installed version found}
cp -R ./understand-anything-plugin ~/.claude/plugins/cache/understand-anything/understand-anything/${INSTALLED_VERSION:?no installed version found}
```

- [ ] **Step 3: Start a fresh Claude Code session and run /understand --full on this repo**

In a fresh session in this repo's directory:

```
/understand --full
```

Expected during run:
- `[Phase 1.5/7] Computing semantic batches...` appears
- Phase 2 reports batch count from `batches.json` (not arbitrary count-based)
- At least one batch with > 60 nodes / > 120 edges triggers multi-part output (look in `.understand-anything/intermediate/` for any `batch--part-.json` files)

Expected after run:
- `knowledge-graph.json` exists with reasonable node/edge counts compared to current main
- Dashboard renders normally
- Phase 7 final report's warnings section includes any compute-batches warnings IF they fired

- [ ] **Step 4: Sanity-check batches.json contents**

```bash
jq '.algorithm, .totalFiles, .totalBatches, (.batches | length), [.batches[].files | length]' \
  .understand-anything/intermediate/batches.json 2>/dev/null \
  || echo "batches.json was cleaned up by Phase 7 — re-run with /understand --full and inspect before Phase 7 cleanup, or check git diff for the script's behavior."
```

Note: Phase 7 cleans up `.understand-anything/intermediate/`, so this is best inspected mid-run, not after.

- [ ] **Step 5: Run on a small repo (under 10 files) to verify fallback batch path**

```bash
mkdir -p /tmp/ua-smoke-small/src
cd /tmp/ua-smoke-small
git init && git commit --allow-empty -m init
echo 'export const a = 1;' > src/a.ts
echo 'export const b = 2;' > src/b.ts
echo 'export const c = 3;' > src/c.ts
echo '{"name":"smoke","version":"0.0.1"}' > package.json
git add . && git commit -m setup
```

Then `cd /tmp/ua-smoke-small` in a Claude Code session and run `/understand --full`. Expected: completes without errors, single small batch.

- [ ] **Step 6: Run on a ~100-file repo to validate the bug fix**

If you have a ~100-file repo handy (or use the largest test fixture from the project), run `/understand --full` and confirm no "output limit" errors appear, even on Bedrock OPUS. If you do not have a suitable repo, document this in the PR description as a deferred manual verification step.

- [ ] **Step 7: Stage results**

This task does not commit anything — it's a verification gate. If any step above reveals bugs, go back to the relevant task and fix them; otherwise proceed to Task 16.

---

## Task 16: Version bump in 5 files

Per project CLAUDE.md: when pushing to remote, bump version in **all five** files listed.

**Files:**
- Modify: `understand-anything-plugin/package.json`
- Modify: `understand-anything-plugin/.claude-plugin/plugin.json`
- Modify: `.claude-plugin/plugin.json`
- Modify: `.cursor-plugin/plugin.json`
- Modify: `.copilot-plugin/plugin.json`

- [ ] **Step 1: Determine new version**

Current version is `2.7.4` (per `understand-anything-plugin/package.json` line 3). This PR adds a substantial feature (Phase 1.5 + multi-part output) — bump **minor**: `2.8.0`.

- [ ] **Step 2: Confirm all five files have the same current version**

```bash
grep -H '"version"' \
  understand-anything-plugin/package.json \
  understand-anything-plugin/.claude-plugin/plugin.json \
  .claude-plugin/plugin.json \
  .cursor-plugin/plugin.json \
  .copilot-plugin/plugin.json
```

Expected: all five print `"version": "2.7.4"` (or whatever the current version is — use that as the baseline). If they diverge, stop and reconcile with the user.

- [ ] **Step 3: Bump each file from `2.7.4` to `2.8.0`**

Use the Edit tool on each of the five files. For each, replace `"version": "2.7.4"` with `"version": "2.8.0"`.
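If you would rather script Steps 2 and 3 than make five manual edits, here is a minimal sketch. `bump_versions` and `VERSION_FILES` are hypothetical names and are not part of the repo. The helper refuses to write anything unless all five manifests are at the old version. It then does a textual replace of the single `"version": ...` entry, so key order and indentation in each file stay untouched. It assumes every manifest spells the key exactly as `"version": "X.Y.Z"` (one space after the colon), which is the form Steps 2 and 4 grep for.

```python
import json
from pathlib import Path

# Hypothetical list mirroring the five manifests in Task 16 (paths relative to the repo root).
VERSION_FILES = [
    "understand-anything-plugin/package.json",
    "understand-anything-plugin/.claude-plugin/plugin.json",
    ".claude-plugin/plugin.json",
    ".cursor-plugin/plugin.json",
    ".copilot-plugin/plugin.json",
]


def bump_versions(root: Path, old: str, new: str, files: list[str] = VERSION_FILES) -> list[Path]:
    """Rewrite `"version": old` to `"version": new` in every manifest, or change nothing."""
    paths = [root / f for f in files]

    # Step 2 guard: every file must currently be at `old`; otherwise stop and reconcile by hand.
    current = {str(p): json.loads(p.read_text(encoding="utf-8")).get("version") for p in paths}
    diverged = {p: v for p, v in current.items() if v != old}
    if diverged:
        raise ValueError(f"expected version {old} everywhere, found {diverged}")

    needle, replacement = f'"version": "{old}"', f'"version": "{new}"'
    texts = {p: p.read_text(encoding="utf-8") for p in paths}
    # Validate every file before writing any, so a formatting surprise leaves all five untouched.
    for p, text in texts.items():
        if text.count(needle) != 1:
            raise ValueError(f"expected exactly one {needle} in {p}")
    for p, text in texts.items():
        # Textual replace (like the Edit tool) preserves the file's existing formatting.
        p.write_text(text.replace(needle, replacement), encoding="utf-8")
    return paths
```

From the repo root this would be called as `bump_versions(Path("."), "2.7.4", "2.8.0")`. Step 4's grep still applies afterwards as the independent check.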
- [ ] **Step 4: Verify all five updated**

```bash
grep -H '"version"' \
  understand-anything-plugin/package.json \
  understand-anything-plugin/.claude-plugin/plugin.json \
  .claude-plugin/plugin.json \
  .cursor-plugin/plugin.json \
  .copilot-plugin/plugin.json
```

Expected: all five print `"version": "2.8.0"`.

- [ ] **Step 5: Commit**

```bash
git add understand-anything-plugin/package.json \
  understand-anything-plugin/.claude-plugin/plugin.json \
  .claude-plugin/plugin.json \
  .cursor-plugin/plugin.json \
  .copilot-plugin/plugin.json
git commit -m "chore: bump version to 2.8.0"
```

- [ ] **Step 6: Push branch and open PR**

```bash
git push -u origin feat/semantic-batching-and-output-chunking
gh pr create --title "feat(understand): semantic batching (Phase 1.5) + output chunking — fixes #159" --body "$(cat <<'EOF'
## Summary

- Replace count-based file-analyzer batching with Louvain community detection on the import graph (new Phase 1.5, deterministic `compute-batches.mjs` script).
- file-analyzer self-splits its output into `batch--part-.json` when above 60 nodes / 120 edges per part (Bedrock OPUS output cap safety).
- Cross-batch neighbors (with their exported symbols) passed to file-analyzer via `neighborMap` so semantic edges like `calls` and `inherits` can be confidently emitted across batches.
- Every fallback path emits a visible `Warning:` line that bubbles to `$PHASE_WARNINGS` in the Phase 7 final report.
- merge-batch-graphs.py multi-part-aware stderr report + missing-part warning; glob/sort-key already accepted multi-part naming so no algorithmic change required there.

Fixes #159.
Design: `docs/superpowers/specs/2026-05-24-semantic-batching-and-output-chunking-design.md`
Plan: `docs/superpowers/plans/2026-05-24-semantic-batching-and-output-chunking-impl.md`

## Test plan

- [x] `pnpm install` (graphology + graphology-communities-louvain install cleanly)
- [x] `pnpm --filter @understand-anything/core build`
- [x] `pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs` — all green
- [x] `cd understand-anything-plugin/skills/understand && python3 -m unittest test_merge_batch_graphs -v` — all green
- [x] Run `/understand --full` on this repo — `batches.json` generated; multi-part triggered on at least one batch; assembled-graph node/edge counts within expected range vs current main; dashboard renders normally; Phase 7 warnings section includes any compute-batches warnings.
- [ ] (Deferred / external) Run on a ~100-file repo on Bedrock OPUS — confirm no "output limit" errors.

Document any deferred verification in PR comments.
EOF
)"
```

Expected: PR URL returned.

---

## Implementation done.

Final check before merge:

- [ ] All 16 tasks above complete with checkboxes ticked.
- [ ] Branch builds + tests green: `pnpm install && pnpm --filter @understand-anything/core build && pnpm --filter @understand-anything/skill exec vitest run skills/understand/ && cd understand-anything-plugin/skills/understand && python3 -m unittest test_merge_batch_graphs 2>&1 | tail -10` (the Vitest run covers `test_compute_batches.test.mjs`; the trailing `unittest` call covers only the Python merge tests, since `test_compute_batches` is not a Python module)
- [ ] No `try { ... } catch { /* silent */ }` or `except: pass` patterns added (grep your diff).
- [ ] Spec ↔ plan ↔ code alignment spot-checked: every Failure-mode warning string in the spec is asserted by at least one unit test.
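A raw grep for the "no silent handlers" check also matches removed and context lines. The rough heuristic below looks only at added lines instead. `find_silent_handlers` is a hypothetical helper, not part of the repo. It scans unified-diff text for single-line empty JS/TS `catch` blocks and for Python `except ...:` clauses whose only body is `pass`, including the common two-line form. Multi-line empty `catch` blocks and promise-style `.catch(() => {})` callbacks are not detected, so still eyeball the diff.

```python
import re

# Python: an `except` clause, optionally with `pass` on the same line and a trailing comment.
PY_EXCEPT = re.compile(r"^\s*except\b[^:]*:\s*(?P<body>pass)?\s*(#.*)?$")
# JS/TS: a single-line `catch {}` / `catch (e) { /* ... */ }` containing at most a comment.
JS_EMPTY_CATCH = re.compile(r"\bcatch\s*(\([^)]*\))?\s*\{\s*(/\*.*?\*/\s*)?\}")


def find_silent_handlers(diff_text: str) -> list[str]:
    """Return added diff lines (without the leading '+') that silently swallow exceptions."""
    hits: list[str] = []
    pending = None  # an added `except ...:` line whose body has not been seen yet
    for raw in diff_text.splitlines():
        added = raw.startswith("+") and not raw.startswith("+++")
        code = raw[1:] if added else ""
        if pending is not None:
            # Two-line form: `except ...:` immediately followed by an added `pass`.
            if added and code.strip() == "pass":
                hits.append(f"{pending.strip()} pass")
            pending = None
        if not added:
            continue
        if JS_EMPTY_CATCH.search(code):
            hits.append(code.strip())
            continue
        m = PY_EXCEPT.match(code)
        if m:
            if m.group("body"):
                hits.append(code.strip())
            else:
                pending = code
    return hits
```

Feed it the diff against the base branch, for example `subprocess.run(["git", "diff", "main...HEAD"], capture_output=True, text=True).stdout`. This assumes `main` is the base branch. An empty list means nothing was flagged.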