2354 lines
92 KiB
Markdown
2354 lines
92 KiB
Markdown
# Semantic Batching and Output Chunking Implementation Plan
|
|
|
|
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. **All dispatched subagents must use `model="opus"`** (project convention).
|
|
|
|
**Goal:** Replace count-based file-analyzer batching with Louvain semantic batching (Phase 1.5), and add defensive output chunking in file-analyzer (60 nodes / 120 edges per part), so `/understand` stops hitting Bedrock OPUS output caps and produces better cross-batch semantic edge coverage. One PR.
|
|
|
|
**Architecture:** Add `compute-batches.mjs` (Phase 1.5) which runs Louvain on the import graph from `scan-result.json` and writes `batches.json` containing pre-built `batchImportData` + `neighborMap` (paths + exported symbols). file-analyzer reads neighborMap to confidently emit cross-batch edges, and self-splits its output into `batch-<i>-part-<k>.json` when above thresholds. `merge-batch-graphs.py` glob already accepts multi-part naming (no code change, only stderr report + missing-part warning).
|
|
|
|
**Tech Stack:** Node.js ≥22 + pnpm ≥10, `graphology` + `graphology-communities-louvain` (new deps), `@understand-anything/core` TreeSitterPlugin (existing), Vitest for `.mjs` tests, Python `unittest` for `merge-batch-graphs.py` tests.
|
|
|
|
**Source spec:** [`docs/superpowers/specs/2026-05-24-semantic-batching-and-output-chunking-design.md`](../specs/2026-05-24-semantic-batching-and-output-chunking-design.md)
|
|
|
|
**Branch:** `feat/semantic-batching-and-output-chunking` (already created).
|
|
|
|
---
|
|
|
|
## File Structure
|
|
|
|
**Create:**
|
|
|
|
- `understand-anything-plugin/skills/understand/compute-batches.mjs` — Phase 1.5 script
|
|
- `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs` — Vitest unit tests
|
|
- `understand-anything-plugin/skills/understand/test/fixtures/scan-result-3-cliques.json` — synthetic test fixture (3 disjoint import cliques)
|
|
- `understand-anything-plugin/skills/understand/test/fixtures/scan-result-large-community.json` — synthetic test fixture (50-node complete graph)
|
|
- `understand-anything-plugin/skills/understand/test/fixtures/scan-result-non-code.json` — synthetic test fixture (Dockerfile/CI/SQL groups)
|
|
|
|
**Modify:**
|
|
|
|
- `understand-anything-plugin/package.json` — add `graphology` + `graphology-communities-louvain` to `dependencies`
|
|
- `understand-anything-plugin/skills/understand/SKILL.md` — insert Phase 1.5; replace Phase 2 ANALYZE batching prose; replace Incremental update path
|
|
- `understand-anything-plugin/agents/file-analyzer.md` — add Cross-batch context (neighborMap) section; replace Writing Results with multi-part protocol
|
|
- `understand-anything-plugin/skills/understand/merge-batch-graphs.py` — multi-part stderr summary + missing-part warning
|
|
- `understand-anything-plugin/skills/understand/test_merge_batch_graphs.py` — new `TestMultiPart` class
|
|
- `understand-anything-plugin/package.json`, `understand-anything-plugin/.claude-plugin/plugin.json`, `.claude-plugin/plugin.json`, `.cursor-plugin/plugin.json`, `.copilot-plugin/plugin.json` — version bump (Task 16)
|
|
|
|
---
|
|
|
|
## Task 1: Add graphology dependencies
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/package.json`
|
|
|
|
- [ ] **Step 1: Add deps to package.json**
|
|
|
|
Edit `understand-anything-plugin/package.json` `dependencies` block:
|
|
|
|
```json
|
|
{
|
|
"name": "@understand-anything/skill",
|
|
"version": "2.7.4",
|
|
"type": "module",
|
|
"main": "dist/index.js",
|
|
"types": "dist/index.d.ts",
|
|
"scripts": {
|
|
"build": "tsc",
|
|
"test": "vitest run"
|
|
},
|
|
"dependencies": {
|
|
"@understand-anything/core": "workspace:*",
|
|
"graphology": "^0.26.0",
|
|
"graphology-communities-louvain": "^2.0.2"
|
|
},
|
|
"devDependencies": {
|
|
"@types/node": "^22.0.0",
|
|
"typescript": "^5.7.0",
|
|
"vitest": "^3.1.0"
|
|
}
|
|
}
|
|
```
|
|
|
|
- [ ] **Step 2: Install**
|
|
|
|
Run from repo root:
|
|
```bash
|
|
pnpm install
|
|
```
|
|
Expected: lockfile updates with graphology + graphology-communities-louvain; no other version churn.
|
|
|
|
- [ ] **Step 3: Smoke test the imports work**
|
|
|
|
Run from `understand-anything-plugin/`:
|
|
```bash
|
|
node -e "import('graphology').then(m => { const G = m.default; const g = new G({type:'undirected'}); g.addNode('a'); g.addNode('b'); g.addEdge('a','b'); console.log('graphology ok, edges:', g.size); })"
|
|
node -e "Promise.all([import('graphology'), import('graphology-communities-louvain')]).then(([G,L]) => { const g = new G.default({type:'undirected'}); ['a','b','c'].forEach(n => g.addNode(n)); g.addEdge('a','b'); g.addEdge('b','c'); console.log('louvain ok:', JSON.stringify(L.default(g))); })"
|
|
```
|
|
Expected: prints `graphology ok, edges: 1` and `louvain ok: {...}` with community ids assigned.
|
|
|
|
- [ ] **Step 4: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/package.json pnpm-lock.yaml
|
|
git commit -m "deps: add graphology + graphology-communities-louvain"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 2: Prototype compute-batches.mjs (load + Louvain print)
|
|
|
|
This is the **feasibility prototype** — the spec gates the size-enforcement design on what real community sizes look like. Build the skeleton, then run it against a synthetic fixture (and optionally a real `scan-result.json` from this repo if one exists) before adding more code.
|
|
|
|
**Files:**
|
|
- Create: `understand-anything-plugin/skills/understand/compute-batches.mjs`
|
|
- Create: `understand-anything-plugin/skills/understand/test/fixtures/scan-result-3-cliques.json`
|
|
|
|
- [ ] **Step 1: Create test fixture (3 disjoint import cliques)**
|
|
|
|
Create `understand-anything-plugin/skills/understand/test/fixtures/scan-result-3-cliques.json`:
|
|
|
|
```json
|
|
{
|
|
"name": "fixture-3-cliques",
|
|
"description": "Three disjoint import cliques for Louvain testing",
|
|
"languages": ["typescript"],
|
|
"frameworks": [],
|
|
"files": [
|
|
{"path": "src/auth/login.ts", "language": "typescript", "sizeLines": 50, "fileCategory": "code"},
|
|
{"path": "src/auth/session.ts", "language": "typescript", "sizeLines": 40, "fileCategory": "code"},
|
|
{"path": "src/auth/tokens.ts", "language": "typescript", "sizeLines": 60, "fileCategory": "code"},
|
|
{"path": "src/api/handlers.ts", "language": "typescript", "sizeLines": 80, "fileCategory": "code"},
|
|
{"path": "src/api/middleware.ts", "language": "typescript", "sizeLines": 30, "fileCategory": "code"},
|
|
{"path": "src/api/routes.ts", "language": "typescript", "sizeLines": 45, "fileCategory": "code"},
|
|
{"path": "src/db/users.ts", "language": "typescript", "sizeLines": 70, "fileCategory": "code"},
|
|
{"path": "src/db/queries.ts", "language": "typescript", "sizeLines": 55, "fileCategory": "code"},
|
|
{"path": "src/db/migrations.ts", "language": "typescript", "sizeLines": 35, "fileCategory": "code"}
|
|
],
|
|
"totalFiles": 9,
|
|
"filteredByIgnore": 0,
|
|
"estimatedComplexity": "small",
|
|
"importMap": {
|
|
"src/auth/login.ts": ["src/auth/session.ts", "src/auth/tokens.ts"],
|
|
"src/auth/session.ts": ["src/auth/tokens.ts"],
|
|
"src/auth/tokens.ts": [],
|
|
"src/api/handlers.ts": ["src/api/middleware.ts", "src/api/routes.ts"],
|
|
"src/api/middleware.ts": ["src/api/routes.ts"],
|
|
"src/api/routes.ts": [],
|
|
"src/db/users.ts": ["src/db/queries.ts", "src/db/migrations.ts"],
|
|
"src/db/queries.ts": ["src/db/migrations.ts"],
|
|
"src/db/migrations.ts": []
|
|
}
|
|
}
|
|
```
|
|
|
|
- [ ] **Step 2: Write skeleton compute-batches.mjs (Louvain only, no neighborMap, no exports, no fallback)**
|
|
|
|
Create `understand-anything-plugin/skills/understand/compute-batches.mjs`:
|
|
|
|
```javascript
|
|
#!/usr/bin/env node
|
|
/**
|
|
* compute-batches.mjs — Phase 1.5 of /understand
|
|
*
|
|
* Reads scan-result.json, runs Louvain community detection on the import
|
|
* graph, and writes batches.json containing batches + neighborMap.
|
|
*
|
|
* Usage:
|
|
* node compute-batches.mjs <project-root> [--changed-files=<path>]
|
|
*
|
|
* Input: <project-root>/.understand-anything/intermediate/scan-result.json
|
|
* Output: <project-root>/.understand-anything/intermediate/batches.json
|
|
*/
|
|
|
|
import { readFileSync, writeFileSync, existsSync } from 'node:fs';
|
|
import { dirname, resolve, join } from 'node:path';
|
|
import { fileURLToPath } from 'node:url';
|
|
|
|
import Graph from 'graphology';
|
|
import louvain from 'graphology-communities-louvain';
|
|
|
|
// ── Skeleton main: load → Louvain → print sizes ───────────────────────────
|
|
async function main() {
|
|
const projectRoot = process.argv[2];
|
|
if (!projectRoot) {
|
|
process.stderr.write('Usage: node compute-batches.mjs <project-root> [--changed-files=<path>]\n');
|
|
process.exit(1);
|
|
}
|
|
|
|
const scanPath = join(projectRoot, '.understand-anything', 'intermediate', 'scan-result.json');
|
|
if (!existsSync(scanPath)) {
|
|
process.stderr.write(`Error: scan-result.json not found at ${scanPath}\n`);
|
|
process.exit(1);
|
|
}
|
|
|
|
const scan = JSON.parse(readFileSync(scanPath, 'utf-8'));
|
|
const codeFiles = (scan.files || []).filter(f => f.fileCategory === 'code');
|
|
const importMap = scan.importMap || {};
|
|
|
|
process.stderr.write(`Loaded ${scan.files.length} files (${codeFiles.length} code).\n`);
|
|
|
|
// Build undirected import graph
|
|
const g = new Graph({ type: 'undirected', allowSelfLoops: false });
|
|
for (const f of codeFiles) g.addNode(f.path);
|
|
for (const [src, targets] of Object.entries(importMap)) {
|
|
if (!g.hasNode(src)) continue;
|
|
for (const tgt of targets) {
|
|
if (!g.hasNode(tgt) || src === tgt || g.hasEdge(src, tgt)) continue;
|
|
g.addEdge(src, tgt);
|
|
}
|
|
}
|
|
|
|
// Run Louvain
|
|
const communities = louvain(g); // { nodeId: communityId }
|
|
|
|
// Print size distribution
|
|
const sizeByCommunity = new Map();
|
|
for (const [, cid] of Object.entries(communities)) {
|
|
sizeByCommunity.set(cid, (sizeByCommunity.get(cid) || 0) + 1);
|
|
}
|
|
const sizes = [...sizeByCommunity.values()].sort((a, b) => b - a);
|
|
process.stderr.write(
|
|
`Louvain produced ${sizes.length} communities. Size distribution: [${sizes.join(', ')}]\n`,
|
|
);
|
|
process.stderr.write(
|
|
`Max community size: ${sizes[0] ?? 0}, min: ${sizes.at(-1) ?? 0}, ` +
|
|
`>35: ${sizes.filter(s => s > 35).length}, <5: ${sizes.filter(s => s < 5).length}\n`,
|
|
);
|
|
}
|
|
|
|
// CLI entry guard (mirrors extract-structure.mjs pattern)
|
|
import { realpathSync } from 'node:fs';
|
|
function isCliEntry() {
|
|
if (!process.argv[1]) return false;
|
|
try {
|
|
return realpathSync(fileURLToPath(import.meta.url)) === realpathSync(process.argv[1]);
|
|
} catch {
|
|
return false;
|
|
}
|
|
}
|
|
|
|
if (isCliEntry()) {
|
|
try {
|
|
await main();
|
|
} catch (err) {
|
|
process.stderr.write(`compute-batches.mjs failed: ${err.message}\n${err.stack}\n`);
|
|
process.exit(1);
|
|
}
|
|
}
|
|
```
|
|
|
|
- [ ] **Step 3: Run skeleton against the fixture**
|
|
|
|
Create a temporary scratch directory with the fixture in the expected layout:
|
|
|
|
```bash
|
|
mkdir -p /tmp/ua-prototype/.understand-anything/intermediate
|
|
cp understand-anything-plugin/skills/understand/test/fixtures/scan-result-3-cliques.json \
|
|
/tmp/ua-prototype/.understand-anything/intermediate/scan-result.json
|
|
node understand-anything-plugin/skills/understand/compute-batches.mjs /tmp/ua-prototype
|
|
```
|
|
|
|
Expected stderr:
|
|
```
|
|
Loaded 9 files (9 code).
|
|
Louvain produced 3 communities. Size distribution: [3, 3, 3]
|
|
Max community size: 3, min: 3, >35: 0, <5: 3
|
|
```
|
|
|
|
(All 9 files split into 3 cliques of 3. All under min=5 — that's expected for the fixture; in the real plan we accept this and don't merge.)
|
|
|
|
- [ ] **Step 4: (Optional) Run against this repo's scan-result.json if it exists**
|
|
|
|
```bash
|
|
if [ -f .understand-anything/intermediate/scan-result.json ]; then
|
|
node understand-anything-plugin/skills/understand/compute-batches.mjs "$(pwd)"
|
|
else
|
|
echo "No real scan-result.json — skipping (fixture run is sufficient for prototype)."
|
|
fi
|
|
```
|
|
|
|
Record the output: if the real-repo run shows any community size > 35, implement edge-betweenness split in Task 4. Otherwise, Task 4 can be a minimal defensive WCC partition.
|
|
|
|
- [ ] **Step 5: Commit skeleton**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/skills/understand/compute-batches.mjs \
|
|
understand-anything-plugin/skills/understand/test/fixtures/scan-result-3-cliques.json
|
|
git commit -m "feat(compute-batches): skeleton — Louvain on import graph (prototype)"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 3: Write Vitest harness + first Louvain unit test
|
|
|
|
**Files:**
|
|
- Create: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs`
|
|
|
|
- [ ] **Step 1: Write failing test (Louvain produces 3 batches for 3 cliques)**
|
|
|
|
Create `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs`:
|
|
|
|
```javascript
|
|
import { describe, it, expect, beforeEach } from 'vitest';
|
|
import { mkdtempSync, mkdirSync, writeFileSync, readFileSync, rmSync } from 'node:fs';
|
|
import { tmpdir } from 'node:os';
|
|
import { join } from 'node:path';
|
|
import { spawnSync } from 'node:child_process';
|
|
import { fileURLToPath } from 'node:url';
|
|
import { dirname, resolve } from 'node:path';
|
|
|
|
const __dirname = dirname(fileURLToPath(import.meta.url));
|
|
const SCRIPT = resolve(__dirname, 'compute-batches.mjs');
|
|
const FIXTURES = resolve(__dirname, 'test/fixtures');
|
|
|
|
function runScript(projectRoot, extraArgs = []) {
|
|
return spawnSync('node', [SCRIPT, projectRoot, ...extraArgs], {
|
|
encoding: 'utf-8',
|
|
});
|
|
}
|
|
|
|
function setupProject(fixtureName) {
|
|
const root = mkdtempSync(join(tmpdir(), 'ua-cb-test-'));
|
|
mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true });
|
|
const fixturePath = join(FIXTURES, fixtureName);
|
|
const dest = join(root, '.understand-anything', 'intermediate', 'scan-result.json');
|
|
writeFileSync(dest, readFileSync(fixturePath, 'utf-8'));
|
|
return root;
|
|
}
|
|
|
|
function readBatches(projectRoot) {
|
|
const p = join(projectRoot, '.understand-anything', 'intermediate', 'batches.json');
|
|
return JSON.parse(readFileSync(p, 'utf-8'));
|
|
}
|
|
|
|
describe('compute-batches.mjs — Louvain basic', () => {
|
|
let projectRoot;
|
|
|
|
beforeEach(() => {
|
|
projectRoot = setupProject('scan-result-3-cliques.json');
|
|
});
|
|
|
|
it('produces 3 batches for 3 disjoint cliques', () => {
|
|
const result = runScript(projectRoot);
|
|
expect(result.status).toBe(0);
|
|
|
|
const batches = readBatches(projectRoot);
|
|
expect(batches.algorithm).toBe('louvain');
|
|
expect(batches.totalFiles).toBe(9);
|
|
expect(batches.batches.length).toBe(3);
|
|
|
|
// Each batch should contain exactly one clique (3 files)
|
|
for (const b of batches.batches) {
|
|
expect(b.files.length).toBe(3);
|
|
const dirs = new Set(b.files.map(f => f.path.split('/')[1]));
|
|
expect(dirs.size).toBe(1); // all files in the batch share src/<dir>/
|
|
}
|
|
});
|
|
});
|
|
```
|
|
|
|
- [ ] **Step 2: Run test, expect FAIL**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "Louvain basic"
|
|
```
|
|
|
|
Expected: FAIL — `compute-batches.mjs` skeleton from Task 2 only prints to stderr, doesn't write `batches.json`. Test fails on `readBatches` → ENOENT.
|
|
|
|
- [ ] **Step 3: Make skeleton write batches.json**
|
|
|
|
Replace the trailing `process.stderr.write(...)` lines in `compute-batches.mjs` `main()` with the full minimal-batches output. Replace lines starting from `// Print size distribution` to end of `main()`:
|
|
|
|
```javascript
|
|
// Group files by community id, sorted by largest first for stable assignment
|
|
const filesByCommunity = new Map();
|
|
for (const [path, cid] of Object.entries(communities)) {
|
|
if (!filesByCommunity.has(cid)) filesByCommunity.set(cid, []);
|
|
filesByCommunity.get(cid).push(path);
|
|
}
|
|
|
|
// Sort communities by size desc, then by min-path asc for determinism
|
|
const sortedCommunities = [...filesByCommunity.entries()]
|
|
.sort((a, b) => {
|
|
if (b[1].length !== a[1].length) return b[1].length - a[1].length;
|
|
const minA = [...a[1]].sort()[0];
|
|
const minB = [...b[1]].sort()[0];
|
|
return minA.localeCompare(minB);
|
|
});
|
|
|
|
// Build per-batch file list with full file metadata from scan
|
|
const fileMetaByPath = new Map(scan.files.map(f => [f.path, f]));
|
|
const batches = sortedCommunities.map(([, paths], idx) => ({
|
|
batchIndex: idx + 1,
|
|
files: paths.sort().map(p => fileMetaByPath.get(p)),
|
|
batchImportData: {},
|
|
neighborMap: {},
|
|
}));
|
|
|
|
const output = {
|
|
schemaVersion: 1,
|
|
algorithm: 'louvain',
|
|
totalFiles: scan.files.length,
|
|
totalBatches: batches.length,
|
|
batches,
|
|
};
|
|
|
|
const outPath = join(projectRoot, '.understand-anything', 'intermediate', 'batches.json');
|
|
writeFileSync(outPath, JSON.stringify(output, null, 2), 'utf-8');
|
|
process.stderr.write(`Wrote ${batches.length} batches to ${outPath}\n`);
|
|
```
|
|
|
|
- [ ] **Step 4: Run test, expect PASS**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "Louvain basic"
|
|
```
|
|
|
|
Expected: PASS.
|
|
|
|
- [ ] **Step 5: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/skills/understand/compute-batches.mjs \
|
|
understand-anything-plugin/skills/understand/test_compute_batches.test.mjs
|
|
git commit -m "feat(compute-batches): emit batches.json with code communities"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 4: Size enforcement — split oversized communities
|
|
|
|
If the Task 2 prototype run showed any community > 35 files, implement edge-betweenness split. Otherwise, implement a minimal weakly-connected-component (WCC) split as a defensive guard.
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/skills/understand/compute-batches.mjs`
|
|
- Modify: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs`
|
|
- Create: `understand-anything-plugin/skills/understand/test/fixtures/scan-result-large-community.json`
|
|
|
|
- [ ] **Step 1: Create large-community fixture (40-node complete graph in one community)**
|
|
|
|
Create `understand-anything-plugin/skills/understand/test/fixtures/scan-result-large-community.json`. Build programmatically once and commit the JSON:
|
|
|
|
```bash
|
|
node -e "
|
|
const files = [];
|
|
const importMap = {};
|
|
for (let i = 0; i < 40; i++) {
|
|
const p = 'src/big/f' + i + '.ts';
|
|
files.push({ path: p, language: 'typescript', sizeLines: 50, fileCategory: 'code' });
|
|
importMap[p] = [];
|
|
// Every file imports every other — guarantees a single community of 40
|
|
for (let j = 0; j < 40; j++) if (i !== j) importMap[p].push('src/big/f' + j + '.ts');
|
|
}
|
|
const out = {
|
|
name: 'fixture-large-community',
|
|
description: '40 files all importing each other — one community over the max=35 cap',
|
|
languages: ['typescript'],
|
|
frameworks: [],
|
|
files,
|
|
totalFiles: 40,
|
|
filteredByIgnore: 0,
|
|
estimatedComplexity: 'moderate',
|
|
importMap,
|
|
};
|
|
console.log(JSON.stringify(out, null, 2));
|
|
" > understand-anything-plugin/skills/understand/test/fixtures/scan-result-large-community.json
|
|
```
|
|
|
|
- [ ] **Step 2: Write failing test (large community splits to ≤ 35)**
|
|
|
|
Append to `test_compute_batches.test.mjs`:
|
|
|
|
```javascript
|
|
describe('compute-batches.mjs — size enforcement', () => {
|
|
it('splits a 40-node clique into batches ≤ 35', () => {
|
|
const root = setupProject('scan-result-large-community.json');
|
|
const result = runScript(root);
|
|
expect(result.status).toBe(0);
|
|
|
|
const batches = readBatches(root);
|
|
expect(batches.totalFiles).toBe(40);
|
|
for (const b of batches.batches) {
|
|
expect(b.files.length).toBeLessThanOrEqual(35);
|
|
}
|
|
// Sum of all batch file counts equals total files
|
|
const sum = batches.batches.reduce((acc, b) => acc + b.files.length, 0);
|
|
expect(sum).toBe(40);
|
|
// Warning was emitted to stderr
|
|
expect(result.stderr).toMatch(/Warning: compute-batches: community size 40 > max 35/);
|
|
});
|
|
});
|
|
```
|
|
|
|
- [ ] **Step 3: Run test, expect FAIL**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "size enforcement"
|
|
```
|
|
|
|
Expected: FAIL — at least one batch has 40 files; no warning emitted.
|
|
|
|
- [ ] **Step 4: Implement WCC-style split + warning**
|
|
|
|
In `compute-batches.mjs`, after the `const communities = louvain(g);` line and before grouping by community, insert size-enforcement logic. Replace the existing grouping block with:
|
|
|
|
```javascript
|
|
// Group files by community id
|
|
const filesByCommunity = new Map();
|
|
for (const [path, cid] of Object.entries(communities)) {
|
|
if (!filesByCommunity.has(cid)) filesByCommunity.set(cid, []);
|
|
filesByCommunity.get(cid).push(path);
|
|
}
|
|
|
|
// Size enforcement: split any community > MAX_COMMUNITY_SIZE.
|
|
// Strategy: deterministic alphabetical chunking within the oversize community.
|
|
// Edge-betweenness would be more modularity-aware but adds dependency surface;
|
|
// alphabetical chunking is deterministic, locality-preserving for co-located
|
|
// files, and bounded by the cap. Each sub-community gets a fresh synthetic id.
|
|
const MAX_COMMUNITY_SIZE = 35;
|
|
const splitCommunities = new Map();
|
|
let nextSyntheticId = 0;
|
|
for (const [cid, paths] of filesByCommunity) {
|
|
if (paths.length <= MAX_COMMUNITY_SIZE) {
|
|
splitCommunities.set(cid, paths);
|
|
continue;
|
|
}
|
|
process.stderr.write(
|
|
`Warning: compute-batches: community size ${paths.length} > max ${MAX_COMMUNITY_SIZE} ` +
|
|
`— splitting via alphabetical chunking — modularity may decrease\n`,
|
|
);
|
|
const sorted = [...paths].sort();
|
|
const parts = Math.ceil(paths.length / MAX_COMMUNITY_SIZE);
|
|
const perPart = Math.ceil(paths.length / parts);
|
|
for (let i = 0; i < parts; i++) {
|
|
const slice = sorted.slice(i * perPart, (i + 1) * perPart);
|
|
const synthId = `__split_${cid}_${nextSyntheticId++}`;
|
|
splitCommunities.set(synthId, slice);
|
|
}
|
|
}
|
|
```
|
|
|
|
Then update the `sortedCommunities` line to use `splitCommunities` instead of `filesByCommunity`:
|
|
|
|
```javascript
|
|
const sortedCommunities = [...splitCommunities.entries()]
|
|
```
|
|
|
|
- [ ] **Step 5: Run test, expect PASS**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "size enforcement"
|
|
```
|
|
|
|
Expected: PASS — 40 files split into 2 batches of 20 each, warning emitted.
|
|
|
|
- [ ] **Step 6: Run prior test too, expect still PASS**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs
|
|
```
|
|
|
|
Expected: all tests PASS.
|
|
|
|
- [ ] **Step 7: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/skills/understand/compute-batches.mjs \
|
|
understand-anything-plugin/skills/understand/test_compute_batches.test.mjs \
|
|
understand-anything-plugin/skills/understand/test/fixtures/scan-result-large-community.json
|
|
git commit -m "feat(compute-batches): split communities > 35 with visible warning"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 5: Exports extraction via TreeSitterPlugin
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/skills/understand/compute-batches.mjs`
|
|
- Modify: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs`
|
|
|
|
- [ ] **Step 1: Write failing test (exports populated on real TS files)**
|
|
|
|
Add a fixture-on-disk test that writes real source files and points the fixture at them. Append to `test_compute_batches.test.mjs`:
|
|
|
|
```javascript
|
|
describe('compute-batches.mjs — exports extraction', () => {
|
|
it('populates exports for code files via tree-sitter', () => {
|
|
const root = mkdtempSync(join(tmpdir(), 'ua-cb-exp-'));
|
|
mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true });
|
|
mkdirSync(join(root, 'src'), { recursive: true });
|
|
writeFileSync(join(root, 'src', 'a.ts'),
|
|
'export function greet(name: string) { return "hi " + name; }\n' +
|
|
'export class Greeter { greet(n: string) { return "hi " + n; } }\n');
|
|
writeFileSync(join(root, 'src', 'b.ts'),
|
|
'import { greet } from "./a";\nexport const helper = () => greet("world");\n');
|
|
|
|
const scan = {
|
|
name: 'exports-test',
|
|
description: '',
|
|
languages: ['typescript'],
|
|
frameworks: [],
|
|
files: [
|
|
{ path: 'src/a.ts', language: 'typescript', sizeLines: 2, fileCategory: 'code' },
|
|
{ path: 'src/b.ts', language: 'typescript', sizeLines: 2, fileCategory: 'code' },
|
|
],
|
|
totalFiles: 2, filteredByIgnore: 0, estimatedComplexity: 'small',
|
|
importMap: { 'src/a.ts': [], 'src/b.ts': ['src/a.ts'] },
|
|
};
|
|
writeFileSync(
|
|
join(root, '.understand-anything', 'intermediate', 'scan-result.json'),
|
|
JSON.stringify(scan));
|
|
|
|
const result = runScript(root);
|
|
expect(result.status).toBe(0);
|
|
|
|
const batches = readBatches(root);
|
|
// batches.json doesn't directly store exports — they live in neighborMap.
|
|
// For this test, dig into the script's internal exports map by re-reading
|
|
// it. Add an `exportsByPath` debug field to batches.json output (see impl).
|
|
expect(batches.exportsByPath).toBeDefined();
|
|
expect(batches.exportsByPath['src/a.ts']).toEqual(
|
|
expect.arrayContaining(['greet', 'Greeter']));
|
|
expect(batches.exportsByPath['src/b.ts']).toEqual(
|
|
expect.arrayContaining(['helper']));
|
|
});
|
|
});
|
|
```
|
|
|
|
(The `exportsByPath` debug field is a temporary affordance that we keep so future tasks can inspect exports without going through neighborMap. It's emitted in the script output but not consumed by Phase 2 — it's a side-channel for testing and observability.)
|
|
|
|
- [ ] **Step 2: Run test, expect FAIL**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "exports extraction"
|
|
```
|
|
|
|
Expected: FAIL — `batches.exportsByPath` is undefined.
|
|
|
|
- [ ] **Step 3: Add TreeSitterPlugin loader + exports loop**
|
|
|
|
In `compute-batches.mjs`, add core import dance at top of the file (after existing imports):
|
|
|
|
```javascript
|
|
import { createRequire } from 'node:module';
|
|
import { pathToFileURL } from 'node:url';
|
|
|
|
const __filename = fileURLToPath(import.meta.url);
|
|
const PLUGIN_ROOT = resolve(dirname(__filename), '../..');
|
|
const require = createRequire(resolve(PLUGIN_ROOT, 'package.json'));
|
|
|
|
let core;
|
|
try {
|
|
core = await import(pathToFileURL(require.resolve('@understand-anything/core')).href);
|
|
} catch {
|
|
core = await import(pathToFileURL(resolve(PLUGIN_ROOT, 'packages/core/dist/index.js')).href);
|
|
}
|
|
const { TreeSitterPlugin, PluginRegistry, builtinLanguageConfigs, registerAllParsers } = core;
|
|
```
|
|
|
|
Then add an `extractExports(projectRoot, codeFiles)` function before `main()`:
|
|
|
|
```javascript
|
|
/**
|
|
* For each code file, returns its top-level exported symbol names (functions,
|
|
* classes, exported consts). Per-file errors are swallowed into [] with a
|
|
* visible warning so a single bad file does not abort batching.
|
|
*
|
|
* Returns Map<path, string[]>.
|
|
*/
|
|
async function extractExports(projectRoot, codeFiles) {
|
|
const tsConfigs = builtinLanguageConfigs.filter(c => c.treeSitter);
|
|
const tsPlugin = new TreeSitterPlugin(tsConfigs);
|
|
await tsPlugin.init();
|
|
const registry = new PluginRegistry();
|
|
registry.register(tsPlugin);
|
|
registerAllParsers(registry);
|
|
|
|
const exportsByPath = new Map();
|
|
for (const file of codeFiles) {
|
|
const abs = join(projectRoot, file.path);
|
|
let content;
|
|
try {
|
|
content = readFileSync(abs, 'utf-8');
|
|
} catch (err) {
|
|
process.stderr.write(
|
|
`Warning: compute-batches: exports extraction failed for ${file.path} ` +
|
|
`(read error: ${err.message}) — symbols=[] in neighborMap — ` +
|
|
`cross-batch edges to this file limited to file-level\n`,
|
|
);
|
|
exportsByPath.set(file.path, []);
|
|
continue;
|
|
}
|
|
try {
|
|
const analysis = registry.analyzeFile(file.path, content);
|
|
const names = (analysis?.exports || []).map(e => e.name).filter(Boolean);
|
|
exportsByPath.set(file.path, names);
|
|
} catch (err) {
|
|
process.stderr.write(
|
|
`Warning: compute-batches: exports extraction failed for ${file.path} ` +
|
|
`(${err.message}) — symbols=[] in neighborMap — ` +
|
|
`cross-batch edges to this file limited to file-level\n`,
|
|
);
|
|
exportsByPath.set(file.path, []);
|
|
}
|
|
}
|
|
return exportsByPath;
|
|
}
|
|
```
|
|
|
|
In `main()`, after building `codeFiles` and before Louvain, call:
|
|
|
|
```javascript
|
|
const exportsByPath = await extractExports(projectRoot, codeFiles);
|
|
```
|
|
|
|
In the output object, attach the debug field:
|
|
|
|
```javascript
|
|
const output = {
|
|
schemaVersion: 1,
|
|
algorithm: 'louvain',
|
|
totalFiles: scan.files.length,
|
|
totalBatches: batches.length,
|
|
exportsByPath: Object.fromEntries(exportsByPath),
|
|
batches,
|
|
};
|
|
```
|
|
|
|
- [ ] **Step 4: Run test, expect PASS**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "exports extraction"
|
|
```
|
|
|
|
Expected: PASS.
|
|
|
|
- [ ] **Step 5: Run all tests, expect still PASS**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs
|
|
```
|
|
|
|
Expected: all PASS.
|
|
|
|
- [ ] **Step 6: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/skills/understand/compute-batches.mjs \
|
|
understand-anything-plugin/skills/understand/test_compute_batches.test.mjs
|
|
git commit -m "feat(compute-batches): extract top-level exports via TreeSitter, warn on failure"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 6: Non-code batching (Groups A-E)
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/skills/understand/compute-batches.mjs`
|
|
- Modify: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs`
|
|
- Create: `understand-anything-plugin/skills/understand/test/fixtures/scan-result-non-code.json`
|
|
|
|
- [ ] **Step 1: Create non-code fixture**
|
|
|
|
Create `understand-anything-plugin/skills/understand/test/fixtures/scan-result-non-code.json`:
|
|
|
|
```json
|
|
{
|
|
"name": "fixture-non-code",
|
|
"description": "Mix of non-code files exercising Groups A-E",
|
|
"languages": ["typescript", "dockerfile", "yaml", "sql", "markdown"],
|
|
"frameworks": [],
|
|
"files": [
|
|
{"path": "src/index.ts", "language": "typescript", "sizeLines": 10, "fileCategory": "code"},
|
|
{"path": "Dockerfile", "language": "dockerfile", "sizeLines": 20, "fileCategory": "infra"},
|
|
{"path": "docker-compose.yml", "language": "yaml", "sizeLines": 15, "fileCategory": "infra"},
|
|
{"path": ".dockerignore", "language": "config", "sizeLines": 5, "fileCategory": "config"},
|
|
{"path": "services/api/Dockerfile", "language": "dockerfile", "sizeLines": 18, "fileCategory": "infra"},
|
|
{"path": "services/api/docker-compose.yml", "language": "yaml", "sizeLines": 12, "fileCategory": "infra"},
|
|
{"path": ".github/workflows/ci.yml", "language": "yaml", "sizeLines": 30, "fileCategory": "infra"},
|
|
{"path": ".github/workflows/deploy.yml", "language": "yaml", "sizeLines": 25, "fileCategory": "infra"},
|
|
{"path": "migrations/001_init.sql", "language": "sql", "sizeLines": 40, "fileCategory": "data"},
|
|
{"path": "migrations/002_users.sql", "language": "sql", "sizeLines": 20, "fileCategory": "data"},
|
|
{"path": "docs/getting-started.md", "language": "markdown", "sizeLines": 100, "fileCategory": "docs"},
|
|
{"path": "README.md", "language": "markdown", "sizeLines": 200, "fileCategory": "docs"}
|
|
],
|
|
"totalFiles": 12,
|
|
"filteredByIgnore": 0,
|
|
"estimatedComplexity": "small",
|
|
"importMap": {
|
|
"src/index.ts": [],
|
|
"Dockerfile": [], "docker-compose.yml": [], ".dockerignore": [],
|
|
"services/api/Dockerfile": [], "services/api/docker-compose.yml": [],
|
|
".github/workflows/ci.yml": [], ".github/workflows/deploy.yml": [],
|
|
"migrations/001_init.sql": [], "migrations/002_users.sql": [],
|
|
"docs/getting-started.md": [], "README.md": []
|
|
}
|
|
}
|
|
```
|
|
|
|
- [ ] **Step 2: Write failing tests for each non-code group**
|
|
|
|
Append to `test_compute_batches.test.mjs`:
|
|
|
|
```javascript
|
|
describe('compute-batches.mjs — non-code grouping', () => {
|
|
let root;
|
|
let batches;
|
|
|
|
beforeEach(() => {
|
|
root = setupProject('scan-result-non-code.json');
|
|
const result = runScript(root);
|
|
expect(result.status).toBe(0);
|
|
batches = readBatches(root);
|
|
});
|
|
|
|
it('Group A: bundles Dockerfile cluster per directory', () => {
|
|
// Root-level cluster: Dockerfile + docker-compose.yml + .dockerignore → one batch
|
|
const rootDockerBatch = batches.batches.find(b =>
|
|
b.files.some(f => f.path === 'Dockerfile'));
|
|
expect(rootDockerBatch).toBeDefined();
|
|
const paths = rootDockerBatch.files.map(f => f.path).sort();
|
|
expect(paths).toEqual(['.dockerignore', 'Dockerfile', 'docker-compose.yml']);
|
|
|
|
// services/api cluster is a separate batch
|
|
const apiDockerBatch = batches.batches.find(b =>
|
|
b.files.some(f => f.path === 'services/api/Dockerfile'));
|
|
expect(apiDockerBatch).toBeDefined();
|
|
expect(apiDockerBatch).not.toBe(rootDockerBatch);
|
|
expect(apiDockerBatch.files.map(f => f.path).sort()).toEqual([
|
|
'services/api/Dockerfile', 'services/api/docker-compose.yml',
|
|
]);
|
|
});
|
|
|
|
it('Group B: .github/workflows/* all in one batch', () => {
|
|
const wfBatch = batches.batches.find(b =>
|
|
b.files.some(f => f.path.startsWith('.github/workflows/')));
|
|
expect(wfBatch).toBeDefined();
|
|
const wfPaths = wfBatch.files.map(f => f.path).filter(p => p.startsWith('.github/workflows/'));
|
|
expect(wfPaths.sort()).toEqual([
|
|
'.github/workflows/ci.yml', '.github/workflows/deploy.yml',
|
|
]);
|
|
});
|
|
|
|
it('Group D: SQL migrations under migrations/ in one batch', () => {
|
|
const migBatch = batches.batches.find(b =>
|
|
b.files.some(f => f.path.startsWith('migrations/')));
|
|
expect(migBatch).toBeDefined();
|
|
const migPaths = migBatch.files.map(f => f.path).filter(p => p.startsWith('migrations/'));
|
|
expect(migPaths.sort()).toEqual([
|
|
'migrations/001_init.sql', 'migrations/002_users.sql',
|
|
]);
|
|
});
|
|
|
|
it('non-code batch indices follow code batches', () => {
|
|
const codeBatches = batches.batches.filter(b =>
|
|
b.files.every(f => f.fileCategory === 'code'));
|
|
const nonCodeBatches = batches.batches.filter(b =>
|
|
b.files.some(f => f.fileCategory !== 'code'));
|
|
expect(codeBatches.length).toBeGreaterThan(0);
|
|
expect(nonCodeBatches.length).toBeGreaterThan(0);
|
|
const maxCodeIdx = Math.max(...codeBatches.map(b => b.batchIndex));
|
|
const minNonCodeIdx = Math.min(...nonCodeBatches.map(b => b.batchIndex));
|
|
expect(minNonCodeIdx).toBeGreaterThan(maxCodeIdx);
|
|
});
|
|
});
|
|
```
|
|
|
|
- [ ] **Step 3: Run tests, expect FAIL**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "non-code grouping"
|
|
```
|
|
|
|
Expected: FAIL on all four (non-code files currently end up nowhere — they're not in `codeFiles`, not in any batch).
|
|
|
|
- [ ] **Step 4: Implement non-code grouping**
|
|
|
|
In `compute-batches.mjs`, add a `buildNonCodeBatches(nonCodeFiles, startIndex)` function before `main()`:
|
|
|
|
```javascript
|
|
/**
|
|
* Build batches for non-code files per Groups A-E in the design spec.
|
|
* Returns Array<{ files: FileMeta[] }> (without batchIndex — caller assigns).
|
|
*/
|
|
function buildNonCodeBatches(nonCodeFiles) {
|
|
const byPath = new Map(nonCodeFiles.map(f => [f.path, f]));
|
|
const consumed = new Set();
|
|
const groups = [];
|
|
|
|
const dirOf = p => p.includes('/') ? p.slice(0, p.lastIndexOf('/')) : '';
|
|
const baseOf = p => p.includes('/') ? p.slice(p.lastIndexOf('/') + 1) : p;
|
|
|
|
// Group A: per-directory Dockerfile clusters.
|
|
const dirsWithDockerfile = new Set(
|
|
[...byPath.keys()]
|
|
.filter(p => baseOf(p) === 'Dockerfile')
|
|
.map(dirOf),
|
|
);
|
|
for (const dir of dirsWithDockerfile) {
|
|
const inDir = [...byPath.keys()].filter(p => dirOf(p) === dir);
|
|
const cluster = inDir.filter(p => {
|
|
const b = baseOf(p);
|
|
return b === 'Dockerfile'
|
|
|| b === '.dockerignore'
|
|
|| b.startsWith('docker-compose.');
|
|
});
|
|
if (cluster.length) {
|
|
groups.push({ files: cluster.map(p => byPath.get(p)) });
|
|
cluster.forEach(p => consumed.add(p));
|
|
}
|
|
}
|
|
|
|
// Group B: .github/workflows/*
|
|
const ghWorkflows = [...byPath.keys()].filter(
|
|
p => p.startsWith('.github/workflows/') && (p.endsWith('.yml') || p.endsWith('.yaml')),
|
|
).filter(p => !consumed.has(p));
|
|
if (ghWorkflows.length) {
|
|
groups.push({ files: ghWorkflows.map(p => byPath.get(p)) });
|
|
ghWorkflows.forEach(p => consumed.add(p));
|
|
}
|
|
|
|
// Group C: .gitlab-ci.yml + .circleci/*
|
|
const ciFiles = [...byPath.keys()].filter(
|
|
p => (p === '.gitlab-ci.yml' || p.startsWith('.circleci/'))
|
|
&& !consumed.has(p),
|
|
);
|
|
if (ciFiles.length) {
|
|
groups.push({ files: ciFiles.map(p => byPath.get(p)) });
|
|
ciFiles.forEach(p => consumed.add(p));
|
|
}
|
|
|
|
// Group D: SQL migrations per migrations/ or migration/ directory
|
|
const migrationDirs = new Set(
|
|
[...byPath.keys()]
|
|
.filter(p => p.endsWith('.sql'))
|
|
.map(dirOf)
|
|
.filter(d => /(^|\/)migrations?$/.test(d)),
|
|
);
|
|
for (const dir of migrationDirs) {
|
|
const sqls = [...byPath.keys()]
|
|
.filter(p => dirOf(p) === dir && p.endsWith('.sql') && !consumed.has(p))
|
|
.sort();
|
|
if (sqls.length) {
|
|
groups.push({ files: sqls.map(p => byPath.get(p)) });
|
|
sqls.forEach(p => consumed.add(p));
|
|
}
|
|
}
|
|
|
|
// Group E: all remaining grouped by immediate parent dir, max 20 per batch
|
|
const remainingByDir = new Map();
|
|
for (const p of [...byPath.keys()].sort()) {
|
|
if (consumed.has(p)) continue;
|
|
const dir = dirOf(p);
|
|
if (!remainingByDir.has(dir)) remainingByDir.set(dir, []);
|
|
remainingByDir.get(dir).push(p);
|
|
}
|
|
const MAX_E = 20;
|
|
for (const [, paths] of remainingByDir) {
|
|
for (let i = 0; i < paths.length; i += MAX_E) {
|
|
const slice = paths.slice(i, i + MAX_E);
|
|
groups.push({ files: slice.map(p => byPath.get(p)) });
|
|
}
|
|
}
|
|
|
|
return groups;
|
|
}
|
|
```
|
|
|
|
In `main()`, after `const codeFiles = ...` add:
|
|
|
|
```javascript
|
|
const nonCodeFiles = (scan.files || []).filter(f => f.fileCategory !== 'code');
|
|
```
|
|
|
|
After the `sortedCommunities`/batches construction for code, build non-code batches and append:
|
|
|
|
```javascript
|
|
// Assign code batchIndex first
|
|
const codeBatchObjs = sortedCommunities.map(([, paths], idx) => ({
|
|
batchIndex: idx + 1,
|
|
files: paths.sort().map(p => fileMetaByPath.get(p)),
|
|
batchImportData: {},
|
|
neighborMap: {},
|
|
}));
|
|
|
|
// Append non-code batches after code
|
|
const nonCodeGroups = buildNonCodeBatches(nonCodeFiles);
|
|
const nonCodeBatchObjs = nonCodeGroups.map((g, i) => ({
|
|
batchIndex: codeBatchObjs.length + i + 1,
|
|
files: g.files,
|
|
batchImportData: {},
|
|
neighborMap: {},
|
|
}));
|
|
|
|
const batches = [...codeBatchObjs, ...nonCodeBatchObjs];
|
|
```
|
|
|
|
(Remove the old `const batches = sortedCommunities.map(...)` line — it's been replaced.)
|
|
|
|
- [ ] **Step 5: Run tests, expect PASS**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs
|
|
```
|
|
|
|
Expected: all PASS.
|
|
|
|
- [ ] **Step 6: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/skills/understand/compute-batches.mjs \
|
|
understand-anything-plugin/skills/understand/test_compute_batches.test.mjs \
|
|
understand-anything-plugin/skills/understand/test/fixtures/scan-result-non-code.json
|
|
git commit -m "feat(compute-batches): non-code grouping Groups A-E"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 7: batchImportData + neighborMap
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/skills/understand/compute-batches.mjs`
|
|
- Modify: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs`
|
|
|
|
- [ ] **Step 1: Write failing tests (batchImportData populated, neighborMap correct, excludes same-batch)**
|
|
|
|
Append to `test_compute_batches.test.mjs`:
|
|
|
|
```javascript
|
|
describe('compute-batches.mjs — neighborMap + batchImportData', () => {
|
|
let batches;
|
|
let batchOf; // path → batchIndex
|
|
|
|
beforeEach(() => {
|
|
const root = setupProject('scan-result-3-cliques.json');
|
|
const result = runScript(root);
|
|
expect(result.status).toBe(0);
|
|
batches = readBatches(root);
|
|
batchOf = new Map();
|
|
for (const b of batches.batches) {
|
|
for (const f of b.files) batchOf.set(f.path, b.batchIndex);
|
|
}
|
|
});
|
|
|
|
it('batchImportData mirrors scan importMap per batch', () => {
|
|
for (const b of batches.batches) {
|
|
for (const f of b.files) {
|
|
expect(b.batchImportData[f.path]).toBeDefined();
|
|
// each file's batchImportData should be an array (possibly empty)
|
|
expect(Array.isArray(b.batchImportData[f.path])).toBe(true);
|
|
}
|
|
}
|
|
// src/auth/login.ts imports src/auth/session.ts and src/auth/tokens.ts
|
|
const loginBatch = batches.batches.find(b =>
|
|
b.files.some(f => f.path === 'src/auth/login.ts'));
|
|
expect(loginBatch.batchImportData['src/auth/login.ts'].sort()).toEqual([
|
|
'src/auth/session.ts', 'src/auth/tokens.ts',
|
|
]);
|
|
});
|
|
|
|
it('neighborMap excludes same-batch files', () => {
|
|
// The fixture's three cliques each go into one batch — all imports are
|
|
// intra-batch, so no neighbor map should reference any same-batch file.
|
|
for (const b of batches.batches) {
|
|
const sameBatchPaths = new Set(b.files.map(f => f.path));
|
|
for (const [file, neighbors] of Object.entries(b.neighborMap)) {
|
|
for (const n of neighbors) {
|
|
expect(sameBatchPaths.has(n.path)).toBe(false);
|
|
}
|
|
}
|
|
}
|
|
});
|
|
|
|
it('neighborMap entries carry symbols when target has exports', () => {
|
|
// For a custom case where two cliques cross-import each other, ensure
|
|
// the neighborMap entry includes the target's exported symbol names.
|
|
// Build a custom fixture inline.
|
|
const root = mkdtempSync(join(tmpdir(), 'ua-cb-nbr-'));
|
|
mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true });
|
|
mkdirSync(join(root, 'src'), { recursive: true });
|
|
writeFileSync(join(root, 'src', 'a.ts'),
|
|
'export function findUser(id: string) { return null; }\nexport class User {}\n');
|
|
writeFileSync(join(root, 'src', 'b.ts'),
|
|
'import { findUser } from "./a";\nexport const wrap = () => findUser("x");\n');
|
|
// To force a/b into different batches, add a third unrelated clique that
|
|
// dominates one community; here we just rely on small graph behavior.
|
|
const scan = {
|
|
name: 't', description: '',
|
|
languages: ['typescript'], frameworks: [],
|
|
files: [
|
|
{ path: 'src/a.ts', language: 'typescript', sizeLines: 2, fileCategory: 'code' },
|
|
{ path: 'src/b.ts', language: 'typescript', sizeLines: 2, fileCategory: 'code' },
|
|
],
|
|
totalFiles: 2, filteredByIgnore: 0, estimatedComplexity: 'small',
|
|
importMap: { 'src/a.ts': [], 'src/b.ts': ['src/a.ts'] },
|
|
};
|
|
writeFileSync(
|
|
join(root, '.understand-anything', 'intermediate', 'scan-result.json'),
|
|
JSON.stringify(scan));
|
|
const result = runScript(root);
|
|
expect(result.status).toBe(0);
|
|
const out = readBatches(root);
|
|
// If Louvain puts a and b in the same community, this test is degenerate.
|
|
// We just assert: for every cross-batch neighbor entry that points to a.ts,
|
|
// the symbols list includes findUser and User.
|
|
for (const b of out.batches) {
|
|
for (const [, neighbors] of Object.entries(b.neighborMap)) {
|
|
for (const n of neighbors) {
|
|
if (n.path === 'src/a.ts') {
|
|
expect(n.symbols).toEqual(expect.arrayContaining(['findUser', 'User']));
|
|
}
|
|
}
|
|
}
|
|
}
|
|
});
|
|
});
|
|
```
|
|
|
|
- [ ] **Step 2: Run tests, expect FAIL**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "neighborMap"
|
|
```
|
|
|
|
Expected: FAIL — `batchImportData` and `neighborMap` are currently empty `{}` on every batch.
|
|
|
|
- [ ] **Step 3: Implement batchImportData + neighborMap construction**
|
|
|
|
In `compute-batches.mjs`, before the final `output = {...}` write, add a populate step. Replace the `codeBatchObjs` + `nonCodeBatchObjs` construction with the following:
|
|
|
|
```javascript
|
|
// Helper: lookup batchIndex by path (any batch — code or non-code)
|
|
// Build it after we know batch assignments.
|
|
function buildBatchOfMap(allBatches) {
|
|
const m = new Map();
|
|
for (const b of allBatches) {
|
|
for (const f of b.files) m.set(f.path, b.batchIndex);
|
|
}
|
|
return m;
|
|
}
|
|
|
|
// First-pass: assemble files-only batches
|
|
const codeBatchObjsBare = sortedCommunities.map(([, paths], idx) => ({
|
|
batchIndex: idx + 1,
|
|
files: paths.sort().map(p => fileMetaByPath.get(p)),
|
|
}));
|
|
const nonCodeGroups = buildNonCodeBatches(nonCodeFiles);
|
|
const nonCodeBatchObjsBare = nonCodeGroups.map((g, i) => ({
|
|
batchIndex: codeBatchObjsBare.length + i + 1,
|
|
files: g.files,
|
|
}));
|
|
const bareBatches = [...codeBatchObjsBare, ...nonCodeBatchObjsBare];
|
|
const batchOf = buildBatchOfMap(bareBatches);
|
|
|
|
// Build reverse import map: target → [sources that import target]
|
|
const reverseImportMap = new Map();
|
|
for (const [src, targets] of Object.entries(importMap)) {
|
|
for (const tgt of targets) {
|
|
if (!reverseImportMap.has(tgt)) reverseImportMap.set(tgt, []);
|
|
reverseImportMap.get(tgt).push(src);
|
|
}
|
|
}
|
|
|
|
// Compute neighbor degree (number of import relations) per path, used for
|
|
// truncation when neighborMap[file] has > MAX_NEIGHBORS entries.
|
|
const NEIGHBOR_DEGREE = new Map();
|
|
for (const f of codeFiles) {
|
|
const outDeg = (importMap[f.path] || []).length;
|
|
const inDeg = (reverseImportMap.get(f.path) || []).length;
|
|
NEIGHBOR_DEGREE.set(f.path, outDeg + inDeg);
|
|
}
|
|
|
|
const MAX_NEIGHBORS = 50;
|
|
|
|
// Second-pass: enrich each batch with batchImportData + neighborMap
|
|
const batches = bareBatches.map(b => {
|
|
const batchPaths = new Set(b.files.map(f => f.path));
|
|
const batchImportData = {};
|
|
const neighborMap = {};
|
|
for (const f of b.files) {
|
|
batchImportData[f.path] = (importMap[f.path] || []).slice();
|
|
|
|
// 1-hop neighbors: imports out + imported-by in, excluding same batch.
|
|
const outNeighbors = importMap[f.path] || [];
|
|
const inNeighbors = reverseImportMap.get(f.path) || [];
|
|
const all = new Set([...outNeighbors, ...inNeighbors]);
|
|
const filtered = [...all].filter(p => batchOf.has(p) && !batchPaths.has(p));
|
|
|
|
let kept = filtered.map(p => ({
|
|
path: p,
|
|
batchIndex: batchOf.get(p),
|
|
symbols: exportsByPath.get(p) || [],
|
|
}));
|
|
|
|
if (kept.length > MAX_NEIGHBORS) {
|
|
const original = kept.length;
|
|
kept.sort((a, b2) => (NEIGHBOR_DEGREE.get(b2.path) || 0)
|
|
- (NEIGHBOR_DEGREE.get(a.path) || 0));
|
|
kept = kept.slice(0, MAX_NEIGHBORS);
|
|
process.stderr.write(
|
|
`Warning: compute-batches: neighborMap for ${f.path} truncated from ` +
|
|
`${original} to top ${MAX_NEIGHBORS} (by neighbor degree)\n`,
|
|
);
|
|
}
|
|
|
|
if (kept.length) neighborMap[f.path] = kept;
|
|
}
|
|
return { batchIndex: b.batchIndex, files: b.files, batchImportData, neighborMap };
|
|
});
|
|
```
|
|
|
|
- [ ] **Step 4: Run tests, expect PASS**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs
|
|
```
|
|
|
|
Expected: all PASS.
|
|
|
|
- [ ] **Step 5: Add neighborMap truncation test**
|
|
|
|
Append:
|
|
|
|
```javascript
|
|
describe('compute-batches.mjs — neighborMap truncation', () => {
|
|
it('truncates and warns when neighbors > 50', () => {
|
|
const root = mkdtempSync(join(tmpdir(), 'ua-cb-trunc-'));
|
|
mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true });
|
|
// hub.ts imported by 60 other files
|
|
const files = [{ path: 'src/hub.ts', language: 'typescript', sizeLines: 1, fileCategory: 'code' }];
|
|
const importMap = { 'src/hub.ts': [] };
|
|
for (let i = 0; i < 60; i++) {
|
|
const p = `src/leaf${i}.ts`;
|
|
files.push({ path: p, language: 'typescript', sizeLines: 1, fileCategory: 'code' });
|
|
importMap[p] = ['src/hub.ts'];
|
|
}
|
|
const scan = {
|
|
name: 't', description: '', languages: ['typescript'], frameworks: [],
|
|
files, totalFiles: files.length, filteredByIgnore: 0,
|
|
estimatedComplexity: 'moderate', importMap,
|
|
};
|
|
writeFileSync(
|
|
join(root, '.understand-anything', 'intermediate', 'scan-result.json'),
|
|
JSON.stringify(scan));
|
|
const result = runScript(root);
|
|
expect(result.status).toBe(0);
|
|
expect(result.stderr).toMatch(/neighborMap for src\/hub\.ts truncated from 60 to top 50/);
|
|
const out = readBatches(root);
|
|
// Find hub.ts and confirm its neighbor list capped at 50 (in whichever batch it landed)
|
|
for (const b of out.batches) {
|
|
const nbrs = b.neighborMap['src/hub.ts'];
|
|
if (nbrs) expect(nbrs.length).toBeLessThanOrEqual(50);
|
|
}
|
|
});
|
|
});
|
|
```
|
|
|
|
- [ ] **Step 6: Run tests, expect PASS**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs
|
|
```
|
|
|
|
Expected: all PASS.
|
|
|
|
- [ ] **Step 7: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/skills/understand/compute-batches.mjs \
|
|
understand-anything-plugin/skills/understand/test_compute_batches.test.mjs
|
|
git commit -m "feat(compute-batches): batchImportData + neighborMap with truncation warning"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 8: Fallback path + Louvain warning
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/skills/understand/compute-batches.mjs`
|
|
- Modify: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs`
|
|
|
|
- [ ] **Step 1: Write failing test (Louvain crash → fallback, warning emitted, batches still valid)**
|
|
|
|
Append to `test_compute_batches.test.mjs`:
|
|
|
|
```javascript
|
|
describe('compute-batches.mjs — fallback', () => {
|
|
it('falls back to count-based when Louvain throws (env-injected mock)', () => {
|
|
// We can't easily monkey-patch louvain mid-script in Vitest because the
|
|
// script runs in a subprocess. Instead, set an env var the script honors:
|
|
// UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW=1 → script throws inside its
|
|
// Louvain branch, exercising the fallback path.
|
|
const root = setupProject('scan-result-3-cliques.json');
|
|
const result = spawnSync('node',
|
|
[SCRIPT, root],
|
|
{ encoding: 'utf-8', env: { ...process.env, UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW: '1' } },
|
|
);
|
|
expect(result.status).toBe(0);
|
|
expect(result.stderr).toMatch(
|
|
/Warning: compute-batches: Louvain failed.*falling back to count-based grouping/);
|
|
const out = readBatches(root);
|
|
expect(out.algorithm).toBe('count-fallback');
|
|
expect(out.totalFiles).toBe(9);
|
|
// Count-based: 12 files per batch → all 9 fit in one batch
|
|
const codeBatchFileCount = out.batches
|
|
.filter(b => b.files.every(f => f.fileCategory === 'code'))
|
|
.reduce((sum, b) => sum + b.files.length, 0);
|
|
expect(codeBatchFileCount).toBe(9);
|
|
});
|
|
});
|
|
```
|
|
|
|
- [ ] **Step 2: Run test, expect FAIL**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "fallback"
|
|
```
|
|
|
|
Expected: FAIL — no fallback path exists; script crashes or produces `algorithm: "louvain"`.
|
|
|
|
- [ ] **Step 3: Implement fallback**
|
|
|
|
In `compute-batches.mjs`, refactor the Louvain section into a function and wrap it in try/catch.
|
|
|
|
**Boundary explicitly:** the block to replace **starts** at `const g = new Graph({ type: 'undirected', allowSelfLoops: false });` and **ends** at the closing brace of the `for (const [cid, paths] of filesByCommunity) { ... }` size-enforcement loop (the loop introduced in Task 4 step 4). Do NOT replace the `const sortedCommunities = [...splitCommunities.entries()] ...` line that follows — it stays as-is and continues to work because the replacement still produces `splitCommunities`.
|
|
|
|
Add a `runLouvain(codeFiles, importMap)` function before `main()`:
|
|
|
|
```javascript
|
|
/**
|
|
* Returns Map<path, communityId> via Louvain. May throw — caller must catch
|
|
* and fall back if it does. Honors UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW=1
|
|
* to allow tests to exercise the fallback path.
|
|
*/
|
|
function runLouvain(codeFiles, importMap) {
|
|
if (process.env.UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW === '1') {
|
|
throw new Error('forced throw for test');
|
|
}
|
|
const g = new Graph({ type: 'undirected', allowSelfLoops: false });
|
|
for (const f of codeFiles) g.addNode(f.path);
|
|
for (const [src, targets] of Object.entries(importMap)) {
|
|
if (!g.hasNode(src)) continue;
|
|
for (const tgt of targets) {
|
|
if (!g.hasNode(tgt) || src === tgt || g.hasEdge(src, tgt)) continue;
|
|
g.addEdge(src, tgt);
|
|
}
|
|
}
|
|
const cs = louvain(g); // { nodeId: communityId }
|
|
return new Map(Object.entries(cs));
|
|
}
|
|
|
|
/**
|
|
* Returns Map<path, communityId> via alphabetical chunking of 12 files per
|
|
* batch. Deterministic, used as fallback when Louvain fails.
|
|
*/
|
|
function countBasedAssignment(codeFiles, batchSize = 12) {
|
|
const out = new Map();
|
|
const sorted = [...codeFiles].map(f => f.path).sort();
|
|
for (let i = 0; i < sorted.length; i++) {
|
|
out.set(sorted[i], `count_${Math.floor(i / batchSize)}`);
|
|
}
|
|
return out;
|
|
}
|
|
```
|
|
|
|
In `main()`, replace the Louvain call + size-enforcement block with:
|
|
|
|
```javascript
|
|
let algorithm = 'louvain';
|
|
let perFileCommunity;
|
|
try {
|
|
perFileCommunity = runLouvain(codeFiles, importMap);
|
|
} catch (err) {
|
|
process.stderr.write(
|
|
`Warning: compute-batches: Louvain failed (${err.message}) ` +
|
|
`— falling back to count-based grouping (12 files/batch) ` +
|
|
`— module semantic boundaries lost\n`,
|
|
);
|
|
perFileCommunity = countBasedAssignment(codeFiles, 12);
|
|
algorithm = 'count-fallback';
|
|
}
|
|
|
|
// Group files by community id
|
|
const filesByCommunity = new Map();
|
|
for (const [path, cid] of perFileCommunity) {
|
|
if (!filesByCommunity.has(cid)) filesByCommunity.set(cid, []);
|
|
filesByCommunity.get(cid).push(path);
|
|
}
|
|
|
|
// Size enforcement only on louvain output. count-fallback already chunked.
|
|
const MAX_COMMUNITY_SIZE = 35;
|
|
const splitCommunities = new Map();
|
|
let nextSyntheticId = 0;
|
|
if (algorithm === 'louvain') {
|
|
for (const [cid, paths] of filesByCommunity) {
|
|
if (paths.length <= MAX_COMMUNITY_SIZE) {
|
|
splitCommunities.set(cid, paths);
|
|
continue;
|
|
}
|
|
process.stderr.write(
|
|
`Warning: compute-batches: community size ${paths.length} > max ${MAX_COMMUNITY_SIZE} ` +
|
|
`— splitting via alphabetical chunking — modularity may decrease\n`,
|
|
);
|
|
const sorted = [...paths].sort();
|
|
const parts = Math.ceil(paths.length / MAX_COMMUNITY_SIZE);
|
|
const perPart = Math.ceil(paths.length / parts);
|
|
for (let i = 0; i < parts; i++) {
|
|
const slice = sorted.slice(i * perPart, (i + 1) * perPart);
|
|
const synthId = `__split_${cid}_${nextSyntheticId++}`;
|
|
splitCommunities.set(synthId, slice);
|
|
}
|
|
}
|
|
} else {
|
|
for (const [cid, paths] of filesByCommunity) splitCommunities.set(cid, paths);
|
|
}
|
|
```
|
|
|
|
And update the output object's `algorithm` field:
|
|
|
|
```javascript
|
|
const output = {
|
|
schemaVersion: 1,
|
|
algorithm,
|
|
totalFiles: scan.files.length,
|
|
totalBatches: batches.length,
|
|
exportsByPath: Object.fromEntries(exportsByPath),
|
|
batches,
|
|
};
|
|
```
|
|
|
|
- [ ] **Step 4: Run tests, expect PASS**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs
|
|
```
|
|
|
|
Expected: all PASS including new fallback test.
|
|
|
|
- [ ] **Step 5: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/skills/understand/compute-batches.mjs \
|
|
understand-anything-plugin/skills/understand/test_compute_batches.test.mjs
|
|
git commit -m "feat(compute-batches): count-based fallback with visible warning"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 9: --changed-files mode
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/skills/understand/compute-batches.mjs`
|
|
- Modify: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs`
|
|
|
|
- [ ] **Step 1: Write failing test**
|
|
|
|
Append:
|
|
|
|
```javascript
|
|
describe('compute-batches.mjs — --changed-files', () => {
|
|
it('emits only batches containing changed files', () => {
|
|
const root = setupProject('scan-result-3-cliques.json');
|
|
const changedPath = join(root, 'changed.txt');
|
|
// Only the auth clique is changed
|
|
writeFileSync(changedPath, ['src/auth/login.ts', 'src/auth/tokens.ts'].join('\n'));
|
|
|
|
const result = runScript(root, [`--changed-files=${changedPath}`]);
|
|
expect(result.status).toBe(0);
|
|
|
|
const out = readBatches(root);
|
|
// Auth files are in batches; other cliques' batches must be omitted
|
|
const allPaths = out.batches.flatMap(b => b.files.map(f => f.path));
|
|
expect(allPaths).toContain('src/auth/login.ts');
|
|
expect(allPaths).toContain('src/auth/tokens.ts');
|
|
expect(allPaths).not.toContain('src/api/handlers.ts');
|
|
expect(allPaths).not.toContain('src/db/users.ts');
|
|
|
|
// neighborMap may still reference unchanged files (with their full-graph batchIndex)
|
|
const loginBatch = out.batches.find(b =>
|
|
b.files.some(f => f.path === 'src/auth/login.ts'));
|
|
// No assertion on neighborMap content here — the auth clique is fully
|
|
// changed, so neighborMap entries may be empty. The point is the script
|
|
// doesn't crash and only emits relevant batches.
|
|
expect(loginBatch).toBeDefined();
|
|
});
|
|
});
|
|
```
|
|
|
|
- [ ] **Step 2: Run test, expect FAIL**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs -t "changed-files"
|
|
```
|
|
|
|
Expected: FAIL — flag is unrecognized; output contains all batches.
|
|
|
|
- [ ] **Step 3: Implement --changed-files filtering**
|
|
|
|
In `compute-batches.mjs`, at the start of `main()`, after reading `projectRoot`:
|
|
|
|
```javascript
|
|
let changedFiles = null;
|
|
for (const arg of process.argv.slice(3)) {
|
|
const m = arg.match(/^--changed-files=(.+)$/);
|
|
if (m) {
|
|
const p = m[1];
|
|
const lines = readFileSync(p, 'utf-8')
|
|
.split('\n')
|
|
.map(s => s.trim())
|
|
.filter(Boolean);
|
|
changedFiles = new Set(lines);
|
|
}
|
|
}
|
|
```
|
|
|
|
Just before writing the output (after `batches` is assembled), filter:
|
|
|
|
```javascript
|
|
let finalBatches = batches;
|
|
if (changedFiles) {
|
|
finalBatches = batches.filter(b => b.files.some(f => changedFiles.has(f.path)));
|
|
// batchIndex on filtered batches retains the full-graph assignment
|
|
// (the design says neighborMap should still reference unchanged files'
|
|
// full-graph batchIndex). No renumbering.
|
|
}
|
|
|
|
const output = {
|
|
schemaVersion: 1,
|
|
algorithm,
|
|
totalFiles: scan.files.length,
|
|
totalBatches: finalBatches.length,
|
|
exportsByPath: Object.fromEntries(exportsByPath),
|
|
batches: finalBatches,
|
|
};
|
|
```
|
|
|
|
- [ ] **Step 4: Run test, expect PASS**
|
|
|
|
```bash
|
|
pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs
|
|
```
|
|
|
|
Expected: all PASS.
|
|
|
|
- [ ] **Step 5: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/skills/understand/compute-batches.mjs \
|
|
understand-anything-plugin/skills/understand/test_compute_batches.test.mjs
|
|
git commit -m "feat(compute-batches): --changed-files mode for incremental updates"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 10: file-analyzer.md — add Cross-batch context (neighborMap) section
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/agents/file-analyzer.md`
|
|
|
|
- [ ] **Step 1: Insert the new section**
|
|
|
|
In `understand-anything-plugin/agents/file-analyzer.md`, find the existing line:
|
|
```
|
|
### Step 1 — Prepare the input JSON
|
|
```
|
|
|
|
(This is at approximately line 32.)
|
|
|
|
After Step 1's closing code block (the bash heredoc that ends with `ENDJSON`), and **before** `### Step 2 — Execute the bundled extraction script`, insert a new sub-section. Use the Edit tool:
|
|
|
|
Old text (the boundary between Step 1 and Step 2):
|
|
```
|
|
ENDJSON
|
|
```
|
|
|
|
### Step 2 — Execute the bundled extraction script
|
|
```
|
|
|
|
New text:
|
|
```
|
|
ENDJSON
|
|
```
|
|
|
|
### Cross-batch context (neighborMap)
|
|
|
|
Your dispatch prompt includes a `neighborMap` — for each file in your batch, it lists project-internal neighbors in OTHER batches (files that import yours or that you import), with their exported symbols.
|
|
|
|
Use neighborMap as a confidence boost for cross-batch edges (`calls`, `related`, `inherits`, `implements` to nodes outside your batch):
|
|
|
|
- If your source clearly references a symbol that appears in some `neighbor.symbols`, emit the edge to `function:<neighbor.path>:<symbol>` or `class:<neighbor.path>:<symbol>` with confidence.
|
|
- If your source references a cross-batch symbol that is NOT in neighborMap (the project-scanner may not have extracted it), you may still emit the edge if you saw it explicitly in the imported file's surface — but prefer matching neighborMap symbols when available.
|
|
- Imports continue to use `batchImportData` (fully resolved), not neighborMap.
|
|
|
|
The merge script's dangling-edge dropper is the safety net for genuinely unresolvable targets.
|
|
|
|
### Step 2 — Execute the bundled extraction script
|
|
```
|
|
|
|
- [ ] **Step 2: Verify the section was inserted correctly**
|
|
|
|
```bash
|
|
grep -n "Cross-batch context (neighborMap)" understand-anything-plugin/agents/file-analyzer.md
|
|
grep -n "Step 1 — Prepare the input JSON" understand-anything-plugin/agents/file-analyzer.md
|
|
grep -n "Step 2 — Execute the bundled extraction script" understand-anything-plugin/agents/file-analyzer.md
|
|
```
|
|
|
|
Expected: all three lines exist, and the Cross-batch context line number is between Step 1's and Step 2's line numbers.
|
|
|
|
- [ ] **Step 3: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/agents/file-analyzer.md
|
|
git commit -m "docs(file-analyzer): add Cross-batch context (neighborMap) section"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 11: file-analyzer.md — replace Writing Results with multi-part protocol
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/agents/file-analyzer.md`
|
|
|
|
- [ ] **Step 1: Replace the Writing Results section**
|
|
|
|
In `understand-anything-plugin/agents/file-analyzer.md`, find the existing block (at approximately lines 467-475):
|
|
|
|
Old text:
|
|
```
|
|
## Writing Results
|
|
|
|
After producing the JSON:
|
|
|
|
1. Write the JSON to: `<project-root>/.understand-anything/intermediate/batch-<batchIndex>.json`
|
|
2. The project root and batch index will be provided in your prompt.
|
|
3. Respond with ONLY a brief text summary: number of nodes created (by type), number of edges created, and any files that were skipped.
|
|
|
|
Do NOT include the full JSON in your text response.
|
|
```
|
|
|
|
New text:
|
|
```
|
|
## Writing Results — single or multi-part
|
|
|
|
**Step A — Compute totals.**
|
|
```
|
|
nodeCount = nodes.length
|
|
edgeCount = edges.length
|
|
```
|
|
|
|
**Step B — Decide split.**
|
|
- If `nodeCount ≤ 60` AND `edgeCount ≤ 120`: write ONE file to `.understand-anything/intermediate/batch-<batchIndex>.json`. Done. Skip to Step F.
|
|
- Otherwise: `parts = ceil(max(nodeCount / 60, edgeCount / 120))`.
|
|
|
|
**Step C — Partition.**
|
|
Sort files in your batch alphabetically by path. Chunk them sequentially into `parts` groups of size `ceil(N / parts)`. For each part:
|
|
- All nodes whose `filePath` is in this part's files (for non-file nodes like `module`/`concept`, use the file they belong to).
|
|
- All edges whose `source` is in this part's nodes (target may be anywhere — same part, different part of same batch, different batch).
|
|
|
|
**Step D — Write each part.**
|
|
Write part `k` (1-indexed) to `.understand-anything/intermediate/batch-<batchIndex>-part-<k>.json`. Each part is a valid GraphFragment: `{ "nodes": [...], "edges": [...] }`.
|
|
|
|
**Step E — Self-validate.**
|
|
For each file written, verify:
|
|
- Valid JSON.
|
|
- `nodes` array exists and is well-formed.
|
|
- For every edge: `source` and `target` both appear as either (a) a node `id` in this part's nodes, OR (b) a `file:<path>` reference where `<path>` is in `neighborMap` or `batchImportData`, OR (c) a `function:<path>:<symbol>` / `class:<path>:<symbol>` reference where `<symbol>` is in some `neighbor.symbols`.
|
|
|
|
If validation fails on a part, do NOT silently rebuild. Respond with an explicit error stating which part failed, which edge(s) failed validation, and why. The dispatching session can then retry.
|
|
|
|
**Step F — Respond.**
|
|
Respond with ONLY a brief text summary: parts written (1 or more), total nodes/edges across all parts, any files skipped. Do NOT include JSON content in the response.
|
|
```
|
|
|
|
- [ ] **Step 2: Verify**
|
|
|
|
```bash
|
|
grep -n "Writing Results — single or multi-part" understand-anything-plugin/agents/file-analyzer.md
|
|
grep -n "Step A — Compute totals" understand-anything-plugin/agents/file-analyzer.md
|
|
grep -n "Step F — Respond" understand-anything-plugin/agents/file-analyzer.md
|
|
# Confirm old prose is gone:
|
|
! grep -n "After producing the JSON:" understand-anything-plugin/agents/file-analyzer.md
|
|
```
|
|
|
|
Expected: first three exist, last `grep` returns non-zero (i.e. no match).
|
|
|
|
- [ ] **Step 3: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/agents/file-analyzer.md
|
|
git commit -m "docs(file-analyzer): replace Writing Results with multi-part output protocol"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 12: SKILL.md — Phase 1.5 + Phase 2 rewrite + Incremental path rewrite
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/skills/understand/SKILL.md`
|
|
|
|
- [ ] **Step 1: Insert Phase 1.5 after Phase 1**
|
|
|
|
In `understand-anything-plugin/skills/understand/SKILL.md`, find the line:
|
|
```
|
|
## Phase 2 — ANALYZE
|
|
```
|
|
|
|
(At approximately line 278.)
|
|
|
|
Immediately before that line, insert the Phase 1.5 block. The boundary is the `---` separator above `## Phase 2 — ANALYZE`. Use the Edit tool to replace:
|
|
|
|
Old text (the separator + Phase 2 header):
|
|
```
|
|
---
|
|
|
|
## Phase 2 — ANALYZE
|
|
```
|
|
|
|
New text:
|
|
```
|
|
---
|
|
|
|
## Phase 1.5 — BATCH
|
|
|
|
Report: `[Phase 1.5/7] Computing semantic batches...`
|
|
|
|
Run the bundled batching script:
|
|
```bash
|
|
node <SKILL_DIR>/compute-batches.mjs $PROJECT_ROOT
|
|
```
|
|
|
|
Reads `.understand-anything/intermediate/scan-result.json`, writes `.understand-anything/intermediate/batches.json`.
|
|
|
|
Capture stderr. Append any line starting with `Warning:` to `$PHASE_WARNINGS` for the final report.
|
|
|
|
If the script exits non-zero, the failure is hard — relay the full stderr to the user as a Phase 1.5 failure. Do not attempt to recover; the script's internal fallback (count-based) already handles recoverable issues. A non-zero exit means a fundamental problem (missing input file, malformed JSON, etc.).
|
|
|
|
---
|
|
|
|
## Phase 2 — ANALYZE
|
|
```
|
|
|
|
- [ ] **Step 2: Replace Phase 2 ANALYZE Full analysis path**
|
|
|
|
In SKILL.md, find the block starting `### Full analysis path` (at approximately line 280) and ending just before `### Incremental update path`.
|
|
|
|
Old text (the entire Full analysis path section — multi-paragraph; use Edit to replace from `### Full analysis path` through the line `Include the script's warnings in \`$PHASE_WARNINGS\` for the reviewer.`):
|
|
|
|
```
|
|
### Full analysis path
|
|
|
|
Batch the file list from Phase 1 into groups of **20-30 files each** (aim for ~25 files per batch for balanced sizes).
|
|
|
|
**Batching strategy for non-code files:**
|
|
- Group related non-code files together in the same batch when possible:
|
|
- Dockerfile + docker-compose.yml + .dockerignore → same batch
|
|
- SQL migration files → same batch (ordered by filename)
|
|
- CI/CD config files (.github/workflows/*) → same batch
|
|
- Documentation files (docs/*.md) → same batch
|
|
- This allows the file-analyzer to create cross-file edges (e.g., docker-compose `depends_on` Dockerfile)
|
|
- Non-code files can be mixed with code files in the same batch if batch sizes are small
|
|
- Each file's `fileCategory` from Phase 1 must be included in the batch file list
|
|
|
|
After batching, report the plan to the user:
|
|
> `[Phase 2/7] Analyzing files — <totalFiles> files in <totalBatches> batches (up to 5 concurrent)...`
|
|
|
|
For each batch, dispatch a subagent using the `file-analyzer` agent definition (at `agents/file-analyzer.md`). Run up to **5 subagents concurrently** using parallel dispatch. Append the following additional context:
|
|
|
|
> **Additional context from main session:**
|
|
>
|
|
> Project: `<projectName>` — `<projectDescription>`
|
|
> Languages: `<languages from Phase 1>`
|
|
>
|
|
> $LANGUAGE_DIRECTIVE
|
|
|
|
Before dispatching each batch, construct `batchImportData` from `$IMPORT_MAP`:
|
|
```json
|
|
batchImportData = {}
|
|
for each file in this batch:
|
|
batchImportData[file.path] = $IMPORT_MAP[file.path] ?? []
|
|
```
|
|
|
|
Fill in batch-specific parameters below and dispatch:
|
|
|
|
> Analyze these files and produce GraphNode and GraphEdge objects.
|
|
> Project root: `$PROJECT_ROOT`
|
|
> Project: `<projectName>`
|
|
> Languages: `<languages>`
|
|
> Batch: `<batchIndex>/<totalBatches>`
|
|
> Skill directory (for bundled scripts): `<SKILL_DIR>`
|
|
> Write output to: `$PROJECT_ROOT/.understand-anything/intermediate/batch-<batchIndex>.json`
|
|
>
|
|
> Pre-resolved import data for this batch (use this for all import edge creation — do NOT re-resolve imports from source):
|
|
> ```json
|
|
> <batchImportData JSON>
|
|
> ```
|
|
>
|
|
> Files to analyze in this batch (every entry MUST be passed through to `batchFiles` with all four fields — `path`, `language`, `sizeLines`, `fileCategory`):
|
|
> 1. `<path>` (<sizeLines> lines, language: `<language>`, fileCategory: `<fileCategory>`)
|
|
> 2. `<path>` (<sizeLines> lines, language: `<language>`, fileCategory: `<fileCategory>`)
|
|
> ...
|
|
|
|
After ALL batches complete, report to the user: `Phase 2 complete. All <totalBatches> batches analyzed.`
|
|
|
|
Run the merge-and-normalize script bundled with this skill (located next to this SKILL.md file — use the skill directory path, not the project root):
|
|
```bash
|
|
python <SKILL_DIR>/merge-batch-graphs.py $PROJECT_ROOT
|
|
```
|
|
|
|
This script reads all `batch-*.json` files from `$PROJECT_ROOT/.understand-anything/intermediate/`, then in one pass:
|
|
- Combines all nodes and edges across batches
|
|
- Normalizes node IDs (strips double prefixes, project-name prefixes, adds missing prefixes)
|
|
- Normalizes complexity values (`low`→`simple`, `medium`→`moderate`, `high`→`complex`, etc.)
|
|
- Rewrites edge references to match corrected node IDs
|
|
- Deduplicates nodes by ID (keeps last occurrence) and edges by `(source, target, type)`
|
|
- Drops dangling edges referencing missing nodes
|
|
- Logs all corrections and dropped items to stderr
|
|
|
|
The merge script also runs a `tested_by` linker that canonicalizes test-coverage edges in two passes. **Pass 1** walks LLM-emitted `tested_by` edges and flips inverted ones in place (the LLM systematically emits `test → production` because it sees the import only when analyzing the test file); semantically broken edges (test↔test, prod↔prod, orphan endpoints) are dropped. **Pass 2** supplements with path-convention pairings (`X.ts` ↔ `X.test.ts`, JS/TS `__tests__/` and `<dir>/test/` walk-out, Python in-package `tests/`, Go `_test.go` sibling, Maven/Gradle `src/test/...` ↔ `src/main/...`, .NET `<svc>/tests/` ↔ `<svc>/src/...` and `<App>.Tests/` ↔ `<App>/`). Production nodes that end up sourcing any `tested_by` edge get a `"tested"` tag. All resulting edges run `production → test`.
|
|
|
|
Output: `$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json`
|
|
|
|
Include the script's warnings in `$PHASE_WARNINGS` for the reviewer.
|
|
```
|
|
|
|
New text:
|
|
```
|
|
### Full analysis path
|
|
|
|
Load `.understand-anything/intermediate/batches.json` (produced by Phase 1.5). Iterate the `batches[]` array.
|
|
|
|
Report: `[Phase 2/7] Analyzing files — <totalFiles> files in <totalBatches> batches (up to 5 concurrent)...`
|
|
|
|
For each batch, dispatch a subagent using the `file-analyzer` agent definition (at `agents/file-analyzer.md`). Run up to **5 subagents concurrently**. Append the following additional context:
|
|
|
|
> **Additional context from main session:**
|
|
>
|
|
> Project: `<projectName>` — `<projectDescription>`
|
|
> Languages: `<languages from Phase 1>`
|
|
>
|
|
> $LANGUAGE_DIRECTIVE
|
|
|
|
Dispatch prompt template (fill in batch-specific values from `batches.json[i]`):
|
|
|
|
> Analyze these files and produce GraphNode and GraphEdge objects.
|
|
> Project root: `$PROJECT_ROOT`
|
|
> Project: `<projectName>`
|
|
> Languages: `<languages>`
|
|
> Batch: `<batchIndex>/<totalBatches>`
|
|
> Skill directory (for bundled scripts): `<SKILL_DIR>`
|
|
> Output: write to `$PROJECT_ROOT/.understand-anything/intermediate/batch-<batchIndex>.json` (single-file mode) OR `batch-<batchIndex>-part-<k>.json` (split mode, per Step B of your output protocol).
|
|
>
|
|
> Pre-resolved import data for this batch (use directly — do NOT re-resolve imports from source):
|
|
> ```json
|
|
> <batchImportData JSON from batches.json[i].batchImportData>
|
|
> ```
|
|
>
|
|
> Cross-batch neighbors with their exported symbols (confidence boost for cross-batch edges):
|
|
> ```json
|
|
> <neighborMap JSON from batches.json[i].neighborMap>
|
|
> ```
|
|
>
|
|
> Files to analyze in this batch (every entry MUST be passed through to `batchFiles` with all four fields — `path`, `language`, `sizeLines`, `fileCategory`):
|
|
> 1. `<path>` (<sizeLines> lines, language: `<language>`, fileCategory: `<fileCategory>`)
|
|
> 2. `<path>` (<sizeLines> lines, language: `<language>`, fileCategory: `<fileCategory>`)
|
|
> ...
|
|
|
|
After ALL batches complete, report to the user: `Phase 2 complete. All <totalBatches> batches analyzed.`
|
|
|
|
Run the merge-and-normalize script bundled with this skill:
|
|
```bash
|
|
python <SKILL_DIR>/merge-batch-graphs.py $PROJECT_ROOT
|
|
```
|
|
|
|
This script reads all `batch-*.json` files (including `batch-<i>-part-<k>.json` produced by file-analyzers that split their output) from `$PROJECT_ROOT/.understand-anything/intermediate/`, then in one pass:
|
|
- Combines all nodes and edges across batches
|
|
- Normalizes node IDs (strips double prefixes, project-name prefixes, adds missing prefixes)
|
|
- Normalizes complexity values (`low`→`simple`, `medium`→`moderate`, `high`→`complex`, etc.)
|
|
- Rewrites edge references to match corrected node IDs
|
|
- Deduplicates nodes by ID (keeps last occurrence) and edges by `(source, target, type)`
|
|
- Drops dangling edges referencing missing nodes
|
|
- Logs all corrections and dropped items to stderr
|
|
|
|
The merge script also runs a `tested_by` linker that canonicalizes test-coverage edges in two passes. **Pass 1** walks LLM-emitted `tested_by` edges and flips inverted ones in place; semantically broken edges (test↔test, prod↔prod, orphan endpoints) are dropped. **Pass 2** supplements with path-convention pairings. Production nodes that end up sourcing any `tested_by` edge get a `"tested"` tag. All resulting edges run `production → test`.
|
|
|
|
Output: `$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json`
|
|
|
|
Include the script's warnings in `$PHASE_WARNINGS` for the reviewer.
|
|
```
|
|
|
|
- [ ] **Step 3: Replace Incremental update path**
|
|
|
|
Find:
|
|
```
|
|
### Incremental update path
|
|
|
|
Use the changed files list from Phase 0. Batch and dispatch file-analyzer subagents using the same process as above (20-30 files per batch, up to 5 concurrent, with batchImportData constructed from $IMPORT_MAP), but only for changed files.
|
|
|
|
After batches complete:
|
|
1. Remove old nodes whose `filePath` matches any changed file from the existing graph
|
|
2. Remove old edges whose `source` or `target` references a removed node
|
|
3. Write the pruned existing nodes/edges as `batch-existing.json` in the intermediate directory
|
|
4. Run the same merge script — it will combine `batch-existing.json` with the fresh `batch-*.json` files:
|
|
```bash
|
|
python <SKILL_DIR>/merge-batch-graphs.py $PROJECT_ROOT
|
|
```
|
|
```
|
|
|
|
Replace with:
|
|
```
|
|
### Incremental update path
|
|
|
|
Write the changed-files list (one path per line) to a temp file:
|
|
```bash
|
|
git diff <lastCommitHash>..HEAD --name-only > $PROJECT_ROOT/.understand-anything/tmp/changed-files.txt
|
|
```
|
|
|
|
Run compute-batches with `--changed-files`:
|
|
```bash
|
|
node <SKILL_DIR>/compute-batches.mjs $PROJECT_ROOT \
|
|
--changed-files=$PROJECT_ROOT/.understand-anything/tmp/changed-files.txt
|
|
```
|
|
|
|
This produces a `batches.json` that contains only batches with changed files, but neighborMap entries still reference unchanged files (with their full-graph batchIndex) so cross-batch edges remain emittable.
|
|
|
|
Then dispatch file-analyzer subagents per the same template as the full path.
|
|
|
|
After batches complete:
|
|
1. Remove old nodes whose `filePath` matches any changed file from the existing graph
|
|
2. Remove old edges whose `source` or `target` references a removed node
|
|
3. Write the pruned existing nodes/edges as `batch-existing.json` in the intermediate directory
|
|
4. Run the same merge script — it will combine `batch-existing.json` with the fresh `batch-*.json` files:
|
|
```bash
|
|
python <SKILL_DIR>/merge-batch-graphs.py $PROJECT_ROOT
|
|
```
|
|
```
|
|
|
|
- [ ] **Step 4: Verify**
|
|
|
|
```bash
|
|
grep -n "Phase 1.5 — BATCH" understand-anything-plugin/skills/understand/SKILL.md
|
|
grep -n "Load \`.understand-anything/intermediate/batches.json\`" understand-anything-plugin/skills/understand/SKILL.md
|
|
grep -n "compute-batches.mjs" understand-anything-plugin/skills/understand/SKILL.md
|
|
# Confirm old prose is gone (each command should print "OK: ... absent"):
|
|
if grep -q "groups of \*\*20-30 files each\*\*" understand-anything-plugin/skills/understand/SKILL.md; then echo "FAIL: old batching prose still present"; else echo "OK: old batching prose absent"; fi
|
|
if grep -qF "Dockerfile + docker-compose.yml + .dockerignore → same batch" understand-anything-plugin/skills/understand/SKILL.md; then echo "FAIL: old non-code prose still present"; else echo "OK: old non-code prose absent"; fi
|
|
```
|
|
|
|
Expected: first three exist (compute-batches.mjs should appear at least 3 times — Phase 1.5 + Incremental); both check commands print "OK: ... absent".
|
|
|
|
- [ ] **Step 5: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/skills/understand/SKILL.md
|
|
git commit -m "feat(understand): introduce Phase 1.5 (compute-batches) and rewrite Phase 2 prose"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 13: merge-batch-graphs.py — multi-part stderr report + missing-part warning
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/skills/understand/merge-batch-graphs.py`
|
|
|
|
- [ ] **Step 1: Replace the "Found N batch files:" report**
|
|
|
|
In `merge-batch-graphs.py`, find the block at approximately line 1026:
|
|
|
|
Old text:
|
|
```python
|
|
print(f"Found {len(batch_files)} batch files:", file=sys.stderr)
|
|
```
|
|
|
|
New text:
|
|
```python
|
|
# Group by logical batch index so the report distinguishes single-batch
|
|
# files from multi-part file-analyzer outputs.
|
|
from collections import defaultdict as _dd
|
|
by_batch = _dd(list)
|
|
for f in batch_files:
|
|
m = re.match(r"batch-(\d+)(?:-part-(\d+))?\.json", f.name)
|
|
if m:
|
|
by_batch[int(m.group(1))].append((f.name, int(m.group(2)) if m.group(2) else None))
|
|
|
|
logical_count = len(by_batch)
|
|
multi_part = sum(1 for entries in by_batch.values() if len(entries) > 1)
|
|
print(
|
|
f"Found {len(batch_files)} batch files "
|
|
f"({logical_count} logical batches, {multi_part} multi-part):",
|
|
file=sys.stderr,
|
|
)
|
|
|
|
# Missing-part detection: for any logical batch with parts (len > 1), the
|
|
# set of part numbers MUST be contiguous starting at 1. Gaps suggest a
|
|
# truncated write — emit a visible warning so the user can investigate.
|
|
for idx, entries in by_batch.items():
|
|
part_nums = [p for (_n, p) in entries if p is not None]
|
|
if not part_nums:
|
|
continue
|
|
present = set(part_nums)
|
|
expected = set(range(1, max(part_nums) + 1))
|
|
missing = sorted(expected - present)
|
|
if missing:
|
|
print(
|
|
f"Warning: merge: batch {idx} has parts {sorted(present)} but "
|
|
f"missing part {missing} — possible truncated write — "
|
|
f"affected nodes/edges may be lost",
|
|
file=sys.stderr,
|
|
)
|
|
```
|
|
|
|
- [ ] **Step 2: Verify the file still parses**
|
|
|
|
```bash
|
|
python3 -c "import ast; ast.parse(open('understand-anything-plugin/skills/understand/merge-batch-graphs.py').read())" && echo "OK"
|
|
```
|
|
|
|
Expected: prints `OK`.
|
|
|
|
- [ ] **Step 3: Smoke-test the existing test suite still passes**
|
|
|
|
```bash
|
|
cd understand-anything-plugin/skills/understand && python3 -m unittest test_merge_batch_graphs.py -v 2>&1 | tail -20
|
|
```
|
|
|
|
Expected: all existing tests pass (we haven't broken anything).
|
|
|
|
- [ ] **Step 4: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/skills/understand/merge-batch-graphs.py
|
|
git commit -m "feat(merge-batch-graphs): multi-part aware stderr report + missing-part warning"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 14: merge-batch-graphs.py — multi-part unit tests
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/skills/understand/test_merge_batch_graphs.py`
|
|
|
|
- [ ] **Step 1: Append TestMultiPart class**
|
|
|
|
Append to `understand-anything-plugin/skills/understand/test_merge_batch_graphs.py`:
|
|
|
|
```python
|
|
|
|
|
|
# ── Multi-part batch handling ─────────────────────────────────────────────
|
|
|
|
|
|
class TestMultiPart(unittest.TestCase):
|
|
"""End-to-end tests for batch-<i>-part-<k>.json input handling.
|
|
|
|
These tests invoke merge-batch-graphs.py as a subprocess in a temp
|
|
directory so we exercise the full path: glob → load → merge → write.
|
|
"""
|
|
|
|
def setUp(self) -> None:
|
|
import tempfile
|
|
self.tmp = Path(tempfile.mkdtemp(prefix="ua-mbg-"))
|
|
self.intermediate = self.tmp / ".understand-anything" / "intermediate"
|
|
self.intermediate.mkdir(parents=True, exist_ok=True)
|
|
|
|
def tearDown(self) -> None:
|
|
import shutil
|
|
shutil.rmtree(self.tmp, ignore_errors=True)
|
|
|
|
def _write_batch(self, name: str, nodes: list, edges: list) -> None:
|
|
import json as _j
|
|
(self.intermediate / name).write_text(
|
|
_j.dumps({"nodes": nodes, "edges": edges}),
|
|
encoding="utf-8",
|
|
)
|
|
|
|
def _run_merge(self) -> tuple[int, str, dict]:
|
|
import subprocess
|
|
import json as _j
|
|
result = subprocess.run(
|
|
["python3", str(_MODULE_PATH), str(self.tmp)],
|
|
capture_output=True, text=True,
|
|
)
|
|
out_path = self.intermediate / "assembled-graph.json"
|
|
assembled = _j.loads(out_path.read_text()) if out_path.exists() else {}
|
|
return result.returncode, result.stderr, assembled
|
|
|
|
def test_two_parts_of_one_logical_batch_merge(self) -> None:
|
|
self._write_batch("batch-1-part-1.json",
|
|
[_file_node("src/a.ts")],
|
|
[{"source": "file:src/a.ts", "target": "file:src/b.ts",
|
|
"type": "imports", "direction": "forward", "weight": 0.7}])
|
|
self._write_batch("batch-1-part-2.json",
|
|
[_file_node("src/b.ts")],
|
|
[])
|
|
rc, _stderr, assembled = self._run_merge()
|
|
self.assertEqual(rc, 0)
|
|
node_ids = {n["id"] for n in assembled["nodes"]}
|
|
self.assertEqual(node_ids, {"file:src/a.ts", "file:src/b.ts"})
|
|
# Cross-part edge survived
|
|
edge_keys = {(e["source"], e["target"], e["type"]) for e in assembled["edges"]}
|
|
self.assertIn(
|
|
("file:src/a.ts", "file:src/b.ts", "imports"), edge_keys)
|
|
|
|
def test_three_parts_of_one_logical_batch_merge(self) -> None:
|
|
for k, path in enumerate(["src/a.ts", "src/b.ts", "src/c.ts"], start=1):
|
|
self._write_batch(f"batch-1-part-{k}.json",
|
|
[_file_node(path)], [])
|
|
rc, _stderr, assembled = self._run_merge()
|
|
self.assertEqual(rc, 0)
|
|
node_ids = {n["id"] for n in assembled["nodes"]}
|
|
self.assertEqual(node_ids,
|
|
{"file:src/a.ts", "file:src/b.ts", "file:src/c.ts"})
|
|
|
|
def test_malformed_part_is_skipped_with_warning(self) -> None:
|
|
(self.intermediate / "batch-1-part-1.json").write_text(
|
|
"{ this is not valid json", encoding="utf-8")
|
|
self._write_batch("batch-1-part-2.json",
|
|
[_file_node("src/b.ts")], [])
|
|
rc, stderr, assembled = self._run_merge()
|
|
self.assertEqual(rc, 0)
|
|
# The skip warning is from existing load_batch logic
|
|
self.assertIn("skipping batch-1-part-1.json", stderr)
|
|
# part-2 content still made it in
|
|
node_ids = {n["id"] for n in assembled["nodes"]}
|
|
self.assertEqual(node_ids, {"file:src/b.ts"})
|
|
|
|
def test_mixed_single_and_multi_part(self) -> None:
|
|
self._write_batch("batch-1.json",
|
|
[_file_node("src/single.ts")], [])
|
|
self._write_batch("batch-2-part-1.json",
|
|
[_file_node("src/multi-a.ts")], [])
|
|
self._write_batch("batch-2-part-2.json",
|
|
[_file_node("src/multi-b.ts")], [])
|
|
self._write_batch("batch-3.json",
|
|
[_file_node("src/another-single.ts")], [])
|
|
rc, _stderr, assembled = self._run_merge()
|
|
self.assertEqual(rc, 0)
|
|
node_ids = {n["id"] for n in assembled["nodes"]}
|
|
self.assertEqual(node_ids, {
|
|
"file:src/single.ts", "file:src/multi-a.ts",
|
|
"file:src/multi-b.ts", "file:src/another-single.ts",
|
|
})
|
|
|
|
def test_missing_part_emits_warning(self) -> None:
|
|
# parts {2, 3} present, part-1 missing
|
|
self._write_batch("batch-1-part-2.json",
|
|
[_file_node("src/b.ts")], [])
|
|
self._write_batch("batch-1-part-3.json",
|
|
[_file_node("src/c.ts")], [])
|
|
rc, stderr, assembled = self._run_merge()
|
|
self.assertEqual(rc, 0)
|
|
self.assertRegex(stderr,
|
|
r"Warning: merge: batch 1 has parts \[2, 3\] but "
|
|
r"missing part \[1\] — possible truncated write")
|
|
|
|
def test_stderr_report_format(self) -> None:
|
|
self._write_batch("batch-1.json", [_file_node("src/a.ts")], [])
|
|
self._write_batch("batch-2-part-1.json", [_file_node("src/b.ts")], [])
|
|
self._write_batch("batch-2-part-2.json", [_file_node("src/c.ts")], [])
|
|
rc, stderr, _assembled = self._run_merge()
|
|
self.assertEqual(rc, 0)
|
|
# 3 files on disk, 2 logical batches, 1 multi-part
|
|
self.assertIn(
|
|
"Found 3 batch files (2 logical batches, 1 multi-part)", stderr)
|
|
```
|
|
|
|
- [ ] **Step 2: Run tests, expect PASS**
|
|
|
|
```bash
|
|
cd understand-anything-plugin/skills/understand && python3 -m unittest test_merge_batch_graphs.TestMultiPart -v
|
|
```
|
|
|
|
Expected: all 6 tests PASS.
|
|
|
|
- [ ] **Step 3: Run full test suite**
|
|
|
|
```bash
|
|
cd understand-anything-plugin/skills/understand && python3 -m unittest test_merge_batch_graphs -v 2>&1 | tail -5
|
|
```
|
|
|
|
Expected: all tests PASS (pre-existing + new).
|
|
|
|
- [ ] **Step 4: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/skills/understand/test_merge_batch_graphs.py
|
|
git commit -m "test(merge-batch-graphs): TestMultiPart for batch-i-part-k handling"
|
|
```
|
|
|
|
---
|
|
|
|
## Task 15: Integration acceptance gate (manual)
|
|
|
|
This task is a **gated manual checklist** — execute interactively, mark each item, do not auto-merge without all green.
|
|
|
|
**Files:** none (this is a verification step)
|
|
|
|
- [ ] **Step 1: Install + build clean**
|
|
|
|
```bash
|
|
pnpm install
|
|
pnpm --filter @understand-anything/core build
|
|
pnpm --filter @understand-anything/skill build
|
|
```
|
|
|
|
Expected: all succeed.
|
|
|
|
- [ ] **Step 2: Sync local plugin into Claude Code's plugin cache for testing**
|
|
|
|
Per project's CLAUDE.md "Testing Local Plugin Changes" section. From repo root:
|
|
|
|
```bash
|
|
INSTALLED_VERSION=$(ls ~/.claude/plugins/cache/understand-anything/understand-anything/ | head -1)
|
|
echo "Installed version: $INSTALLED_VERSION"
|
|
rm -rf ~/.claude/plugins/cache/understand-anything/understand-anything/$INSTALLED_VERSION
|
|
cp -R ./understand-anything-plugin ~/.claude/plugins/cache/understand-anything/understand-anything/$INSTALLED_VERSION
|
|
```
|
|
|
|
- [ ] **Step 3: Start a fresh Claude Code session and run /understand --full on this repo**
|
|
|
|
In a fresh session in this repo's directory:
|
|
```
|
|
/understand --full
|
|
```
|
|
|
|
Expected during run:
|
|
- `[Phase 1.5/7] Computing semantic batches...` appears
|
|
- Phase 2 reports batch count from `batches.json` (not arbitrary count-based)
|
|
- At least one batch with > 60 nodes / > 120 edges triggers multi-part output (look in `.understand-anything/intermediate/` for any `batch-<i>-part-<k>.json` files)
|
|
|
|
Expected after run:
|
|
- `knowledge-graph.json` exists with reasonable node/edge counts compared to current main
|
|
- Dashboard renders normally
|
|
- Phase 7 final report's warnings section includes any compute-batches warnings IF they fired
|
|
|
|
- [ ] **Step 4: Sanity-check batches.json contents**
|
|
|
|
```bash
|
|
jq '.algorithm, .totalFiles, .totalBatches, (.batches | length), [.batches[].files | length]' \
|
|
.understand-anything/intermediate/batches.json 2>/dev/null \
|
|
|| echo "batches.json was cleaned up by Phase 7 — re-run with /understand --full and inspect before Phase 7 cleanup, or check git diff for the script's behavior."
|
|
```
|
|
|
|
Note: Phase 7 cleans up `.understand-anything/intermediate/` so this is best inspected mid-run, not after.
|
|
|
|
- [ ] **Step 5: Run on a small repo (5-10 files) to verify fallback batch path**
|
|
|
|
```bash
|
|
mkdir -p /tmp/ua-smoke-small/src
|
|
cd /tmp/ua-smoke-small
|
|
git init && git commit --allow-empty -m init
|
|
echo 'export const a = 1;' > src/a.ts
|
|
echo 'export const b = 2;' > src/b.ts
|
|
echo 'export const c = 3;' > src/c.ts
|
|
echo '{"name":"smoke","version":"0.0.1"}' > package.json
|
|
git add . && git commit -m setup
|
|
```
|
|
|
|
Then `cd /tmp/ua-smoke-small` in a Claude Code session and run `/understand --full`. Expected: completes without errors, single small batch.
|
|
|
|
- [ ] **Step 6: Run on a ~100-file repo to validate the bug fix**
|
|
|
|
If you have a ~100-file repo handy (or use the largest test fixture from the project), run `/understand --full` and confirm no "output limit" errors appear, even on Bedrock OPUS.
|
|
|
|
If you do not have a suitable repo, document this in the PR description as a deferred manual verification step.
|
|
|
|
- [ ] **Step 7: Stage results**
|
|
|
|
This task does not commit anything — it's a verification gate. If Step 3 reveals bugs, go back to the relevant task and fix; otherwise proceed to Task 16.
|
|
|
|
---
|
|
|
|
## Task 16: Version bump in 5 files
|
|
|
|
Per project CLAUDE.md: when pushing to remote, bump version in **all five** files listed.
|
|
|
|
**Files:**
|
|
- Modify: `understand-anything-plugin/package.json`
|
|
- Modify: `understand-anything-plugin/.claude-plugin/plugin.json`
|
|
- Modify: `.claude-plugin/plugin.json`
|
|
- Modify: `.cursor-plugin/plugin.json`
|
|
- Modify: `.copilot-plugin/plugin.json`
|
|
|
|
- [ ] **Step 1: Determine new version**
|
|
|
|
Current version is `2.7.4` (per `understand-anything-plugin/package.json` line 3). This PR adds a substantial feature (Phase 1.5 + multi-part output) — bump **minor**: `2.8.0`.
|
|
|
|
- [ ] **Step 2: Confirm all five files have the same current version**
|
|
|
|
```bash
|
|
grep -H '"version"' \
|
|
understand-anything-plugin/package.json \
|
|
understand-anything-plugin/.claude-plugin/plugin.json \
|
|
.claude-plugin/plugin.json \
|
|
.cursor-plugin/plugin.json \
|
|
.copilot-plugin/plugin.json
|
|
```
|
|
|
|
Expected: all five print `"version": "2.7.4"` (or whatever the current version is — use that as the baseline). If they diverge, stop and reconcile with the user.
|
|
|
|
- [ ] **Step 3: Bump each file from `2.7.4` to `2.8.0`**
|
|
|
|
Use the Edit tool on each of the five files. For each, replace `"version": "2.7.4"` with `"version": "2.8.0"`.
|
|
|
|
- [ ] **Step 4: Verify all five updated**
|
|
|
|
```bash
|
|
grep -H '"version"' \
|
|
understand-anything-plugin/package.json \
|
|
understand-anything-plugin/.claude-plugin/plugin.json \
|
|
.claude-plugin/plugin.json \
|
|
.cursor-plugin/plugin.json \
|
|
.copilot-plugin/plugin.json
|
|
```
|
|
|
|
Expected: all five print `"version": "2.8.0"`.
|
|
|
|
- [ ] **Step 5: Commit**
|
|
|
|
```bash
|
|
git add understand-anything-plugin/package.json \
|
|
understand-anything-plugin/.claude-plugin/plugin.json \
|
|
.claude-plugin/plugin.json \
|
|
.cursor-plugin/plugin.json \
|
|
.copilot-plugin/plugin.json
|
|
git commit -m "chore: bump version to 2.8.0"
|
|
```
|
|
|
|
- [ ] **Step 6: Push branch and open PR**
|
|
|
|
```bash
|
|
git push -u origin feat/semantic-batching-and-output-chunking
|
|
gh pr create --title "feat(understand): semantic batching (Phase 1.5) + output chunking — fixes #159" --body "$(cat <<'EOF'
|
|
## Summary
|
|
- Replace count-based file-analyzer batching with Louvain community detection on the import graph (new Phase 1.5, deterministic `compute-batches.mjs` script).
|
|
- file-analyzer self-splits its output into `batch-<i>-part-<k>.json` when above 60 nodes / 120 edges per part (Bedrock OPUS output cap safety).
|
|
- Cross-batch neighbors (with their exported symbols) passed to file-analyzer via `neighborMap` so semantic edges like `calls` and `inherits` can be confidently emitted across batches.
|
|
- Every fallback path emits a visible `Warning:` line that bubbles to `$PHASE_WARNINGS` in the Phase 7 final report.
|
|
- merge-batch-graphs.py multi-part-aware stderr report + missing-part warning; glob/sort-key already accepted multi-part naming so no algorithmic change required there.
|
|
|
|
Fixes #159.
|
|
|
|
Design: `docs/superpowers/specs/2026-05-24-semantic-batching-and-output-chunking-design.md`
|
|
Plan: `docs/superpowers/plans/2026-05-24-semantic-batching-and-output-chunking-impl.md`
|
|
|
|
## Test plan
|
|
- [x] `pnpm install` (graphology + graphology-communities-louvain install cleanly)
|
|
- [x] `pnpm --filter @understand-anything/core build`
|
|
- [x] `pnpm --filter @understand-anything/skill exec vitest run skills/understand/test_compute_batches.test.mjs` — all green
|
|
- [x] `cd understand-anything-plugin/skills/understand && python3 -m unittest test_merge_batch_graphs -v` — all green
|
|
- [x] Run `/understand --full` on this repo — `batches.json` generated; multi-part triggered on at least one batch; assembled-graph node/edge counts within expected range vs current main; dashboard renders normally; Phase 7 warnings section includes any compute-batches warnings.
|
|
- [ ] (Deferred / external) Run on a ~100-file repo on Bedrock OPUS — confirm no "output limit" errors. Document any deferred verification in PR comments.
|
|
EOF
|
|
)"
|
|
```
|
|
|
|
Expected: PR URL returned.
|
|
|
|
---
|
|
|
|
## Implementation done. Final check before merge:
|
|
|
|
- [ ] All 16 tasks above complete with checkboxes ticked.
|
|
- [ ] Branch builds + tests green: `pnpm install && pnpm --filter @understand-anything/core build && pnpm --filter @understand-anything/skill exec vitest run skills/understand/ && cd understand-anything-plugin/skills/understand && python3 -m unittest test_merge_batch_graphs test_compute_batches 2>&1 | tail -10` (note: `test_compute_batches` is the Vitest tree, this just sanity-checks Python; the Vitest run is separate)
|
|
- [ ] No `try { ... } catch { /* silent */ }` or `except: pass` patterns added (grep your diff).
|
|
- [ ] Spec ↔ plan ↔ code alignment spot-checked: every Failure-mode warning string in the spec is asserted by at least one unit test.
|