# Auto-Update Knowledge Graph (Internal — Hook-Triggered)
Incrementally update the knowledge graph using deterministic structural fingerprinting to minimize token usage. This prompt is triggered automatically by the post-commit hook when `autoUpdate` is enabled. It is NOT a user-facing skill.
**Key principle:** Spend zero LLM tokens when changes are cosmetic (formatting, internal logic). Only invoke LLM agents when structural changes (new/removed functions, classes, imports, exports) are detected.
---
## Phase 0 — Pre-flight (Zero Token Cost)
1. Set `PROJECT_ROOT` to the current working directory.
2. Check that `$PROJECT_ROOT/.understand-anything/knowledge-graph.json` exists.
- If not: report "No existing knowledge graph found. Run `/understand` first to create one." and **STOP**.
3. Check that `$PROJECT_ROOT/.understand-anything/meta.json` exists and read `gitCommitHash`.
- If not: report "No analysis metadata found. Run `/understand` to create a baseline." and **STOP**.
4. Get current commit hash:
```bash
git rev-parse HEAD
```
5. If commit hashes match and `--force` is NOT in `$ARGUMENTS`: report "Knowledge graph is already up to date." and **STOP**.
6. Get changed files:
```bash
git diff <lastCommitHash>..HEAD --name-only
```
If no files changed: update `meta.json` with the new commit hash and **STOP**.
7. Filter to source files only (`.ts`, `.tsx`, `.js`, `.jsx`, `.py`, `.go`, `.rs`, `.java`, `.rb`, `.cpp`, `.c`, `.h`, `.cs`, `.swift`, `.kt`, `.php`).
If no source files changed: update `meta.json` with the new commit hash, report "Only non-source files changed. Metadata updated." and **STOP**.
8. Create intermediate directory:
```bash
mkdir -p "$PROJECT_ROOT/.understand-anything/intermediate"
```
9. **Apply `.understandignore` exclusions** (same semantics as `/understand` Step 2.5 in `agents/project-scanner.md`).
Without this step, files in user-excluded paths (migrations, vendored code, tests) are counted as structural changes and can spuriously escalate the action to `FULL_UPDATE` even when the real change set is tiny.
1. If neither `$PROJECT_ROOT/.understand-anything/.understandignore` nor `$PROJECT_ROOT/.understandignore` exists, the step 7 extension filter is sufficient — skip to Phase 1.
2. Write the step 7 file list to `$PROJECT_ROOT/.understand-anything/intermediate/changed-files-pre.json` as a JSON array of relative paths.
3. Resolve `$PLUGIN_ROOT`:
- Use `$CLAUDE_PLUGIN_ROOT` if set (Claude Code's hook context sets this).
- Otherwise try `$HOME/.understand-anything-plugin`.
- Validate the chosen candidate by checking `$candidate/packages/core/dist/ignore-filter.js` exists.
- If neither resolves: report "Cannot locate plugin install at `$CLAUDE_PLUGIN_ROOT` or `$HOME/.understand-anything-plugin`; auto-update aborted. Run `/understand` to re-baseline." and **STOP**. Do **not** silently skip — silent skip reproduces issue #153.
4. Write `$PROJECT_ROOT/.understand-anything/intermediate/ignore-filter.mjs`:
```javascript
import { readFileSync, writeFileSync } from 'node:fs';
import { pathToFileURL } from 'node:url';
import path from 'node:path';
const PROJECT_ROOT = process.cwd();
const PLUGIN_ROOT = process.argv[2];
const inputPath = process.argv[3];
const modUrl = pathToFileURL(
  path.join(PLUGIN_ROOT, 'packages/core/dist/ignore-filter.js'),
).href;
const { createIgnoreFilter } = await import(modUrl);
const filter = createIgnoreFilter(PROJECT_ROOT);
const input = JSON.parse(readFileSync(inputPath, 'utf-8'));
const kept = input.filter((p) => !filter.isIgnored(p));
const removed = input.length - kept.length;
writeFileSync(
  path.join(PROJECT_ROOT, '.understand-anything/intermediate/changed-files.json'),
  JSON.stringify({ kept, removed, total: input.length }, null, 2),
);
console.log(`.understandignore: kept ${kept.length}/${input.length} (removed ${removed})`);
```
5. Run it:
```bash
node "$PROJECT_ROOT/.understand-anything/intermediate/ignore-filter.mjs" \
  "$PLUGIN_ROOT" \
  "$PROJECT_ROOT/.understand-anything/intermediate/changed-files-pre.json"
```
6. Read `$PROJECT_ROOT/.understand-anything/intermediate/changed-files.json`. Pass the `kept` array as the input file list for Phase 1's fingerprint-check script.
7. If `kept.length === 0`: update `meta.json` with the new commit hash, report "All changed source files are in ignored paths. Metadata updated." and **STOP**.
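
Steps 7 and 9.3 are described in prose only. The following is a minimal sketch of both checks, not the plugin's own code. It assumes changed paths use forward slashes (as `git diff --name-only` prints them), that extensions are matched case-sensitively, and that the plugin-root lookup falls through to the second candidate when the first fails validation. `filterSourceFiles`, `resolvePluginRoot`, and the injected `exists` predicate are illustrative names.

```javascript
// Extensions listed in step 7; everything else counts as a non-source file.
const SOURCE_EXTENSIONS = new Set([
  '.ts', '.tsx', '.js', '.jsx', '.py', '.go', '.rs', '.java',
  '.rb', '.cpp', '.c', '.h', '.cs', '.swift', '.kt', '.php',
]);

// Keep only paths whose final extension is in the step 7 list.
function filterSourceFiles(paths) {
  return paths.filter((p) => {
    const base = p.slice(p.lastIndexOf('/') + 1);
    const dot = base.lastIndexOf('.');
    return dot > 0 && SOURCE_EXTENSIONS.has(base.slice(dot));
  });
}

// Step 9.3: try $CLAUDE_PLUGIN_ROOT first, then $HOME/.understand-anything-plugin.
// `exists` is injected (hypothetical parameter) so the lookup order is testable.
function resolvePluginRoot(env, exists) {
  const candidates = [
    env.CLAUDE_PLUGIN_ROOT,
    env.HOME ? `${env.HOME}/.understand-anything-plugin` : undefined,
  ].filter(Boolean);
  for (const candidate of candidates) {
    if (exists(`${candidate}/packages/core/dist/ignore-filter.js`)) return candidate;
  }
  return null; // caller reports the error and stops; it must not skip silently
}
```

In a real script the second argument would typically be `existsSync` from `node:fs`, for example `resolvePluginRoot(process.env, existsSync)`. A `null` result corresponds to the report-and-**STOP** branch in step 9.3.
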
---
## Phase 1 — Structural Fingerprint Check (Zero LLM Tokens)
This phase runs a deterministic Node.js script that compares file structures against stored fingerprints. It costs **zero LLM tokens** — only the script execution cost.
1. Write and execute a Node.js script (`$PROJECT_ROOT/.understand-anything/intermediate/fingerprint-check.mjs`):
```javascript
// The script should:
// 1. Read fingerprints.json from .understand-anything/fingerprints.json
// 2. For each changed source file:
// a. Read the file content
// b. Compute SHA-256 content hash
// c. If content hash matches stored hash → NONE (skip)
// d. Extract structural elements via regex:
// - Functions: match patterns like `function NAME(`, `const NAME = (`, `export function NAME(`
// - Classes: match `class NAME`, `export class NAME`
// - Imports: match `import ... from '...'`, `import '...'`
// - Exports: match `export { ... }`, `export default`, `export function`, `export class`, `export const`
// e. Compare extracted elements against stored fingerprint
// f. Classify as NONE, COSMETIC, or STRUCTURAL
// 3. For new files (not in fingerprints.json): classify as STRUCTURAL
// 4. For deleted files (in fingerprints.json but not on disk): classify as STRUCTURAL
// 5. Determine overall decision (check in this order; the first match wins):
// - All NONE/COSMETIC → action: "SKIP"
// - >30 structural files or >50% of graph → action: "FULL_UPDATE"
// - New/deleted directories or >10 structural files → action: "ARCHITECTURE_UPDATE"
// - Otherwise (≤10 structural files, same directories) → action: "PARTIAL_UPDATE"
// 6. Write result to .understand-anything/intermediate/change-analysis.json
```
The output JSON should have this shape:
```json
{
  "action": "SKIP | PARTIAL_UPDATE | ARCHITECTURE_UPDATE | FULL_UPDATE",
  "filesToReanalyze": ["src/new-feature.ts"],
  "rerunArchitecture": false,
  "rerunTour": false,
  "reason": "1 file has structural changes (new function added)",
  "fileChanges": [
    { "filePath": "src/utils.ts", "changeLevel": "COSMETIC", "details": ["internal logic changed"] },
    { "filePath": "src/new-feature.ts", "changeLevel": "STRUCTURAL", "details": ["new function: handleRequest"] }
  ]
}
```
2. Read `.understand-anything/intermediate/change-analysis.json`.
3. **Decision gate:**
| Action | What to do |
|---|---|
| `SKIP` | Update `meta.json` with new commit hash. Report: "No structural changes detected. Graph metadata updated. Zero tokens spent." **STOP.** |
| `FULL_UPDATE` | Report: "Major structural changes detected (`<reason>`). Recommend running `/understand --full` for a complete rebuild." **STOP.** |
| `PARTIAL_UPDATE` | Proceed to Phase 2 with `filesToReanalyze` |
| `ARCHITECTURE_UPDATE` | Proceed to Phase 2 with `filesToReanalyze`, with the architecture re-run flagged (`rerunArchitecture: true`) |
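
The script outline in step 1 could be fleshed out along these lines. This is a sketch under stated assumptions, not the plugin's implementation. The regexes cover only JS/TS-style syntax and leave out `export { ... }` lists and `export default <expression>`. Stored fingerprints are assumed to hold arrays of names in `functions`, `classes`, `imports`, and `exports`. ">50% of graph" is read as more than half of the files in the graph. The names `extractStructure`, `classifyChange`, `decideAction`, `graphFileCount`, and `directoriesChanged` are hypothetical. The content hash is passed in rather than computed so the functions stay pure.

```javascript
// Step 2d: pull out names with regexes (JS/TS-style syntax only).
function extractStructure(source) {
  const names = (...regexes) => {
    const found = new Set();
    for (const regex of regexes) {
      for (const match of source.matchAll(regex)) found.add(match[1]);
    }
    return [...found].sort();
  };
  return {
    functions: names(
      /\bfunction\s+([\w$]+)\s*\(/g,
      /\b(?:const|let|var)\s+([\w$]+)\s*=\s*(?:async\s*)?\(/g,
    ),
    classes: names(/\bclass\s+([\w$]+)/g),
    imports: names(/\bimport\s+(?:[^'";]*?\sfrom\s+)?['"]([^'"]+)['"]/g),
    exports: names(
      /\bexport\s+(?:default\s+)?(?:async\s+)?(?:function|class|const|let|var)\s+([\w$]+)/g,
    ),
  };
}

// NONE: identical bytes. COSMETIC: bytes differ, structure is the same.
// STRUCTURAL: any name added or removed, or no stored fingerprint (new file).
function classifyChange(stored, contentHash, structure) {
  if (!stored) return 'STRUCTURAL';
  if (stored.contentHash === contentHash) return 'NONE';
  const same = ['functions', 'classes', 'imports', 'exports'].every(
    (key) =>
      JSON.stringify([...(stored[key] ?? [])].sort()) === JSON.stringify(structure[key]),
  );
  return same ? 'COSMETIC' : 'STRUCTURAL';
}

// Step 5 thresholds, checked from the most to the least drastic action.
function decideAction(fileChanges, graphFileCount, directoriesChanged) {
  const structural = fileChanges.filter((c) => c.changeLevel === 'STRUCTURAL').length;
  if (structural === 0) return 'SKIP';
  if (structural > 30 || structural > graphFileCount * 0.5) return 'FULL_UPDATE';
  if (structural > 10 || directoriesChanged) return 'ARCHITECTURE_UPDATE';
  return 'PARTIAL_UPDATE';
}
```

`decideAction` tests `FULL_UPDATE` before `ARCHITECTURE_UPDATE` because any change set with more than 30 structural files also has more than 10, so the order of the checks decides which action such a change set gets.
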
---
## Phase 2 — Targeted Re-Analysis (Minimal Token Cost)
Only re-analyze files with structural changes. This is the **only** phase that costs LLM tokens.
1. Read the existing knowledge graph from `$PROJECT_ROOT/.understand-anything/knowledge-graph.json`.
2. Batch the files from `filesToReanalyze` (from Phase 1). Use a single batch if ≤10 files, otherwise batch into groups of 5-10.
3. For each batch, dispatch a subagent using the `file-analyzer` agent definition (at `agents/file-analyzer.md`). Append:
> **Additional context from main session:**
>
> Project: `<projectName from existing graph>` — `<projectDescription>`
> Frameworks detected: `<frameworks from existing graph>`
> Languages: `<languages from existing graph>`
>
> **IMPORTANT:** This is an incremental update. Only the files listed below have structural changes. Analyze them thoroughly but do not invent nodes for files not in this batch.
Fill in batch-specific parameters:
> Analyze these source files and produce GraphNode and GraphEdge objects.
> Project root: `$PROJECT_ROOT`
> Project: `<projectName>`
> Languages: `<languages>`
> Batch index: `1`
> Write output to: `$PROJECT_ROOT/.understand-anything/intermediate/batch-1.json`
>
> All project files (for import resolution):
> `<file list from existing graph nodes>`
>
> Files to analyze in this batch:
> 1. `<path>` (`<sizeLines>` lines)
> ...
4. After batch(es) complete, read each `batch-<N>.json` and merge results.
5. **Merge with existing graph:**
- Remove old nodes whose `filePath` matches any file in `filesToReanalyze` or in the deleted files list
- Remove old edges whose `source` or `target` references a removed node
- Add new nodes and edges from the fresh analysis
- Deduplicate nodes by ID (keep latest), edges by `source + target + type`
- Remove any edge with dangling `source` or `target` references
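
The merge rules in step 5 can be sketched as a single pure function. This is an illustration, not the plugin's code: it assumes the graph is an object with top-level `nodes` and `edges` arrays, and `mergeGraph` and `deletedFiles` are hypothetical names. Node `id` and `filePath` and edge `source`, `target`, and `type` are the fields named in the steps above.

```javascript
// Phase 2, step 5. `existing` and `fresh` are assumed to be { nodes, edges }.
function mergeGraph(existing, fresh, filesToReanalyze, deletedFiles = []) {
  const stale = new Set([...filesToReanalyze, ...deletedFiles]);
  // IDs of old nodes that belong to re-analyzed or deleted files.
  const removedIds = new Set(
    existing.nodes.filter((n) => stale.has(n.filePath)).map((n) => n.id),
  );

  // Later entries overwrite earlier ones, so fresh nodes win on ID clashes.
  const nodesById = new Map();
  for (const node of existing.nodes) {
    if (!removedIds.has(node.id)) nodesById.set(node.id, node);
  }
  for (const node of fresh.nodes) nodesById.set(node.id, node);

  // Old edges survive only if neither endpoint was removed.
  const survivingOldEdges = existing.edges.filter(
    (e) => !removedIds.has(e.source) && !removedIds.has(e.target),
  );
  const edgesByKey = new Map();
  for (const edge of [...survivingOldEdges, ...fresh.edges]) {
    // Drop edges whose endpoints are not in the merged node set.
    if (!nodesById.has(edge.source) || !nodesById.has(edge.target)) continue;
    // Deduplicate by source + target + type, keeping the latest edge.
    edgesByKey.set(JSON.stringify([edge.source, edge.target, edge.type]), edge);
  }
  return {
    ...existing,
    nodes: [...nodesById.values()],
    edges: [...edgesByKey.values()],
  };
}
```

Read literally, the second rule also drops edges from unchanged files that point into a re-analyzed file, even when the fresh analysis recreates the target node under the same ID. The sketch follows the rules as written. Whether such edges should be restored is not stated here.
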
---
## Phase 3 — Conditional Architecture/Tour + Save
### 3a. Architecture update (only if `rerunArchitecture === true`)
If the change analysis flagged `ARCHITECTURE_UPDATE`:
1. Dispatch a subagent using the `architecture-analyzer` agent definition (at `agents/architecture-analyzer.md`), passing the full merged node set and import edges. Include previous layer definitions for naming consistency:
> Previous layer definitions (for naming consistency):
> ```json
> [previous layers from existing graph]
> ```
> Maintain the same layer names and IDs where possible. Only add/remove layers if the file structure has materially changed.
2. After completion, read and normalize layers (same normalization as `/understand` Phase 4).
3. Optionally re-run tour builder if layers changed significantly.
### 3b. Lite layer update (if `rerunArchitecture === false`)
If only a partial update:
1. For **new files**: assign them to the most likely existing layer based on directory path matching
2. For **deleted files**: remove their IDs from layer `nodeIds` arrays
3. Remove any layer that ends up with zero nodeIds
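
"Directory path matching" is not specified further, so the following is one possible reading rather than the plugin's method: a new file joins the layer containing the member whose directory shares the most leading path segments with it. `liteLayerUpdate`, `dirOf`, and `sharedDepth` are hypothetical names, and nodes are assumed to carry `id` and `filePath`.

```javascript
// Directory part of a forward-slash path ('' for files at the repository root).
const dirOf = (filePath) =>
  filePath.includes('/') ? filePath.slice(0, filePath.lastIndexOf('/')) : '';

// Number of leading path segments two directories have in common.
function sharedDepth(a, b) {
  const x = a.split('/').filter(Boolean);
  const y = b.split('/').filter(Boolean);
  let i = 0;
  while (i < x.length && i < y.length && x[i] === y[i]) i++;
  return i;
}

function liteLayerUpdate(layers, nodes, newNodeIds, deletedNodeIds) {
  const nodeById = new Map(nodes.map((n) => [n.id, n]));
  const deleted = new Set(deletedNodeIds);
  // Rule 2: strip deleted IDs (copies the arrays, inputs stay untouched).
  const updated = layers.map((layer) => ({
    ...layer,
    nodeIds: layer.nodeIds.filter((id) => !deleted.has(id)),
  }));
  // Rule 1: place each new node in the layer with the closest directory match.
  for (const id of newNodeIds) {
    const dir = dirOf(nodeById.get(id).filePath);
    let best = null;
    let bestScore = 0;
    for (const layer of updated) {
      for (const memberId of layer.nodeIds) {
        const member = nodeById.get(memberId);
        if (!member || memberId === id) continue;
        const score = sharedDepth(dir, dirOf(member.filePath));
        if (score > bestScore) {
          best = layer;
          bestScore = score;
        }
      }
    }
    if (best && !best.nodeIds.includes(id)) best.nodeIds.push(id);
  }
  // Rule 3: drop layers that ended up empty.
  return updated.filter((layer) => layer.nodeIds.length > 0);
}
```

A new file that shares no directory prefix with any existing member stays unassigned in this sketch and is left for the catch-all handling in 3c.
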
### 3c. Lite validation
Perform lightweight validation (no graph-reviewer agent):
1. Remove any edge with dangling `source` or `target`
2. Remove any layer `nodeIds` entry that doesn't exist in the node set
3. Ensure every file node appears in exactly one layer (add to a catch-all layer if missing)
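
The three checks can be sketched as one pass over the graph. Several details are assumptions made for illustration: the graph is taken to have top-level `nodes`, `edges`, and `layers` arrays, file nodes are identified by `type === 'file'`, a node listed in several layers stays in the first one, and the catch-all layer's `id` and `name` (`layer:uncategorized`, `Uncategorized`) are made up. `liteValidate` is a hypothetical name.

```javascript
function liteValidate(graph, catchAllName = 'Uncategorized') {
  const ids = new Set(graph.nodes.map((n) => n.id));

  // 1. Remove edges whose source or target is missing.
  const edges = graph.edges.filter((e) => ids.has(e.source) && ids.has(e.target));

  // 2. Remove layer entries that are unknown, or already claimed by an
  //    earlier layer (so each node ends up in at most one layer).
  const seen = new Set();
  const layers = graph.layers.map((layer) => ({
    ...layer,
    nodeIds: layer.nodeIds.filter((id) => {
      if (!ids.has(id) || seen.has(id)) return false;
      seen.add(id);
      return true;
    }),
  }));

  // 3. File nodes that no layer claimed go to the catch-all layer.
  const orphans = graph.nodes
    .filter((n) => n.type === 'file' && !seen.has(n.id))
    .map((n) => n.id);
  if (orphans.length > 0) {
    const catchAll = layers.find((l) => l.name === catchAllName);
    if (catchAll) catchAll.nodeIds.push(...orphans);
    else layers.push({ id: 'layer:uncategorized', name: catchAllName, nodeIds: orphans });
  }
  return { ...graph, edges, layers };
}
```

The function returns a new graph object and leaves its input untouched, which makes it easy to compare node and edge counts before and after validation.
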
### 3d. Save
1. Write the final knowledge graph to `$PROJECT_ROOT/.understand-anything/knowledge-graph.json`.
2. Write updated metadata to `$PROJECT_ROOT/.understand-anything/meta.json`:
```json
{
  "lastAnalyzedAt": "<ISO 8601 timestamp>",
  "gitCommitHash": "<current commit hash>",
  "version": "1.0.0",
  "analyzedFiles": <total file count in graph>
}
```
3. **Update fingerprints (LOAD-PATCH-SAVE, not OVERWRITE).**
The most common failure mode here: writing only the freshly-computed batch entries to `fingerprints.json`, discarding every other file's fingerprint. The next auto-update then sees all those files as new (no stored fingerprint), classifies them as STRUCTURAL, and escalates to FULL_UPDATE permanently (issue #152). The script must LOAD ALL existing entries, PATCH only the re-analyzed ones, and SAVE the full dict back.
Write and execute a Node.js script in this exact ordering:
```javascript
import { readFileSync, writeFileSync, existsSync } from 'node:fs';
import { createHash } from 'node:crypto';
import path from 'node:path';

const PROJECT_ROOT = process.cwd();
const dataDir = path.join(PROJECT_ROOT, '.understand-anything');
const fpPath = path.join(dataDir, 'fingerprints.json');
// Re-analyzed paths come from the Phase 1 output (still on disk until cleanup).
const { filesToReanalyze } = JSON.parse(
  readFileSync(path.join(dataDir, 'intermediate', 'change-analysis.json'), 'utf-8'),
);

// Illustrative JS/TS patterns only; they must mirror the regexes used in Phase 1.
const PATTERNS = {
  functions: /\bfunction\s+([\w$]+)\s*\(|\b(?:const|let|var)\s+([\w$]+)\s*=\s*(?:async\s*)?\(/g,
  classes: /\bclass\s+([\w$]+)/g,
  imports: /\bimport\s+(?:[^'";]*?\sfrom\s+)?['"]([^'"]+)['"]/g,
  exports: /\bexport\s+(?:default\s+)?(?:async\s+)?(?:function|class|const|let|var)\s+([\w$]+)/g,
};
const extract = (source, regex) =>
  [...new Set([...source.matchAll(regex)].map((m) => m[1] ?? m[2]))].sort();

const raw = existsSync(fpPath) ? readFileSync(fpPath, 'utf-8') : '';
const existedAndNonEmpty = raw.trim().length > 0;

// 1. LOAD ALL existing entries (NEVER skip: this preserves un-analyzed files)
const all = existedAndNonEmpty ? JSON.parse(raw) : {};
const before = Object.keys(all).length;

// 2. PATCH (file still exists) or REMOVE (file deleted) for each re-analyzed path.
//    `filesToReanalyze` may include paths that were deleted in this commit, so
//    handle both branches inline rather than expecting a separate deleted list.
for (const filePath of filesToReanalyze) {
  const fullPath = path.join(PROJECT_ROOT, filePath);
  if (!existsSync(fullPath)) {
    delete all[filePath];
    continue;
  }
  const content = readFileSync(fullPath, 'utf-8');
  const contentHash = createHash('sha256').update(content).digest('hex');
  all[filePath] = {
    contentHash,
    functions: extract(content, PATTERNS.functions),
    classes: extract(content, PATTERNS.classes),
    imports: extract(content, PATTERNS.imports),
    exports: extract(content, PATTERNS.exports),
  };
}

// 3. GUARD against silent load failure: if fingerprints.json existed and was
//    non-empty but `before` came out as 0, refuse to overwrite. Something
//    went wrong reading the file, and writing now would clobber every entry.
if (existedAndNonEmpty && before === 0) {
  throw new Error('fingerprints.json existed and was non-empty but loaded as {}; refusing to overwrite');
}

// 4. SAVE ALL entries back (the full dict, not just the patched subset)
writeFileSync(fpPath, JSON.stringify(all, null, 2));
console.log(`Fingerprints: ${before} → ${Object.keys(all).length}`);
```
The `existedAndNonEmpty && before === 0` guard catches the silent-load-failure case before it corrupts the store. Also check the logged counts: if the total shrinks from N to a small number that matches the batch size, the LOAD step was skipped, so abort rather than persist the wrong dict.
4. Clean up intermediate files:
```bash
rm -rf "$PROJECT_ROOT/.understand-anything/intermediate"
```
5. Report a summary:
- Files checked: N (total changed)
- Structural changes found: N files
- Cosmetic-only changes: N files (skipped)
- Nodes updated: N
- Action taken: PARTIAL_UPDATE / ARCHITECTURE_UPDATE
- Path to output: `$PROJECT_ROOT/.understand-anything/knowledge-graph.json`
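
The counts in the summary can be derived from the Phase 1 output rather than tallied by hand. A small sketch, assuming `fileChanges` in `change-analysis.json` lists every checked file (including unchanged ones) and that the caller supplies the node count. `buildSummary`, `nodesUpdated`, and `graphPath` are hypothetical names.

```javascript
// Builds the step 5 report from the parsed change-analysis.json object.
function buildSummary(analysis, nodesUpdated, graphPath) {
  const count = (level) =>
    analysis.fileChanges.filter((c) => c.changeLevel === level).length;
  return [
    `Files checked: ${analysis.fileChanges.length}`,
    `Structural changes found: ${count('STRUCTURAL')} files`,
    `Cosmetic-only changes: ${count('COSMETIC')} files (skipped)`,
    `Nodes updated: ${nodesUpdated}`,
    `Action taken: ${analysis.action}`,
    `Path to output: ${graphPath}`,
  ].join('\n');
}
```

Because cleanup in step 4 deletes the intermediate directory, the analysis object has to be read before that step runs.
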
---
## Error Handling
- If the fingerprint check script fails: fall back to treating all changed files as STRUCTURAL (conservative approach).
- If `fingerprints.json` doesn't exist: treat all changed files as STRUCTURAL and regenerate fingerprints after the update.
- If a subagent dispatch fails: retry once. If it fails again, save partial results and report the error.
- ALWAYS save partial results — a partially updated graph is better than no update.
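
The retry-once rule and the "always save partial results" rule combine naturally into one loop. In this sketch `dispatch` is a hypothetical async function standing in for a subagent dispatch. How subagents are actually invoked is outside this sketch, and `runBatches` and `withOneRetry` are illustrative names.

```javascript
// Run a task, and if it throws, run it exactly one more time.
async function withOneRetry(task) {
  try {
    return await task();
  } catch {
    return await task();
  }
}

// Dispatch every batch. A batch that fails twice is recorded in `errors`,
// and the remaining batches still run, so partial results are never lost.
async function runBatches(batches, dispatch) {
  const results = [];
  const errors = [];
  for (const [index, batch] of batches.entries()) {
    const batchIndex = index + 1; // batch numbering starts at 1
    try {
      results.push(await withOneRetry(() => dispatch(batch, batchIndex)));
    } catch (error) {
      errors.push({ batchIndex, message: String(error?.message ?? error) });
    }
  }
  return { results, errors };
}
```

The caller would merge whatever is in `results`, save the graph, and then report each entry in `errors`.
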
---
## Notes
- This prompt reuses the same `file-analyzer` and `architecture-analyzer` agent definitions as `/understand` — no separate agent prompts needed.
- The fingerprint comparison in Phase 1 uses regex-based extraction (not tree-sitter) because it runs as a temporary Node.js script and doesn't need full AST accuracy — just signature-level detection.
- The authoritative fingerprints stored in `fingerprints.json` are generated by `/understand` Phase 7 using the core `fingerprint.ts` module (which uses tree-sitter for precise extraction).