Add under-anything knowledge dashboard
@@ -0,0 +1,560 @@
|
||||
# Multi-Platform Simple Implementation Plan
|
||||
|
||||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
|
||||
|
||||
**Goal:** Make Understand-Anything skills work across Codex, OpenClaw, OpenCode, and Cursor — same files everywhere, no build step.
|
||||
|
||||
**Architecture:** Move 5 pipeline agents into `skills/understand/` as prompt templates. Create a reusable `knowledge-graph-guide` agent. Move per-platform config directories to repo root for auto-discovery. Add Cursor and Claude plugin descriptors.
|
||||
|
||||
**Tech Stack:** Markdown (SKILL.md, INSTALL.md), YAML frontmatter, JSON (plugin descriptors), Bash (symlink/clone commands in install docs).
|
||||
|
||||
**Design Doc:** `docs/plans/2026-03-18-multi-platform-simple-design.md`
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Move pipeline agents into skills/understand/ as prompt templates
|
||||
|
||||
**Files:**
|
||||
- Move: `understand-anything-plugin/agents/project-scanner.md` → `understand-anything-plugin/skills/understand/project-scanner-prompt.md`
|
||||
- Move: `understand-anything-plugin/agents/file-analyzer.md` → `understand-anything-plugin/skills/understand/file-analyzer-prompt.md`
|
||||
- Move: `understand-anything-plugin/agents/architecture-analyzer.md` → `understand-anything-plugin/skills/understand/architecture-analyzer-prompt.md`
|
||||
- Move: `understand-anything-plugin/agents/tour-builder.md` → `understand-anything-plugin/skills/understand/tour-builder-prompt.md`
|
||||
- Move: `understand-anything-plugin/agents/graph-reviewer.md` → `understand-anything-plugin/skills/understand/graph-reviewer-prompt.md`
|
||||
|
||||
**Step 1: Copy each agent file to the new location**
|
||||
|
||||
For each of the 5 files, copy from `agents/` to `skills/understand/` with the new name.
|
||||
|
||||
**Step 2: Strip agent frontmatter from the prompt templates**
|
||||
|
||||
Remove the agent-specific YAML frontmatter (`name`, `description`, `tools`, `model`) from each prompt template file, and replace it with a simple Markdown header describing the template's purpose.
|
||||
|
||||
For example, `project-scanner-prompt.md` changes from:
|
||||
|
||||
```markdown
|
||||
---
|
||||
name: project-scanner
|
||||
description: Scans a project directory...
|
||||
tools: Bash, Glob, Grep, Read, Write
|
||||
model: sonnet
|
||||
---
|
||||
|
||||
You are a meticulous project inventory specialist...
|
||||
```
|
||||
|
||||
To:
|
||||
|
||||
```markdown
|
||||
# Project Scanner — Prompt Template
|
||||
|
||||
> Used by `/understand` Phase 1. Dispatch as a subagent with this full content as the prompt.
|
||||
|
||||
You are a meticulous project inventory specialist...
|
||||
```
|
||||
|
||||
Apply this pattern to all 5 files:
|
||||
- `project-scanner-prompt.md` — "Used by `/understand` Phase 1"
|
||||
- `file-analyzer-prompt.md` — "Used by `/understand` Phase 2"
|
||||
- `architecture-analyzer-prompt.md` — "Used by `/understand` Phase 4"
|
||||
- `tour-builder-prompt.md` — "Used by `/understand` Phase 5"
|
||||
- `graph-reviewer-prompt.md` — "Used by `/understand` Phase 6"
|
||||
|
||||
Keep the rest of the file content (the body instructions) exactly as-is.
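
The frontmatter swap in Step 2 is a manual edit in this plan, but it can be sketched as a small transformation. The helper names `stripFrontmatter` and `toPromptTemplate` are hypothetical and not part of the plugin; the sketch assumes the frontmatter is a leading block delimited by `---` lines, as in the example above.

```typescript
// Hypothetical helpers for illustration; the plan itself performs this edit by hand.

/** Remove a leading YAML frontmatter block (`---` ... `---`) if one is present. */
function stripFrontmatter(source: string): string {
  const match = /^---\r?\n[\s\S]*?\r?\n---[ \t]*(\r?\n|$)/.exec(source);
  if (match === null) return source; // no frontmatter: leave the file untouched
  // Drop the frontmatter and any blank lines that directly follow it.
  return source.slice(match[0].length).replace(/^(\r?\n)+/, "");
}

/** Build the template: Markdown header lines, one blank line, then the unchanged body. */
function toPromptTemplate(source: string, headerLines: string[]): string {
  return headerLines.join("\n") + "\n\n" + stripFrontmatter(source);
}
```

Passing the header as a parameter keeps the per-file "Used by `/understand` Phase N" note out of the helper, so the body instructions stay byte-for-byte identical.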
|
||||
|
||||
**Step 3: Delete the original agent files**
|
||||
|
||||
```bash
|
||||
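rm understand-anything-plugin/agents/project-scanner.md \
   understand-anything-plugin/agents/file-analyzer.md \
   understand-anything-plugin/agents/architecture-analyzer.md \
   understand-anything-plugin/agents/tour-builder.md \
   understand-anything-plugin/agents/graph-reviewer.md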
|
||||
```
|
||||
|
||||
**Step 4: Verify the files exist in the new location**
|
||||
|
||||
```bash
|
||||
ls understand-anything-plugin/skills/understand/
|
||||
```
|
||||
|
||||
Expected: `SKILL.md`, plus the 5 `*-prompt.md` files.
|
||||
|
||||
**Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add -A understand-anything-plugin/agents/ understand-anything-plugin/skills/understand/
|
||||
git commit -m "refactor: move pipeline agents into skills/understand/ as prompt templates"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 2: Update SKILL.md dispatch references with context injection
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/skills/understand/SKILL.md`
|
||||
|
||||
**Step 1: Read the current SKILL.md**
|
||||
|
||||
Read `understand-anything-plugin/skills/understand/SKILL.md` in full.
|
||||
|
||||
**Step 2: Update Phase 0 — add context collection**
|
||||
|
||||
After the decision logic table (line ~47), add a new section for collecting project context that will be injected into later phases:
|
||||
|
||||
```markdown
|
||||
7. **Collect project context for subagent injection:**
|
||||
- Read `README.md` (or `README.rst`, `readme.md`) from `$PROJECT_ROOT` if it exists. Store as `$README_CONTENT` (first 3000 characters).
|
||||
- Read the primary package manifest (`package.json`, `pyproject.toml`, `Cargo.toml`, `go.mod`, `pom.xml`) if it exists. Store as `$MANIFEST_CONTENT`.
|
||||
- Capture the top-level directory tree:
|
||||
```bash
|
||||
find "$PROJECT_ROOT" -maxdepth 2 -type f | head -100
|
||||
```
|
||||
Store as `$DIR_TREE`.
|
||||
- Detect the project entry point by checking for common patterns: `src/index.ts`, `src/main.ts`, `src/App.tsx`, `main.py`, `main.go`, `src/main.rs`, `index.js`. Store first match as `$ENTRY_POINT`.
|
||||
```
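
The two non-trivial rules in this context-collection step (README truncation and first-match entry-point detection) can be sketched as below. This is only an illustration: in the plan the agent performs these steps itself, and `truncateReadme`, `detectEntryPoint`, and the `exists` callback are hypothetical names. The candidate list and the 3000-character limit come from the text above.

```typescript
// Candidate paths copied from the plan, in priority order.
const ENTRY_POINT_CANDIDATES: readonly string[] = [
  "src/index.ts", "src/main.ts", "src/App.tsx",
  "main.py", "main.go", "src/main.rs", "index.js",
];

/** Keep only the first `limit` characters of the README (3000 in the plan). */
function truncateReadme(readme: string, limit: number = 3000): string {
  return readme.length > limit ? readme.slice(0, limit) : readme;
}

/**
 * Return the first candidate for which `exists` reports true,
 * or undefined when no candidate is present.
 * `exists` is a caller-supplied check (assumption: relative to the project root).
 */
function detectEntryPoint(exists: (relativePath: string) => boolean): string | undefined {
  return ENTRY_POINT_CANDIDATES.find((candidate) => exists(candidate));
}
```

Because the list is ordered, a project containing both `main.py` and `src/main.ts` resolves to `src/main.ts`.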
|
||||
|
||||
**Step 3: Update Phase 1 dispatch — inject README + manifest**
|
||||
|
||||
Replace the Phase 1 dispatch line:
|
||||
```
|
||||
Dispatch the **project-scanner** agent with this prompt:
|
||||
```
|
||||
|
||||
With:
|
||||
```markdown
|
||||
Dispatch a subagent using the prompt template at `./project-scanner-prompt.md`. Read the template file and pass the full content as the subagent's prompt, appending the following additional context:
|
||||
|
||||
> **Additional context from main session:**
|
||||
>
|
||||
> Project README (first 3000 chars):
|
||||
> ```
|
||||
> $README_CONTENT
|
||||
> ```
|
||||
>
|
||||
> Package manifest:
|
||||
> ```
|
||||
> $MANIFEST_CONTENT
|
||||
> ```
|
||||
>
|
||||
> Use this context to produce more accurate project name, description, and framework detection. The README and manifest are authoritative — prefer their information over heuristics.
|
||||
|
||||
Pass these parameters in the dispatch prompt:
|
||||
```
|
||||
|
||||
**Step 4: Update Phase 2 dispatch — inject scan results + framework context**
|
||||
|
||||
Replace the Phase 2 dispatch paragraph:
|
||||
```
|
||||
For each batch, dispatch a **file-analyzer** agent. Run up to **3 agents concurrently** using parallel dispatch. Each agent gets this prompt:
|
||||
```
|
||||
|
||||
With:
|
||||
```markdown
|
||||
For each batch, dispatch a subagent using the prompt template at `./file-analyzer-prompt.md`. Run up to **3 subagents concurrently** using parallel dispatch. Read the template once, then for each batch pass the full template content as the subagent's prompt, appending the following additional context:
|
||||
|
||||
> **Additional context from main session:**
|
||||
>
|
||||
> Project: `<projectName>` — `<projectDescription>`
|
||||
> Frameworks detected: `<frameworks from Phase 1>`
|
||||
> Languages: `<languages from Phase 1>`
|
||||
>
|
||||
> Framework-specific guidance:
|
||||
> - If React/Next.js: files in `app/` or `pages/` are routes, `components/` are UI, `lib/` or `utils/` are utilities
|
||||
> - If Express/Fastify: files in `routes/` are API endpoints, `middleware/` is middleware, `models/` or `db/` is data
|
||||
> - If Python Django: `views.py` are controllers, `models.py` is data, `urls.py` is routing, `templates/` is UI
|
||||
> - If Go: `cmd/` is entry points, `internal/` is private packages, `pkg/` is public packages
|
||||
>
|
||||
> Use this context to produce more accurate summaries and better classify file roles.
|
||||
|
||||
Fill in batch-specific parameters below and dispatch:
|
||||
```
|
||||
|
||||
**Step 5: Update Phase 4 dispatch — inject framework hints + directory tree**
|
||||
|
||||
Replace the Phase 4 dispatch line:
|
||||
```
|
||||
Dispatch the **architecture-analyzer** agent with this prompt:
|
||||
```
|
||||
|
||||
With:
|
||||
```markdown
|
||||
Dispatch a subagent using the prompt template at `./architecture-analyzer-prompt.md`. Read the template file and pass the full content as the subagent's prompt, appending the following additional context:
|
||||
|
||||
> **Additional context from main session:**
|
||||
>
|
||||
> Frameworks detected: `<frameworks from Phase 1>`
|
||||
>
|
||||
> Directory tree (top 2 levels):
|
||||
> ```
|
||||
> $DIR_TREE
|
||||
> ```
|
||||
>
|
||||
> Framework-specific layer hints:
|
||||
> - If React/Next.js: `app/` or `pages/` → UI Layer, `api/` → API Layer, `lib/` → Service Layer, `components/` → UI Layer
|
||||
> - If Express: `routes/` → API Layer, `controllers/` → Service Layer, `models/` → Data Layer, `middleware/` → Middleware Layer
|
||||
> - If Python Django: `views/` → API Layer, `models/` → Data Layer, `templates/` → UI Layer, `management/` → CLI Layer
|
||||
> - If Go: `cmd/` → Entry Points, `internal/` → Service Layer, `pkg/` → Shared Library, `api/` → API Layer
|
||||
>
|
||||
> Use the directory tree and framework hints to inform layer assignments. Directory structure is strong evidence for layer boundaries.
|
||||
|
||||
Pass these parameters in the dispatch prompt:
|
||||
```
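
The framework-specific layer hints are given to the subagent as prose, but they amount to a lookup table. A minimal sketch, assuming a hypothetical `suggestLayer` helper and lowercase framework keys of my own choosing; the directory-to-layer pairs are copied from the hints above.

```typescript
type Framework = "react" | "express" | "django" | "go";

// Directory-to-layer hints copied from the plan; the lookup itself is hypothetical.
const LAYER_HINTS: Record<Framework, Record<string, string>> = {
  react: { app: "UI Layer", pages: "UI Layer", api: "API Layer", lib: "Service Layer", components: "UI Layer" },
  express: { routes: "API Layer", controllers: "Service Layer", models: "Data Layer", middleware: "Middleware Layer" },
  django: { views: "API Layer", models: "Data Layer", templates: "UI Layer", management: "CLI Layer" },
  go: { cmd: "Entry Points", internal: "Service Layer", pkg: "Shared Library", api: "API Layer" },
};

/** Suggest a layer from the first path segment that has a hint, or undefined if none match. */
function suggestLayer(framework: Framework, filePath: string): string | undefined {
  const hints = LAYER_HINTS[framework];
  for (const segment of filePath.split("/")) {
    // hasOwnProperty guards against inherited keys such as "constructor".
    if (Object.prototype.hasOwnProperty.call(hints, segment)) return hints[segment];
  }
  return undefined;
}
```

The first matching segment wins, so a nested path such as `app/api/route.ts` resolves to the hint for `app`; a real implementation would likely need an explicit rule for such cases.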
|
||||
|
||||
Also add after the "For incremental updates" note:
|
||||
```markdown
|
||||
**Context for incremental updates:** When re-running architecture analysis, also inject the previous layer definitions:
|
||||
|
||||
> Previous layer definitions (for naming consistency):
|
||||
> ```json
|
||||
> [previous layers from existing graph]
|
||||
> ```
|
||||
>
|
||||
> Maintain the same layer names and IDs where possible. Only add/remove layers if the file structure has materially changed.
|
||||
```
|
||||
|
||||
**Step 6: Update Phase 5 dispatch — inject README + entry point**
|
||||
|
||||
Replace the Phase 5 dispatch line:
|
||||
```
|
||||
Dispatch the **tour-builder** agent with this prompt:
|
||||
```
|
||||
|
||||
With:
|
||||
```markdown
|
||||
Dispatch a subagent using the prompt template at `./tour-builder-prompt.md`. Read the template file and pass the full content as the subagent's prompt, appending the following additional context:
|
||||
|
||||
> **Additional context from main session:**
|
||||
>
|
||||
> Project README (first 3000 chars):
|
||||
> ```
|
||||
> $README_CONTENT
|
||||
> ```
|
||||
>
|
||||
> Project entry point: `$ENTRY_POINT`
|
||||
>
|
||||
> Use the README to align the tour narrative with the project's own documentation. Start the tour from the entry point if one was detected. The tour should tell the same story the README tells, but through the lens of actual code structure.
|
||||
|
||||
Pass these parameters in the dispatch prompt:
|
||||
```
|
||||
|
||||
**Step 7: Update Phase 6 dispatch — inject scan results for cross-validation**
|
||||
|
||||
Replace the Phase 6 dispatch line:
|
||||
```
|
||||
2. Dispatch the **graph-reviewer** agent with this prompt:
|
||||
```
|
||||
|
||||
With:
|
||||
```markdown
|
||||
2. Dispatch a subagent using the prompt template at `./graph-reviewer-prompt.md`. Read the template file and pass the full content as the subagent's prompt, appending the following additional context:
|
||||
|
||||
> **Additional context from main session:**
|
||||
>
|
||||
> Phase 1 scan results (file inventory):
|
||||
> ```json
|
||||
> [list of {path, sizeLines} from scan-result.json]
|
||||
> ```
|
||||
>
|
||||
> Phase warnings/errors accumulated during analysis:
|
||||
> - [list any batch failures, skipped files, or warnings from Phases 2-5]
|
||||
>
|
||||
> Cross-validate: every file in the scan inventory should have a corresponding `file:` node in the graph. Flag any missing files. Also flag any graph nodes whose `filePath` doesn't appear in the scan inventory.
|
||||
|
||||
Pass these parameters in the dispatch prompt:
|
||||
```
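
The cross-validation rule in this step is a two-way set comparison. The sketch below is one way to express it, assuming the `file:<relative-path>` ID convention from the knowledge-graph-guide table in Task 3; `crossValidate` and the interface names are hypothetical.

```typescript
interface ScanEntry { path: string; sizeLines: number; }
interface GraphNodeRef { id: string; type: string; filePath?: string; }

interface CrossValidation {
  missingFromGraph: string[]; // scanned files with no `file:` node
  unknownToScan: string[];    // node IDs whose filePath is absent from the inventory
}

function crossValidate(inventory: ScanEntry[], nodes: GraphNodeRef[]): CrossValidation {
  const scanned = new Set(inventory.map((entry) => entry.path));
  const fileNodeIds = new Set(nodes.filter((n) => n.type === "file").map((n) => n.id));

  // Every scanned file should have a node with ID `file:<path>`.
  const missingFromGraph = inventory
    .map((entry) => entry.path)
    .filter((path) => !fileNodeIds.has(`file:${path}`));

  // Every node that names a file should point at a scanned path.
  const unknownToScan = nodes
    .filter((n) => n.filePath !== undefined && !scanned.has(n.filePath))
    .map((n) => n.id);

  return { missingFromGraph, unknownToScan };
}
```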
|
||||
|
||||
**Step 8: Update Error Handling section**
|
||||
|
||||
Change:
|
||||
```
|
||||
- If any agent dispatch fails, retry **once** with the same prompt plus additional context about the failure.
|
||||
```
|
||||
|
||||
To:
|
||||
```
|
||||
- If any subagent dispatch fails, retry **once** with the same prompt plus additional context about the failure.
|
||||
- Track all warnings and errors from each phase in a `$PHASE_WARNINGS` list. Pass this list to the graph-reviewer in Phase 6 for comprehensive validation.
|
||||
```
|
||||
|
||||
**Step 9: Verify no references to named agent dispatch remain**
|
||||
|
||||
Search for "Dispatch the **" in the file — should find 0 results.
|
||||
|
||||
**Step 10: Commit**
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/skills/understand/SKILL.md
|
||||
git commit -m "refactor: update SKILL.md to dispatch subagents with context injection"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 3: Create knowledge-graph-guide agent
|
||||
|
||||
**Files:**
|
||||
- Create: `understand-anything-plugin/agents/knowledge-graph-guide.md`
|
||||
|
||||
**Step 1: Write the agent definition**
|
||||
|
||||
Create `understand-anything-plugin/agents/knowledge-graph-guide.md`:
|
||||
|
||||
```markdown
|
||||
---
|
||||
name: knowledge-graph-guide
|
||||
description: |
|
||||
Use this agent when users need help understanding, querying, or working
|
||||
with an Understand-Anything knowledge graph. Guides users through graph
|
||||
structure, node/edge relationships, layer architecture, tours, and
|
||||
dashboard usage.
|
||||
model: inherit
|
||||
---
|
||||
|
||||
You are an expert on Understand-Anything knowledge graphs. You help users navigate, query, and understand the `knowledge-graph.json` files produced by the `/understand` skill.
|
||||
|
||||
## What You Know
|
||||
|
||||
### Graph Location
|
||||
|
||||
The knowledge graph lives at `<project-root>/.understand-anything/knowledge-graph.json`. Metadata is at `<project-root>/.understand-anything/meta.json`.
|
||||
|
||||
### Graph Structure
|
||||
|
||||
The JSON has this top-level shape:
|
||||
|
||||
```json
|
||||
{
  "version": "1.0.0",
  "project": { "name", "languages", "frameworks", "description", "analyzedAt", "gitCommitHash" },
  "nodes": [...],
  "edges": [...],
  "layers": [...],
  "tour": [...]
}
|
||||
```
|
||||
|
||||
### Node Types (5)
|
||||
|
||||
| Type | ID Convention | Description |
|---|---|---|
| `file` | `file:<relative-path>` | Source file |
| `function` | `func:<relative-path>:<name>` | Function or method |
| `class` | `class:<relative-path>:<name>` | Class, interface, or type |
| `module` | `module:<name>` | Logical module or package |
| `concept` | `concept:<name>` | Abstract concept or pattern |
|
||||
|
||||
### Edge Types (18)
|
||||
|
||||
| Category | Types |
|---|---|
| Structural | `imports`, `exports`, `contains`, `inherits`, `implements` |
| Behavioral | `calls`, `subscribes`, `publishes`, `middleware` |
| Data flow | `reads_from`, `writes_to`, `transforms`, `validates` |
| Dependencies | `depends_on`, `tested_by`, `configures` |
| Semantic | `related`, `similar_to` |
|
||||
|
||||
### Layers
|
||||
|
||||
Layers represent architectural groupings (e.g., API, Service, Data, UI). Each layer has an `id`, `name`, `description`, and `nodeIds` array.
|
||||
|
||||
### Tours
|
||||
|
||||
Tours are guided walkthroughs with sequential steps. Each step has a `title`, `description`, `nodeId` (focus node), and optional `highlightEdges`.
|
||||
|
||||
## How to Help Users
|
||||
|
||||
1. **Finding things**: Help users locate nodes by file path, function name, or concept. Use `jq` or grep on the JSON.
|
||||
2. **Understanding relationships**: Trace edges between nodes to explain dependencies, call chains, and data flow.
|
||||
3. **Architecture overview**: Summarize layers and their contents.
|
||||
4. **Onboarding**: Walk through the tour steps to explain the codebase.
|
||||
5. **Dashboard**: Guide users to run `/understand-dashboard` to visualize the graph interactively.
|
||||
6. **Querying**: Help users write `jq` commands to extract specific information from the graph JSON.
|
||||
```
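
The agent definition suggests `jq` for queries; the same two operations (reading the ID convention and tracing edges) can be sketched in TypeScript for anyone scripting against the graph. `parseNodeId`, `outgoing`, and `GuideEdge` are hypothetical names, and the edge fields `source`, `target`, and `type` are assumed from how edges are described elsewhere in these plans.

```typescript
interface GuideEdge { source: string; target: string; type: string; }

/** Split an ID such as `func:src/auth.ts:login` at the first colon. */
function parseNodeId(id: string): { prefix: string; rest: string } {
  const colon = id.indexOf(":");
  return colon === -1
    ? { prefix: "", rest: id }
    : { prefix: id.slice(0, colon), rest: id.slice(colon + 1) };
}

/** List the outgoing edges of one node, optionally restricted to a single edge type. */
function outgoing(edges: GuideEdge[], nodeId: string, edgeType?: string): GuideEdge[] {
  return edges.filter(
    (e) => e.source === nodeId && (edgeType === undefined || e.type === edgeType),
  );
}
```

Splitting at the first colon only is deliberate: function and class IDs contain a second colon before the symbol name, and file paths themselves may contain further separators.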
|
||||
|
||||
**Step 2: Commit**
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/agents/knowledge-graph-guide.md
|
||||
git commit -m "feat: add knowledge-graph-guide agent for graph navigation and querying"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 4: Move platform INSTALL.md files to repo root
|
||||
|
||||
**Files:**
|
||||
- Move: `understand-anything-plugin/.codex/INSTALL.md` → `.codex/INSTALL.md`
|
||||
- Move: `understand-anything-plugin/.opencode/INSTALL.md` → `.opencode/INSTALL.md`
|
||||
- Move: `understand-anything-plugin/.openclaw/INSTALL.md` → `.openclaw/INSTALL.md`
|
||||
- Delete: `understand-anything-plugin/.cursor/INSTALL.md` (replaced by `.cursor-plugin/plugin.json`)
|
||||
|
||||
**Step 1: Move the three platform directories to root**
|
||||
|
||||
```bash
|
||||
cd "$(git rev-parse --show-toplevel)"  # repository root
git mv understand-anything-plugin/.codex ./.codex
git mv understand-anything-plugin/.opencode ./.opencode
git mv understand-anything-plugin/.openclaw ./.openclaw
|
||||
```
|
||||
|
||||
**Step 2: Delete .cursor/ (replaced by .cursor-plugin/ in Task 5)**
|
||||
|
||||
```bash
|
||||
git rm -r understand-anything-plugin/.cursor/
|
||||
```
|
||||
|
||||
**Step 3: Verify symlink paths are correct**
|
||||
|
||||
Read each INSTALL.md. The symlink paths should reference `understand-anything-plugin/skills` — this is still correct since the skills directory remains inside the plugin wrapper.
|
||||
|
||||
**Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add -A
|
||||
git commit -m "refactor: move platform config directories to repo root for discovery"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 5: Add plugin descriptors
|
||||
|
||||
**Files:**
|
||||
- Create: `.cursor-plugin/plugin.json`
|
||||
- Create: `.claude-plugin/plugin.json`
|
||||
|
||||
**Step 1: Create `.cursor-plugin/plugin.json`**
|
||||
|
||||
```json
|
||||
{
  "name": "understand-anything",
  "displayName": "Understand Anything",
  "description": "AI-powered codebase understanding — analyze, visualize, and explain any project",
  "version": "1.0.5",
  "author": { "name": "Lum1104" },
  "homepage": "https://github.com/Lum1104/Understand-Anything",
  "repository": "https://github.com/Lum1104/Understand-Anything",
  "license": "MIT",
  "keywords": ["codebase-analysis", "knowledge-graph", "architecture", "onboarding", "dashboard"],
  "skills": "./understand-anything-plugin/skills/",
  "agents": "./understand-anything-plugin/agents/"
}
|
||||
```
|
||||
|
||||
Note: paths point into `understand-anything-plugin/` since the source stays nested.
|
||||
|
||||
**Step 2: Create `.claude-plugin/plugin.json`**
|
||||
|
||||
```json
|
||||
{
  "name": "understand-anything",
  "description": "AI-powered codebase understanding — analyze, visualize, and explain any project",
  "version": "1.0.5",
  "author": { "name": "Lum1104" },
  "homepage": "https://github.com/Lum1104/Understand-Anything",
  "repository": "https://github.com/Lum1104/Understand-Anything",
  "license": "MIT",
  "keywords": ["codebase-analysis", "knowledge-graph", "architecture", "onboarding", "dashboard"]
}
|
||||
```
|
||||
|
||||
**Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add .cursor-plugin/ .claude-plugin/plugin.json
|
||||
git commit -m "feat: add Cursor and Claude plugin descriptors for auto-discovery"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 6: Update README with corrected multi-platform URLs
|
||||
|
||||
**Files:**
|
||||
- Modify: `README.md`
|
||||
|
||||
**Step 1: Read current README**
|
||||
|
||||
Read `README.md` in full.
|
||||
|
||||
**Step 2: Update raw GitHub URLs for INSTALL.md files**
|
||||
|
||||
The INSTALL.md files moved from `understand-anything-plugin/.codex/INSTALL.md` to `.codex/INSTALL.md`. Update all raw GitHub URLs:
|
||||
|
||||
```
|
||||
OLD: .../refs/heads/main/understand-anything-plugin/.codex/INSTALL.md
|
||||
NEW: .../refs/heads/main/.codex/INSTALL.md
|
||||
|
||||
OLD: .../refs/heads/main/understand-anything-plugin/.openclaw/INSTALL.md
|
||||
NEW: .../refs/heads/main/.openclaw/INSTALL.md
|
||||
|
||||
OLD: .../refs/heads/main/understand-anything-plugin/.opencode/INSTALL.md
|
||||
NEW: .../refs/heads/main/.opencode/INSTALL.md
|
||||
```
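
The three replacements follow one rule: drop the `understand-anything-plugin/` prefix in front of a moved platform directory. A minimal sketch of that rule, with the hypothetical helper name `rewriteInstallUrl`; it is only an illustration of the pattern, since the plan edits the README by hand.

```typescript
// Platform directories moved to the repo root in Task 4.
const MOVED_PLATFORM_DIRS: readonly string[] = [".codex", ".openclaw", ".opencode"];

/** Rewrite any old nested INSTALL.md path inside `url` to its new root-level location. */
function rewriteInstallUrl(url: string): string {
  let result = url;
  for (const dir of MOVED_PLATFORM_DIRS) {
    // split/join replaces every occurrence without needing regex escaping.
    result = result
      .split(`understand-anything-plugin/${dir}/INSTALL.md`)
      .join(`${dir}/INSTALL.md`);
  }
  return result;
}
```

Paths for `.cursor` are intentionally left alone, because that directory is deleted rather than moved.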
|
||||
|
||||
**Step 3: Replace Cursor section**
|
||||
|
||||
Replace the Cursor AI-driven install section with:
|
||||
|
||||
```markdown
|
||||
### Cursor
|
||||
|
||||
Cursor auto-discovers the plugin via `.cursor-plugin/plugin.json` when this repo is cloned. No manual installation needed — just clone and open in Cursor.
|
||||
```
|
||||
|
||||
**Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add README.md
|
||||
git commit -m "docs: update multi-platform URLs after moving configs to root"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 7: Verify everything works
|
||||
|
||||
**Step 1: Check platform configs at root**
|
||||
|
||||
```bash
|
||||
ls .codex/INSTALL.md .opencode/INSTALL.md .openclaw/INSTALL.md
|
||||
ls .cursor-plugin/plugin.json .claude-plugin/plugin.json
|
||||
```
|
||||
|
||||
All should exist.
|
||||
|
||||
**Step 2: Verify plugin source is intact**
|
||||
|
||||
```bash
|
||||
ls understand-anything-plugin/skills/understand/
|
||||
ls understand-anything-plugin/agents/
|
||||
ls understand-anything-plugin/packages/
|
||||
```
|
||||
|
||||
Skills, agents, and packages should all still exist inside the wrapper.
|
||||
|
||||
**Step 3: Verify no platform configs remain inside the wrapper**
|
||||
|
||||
```bash
|
||||
ls understand-anything-plugin/.codex/ 2>/dev/null # should fail
|
||||
ls understand-anything-plugin/.cursor/ 2>/dev/null # should fail
|
||||
ls understand-anything-plugin/.opencode/ 2>/dev/null # should fail
|
||||
ls understand-anything-plugin/.openclaw/ 2>/dev/null # should fail
|
||||
```
|
||||
|
||||
**Step 4: Run tests**
|
||||
|
||||
```bash
|
||||
pnpm --filter @understand-anything/core build && pnpm --filter @understand-anything/core test
|
||||
```
|
||||
|
||||
All tests should pass — only config files moved, not source code.
|
||||
|
||||
**Step 5: Verify marketplace.json is unchanged**
|
||||
|
||||
```bash
|
||||
grep source .claude-plugin/marketplace.json
|
||||
```
|
||||
|
||||
Expected: `"source": "./understand-anything-plugin"` — unchanged, still correct.
|
||||
|
||||
**Step 6: Verify no stale raw GitHub URLs**
|
||||
|
||||
```bash
|
||||
grep -r "understand-anything-plugin/\." README.md
|
||||
```
|
||||
|
||||
Expected: 0 results (no URLs pointing to old nested platform config locations).
|
||||
@@ -0,0 +1,149 @@
|
||||
# Design: Dashboard Robustness — Permissive Graph Loading
|
||||
|
||||
## Problem
|
||||
|
||||
When the LLM agent produces a knowledge-graph.json that deviates from the strict Zod schema, the dashboard shows a blank screen with cryptic Zod error paths. Users don't know whether it's a system bug or an agent generation issue, and their only recourse is a full re-run of `/understand`.
|
||||
|
||||
## Goals
|
||||
|
||||
1. **Maximize what the user can see** — load valid nodes/edges even if some are broken
|
||||
2. **Clearly communicate generation issues** — amber warnings (not red errors) with copy-paste-friendly messages
|
||||
3. **Empower targeted fixes** — users can copy the issue report and ask their agent to fix specific problems instead of a full re-run
|
||||
|
||||
## Design
|
||||
|
||||
### Four-Tier Robustness Pipeline
|
||||
|
||||
```
|
||||
Raw JSON → Sanitize (Tier 1) → Normalize + Auto-fix (Tier 2) → Validate per-item (Tier 3) → Fatal check (Tier 4) → Dashboard
|
||||
```
|
||||
|
||||
### Tier 1: Sanitize Silently
|
||||
|
||||
Common LLM quirks that are pure noise — fix without reporting.
|
||||
|
||||
| Issue | Fix |
|-------|-----|
| `null` on optional fields (`filePath`, `lineRange`, `description`, `languageNotes`) | Convert to `undefined` |
| Mixed-case enum strings (`"Forward"`, `"SIMPLE"`) | Lowercase before matching |
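
A minimal sketch of the Tier 1 pass, under stated assumptions: `sanitizeItem` is a hypothetical name rather than the actual `schema.ts` function, and the list of enum-valued fields (`type`, `complexity`, `direction`) is inferred from the rest of this design, not stated here.

```typescript
type RawRecord = Record<string, unknown>;

// Optional fields named in the Tier 1 table.
const OPTIONAL_FIELDS: readonly string[] = ["filePath", "lineRange", "description", "languageNotes"];
// Assumption: these are the enum-valued fields worth lowercasing.
const ENUM_FIELDS: readonly string[] = ["type", "complexity", "direction"];

/** Return a cleaned copy; the input object is not mutated and nothing is reported. */
function sanitizeItem(item: RawRecord): RawRecord {
  const clean: RawRecord = { ...item };
  for (const field of OPTIONAL_FIELDS) {
    // Removing the key makes later reads yield undefined instead of null.
    if (clean[field] === null) delete clean[field];
  }
  for (const field of ENUM_FIELDS) {
    const value = clean[field];
    if (typeof value === "string") clean[field] = value.toLowerCase();
  }
  return clean;
}
```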
|
||||
|
||||
### Tier 2: Auto-fix With Info Notice
|
||||
|
||||
Recoverable issues — apply sensible defaults, track as `auto-corrected` issues.
|
||||
|
||||
| Issue | Default | Notes |
|-------|---------|-------|
| Missing `complexity` | `"moderate"` | Most common LLM omission |
| Missing `tags` | `[]` | Empty is valid |
| Missing `weight` | `0.5` | Middle of 0–1 range |
| `weight` as string | Coerce to number | e.g., `"0.8"` → `0.8` |
| Missing `direction` | `"forward"` | Safe default |
| Missing `summary` | Use node `name` | Better than empty |
| `tour: null` / `layers: null` | `[]` | Null vs empty array |
| Complexity aliases | `low/easy→simple`, `medium/intermediate→moderate`, `high/hard→complex` | |
| Direction aliases | `to/outbound→forward`, `from/inbound→backward`, `both→bidirectional` | |
| Existing node/edge type aliases | Already handled by `normalizeGraph` | No change needed |
| Missing node `type` | `"file"` | Safe fallback |
| Missing edge `type` | `"depends_on"` | Generic fallback |
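
To make the "apply a default and track it" pattern concrete, here is a sketch covering three of the node rows above (`complexity`, `tags`, `summary`). It is not the real `schema.ts` code: `autoFixNode`, `AutoFixIssue`, and the category string `"alias"` are hypothetical, and edge defaults would follow the same shape.

```typescript
// Sketch only: names below are illustrative, not the actual schema.ts API.
interface AutoFixIssue {
  level: "auto-corrected";
  category: string;
  message: string;
  path: string;
}

// Alias pairs copied from the Tier 2 table.
const COMPLEXITY_ALIASES = new Map<string, string>([
  ["low", "simple"], ["easy", "simple"],
  ["medium", "moderate"], ["intermediate", "moderate"],
  ["high", "complex"], ["hard", "complex"],
]);

/** Return a fixed copy of the node and append one issue per correction made. */
function autoFixNode(
  node: Record<string, unknown>,
  index: number,
  issues: AutoFixIssue[],
): Record<string, unknown> {
  const fixed: Record<string, unknown> = { ...node };
  const record = (field: string, category: string, message: string): void => {
    issues.push({ level: "auto-corrected", category, message, path: `nodes[${index}].${field}` });
  };

  const complexity = fixed["complexity"];
  if (complexity === undefined) {
    fixed["complexity"] = "moderate";
    record("complexity", "missing-field", 'missing "complexity", defaulted to "moderate"');
  } else if (typeof complexity === "string") {
    const canonical = COMPLEXITY_ALIASES.get(complexity);
    if (canonical !== undefined) {
      fixed["complexity"] = canonical;
      record("complexity", "alias", `"${complexity}" mapped to "${canonical}"`);
    }
  }
  if (fixed["tags"] === undefined) {
    fixed["tags"] = [];
    record("tags", "missing-field", 'missing "tags", defaulted to []');
  }
  if (fixed["summary"] === undefined && typeof fixed["name"] === "string") {
    fixed["summary"] = fixed["name"];
    record("summary", "missing-field", 'missing "summary", defaulted to node name');
  }
  return fixed;
}
```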
|
||||
|
||||
### Tier 3: Drop With Warning
|
||||
|
||||
Can't safely guess — remove the item, track as `dropped` issue.
|
||||
|
||||
| Issue | Action |
|-------|--------|
| Edge references non-existent node ID | Drop edge |
| Node missing `id` | Drop node |
| Node missing `name` | Drop node |
| Edge missing `source` or `target` | Drop edge |
| Unrecognizable `type` value (not in canonical or alias list) | Drop item |
| `weight` not coercible to number | Drop edge |
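
The first four Tier 3 rows can be sketched as a single filtering pass: drop nodes without `id` or `name`, then drop edges that are incomplete or point at a node that did not survive. `dropInvalid` and the interface names are hypothetical; type and weight checks are left out to keep the sketch short.

```typescript
interface DropIssue { level: "dropped"; category: string; message: string; path: string; }
interface CandidateNode { id?: string; name?: string; }
interface CandidateEdge { source?: string; target?: string; }

function dropInvalid<N extends CandidateNode, E extends CandidateEdge>(
  nodes: N[],
  edges: E[],
): { nodes: N[]; edges: E[]; issues: DropIssue[] } {
  const issues: DropIssue[] = [];
  const keptNodes: N[] = [];
  const ids = new Set<string>();

  nodes.forEach((node, i) => {
    const id = node.id;
    const nodeName = node.name;
    if (!id || !nodeName) {
      const field = !id ? "id" : "name";
      issues.push({ level: "dropped", category: "missing-field", message: `missing required "${field}" field`, path: `nodes[${i}]` });
      return;
    }
    ids.add(id);
    keptNodes.push(node);
  });

  const keptEdges: E[] = [];
  edges.forEach((edge, i) => {
    const source = edge.source;
    const target = edge.target;
    if (!source || !target) {
      issues.push({ level: "dropped", category: "missing-field", message: 'missing "source" or "target"', path: `edges[${i}]` });
      return;
    }
    // Edges are checked against surviving nodes only, so dropping a node also drops its edges.
    const dangling = !ids.has(source) ? source : !ids.has(target) ? target : undefined;
    if (dangling !== undefined) {
      issues.push({ level: "dropped", category: "invalid-reference", message: `"${dangling}" does not exist in nodes`, path: `edges[${i}]` });
      return;
    }
    keptEdges.push(edge);
  });

  return { nodes: keptNodes, edges: keptEdges, issues };
}
```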
|
||||
|
||||
### Tier 4: Fatal
|
||||
|
||||
Graph is unsalvageable — show red error banner.
|
||||
|
||||
| Condition | Message |
|-----------|---------|
| 0 valid nodes after filtering | "No valid nodes found in knowledge graph" |
| Missing `project` metadata entirely | "Missing project metadata" |
| Input is not an object / not valid JSON | "Invalid input format" |
|
||||
|
||||
### Return Type
|
||||
|
||||
```typescript
|
||||
interface GraphIssue {
  level: 'auto-corrected' | 'dropped' | 'fatal';
  category: string;  // e.g., "missing-field", "invalid-reference", "type-coercion"
  message: string;   // human-readable, copy-paste friendly
  path?: string;     // e.g., "nodes[3].complexity"
}

interface ValidationResult {
  success: boolean;
  data?: KnowledgeGraph;
  issues: GraphIssue[];
  fatal?: string;
}
|
||||
```
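
As an illustration of how the Tier 4 conditions could populate `success`, `data`, and `fatal`, here is a sketch of the final check. `fatalCheck`, `MinimalGraph`, and `FatalCheckResult` are hypothetical stand-ins (the real `KnowledgeGraph` type is not shown in this document), and the order in which the three conditions are tested is my assumption.

```typescript
// Hypothetical stand-in for the real KnowledgeGraph type.
interface MinimalGraph { project: Record<string, unknown>; nodes: unknown[]; edges: unknown[]; }
interface FatalCheckResult { success: boolean; data?: MinimalGraph; fatal?: string; }

/** Run the Tier 4 checks after Tiers 1 to 3 have produced the surviving nodes and edges. */
function fatalCheck(input: unknown, validNodes: unknown[], validEdges: unknown[]): FatalCheckResult {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { success: false, fatal: "Invalid input format" };
  }
  const project = (input as Record<string, unknown>)["project"];
  if (typeof project !== "object" || project === null) {
    return { success: false, fatal: "Missing project metadata" };
  }
  if (validNodes.length === 0) {
    return { success: false, fatal: "No valid nodes found in knowledge graph" };
  }
  return {
    success: true,
    data: { project: project as Record<string, unknown>, nodes: validNodes, edges: validEdges },
  };
}
```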
|
||||
|
||||
### Dashboard UI: WarningBanner Component
|
||||
|
||||
**New component** in `packages/dashboard/src/components/WarningBanner.tsx`.
|
||||
|
||||
**Visual design:**
|
||||
- **Amber/gold theme** — `bg-amber-900/20`, `border-amber-700`, `text-amber-200`
|
||||
- Matches dashboard's gold accent aesthetic; signals "generation quality issue" not "system crash"
|
||||
- **Collapsed by default** — summary line: "Knowledge graph loaded with 5 auto-corrections and 2 dropped items"
|
||||
- **Expandable** — click to reveal categorized issue list
|
||||
- **Copy button** — one-click copies the full issue report as a pre-formatted message
|
||||
- **Actionable footer** — tells users to copy issues and ask their agent to fix them
|
||||
|
||||
**Copy-paste output format:**
|
||||
```
|
||||
The following issues were found in your knowledge-graph.json.
|
||||
These are LLM generation errors — not a system bug.
|
||||
You can ask your agent to fix these specific issues in the knowledge-graph.json file:
|
||||
|
||||
[Auto-corrected] nodes[3] ("AuthService"): missing "complexity" — defaulted to "moderate"
|
||||
[Auto-corrected] nodes[7] ("utils.ts"): missing "tags" — defaulted to []
|
||||
[Auto-corrected] edges[12]: weight was string "0.8" — coerced to number
|
||||
[Dropped] edges[5]: target "file:src/nonexistent.ts" does not exist in nodes
|
||||
[Dropped] nodes[14]: missing required "id" field — cannot recover
|
||||
```
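
The collapsed summary line and the per-issue lines of the report can both be derived from the issue list. The sketch below assumes hypothetical helpers `summaryLine` and `formatIssueLines`; the fixed three-line preamble of the report is left to the caller, and the singular/plural handling is my addition.

```typescript
interface ReportIssue {
  level: "auto-corrected" | "dropped" | "fatal";
  message: string;
  path?: string;
}

const LEVEL_LABELS: Record<ReportIssue["level"], string> = {
  "auto-corrected": "Auto-corrected",
  dropped: "Dropped",
  fatal: "Fatal",
};

const countWith = (count: number, singular: string, pluralForm: string): string =>
  `${count} ${count === 1 ? singular : pluralForm}`;

/** One-line summary shown while the banner is collapsed. */
function summaryLine(issues: ReportIssue[]): string {
  const fixed = issues.filter((i) => i.level === "auto-corrected").length;
  const dropped = issues.filter((i) => i.level === "dropped").length;
  return `Knowledge graph loaded with ${countWith(fixed, "auto-correction", "auto-corrections")}` +
    ` and ${countWith(dropped, "dropped item", "dropped items")}`;
}

/** One `[Level] path: message` line per issue, ready to join with newlines for the clipboard. */
function formatIssueLines(issues: ReportIssue[]): string[] {
  return issues.map((issue) => {
    const location = issue.path !== undefined ? `${issue.path}: ` : "";
    return `[${LEVEL_LABELS[issue.level]}] ${location}${issue.message}`;
  });
}
```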
|
||||
|
||||
**Fatal errors** stay red (`bg-red-900/30`) with message: "Knowledge graph is unsalvageable: [reason]. Please re-run `/understand` to generate a new one."
|
||||
|
||||
**Existing red error banner** for network/JSON-parse errors stays as-is (those ARE system/infra issues).
|
||||
|
||||
### App.tsx Changes
|
||||
|
||||
- On `result.success === true` with `result.issues.length > 0`: show `WarningBanner` with issues, load graph normally
|
||||
- On `result.fatal`: show existing red banner with fatal message
|
||||
- `console.warn` for auto-corrected items, `console.error` for dropped items
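
The banner choice described above reduces to a small decision function. This is a framework-free sketch, not the actual `App.tsx` code: `chooseBanner`, `LoadResult`, and `Banner` are hypothetical names, and treating an unsuccessful result without a `fatal` string as fatal is my assumption, since the design does not cover that case.

```typescript
interface LoadIssue { level: "auto-corrected" | "dropped" | "fatal"; message: string; }
interface LoadResult { success: boolean; issues: LoadIssue[]; fatal?: string; }

type Banner =
  | { kind: "none" }
  | { kind: "warning"; issues: LoadIssue[] } // amber WarningBanner, graph still loads
  | { kind: "fatal"; message: string };      // existing red banner

function chooseBanner(result: LoadResult): Banner {
  if (result.fatal !== undefined || !result.success) {
    // Assumption: a failed load with no reason falls back to a generic one.
    const reason = result.fatal ?? "unknown error";
    return {
      kind: "fatal",
      message: "Knowledge graph is unsalvageable: " + reason +
        ". Please re-run /understand to generate a new one.",
    };
  }
  if (result.issues.length > 0) return { kind: "warning", issues: result.issues };
  return { kind: "none" };
}
```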
|
||||
|
||||
### Test Coverage
|
||||
|
||||
All in `packages/core/src/__tests__/schema.test.ts`:
|
||||
|
||||
- **Tier 1:** `null` optional fields silently become `undefined`
|
||||
- **Tier 2:** Missing `complexity`/`tags`/`weight`/`direction`/`summary` get defaults; issues tracked
|
||||
- **Tier 2:** String `weight` coerced; complexity/direction aliases mapped
|
||||
- **Tier 3:** Dangling edge references dropped; nodes missing `id` dropped; issues recorded
|
||||
- **Tier 4:** Empty graph after filtering → fatal; missing `project` → fatal
|
||||
- **Integration:** Graph with mixed good/bad nodes → loads with correct node count + correct issues list
|
||||
|
||||
### Files Changed
|
||||
|
||||
| File | Change |
|------|--------|
| `packages/core/src/schema.ts` | Sanitize, expanded normalize, permissive validate, new types |
| `packages/dashboard/src/components/WarningBanner.tsx` | New component |
| `packages/dashboard/src/App.tsx` | Wire issues to WarningBanner |
| `packages/core/src/__tests__/schema.test.ts` | Tests for all tiers |
|
||||
|
||||
### Files NOT Changed
|
||||
|
||||
- Agent prompts (can be tightened later as a separate effort)
|
||||
- GraphView / store logic (they already handle valid `KnowledgeGraph` objects)
|
||||
- Existing node/edge type alias maps (preserved; the new alias handling is added alongside them)
|
||||
@@ -0,0 +1,971 @@
|
||||
# Token Reduction Implementation Plan
|
||||
|
||||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
|
||||
|
||||
**Goal:** Reduce `/understand` token cost by ~85% on large codebases through import pre-resolution, batch consolidation, addendum removal, payload slimming, and gating the LLM reviewer.
|
||||
|
||||
**Architecture:** Five changes (C5 → C4 → C3 → C1+C2) applied in rollout order — lowest risk first. All changes are to prompt/skill markdown files in `understand-anything-plugin/skills/understand/`. No TypeScript source changes required.
|
||||
|
||||
**Tech Stack:** Markdown skill files, Node.js inline scripts embedded in SKILL.md, knowledge-graph JSON pipeline.
|
||||
|
||||
**Design doc:** `docs/plans/2026-03-27-token-reduction-design.md`
|
||||
|
||||
---
|
||||
|
||||
## Task 1: C5 — Gate graph-reviewer behind `--review` flag
|
||||
|
||||
Replaces the always-on LLM graph-reviewer subagent with a deterministic inline validation script. The LLM reviewer only runs when `--review` is in `$ARGUMENTS`. Saves ~58,500 tokens per default run.
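
The gate itself is a token check on the argument string. A minimal sketch, assuming `$ARGUMENTS` arrives as a single whitespace-separated string; `hasReviewFlag` and `validationPath` are hypothetical names, and in the plan the agent performs this check from the SKILL.md instructions rather than in code.

```typescript
/** True only when `--review` appears as its own whitespace-separated token. */
function hasReviewFlag(argumentsText: string): boolean {
  return argumentsText.split(/\s+/).some((token) => token === "--review");
}

/** Pick the Phase 6 path: the LLM reviewer only when explicitly requested. */
function validationPath(argumentsText: string): "llm-reviewer" | "inline-validation" {
  return hasReviewFlag(argumentsText) ? "llm-reviewer" : "inline-validation";
}
```

Matching whole tokens rather than substrings avoids treating an unrelated argument such as `--reviewer` as the flag.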
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/skills/understand/SKILL.md` (Phase 6, lines 330–362)
|
||||
|
||||
### Step 1: Open SKILL.md and locate Phase 6
|
||||
|
||||
Read the file and find "## Phase 6 — REVIEW" (line 297). Identify steps 3–6 (lines 330–362) which currently always dispatch the LLM graph-reviewer subagent.
|
||||
|
||||
### Step 2: Replace Phase 6 steps 3–6 with conditional reviewer logic
|
||||
|
||||
Replace lines 330–362 (from "3. Dispatch a subagent using the prompt template" through "6. **If `approved: true`:** Proceed to Phase 7.") with:
|
||||
|
||||
```markdown
|
||||
3. **Check `$ARGUMENTS` for `--review` flag.** Then run the appropriate validation path:
|
||||
|
||||
---
|
||||
|
||||
#### Default path (no `--review`): inline deterministic validation
|
||||
|
||||
Write the following Node.js script to `$PROJECT_ROOT/.understand-anything/tmp/ua-inline-validate.js`:
|
||||
|
||||
```javascript
#!/usr/bin/env node
// Deterministic graph validation: writes { issues, warnings, stats } to outputPath.
const fs = require('fs');
const graphPath = process.argv[2];
const outputPath = process.argv[3];
try {
  const graph = JSON.parse(fs.readFileSync(graphPath, 'utf8'));
  const issues = [], warnings = [];
  const nodeIds = new Set();
  const seen = new Map(); // node id -> first index, for duplicate detection

  // Nodes: required fields and duplicate IDs.
  graph.nodes.forEach((n, i) => {
    if (!n.id) { issues.push(`Node[${i}] missing id`); return; }
    if (!n.type) issues.push(`Node[${i}] '${n.id}' missing type`);
    if (!n.name) issues.push(`Node[${i}] '${n.id}' missing name`);
    if (!n.summary) issues.push(`Node[${i}] '${n.id}' missing summary`);
    if (!n.tags || !n.tags.length) issues.push(`Node[${i}] '${n.id}' missing tags`);
    if (seen.has(n.id)) issues.push(`Duplicate node ID '${n.id}' at indices ${seen.get(n.id)} and ${i}`);
    else seen.set(n.id, i);
    nodeIds.add(n.id);
  });

  // Edges: both endpoints must reference existing nodes.
  graph.edges.forEach((e, i) => {
    if (!nodeIds.has(e.source)) issues.push(`Edge[${i}] source '${e.source}' not found`);
    if (!nodeIds.has(e.target)) issues.push(`Edge[${i}] target '${e.target}' not found`);
  });

  // Layers: references must exist, and each node may be assigned only once.
  const fileNodes = graph.nodes.filter(n => n.type === 'file').map(n => n.id);
  const assigned = new Map(); // node id -> layer id
  (graph.layers || []).forEach(layer => {
    (layer.nodeIds || []).forEach(id => {
      if (!nodeIds.has(id)) issues.push(`Layer '${layer.id}' refs missing node '${id}'`);
      if (assigned.has(id)) issues.push(`Node '${id}' appears in multiple layers`);
      assigned.set(id, layer.id);
    });
  });

  // Every file node must belong to a layer.
  fileNodes.forEach(id => {
    if (!assigned.has(id)) issues.push(`File node '${id}' not in any layer`);
  });

  // Tour steps: references must exist.
  (graph.tour || []).forEach((step, i) => {
    (step.nodeIds || []).forEach(id => {
      if (!nodeIds.has(id)) issues.push(`Tour step[${i}] refs missing node '${id}'`);
    });
  });

  // Orphans (nodes with no edges) are warnings, not issues.
  const withEdges = new Set([
    ...graph.edges.map(e => e.source),
    ...graph.edges.map(e => e.target)
  ]);
  graph.nodes.forEach(n => {
    if (!withEdges.has(n.id)) warnings.push(`Node '${n.id}' has no edges (orphan)`);
  });

  // Summary counts for the final report.
  const stats = {
    totalNodes: graph.nodes.length,
    totalEdges: graph.edges.length,
    totalLayers: (graph.layers || []).length,
    tourSteps: (graph.tour || []).length,
    nodeTypes: graph.nodes.reduce((a, n) => { a[n.type] = (a[n.type]||0)+1; return a; }, {}),
    edgeTypes: graph.edges.reduce((a, e) => { a[e.type] = (a[e.type]||0)+1; return a; }, {})
  };
  fs.writeFileSync(outputPath, JSON.stringify({ issues, warnings, stats }, null, 2));
  process.exit(0);
} catch (err) { process.stderr.write(err.message + '\n'); process.exit(1); }
```
|
||||
|
||||
Execute it:
|
||||
```bash
|
||||
node $PROJECT_ROOT/.understand-anything/tmp/ua-inline-validate.js \
|
||||
"$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json" \
|
||||
"$PROJECT_ROOT/.understand-anything/intermediate/review.json"
|
||||
```
|
||||
|
||||
If the script exits non-zero, read stderr, fix the script, and retry once.
|
||||
|
||||
---
|
||||
|
||||
#### `--review` path: full LLM reviewer
|
||||
|
||||
If `--review` IS in `$ARGUMENTS`, dispatch the LLM graph-reviewer subagent as follows:
|
||||
|
||||
Dispatch a subagent using the prompt template at `./graph-reviewer-prompt.md`. Read the template file and pass the full content as the subagent's prompt, appending the following additional context:
|
||||
|
||||
> **Additional context from main session:**
|
||||
>
|
||||
> Phase 1 scan results (file inventory):
|
||||
> ```json
|
||||
> [list of {path, sizeLines} from scan-result.json]
|
||||
> ```
|
||||
>
|
||||
> Phase warnings/errors accumulated during analysis:
|
||||
> - [list any batch failures, skipped files, or warnings from Phases 2-5]
|
||||
>
|
||||
> Cross-validate: every file in the scan inventory should have a corresponding `file:` node in the graph. Flag any missing files. Also flag any graph nodes whose `filePath` doesn't appear in the scan inventory.
|
||||
|
||||
Pass these parameters in the dispatch prompt:
|
||||
|
||||
> Validate the knowledge graph at `$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json`.
|
||||
> Project root: `$PROJECT_ROOT`
|
||||
> Read the file and validate it for completeness and correctness.
|
||||
> Write output to: `$PROJECT_ROOT/.understand-anything/intermediate/review.json`
|
||||
|
||||
---
|
||||
|
||||
4. Read `$PROJECT_ROOT/.understand-anything/intermediate/review.json`.
|
||||
|
||||
5. **If `issues` array is non-empty:**
|
||||
- Review the `issues` list
|
||||
- Apply automated fixes where possible:
|
||||
- Remove edges with dangling references
|
||||
- Fill missing required fields with sensible defaults (e.g., empty `tags` -> `["untagged"]`, empty `summary` -> `"No summary available"`)
|
||||
- Remove nodes with invalid types
|
||||
- Re-run the final graph validation after automated fixes
|
||||
- If critical issues remain after one fix attempt, save the graph anyway but include the warnings in the final report and mark dashboard auto-launch as skipped
|
||||
|
||||
6. **If `issues` array is empty:** Proceed to Phase 7.
|
||||
```
|
||||
|
||||
### Step 3: Verify the edit
|
||||
|
||||
Re-read SKILL.md lines 297–380 and confirm:
|
||||
- Phase 6 step 3 now checks for `--review` flag
|
||||
- The inline validation script is present and complete
|
||||
- The `--review` path still dispatches the LLM subagent identically to before
|
||||
- Steps 4–6 handle the `review.json` output the same way as before
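The automated fixes listed in step 5 of the replacement text are described only in prose. A minimal sketch of what they might look like follows. The function name `applyAutomatedFixes` and the `validTypes` parameter are hypothetical and not part of SKILL.md; the defaults (`["untagged"]`, `"No summary available"`) are taken from the plan text.

```javascript
// Hypothetical helper, not part of SKILL.md: returns a repaired copy of the graph.
// validTypes is an assumed, caller-supplied Set of allowed node types (optional).
function applyAutomatedFixes(graph, validTypes) {
  const nodes = graph.nodes
    // Remove nodes with invalid types (only when a type list is supplied).
    .filter(n => !validTypes || validTypes.has(n.type))
    // Fill missing required fields with the defaults named in the plan.
    .map(n => ({
      ...n,
      tags: n.tags && n.tags.length ? n.tags : ['untagged'],
      summary: n.summary ? n.summary : 'No summary available',
    }));

  // Remove edges with dangling references (including edges to removed nodes).
  const ids = new Set(nodes.map(n => n.id));
  const edges = graph.edges.filter(e => ids.has(e.source) && ids.has(e.target));

  return { ...graph, nodes, edges };
}
```

After a pass like this, the inline validation script would be re-run once, as step 5 requires.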
|
||||
|
||||
### Step 4: Commit
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/skills/understand/SKILL.md
|
||||
git commit -m "perf(understand): gate LLM graph-reviewer behind --review flag, add inline deterministic validation"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 2: C4a — Slim Phase 4 (architecture) node payload
|
||||
|
||||
Removes `name` from the file node format injected into the architecture-analyzer subagent and makes the omission of `complexity` and `languageNotes` explicit. These fields are not needed for architectural layer assignment and add unnecessary tokens.
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/skills/understand/SKILL.md` (Phase 4, around line 188–196)
|
||||
|
||||
### Step 1: Locate the Phase 4 dispatch prompt in SKILL.md
|
||||
|
||||
Find the block starting "Pass these parameters in the dispatch prompt:" under Phase 4 (around line 181). Look for:
|
||||
|
||||
```
|
||||
> File nodes:
|
||||
> ```json
|
||||
> [list of {id, name, filePath, summary, tags} for all file-type nodes]
|
||||
> ```
|
||||
```
|
||||
|
||||
### Step 2: Update the file node format
|
||||
|
||||
Change the file nodes line from:
|
||||
```
|
||||
> [list of {id, name, filePath, summary, tags} for all file-type nodes]
|
||||
```
|
||||
|
||||
To:
|
||||
```
|
||||
> [list of {id, filePath, summary, tags} for all file-type nodes — omit name, complexity, languageNotes]
|
||||
```
|
||||
|
||||
### Step 3: Verify
|
||||
|
||||
Re-read Phase 4 and confirm the node format line is updated. The import edges line below it (`[list of edges with type "imports"]`) should be unchanged.
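For illustration, the slimmed Phase 4 payload corresponds to a projection like the one below. This is a sketch only: the main session builds the payload from the prompt instructions, and `slimFileNodes` is a hypothetical name.

```javascript
// Hypothetical helper: keep only the fields Phase 4 needs for layer assignment.
function slimFileNodes(nodes) {
  return nodes
    .filter(n => n.type === 'file')
    // name, complexity, languageNotes and any other fields are dropped.
    .map(({ id, filePath, summary, tags }) => ({ id, filePath, summary, tags }));
}
```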
|
||||
|
||||
### Step 4: Commit
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/skills/understand/SKILL.md
|
||||
git commit -m "perf(understand): slim Phase 4 architecture payload — drop redundant node fields"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 3: C4b — Slim Phase 5 (tour builder) payload
|
||||
|
||||
Phase 5 currently injects all nodes (including function/class), all edge types, and full layer objects (with `nodeIds` arrays). Only file nodes, `imports` and `calls` edges, and slim layers (without `nodeIds`) are needed for tour design. This is the largest single payload change, saving ~105,000 tokens on a 500-file project.
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/skills/understand/SKILL.md` (Phase 5, lines 257–270)
|
||||
- Modify: `understand-anything-plugin/skills/understand/tour-builder-prompt.md` (input schema)
|
||||
|
||||
### Step 1: Locate the Phase 5 dispatch prompt in SKILL.md
|
||||
|
||||
Find the block starting with (around line 257):
|
||||
```
|
||||
> Nodes (summarized):
|
||||
> ```json
|
||||
> [list of {id, name, filePath, summary, type} for key nodes]
|
||||
> ```
|
||||
>
|
||||
> Layers:
|
||||
> ```json
|
||||
> [layers from Phase 4]
|
||||
> ```
|
||||
>
|
||||
> Key edges:
|
||||
> ```json
|
||||
> [imports and calls edges]
|
||||
> ```
|
||||
```
|
||||
|
||||
### Step 2: Replace all three payload sections
|
||||
|
||||
Replace those lines with:
|
||||
|
||||
```markdown
|
||||
> Nodes (file nodes only):
|
||||
> ```json
|
||||
> [list of {id, name, filePath, summary, type} for file-type nodes ONLY — do NOT include function or class nodes]
|
||||
> ```
|
||||
>
|
||||
> Layers:
|
||||
> ```json
|
||||
> [list of {id, name, description} for each layer — omit nodeIds]
|
||||
> ```
|
||||
>
|
||||
> Edges (imports and calls only):
|
||||
> ```json
|
||||
> [list of edges where type is "imports" or "calls" only — exclude all other edge types]
|
||||
> ```
|
||||
```
|
||||
|
||||
### Step 3: Update tour-builder-prompt.md input schema
|
||||
|
||||
Open `tour-builder-prompt.md` and find the "Script Requirements" section (around line 18–35). The input schema currently shows:
|
||||
```json
|
||||
{
|
||||
"nodes": [...],
|
||||
"edges": [...],
|
||||
"layers": [
|
||||
{"id": "layer:core", "name": "Core", "nodeIds": ["file:src/index.ts"]}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Update the layers example to reflect the slim format:
|
||||
```json
|
||||
{
|
||||
"nodes": [
|
||||
{"id": "file:src/index.ts", "type": "file", "name": "index.ts", "filePath": "src/index.ts", "summary": "..."}
|
||||
],
|
||||
"edges": [
|
||||
{"source": "file:src/index.ts", "target": "file:src/utils.ts", "type": "imports"}
|
||||
],
|
||||
"layers": [
|
||||
{"id": "layer:core", "name": "Core", "description": "Core application logic"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Also update the "G. Node Summary Index" description (around line 84) to reflect that input nodes are file-type only:
|
||||
|
||||
Find:
|
||||
```
|
||||
**G. Node Summary Index**
|
||||
|
||||
Create a lookup of each node ID to its `summary`, `type`, `tags` (default to empty array `[]` if not present in input), and `name` for easy reference.
|
||||
```
|
||||
|
||||
Add a note after it:
|
||||
```
|
||||
Note: input nodes are file-type only. The nodeSummaryIndex will contain only file nodes.
|
||||
```
|
||||
|
||||
### Step 4: Verify
|
||||
|
||||
- Re-read the SKILL.md Phase 5 payload block and confirm it specifies file-only nodes, slim layers (no `nodeIds`), and `imports` + `calls` edges only
|
||||
- Re-read tour-builder-prompt.md input schema: layers no longer have nodeIds
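The three payload sections from Step 2 can be pictured as a single filter over the assembled graph. The sketch below assumes the graph shape used by the validation script in Task 1 (`nodes`, `edges`, `layers`); `buildTourPayload` is a hypothetical name, not something defined in SKILL.md.

```javascript
// Hypothetical helper: derive the slim Phase 5 payload from an assembled graph.
function buildTourPayload(graph) {
  // File nodes only; function and class nodes are excluded.
  const nodes = graph.nodes
    .filter(n => n.type === 'file')
    .map(({ id, name, filePath, summary, type }) => ({ id, name, filePath, summary, type }));

  // Slim layers: nodeIds is omitted.
  const layers = (graph.layers || [])
    .map(({ id, name, description }) => ({ id, name, description }));

  // Only "imports" and "calls" edges are kept.
  const edges = graph.edges.filter(e => e.type === 'imports' || e.type === 'calls');

  return { nodes, layers, edges };
}
```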
|
||||
|
||||
### Step 5: Commit
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/skills/understand/SKILL.md \
|
||||
understand-anything-plugin/skills/understand/tour-builder-prompt.md
|
||||
git commit -m "perf(understand): slim Phase 5 tour payload — file nodes only, imports+calls edges, slim layers"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 4: C3 — Remove language/framework addendums from file-analyzer batches
|
||||
|
||||
The addendums (`languages/typescript.md`, `frameworks/react.md`, etc.) are currently injected into every file-analyzer batch prompt. They cost ~1,300 tokens × N batches. The model already knows these languages. Replace with a compact inline reference table (~150 tokens, paid once, embedded in the base template).
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/skills/understand/SKILL.md` (Phase 2, lines 104–117)
|
||||
- Modify: `understand-anything-plugin/skills/understand/file-analyzer-prompt.md` (add quick reference section)
|
||||
|
||||
### Step 1: Update the "Build the combined prompt template" block in SKILL.md Phase 2
|
||||
|
||||
Find the block at lines 104–117:
|
||||
```
|
||||
**Build the combined prompt template:**
|
||||
1. Read the base template at `./file-analyzer-prompt.md`.
|
||||
2. **Language context injection:** ...
|
||||
3. **Framework addendum injection:** ...
|
||||
|
||||
Then for each batch pass the combined template content as the subagent's prompt, appending the following additional context:
|
||||
|
||||
> **Additional context from main session:**
|
||||
>
|
||||
> Project: `<projectName>` — `<projectDescription>`
|
||||
> Frameworks detected: `<frameworks from Phase 1>`
|
||||
> Languages: `<languages from Phase 1>`
|
||||
>
|
||||
> Use the language context and framework addendums (appended above) to produce more accurate summaries and better classify file roles.
|
||||
```
|
||||
|
||||
Replace it with:
|
||||
```markdown
|
||||
**Build the prompt for each batch:**
|
||||
1. Read the base template at `./file-analyzer-prompt.md`. (Language and framework hints are embedded in the template — do NOT append addendum files for Phase 2 batches. Addendums are reserved for Phase 4.)
|
||||
|
||||
Then for each batch pass the template content as the subagent's prompt, appending the following additional context:
|
||||
|
||||
> **Additional context from main session:**
|
||||
>
|
||||
> Project: `<projectName>` — `<projectDescription>`
|
||||
> Languages: `<languages from Phase 1>`
|
||||
```
|
||||
|
||||
This removes steps 2 and 3 (the addendum injection loops) entirely from Phase 2.
|
||||
|
||||
### Step 2: Add Language and Framework Quick Reference to file-analyzer-prompt.md
|
||||
|
||||
Open `file-analyzer-prompt.md`. Find the "## Critical Constraints" section near the bottom (around line 299). Insert the following new section **before** "## Critical Constraints":
|
||||
|
||||
```markdown
|
||||
## Language and Framework Quick Reference
|
||||
|
||||
Use these hints to improve tag and edge accuracy for common patterns. Your training knowledge covers these — this is a fast lookup for the most impactful signals.
|
||||
|
||||
**Tag signals:**
|
||||
|
||||
| Signal | Tags to apply |
|
||||
|---|---|
|
||||
| File in `hooks/`, exports a function starting with `use` | `hook`, `service` |
|
||||
| File in `contexts/` or `context/`, exports a Provider component | `service`, `state` |
|
||||
| File in `pages/` or `views/` | `ui`, `routing` |
|
||||
| File in `store/`, `slices/`, `reducers/`, `state/` | `state` |
|
||||
| File in `services/`, `api/`, `client/` | `service` |
|
||||
| `__init__.py` at a package root with re-exports | `entry-point`, `barrel` |
|
||||
| `manage.py` at the project root | `entry-point` |
|
||||
| `mod.rs` in a directory | `barrel` |
|
||||
| `main.go` in a `cmd/` subdirectory | `entry-point` |
|
||||
|
||||
**Edge signals:**
|
||||
|
||||
| Pattern | Edge to create |
|
||||
|---|---|
|
||||
| React component renders another component in its JSX | `contains` from parent to child |
|
||||
| Component/hook calls a custom hook (`useX`) | `depends_on` from consumer to hook file |
|
||||
| Context provider wraps components | `publishes` from provider to context definition |
|
||||
| Component calls `useContext` or custom context hook | `subscribes` from consumer to context definition |
|
||||
| Python file uses `from x import y` where x is a project file | `imports` edge (same rule as JS/TS) |
|
||||
| Go file `import`s an internal package path | `imports` edge to the resolved file |
|
||||
|
||||
```
|
||||
|
||||
### Step 3: Verify
|
||||
|
||||
- Re-read SKILL.md Phase 2 "Build the prompt" block: steps 2 and 3 (addendum loops) are gone; "Frameworks detected" line in additional context is gone
|
||||
- Re-read file-analyzer-prompt.md: new "Language and Framework Quick Reference" section appears before Critical Constraints; no reference to addendum files
|
||||
- Confirm Phase 4 "Build the combined prompt template" (lines 163–167) is **unchanged** — addendums still apply there
|
||||
|
||||
### Step 4: Commit
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/skills/understand/SKILL.md \
|
||||
understand-anything-plugin/skills/understand/file-analyzer-prompt.md
|
||||
git commit -m "perf(understand): remove addendum injection from Phase 2 batches, add compact inline hints to file-analyzer"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 5: C1a — Extend scanner to pre-resolve imports
|
||||
|
||||
Adds a new Step 8 to the project scanner script: parse import statements from every source file and resolve relative imports against the discovered file list. The resolved map is written into `scan-result.json` as `importMap`. This is the data that lets us eliminate `allProjectFiles` from every batch in Task 7.
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/skills/understand/project-scanner-prompt.md`
|
||||
|
||||
### Step 1: Add Step 8 to the scanner script requirements
|
||||
|
||||
Open `project-scanner-prompt.md`. Find "**Step 7 -- Project Name**" (around line 100). After its content (the priority list), add a new step:
|
||||
|
||||
```markdown
|
||||
**Step 8 -- Import Resolution**
|
||||
|
||||
For each file in the discovered source list, extract and resolve relative import statements. The goal is to produce a map from each file's path to the list of project-internal files it imports. External package imports are ignored.
|
||||
|
||||
For each file, read its content and extract import paths using language-appropriate patterns:
|
||||
|
||||
| Language | Import patterns to match |
|
||||
|---|---|
|
||||
| TypeScript/JavaScript | `import ... from './...'` or `'../'`, `require('./...')` or `require('../...')` |
|
||||
| Python | `from .x import y`, `from ..x import y`, `from . import x` (relative only) |
|
||||
| Go | Paths in `import (...)` blocks that start with the module path from `go.mod` |
|
||||
| Rust | `use crate::`, `use super::`, `mod x` (within the same crate) |
|
||||
| Java/Kotlin | Not resolvable by path — skip import resolution for these languages |
|
||||
| Ruby | `require_relative '...'` paths |
|
||||
|
||||
For each extracted import path:
|
||||
1. Compute the resolved file path relative to project root:
|
||||
- For relative imports (`./x`, `../x`): resolve from the importing file's directory
|
||||
- Try these extension variants in order if the import has no extension: `.ts`, `.tsx`, `.js`, `.jsx`, `/index.ts`, `/index.js`, `/index.tsx`, `/index.jsx`, `.py`, `.go`, `.rs`, `.rb`
|
||||
2. Check if the resolved path exists in the discovered file list
|
||||
3. If yes: add to this file's resolved imports list
|
||||
4. If no: skip (external, unresolvable, or dynamic import)
|
||||
|
||||
Output format in the script result:
|
||||
```json
|
||||
"importMap": {
|
||||
"src/index.ts": ["src/utils.ts", "src/config.ts"],
|
||||
"src/utils.ts": [],
|
||||
"src/components/App.tsx": ["src/hooks/useAuth.ts", "src/store/index.ts"]
|
||||
}
|
||||
```
|
||||
|
||||
Keys are project-relative paths. Values are arrays of resolved project-relative paths. Every file in the discovered file list must appear as a key in `importMap` (use an empty array `[]` if no imports were resolved). External packages and unresolvable imports are omitted entirely.
|
||||
```
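The resolution rules in Step 8 might be sketched as follows for relative specifiers. This is an illustration under stated assumptions, not the scanner's actual script: `resolveRelativeImport` and `buildImportMap` are hypothetical names, `rawImports` stands in for specifiers already extracted by the language-specific patterns, and Go module paths and Rust `crate::` paths are not handled.

```javascript
const path = require('path');

// Extension variants, in the order given in Step 8.
const VARIANTS = ['.ts', '.tsx', '.js', '.jsx', '/index.ts', '/index.js',
  '/index.tsx', '/index.jsx', '.py', '.go', '.rs', '.rb'];

// Hypothetical helper: resolve one specifier against the discovered file list.
// Returns a project-relative path, or null for external/unresolvable imports.
function resolveRelativeImport(importerPath, specifier, fileSet) {
  if (!specifier.startsWith('./') && !specifier.startsWith('../')) return null;
  // Resolve from the importing file's directory (POSIX separators assumed).
  const base = path.posix.join(path.posix.dirname(importerPath), specifier);
  if (base.startsWith('..')) return null; // escapes the project root
  if (fileSet.has(base)) return base;     // specifier already had an extension
  for (const variant of VARIANTS) {
    if (fileSet.has(base + variant)) return base + variant;
  }
  return null;
}

// Hypothetical helper: every discovered file becomes a key, even with no imports.
function buildImportMap(filePaths, rawImports) {
  const fileSet = new Set(filePaths);
  const importMap = {};
  for (const file of filePaths) {
    const resolved = (rawImports[file] || [])
      .map(spec => resolveRelativeImport(file, spec, fileSet))
      .filter(p => p !== null);
    importMap[file] = [...new Set(resolved)]; // de-duplicate, keep first-seen order
  }
  return importMap;
}
```

Checking the exact path before trying variants is a choice made in this sketch so that imports written with an explicit extension also resolve.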
|
||||
|
||||
### Step 2: Update the scanner script output format
|
||||
|
||||
Find the "### Script Output Format" section (around line 109) and update the example JSON to include `importMap`:
|
||||
|
||||
Find this in the example:
|
||||
```json
|
||||
{
|
||||
"scriptCompleted": true,
|
||||
"name": "project-name",
|
||||
...
|
||||
"estimatedComplexity": "moderate"
|
||||
}
|
||||
```
|
||||
|
||||
Add `importMap` to the example:
|
||||
```json
|
||||
{
|
||||
"scriptCompleted": true,
|
||||
"name": "project-name",
|
||||
"rawDescription": "...",
|
||||
"readmeHead": "...",
|
||||
"languages": ["javascript", "typescript"],
|
||||
"frameworks": ["React", "Vite"],
|
||||
"files": [
|
||||
{"path": "src/index.ts", "language": "typescript", "sizeLines": 150}
|
||||
],
|
||||
"totalFiles": 42,
|
||||
"estimatedComplexity": "moderate",
|
||||
"importMap": {
|
||||
"src/index.ts": ["src/utils.ts", "src/config.ts"],
|
||||
"src/utils.ts": []
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Also update the field documentation list below the example to add:
|
||||
```
|
||||
- `importMap` (object) — map from every source file path to its list of resolved project-internal import paths; empty array if no resolved imports; external packages excluded
|
||||
```
|
||||
|
||||
### Step 3: Update the final assembly section to preserve importMap
|
||||
|
||||
Find "## Phase 2 -- Description and Final Assembly" (around line 153). Find the IMPORTANT note:
|
||||
```
|
||||
**IMPORTANT:** The final output must NOT contain the `scriptCompleted`, `rawDescription`, or `readmeHead` fields.
|
||||
```
|
||||
|
||||
Update it to:
|
||||
```
|
||||
**IMPORTANT:** The final output must NOT contain the `scriptCompleted`, `rawDescription`, or `readmeHead` fields. All other fields — including `importMap` — MUST be preserved exactly as output by the script.
|
||||
```
|
||||
|
||||
Also update the final output example to include `importMap`:
|
||||
```json
|
||||
{
|
||||
"name": "project-name",
|
||||
"description": "...",
|
||||
"languages": ["typescript"],
|
||||
"frameworks": ["React"],
|
||||
"files": [...],
|
||||
"totalFiles": 42,
|
||||
"estimatedComplexity": "moderate",
|
||||
"importMap": {
|
||||
"src/index.ts": ["src/utils.ts"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Step 4: Verify
|
||||
|
||||
Re-read `project-scanner-prompt.md` and confirm:
|
||||
- Step 8 is present with full import resolution logic
|
||||
- Script output format includes `importMap`
|
||||
- Field documentation includes `importMap`
|
||||
- Final assembly section preserves `importMap` in output
|
||||
|
||||
### Step 5: Commit
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/skills/understand/project-scanner-prompt.md
|
||||
git commit -m "perf(understand): extend scanner to pre-resolve imports, output importMap in scan-result.json"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 6: C1b — Update file-analyzer to use batchImportData
|
||||
|
||||
Removes `allProjectFiles` from the file-analyzer input schema and replaces it with `batchImportData` (pre-resolved imports for this batch's files only). Updates the extraction script section to skip import resolution entirely (already done by scanner). Updates the edge creation step to use `batchImportData` directly.
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/skills/understand/file-analyzer-prompt.md`
|
||||
|
||||
### Step 1: Update the input JSON schema (Script Requirements, step 1)
|
||||
|
||||
Find the input schema block around line 19:
|
||||
```json
|
||||
{
|
||||
"projectRoot": "/path/to/project",
|
||||
"allProjectFiles": ["src/index.ts", "src/utils.ts", "..."],
|
||||
"batchFiles": [
|
||||
{"path": "src/index.ts", "language": "typescript", "sizeLines": 150},
|
||||
{"path": "src/utils.ts", "language": "typescript", "sizeLines": 80}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Replace with:
|
||||
```json
|
||||
{
|
||||
"projectRoot": "/path/to/project",
|
||||
"batchFiles": [
|
||||
{"path": "src/index.ts", "language": "typescript", "sizeLines": 150},
|
||||
{"path": "src/utils.ts", "language": "typescript", "sizeLines": 80}
|
||||
],
|
||||
"batchImportData": {
|
||||
"src/index.ts": ["src/utils.ts", "src/config.ts"],
|
||||
"src/utils.ts": []
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Update the field descriptions:
|
||||
- Remove: `allProjectFiles` description
|
||||
- Add: `batchImportData` (object) — map from each batch file's project-relative path to its list of pre-resolved project-internal imports. Produced by the project scanner. Use this directly for import edge creation — do NOT attempt to re-resolve imports yourself.
|
||||
|
||||
### Step 2: Remove the imports extraction from "What the Script Must Extract"
|
||||
|
||||
Find the "**Imports:**" subsection under "What the Script Must Extract" (around lines 49–53):
|
||||
```
|
||||
**Imports:**
|
||||
- Source module path (exactly as written in the import statement)
|
||||
- Imported specifiers (named imports, default import, namespace import)
|
||||
- Line number
|
||||
- For relative imports (starting with `./` or `../`), compute the resolved path...
|
||||
```
|
||||
|
||||
Replace this entire subsection with:
|
||||
```markdown
|
||||
**Imports:**
|
||||
- Do NOT extract imports in the script. Import resolution has already been performed by the project scanner.
|
||||
- The pre-resolved imports for each file are provided in `batchImportData` in the input JSON.
|
||||
- Do not include an `imports` field in the script output — import edges will be created in Phase 2 using `batchImportData` directly.
|
||||
```
|
||||
|
||||
### Step 3: Update the script output format to remove imports
|
||||
|
||||
Find the `results` array in the script output format (around line 67). The current `imports` array in the output:
|
||||
```json
|
||||
"imports": [
|
||||
{"source": "./utils", "resolvedPath": "src/utils.ts", "specifiers": ["formatDate"], "line": 1, "isExternal": false},
|
||||
{"source": "express", "resolvedPath": null, "specifiers": ["default"], "line": 2, "isExternal": true}
|
||||
],
|
||||
```
|
||||
|
||||
Remove the `imports` array from the script output format entirely. The result for each file should be:
|
||||
```json
|
||||
{
|
||||
"path": "src/index.ts",
|
||||
"language": "typescript",
|
||||
"totalLines": 150,
|
||||
"nonEmptyLines": 120,
|
||||
"functions": [...],
|
||||
"classes": [...],
|
||||
"exports": [...],
|
||||
"metrics": {
|
||||
"importCount": 2,
|
||||
"exportCount": 3,
|
||||
"functionCount": 4,
|
||||
"classCount": 1
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Keep `metrics.importCount` (derived from `batchImportData[path].length`) as a useful metric.
|
||||
|
||||
Update the metrics description to say:
|
||||
```
|
||||
- `importCount` (integer) — use `batchImportData[file.path].length` from the input JSON
|
||||
```
|
||||
|
||||
### Step 4: Update "Preparing the Script Input" section
|
||||
|
||||
Find the `cat` command around line 113 that creates the input JSON:
|
||||
```bash
|
||||
cat > $PROJECT_ROOT/.understand-anything/tmp/ua-file-analyzer-input-<batchIndex>.json << 'ENDJSON'
|
||||
{
|
||||
"projectRoot": "<project-root>",
|
||||
"allProjectFiles": [<full file list from scan>],
|
||||
"batchFiles": [<this batch's files>]
|
||||
}
|
||||
ENDJSON
|
||||
```
|
||||
|
||||
Replace with:
|
||||
```bash
|
||||
cat > $PROJECT_ROOT/.understand-anything/tmp/ua-file-analyzer-input-<batchIndex>.json << 'ENDJSON'
|
||||
{
|
||||
"projectRoot": "<project-root>",
|
||||
"batchFiles": [<this batch's files>],
|
||||
"batchImportData": <batchImportData JSON object — provided in your dispatch prompt>
|
||||
}
|
||||
ENDJSON
|
||||
```
|
||||
|
||||
### Step 5: Update Step 3 (Create Edges) — Import edge creation rule
|
||||
|
||||
Find the "**Import edge creation rule:**" in the "Step 3 -- Create Edges" section (around line 213):
|
||||
```
|
||||
**Import edge creation rule:** For each import in the script output where `isExternal` is `false` and `resolvedPath` is non-null, create an `imports` edge from the current file node to `file:<resolvedPath>`. Do NOT create edges for external package imports.
|
||||
```
|
||||
|
||||
Replace with:
|
||||
```markdown
|
||||
**Import edge creation rule:** For each resolved path in `batchImportData[filePath]` (provided in the input JSON), create an `imports` edge from the current file node to `file:<resolvedPath>`. The `batchImportData` values contain only resolved project-internal paths — external packages have already been filtered out. Do NOT attempt to re-resolve imports from source.
|
||||
```
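As a rough illustration of the new rule, import edges follow mechanically from `batchImportData`. The sketch below emits only the three edge fields shown in the tour-builder example (`source`, `target`, `type`); any other edge fields the real schema requires are outside its scope, and `createImportEdges` is a hypothetical name.

```javascript
// Hypothetical helper: one "imports" edge per pre-resolved path.
function createImportEdges(batchImportData) {
  const edges = [];
  for (const [filePath, resolvedPaths] of Object.entries(batchImportData)) {
    for (const resolvedPath of resolvedPaths) {
      edges.push({
        source: `file:${filePath}`,
        target: `file:${resolvedPath}`,
        type: 'imports',
      });
    }
  }
  return edges;
}
```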
|
||||
|
||||
### Step 6: Update the import-edge bullet in Critical Constraints
|
||||
|
||||
Find the last bullet in "## Critical Constraints" (around line 304):
|
||||
```
|
||||
- For import edges, use the script's `resolvedPath` field directly. Do NOT attempt to resolve import paths yourself -- the script already did this deterministically.
|
||||
```
|
||||
|
||||
Replace with:
|
||||
```markdown
|
||||
- For import edges, use `batchImportData[filePath]` directly from the input JSON. Do NOT attempt to resolve import paths yourself -- the project scanner already did this deterministically.
|
||||
```
|
||||
|
||||
### Step 7: Verify
|
||||
|
||||
Re-read `file-analyzer-prompt.md` and confirm:
|
||||
- Input schema has `batchImportData`, no `allProjectFiles`
|
||||
- Script "What to Extract" section: imports extraction replaced with "do not extract"
|
||||
- Script output format: no `imports` array per file
|
||||
- Preparing the Script Input: cat command has no `allProjectFiles`
|
||||
- Import edge creation rule: uses `batchImportData` not script output
|
||||
- Critical Constraints: no reference to `resolvedPath` from script
|
||||
|
||||
### Step 8: Commit
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/skills/understand/file-analyzer-prompt.md
|
||||
git commit -m "perf(understand): replace allProjectFiles with batchImportData in file-analyzer — import resolution now done by scanner"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 7: C1c + C2 — Update SKILL.md Phase 2 orchestration
|
||||
|
||||
Wires up the `importMap` from Phase 1 into per-batch `batchImportData` slices. Increases batch size from 5-10 to 20-30 files. Increases concurrency from 3 to 5. Removes `allProjectFiles` from the dispatch prompt.
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/skills/understand/SKILL.md` (Phase 0, Phase 1, Phase 2)
|
||||
|
||||
### Step 1: Update Phase 1 to note importMap is now in scan-result.json
|
||||
|
||||
Find Phase 1 (around line 62) where it says:
|
||||
```
|
||||
After the subagent completes, read `$PROJECT_ROOT/.understand-anything/intermediate/scan-result.json` to get:
|
||||
- Project name, description
|
||||
- Languages, frameworks
|
||||
- File list with line counts
|
||||
- Complexity estimate
|
||||
```
|
||||
|
||||
Add one item to the list:
|
||||
```
|
||||
- Import map (`importMap`): pre-resolved project-internal imports per file
|
||||
```
|
||||
|
||||
Also add a note:
|
||||
```
|
||||
Store `importMap` in memory as `$IMPORT_MAP` for use in Phase 2 batch construction.
|
||||
```
|
||||
|
||||
### Step 2: Change batch size and concurrency in Phase 2
|
||||
|
||||
Find line 100:
|
||||
```
|
||||
Batch the file list from Phase 1 into groups of **5-10 files each** (aim for balanced batch sizes).
|
||||
```
|
||||
|
||||
Replace with:
|
||||
```
|
||||
Batch the file list from Phase 1 into groups of **20-30 files each** (aim for ~25 files per batch for balanced sizes).
|
||||
```
|
||||
|
||||
Find line 102:
|
||||
```
|
||||
For each batch, dispatch a subagent using the prompt template at `./file-analyzer-prompt.md`. Run up to **3 subagents concurrently** using parallel dispatch.
|
||||
```
|
||||
|
||||
Replace with:
|
||||
```
|
||||
For each batch, dispatch a subagent using the prompt template at `./file-analyzer-prompt.md`. Run up to **5 subagents concurrently** using parallel dispatch.
|
||||
```
|
||||
|
||||
### Step 3: Add batchImportData construction to the dispatch block
|
||||
|
||||
Find the dispatch prompt block (around lines 119–134):
|
||||
```
|
||||
Fill in batch-specific parameters below and dispatch:
|
||||
|
||||
> Analyze these source files and produce GraphNode and GraphEdge objects.
|
||||
> Project root: `$PROJECT_ROOT`
|
||||
> Project: `<projectName>`
|
||||
> Languages: `<languages>`
|
||||
> Batch index: `<batchIndex>`
|
||||
> Write output to: `$PROJECT_ROOT/.understand-anything/intermediate/batch-<batchIndex>.json`
|
||||
>
|
||||
> All project files (for import resolution):
|
||||
> `<full file path list from scan>`
|
||||
>
|
||||
> Files to analyze in this batch:
|
||||
> 1. `<path>` (<sizeLines> lines)
|
||||
> ...
|
||||
```
|
||||
|
||||
Replace with:
|
||||
````markdown
Before dispatching each batch, construct `batchImportData` from `$IMPORT_MAP`:

```text
batchImportData = {}
for each file in this batch:
  batchImportData[file.path] = $IMPORT_MAP[file.path] ?? []
```

Fill in batch-specific parameters below and dispatch:

> Analyze these source files and produce GraphNode and GraphEdge objects.
> Project root: `$PROJECT_ROOT`
> Project: `<projectName>`
> Languages: `<languages>`
> Batch index: `<batchIndex>`
> Write output to: `$PROJECT_ROOT/.understand-anything/intermediate/batch-<batchIndex>.json`
>
> Pre-resolved import data for this batch (use this for all import edge creation; do NOT re-resolve imports from source):
> ```json
> <batchImportData JSON>
> ```
>
> Files to analyze in this batch:
> 1. `<path>` (<sizeLines> lines)
> 2. `<path>` (<sizeLines> lines)
> ...
````
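
The batching and per-batch slicing described in Steps 2 and 3 can be sketched in TypeScript as follows. This is only an illustration: the `ImportMap` and `ScannedFile` shapes and both function names are assumptions, since the plan only states that `importMap` holds pre-resolved project-internal imports per file.

```typescript
// Assumed shape: file path -> list of resolved project-internal import paths.
type ImportMap = Record<string, string[]>;

// Assumed shape of one entry in the Phase 1 file list.
interface ScannedFile {
  path: string;
  sizeLines: number;
}

/** Split the file list into consecutive batches of at most `batchSize` files. */
function chunkFiles(files: ScannedFile[], batchSize = 25): ScannedFile[][] {
  const batches: ScannedFile[][] = [];
  for (let i = 0; i < files.length; i += batchSize) {
    batches.push(files.slice(i, i + batchSize));
  }
  return batches;
}

/** Keep only the import entries for files in this batch, defaulting to []. */
function buildBatchImportData(batch: ScannedFile[], importMap: ImportMap): ImportMap {
  const batchImportData: ImportMap = {};
  for (const file of batch) {
    batchImportData[file.path] = importMap[file.path] ?? [];
  }
  return batchImportData;
}

// Tiny demo with made-up paths.
const demoFiles: ScannedFile[] = [
  { path: "src/index.ts", sizeLines: 40 },
  { path: "src/utils.ts", sizeLines: 12 },
];
const demoImportMap: ImportMap = { "src/index.ts": ["src/utils.ts"] };
const demoBatches = chunkFiles(demoFiles);
const firstBatch = demoBatches[0] ?? [];
console.log(JSON.stringify(buildBatchImportData(firstBatch, demoImportMap)));
```

Files missing from the map get an empty array, so the subagent prompt always lists every file in the batch, even those with no internal imports.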
|
||||
|
||||
### Step 4: Update incremental update path
|
||||
|
||||
Find "### Incremental update path" (around line 140):
|
||||
```
|
||||
Use the changed files list from Phase 0. Batch and dispatch file-analyzer subagents using the same process as above, but only for changed files.
|
||||
```
|
||||
|
||||
Update to clarify that batchImportData still applies:
|
||||
```
|
||||
Use the changed files list from Phase 0. Batch and dispatch file-analyzer subagents using the same process as above (20-30 files per batch, up to 5 concurrent, with batchImportData constructed from $IMPORT_MAP), but only for changed files.
|
||||
```
|
||||
|
||||
### Step 5: Verify all Phase 2 changes
|
||||
|
||||
Re-read SKILL.md Phase 2 in full and confirm:
|
||||
- Batch size says "20-30 files"
|
||||
- Concurrency says "5 subagents concurrently"
|
||||
- "Build the prompt" block: only step 1 (read base template), no addendum steps
|
||||
- Additional context block: no "Frameworks detected" line, no addendum reference
|
||||
- Dispatch prompt: has `batchImportData` injection, no `allProjectFiles`
|
||||
- Incremental path: mentions batchImportData
|
||||
|
||||
### Step 6: Commit
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/skills/understand/SKILL.md
|
||||
git commit -m "perf(understand): wire importMap into batchImportData per batch, increase batch size 5-10→20-30, concurrency 3→5"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 8: Version bump
|
||||
|
||||
Per project convention, all four version files must stay in sync when changes are pushed.
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/package.json`
|
||||
- Modify: `.claude-plugin/marketplace.json`
|
||||
- Modify: `.claude-plugin/plugin.json`
|
||||
- Modify: `.cursor-plugin/plugin.json`
|
||||
|
||||
### Step 1: Read current version
|
||||
|
||||
```bash
|
||||
node -e "const p = require('./understand-anything-plugin/package.json'); console.log(p.version)"
|
||||
```
|
||||
|
||||
Expected: `1.2.1` (or whatever the current version is).
|
||||
|
||||
### Step 2: Bump patch version in all four files
|
||||
|
||||
New version: `1.2.2` (patch bump — internal optimization, no API changes).
|
||||
|
||||
Update each file:
|
||||
- `understand-anything-plugin/package.json`: `"version": "1.2.2"`
|
||||
- `.claude-plugin/marketplace.json`: `"version": "1.2.2"` in `plugins[0]`
|
||||
- `.claude-plugin/plugin.json`: `"version": "1.2.2"`
|
||||
- `.cursor-plugin/plugin.json`: `"version": "1.2.2"`
|
||||
|
||||
### Step 3: Verify all four files match
|
||||
|
||||
```bash
|
||||
grep '"version"' understand-anything-plugin/package.json .claude-plugin/marketplace.json .claude-plugin/plugin.json .cursor-plugin/plugin.json
|
||||
```
|
||||
|
||||
All four should show `"version": "1.2.2"`.
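
If a scripted check is preferred over reading the grep output, the comparison could look like the sketch below. It assumes the version strings have already been extracted from each file (for `marketplace.json`, from `plugins[0]`); `findVersionMismatches` is a hypothetical helper, not part of the project.

```typescript
/**
 * Return the files whose version differs from `expected`.
 * Hypothetical helper: input maps file path to an already-extracted version string.
 */
function findVersionMismatches(
  versions: Record<string, string>,
  expected: string,
): string[] {
  return Object.entries(versions)
    .filter(([, version]) => version !== expected)
    .map(([file]) => file);
}

// Demo values only; the last entry is deliberately stale.
const demoVersions: Record<string, string> = {
  "understand-anything-plugin/package.json": "1.2.2",
  ".claude-plugin/marketplace.json": "1.2.2", // taken from plugins[0].version
  ".claude-plugin/plugin.json": "1.2.2",
  ".cursor-plugin/plugin.json": "1.2.1",
};

console.log(findVersionMismatches(demoVersions, "1.2.2"));
```

An empty result means all four files agree with the expected version.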
|
||||
|
||||
### Step 4: Commit
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/package.json \
|
||||
.claude-plugin/marketplace.json \
|
||||
.claude-plugin/plugin.json \
|
||||
.cursor-plugin/plugin.json
|
||||
git commit -m "chore: bump version to 1.2.2"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 9: Build and smoke test
|
||||
|
||||
Verifies all changes work end-to-end by running `/understand --full` against a real project.
|
||||
|
||||
**Files:** None (testing only)
|
||||
|
||||
### Step 1: Build the packages
|
||||
|
||||
```bash
|
||||
pnpm --filter @understand-anything/core build
|
||||
pnpm --filter @understand-anything/skill build
|
||||
```
|
||||
|
||||
Expected: both build without errors.
|
||||
|
||||
### Step 2: Find installed plugin version and copy to cache
|
||||
|
||||
```bash
|
||||
ls ~/.claude/plugins/cache/understand-anything/understand-anything/
|
||||
```
|
||||
|
||||
Note the installed version directory (e.g., `1.0.1`). Then copy the local build into the cache under the version declared in `package.json`:
|
||||
|
||||
```bash
|
||||
VERSION=$(node -e "const p = require('./understand-anything-plugin/package.json'); console.log(p.version)")
# ${VERSION:?} aborts if VERSION is empty, so rm -rf can never hit the parent cache directory
rm -rf ~/.claude/plugins/cache/understand-anything/understand-anything/"${VERSION:?}"
cp -R ./understand-anything-plugin ~/.claude/plugins/cache/understand-anything/understand-anything/"${VERSION:?}"
|
||||
```
|
||||
|
||||
### Step 3: Smoke test on a small project (~20 files)
|
||||
|
||||
Open a fresh Claude Code session in a small TypeScript project. Run:
|
||||
```
|
||||
/understand --full
|
||||
```
|
||||
|
||||
Verify:
|
||||
- Phases 0–7 complete without errors
|
||||
- `knowledge-graph.json` is created
|
||||
- Node count and edge count are reasonable
|
||||
- Layers and tour are present
|
||||
- No "allProjectFiles" or addendum errors in the output
|
||||
|
||||
### Step 4: Smoke test on a larger project (~100+ files)
|
||||
|
||||
Run `/understand --full` on a medium/large TypeScript+React project.
|
||||
|
||||
Verify:
|
||||
- Batch count is ~4-5 for a 100-file project (at 20-30 files per batch), not 10-20
|
||||
- No errors about missing import resolution
|
||||
- `importMap` is present in `scan-result.json` (check `.understand-anything/intermediate/` before cleanup, or add a temporary debug log)
|
||||
- Graph quality is comparable to before (summaries are descriptive, layers are correct)
|
||||
|
||||
### Step 5: Test `--review` flag
|
||||
|
||||
Run `/understand --full --review` on the same project.
|
||||
|
||||
Verify:
|
||||
- Phase 6 now dispatches the LLM graph-reviewer subagent (not the inline script)
|
||||
- `review.json` is produced with `approved` field
|
||||
- Pipeline completes normally
|
||||
|
||||
### Step 6: Final commit (if any fixes needed from smoke test)
|
||||
|
||||
```bash
|
||||
git add -A
|
||||
git commit -m "fix(understand): smoke test fixes for token reduction changes"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
| Task | Change | Risk |
|
||||
|---|---|---|
|
||||
| 1 | C5: Gate reviewer | Low |
|
||||
| 2 | C4a: Slim Phase 4 payload | Low |
|
||||
| 3 | C4b: Slim Phase 5 payload | Low |
|
||||
| 4 | C3: Remove addendums from batches | Low |
|
||||
| 5 | C1a: Scanner import resolution | Medium |
|
||||
| 6 | C1b: File-analyzer uses batchImportData | Medium |
|
||||
| 7 | C1c+C2: SKILL.md orchestration + batch size | Medium |
|
||||
| 8 | Version bump | Low |
|
||||
| 9 | Smoke test | — |
|
||||
|
||||
Tasks 1–4 are independent of Tasks 5–7. They can be shipped separately if needed. Tasks 5, 6, and 7 are tightly coupled (scanner produces importMap → SKILL.md passes batchImportData → file-analyzer consumes it) and must be shipped together.
|
||||
@@ -0,0 +1,138 @@
|
||||
# Homepage Feature Update Implementation Plan
|
||||
|
||||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
|
||||
|
||||
**Goal:** Update the Astro homepage to reflect features from v1.2.0–v2.0.0 releases.
|
||||
|
||||
**Architecture:** Three file edits — expand Features.astro from 3→6 cards, update Install.astro platform note, update Footer.astro tagline. No new files or structural changes.
|
||||
|
||||
**Tech Stack:** Astro 6, CSS grid
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Update Features.astro — Replace 3 Cards with 6
|
||||
|
||||
**Files:**
|
||||
- Modify: `homepage/src/components/Features.astro`
|
||||
|
||||
**Step 1: Replace the features array (lines 2–18)**
|
||||
|
||||
Replace the entire frontmatter features array with:
|
||||
|
||||
```astro
|
||||
---
|
||||
const features = [
|
||||
{
|
||||
icon: '◈',
|
||||
title: 'Interactive Knowledge Graph',
|
||||
description: 'Visualize files, functions, and dependencies as an explorable graph with hierarchical drill-down and smart layout.',
|
||||
},
|
||||
{
|
||||
icon: '⬡',
|
||||
title: 'Beyond Code Analysis',
|
||||
description: 'Analyze your entire project — Dockerfiles, Terraform, SQL, Markdown, and 26+ file types mapped into one unified graph.',
|
||||
},
|
||||
{
|
||||
icon: '⊘',
|
||||
title: 'Smart Filtering & Search',
|
||||
description: 'Filter by node type, complexity, layer, or edge category. Fuzzy and semantic search to find anything instantly.',
|
||||
},
|
||||
{
|
||||
icon: '⎙',
|
||||
title: 'Export & Share',
|
||||
description: 'Export your knowledge graph as high-quality PNG, SVG, or filtered JSON — ready for docs, presentations, or further analysis.',
|
||||
},
|
||||
{
|
||||
icon: '⟿',
|
||||
title: 'Dependency Path Finder',
|
||||
description: 'Find the shortest path between any two components. Understand how parts of your system connect at a glance.',
|
||||
},
|
||||
{
|
||||
icon: '⟐',
|
||||
title: 'Guided Tours & Onboarding',
|
||||
description: 'AI-generated walkthroughs that teach the codebase step by step, plus onboarding guides for new team members.',
|
||||
},
|
||||
];
|
||||
---
|
||||
```
|
||||
|
||||
**Step 2: Update the reveal delay logic (line 24)**
|
||||
|
||||
The current class name `reveal-delay-${i + 1}` only has matching CSS for delays 1–3, so cards 4–6 would get no stagger. With 6 cards in 2 rows, use modulo so each row staggers 1/2/3:
|
||||
|
||||
```astro
|
||||
<div class={`feature-card reveal reveal-delay-${(i % 3) + 1}`}>
|
||||
```
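
As a quick check of the modulo mapping, the sketch below computes the class for each of the six card indices. `revealDelayClass` is a hypothetical helper used only for illustration; the component itself inlines the expression.

```typescript
/** Delay class for the card at index i, cycling 1, 2, 3 within each row of `columns` cards. */
function revealDelayClass(i: number, columns = 3): string {
  return `reveal-delay-${(i % columns) + 1}`;
}

const delayClasses = Array.from({ length: 6 }, (_, i) => revealDelayClass(i));
console.log(delayClasses.join(" "));
// prints "reveal-delay-1 reveal-delay-2 reveal-delay-3 reveal-delay-1 reveal-delay-2 reveal-delay-3"
```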
|
||||
|
||||
**Step 3: Confirm the grid CSS handles 2 rows**

No change needed: `grid-template-columns: repeat(3, 1fr)` already wraps to a second row, and the mobile `1fr` breakpoint also works.
|
||||
|
||||
**Step 4: Verify build**
|
||||
|
||||
Run: `cd homepage && npx astro build`
|
||||
Expected: Build completes with no errors.
|
||||
|
||||
**Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add homepage/src/components/Features.astro
|
||||
git commit -m "feat(homepage): expand features section to 6 cards for v2.0.0"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 2: Update Install.astro — Multi-Platform Note
|
||||
|
||||
**Files:**
|
||||
- Modify: `homepage/src/components/Install.astro`
|
||||
|
||||
**Step 1: Replace the platform note (line 13)**
|
||||
|
||||
Change:
|
||||
```html
|
||||
<p class="install-note">Works with <strong>Claude Code</strong> — Anthropic's official CLI for Claude.</p>
|
||||
```
|
||||
|
||||
To:
|
||||
```html
|
||||
<p class="install-note">Works with <strong>Claude Code</strong>, <strong>Codex</strong>, <strong>OpenCode</strong>, <strong>Gemini CLI</strong>, and more.</p>
|
||||
```
|
||||
|
||||
**Step 2: Commit**
|
||||
|
||||
```bash
|
||||
git add homepage/src/components/Install.astro
|
||||
git commit -m "feat(homepage): update install note for multi-platform support"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 3: Update Footer.astro — Tagline
|
||||
|
||||
**Files:**
|
||||
- Modify: `homepage/src/components/Footer.astro`
|
||||
|
||||
**Step 1: Replace the tagline (line 13)**
|
||||
|
||||
Change:
|
||||
```html
|
||||
<p class="footer-note">Built as a Claude Code plugin</p>
|
||||
```
|
||||
|
||||
To:
|
||||
```html
|
||||
<p class="footer-note">Built for AI coding assistants</p>
|
||||
```
|
||||
|
||||
**Step 2: Verify full build**
|
||||
|
||||
Run: `cd homepage && npx astro build`
|
||||
Expected: Clean build, no errors.
|
||||
|
||||
**Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add homepage/src/components/Footer.astro
|
||||
git commit -m "feat(homepage): update footer tagline for multi-platform"
|
||||
```
|
||||
@@ -0,0 +1,776 @@
|
||||
# .understandignore Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Add user-configurable file exclusion via `.understandignore` files using `.gitignore` syntax, with auto-generated starter files and a pre-analysis review pause.
|
||||
|
||||
**Architecture:** An `IgnoreFilter` module in `packages/core` uses the `ignore` npm package to parse `.understandignore` files and filter paths. A companion `IgnoreGenerator` scans the project for common patterns and produces a commented-out starter file. The `project-scanner` agent applies the filter as a second pass after its existing hardcoded exclusions. The `/understand` skill adds a Phase 0.5 that generates the starter file and pauses for user review.
|
||||
|
||||
**Tech Stack:** TypeScript, `ignore` npm package, Vitest
|
||||
|
||||
**Spec:** `docs/superpowers/specs/2026-04-10-understandignore-design.md`
|
||||
|
||||
---
|
||||
|
||||
## File Structure
|
||||
|
||||
### Core package
|
||||
- Create: `understand-anything-plugin/packages/core/src/ignore-filter.ts` — parse .understandignore, merge with defaults, filter paths
|
||||
- Create: `understand-anything-plugin/packages/core/src/ignore-generator.ts` — generate starter .understandignore by scanning project
|
||||
- Create: `understand-anything-plugin/packages/core/src/__tests__/ignore-filter.test.ts` — filter tests
|
||||
- Create: `understand-anything-plugin/packages/core/src/__tests__/ignore-generator.test.ts` — generator tests
|
||||
- Modify: `understand-anything-plugin/packages/core/src/index.ts` — export new modules
|
||||
- Modify: `understand-anything-plugin/packages/core/package.json` — add `ignore` dependency
|
||||
|
||||
### Agents & skills
|
||||
- Modify: `understand-anything-plugin/agents/project-scanner.md` — add Layer 2 filtering step
|
||||
- Modify: `understand-anything-plugin/skills/understand/SKILL.md` — add Phase 0.5
|
||||
|
||||
---
|
||||
|
||||
## Task 1: Add `ignore` dependency
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/packages/core/package.json`
|
||||
|
||||
- [ ] **Step 1: Install the `ignore` npm package**
|
||||
|
||||
Run:
|
||||
```bash
|
||||
cd understand-anything-plugin && pnpm add --filter @understand-anything/core ignore
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Verify it was added**
|
||||
|
||||
Run: `grep ignore understand-anything-plugin/packages/core/package.json`
|
||||
Expected: `"ignore": "^7.x.x"` (or similar) in dependencies
|
||||
|
||||
- [ ] **Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/packages/core/package.json understand-anything-plugin/pnpm-lock.yaml
|
||||
git commit -m "chore(core): add ignore package for .understandignore support"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 2: Create IgnoreFilter module with tests (TDD)
|
||||
|
||||
**Files:**
|
||||
- Create: `understand-anything-plugin/packages/core/src/ignore-filter.ts`
|
||||
- Create: `understand-anything-plugin/packages/core/src/__tests__/ignore-filter.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
|
||||
Create `understand-anything-plugin/packages/core/src/__tests__/ignore-filter.test.ts`:
|
||||
|
||||
```typescript
|
||||
import { describe, it, expect, beforeEach, afterEach } from "vitest";
|
||||
import { createIgnoreFilter, DEFAULT_IGNORE_PATTERNS } from "../ignore-filter";
|
||||
import { mkdirSync, writeFileSync, rmSync } from "node:fs";
|
||||
import { join } from "node:path";
|
||||
import { tmpdir } from "node:os";
|
||||
|
||||
describe("IgnoreFilter", () => {
|
||||
let testDir: string;
|
||||
|
||||
beforeEach(() => {
|
||||
testDir = join(tmpdir(), `ignore-filter-test-${Date.now()}`);
|
||||
mkdirSync(testDir, { recursive: true });
|
||||
mkdirSync(join(testDir, ".understand-anything"), { recursive: true });
|
||||
});
|
||||
|
||||
afterEach(() => {
|
||||
rmSync(testDir, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
describe("DEFAULT_IGNORE_PATTERNS", () => {
|
||||
it("contains node_modules", () => {
|
||||
expect(DEFAULT_IGNORE_PATTERNS).toContain("node_modules/");
|
||||
});
|
||||
|
||||
it("contains .git", () => {
|
||||
expect(DEFAULT_IGNORE_PATTERNS).toContain(".git/");
|
||||
});
|
||||
|
||||
it("contains bin and obj for .NET", () => {
|
||||
expect(DEFAULT_IGNORE_PATTERNS).toContain("bin/");
|
||||
expect(DEFAULT_IGNORE_PATTERNS).toContain("obj/");
|
||||
});
|
||||
|
||||
it("contains build output directories", () => {
|
||||
expect(DEFAULT_IGNORE_PATTERNS).toContain("dist/");
|
||||
expect(DEFAULT_IGNORE_PATTERNS).toContain("build/");
|
||||
expect(DEFAULT_IGNORE_PATTERNS).toContain("out/");
|
||||
expect(DEFAULT_IGNORE_PATTERNS).toContain("coverage/");
|
||||
});
|
||||
});
|
||||
|
||||
describe("createIgnoreFilter with no user file", () => {
|
||||
it("ignores files matching default patterns", () => {
|
||||
const filter = createIgnoreFilter(testDir);
|
||||
expect(filter.isIgnored("node_modules/foo/bar.js")).toBe(true);
|
||||
expect(filter.isIgnored("dist/index.js")).toBe(true);
|
||||
expect(filter.isIgnored(".git/config")).toBe(true);
|
||||
expect(filter.isIgnored("bin/Debug/app.dll")).toBe(true);
|
||||
expect(filter.isIgnored("obj/Release/net8.0/app.dll")).toBe(true);
|
||||
});
|
||||
|
||||
it("does not ignore source files", () => {
|
||||
const filter = createIgnoreFilter(testDir);
|
||||
expect(filter.isIgnored("src/index.ts")).toBe(false);
|
||||
expect(filter.isIgnored("README.md")).toBe(false);
|
||||
expect(filter.isIgnored("package.json")).toBe(false);
|
||||
});
|
||||
|
||||
it("ignores lock files", () => {
|
||||
const filter = createIgnoreFilter(testDir);
|
||||
expect(filter.isIgnored("pnpm-lock.yaml")).toBe(true);
|
||||
expect(filter.isIgnored("package-lock.json")).toBe(true);
|
||||
expect(filter.isIgnored("yarn.lock")).toBe(true);
|
||||
});
|
||||
|
||||
it("ignores binary/asset files", () => {
|
||||
const filter = createIgnoreFilter(testDir);
|
||||
expect(filter.isIgnored("logo.png")).toBe(true);
|
||||
expect(filter.isIgnored("font.woff2")).toBe(true);
|
||||
expect(filter.isIgnored("doc.pdf")).toBe(true);
|
||||
});
|
||||
|
||||
it("ignores generated files", () => {
|
||||
const filter = createIgnoreFilter(testDir);
|
||||
expect(filter.isIgnored("bundle.min.js")).toBe(true);
|
||||
expect(filter.isIgnored("style.min.css")).toBe(true);
|
||||
expect(filter.isIgnored("source.map")).toBe(true);
|
||||
});
|
||||
|
||||
it("ignores IDE directories", () => {
|
||||
const filter = createIgnoreFilter(testDir);
|
||||
expect(filter.isIgnored(".idea/workspace.xml")).toBe(true);
|
||||
expect(filter.isIgnored(".vscode/settings.json")).toBe(true);
|
||||
});
|
||||
});
|
||||
|
||||
describe("createIgnoreFilter with user .understandignore", () => {
|
||||
it("reads patterns from .understand-anything/.understandignore", () => {
|
||||
writeFileSync(
|
||||
join(testDir, ".understand-anything", ".understandignore"),
|
||||
"# Exclude tests\n__tests__/\n*.test.ts\n"
|
||||
);
|
||||
const filter = createIgnoreFilter(testDir);
|
||||
expect(filter.isIgnored("__tests__/foo.test.ts")).toBe(true);
|
||||
expect(filter.isIgnored("src/utils.test.ts")).toBe(true);
|
||||
expect(filter.isIgnored("src/utils.ts")).toBe(false);
|
||||
});
|
||||
|
||||
it("reads patterns from project root .understandignore", () => {
|
||||
writeFileSync(
|
||||
join(testDir, ".understandignore"),
|
||||
"docs/\n"
|
||||
);
|
||||
const filter = createIgnoreFilter(testDir);
|
||||
expect(filter.isIgnored("docs/README.md")).toBe(true);
|
||||
expect(filter.isIgnored("src/index.ts")).toBe(false);
|
||||
});
|
||||
|
||||
it("handles # comments and blank lines", () => {
|
||||
writeFileSync(
|
||||
join(testDir, ".understand-anything", ".understandignore"),
|
||||
"# This is a comment\n\n\nfixtures/\n\n# Another comment\n"
|
||||
);
|
||||
const filter = createIgnoreFilter(testDir);
|
||||
expect(filter.isIgnored("fixtures/data.json")).toBe(true);
|
||||
expect(filter.isIgnored("src/index.ts")).toBe(false);
|
||||
});
|
||||
|
||||
it("supports ! negation to override defaults", () => {
|
||||
writeFileSync(
|
||||
join(testDir, ".understand-anything", ".understandignore"),
|
||||
"!dist/\n"
|
||||
);
|
||||
const filter = createIgnoreFilter(testDir);
|
||||
// dist/ is in defaults but negated by user
|
||||
expect(filter.isIgnored("dist/index.js")).toBe(false);
|
||||
});
|
||||
|
||||
it("supports ** recursive matching", () => {
|
||||
writeFileSync(
|
||||
join(testDir, ".understand-anything", ".understandignore"),
|
||||
"**/snapshots/\n"
|
||||
);
|
||||
const filter = createIgnoreFilter(testDir);
|
||||
expect(filter.isIgnored("src/components/snapshots/Button.snap")).toBe(true);
|
||||
expect(filter.isIgnored("snapshots/foo.snap")).toBe(true);
|
||||
});
|
||||
|
||||
it("merges .understand-anything/ and root .understandignore", () => {
|
||||
writeFileSync(
|
||||
join(testDir, ".understand-anything", ".understandignore"),
|
||||
"__tests__/\n"
|
||||
);
|
||||
writeFileSync(
|
||||
join(testDir, ".understandignore"),
|
||||
"fixtures/\n"
|
||||
);
|
||||
const filter = createIgnoreFilter(testDir);
|
||||
expect(filter.isIgnored("__tests__/foo.ts")).toBe(true);
|
||||
expect(filter.isIgnored("fixtures/data.json")).toBe(true);
|
||||
expect(filter.isIgnored("src/index.ts")).toBe(false);
|
||||
});
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run tests to verify they fail**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/core test -- --run src/__tests__/ignore-filter.test.ts`
|
||||
Expected: FAIL — module not found
|
||||
|
||||
- [ ] **Step 3: Implement IgnoreFilter**
|
||||
|
||||
Create `understand-anything-plugin/packages/core/src/ignore-filter.ts`:
|
||||
|
||||
```typescript
|
||||
import ignore, { type Ignore } from "ignore";
|
||||
import { readFileSync, existsSync } from "node:fs";
|
||||
import { join } from "node:path";
|
||||
|
||||
/**
|
||||
* Hardcoded default ignore patterns matching the project-scanner agent's
|
||||
* exclusion rules, plus bin/obj for .NET projects.
|
||||
*/
|
||||
export const DEFAULT_IGNORE_PATTERNS: string[] = [
|
||||
// Dependency directories
|
||||
"node_modules/",
|
||||
".git/",
|
||||
"vendor/",
|
||||
"venv/",
|
||||
".venv/",
|
||||
"__pycache__/",
|
||||
|
||||
// Build output
|
||||
"dist/",
|
||||
"build/",
|
||||
"out/",
|
||||
"coverage/",
|
||||
".next/",
|
||||
".cache/",
|
||||
".turbo/",
|
||||
"target/",
|
||||
"bin/",
|
||||
"obj/",
|
||||
|
||||
// Lock files
|
||||
"*.lock",
|
||||
"package-lock.json",
|
||||
"yarn.lock",
|
||||
"pnpm-lock.yaml",
|
||||
|
||||
// Binary/asset files
|
||||
"*.png",
|
||||
"*.jpg",
|
||||
"*.jpeg",
|
||||
"*.gif",
|
||||
"*.svg",
|
||||
"*.ico",
|
||||
"*.woff",
|
||||
"*.woff2",
|
||||
"*.ttf",
|
||||
"*.eot",
|
||||
"*.mp3",
|
||||
"*.mp4",
|
||||
"*.pdf",
|
||||
"*.zip",
|
||||
"*.tar",
|
||||
"*.gz",
|
||||
|
||||
// Generated files
|
||||
"*.min.js",
|
||||
"*.min.css",
|
||||
"*.map",
|
||||
"*.generated.*",
|
||||
|
||||
// IDE/editor
|
||||
".idea/",
|
||||
".vscode/",
|
||||
|
||||
// Misc
|
||||
"LICENSE",
|
||||
".gitignore",
|
||||
".editorconfig",
|
||||
".prettierrc",
|
||||
".eslintrc*",
|
||||
"*.log",
|
||||
];
|
||||
|
||||
export interface IgnoreFilter {
|
||||
/** Returns true if the given relative path should be excluded from analysis. */
|
||||
isIgnored(relativePath: string): boolean;
|
||||
}
|
||||
|
||||
/**
|
||||
* Creates an IgnoreFilter that merges hardcoded defaults with user-defined
|
||||
* patterns from .understandignore files.
|
||||
*
|
||||
* Pattern load order (later entries can override earlier ones via ! negation):
|
||||
* 1. Hardcoded defaults
|
||||
* 2. .understand-anything/.understandignore (if exists)
|
||||
* 3. .understandignore at project root (if exists)
|
||||
*/
|
||||
export function createIgnoreFilter(projectRoot: string): IgnoreFilter {
|
||||
const ig: Ignore = ignore();
|
||||
|
||||
// Layer 1: hardcoded defaults
|
||||
ig.add(DEFAULT_IGNORE_PATTERNS);
|
||||
|
||||
// Layer 2: .understand-anything/.understandignore
|
||||
const projectIgnorePath = join(projectRoot, ".understand-anything", ".understandignore");
|
||||
if (existsSync(projectIgnorePath)) {
|
||||
const content = readFileSync(projectIgnorePath, "utf-8");
|
||||
ig.add(content);
|
||||
}
|
||||
|
||||
// Layer 3: .understandignore at project root
|
||||
const rootIgnorePath = join(projectRoot, ".understandignore");
|
||||
if (existsSync(rootIgnorePath)) {
|
||||
const content = readFileSync(rootIgnorePath, "utf-8");
|
||||
ig.add(content);
|
||||
}
|
||||
|
||||
return {
|
||||
isIgnored(relativePath: string): boolean {
|
||||
return ig.ignores(relativePath);
|
||||
},
|
||||
};
|
||||
}
|
||||
```
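
For context on how a caller such as the scanner's second pass might consume this interface, here is a minimal sketch. `partitionPaths` and `demoFilter` are hypothetical names; the stand-in filter exists only so the example runs without the `ignore` package, and real code would obtain the filter from `createIgnoreFilter(projectRoot)`.

```typescript
// Mirrors the IgnoreFilter interface defined in the plan.
interface IgnoreFilter {
  isIgnored(relativePath: string): boolean;
}

/** Split candidate paths into those kept for analysis and those excluded. */
function partitionPaths(
  paths: string[],
  filter: IgnoreFilter,
): { kept: string[]; excluded: string[] } {
  const kept: string[] = [];
  const excluded: string[] = [];
  for (const p of paths) {
    (filter.isIgnored(p) ? excluded : kept).push(p);
  }
  return { kept, excluded };
}

// Stand-in filter for illustration only (not the real pattern matcher).
const demoFilter: IgnoreFilter = {
  isIgnored: (p) => p.startsWith("dist/") || p.endsWith(".test.ts"),
};

const demoResult = partitionPaths(
  ["src/index.ts", "dist/index.js", "src/utils.test.ts"],
  demoFilter,
);
console.log(demoResult.kept, demoResult.excluded);
```

Returning the excluded paths alongside the kept ones makes it easy to report what a user's `.understandignore` actually filtered out.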
|
||||
|
||||
- [ ] **Step 4: Run tests to verify they pass**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/core test -- --run src/__tests__/ignore-filter.test.ts`
|
||||
Expected: All tests PASS
|
||||
|
||||
- [ ] **Step 5: Build to verify no type errors**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/core build`
|
||||
Expected: Clean build
|
||||
|
||||
- [ ] **Step 6: Commit**
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/packages/core/src/ignore-filter.ts understand-anything-plugin/packages/core/src/__tests__/ignore-filter.test.ts
|
||||
git commit -m "feat(core): add IgnoreFilter module with .understandignore parsing and tests"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 3: Create IgnoreGenerator module with tests (TDD)
|
||||
|
||||
**Files:**
|
||||
- Create: `understand-anything-plugin/packages/core/src/ignore-generator.ts`
|
||||
- Create: `understand-anything-plugin/packages/core/src/__tests__/ignore-generator.test.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
|
||||
Create `understand-anything-plugin/packages/core/src/__tests__/ignore-generator.test.ts`:
|
||||
|
||||
```typescript
|
||||
import { describe, it, expect, beforeEach, afterEach } from "vitest";
|
||||
import { generateStarterIgnoreFile } from "../ignore-generator";
|
||||
import { mkdirSync, rmSync, writeFileSync } from "node:fs";
|
||||
import { join } from "node:path";
|
||||
import { tmpdir } from "node:os";
|
||||
|
||||
describe("generateStarterIgnoreFile", () => {
|
||||
let testDir: string;
|
||||
|
||||
beforeEach(() => {
|
||||
testDir = join(tmpdir(), `ignore-gen-test-${Date.now()}`);
|
||||
mkdirSync(testDir, { recursive: true });
|
||||
});
|
||||
|
||||
afterEach(() => {
|
||||
rmSync(testDir, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
it("includes a header comment explaining the file", () => {
|
||||
const content = generateStarterIgnoreFile(testDir);
|
||||
expect(content).toContain(".understandignore");
|
||||
expect(content).toContain("same as .gitignore");
|
||||
expect(content).toContain("Built-in defaults");
|
||||
});
|
||||
|
||||
it("all suggestions are commented out", () => {
|
||||
// Create some directories to trigger suggestions
|
||||
mkdirSync(join(testDir, "__tests__"), { recursive: true });
|
||||
mkdirSync(join(testDir, "docs"), { recursive: true });
|
||||
const content = generateStarterIgnoreFile(testDir);
|
||||
const lines = content.split("\n").filter((l) => l.trim() && !l.startsWith("#"));
|
||||
// No active (uncommented) patterns
|
||||
expect(lines).toHaveLength(0);
|
||||
});
|
||||
|
||||
it("suggests __tests__ when __tests__ directory exists", () => {
|
||||
mkdirSync(join(testDir, "__tests__"), { recursive: true });
|
||||
const content = generateStarterIgnoreFile(testDir);
|
||||
expect(content).toContain("# __tests__/");
|
||||
});
|
||||
|
||||
it("suggests docs when docs directory exists", () => {
|
||||
mkdirSync(join(testDir, "docs"), { recursive: true });
|
||||
const content = generateStarterIgnoreFile(testDir);
|
||||
expect(content).toContain("# docs/");
|
||||
});
|
||||
|
||||
it("suggests test directories when they exist", () => {
|
||||
mkdirSync(join(testDir, "test"), { recursive: true });
|
||||
mkdirSync(join(testDir, "tests"), { recursive: true });
|
||||
const content = generateStarterIgnoreFile(testDir);
|
||||
expect(content).toContain("# test/");
|
||||
expect(content).toContain("# tests/");
|
||||
});
|
||||
|
||||
it("suggests fixtures when fixtures directory exists", () => {
|
||||
mkdirSync(join(testDir, "fixtures"), { recursive: true });
|
||||
const content = generateStarterIgnoreFile(testDir);
|
||||
expect(content).toContain("# fixtures/");
|
||||
});
|
||||
|
||||
it("suggests examples when examples directory exists", () => {
|
||||
mkdirSync(join(testDir, "examples"), { recursive: true });
|
||||
const content = generateStarterIgnoreFile(testDir);
|
||||
expect(content).toContain("# examples/");
|
||||
});
|
||||
|
||||
it("suggests .storybook when .storybook directory exists", () => {
|
||||
mkdirSync(join(testDir, ".storybook"), { recursive: true });
|
||||
const content = generateStarterIgnoreFile(testDir);
|
||||
expect(content).toContain("# .storybook/");
|
||||
});
|
||||
|
||||
it("suggests migrations when migrations directory exists", () => {
|
||||
mkdirSync(join(testDir, "migrations"), { recursive: true });
|
||||
const content = generateStarterIgnoreFile(testDir);
|
||||
expect(content).toContain("# migrations/");
|
||||
});
|
||||
|
||||
it("suggests scripts when scripts directory exists", () => {
|
||||
mkdirSync(join(testDir, "scripts"), { recursive: true });
|
||||
const content = generateStarterIgnoreFile(testDir);
|
||||
expect(content).toContain("# scripts/");
|
||||
});
|
||||
|
||||
it("always includes generic suggestions", () => {
|
||||
const content = generateStarterIgnoreFile(testDir);
|
||||
expect(content).toContain("# *.snap");
|
||||
expect(content).toContain("# *.test.*");
|
||||
expect(content).toContain("# *.spec.*");
|
||||
});
|
||||
|
||||
it("does not suggest directories that don't exist", () => {
|
||||
const content = generateStarterIgnoreFile(testDir);
|
||||
// __tests__ doesn't exist, so it shouldn't be in directory suggestions
|
||||
// (it may still be in generic test file patterns)
|
||||
expect(content).not.toContain("# __tests__/");
|
||||
expect(content).not.toContain("# .storybook/");
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run tests to verify they fail**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/core test -- --run src/__tests__/ignore-generator.test.ts`
|
||||
Expected: FAIL — module not found
|
||||
|
||||
- [ ] **Step 3: Implement IgnoreGenerator**
|
||||
|
||||
Create `understand-anything-plugin/packages/core/src/ignore-generator.ts`:
|
||||
|
||||
```typescript
|
||||
import { existsSync } from "node:fs";
|
||||
import { join } from "node:path";
|
||||
|
||||
const HEADER = `# .understandignore — patterns for files/dirs to exclude from analysis
|
||||
# Syntax: same as .gitignore (globs, # comments, ! negation, trailing / for dirs)
|
||||
# Lines below are suggestions — uncomment to activate.
|
||||
# Use ! prefix to force-include something excluded by defaults.
|
||||
#
|
||||
# Built-in defaults (always excluded unless negated):
|
||||
# node_modules/, .git/, dist/, build/, bin/, obj/, *.lock, *.min.js, etc.
|
||||
#
|
||||
`;
|
||||
|
||||
/** Directories to check for and suggest excluding. */
|
||||
const DETECTABLE_DIRS = [
|
||||
{ dir: "__tests__", pattern: "__tests__/" },
|
||||
{ dir: "test", pattern: "test/" },
|
||||
{ dir: "tests", pattern: "tests/" },
|
||||
{ dir: "fixtures", pattern: "fixtures/" },
|
||||
{ dir: "testdata", pattern: "testdata/" },
|
||||
{ dir: "docs", pattern: "docs/" },
|
||||
{ dir: "examples", pattern: "examples/" },
|
||||
{ dir: "scripts", pattern: "scripts/" },
|
||||
{ dir: "migrations", pattern: "migrations/" },
|
||||
{ dir: ".storybook", pattern: ".storybook/" },
|
||||
];
|
||||
|
||||
/** Always-included generic suggestions. */
|
||||
const GENERIC_SUGGESTIONS = [
|
||||
"*.test.*",
|
||||
"*.spec.*",
|
||||
"*.snap",
|
||||
];
|
||||
|
||||
/**
|
||||
* Generates a starter .understandignore file by scanning the project root
|
||||
* for common directories and suggesting them as commented-out exclusions.
|
||||
*
|
||||
* All suggestions are commented out — the user must uncomment to activate.
|
||||
* Returns the file content as a string.
|
||||
*/
|
||||
export function generateStarterIgnoreFile(projectRoot: string): string {
|
||||
const sections: string[] = [HEADER];
|
||||
|
||||
// Detected directory suggestions
|
||||
const detected: string[] = [];
|
||||
for (const { dir, pattern } of DETECTABLE_DIRS) {
|
||||
if (existsSync(join(projectRoot, dir))) {
|
||||
detected.push(pattern);
|
||||
}
|
||||
}
|
||||
|
||||
if (detected.length > 0) {
|
||||
sections.push("# --- Detected directories (uncomment to exclude) ---\n");
|
||||
for (const pattern of detected) {
|
||||
sections.push(`# ${pattern}`);
|
||||
}
|
||||
sections.push("");
|
||||
}
|
||||
|
||||
// Generic suggestions (always included)
|
||||
sections.push("# --- Test file patterns (uncomment to exclude) ---\n");
|
||||
for (const pattern of GENERIC_SUGGESTIONS) {
|
||||
sections.push(`# ${pattern}`);
|
||||
}
|
||||
sections.push("");
|
||||
|
||||
return sections.join("\n");
|
||||
}
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Run tests to verify they pass**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/core test -- --run src/__tests__/ignore-generator.test.ts`
|
||||
Expected: All tests PASS
|
||||
|
||||
- [ ] **Step 5: Build**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/core build`
|
||||
Expected: Clean build
|
||||
|
||||
- [ ] **Step 6: Commit**
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/packages/core/src/ignore-generator.ts understand-anything-plugin/packages/core/src/__tests__/ignore-generator.test.ts
|
||||
git commit -m "feat(core): add IgnoreGenerator for starter .understandignore file creation"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 4: Export new modules from core
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/packages/core/src/index.ts`
|
||||
|
||||
- [ ] **Step 1: Add exports**
|
||||
|
||||
Add to the end of `understand-anything-plugin/packages/core/src/index.ts`:
|
||||
|
||||
```typescript
|
||||
export {
|
||||
createIgnoreFilter,
|
||||
DEFAULT_IGNORE_PATTERNS,
|
||||
type IgnoreFilter,
|
||||
} from "./ignore-filter.js";
|
||||
export { generateStarterIgnoreFile } from "./ignore-generator.js";
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Build and run all tests**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/core build && pnpm --filter @understand-anything/core test -- --run`
|
||||
Expected: Clean build, all tests pass
|
||||
|
||||
- [ ] **Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/packages/core/src/index.ts
|
||||
git commit -m "feat(core): export IgnoreFilter and IgnoreGenerator from core index"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 5: Update project-scanner agent
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/agents/project-scanner.md`
|
||||
|
||||
- [ ] **Step 1: Read the current project-scanner.md**
|
||||
|
||||
Read `understand-anything-plugin/agents/project-scanner.md` to understand the current structure.
|
||||
|
||||
- [ ] **Step 2: Add bin/ and obj/ to hardcoded exclusions**
|
||||
|
||||
In Step 2 (Exclusion Filtering), add `bin/` and `obj/` to the "Build output" line:
|
||||
|
||||
Change:
|
||||
```
|
||||
- **Build output:** paths with a directory segment matching `dist/`, `build/`, `out/`, `coverage/`, `.next/`, `.cache/`, `.turbo/`, `target/` (Rust)
|
||||
```
|
||||
|
||||
To:
|
||||
```
|
||||
- **Build output:** paths with a directory segment matching `dist/`, `build/`, `out/`, `coverage/`, `.next/`, `.cache/`, `.turbo/`, `target/` (Rust), `bin/` (.NET), `obj/` (.NET)
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Add Layer 2 filtering step**
|
||||
|
||||
After Step 2 (Exclusion Filtering), add a new step:
|
||||
|
||||
```markdown
|
||||
**Step 2.5 -- User-Configured Filtering (.understandignore)**
|
||||
|
||||
After applying the hardcoded exclusion filters above, apply user-configured patterns from `.understandignore`:
|
||||
|
||||
1. Check if `.understand-anything/.understandignore` exists in the project root. If so, read it.
|
||||
2. Check if `.understandignore` exists in the project root. If so, read it.
|
||||
3. Parse whichever of these files exist using `.gitignore` syntax (glob patterns, `#` comments, blank lines ignored, `!` prefix for negation, trailing `/` for directories, `**/` for recursive matching).
|
||||
4. Filter the remaining file list through these patterns. Files matching any pattern are excluded.
|
||||
5. `!` negation patterns override the hardcoded exclusions from Step 2 (e.g., `!dist/` force-includes dist/).
|
||||
6. Track the count of files removed by this step as `filteredByIgnore`.
|
||||
|
||||
This filtering must be deterministic (not LLM-based). Use a Node.js script with the `ignore` npm package if implementing programmatically, or apply the patterns manually if the file list is small.
|
||||
```
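
The deterministic filter described in Step 2.5 could be sketched roughly as follows. This is a deliberately simplified stand-in for the `ignore` npm package: `compileRule` and `applyIgnore` are hypothetical names, and the matcher only understands `name/` directory patterns, basename globs with `*`, and `!` negation. Full `.gitignore` semantics (anchored paths, `**/`, parent-directory re-inclusion rules) should come from the library itself.

```typescript
// Simplified stand-in for the `ignore` package (assumption, not the real matcher).
type IgnoreRule = { negated: boolean; matches: (path: string) => boolean };

function compileRule(line: string): IgnoreRule | null {
  const trimmed = line.trim();
  if (trimmed === "" || trimmed.startsWith("#")) return null; // blank line or comment
  const negated = trimmed.startsWith("!");
  const pattern = negated ? trimmed.slice(1) : trimmed;

  if (pattern.endsWith("/")) {
    const dir = pattern.slice(0, -1);
    // Directory rule: match when any parent segment equals the directory name.
    return { negated, matches: (p) => p.split("/").slice(0, -1).includes(dir) };
  }

  // Basename glob: escape regex metacharacters, then let "*" span non-slash characters.
  const source = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, "[^/]*");
  const regex = new RegExp(`^${source}$`);
  return { negated, matches: (p) => regex.test(p.split("/").pop() ?? "") };
}

function applyIgnore(files: string[], ignoreText: string) {
  const rules = ignoreText
    .split("\n")
    .map(compileRule)
    .filter((rule): rule is IgnoreRule => rule !== null);

  const kept = files.filter((file) => {
    let ignored = false;
    for (const rule of rules) {
      if (rule.matches(file)) ignored = !rule.negated; // last matching rule wins
    }
    return !ignored;
  });

  // filteredByIgnore is the count the scanner reports in its output.
  return { files: kept, filteredByIgnore: files.length - kept.length };
}

const result = applyIgnore(
  ["src/index.ts", "src/index.test.ts", "src/important.test.ts", "docs/guide.md"],
  "# user rules\ndocs/\n*.test.*\n!important.test.ts",
);
console.log(result.filteredByIgnore, result.files);
```

Because each rule is evaluated in file order and the last match wins, the same input always yields the same `filteredByIgnore` count, which is the property the step asks for.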
|
||||
|
||||
- [ ] **Step 4: Update scan output schema**
|
||||
|
||||
Find the output JSON schema section and add `filteredByIgnore` field:
|
||||
|
||||
```json
|
||||
{
|
||||
"name": "...",
|
||||
"description": "...",
|
||||
"languages": ["..."],
|
||||
"frameworks": ["..."],
|
||||
"files": [...],
|
||||
"totalFiles": 123,
|
||||
"filteredByIgnore": 5,
|
||||
"estimatedComplexity": "moderate",
|
||||
"importMap": {}
|
||||
}
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/agents/project-scanner.md
|
||||
git commit -m "feat(agent): add .understandignore support and bin/obj exclusions to project-scanner"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 6: Update /understand skill with Phase 0.5
|
||||
|
||||
**Files:**
|
||||
- Modify: `understand-anything-plugin/skills/understand/SKILL.md`
|
||||
|
||||
- [ ] **Step 1: Read the current SKILL.md Phase 0 section**
|
||||
|
||||
Read `understand-anything-plugin/skills/understand/SKILL.md` lines 22-80 to understand Phase 0.
|
||||
|
||||
- [ ] **Step 2: Add Phase 0.5 after Phase 0**
|
||||
|
||||
After the Phase 0 section (after the `---` separator before Phase 1), insert:
|
||||
|
||||
```markdown
|
||||
## Phase 0.5 — Ignore Configuration
|
||||
|
||||
Set up and verify the `.understandignore` file before scanning.
|
||||
|
||||
1. Check if `$PROJECT_ROOT/.understand-anything/.understandignore` exists.
|
||||
2. **If it does NOT exist**, generate a starter file:
|
||||
- Run a Node.js script (or inline logic) that scans `$PROJECT_ROOT` for common directories (`__tests__/`, `test/`, `tests/`, `fixtures/`, `testdata/`, `docs/`, `examples/`, `scripts/`, `migrations/`, `.storybook/`) and generates a `.understandignore` file with commented-out suggestions.
|
||||
- Write the generated content to `$PROJECT_ROOT/.understand-anything/.understandignore`.
|
||||
- Report to the user:
|
||||
> "Generated `.understand-anything/.understandignore` with suggested exclusions based on your project structure. Please review it and uncomment any patterns you'd like to exclude from analysis. When ready, confirm to continue."
|
||||
- **Wait for user confirmation before proceeding.**
|
||||
3. **If it already exists**, report:
|
||||
> "Found `.understand-anything/.understandignore`. Review it if needed, then confirm to continue."
|
||||
- **Wait for user confirmation before proceeding.**
|
||||
4. After confirmation, proceed to Phase 1.
|
||||
|
||||
**Note:** The `.understandignore` file uses `.gitignore` syntax. The user can add patterns to exclude files from analysis, or use `!` prefix to force-include files excluded by built-in defaults (e.g., `!dist/` to analyze dist/ files).
|
||||
|
||||
---
|
||||
```
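
The branch in Phase 0.5 (generate when missing, otherwise just report) might be expressed as below. This is a sketch under assumptions: `prepareIgnoreFile`, `IgnoreFileStore`, and `IgnoreSetupResult` are hypothetical names, and the file system is injected so the logic can be exercised in memory. In both branches the caller still has to pause for user confirmation.

```typescript
// Hypothetical file-system surface, injected so the decision logic runs without touching disk.
interface IgnoreFileStore {
  exists(path: string): boolean;
  write(path: string, content: string): void;
}

interface IgnoreSetupResult {
  path: string;
  created: boolean;
  message: string; // shown to the user before pausing for confirmation
}

function prepareIgnoreFile(
  projectRoot: string,
  store: IgnoreFileStore,
  generate: (root: string) => string, // e.g. generateStarterIgnoreFile
): IgnoreSetupResult {
  const path = `${projectRoot}/.understand-anything/.understandignore`;

  if (store.exists(path)) {
    return {
      path,
      created: false,
      message:
        "Found `.understand-anything/.understandignore`. Review it if needed, then confirm to continue.",
    };
  }

  store.write(path, generate(projectRoot));
  return {
    path,
    created: true,
    message:
      "Generated `.understand-anything/.understandignore` with suggested exclusions based on your project structure. " +
      "Please review it and uncomment any patterns you'd like to exclude from analysis. When ready, confirm to continue.",
  };
}

// In-memory store for demonstration only.
const memory = new Map<string, string>();
const store: IgnoreFileStore = {
  exists: (p) => memory.has(p),
  write: (p, content) => {
    memory.set(p, content);
  },
};

const first = prepareIgnoreFile("/repo", store, () => "# docs/\n");
const second = prepareIgnoreFile("/repo", store, () => "# never written\n");
console.log(first.created, second.created);
```

The second call leaves the existing file untouched, so a user's edits survive repeated runs of the skill.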
|
||||
|
||||
- [ ] **Step 3: Update Phase 1 reporting**
|
||||
|
||||
In the Phase 1 section, after the gate check (~line 114), add a note about reporting ignore stats:
|
||||
|
||||
```markdown
|
||||
After scanning, if the scan result includes `filteredByIgnore > 0`, report:
|
||||
> "Scanned {totalFiles} files ({filteredByIgnore} excluded by .understandignore)"
|
||||
```
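
As a rough illustration of the reporting rule, the message could be built from the scan result like this. `ScanSummary` and `formatScanReport` are hypothetical names, and the plain fallback message for the zero case is an assumption, since the plan only specifies the wording when `filteredByIgnore > 0`.

```typescript
// Only the two fields the report needs; the real scan output carries more.
interface ScanSummary {
  totalFiles: number;
  filteredByIgnore?: number; // absent in output from older scanner versions (assumption)
}

function formatScanReport(scan: ScanSummary): string {
  const filtered = scan.filteredByIgnore ?? 0;
  return filtered > 0
    ? `Scanned ${scan.totalFiles} files (${filtered} excluded by .understandignore)`
    : `Scanned ${scan.totalFiles} files`; // fallback wording is an assumption
}

console.log(formatScanReport({ totalFiles: 123, filteredByIgnore: 5 }));
console.log(formatScanReport({ totalFiles: 123 }));
```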
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add understand-anything-plugin/skills/understand/SKILL.md
|
||||
git commit -m "feat(skill): add Phase 0.5 for .understandignore setup and review pause"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 7: Build, test, and verify end-to-end
|
||||
|
||||
**Files:**
|
||||
- All modified files
|
||||
|
||||
- [ ] **Step 1: Build core**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/core build`
|
||||
Expected: Clean build
|
||||
|
||||
- [ ] **Step 2: Run all core tests**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/core test -- --run`
|
||||
Expected: All tests pass (existing + new ignore-filter + ignore-generator tests)
|
||||
|
||||
- [ ] **Step 3: Build skill package**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/skill build`
|
||||
Expected: Clean build
|
||||
|
||||
- [ ] **Step 4: Verify files exist**
|
||||
|
||||
Run:
|
||||
```bash
|
||||
ls understand-anything-plugin/packages/core/src/ignore-filter.ts understand-anything-plugin/packages/core/src/ignore-generator.ts
|
||||
```
|
||||
Expected: Both files listed
|
||||
|
||||
- [ ] **Step 5: Verify exports work**
|
||||
|
||||
Run:
|
||||
```bash
|
||||
node -e "import('@understand-anything/core').then(m => { console.log('IgnoreFilter:', typeof m.createIgnoreFilter); console.log('Generator:', typeof m.generateStarterIgnoreFile); })"
|
||||
```
|
||||
Expected: Both show `function`
|
||||
|
||||
- [ ] **Step 6: Final commit (if any unstaged changes)**
|
||||
|
||||
```bash
|
||||
git status
|
||||
# If clean, skip. If changes exist:
|
||||
git add -A && git commit -m "chore: final verification for .understandignore support"
|
||||
```
|
||||
@@ -0,0 +1,856 @@
|
||||
# Language-Specific Extractor Architecture Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** (1) Decouple AST extraction logic from TS/JS-specific node types so 8 additional code languages (Python, Go, Rust, Java, Ruby, PHP, C/C++, C#) get tree-sitter-powered structural analysis. Swift and Kotlin are excluded — no WASM grammar packages available. (2) Replace the file-analyzer agent's ad-hoc regex script generation with a deterministic, pre-built tree-sitter extraction script.
|
||||
|
||||
**Architecture:** Introduce a `LanguageExtractor` interface that each language implements. `TreeSitterPlugin` delegates extraction to the registered extractor for the file's language. A bundled `extract-structure.mjs` script in `skills/understand/` uses `PluginRegistry` (which includes both `TreeSitterPlugin` and the non-code parsers) to provide deterministic structural extraction for the file-analyzer agent — replacing the current approach where the LLM writes throwaway regex scripts every run.
|
||||
|
||||
**Tech Stack:** web-tree-sitter (WASM), TypeScript, Vitest
|
||||
|
||||
---
|
||||
|
||||
## File Structure
|
||||
|
||||
```
|
||||
packages/core/src/plugins/
|
||||
├── extractors/
|
||||
│ ├── types.ts # LanguageExtractor interface + TreeSitterNode re-export
|
||||
│ ├── base-extractor.ts # Shared utilities (traverse, getStringValue)
|
||||
│ ├── typescript-extractor.ts # TS/JS (moved from tree-sitter-plugin.ts)
|
||||
│ ├── python-extractor.ts
|
||||
│ ├── go-extractor.ts
|
||||
│ ├── rust-extractor.ts
|
||||
│ ├── java-extractor.ts
|
||||
│ ├── ruby-extractor.ts
|
||||
│ ├── php-extractor.ts
|
||||
│ ├── cpp-extractor.ts
|
||||
│ ├── csharp-extractor.ts
|
||||
│ └── index.ts # builtinExtractors array + re-exports
|
||||
├── tree-sitter-plugin.ts # Refactored to use extractors
|
||||
└── tree-sitter-plugin.test.ts # Existing tests (should still pass)
|
||||
|
||||
packages/core/src/plugins/__tests__/
|
||||
└── extractors.test.ts # Tests for all new extractors
|
||||
|
||||
skills/understand/
|
||||
├── extract-structure.mjs # Pre-built tree-sitter extraction script (NEW)
|
||||
└── SKILL.md # Updated to reference extract-structure.mjs
|
||||
|
||||
agents/
|
||||
└── file-analyzer.md # Phase 1 rewritten to execute pre-built script
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Create LanguageExtractor interface and shared utilities
|
||||
|
||||
**Files:**
|
||||
- Create: `packages/core/src/plugins/extractors/types.ts`
|
||||
- Create: `packages/core/src/plugins/extractors/base-extractor.ts`
|
||||
|
||||
- [ ] **Step 1: Create the extractor interface**
|
||||
|
||||
```typescript
|
||||
// packages/core/src/plugins/extractors/types.ts
|
||||
import type { StructuralAnalysis, CallGraphEntry } from "../../types.js";
|
||||
|
||||
// Re-export the tree-sitter Node type for use by extractors
|
||||
export type TreeSitterNode = import("web-tree-sitter").Node;
|
||||
|
||||
/**
|
||||
* Language-specific extractor that maps a tree-sitter AST
|
||||
* to the common StructuralAnalysis / CallGraphEntry types.
|
||||
*/
|
||||
export interface LanguageExtractor {
|
||||
/** Language IDs this extractor handles (must match LanguageConfig.id) */
|
||||
languageIds: string[];
|
||||
|
||||
/** Extract functions, classes, imports, exports from the root AST node */
|
||||
extractStructure(rootNode: TreeSitterNode): StructuralAnalysis;
|
||||
|
||||
/** Extract caller→callee relationships from the root AST node */
|
||||
extractCallGraph(rootNode: TreeSitterNode): CallGraphEntry[];
|
||||
}
|
||||
```
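
To make the contract concrete, here is a toy implementation of the same shape. It is only a sketch: `SketchNode`, `SketchAnalysis`, and `SketchExtractor` are simplified stand-ins for the real `TreeSitterNode`, `StructuralAnalysis`, and `LanguageExtractor` types, the `{ caller, callee }` entry shape is an assumption, and the `"toy"` language does not exist.

```typescript
// Stand-ins for the real types (assumptions; actual shapes live in web-tree-sitter and ../../types.js).
interface SketchNode {
  type: string;
  text: string;
  childCount: number;
  child(index: number): SketchNode | null;
}
interface SketchAnalysis {
  functions: string[];
  classes: string[];
  imports: string[];
  exports: string[];
}
interface SketchExtractor {
  languageIds: string[];
  extractStructure(root: SketchNode): SketchAnalysis;
  extractCallGraph(root: SketchNode): { caller: string; callee: string }[];
}

function makeNode(type: string, text: string, children: SketchNode[] = []): SketchNode {
  return { type, text, childCount: children.length, child: (i) => children[i] ?? null };
}

const toyExtractor: SketchExtractor = {
  languageIds: ["toy"],
  extractStructure(root) {
    const result: SketchAnalysis = { functions: [], classes: [], imports: [], exports: [] };
    // Depth-first walk that records the text of definition nodes.
    const visit = (n: SketchNode): void => {
      if (n.type === "function_definition") result.functions.push(n.text);
      if (n.type === "class_definition") result.classes.push(n.text);
      for (let i = 0; i < n.childCount; i++) {
        const c = n.child(i);
        if (c) visit(c);
      }
    };
    visit(root);
    return result;
  },
  extractCallGraph() {
    return []; // the toy language has no call expressions
  },
};

const tree = makeNode("module", "", [
  makeNode("function_definition", "helper"),
  makeNode("class_definition", "DataProcessor"),
]);
console.log(toyExtractor.extractStructure(tree));
```

A real extractor would differ only in which node types it recognizes and how it reads names, parameters, and visibility from them.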
|
||||
|
||||
- [ ] **Step 2: Create base-extractor with shared utilities**
|
||||
|
||||
Move `traverse()` and `getStringValue()` from `tree-sitter-plugin.ts` into a shared module:
|
||||
|
||||
```typescript
|
||||
// packages/core/src/plugins/extractors/base-extractor.ts
|
||||
import type { TreeSitterNode } from "./types.js";
|
||||
|
||||
/** Recursively traverse an AST tree, calling the visitor for each node. */
|
||||
export function traverse(
|
||||
node: TreeSitterNode,
|
||||
visitor: (node: TreeSitterNode) => void,
|
||||
): void {
|
||||
visitor(node);
|
||||
for (let i = 0; i < node.childCount; i++) {
|
||||
const child = node.child(i);
|
||||
if (child) traverse(child, visitor);
|
||||
}
|
||||
}
|
||||
|
||||
/** Extract the unquoted string value from a string-like node. */
|
||||
export function getStringValue(node: TreeSitterNode): string {
|
||||
for (let i = 0; i < node.childCount; i++) {
|
||||
const child = node.child(i);
|
||||
if (child && child.type === "string_fragment") {
|
||||
return child.text;
|
||||
}
|
||||
}
|
||||
return node.text.replace(/^['"`]|['"`]$/g, "");
|
||||
}
|
||||
|
||||
/** Find the first child matching a type. */
|
||||
export function findChild(node: TreeSitterNode, type: string): TreeSitterNode | null {
|
||||
for (let i = 0; i < node.childCount; i++) {
|
||||
const child = node.child(i);
|
||||
if (child && child.type === type) return child;
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
/** Find all children matching a type. */
|
||||
export function findChildren(node: TreeSitterNode, type: string): TreeSitterNode[] {
|
||||
const result: TreeSitterNode[] = [];
|
||||
for (let i = 0; i < node.childCount; i++) {
|
||||
const child = node.child(i);
|
||||
if (child && child.type === type) result.push(child);
|
||||
}
|
||||
return result;
|
||||
}
|
||||
|
||||
/** Check if a node has a child of the given type (used for export/visibility checks). */
|
||||
export function hasChildOfType(node: TreeSitterNode, type: string): boolean {
|
||||
for (let i = 0; i < node.childCount; i++) {
|
||||
const child = node.child(i);
|
||||
if (child && child.type === type) return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
```
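
The two code paths in `getStringValue()` are easy to miss, so a small demonstration may help. The function body below is a copy of the one above, retyped against a hypothetical `StringNode` stand-in so the snippet runs without web-tree-sitter; `makeStringNode` is likewise only a test helper.

```typescript
// Minimal stand-in for a tree-sitter node (assumption for this sketch).
interface StringNode {
  type: string;
  text: string;
  childCount: number;
  child(index: number): StringNode | null;
}

function makeStringNode(type: string, text: string, children: StringNode[] = []): StringNode {
  return { type, text, childCount: children.length, child: (i) => children[i] ?? null };
}

// Copy of getStringValue from base-extractor.ts, retyped against the stand-in node.
function getStringValue(node: StringNode): string {
  for (let i = 0; i < node.childCount; i++) {
    const child = node.child(i);
    if (child && child.type === "string_fragment") {
      return child.text; // path 1: grammar exposes the unquoted content directly
    }
  }
  return node.text.replace(/^['"`]|['"`]$/g, ""); // path 2: strip one quote from each end
}

const quote = makeStringNode('"', '"');
const withFragment = makeStringNode("string", '"./utils"', [
  quote,
  makeStringNode("string_fragment", "./utils"),
  quote,
]);
const bare = makeStringNode("string", "'fs'");

console.log(getStringValue(withFragment)); // ./utils
console.log(getStringValue(bare)); // fs
```

Grammars that name their string content node something other than `string_fragment` fall through to the quote-stripping path, which is worth keeping in mind when writing the non-JS extractors.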
|
||||
|
||||
- [ ] **Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add packages/core/src/plugins/extractors/types.ts packages/core/src/plugins/extractors/base-extractor.ts
|
||||
git commit -m "feat: add LanguageExtractor interface and shared base utilities"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 2: Move TS/JS extraction logic into TypeScriptExtractor
|
||||
|
||||
**Files:**
|
||||
- Create: `packages/core/src/plugins/extractors/typescript-extractor.ts`
|
||||
- Modify: `packages/core/src/plugins/tree-sitter-plugin.ts`
|
||||
|
||||
This is a pure refactor. All existing tests must still pass with zero changes.
|
||||
|
||||
- [ ] **Step 1: Create TypeScriptExtractor**
|
||||
|
||||
Move all the TS/JS-specific extraction methods (`extractFunction`, `extractClass`, `extractVariableDeclarations`, `extractImport`, `processExportStatement`, `extractParams`, `extractReturnType`, `extractImportSpecifiers`, and the call graph walker) from `tree-sitter-plugin.ts` into `typescript-extractor.ts`, implementing the `LanguageExtractor` interface.
|
||||
|
||||
The `languageIds` should be `["typescript", "javascript"]`. Do NOT include `"tsx"` — it is a synthetic key internal to `TreeSitterPlugin` for grammar selection, not a `LanguageConfig.id`. The tsx→typescript mapping is handled in `getExtractor()` below.
|
||||
|
||||
- [ ] **Step 2: Refactor TreeSitterPlugin to use extractors**
|
||||
|
||||
Replace the hardcoded extraction logic in `TreeSitterPlugin` with extractor dispatch:
|
||||
|
||||
```typescript
|
||||
// In TreeSitterPlugin
|
||||
private extractors = new Map<string, LanguageExtractor>();
|
||||
|
||||
registerExtractor(extractor: LanguageExtractor): void {
|
||||
for (const id of extractor.languageIds) {
|
||||
this.extractors.set(id, extractor);
|
||||
}
|
||||
}
|
||||
|
||||
private getExtractor(langKey: string): LanguageExtractor | null {
|
||||
// tsx is a synthetic grammar key — extraction logic is identical to typescript
|
||||
const key = langKey === "tsx" ? "typescript" : langKey;
|
||||
return this.extractors.get(key) ?? null;
|
||||
}
|
||||
```
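
Pulled out of the plugin, the registration and lookup logic behaves like the standalone table below. `ExtractorTable` and `NamedExtractor` are hypothetical names used only to show the `tsx` aliasing in isolation.

```typescript
// Only the field the lookup needs; real extractors also carry the extract* methods.
interface NamedExtractor {
  languageIds: string[];
}

class ExtractorTable<E extends NamedExtractor> {
  private extractors = new Map<string, E>();

  register(extractor: E): void {
    // One extractor may serve several language IDs.
    for (const id of extractor.languageIds) {
      this.extractors.set(id, extractor);
    }
  }

  get(langKey: string): E | null {
    // tsx is a grammar key, not a language ID, so it borrows the typescript extractor.
    const key = langKey === "tsx" ? "typescript" : langKey;
    return this.extractors.get(key) ?? null;
  }
}

const table = new ExtractorTable<NamedExtractor>();
const tsExtractor: NamedExtractor = { languageIds: ["typescript", "javascript"] };
table.register(tsExtractor);

console.log(table.get("tsx") === tsExtractor); // true
console.log(table.get("python")); // null
```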
|
||||
|
||||
The `analyzeFile()` method becomes:
|
||||
|
||||
```typescript
|
||||
analyzeFile(filePath: string, content: string): StructuralAnalysis {
|
||||
const parser = this.getParser(filePath);
|
||||
if (!parser) return { functions: [], classes: [], imports: [], exports: [] };
|
||||
|
||||
const tree = parser.parse(content);
|
||||
if (!tree) { parser.delete(); return { functions: [], classes: [], imports: [], exports: [] }; }
|
||||
|
||||
const langKey = this.languageKeyFromPath(filePath);
|
||||
const extractor = langKey ? this.getExtractor(langKey) : null;
|
||||
|
||||
let result: StructuralAnalysis;
|
||||
if (extractor) {
|
||||
result = extractor.extractStructure(tree.rootNode);
|
||||
} else {
|
||||
result = { functions: [], classes: [], imports: [], exports: [] };
|
||||
}
|
||||
|
||||
tree.delete();
|
||||
parser.delete();
|
||||
return result;
|
||||
}
|
||||
```
|
||||
|
||||
The `extractCallGraph()` method follows the same pattern — parser lifecycle must be managed identically:
|
||||
|
||||
```typescript
|
||||
extractCallGraph(filePath: string, content: string): CallGraphEntry[] {
|
||||
const parser = this.getParser(filePath);
|
||||
if (!parser) return [];
|
||||
|
||||
const tree = parser.parse(content);
|
||||
if (!tree) { parser.delete(); return []; }
|
||||
|
||||
const langKey = this.languageKeyFromPath(filePath);
|
||||
const extractor = langKey ? this.getExtractor(langKey) : null;
|
||||
const result = extractor ? extractor.extractCallGraph(tree.rootNode) : [];
|
||||
|
||||
tree.delete();
|
||||
parser.delete();
|
||||
return result;
|
||||
}
|
||||
```
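
Since both methods repeat the same parse and cleanup sequence, one optional refinement is a shared helper that releases the tree and parser in a `finally` block, so cleanup also happens if an extractor throws. This is a suggestion rather than part of the plan: `withParsedTree` and the `Sketch*` interfaces are hypothetical, and the fake parser below only records calls.

```typescript
interface Deletable {
  delete(): void;
}
interface SketchTree extends Deletable {
  rootNode: unknown;
}
interface SketchParser extends Deletable {
  parse(content: string): SketchTree | null;
}

function withParsedTree<T>(
  parser: SketchParser,
  content: string,
  fallback: T,
  use: (rootNode: unknown) => T,
): T {
  let tree: SketchTree | null = null;
  try {
    tree = parser.parse(content);
    return tree ? use(tree.rootNode) : fallback;
  } finally {
    // Runs on success, on a null tree, and when `use` throws.
    tree?.delete();
    parser.delete();
  }
}

// Fake parser that records cleanup calls.
const deleted: string[] = [];
const fakeParser: SketchParser = {
  parse: () => ({
    rootNode: "root",
    delete: () => {
      deleted.push("tree");
    },
  }),
  delete: () => {
    deleted.push("parser");
  },
};

let threw = false;
try {
  withParsedTree<string[]>(fakeParser, "x", [], () => {
    throw new Error("extractor failed");
  });
} catch {
  threw = true;
}
console.log(threw, deleted);
```

With such a helper, `analyzeFile()` and `extractCallGraph()` would each reduce to a single call that passes the appropriate empty fallback.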
|
||||
|
||||
The constructor should accept an optional `extractors` array and register them. If none provided, register the built-in `TypeScriptExtractor` for backward compatibility.
|
||||
|
||||
- [ ] **Step 3: Run existing tests to verify zero behavior change**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/core test`
|
||||
Expected: All 426 tests pass (identical to before)
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add packages/core/src/plugins/extractors/typescript-extractor.ts packages/core/src/plugins/tree-sitter-plugin.ts
|
||||
git commit -m "refactor: move TS/JS extraction logic to TypeScriptExtractor, dispatch via LanguageExtractor interface"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 2.5: Add extractCallGraph to PluginRegistry and update DEFAULT_PLUGIN_CONFIG
|
||||
|
||||
**Files:**
|
||||
- Modify: `packages/core/src/plugins/registry.ts`
|
||||
- Modify: `packages/core/src/plugins/discovery.ts`
|
||||
|
||||
**Context:** `PluginRegistry` currently only exposes `analyzeFile` and `resolveImports` — it has no `extractCallGraph`. The `extract-structure.mjs` script (Task 13) needs call graph data through the registry. Also, `DEFAULT_PLUGIN_CONFIG` hardcodes `["typescript", "javascript"]` which needs to reflect all supported languages.
|
||||
|
||||
- [ ] **Step 1: Add extractCallGraph to PluginRegistry**
|
||||
|
||||
```typescript
|
||||
// In PluginRegistry (registry.ts)
|
||||
extractCallGraph(filePath: string, content: string): CallGraphEntry[] | null {
|
||||
const plugin = this.getPluginForFile(filePath);
|
||||
if (!plugin?.extractCallGraph) return null;
|
||||
return plugin.extractCallGraph(filePath, content);
|
||||
}
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Update DEFAULT_PLUGIN_CONFIG to derive languages dynamically**
|
||||
|
||||
In `discovery.ts`, replace the hardcoded `["typescript", "javascript"]` with a dynamic derivation from `builtinLanguageConfigs`:
|
||||
|
||||
```typescript
|
||||
import { builtinLanguageConfigs } from "../languages/configs/index.js";
|
||||
|
||||
export const DEFAULT_PLUGIN_CONFIG: PluginConfig = {
|
||||
plugins: [
|
||||
{
|
||||
name: "tree-sitter",
|
||||
enabled: true,
|
||||
languages: builtinLanguageConfigs
|
||||
.filter((c) => c.treeSitter)
|
||||
.map((c) => c.id),
|
||||
},
|
||||
],
|
||||
};
|
||||
```
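
The effect of the `filter`/`map` chain can be seen on a small sample. The config list here is illustrative only (`SketchLanguageConfig` and `sampleConfigs` are not the real `builtinLanguageConfigs`); the point is that any config without a `treeSitter` block drops out of the derived language list.

```typescript
// Trimmed-down config shape: just the fields the derivation reads (assumption).
interface SketchLanguageConfig {
  id: string;
  treeSitter?: { wasmPackage: string; wasmFile: string };
}

const sampleConfigs: SketchLanguageConfig[] = [
  {
    id: "typescript",
    treeSitter: { wasmPackage: "tree-sitter-typescript", wasmFile: "tree-sitter-typescript.wasm" },
  },
  {
    id: "python",
    treeSitter: { wasmPackage: "tree-sitter-python", wasmFile: "tree-sitter-python.wasm" },
  },
  { id: "swift" }, // no treeSitter block, so it is skipped
];

const languages = sampleConfigs.filter((c) => c.treeSitter).map((c) => c.id);
console.log(languages); // [ 'typescript', 'python' ]
```

Adding a grammar to a config is then enough to enable the language in the default plugin config, with no second list to keep in sync.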
|
||||
|
||||
- [ ] **Step 3: Run tests, commit**
|
||||
|
||||
```bash
|
||||
pnpm --filter @understand-anything/core test
|
||||
git add packages/core/src/plugins/registry.ts packages/core/src/plugins/discovery.ts
|
||||
git commit -m "feat: add extractCallGraph to PluginRegistry, derive DEFAULT_PLUGIN_CONFIG from configs"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 3: Add npm dependencies and treeSitter configs for all 10 languages
|
||||
|
||||
**Files:**
|
||||
- Modify: `packages/core/package.json` (add 8 deps: python, go, rust, java, ruby, php, cpp, c-sharp)
|
||||
- Modify: 10 config files in `packages/core/src/languages/configs/`
|
||||
|
||||
- [ ] **Step 1: Add tree-sitter grammar dependencies to package.json**
|
||||
|
||||
Add to `dependencies`:
|
||||
|
||||
```json
|
||||
"tree-sitter-c-sharp": "^0.23.1",
|
||||
"tree-sitter-cpp": "^0.23.4",
|
||||
"tree-sitter-go": "^0.25.0",
|
||||
"tree-sitter-java": "^0.23.5",
|
||||
"tree-sitter-php": "^0.23.11",
|
||||
"tree-sitter-python": "^0.25.0",
|
||||
"tree-sitter-ruby": "^0.23.1",
|
||||
"tree-sitter-rust": "^0.24.0"
|
||||
```
|
||||
|
||||
Then run `pnpm install`.
|
||||
|
||||
- [ ] **Step 2: Add treeSitter field to all 10 language configs**
|
||||
|
||||
Each config gets a `treeSitter` block. Examples:
|
||||
|
||||
```typescript
|
||||
// python.ts
|
||||
treeSitter: { wasmPackage: "tree-sitter-python", wasmFile: "tree-sitter-python.wasm" },
|
||||
|
||||
// go.ts
|
||||
treeSitter: { wasmPackage: "tree-sitter-go", wasmFile: "tree-sitter-go.wasm" },
|
||||
|
||||
// rust.ts
|
||||
treeSitter: { wasmPackage: "tree-sitter-rust", wasmFile: "tree-sitter-rust.wasm" },
|
||||
|
||||
// java.ts
|
||||
treeSitter: { wasmPackage: "tree-sitter-java", wasmFile: "tree-sitter-java.wasm" },
|
||||
|
||||
// ruby.ts
|
||||
treeSitter: { wasmPackage: "tree-sitter-ruby", wasmFile: "tree-sitter-ruby.wasm" },
|
||||
|
||||
// php.ts
|
||||
treeSitter: { wasmPackage: "tree-sitter-php", wasmFile: "tree-sitter-php.wasm" },
|
||||
|
||||
// cpp.ts
|
||||
treeSitter: { wasmPackage: "tree-sitter-cpp", wasmFile: "tree-sitter-cpp.wasm" },
|
||||
|
||||
// csharp.ts
|
||||
treeSitter: { wasmPackage: "tree-sitter-c-sharp", wasmFile: "tree-sitter-c_sharp.wasm" },
|
||||
```
|
||||
|
||||
Note: Swift and Kotlin configs are NOT changed (no WASM packages available).
|
||||
|
||||
- [ ] **Step 3: Run pnpm install and verify WASM files resolve**
|
||||
|
||||
```bash
|
||||
pnpm install
|
||||
pnpm --filter @understand-anything/core exec node -e "console.log(require.resolve('tree-sitter-python/tree-sitter-python.wasm'))"
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add packages/core/package.json pnpm-lock.yaml packages/core/src/languages/configs/
|
||||
git commit -m "feat: add tree-sitter grammar deps and treeSitter configs for 10 languages"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 4: Create Python extractor
|
||||
|
||||
**Files:**
|
||||
- Create: `packages/core/src/plugins/extractors/python-extractor.ts`
|
||||
|
||||
- [ ] **Step 1: Write the Python extractor**
|
||||
|
||||
Key Python tree-sitter node types:
|
||||
- Functions: `function_definition` (name, parameters, return_type)
|
||||
- Classes: `class_definition` (name, body → methods + assignments as properties)
|
||||
- Imports: `import_statement`, `import_from_statement`
|
||||
- Decorated: `decorated_definition` wrapping function_definition or class_definition
|
||||
- Calls: `call` (function field)
|
||||
- No formal exports (all top-level names are "exported")
|
||||
|
||||
```typescript
|
||||
languageIds: ["python"]
|
||||
```
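
One detail from the list above that is easy to get wrong is `decorated_definition`: the function or class sits one level down, next to its `decorator` children. A minimal sketch of the unwrapping step, using a hypothetical `PyNode` stand-in and helper names (`makePyNode`, `unwrapDecorated`) that are not part of the plan:

```typescript
// Minimal stand-in for a tree-sitter node (assumption for this sketch).
interface PyNode {
  type: string;
  childCount: number;
  child(index: number): PyNode | null;
}

function makePyNode(type: string, children: PyNode[] = []): PyNode {
  return { type, childCount: children.length, child: (i) => children[i] ?? null };
}

/** Return the wrapped definition for decorated nodes, or the node itself otherwise. */
function unwrapDecorated(node: PyNode): PyNode {
  if (node.type !== "decorated_definition") return node;
  for (let i = 0; i < node.childCount; i++) {
    const child = node.child(i);
    if (child && (child.type === "function_definition" || child.type === "class_definition")) {
      return child;
    }
  }
  return node; // malformed wrapper: leave untouched
}

const decorated = makePyNode("decorated_definition", [
  makePyNode("decorator"),
  makePyNode("function_definition"),
]);
console.log(unwrapDecorated(decorated).type); // function_definition
```

Running every top-level child through a step like this first lets the rest of the extractor treat decorated and plain definitions the same way.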
|
||||
|
||||
- [ ] **Step 2: Write tests for Python extractor**
|
||||
|
||||
Test with representative Python code:
|
||||
|
||||
```python
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
class DataProcessor:
|
||||
name: str
|
||||
|
||||
def __init__(self, name: str):
|
||||
self.name = name
|
||||
|
||||
def process(self, data: list) -> dict:
|
||||
return transform(data)
|
||||
|
||||
def helper(x: int) -> str:
|
||||
return str(x)
|
||||
|
||||
@decorator
|
||||
def decorated_func():
|
||||
pass
|
||||
```
|
||||
|
||||
Verify: 2 functions (helper, decorated_func), 1 class (DataProcessor with methods __init__/process and property name), 3 imports, call graph (process→transform).
|
||||
|
||||
- [ ] **Step 3: Run tests**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/core test`
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
---
|
||||
|
||||
### Task 5: Create Go extractor
|
||||
|
||||
**Files:**
|
||||
- Create: `packages/core/src/plugins/extractors/go-extractor.ts`
|
||||
|
||||
- [ ] **Step 1: Write the Go extractor**
|
||||
|
||||
Key Go tree-sitter node types:
|
||||
- Functions: `function_declaration` (name, parameter_list, result)
|
||||
- Methods: `method_declaration` (receiver, name, parameter_list, result)
|
||||
- Structs: `type_declaration` → `type_spec` → `struct_type`
|
||||
- Interfaces: `type_declaration` → `type_spec` → `interface_type`
|
||||
- Imports: `import_declaration` → `import_spec_list` → `import_spec`
|
||||
- Exports: capitalized first letter of name
|
||||
- Calls: `call_expression` (function field)
|
||||
|
||||
```typescript
|
||||
languageIds: ["go"]
|
||||
```
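
Go has no export keyword, so the extractor has to derive exports from identifier names. A small sketch of that check follows; `isExportedGoName` is a hypothetical helper, and comparing against `toLowerCase()` is an approximation of Go's rule that the first character must be an uppercase Unicode letter.

```typescript
/** Approximate Go's export rule: the identifier starts with an uppercase letter. */
function isExportedGoName(name: string): boolean {
  const first = name.charAt(0);
  // Digits, underscores, and lowercase letters are unchanged by toLowerCase().
  return first !== "" && first !== first.toLowerCase();
}

const declared = ["Server", "Start", "NewServer", "helper", "_internal"];
const exported = declared.filter(isExportedGoName);
console.log(exported); // [ 'Server', 'Start', 'NewServer' ]
```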
|
||||
|
||||
- [ ] **Step 2: Write tests**
|
||||
|
||||
Test with:
|
||||
```go
|
||||
package main
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"os"
|
||||
)
|
||||
|
||||
type Server struct {
|
||||
Host string
|
||||
Port int
|
||||
}
|
||||
|
||||
func (s *Server) Start() error {
|
||||
fmt.Println("starting")
|
||||
return nil
|
||||
}
|
||||
|
||||
func NewServer(host string, port int) *Server {
|
||||
return &Server{Host: host, Port: port}
|
||||
}
|
||||
```
|
||||
|
||||
Verify: 2 functions (Start, NewServer), 1 class/struct (Server with method Start, properties Host/Port), 2 imports, exports (Server, Start, NewServer — all capitalized), call graph (Start→fmt.Println).
|
||||
|
||||
- [ ] **Step 3: Run tests and commit**
|
||||
|
||||
---
|
||||
|
||||
### Task 6: Create Rust extractor
|
||||
|
||||
**Files:**
|
||||
- Create: `packages/core/src/plugins/extractors/rust-extractor.ts`
|
||||
|
||||
- [ ] **Step 1: Write the Rust extractor**
|
||||
|
||||
Key Rust tree-sitter node types:
|
||||
- Functions: `function_item` (name, parameters, return_type via `->`)
|
||||
- Structs: `struct_item` (name, field_declaration_list)
|
||||
- Enums: `enum_item`
|
||||
- Impl blocks: `impl_item` (type, body containing function_items)
|
||||
- Traits: `trait_item`
|
||||
- Imports: `use_declaration` (scoped_identifier, use_list, use_wildcard)
|
||||
- Exports: `visibility_modifier` containing `pub`
|
||||
- Calls: `call_expression` (function field)
|
||||
|
||||
```typescript
|
||||
languageIds: ["rust"]
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Write tests**
|
||||
|
||||
Test with:
|
||||
```rust
|
||||
use std::collections::HashMap;
|
||||
use std::io::{self, Read};
|
||||
|
||||
pub struct Config {
|
||||
name: String,
|
||||
port: u16,
|
||||
}
|
||||
|
||||
impl Config {
|
||||
pub fn new(name: String, port: u16) -> Self {
|
||||
Config { name, port }
|
||||
}
|
||||
|
||||
fn validate(&self) -> bool {
|
||||
check_port(self.port)
|
||||
}
|
||||
}
|
||||
|
||||
pub fn check_port(port: u16) -> bool {
|
||||
port > 0
|
||||
}
|
||||
```
|
||||
|
||||
Verify: 3 functions (new, validate, check_port), 1 class/struct (Config with methods new/validate, properties name/port), 2 imports, exports (Config, new, check_port — those with `pub`), call graph (validate→check_port).
|
||||
|
||||
- [ ] **Step 3: Run tests and commit**
|
||||
|
||||
---
|
||||
|
||||
### Task 7: Create Java extractor
|
||||
|
||||
**Files:**
|
||||
- Create: `packages/core/src/plugins/extractors/java-extractor.ts`
|
||||
|
||||
- [ ] **Step 1: Write the Java extractor**
|
||||
|
||||
Key Java tree-sitter node types:
|
||||
- Methods: `method_declaration` (name, formal_parameters, type/dimensions)
|
||||
- Constructors: `constructor_declaration` (name, formal_parameters)
|
||||
- Classes: `class_declaration` (name, class_body)
|
||||
- Interfaces: `interface_declaration`
|
||||
- Fields: `field_declaration` (declarator → variable_declarator → identifier)
|
||||
- Imports: `import_declaration` (scoped_identifier)
|
||||
- Exports: `public` modifier (modifiers node)
|
||||
- Calls: `method_invocation` (name, object, arguments)
|
||||
|
||||
```typescript
|
||||
languageIds: ["java"]
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Write tests with representative Java code, run, commit**
|
||||
|
||||
---
|
||||
|
||||
### Task 8: Create Ruby extractor
|
||||
|
||||
**Files:**
|
||||
- Create: `packages/core/src/plugins/extractors/ruby-extractor.ts`
|
||||
|
||||
- [ ] **Step 1: Write the Ruby extractor**
|
||||
|
||||
Key Ruby tree-sitter node types:
|
||||
- Methods: `method` (name, parameters)
|
||||
- Classes: `class` (name, body containing methods)
|
||||
- Modules: `module` (name)
|
||||
- Imports: `call` where method is `require` or `require_relative` (Ruby uses method calls for imports)
|
||||
- Calls: `call` (method, receiver, arguments)
|
||||
- No formal export syntax
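
Because Ruby imports and ordinary calls share the `call` node type, the extractor has to separate them by method name. A minimal sketch of that split, assuming a simplified `CallNode` stand-in (its fields are hypothetical and not the real tree-sitter node API):

```typescript
// Minimal stand-in for a tree-sitter node; field names here are assumptions.
interface CallNode {
  type: string; // e.g. "call"
  method: string; // name of the invoked method
  firstArg?: string; // text of the first string argument, if any
}

const IMPORT_METHODS = new Set(["require", "require_relative"]);

// Split `call` nodes into import sources and ordinary call names.
function classifyRubyCalls(nodes: CallNode[]): { imports: string[]; calls: string[] } {
  const imports: string[] = [];
  const calls: string[] = [];
  for (const node of nodes) {
    if (node.type !== "call") continue; // ignore anything that is not a call
    if (IMPORT_METHODS.has(node.method) && node.firstArg !== undefined) {
      imports.push(node.firstArg);
    } else {
      calls.push(node.method);
    }
  }
  return { imports, calls };
}
```

A `require` without a literal argument is treated as an ordinary call here, since there is no source path to record; whether that is the right policy is a design choice for the real extractor.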
|
||||
|
||||
```typescript
|
||||
languageIds: ["ruby"]
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Write tests, run, commit**
|
||||
|
||||
---
|
||||
|
||||
### Task 9: Create PHP extractor
|
||||
|
||||
**Files:**
|
||||
- Create: `packages/core/src/plugins/extractors/php-extractor.ts`
|
||||
|
||||
- [ ] **Step 1: Write the PHP extractor**
|
||||
|
||||
Key PHP tree-sitter node types:
|
||||
- Functions: `function_definition` (name, formal_parameters, return_type)
|
||||
- Methods: `method_declaration` (name, formal_parameters, return_type)
|
||||
- Classes: `class_declaration` (name, declaration_list)
|
||||
- Imports: `namespace_use_declaration` (namespace_use_clause)
|
||||
- Calls: `function_call_expression` / `member_call_expression`
|
||||
- Note: PHP tree wraps everything in a `program` → `php_tag` + statements
|
||||
|
||||
```typescript
|
||||
languageIds: ["php"]
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Write tests, run, commit**
|
||||
|
||||
---
|
||||
|
||||
### Task 10: Create C/C++ extractor
|
||||
|
||||
**Files:**
|
||||
- Create: `packages/core/src/plugins/extractors/cpp-extractor.ts`
|
||||
|
||||
- [ ] **Step 1: Write the C/C++ extractor**
|
||||
|
||||
Key C/C++ tree-sitter node types:
|
||||
- Functions: `function_definition` (declarator → function_declarator → identifier + parameter_list)
|
||||
- Classes: `class_specifier` (name, body → field_declaration_list)
|
||||
- Structs: `struct_specifier` (name, body)
|
||||
- Includes: `preproc_include` (path → string_literal or system_lib_string)
|
||||
- Namespaces: `namespace_definition`
|
||||
- Calls: `call_expression` (function, arguments)
|
||||
|
||||
Note: C/C++ function signatures are nested (the name is inside a `function_declarator` inside the `declarator` field).
|
||||
|
||||
The `cppConfig` has `id: "cpp"` and `extensions: [".cpp", ".cc", ".cxx", ".c", ".h", ".hpp", ".hxx"]`. Pure C files (`.c`, `.h`) are parsed with the C++ grammar, which works but won't produce C++-specific node types like `class_specifier`. The extractor must handle their absence gracefully (return empty arrays for classes when parsing pure C).
|
||||
|
||||
```typescript
|
||||
languageIds: ["cpp"]
|
||||
```
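
The nested declarator shape noted above can be handled by following the `declarator` field until an identifier is reached. The sketch below uses a hypothetical `MiniNode` stand-in rather than the real tree-sitter node type, and only handles plain `identifier` names; qualified or member names would need extra cases.

```typescript
// Minimal stand-in for a tree-sitter node (hypothetical, for illustration only).
interface MiniNode {
  type: string;
  text: string;
  fields?: Record<string, MiniNode>;
}

// Follow the `declarator` field downward until an identifier is reached.
// Returns undefined when the shape is not what we expect.
function functionName(def: MiniNode): string | undefined {
  let current: MiniNode | undefined = def.fields?.["declarator"];
  while (current !== undefined) {
    if (current.type === "identifier") return current.text;
    current = current.fields?.["declarator"];
  }
  return undefined;
}
```

Looping instead of hard-coding two levels means wrappers such as a pointer declarator between the definition and the `function_declarator` are skipped over without special handling.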
|
||||
|
||||
- [ ] **Step 2: Write tests for both C++ and pure C code, run, commit**
|
||||
|
||||
---
|
||||
|
||||
### Task 11: Create C# extractor
|
||||
|
||||
**Files:**
|
||||
- Create: `packages/core/src/plugins/extractors/csharp-extractor.ts`
|
||||
|
||||
- [ ] **Step 1: Write the C# extractor**
|
||||
|
||||
Key C# tree-sitter node types:
|
||||
- Methods: `method_declaration` (name, parameter_list, return type)
|
||||
- Constructors: `constructor_declaration`
|
||||
- Classes: `class_declaration` (name, declaration_list)
|
||||
- Interfaces: `interface_declaration`
|
||||
- Properties: `property_declaration` (name, type)
|
||||
- Imports: `using_directive` (qualified_name)
|
||||
- Calls: `invocation_expression` (identifier/member_access, argument_list)
|
||||
|
||||
```typescript
|
||||
languageIds: ["csharp"]
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Write tests, run, commit**
|
||||
|
||||
---
|
||||
|
||||
### Task 12: Create extractor index and wire into TreeSitterPlugin
|
||||
|
||||
**Files:**
|
||||
- Create: `packages/core/src/plugins/extractors/index.ts`
|
||||
- Modify: `packages/core/src/plugins/tree-sitter-plugin.ts` (import builtinExtractors)
|
||||
|
||||
- [ ] **Step 1: Create index.ts exporting all extractors**
|
||||
|
||||
```typescript
// packages/core/src/plugins/extractors/index.ts
export type { LanguageExtractor, TreeSitterNode } from "./types.js";
export { traverse, getStringValue, findChild, findChildren, hasChildOfType } from "./base-extractor.js";
export { TypeScriptExtractor } from "./typescript-extractor.js";
export { PythonExtractor } from "./python-extractor.js";
export { GoExtractor } from "./go-extractor.js";
export { RustExtractor } from "./rust-extractor.js";
export { JavaExtractor } from "./java-extractor.js";
export { RubyExtractor } from "./ruby-extractor.js";
export { PhpExtractor } from "./php-extractor.js";
export { CppExtractor } from "./cpp-extractor.js";
export { CSharpExtractor } from "./csharp-extractor.js";

import type { LanguageExtractor } from "./types.js";
import { TypeScriptExtractor } from "./typescript-extractor.js";
import { PythonExtractor } from "./python-extractor.js";
import { GoExtractor } from "./go-extractor.js";
import { RustExtractor } from "./rust-extractor.js";
import { JavaExtractor } from "./java-extractor.js";
import { RubyExtractor } from "./ruby-extractor.js";
import { PhpExtractor } from "./php-extractor.js";
import { CppExtractor } from "./cpp-extractor.js";
import { CSharpExtractor } from "./csharp-extractor.js";

export const builtinExtractors: LanguageExtractor[] = [
  new TypeScriptExtractor(),
  new PythonExtractor(),
  new GoExtractor(),
  new RustExtractor(),
  new JavaExtractor(),
  new RubyExtractor(),
  new PhpExtractor(),
  new CppExtractor(),
  new CSharpExtractor(),
];
```
|
||||
|
||||
- [ ] **Step 2: Wire builtinExtractors into TreeSitterPlugin constructor**
|
||||
|
||||
When no extractors are provided, default to `builtinExtractors`.
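
One way to express this default is a constructor parameter with a default value. The sketch below is an assumption about the shape, not the actual `TreeSitterPlugin` code: `TreeSitterPluginSketch` and the trimmed-down `LanguageExtractor` are stand-ins, and the two sample extractors are placeholders for the real built-in list.

```typescript
// Hypothetical, trimmed-down stand-ins for the real types.
interface LanguageExtractor {
  languageIds: string[];
}

// Placeholder list; the real one is exported from extractors/index.ts.
const builtinExtractors: LanguageExtractor[] = [
  { languageIds: ["rust"] },
  { languageIds: ["java"] },
];

class TreeSitterPluginSketch {
  private readonly byLanguage = new Map<string, LanguageExtractor>();

  // Falls back to the built-in list when the caller passes nothing.
  constructor(extractors: LanguageExtractor[] = builtinExtractors) {
    for (const extractor of extractors) {
      for (const id of extractor.languageIds) {
        this.byLanguage.set(id, extractor);
      }
    }
  }

  supports(languageId: string): boolean {
    return this.byLanguage.has(languageId);
  }
}
```

Note that a default parameter only applies when the argument is omitted or `undefined`; passing an explicit empty array yields a plugin with no extractors, which can be handy in tests.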
|
||||
|
||||
- [ ] **Step 3: Run full test suite**
|
||||
|
||||
Run: `pnpm --filter @understand-anything/core test`
|
||||
Expected: All tests pass (existing + new extractor tests)
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
---
|
||||
|
||||
### Task 13: Create bundled extract-structure.mjs script
|
||||
|
||||
**Files:**
|
||||
- Create: `skills/understand/extract-structure.mjs`
|
||||
|
||||
**Context:** Currently the file-analyzer agent (Phase 1) instructs the LLM to write a throwaway regex-based Node.js/Python script every run. This is slow, non-deterministic, and ignores the tree-sitter infrastructure we just built. This task replaces that with a pre-built script that uses `PluginRegistry` (which routes to `TreeSitterPlugin` for code files and to the regex parsers for non-code files).
|
||||
|
||||
- [ ] **Step 1: Create extract-structure.mjs**
|
||||
|
||||
The script:
|
||||
1. Accepts input JSON path (arg 1) and output JSON path (arg 2)
|
||||
2. Input format matches what file-analyzer.md already specifies: `{ projectRoot, batchFiles: [{path, language, sizeLines, fileCategory}], batchImportData }`
|
||||
3. Resolves `@understand-anything/core` from the plugin's own `node_modules` using `createRequire` relative to the script's own location (two directories up to plugin root)
|
||||
4. Creates a `PluginRegistry` with `TreeSitterPlugin` (all builtin language configs) + all non-code parsers registered
|
||||
5. For each file: reads content, calls `registry.analyzeFile()`, formats output to match the existing script output schema (functions, classes, exports, sections, definitions, services, etc.)
|
||||
6. For code files with tree-sitter support: also extracts call graph via `plugin.extractCallGraph()`
|
||||
7. For files where no plugin exists (Swift, Kotlin, unknown languages): outputs `{ path, language, fileCategory, totalLines, nonEmptyLines, metrics }` with empty structural data — the LLM agent handles these in Phase 2
|
||||
8. Writes output JSON matching the existing `scriptCompleted/filesAnalyzed/filesSkipped/results` schema
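
Step 7 (files with no plugin) can be sketched as a small pure function. This is a hedged illustration written in TypeScript for type clarity, while the real script is plain `.mjs`; `buildFallbackResult` is a hypothetical name, and the contents of `metrics` are not specified in this plan, so it is left empty here.

```typescript
interface FallbackResult {
  path: string;
  language: string;
  fileCategory: string;
  totalLines: number;
  nonEmptyLines: number;
  metrics: Record<string, number>; // contents are an assumption; left empty here
}

// Builds the minimal record for a file that no plugin can analyze.
function buildFallbackResult(
  path: string,
  language: string,
  fileCategory: string,
  content: string,
): FallbackResult {
  const lines = content.length === 0 ? [] : content.split(/\r?\n/);
  // A trailing newline produces one empty trailing element; drop it so it
  // is not counted as an extra line.
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  return {
    path,
    language,
    fileCategory,
    totalLines: lines.length,
    nonEmptyLines: lines.filter((line) => line.trim() !== "").length,
    metrics: {},
  };
}
```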
|
||||
|
||||
Key resolution logic (with fallback for different install layouts):
|
||||
```javascript
import { createRequire } from 'node:module';
import { dirname, resolve } from 'node:path';
import { fileURLToPath, pathToFileURL } from 'node:url';

// The script lives in skills/understand/, so the plugin root is two levels up.
const __dirname = dirname(fileURLToPath(import.meta.url));
const pluginRoot = resolve(__dirname, '../..');
const require = createRequire(resolve(pluginRoot, 'package.json'));

// Dynamic import() needs a file:// URL for absolute paths on Windows,
// so convert resolved paths before importing.
const importPath = (absolutePath) => import(pathToFileURL(absolutePath).href);

let core;
try {
  core = await importPath(require.resolve('@understand-anything/core'));
} catch {
  // Fallback: direct path for installed plugin cache where pnpm symlinks may differ
  core = await importPath(resolve(pluginRoot, 'packages/core/dist/index.js'));
}
```
|
||||
|
||||
- [ ] **Step 2: Test the script locally**
|
||||
|
||||
Create a small test input JSON with a TS file, a Python file, and a YAML file. Run:
|
||||
```bash
|
||||
node skills/understand/extract-structure.mjs test-input.json test-output.json
|
||||
```
|
||||
Verify the output contains structural data for all three.
|
||||
|
||||
- [ ] **Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add skills/understand/extract-structure.mjs
|
||||
git commit -m "feat: add bundled tree-sitter extraction script for file-analyzer agent"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 14: Rewrite file-analyzer.md Phase 1 to use bundled script
|
||||
|
||||
**Files:**
|
||||
- Modify: `agents/file-analyzer.md`
|
||||
|
||||
**Context:** Phase 1 currently has ~150 lines instructing the agent to write a custom extraction script from scratch. Replace this with a short section that tells the agent to execute the pre-built `extract-structure.mjs` script.
|
||||
|
||||
- [ ] **Step 1: Replace Phase 1 in file-analyzer.md**
|
||||
|
||||
Delete the entire current Phase 1 (~150 lines of regex script generation instructions). Replace with:
|
||||
|
||||
1. Tell the agent to prepare the input JSON file (same format as before):
|
||||
```bash
|
||||
cat > $PROJECT_ROOT/.understand-anything/tmp/ua-file-analyzer-input-<batchIndex>.json << 'ENDJSON'
|
||||
{
|
||||
"projectRoot": "<project-root>",
|
||||
"batchFiles": [<this batch's files including fileCategory>],
|
||||
"batchImportData": <batchImportData JSON>
|
||||
}
|
||||
ENDJSON
|
||||
```
|
||||
|
||||
2. Execute the bundled script:
|
||||
```bash
|
||||
node <SKILL_DIR>/extract-structure.mjs \
|
||||
$PROJECT_ROOT/.understand-anything/tmp/ua-file-analyzer-input-<batchIndex>.json \
|
||||
$PROJECT_ROOT/.understand-anything/tmp/ua-file-extract-results-<batchIndex>.json
|
||||
```
|
||||
|
||||
3. If the script exits non-zero, read stderr, diagnose and report the error. Do NOT fall back to writing a manual script — the bundled script is the sole extraction path.
|
||||
|
||||
4. Keep the existing output format — Phase 2 (semantic analysis) is unchanged.
|
||||
|
||||
- [ ] **Step 2: Update SKILL.md to pass SKILL_DIR to file-analyzer dispatch**
|
||||
|
||||
In SKILL.md Phase 2, the file-analyzer dispatch prompt must include the skill directory path so the agent can locate `extract-structure.mjs`.
|
||||
|
||||
Add to the dispatch parameters:
|
||||
```
|
||||
> Skill directory (for bundled scripts): `<SKILL_DIR>`
|
||||
```
|
||||
|
||||
This follows the established pattern — SKILL.md already passes `<SKILL_DIR>` for `merge-batch-graphs.py` (line 213) and `merge-subdomain-graphs.py` (line 44) using the same mechanism.
|
||||
|
||||
- [ ] **Step 3: Verify the file-analyzer output format is unchanged**
|
||||
|
||||
Phase 2 of file-analyzer.md should NOT need changes — it reads the same JSON structure from the script results. Verify the output schema from `extract-structure.mjs` matches what Phase 2 expects.
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add agents/file-analyzer.md skills/understand/SKILL.md
|
||||
git commit -m "feat: file-analyzer uses bundled tree-sitter script instead of LLM-generated regex"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 15: Final integration verification and cleanup
|
||||
|
||||
- [ ] **Step 1: Add exports to packages/core/src/index.ts**
|
||||
|
||||
This is required — `extract-structure.mjs` and external consumers need these exports:
|
||||
|
||||
```typescript
|
||||
export type { LanguageExtractor } from "./plugins/extractors/types.js";
|
||||
export { builtinExtractors } from "./plugins/extractors/index.js";
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Build the full package**
|
||||
|
||||
```bash
|
||||
pnpm --filter @understand-anything/core build
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Run full test suite one final time**
|
||||
|
||||
```bash
|
||||
pnpm --filter @understand-anything/core test
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Final commit**
|
||||
|
||||
```bash
|
||||
git commit -m "feat: complete language extractor architecture — 10 languages with tree-sitter support"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
**Test file convention:** Each language extractor gets its own test file at `packages/core/src/plugins/extractors/__tests__/<language>-extractor.test.ts`. This follows the existing pattern where `tree-sitter-plugin.test.ts` is co-located.
|
||||
|
||||
**Lazy grammar loading (future optimization):** The current `TreeSitterPlugin.init()` loads all grammar WASMs upfront via `Promise.all`. With 10 grammars (~12MB total WASM), this may cause noticeable init delay. A future improvement: load TS/JS eagerly (most common), defer others to first use. Not required for this PR — measure first.
|
||||
|
||||
**Fingerprint side effect:** `buildFingerprintStore` in `fingerprint.ts` uses `PluginRegistry.analyzeFile` internally. Once the new extractors are wired up, fingerprinting for Python/Go/Rust/etc. will automatically produce structural fingerprints instead of content-hash-only. No code changes needed — it happens for free.
|
||||
|
||||
**PHP grammar note:** `tree-sitter-php` ships both `tree-sitter-php.wasm` (full PHP + embedded HTML/CSS/JS) and `tree-sitter-php_only.wasm` (PHP only). We use `tree-sitter-php.wasm`. The PHP extractor should be robust to non-PHP AST nodes that appear when parsing files with embedded HTML templates.
|
||||
@@ -0,0 +1,268 @@
|
||||
# Understand Anything — Design & Implementation Plan
|
||||
|
||||
## Context
|
||||
|
||||
AI coding tools have made writing code easy, but understanding code remains hard. Junior developers, non-programmers (PMs, designers), and even experienced devs working in unfamiliar languages struggle to comprehend codebases they didn't write — or that AI wrote for them. The only entity that "understands" the code is the AI itself.
|
||||
|
||||
**Understand Anything** bridges this gap: an open-source tool that combines LLM intelligence with static analysis to produce an interactive, multi-persona dashboard for understanding any codebase. It runs as a Claude Code skill (leveraging the active session) and serves a rich web dashboard.
|
||||
|
||||
---
|
||||
|
||||
## Architecture: Monorepo with Shared Core
|
||||
|
||||
```
|
||||
understand-anything/
|
||||
├── packages/
|
||||
│ ├── core/ # Shared analysis engine
|
||||
│ │ ├── analyzer/ # LLM + tree-sitter analysis
|
||||
│ │ ├── graph/ # Knowledge graph builder & schema
|
||||
│ │ ├── plugins/ # Plugin system for language analyzers
|
||||
│ │ └── persistence/ # JSON read/write, staleness detection
|
||||
│ ├── skill/ # Claude Code skill (5 commands)
|
||||
│ └── dashboard/ # React + TypeScript multi-panel workspace
|
||||
├── plugins/ # Built-in language analyzer plugins
|
||||
│ └── tree-sitter/ # Tree-sitter based multi-language analyzer
|
||||
├── docs/
|
||||
│ └── plans/
|
||||
├── package.json # Monorepo root (pnpm workspaces)
|
||||
├── tsconfig.json
|
||||
└── .gitignore
|
||||
```
|
||||
|
||||
**Key decisions:**
|
||||
- **Monorepo** (pnpm workspaces) — skill and dashboard share the core analysis engine
|
||||
- **JSON interchange** — knowledge graph is a JSON file, readable by both skill and dashboard
|
||||
- **Committable + auto-sync** — graph persists in `.understand-anything/`, can be committed to git, auto-detects staleness via git diff
|
||||
|
||||
---
|
||||
|
||||
## Knowledge Graph Schema
|
||||
|
||||
```typescript
|
||||
interface KnowledgeGraph {
|
||||
version: string;
|
||||
project: ProjectMeta;
|
||||
nodes: GraphNode[];
|
||||
edges: GraphEdge[];
|
||||
layers: Layer[];
|
||||
tour: TourStep[];
|
||||
}
|
||||
|
||||
interface ProjectMeta {
|
||||
name: string;
|
||||
languages: string[];
|
||||
frameworks: string[];
|
||||
description: string; // LLM-generated project summary
|
||||
analyzedAt: string; // ISO timestamp
|
||||
gitCommitHash: string; // For staleness detection
|
||||
}
|
||||
|
||||
interface GraphNode {
|
||||
id: string;
|
||||
type: "file" | "function" | "class" | "module" | "concept";
|
||||
name: string;
|
||||
filePath?: string;
|
||||
lineRange?: [number, number];
|
||||
summary: string; // Plain-English description
|
||||
tags: string[]; // Searchable tags
|
||||
complexity: "simple" | "moderate" | "complex";
|
||||
languageNotes?: string; // Language-specific explanations
|
||||
}
|
||||
|
||||
interface GraphEdge {
|
||||
source: string;
|
||||
target: string;
|
||||
type: EdgeType;
|
||||
direction: "forward" | "backward" | "bidirectional";
|
||||
description?: string;
|
||||
weight: number; // 0-1 importance
|
||||
}
|
||||
|
||||
type EdgeType =
|
||||
// Structural
|
||||
| "imports" | "exports" | "contains" | "inherits" | "implements"
|
||||
// Behavioral
|
||||
| "calls" | "subscribes" | "publishes" | "middleware"
|
||||
// Data flow
|
||||
| "reads_from" | "writes_to" | "transforms" | "validates"
|
||||
// Dependencies
|
||||
| "depends_on" | "tested_by" | "configures"
|
||||
// Semantic
|
||||
| "related" | "similar_to";
|
||||
|
||||
interface Layer {
|
||||
id: string;
|
||||
name: string; // e.g., "API Layer", "Data Layer"
|
||||
description: string;
|
||||
nodeIds: string[];
|
||||
}
|
||||
|
||||
interface TourStep {
|
||||
order: number;
|
||||
title: string;
|
||||
description: string; // Markdown explanation
|
||||
nodeIds: string[]; // Nodes to highlight
|
||||
languageLesson?: string; // Optional language concept explanation
|
||||
}
|
||||
```
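
Since edges and layers refer to nodes by `id`, a loaded graph can be sanity-checked for dangling references. The sketch below is illustrative only: `findDanglingReferences` and the trimmed-down `*Ref` types are hypothetical names, reduced to the fields the check needs.

```typescript
// Trimmed-down versions of the schema types above, enough for the check.
interface NodeRef {
  id: string;
}
interface EdgeRef {
  source: string;
  target: string;
}
interface LayerRef {
  id: string;
  nodeIds: string[];
}
interface GraphRefs {
  nodes: NodeRef[];
  edges: EdgeRef[];
  layers: LayerRef[];
}

// Lists every edge endpoint or layer member that points at a missing node.
function findDanglingReferences(graph: GraphRefs): string[] {
  const ids = new Set(graph.nodes.map((node) => node.id));
  const problems: string[] = [];
  for (const edge of graph.edges) {
    if (!ids.has(edge.source)) problems.push(`edge source "${edge.source}" has no node`);
    if (!ids.has(edge.target)) problems.push(`edge target "${edge.target}" has no node`);
  }
  for (const layer of graph.layers) {
    for (const nodeId of layer.nodeIds) {
      if (!ids.has(nodeId)) problems.push(`layer "${layer.id}" references missing node "${nodeId}"`);
    }
  }
  return problems;
}
```

An empty result means every reference resolves; a check like this is cheap to run after merging incremental updates into an existing graph.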
|
||||
|
||||
---
|
||||
|
||||
## Dashboard: Multi-Panel Workspace (React + TypeScript)
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ 🔍 Natural Language Search: "communication layer" │
|
||||
├──────────────────────┬──────────────────────────────────┤
|
||||
│ │ │
|
||||
│ GRAPH VIEW │ CODE VIEWER │
|
||||
│ (React Flow) │ (Monaco Editor, read-only) │
|
||||
│ │ │
|
||||
│ Interactive node │ Source code + syntax highlight │
|
||||
│ graph. Click to │ LLM annotations inline. │
|
||||
│ select. Search │ │
|
||||
│ highlights. │ │
|
||||
├──────────────────────┼──────────────────────────────────┤
|
||||
│ │ │
|
||||
│ CHAT PANEL │ LEARN PANEL │
|
||||
│ │ │
|
||||
│ Context-aware Q&A │ Tour mode + Contextual mode │
|
||||
│ about selected │ Language lessons in context │
|
||||
│ nodes / project. │ of YOUR code. │
|
||||
│ │ │
|
||||
└──────────────────────┴──────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Tech stack:**
|
||||
- React 18 + TypeScript + Vite
|
||||
- React Flow — graph visualization (built for node graphs, better than raw D3 for this)
|
||||
- Monaco Editor — code viewer with syntax highlighting (same as VS Code)
|
||||
- TailwindCSS — styling
|
||||
- Zustand — state management (lightweight, no boilerplate)
|
||||
|
||||
**Persona modes:**
|
||||
- Non-technical: High-level concept nodes, code viewer hidden, learn panel expanded
|
||||
- Junior dev: All panels, learn panel prominent, complexity indicators
|
||||
- Experienced dev: Code viewer prominent, chat panel for deep dives
|
||||
|
||||
**Natural language search:**
|
||||
- Searches against node `tags`, `summary`, and `name` fields
|
||||
- Uses embedding similarity if available, falls back to keyword matching
|
||||
- Highlights matching nodes in the graph, filters the list
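
The keyword fallback could look roughly like the following. This is a sketch under assumptions: `keywordSearch` is a hypothetical name, matching is case-insensitive substring matching, and a node matches only when every query term is found, which is one of several reasonable policies.

```typescript
interface SearchableNode {
  id: string;
  name: string;
  summary: string;
  tags: string[];
}

// Case-insensitive keyword match: a node matches when every query term
// appears somewhere in its name, summary, or tags.
function keywordSearch(nodes: SearchableNode[], query: string): SearchableNode[] {
  const terms = query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0);
  if (terms.length === 0) return [];
  return nodes.filter((node) => {
    const haystack = [node.name, node.summary, ...node.tags].join(" ").toLowerCase();
    return terms.every((term) => haystack.includes(term));
  });
}
```

The returned nodes can drive both behaviors described above: their ids are highlighted in the graph, and the same array backs the filtered list.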
|
||||
|
||||
---
|
||||
|
||||
## Claude Code Skill Commands
|
||||
|
||||
| Command | Description |
|---------|-------------|
| `/understand` | Full analysis (or incremental update if graph exists) + open dashboard |
| `/understand-chat "<query>"` | In-terminal Q&A using the knowledge graph |
| `/understand-diff` | Analyze current PR/diff — explain changes, affected areas, risks |
| `/understand-explain <path>` | Deep-dive explanation of a specific file or function |
| `/understand-onboard` | Generate structured onboarding guide for new team members |
|
||||
|
||||
**LLM strategy:**
|
||||
- Inside Claude Code → uses the active Claude session (zero extra cost)
|
||||
- Standalone dashboard → users provide Claude API key for chat features
|
||||
- Graph browsing, search, and learn mode work offline (pre-generated data)
|
||||
|
||||
---
|
||||
|
||||
## Persistence & Staleness Detection
|
||||
|
||||
```
|
||||
.understand-anything/
|
||||
├── knowledge-graph.json # The full graph (committable)
|
||||
├── meta.json # Analysis metadata
|
||||
│ {
|
||||
│ "lastAnalyzedAt": "2026-03-14T...",
|
||||
│ "gitCommitHash": "abc123",
|
||||
│ "version": "1.0.0",
|
||||
│ "analyzedFiles": 47
|
||||
│ }
|
||||
├── cache/ # Per-file analysis cache
|
||||
│ ├── src__index.ts.json
|
||||
│ └── src__auth__login.ts.json
|
||||
└── tours/
|
||||
└── default-tour.json
|
||||
```
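
The cache file names in the tree above suggest that a file's relative path is flattened by replacing path separators with `__` and appending `.json`. That mapping is inferred from the two examples only, so treat this helper (with the hypothetical name `cacheFileName`) as an assumption rather than the specified scheme.

```typescript
// Assumed mapping, inferred from the example names in the tree above:
// path separators become "__" and ".json" is appended.
function cacheFileName(relativePath: string): string {
  return relativePath.split(/[\\\/]/).join("__") + ".json";
}
```

For example, `cacheFileName("src/auth/login.ts")` yields `src__auth__login.ts.json`. A scheme like this is ambiguous for paths that already contain `__`, which a real implementation may want to escape.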
|
||||
|
||||
**Auto-sync flow:**
|
||||
1. Skill starts → reads `meta.json` → gets last analyzed commit hash
|
||||
2. Runs `git diff <last-hash>..HEAD --name-only` → gets changed files
|
||||
3. If no changes → serves existing graph
|
||||
4. If changes → re-analyzes only changed files → merges into existing graph → updates meta
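
The decision part of this flow can be written as a pure function over the stored hash and the output of `git diff <last-hash>..HEAD --name-only`. This is a minimal sketch: `planSync` and the `SyncPlan` type are hypothetical names, and running git itself is left to the caller.

```typescript
type SyncPlan =
  | { action: "full-analysis" }
  | { action: "serve-existing" }
  | { action: "incremental"; changedFiles: string[] };

// `lastHash` comes from meta.json (undefined when no prior analysis exists);
// `gitDiffOutput` is the stdout of `git diff <last-hash>..HEAD --name-only`.
function planSync(lastHash: string | undefined, gitDiffOutput: string): SyncPlan {
  if (lastHash === undefined) return { action: "full-analysis" };
  const changedFiles = gitDiffOutput
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return changedFiles.length === 0
    ? { action: "serve-existing" }
    : { action: "incremental", changedFiles };
}
```

Keeping the git invocation outside the function makes the branching easy to unit-test. Note that a diff against `HEAD` only sees committed changes, so uncommitted edits would need a separate check.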
|
||||
|
||||
---
|
||||
|
||||
## Plugin System
|
||||
|
||||
```typescript
|
||||
interface AnalyzerPlugin {
|
||||
name: string;
|
||||
languages: string[];
|
||||
analyzeFile(filePath: string, content: string): StructuralAnalysis;
|
||||
resolveImports(filePath: string, content: string): ImportResolution[];
|
||||
extractCallGraph?(filePath: string, content: string): CallGraphEntry[];
|
||||
}
|
||||
```
|
||||
|
||||
**Day 1: tree-sitter plugin** — uses `node-tree-sitter` with language grammars for:
|
||||
- TypeScript/JavaScript, Python, Go, Java, Rust, C/C++
|
||||
- Extracts: function/class boundaries, import/export statements, call sites
|
||||
- Combined with LLM analysis for semantic understanding
|
||||
|
||||
**Future: community plugins** for language-specific deep analysis.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
### Phase 1: Foundation (MVP)
|
||||
1. Project scaffolding — monorepo, TypeScript config, build setup
|
||||
2. Core: Knowledge graph schema + JSON persistence
|
||||
3. Core: LLM analysis engine (file-by-file analysis using prompts)
|
||||
4. Core: tree-sitter integration for structural analysis
|
||||
5. Skill: `/understand` command — analyze + persist graph
|
||||
6. Dashboard: Basic React app that reads and renders the graph
|
||||
7. Dashboard: Graph view with React Flow
|
||||
8. Dashboard: Code viewer with Monaco Editor
|
||||
|
||||
### Phase 2: Intelligence
|
||||
9. Natural language search across graph nodes
|
||||
10. Skill: `/understand-chat` — terminal Q&A
|
||||
11. Dashboard: Chat panel with context-aware Q&A
|
||||
12. Staleness detection + incremental updates
|
||||
13. Layer auto-detection (group nodes into logical layers)
|
||||
|
||||
### Phase 3: Learn Mode
|
||||
14. Tour generation — guided project walkthrough
|
||||
15. Contextual explanations — click-to-explain
|
||||
16. Language-specific lessons in context of the user's code
|
||||
17. Persona modes (non-technical / junior / experienced)
|
||||
|
||||
### Phase 4: Advanced
|
||||
18. Skill: `/understand-diff` — PR/diff analysis
|
||||
19. Skill: `/understand-explain` — deep-dive on specific files
|
||||
20. Skill: `/understand-onboard` — onboarding guide generation
|
||||
21. Community plugin system
|
||||
22. Embedding-based semantic search (optional enhancement)
|
||||
|
||||
---
|
||||
|
||||
## Verification
|
||||
|
||||
### How to test end-to-end:
|
||||
1. **Skill analysis**: Run `/understand` on a sample project → verify `.understand-anything/knowledge-graph.json` is generated with correct schema
|
||||
2. **Incremental update**: Modify a file → run `/understand` again → verify only the changed file is re-analyzed
|
||||
3. **Dashboard**: Open `http://localhost:5173` → verify graph renders, nodes are clickable, search works
|
||||
4. **Chat**: Ask a question in the chat panel → verify it returns a relevant answer using the knowledge graph
|
||||
5. **Learn mode**: Start the tour → verify it walks through the project step by step
|
||||
6. **Tree-sitter**: Analyze a TypeScript file → verify function boundaries and import relationships match the actual code
|
||||
|
||||
### Test projects to validate against:
|
||||
- A small TypeScript project (the tool itself)
|
||||
- A Python Flask/Django API
|
||||
- A Go microservice
|
||||
- A mixed-language monorepo
|
||||
@@ -0,0 +1,83 @@
|
||||
# Understand Anything — Project Homepage Design
|
||||
|
||||
**Date**: 2026-03-15
|
||||
**Goal**: Attract new users to the Understand Anything Claude Code plugin
|
||||
**Approach**: "The Reveal" — cinematic scroll-driven single-page site
|
||||
|
||||
## Tech Stack
|
||||
|
||||
- **Astro** (static site generator, zero JS framework overhead)
|
||||
- **Self-hosted fonts** (no Google Fonts CDN dependency — works in China)
|
||||
- **CSS** with variables matching dashboard theme
|
||||
- **Vanilla JS** for `IntersectionObserver` scroll animations
|
||||
- **GitHub Actions** for CI/CD to `gh-pages` branch
|
||||
|
||||
## Source & Deployment
|
||||
|
||||
- Source: `homepage/` directory on `main` branch
|
||||
- Build output: deployed to `gh-pages` branch via GitHub Actions
|
||||
- URL: `understand-anything.com`
|
||||
|
||||
## Page Structure (scroll order)
|
||||
|
||||
### 1. Nav Bar
|
||||
Minimal floating nav. Logo/wordmark left, GitHub star button + "Get Started" CTA right. Transparent, becomes solid on scroll.
|
||||
|
||||
### 2. Hero (full viewport)
|
||||
- Headline: **"Understand Any Codebase"**
|
||||
- Subheadline: "Turn 200,000 lines of code into an interactive knowledge graph you can explore, search, and learn from — powered by multi-agent AI analysis."
|
||||
- CTA: "Get Started" (gold button, scrolls to install section)
|
||||
- Secondary: "View on GitHub" (text link)
|
||||
- Background: `hero.jpg` with dark gradient overlay
|
||||
|
||||
### 3. Dashboard Showcase
|
||||
- Label: "See your codebase come alive"
|
||||
- `overview.png` in a stylized browser frame with gold glow shadow
|
||||
- Fade-in on scroll
|
||||
|
||||
### 4. Feature Cards (3 columns)
|
||||
Staggered fade-in animation:
|
||||
1. **Interactive Knowledge Graph** — "Visualize files, functions, and dependencies as an explorable graph with smart layout."
|
||||
2. **Plain-English Summaries** — "Every node explained in language anyone can understand — from junior devs to product managers."
|
||||
3. **Guided Tours** — "AI-generated walkthroughs that teach you the codebase step by step."
|
||||
|
||||
### 5. Install CTA
|
||||
- Headline: "Get started in 30 seconds"
|
||||
- Code block:
|
||||
```
|
||||
/plugin marketplace add Lum1104/Understand-Anything
|
||||
/plugin install understand-anything
|
||||
/understand
|
||||
```
|
||||
- "Works with Claude Code" note
|
||||
|
||||
### 6. Footer
|
||||
- "Understand Anything" wordmark
|
||||
- GitHub link, license
|
||||
- "Built as a Claude Code plugin"
|
||||
|
||||
## Visual Design System
|
||||
|
||||
### Colors (matching dashboard)
|
||||
| Token | Value | Usage |
|-------|-------|-------|
| `--bg` | `#0a0a0a` | Page background |
| `--surface` | `#141414` | Card backgrounds |
| `--border` | `#1a1a1a` | Borders, dividers |
| `--accent` | `#d4a574` | Gold/amber primary accent |
| `--text` | `#e8e2d8` | Primary text (warm white) |
| `--text-muted` | `#8a8578` | Secondary text |
|
||||
|
||||
### Typography (self-hosted, with fallbacks)
|
||||
- **Headings**: DM Serif Display → Georgia, "Times New Roman", serif
|
||||
- **Body**: Inter → -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif
|
||||
- **Code**: JetBrains Mono → "SF Mono", "Cascadia Code", "Fira Code", monospace
|
||||
- Hero headline: ~4rem serif with subtle text-shadow glow
|
||||
|
||||
### Effects
|
||||
- Gold glow on dashboard screenshot frame (`box-shadow` with gold at low opacity)
|
||||
- Subtle noise texture overlay (SVG, matching dashboard)
|
||||
- Scroll-triggered fade+slide-up animations (CSS `@keyframes` + `IntersectionObserver`)
|
||||
- CTA button: gold background with hover glow pulse
|
||||
- Cards: glass-morphism with `backdrop-filter: blur`
|
||||
- Responsive: 768px (tablet), 480px (mobile)
|
||||
@@ -0,0 +1,121 @@
|
||||
# Multi-Platform Skill Support — Simplified Design
|
||||
|
||||
**Date**: 2026-03-18
|
||||
**Status**: Approved
|
||||
**Goal**: Make Understand-Anything skills work across Codex, OpenClaw, OpenCode, and Cursor with zero build step — same files everywhere.
|
||||
|
||||
## Design Principles
|
||||
|
||||
Follows the [obra/superpowers](https://github.com/obra/superpowers) pattern:
|
||||
1. **Same files, all platforms** — no template markers, no build step, no platform-specific variants
|
||||
2. **`model: inherit`** — agents use the parent session's model, making them platform-agnostic
|
||||
3. **AI-driven installation** — `.{platform}/INSTALL.md` files that the AI agent reads and executes
|
||||
4. **Self-contained skills** — pipeline prompt templates live inside the skill directory, not in a separate `agents/` folder
|
||||
|
||||
## Change 1: Move Pipeline Agents Into Skill
|
||||
|
||||
The 5 pipeline agents (project-scanner, file-analyzer, architecture-analyzer, tour-builder, graph-reviewer) are used exclusively by the `/understand` skill. They become prompt templates co-located with the skill:
|
||||
|
||||
**Before:**
|
||||
```
|
||||
agents/
|
||||
project-scanner.md # agent definition
|
||||
file-analyzer.md
|
||||
architecture-analyzer.md
|
||||
tour-builder.md
|
||||
graph-reviewer.md
|
||||
skills/understand/
|
||||
SKILL.md # dispatches named agents
|
||||
```
|
||||
|
||||
**After:**
|
||||
```
|
||||
skills/understand/
|
||||
SKILL.md # dispatches subagents using templates
|
||||
project-scanner-prompt.md # prompt template (no agent frontmatter)
|
||||
file-analyzer-prompt.md
|
||||
architecture-analyzer-prompt.md
|
||||
tour-builder-prompt.md
|
||||
graph-reviewer-prompt.md
|
||||
```
|
||||
|
||||
The prompt template files retain the full instruction content but drop the agent frontmatter (`name`, `tools`, `model`). The `SKILL.md` dispatch changes from "Dispatch the **project-scanner** agent" to "Dispatch a subagent using the template at `./project-scanner-prompt.md`".
|
||||
|
||||
### Context Cost
|
||||
|
||||
Reading templates through the main session adds ~11K tokens total (~5.5% of 200K context). This is sequential (one template at a time), and context compression reclaims earlier content. Acceptable trade-off for portability.
|
||||
|
||||
## Change 2: New Registered Agent — knowledge-graph-guide
|
||||
|
||||
Create a reusable agent that any skill or user can invoke to work with knowledge graphs:
|
||||
|
||||
```yaml
|
||||
# agents/knowledge-graph-guide.md
|
||||
---
|
||||
name: knowledge-graph-guide
|
||||
description: |
|
||||
Use this agent when users need help understanding, querying, or working
|
||||
with an Understand-Anything knowledge graph. Guides users through graph
|
||||
structure, node/edge relationships, layer architecture, tours, and
|
||||
dashboard usage.
|
||||
model: inherit
|
||||
---
|
||||
```
|
||||
|
||||
This agent knows:
|
||||
- The KnowledgeGraph JSON schema (nodes, edges, layers, tours)
|
||||
- The 5 node types and 18 edge types
|
||||
- How to navigate and query the graph
|
||||
- How to use the interactive dashboard
|
||||
- How to interpret architectural layers and guided tours
|
||||
|
||||
## Change 3: Platform Installation Files
|
||||
|
||||
Each platform gets an `INSTALL.md` that the AI agent can fetch and follow:
|
||||
|
||||
| File | Platform | Install Mechanism |
|
||||
|------|----------|-------------------|
|
||||
| `.codex/INSTALL.md` | Codex | `git clone` + symlink to `~/.agents/skills/` |
|
||||
| `.opencode/INSTALL.md` | OpenCode | Plugin config in `opencode.json` |
|
||||
| `.openclaw/INSTALL.md` | OpenClaw | `git clone` + symlink to `~/.openclaw/skills/` |
|
||||
| `.cursor/INSTALL.md` | Cursor | `git clone` + symlink to `.cursor/plugins/` |
|
||||
|
||||
User tells the agent one line:
|
||||
```
|
||||
Fetch and follow instructions from https://raw.githubusercontent.com/Lum1104/Understand-Anything/refs/heads/main/understand-anything-plugin/.codex/INSTALL.md
|
||||
```
|
||||
|
||||
The agent executes the clone + symlink/config automatically.
|
||||
|
||||
## Change 4: README Update
|
||||
|
||||
Add a "Multi-Platform Installation" section to README.md with one-liner per platform.
|
||||
|
||||
## File Summary
|
||||
|
||||
| Action | Files |
|
||||
|--------|-------|
|
||||
| Delete | `agents/project-scanner.md`, `agents/file-analyzer.md`, `agents/architecture-analyzer.md`, `agents/tour-builder.md`, `agents/graph-reviewer.md` |
|
||||
| Create | `skills/understand/project-scanner-prompt.md`, `skills/understand/file-analyzer-prompt.md`, `skills/understand/architecture-analyzer-prompt.md`, `skills/understand/tour-builder-prompt.md`, `skills/understand/graph-reviewer-prompt.md` |
|
||||
| Create | `agents/knowledge-graph-guide.md` |
|
||||
| Create | `.codex/INSTALL.md`, `.opencode/INSTALL.md`, `.openclaw/INSTALL.md`, `.cursor/INSTALL.md` |
|
||||
| Modify | `skills/understand/SKILL.md` (dispatch references) |
|
||||
| Modify | `README.md` (multi-platform section) |
|
||||
|
||||
## What We Don't Need
|
||||
|
||||
- ~~`platforms/platform-config.json`~~ — same files everywhere
|
||||
- ~~`platforms/build.mjs`~~ — no build step
|
||||
- ~~`{{MARKER}}` template markers~~ — no templating
|
||||
- ~~`scripts/install-*.sh`~~ — AI agent follows INSTALL.md
|
||||
- ~~`dist-platforms/`~~ — no generated output
|
||||
|
||||
## Platform Compatibility
|
||||
|
||||
| Platform | Install Method | Agent Discovery | Skill Discovery |
|
||||
|----------|---------------|-----------------|-----------------|
|
||||
| Claude Code | Marketplace (existing) | `agents/` dir | `skills/` dir |
|
||||
| Codex | INSTALL.md → symlink | N/A (templates in skill) | `~/.agents/skills/` |
|
||||
| OpenCode | INSTALL.md → plugin config | N/A (templates in skill) | Plugin auto-registers |
|
||||
| OpenClaw | INSTALL.md → symlink | N/A (templates in skill) | `~/.openclaw/skills/` |
|
||||
| Cursor | INSTALL.md → symlink | `agents/` dir | `.cursor/plugins/` |
|
||||
@@ -0,0 +1,249 @@
|
||||
# Language-Agnostic Support Design
|
||||
|
||||
**Date:** 2026-03-21
|
||||
**Status:** Approved
|
||||
**Issue:** Make Understand-Anything codebase-aware and language-agnostic instead of TypeScript-heavy
|
||||
|
||||
## Problem
|
||||
|
||||
The tool's agent prompts, tree-sitter plugin, and language lesson system are heavily biased toward TypeScript/JavaScript. Non-TS codebases get degraded analysis because:
|
||||
|
||||
1. Agent prompts use TS-specific examples and concepts (e.g., "barrel files", "type guards", "generics")
|
||||
2. Tree-sitter plugin only ships TS/JS grammar support — structural analysis silently fails for other languages
|
||||
3. Language lesson detection hardcodes TS-specific concept patterns and display names
|
||||
|
||||
The architecture (PluginRegistry, GraphBuilder, dashboard, search) is already language-neutral. The bias is in shipped content, not the framework.
|
||||
|
||||
## Decisions
|
||||
|
||||
- **Scope:** All three layers — prompts, tree-sitter plugins, language framework
|
||||
- **Languages (v1):** TypeScript, JavaScript, Python, Go, Java, Rust, C/C++, C#, Ruby, PHP, Swift, Kotlin
|
||||
- **Architecture:** Config-first with code escape hatch (hybrid)
|
||||
- **Prompt strategy:** Base prompt + per-language markdown snippet files in a `languages/` folder
|
||||
- **Config location:** Prompt snippets in `skills/understand/languages/`, tree-sitter configs in `packages/core/src/languages/`
|
||||
- **Multi-language projects:** Per-file language analysis + project-level multi-language summary
|
||||
- **Language detection:** Auto-detect from file extensions only (no manual override for v1)
|
||||
|
||||
## Design
|
||||
|
||||
### 1. LanguageConfig Type & Registry
|
||||
|
||||
#### LanguageConfig Interface
|
||||
|
||||
```typescript
|
||||
// packages/core/src/languages/types.ts
|
||||
interface LanguageConfig {
|
||||
id: string; // e.g., "python"
|
||||
displayName: string; // e.g., "Python"
|
||||
extensions: string[]; // e.g., [".py", ".pyi"]
|
||||
treeSitter: {
|
||||
grammarPackage: string; // npm package name
|
||||
nodeTypes: {
|
||||
function: string[]; // e.g., ["function_definition"]
|
||||
class: string[]; // e.g., ["class_definition"]
|
||||
import: string[]; // e.g., ["import_statement", "import_from_statement"]
|
||||
export: string[]; // e.g., ["export_statement"] or [] for languages without exports
|
||||
typeAnnotation: string[]; // e.g., ["type"] for Python type hints
|
||||
};
|
||||
};
|
||||
concepts: string[]; // e.g., ["decorators", "list comprehensions", "generators"]
|
||||
filePatterns?: Record<string, string>; // special files, e.g., {"config": "pyproject.toml"}
|
||||
customAnalyzer?: (node: SyntaxNode) => AnalysisResult; // escape hatch for unusual AST shapes
|
||||
}
|
||||
```
|
||||
|
||||
#### Language Registry
|
||||
|
||||
```typescript
|
||||
// packages/core/src/languages/registry.ts
|
||||
class LanguageRegistry {
|
||||
private configs: Map<string, LanguageConfig>;
|
||||
|
||||
register(config: LanguageConfig): void;
|
||||
getByExtension(ext: string): LanguageConfig | null;
|
||||
getById(id: string): LanguageConfig;
|
||||
getAll(): LanguageConfig[];
|
||||
}
|
||||
```
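
One way the registry declared above could be implemented is with two maps, one keyed by id and one by file extension. This is a sketch under assumptions: the config type is trimmed to the fields the registry reads, extensions are matched case-insensitively, and duplicate ids are rejected (the design only says duplicate handling is tested, not what the behavior is).

```typescript
// Trimmed stand-in for LanguageConfig: only the fields the registry itself reads.
interface RegistryLanguageConfig {
  id: string;
  displayName: string;
  extensions: string[];
}

class SketchLanguageRegistry {
  private configs = new Map<string, RegistryLanguageConfig>();
  private byExtension = new Map<string, RegistryLanguageConfig>();

  register(config: RegistryLanguageConfig): void {
    if (this.configs.has(config.id)) {
      // Assumption: a duplicate id is an error rather than an override.
      throw new Error(`Language "${config.id}" is already registered`);
    }
    this.configs.set(config.id, config);
    for (const ext of config.extensions) {
      this.byExtension.set(ext.toLowerCase(), config);
    }
  }

  getByExtension(ext: string): RegistryLanguageConfig | null {
    return this.byExtension.get(ext.toLowerCase()) ?? null;
  }

  getById(id: string): RegistryLanguageConfig {
    const config = this.configs.get(id);
    if (config === undefined) {
      throw new Error(`Unknown language id "${id}"`);
    }
    return config;
  }

  getAll(): RegistryLanguageConfig[] {
    return Array.from(this.configs.values());
  }
}
```

Returning `null` from `getByExtension` but throwing from `getById` mirrors the declared signatures: an unknown extension is an expected case, while an unknown id is a programming error.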
|
||||
|
||||
#### File Structure
|
||||
|
||||
```
|
||||
packages/core/src/languages/
|
||||
├── types.ts
|
||||
├── registry.ts
|
||||
├── index.ts
|
||||
├── configs/
|
||||
│ ├── typescript.ts
|
||||
│ ├── javascript.ts
|
||||
│ ├── python.ts
|
||||
│ ├── go.ts
|
||||
│ ├── java.ts
|
||||
│ ├── rust.ts
|
||||
│ ├── cpp.ts
|
||||
│ ├── csharp.ts
|
||||
│ ├── ruby.ts
|
||||
│ ├── php.ts
|
||||
│ ├── swift.ts
|
||||
│ └── kotlin.ts
|
||||
```
|
||||
|
||||
All built-in configs auto-registered on import.
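
As an illustration, a built-in config such as `configs/python.ts` might look like the following. The field values are taken from the example comments in the `LanguageConfig` interface; the grammar package name `tree-sitter-python` and the `builtinConfigs` array are assumptions, and the interface is repeated here (without `customAnalyzer`) only to keep the sketch self-contained.

```typescript
// Copy of the LanguageConfig shape from the design, minus the customAnalyzer escape hatch.
interface LanguageConfigSketch {
  id: string;
  displayName: string;
  extensions: string[];
  treeSitter: {
    grammarPackage: string;
    nodeTypes: {
      function: string[];
      class: string[];
      import: string[];
      export: string[];
      typeAnnotation: string[];
    };
  };
  concepts: string[];
  filePatterns?: Record<string, string>;
}

const pythonConfig: LanguageConfigSketch = {
  id: "python",
  displayName: "Python",
  extensions: [".py", ".pyi"],
  treeSitter: {
    grammarPackage: "tree-sitter-python", // assumed npm package name
    nodeTypes: {
      function: ["function_definition"],
      class: ["class_definition"],
      import: ["import_statement", "import_from_statement"],
      export: [], // Python has no export statement
      typeAnnotation: ["type"],
    },
  },
  concepts: ["decorators", "list comprehensions", "generators"],
  filePatterns: { config: "pyproject.toml" },
};

// Hypothetical list that an index module could iterate to register every built-in config.
const builtinConfigs: LanguageConfigSketch[] = [pythonConfig];
```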
|
||||
|
||||
### 2. GenericTreeSitterPlugin
|
||||
|
||||
Replaces the current TS-only `TreeSitterPlugin` with a config-driven version.
|
||||
|
||||
```typescript
|
||||
// packages/core/src/plugins/generic-tree-sitter-plugin.ts
|
||||
class GenericTreeSitterPlugin implements AnalyzerPlugin {
|
||||
private registry: LanguageRegistry;
|
||||
|
||||
canAnalyze(filePath: string): boolean {
|
||||
return this.registry.getByExtension(path.extname(filePath)) !== null;
|
||||
}
|
||||
|
||||
async analyzeFile(filePath: string, content: string): Promise<FileAnalysis> {
|
||||
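    const config = this.registry.getByExtension(path.extname(filePath));
    if (!config) throw new Error(`No language config for ${filePath}`); // canAnalyze() should have filtered this out
    const tree = await this.parse(content, config); // hypothetical helper: lazily loads the grammar, then parses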
|
||||
|
||||
// Custom analyzer escape hatch
|
||||
if (config.customAnalyzer) {
|
||||
return config.customAnalyzer(tree.rootNode);
|
||||
}
|
||||
|
||||
// Generic extraction driven by config.treeSitter.nodeTypes
|
||||
const functions = this.extractNodes(tree, config.treeSitter.nodeTypes.function);
|
||||
const classes = this.extractNodes(tree, config.treeSitter.nodeTypes.class);
|
||||
const imports = this.extractNodes(tree, config.treeSitter.nodeTypes.import);
|
||||
const exports = this.extractNodes(tree, config.treeSitter.nodeTypes.export);
|
||||
// ...
|
||||
}
|
||||
|
||||
private extractNodes(tree: Tree, nodeTypes: string[]): NodeInfo[] {
|
||||
// Walk AST, collect all nodes matching any of the given types
|
||||
}
|
||||
}
|
||||
```
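
The body of `extractNodes` is left open in the sketch above. One plausible implementation is a depth-first walk that collects every node whose `type` appears in the configured list. The `AstNode` interface below is a minimal stand-in modeled on the `type`, `text`, and `children` fields of a tree-sitter syntax node, and `collectNodesOfType` is a hypothetical name.

```typescript
// Minimal stand-in for a tree-sitter syntax node: only the fields this sketch reads.
interface AstNode {
  type: string;
  text: string;
  children: AstNode[];
}

// Pre-order walk: returns matching nodes in source order, including nested ones.
function collectNodesOfType(root: AstNode, nodeTypes: string[]): AstNode[] {
  const wanted = new Set(nodeTypes);
  const found: AstNode[] = [];
  const visit = (node: AstNode): void => {
    if (wanted.has(node.type)) {
      found.push(node);
    }
    for (const child of node.children) {
      visit(child);
    }
  };
  visit(root);
  return found;
}
```

Because the walk does not stop at a match, a method defined inside a class is reported both through its enclosing class node and as a function node of its own. Whether nested matches should be flattened like this is a design choice the document does not settle.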
|
||||
|
||||
#### Migration
|
||||
|
||||
- Current `TreeSitterPlugin` deleted, replaced by `GenericTreeSitterPlugin` + TS/JS configs
|
||||
- `PluginRegistry` unchanged
|
||||
- Existing tests updated to use new plugin
|
||||
|
||||
#### WASM Grammar Loading
|
||||
|
||||
- Each grammar loaded lazily on first use and cached
|
||||
- WASM files bundled in `packages/core/src/languages/grammars/` or fetched from tree-sitter's official WASM builds
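
Lazy loading with caching can be sketched as a map from language id to the promise of its grammar. The loader function is injected, so the sketch makes no claim about how the WASM file is actually located or initialized; `GrammarCache` and its retry-on-failure behavior are assumptions.

```typescript
// Caches one in-flight or completed load per language id, so each grammar loads at most once.
class GrammarCache<Grammar> {
  private loads = new Map<string, Promise<Grammar>>();
  private loadGrammar: (languageId: string) => Promise<Grammar>;

  constructor(loadGrammar: (languageId: string) => Promise<Grammar>) {
    this.loadGrammar = loadGrammar;
  }

  get(languageId: string): Promise<Grammar> {
    let pending = this.loads.get(languageId);
    if (pending === undefined) {
      pending = this.loadGrammar(languageId).catch((error: unknown) => {
        // Assumption: forget failed loads so a later call can retry.
        this.loads.delete(languageId);
        throw error;
      });
      this.loads.set(languageId, pending);
    }
    return pending;
  }
}
```

Caching the promise rather than the resolved grammar means two files of the same language analyzed concurrently share a single load.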
|
||||
|
||||
### 3. Language-Aware Prompts
|
||||
|
||||
#### File Structure
|
||||
|
||||
```
|
||||
skills/understand/
|
||||
├── file-analyzer-prompt.md # Base prompt (language-neutral)
|
||||
├── tour-builder-prompt.md
|
||||
├── project-scanner-prompt.md
|
||||
├── languages/
|
||||
│ ├── typescript.md
|
||||
│ ├── javascript.md
|
||||
│ ├── python.md
|
||||
│ ├── go.md
|
||||
│ ├── java.md
|
||||
│ ├── rust.md
|
||||
│ ├── cpp.md
|
||||
│ ├── csharp.md
|
||||
│ ├── ruby.md
|
||||
│ ├── php.md
|
||||
│ ├── swift.md
|
||||
│ └── kotlin.md
|
||||
```
|
||||
|
||||
#### Base Prompt Changes
|
||||
|
||||
All TS-specific examples removed from base prompts. Replaced with injection point:
|
||||
|
||||
```markdown
|
||||
## Language-Specific Guidance
|
||||
|
||||
{{LANGUAGE_CONTEXT}}
|
||||
```
|
||||
|
||||
#### Language Markdown Format
|
||||
|
||||
Each language file contains:
|
||||
|
||||
```markdown
|
||||
# Python
|
||||
|
||||
## Key Concepts
|
||||
- Decorators, comprehensions, generators, context managers, type hints, dunder methods
|
||||
|
||||
## Import Patterns
|
||||
- `import module`, `from module import name`, relative imports
|
||||
|
||||
## Notable File Patterns
|
||||
- `__init__.py` (package initializer), `conftest.py` (pytest), `pyproject.toml` (config)
|
||||
|
||||
## Example Summary Style
|
||||
> "FastAPI route handler that accepts a Pydantic model, validates input..."
|
||||
```
|
||||
|
||||
#### Injection Logic
|
||||
|
||||
1. Project scanner detects languages present in the codebase
|
||||
2. File-analyzer: inject matching language `.md` for that file's language
|
||||
3. Tour-builder: inject all detected languages' `.md` files
|
||||
4. Project-scanner: inject all detected languages' key concepts for project-level summary
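
The injection steps above can be sketched as a single string substitution. The function name, the `snippets` map (language id to the contents of its `languages/<id>.md` file), and the fallback sentence used when no snippet matches are all assumptions made for illustration.

```typescript
const LANGUAGE_CONTEXT_MARKER = "{{LANGUAGE_CONTEXT}}";

// Replaces the marker in a base prompt with the snippets for the given languages.
function injectLanguageContext(
  basePrompt: string,
  languageIds: string[],
  snippets: Map<string, string>,
): string {
  const sections: string[] = [];
  for (const id of languageIds) {
    const snippet = snippets.get(id);
    if (snippet !== undefined) {
      sections.push(snippet.trim());
    }
  }
  // Assumed fallback text for languages that have no snippet file.
  const context =
    sections.length > 0 ? sections.join("\n\n") : "No language-specific guidance available.";
  // split/join replaces every occurrence and avoids `$` sequences being interpreted by replace().
  return basePrompt.split(LANGUAGE_CONTEXT_MARKER).join(context);
}
```

The file analyzer would pass one language id per file, while the tour builder and project scanner would pass every detected language.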
|
||||
|
||||
#### Multi-Language Projects
|
||||
|
||||
Project-scanner prompt gets a combined section listing all detected languages with their key concepts.
|
||||
|
||||
### 4. Language Lesson Updates
|
||||
|
||||
- Delete `LANGUAGE_DISPLAY_NAMES` — use `LanguageRegistry.getById(id).displayName`
|
||||
- Delete hardcoded concept patterns — use `LanguageConfig.concepts` from registry
|
||||
- Language lesson generation becomes config-driven
|
||||
|
||||
### 5. Testing Strategy
|
||||
|
||||
#### Unit Tests
|
||||
|
||||
1. **LanguageConfig validation** — Each config has all required fields, non-empty nodeTypes
|
||||
2. **LanguageRegistry** — Registration, lookup by extension/id, duplicate handling
|
||||
3. **GenericTreeSitterPlugin per language** — Small fixture file per language verifying function/class/import extraction
|
||||
4. **Language lesson generation** — Concepts sourced from config
|
||||
|
||||
#### Integration Tests
|
||||
|
||||
5. **Multi-language project** — Mixed TS + Python fixture, verify graph contains nodes from both languages
|
||||
6. **Prompt injection** — Correct language `.md` injected based on detected language
|
||||
|
||||
#### Migration Tests
|
||||
|
||||
- Current tree-sitter-plugin tests rewritten for GenericTreeSitterPlugin with TS config
|
||||
- Must produce identical results to validate non-breaking migration
|
||||
|
||||
### 6. Error Handling & Graceful Degradation
|
||||
|
||||
#### Key Principle
|
||||
|
||||
**Every file always gets analyzed.** Tree-sitter is an enhancement, not a gate. The LLM is the primary analyzer; structural analysis enriches it.
|
||||
|
||||
#### Unknown Language
|
||||
|
||||
- Tree-sitter skipped (returns `null`)
|
||||
- LLM analysis still runs — file gets summary, tags, graph node
|
||||
- Debug log: `"No language config for .xyz, skipping structural analysis"`
|
||||
|
||||
#### Missing WASM Grammar
|
||||
|
||||
- Warning logged, that language degrades to LLM-only
|
||||
- Other languages unaffected
|
||||
|
||||
#### Malformed Language Config
|
||||
|
||||
- Validated at registration time via Zod schema
|
||||
- Invalid config throws at startup — fail fast
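
The fail-fast behavior can be illustrated without any dependency. The design calls for a Zod schema; the hand-rolled checks below are a stand-in for it, cover only three fields, and use hypothetical function names.

```typescript
function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns a list of problems; an empty list means the config passed these checks.
function findConfigProblems(config: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const id = config["id"];
  const displayName = config["displayName"];
  const extensions = config["extensions"];
  if (typeof id !== "string" || id.length === 0) {
    problems.push("id must be a non-empty string");
  }
  if (typeof displayName !== "string" || displayName.length === 0) {
    problems.push("displayName must be a non-empty string");
  }
  if (!isStringArray(extensions) || extensions.length === 0) {
    problems.push("extensions must be a non-empty string array");
  } else if (extensions.some((ext) => ext.charAt(0) !== ".")) {
    problems.push('every extension must start with "."');
  }
  return problems;
}

// Fail fast: throw at registration time instead of failing later during analysis.
function assertValidLanguageConfig(config: Record<string, unknown>): void {
  const problems = findConfigProblems(config);
  if (problems.length > 0) {
    throw new Error(`Invalid language config: ${problems.join("; ")}`);
  }
}
```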
|
||||
@@ -0,0 +1,415 @@
|
||||
# Theme System Design
|
||||
|
||||
## Overview
|
||||
|
||||
Add a curated theme preset system with accent color customization to the dashboard. Users select from 5 hand-designed theme presets and can optionally swap the accent color within each preset, choosing from a set of 8 tested swatches.
|
||||
|
||||
### Goals
|
||||
- Support 5 theme presets: Dark Gold (current), Dark Ocean, Dark Forest, Dark Rose, Light Minimal
|
||||
- Allow accent color customization within each preset (curated swatches only, no free picker)
|
||||
- Persist theme preference in both `localStorage` (personal) and `meta.json` (project-level)
|
||||
- Maintain visual coherence — no user-breakable color combinations
|
||||
- Zero-reload theme switching via CSS variable injection at runtime
|
||||
|
||||
### Non-Goals
|
||||
- Free color picker (risk of ugly/unreadable combos)
|
||||
- Per-component color overrides
|
||||
- Multiple simultaneous themes
|
||||
|
||||
---
|
||||
|
||||
## 1. Theme Presets & Color System
|
||||
|
||||
### 1.1 Preset Definitions
|
||||
|
||||
Each preset is a complete mapping of CSS variable names to values. The 5 presets:
|
||||
|
||||
| Token | Dark Gold | Dark Ocean | Dark Forest | Dark Rose | Light Minimal |
|
||||
|-------|-----------|------------|-------------|-----------|---------------|
|
||||
| `--color-root` | `#0a0a0a` | `#0a0e14` | `#0a100a` | `#100a0a` | `#f5f3f0` |
|
||||
| `--color-surface` | `#111111` | `#111820` | `#111811` | `#181111` | `#eae7e3` |
|
||||
| `--color-elevated` | `#1a1a1a` | `#1a222c` | `#1a241a` | `#221a1a` | `#ffffff` |
|
||||
| `--color-panel` | `#141414` | `#141c24` | `#141c14` | `#1c1414` | `#f0ede9` |
|
||||
| `--color-gold`* | `#d4a574` | `#5ba4cf` | `#5ea67a` | `#cf7a8a` | `#4a6fa5` |
|
||||
| `--color-gold-dim`* | `#c9a96e` | `#4e93ba` | `#4e9468` | `#b96e7e` | `#3d5f8f` |
|
||||
| `--color-gold-bright`* | `#e8c49a` | `#7abce0` | `#78c492` | `#e094a4` | `#6088bf` |
|
||||
| `--color-text-primary` | `#f5f0eb` | `#e8edf2` | `#ebf0eb` | `#f2e8ea` | `#1a1a1a` |
|
||||
| `--color-text-secondary` | `#a39787` | `#87939f` | `#87a38f` | `#9f8790` | `#6b6b6b` |
|
||||
| `--color-text-muted` | `#6b5f53` | `#536b7a` | `#536b5a` | `#6b535a` | `#a0a0a0` |
|
||||
| `--color-border-subtle` | `rgba(212,165,116,0.12)` | `rgba(91,164,207,0.12)` | `rgba(94,166,122,0.12)` | `rgba(207,122,138,0.12)` | `rgba(74,111,165,0.10)` |
|
||||
| `--color-border-medium` | `rgba(212,165,116,0.25)` | `rgba(91,164,207,0.25)` | `rgba(94,166,122,0.25)` | `rgba(207,122,138,0.25)` | `rgba(74,111,165,0.18)` |
|
||||
|
||||
*\* The table uses the current variable names (`--color-gold`, `--color-gold-dim`, `--color-gold-bright`), which stand for the accent color in every theme, including the non-gold ones. The names are an implementation detail invisible to users.*
|
||||
|
||||
**Decision: Rename `--color-gold*` to `--color-accent*`** to avoid confusion. This is a find-and-replace across the codebase with no behavioral change.
|
||||
|
||||
### 1.2 Glass Effects
|
||||
|
||||
Glass effects derive from base colors and need per-preset values:
|
||||
|
||||
| Token | Dark themes | Light Minimal |
|
||||
|-------|-------------|---------------|
|
||||
| `--glass-bg` | `rgba(20,20,20,0.8)` | `rgba(255,255,255,0.8)` |
|
||||
| `--glass-bg-heavy` | `rgba(20,20,20,0.95)` | `rgba(255,255,255,0.95)` |
|
||||
| `--glass-border` | `rgba(accent,0.1)` | `rgba(accent,0.08)` |
|
||||
| `--glass-border-heavy` | `rgba(accent,0.15)` | `rgba(accent,0.12)` |
|
||||
|
||||
The `.glass` and `.glass-heavy` CSS classes will reference these variables instead of hardcoded values.
|
||||
|
||||
### 1.3 Scrollbar & Glow Colors
|
||||
|
||||
These also derive from the accent color and need to become CSS variables:
|
||||
|
||||
| Token | Purpose |
|
||||
|-------|---------|
|
||||
| `--scrollbar-thumb` | `rgba(accent, 0.2)` |
|
||||
| `--scrollbar-thumb-hover` | `rgba(accent, 0.35)` |
|
||||
| `--glow-color` | `rgba(accent, 0.4)` for node selection glow |
|
||||
| `--glow-pulse` | `rgba(accent, 0.6)` for tour highlight pulse |
|
||||
|
||||
### 1.4 Node-Type & Diff Colors
|
||||
|
||||
These are **semantic** and stay fixed across all dark themes:
|
||||
|
||||
| Variable | Value | Purpose |
|
||||
|----------|-------|---------|
|
||||
| `--color-node-file` | `#4a7c9b` | File nodes |
|
||||
| `--color-node-function` | `#5a9e6f` | Function nodes |
|
||||
| `--color-node-class` | `#8b6fb0` | Class nodes |
|
||||
| `--color-node-module` | `#c9a06c` | Module nodes |
|
||||
| `--color-node-concept` | `#b07a8a` | Concept nodes |
|
||||
| `--color-diff-changed` | `#e05252` | Changed nodes |
|
||||
| `--color-diff-affected` | `#d4a030` | Affected nodes |
|
||||
|
||||
For **Light Minimal only**, these are slightly desaturated/darkened to maintain readability on light backgrounds:
|
||||
|
||||
| Variable | Light Minimal Value |
|
||||
|----------|-------------------|
|
||||
| `--color-node-file` | `#3a6a87` |
|
||||
| `--color-node-function` | `#488a5b` |
|
||||
| `--color-node-class` | `#755d99` |
|
||||
| `--color-node-module` | `#a88a56` |
|
||||
| `--color-node-concept` | `#966674` |
|
||||
|
||||
### 1.5 Accent Swatches
|
||||
|
||||
Each preset offers 8 accent color options. The first is the "native" default for that preset. Each swatch provides 3 values (accent, accent-dim, accent-bright) plus auto-derived border and glass opacities.
|
||||
|
||||
**Dark theme accent swatches** (shared across all 4 dark presets):
|
||||
|
||||
| Name | Accent | Dim | Bright |
|
||||
|------|--------|-----|--------|
|
||||
| Gold | `#d4a574` | `#c9a96e` | `#e8c49a` |
|
||||
| Ocean | `#5ba4cf` | `#4e93ba` | `#7abce0` |
|
||||
| Emerald | `#5ea67a` | `#4e9468` | `#78c492` |
|
||||
| Rose | `#cf7a8a` | `#b96e7e` | `#e094a4` |
|
||||
| Purple | `#9b7abf` | `#876bb0` | `#b494d4` |
|
||||
| Amber | `#c9963a` | `#b5862e` | `#ddb05c` |
|
||||
| Teal | `#4aab9a` | `#3d9686` | `#68c4b4` |
|
||||
| Silver | `#a0a8b0` | `#8e959c` | `#b8bfc6` |
|
||||
|
||||
**Light Minimal accent swatches:**
|
||||
|
||||
| Name | Accent | Dim | Bright |
|
||||
|------|--------|-----|--------|
|
||||
| Indigo | `#4a6fa5` | `#3d5f8f` | `#6088bf` |
|
||||
| Ocean | `#3a8ab5` | `#2e7aa0` | `#55a0cc` |
|
||||
| Emerald | `#3a8a5c` | `#2e7a4e` | `#55a878` |
|
||||
| Rose | `#a5566a` | `#8f4a5c` | `#bf6e82` |
|
||||
| Purple | `#6b5a9e` | `#5c4d8a` | `#8474b5` |
|
||||
| Amber | `#9e7a30` | `#8a6a28` | `#b5923e` |
|
||||
| Teal | `#2e8a7a` | `#267a6c` | `#45a595` |
|
||||
| Slate | `#5a6570` | `#4e5860` | `#6e7a85` |
|
||||
|
||||
### 1.6 Border & Glass Derivation
|
||||
|
||||
When an accent swatch is selected, borders and glass effects are auto-derived:
|
||||
|
||||
```typescript
|
||||
function deriveFromAccent(accentHex: string, isDark: boolean) {
|
||||
return {
|
||||
borderSubtle: `rgba(${hexToRgb(accentHex)}, ${isDark ? 0.12 : 0.10})`,
|
||||
borderMedium: `rgba(${hexToRgb(accentHex)}, ${isDark ? 0.25 : 0.18})`,
|
||||
glassBorder: `rgba(${hexToRgb(accentHex)}, ${isDark ? 0.1 : 0.08})`,
|
||||
glassBorderHeavy: `rgba(${hexToRgb(accentHex)}, ${isDark ? 0.15 : 0.12})`,
|
||||
scrollbarThumb: `rgba(${hexToRgb(accentHex)}, 0.2)`,
|
||||
scrollbarThumbHover: `rgba(${hexToRgb(accentHex)}, 0.35)`,
|
||||
glowColor: `rgba(${hexToRgb(accentHex)}, 0.4)`,
|
||||
glowPulse: `rgba(${hexToRgb(accentHex)}, 0.6)`,
|
||||
};
|
||||
}
|
||||
```
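
`deriveFromAccent()` relies on a `hexToRgb()` helper that is not shown. A minimal sketch, assuming it returns the comma-separated `r,g,b` string that gets interpolated into `rgba(...)`, and assuming shorthand `#rgb` input should be accepted as well:

```typescript
// Converts "#rrggbb" (or shorthand "#rgb") into the "r,g,b" string used inside rgba(...).
function hexToRgb(hex: string): string {
  const match = /^#?([0-9a-f]{3}|[0-9a-f]{6})$/i.exec(hex.trim());
  if (match === null) {
    throw new Error(`Not a hex color: ${hex}`);
  }
  let digits = match[1];
  if (digits.length === 3) {
    // Expand shorthand: "fa0" -> "ffaa00"
    digits = digits.split("").map((d) => d + d).join("");
  }
  const value = parseInt(digits, 16);
  return `${(value >> 16) & 255},${(value >> 8) & 255},${value & 255}`;
}
```

For the Dark Gold accent `#d4a574` this yields `212,165,116`, which matches the `rgba(212,165,116,0.12)` border value in the preset table.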
|
||||
|
||||
---
|
||||
|
||||
## 2. Architecture & Data Flow
|
||||
|
||||
### 2.1 File Structure
|
||||
|
||||
```
|
||||
packages/dashboard/src/
|
||||
themes/
|
||||
types.ts # ThemePreset, AccentSwatch, ThemeConfig types
|
||||
presets.ts # 5 preset definitions + accent swatch arrays
|
||||
theme-engine.ts # applyTheme(), deriveFromAccent(), hexToRgb()
|
||||
ThemeContext.tsx # React context + provider + useTheme() hook
|
||||
components/
|
||||
ThemePicker.tsx # Popover UI for preset + accent selection
|
||||
```
|
||||
|
||||
### 2.2 Type Definitions
|
||||
|
||||
```typescript
|
||||
// themes/types.ts
|
||||
|
||||
export type PresetId = 'dark-gold' | 'dark-ocean' | 'dark-forest' | 'dark-rose' | 'light-minimal';
|
||||
|
||||
export interface ThemePreset {
|
||||
id: PresetId;
|
||||
name: string; // Display name: "Dark Gold"
|
||||
isDark: boolean; // true for dark themes, false for light
|
||||
colors: Record<string, string>; // Token name -> value, keyed without the "--color-" prefix (applyTheme() adds it)
|
||||
accentSwatches: AccentSwatch[];
|
||||
defaultAccentId: string; // Which swatch is the native default
|
||||
}
|
||||
|
||||
export interface AccentSwatch {
|
||||
id: string; // e.g. 'gold', 'ocean'
|
||||
name: string; // Display name: "Gold"
|
||||
accent: string; // Primary accent hex
|
||||
accentDim: string; // Dimmed accent hex
|
||||
accentBright: string; // Bright accent hex
|
||||
}
|
||||
|
||||
export interface ThemeConfig {
|
||||
presetId: PresetId;
|
||||
accentId: string; // Selected accent swatch ID
|
||||
}
|
||||
```
|
||||
|
||||
### 2.3 Theme Engine
|
||||
|
||||
The theme engine is a pure function layer (no React dependency):
|
||||
|
||||
```typescript
|
||||
// themes/theme-engine.ts
|
||||
|
||||
export function applyTheme(config: ThemeConfig): void {
|
||||
const preset = getPreset(config.presetId);
|
||||
const accent = getAccent(preset, config.accentId);
|
||||
|
||||
// 1. Apply base preset colors
|
||||
for (const [key, value] of Object.entries(preset.colors)) {
|
||||
document.documentElement.style.setProperty(`--color-${key}`, value);
|
||||
}
|
||||
|
||||
// 2. Override accent colors from swatch
|
||||
document.documentElement.style.setProperty('--color-accent', accent.accent);
|
||||
document.documentElement.style.setProperty('--color-accent-dim', accent.accentDim);
|
||||
document.documentElement.style.setProperty('--color-accent-bright', accent.accentBright);
|
||||
|
||||
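  // 3. Apply derived values (borders, glass, scrollbar, glow)
  const derived = deriveFromAccent(accent.accent, preset.isDark);
  for (const [key, value] of Object.entries(derived)) {
    // camelCase key -> kebab-case CSS variable, e.g. glassBorder -> --glass-border
    const kebab = key.replace(/[A-Z]/g, (c) => `-${c.toLowerCase()}`);
    // Border tokens live under the --color- prefix (--color-border-subtle, --color-border-medium)
    const name = kebab.startsWith('border-') ? `--color-${kebab}` : `--${kebab}`;
    document.documentElement.style.setProperty(name, value);
  }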
|
||||
|
||||
// 4. Set data-theme attribute for any CSS-only selectors needed
|
||||
document.documentElement.setAttribute('data-theme', preset.isDark ? 'dark' : 'light');
|
||||
}
|
||||
```
|
||||
|
||||
### 2.4 React Context
|
||||
|
||||
```typescript
|
||||
// themes/ThemeContext.tsx
|
||||
|
||||
interface ThemeContextValue {
|
||||
config: ThemeConfig;
|
||||
preset: ThemePreset;
|
||||
setPreset: (presetId: PresetId) => void;
|
||||
setAccent: (accentId: string) => void;
|
||||
}
|
||||
```
|
||||
|
||||
The provider:
|
||||
1. On mount: resolves the theme in priority order: `localStorage`, then the `theme` field of `meta.json`, then the default (`dark-gold`)
|
||||
2. Calls `applyTheme()` on every config change
|
||||
3. Persists to `localStorage` on every change
|
||||
4. Does NOT write to `meta.json` from the dashboard (the dashboard is read-only for meta.json; meta.json is written by the CLI/plugin side)
|
||||
|
||||
### 2.5 Integration with Zustand Store
|
||||
|
||||
The theme system is **separate from the Zustand store** — it uses its own React context. Rationale:
|
||||
- Theme state is orthogonal to graph/UI state
|
||||
- Theme needs to apply before the graph even loads (avoid flash of wrong theme)
|
||||
- Keeps the store focused on graph interaction
|
||||
|
||||
The store does NOT gain any theme-related fields.
|
||||
|
||||
---
|
||||
|
||||
## 3. UI Components
|
||||
|
||||
### 3.1 Theme Picker Button (Header)
|
||||
|
||||
A small palette icon button in the top header bar, positioned after existing controls (PersonaSelector, DiffToggle, etc.).
|
||||
|
||||
- Click opens a popover/dropdown panel
|
||||
- Popover has two sections:
|
||||
- **Presets**: 5 cards/buttons showing preset name + small color preview circles
|
||||
- **Accent Colors**: row of 8 color circles for the active preset
|
||||
- Active preset and accent are highlighted with a ring/check
|
||||
- Selecting a preset instantly applies it; selecting an accent instantly applies it
|
||||
- Clicking outside or pressing Escape closes the popover
|
||||
|
||||
### 3.2 Preset Preview
|
||||
|
||||
Each preset card shows:
|
||||
- Name (e.g., "Dark Gold")
|
||||
- 3-4 small circles showing root, surface, and accent colors as a visual preview
|
||||
- Check mark or ring on the active one
|
||||
|
||||
### 3.3 Accent Swatch Row
|
||||
|
||||
- 8 small filled circles in a horizontal row
|
||||
- Tooltip or label on hover showing the accent name
|
||||
- Active one has a ring/border indicator
|
||||
|
||||
### 3.4 Transitions
|
||||
|
||||
When switching themes:
|
||||
- CSS variables update instantly (no transition needed for most properties)
|
||||
- Optionally add a subtle `transition: background-color 0.2s, color 0.2s` on `html` for a smooth feel
|
||||
- No page reload required
|
||||
|
||||
---
|
||||
|
||||
## 4. Persistence & Resolution
|
||||
|
||||
### 4.1 Storage Locations
|
||||
|
||||
| Location | Format | Written by | Read by |
|
||||
|----------|--------|-----------|---------|
|
||||
| `localStorage` key: `ua-theme` | `JSON.stringify(ThemeConfig)` | Dashboard (on every change) | Dashboard (on mount) |
|
||||
| `.understand-anything/meta.json` | `{ ..., theme?: ThemeConfig }` | CLI/plugin (during analysis or explicit set) | Dashboard (on mount, as fallback) |
|
||||
|
||||
### 4.2 Resolution Order
|
||||
|
||||
```
|
||||
1. localStorage('ua-theme') → user's personal preference (wins)
|
||||
2. meta.json.theme → project-level default (fallback)
|
||||
3. { presetId: 'dark-gold', accentId: 'gold' } → hard default
|
||||
```
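
The resolution order can be expressed as a small pure function. In this sketch the types are local copies of `PresetId` and `ThemeConfig`, and treating a malformed or unknown `localStorage` value as absent is an assumption rather than something the design specifies.

```typescript
type PresetIdSketch = "dark-gold" | "dark-ocean" | "dark-forest" | "dark-rose" | "light-minimal";

interface ThemeConfigSketch {
  presetId: PresetIdSketch;
  accentId: string;
}

const DEFAULT_THEME: ThemeConfigSketch = { presetId: "dark-gold", accentId: "gold" };
const KNOWN_PRESET_IDS: string[] = ["dark-gold", "dark-ocean", "dark-forest", "dark-rose", "light-minimal"];

// Parses the stored JSON string; returns null for missing, malformed, or unknown values.
function parseStoredTheme(raw: string | null): ThemeConfigSketch | null {
  if (raw === null) return null;
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return null;
    const candidate = value as { presetId?: unknown; accentId?: unknown };
    if (typeof candidate.presetId !== "string") return null;
    if (KNOWN_PRESET_IDS.indexOf(candidate.presetId) === -1) return null;
    if (typeof candidate.accentId !== "string") return null;
    return { presetId: candidate.presetId as PresetIdSketch, accentId: candidate.accentId };
  } catch (error) {
    return null; // JSON.parse failed
  }
}

// localStorage wins, then the project-level meta.json theme, then the hard default.
function resolveTheme(
  localStorageValue: string | null,
  metaTheme: ThemeConfigSketch | undefined,
): ThemeConfigSketch {
  return parseStoredTheme(localStorageValue) ?? metaTheme ?? DEFAULT_THEME;
}
```

In the dashboard, `localStorageValue` would come from `localStorage.getItem('ua-theme')` and `metaTheme` from the fetched `meta.json`.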
|
||||
|
||||
### 4.3 meta.json Schema Extension
|
||||
|
||||
Extend `AnalysisMeta` in `packages/core/src/types.ts`:
|
||||
|
||||
```typescript
|
||||
export interface AnalysisMeta {
|
||||
lastAnalyzedAt: string;
|
||||
gitCommitHash: string;
|
||||
version: string;
|
||||
analyzedFiles: number;
|
||||
theme?: ThemeConfig; // NEW — optional, project-level theme preference
|
||||
}
|
||||
```
|
||||
|
||||
### 4.4 Dashboard Reads meta.json Theme
|
||||
|
||||
The dashboard currently loads `/knowledge-graph.json` on mount. It also needs to load `/meta.json` (or the theme field can be embedded in `knowledge-graph.json`).
|
||||
|
||||
**Decision:** Load `/meta.json` separately — it's a small file and keeps concerns separated. The dashboard fetches `/meta.json` on mount, extracts the `theme` field if present, and uses it as fallback when `localStorage` has no theme.
|
||||
|
||||
---
|
||||
|
||||
## 5. Hardcoded Color Consolidation
|
||||
|
||||
### 5.1 Problem
|
||||
|
||||
Many components use hardcoded RGBA values instead of CSS variables:
|
||||
- `rgba(212,165,116,0.3)` scattered in GraphView, CustomNode, etc.
|
||||
- `rgba(20,20,20,0.8)` in glass effects
|
||||
- `rgba(224,82,82,0.25)` in diff overlays
|
||||
|
||||
These won't respond to theme changes.
|
||||
|
||||
### 5.2 Solution
|
||||
|
||||
Before implementing theme switching, consolidate all hardcoded color references:
|
||||
|
||||
1. **Audit**: grep for hardcoded hex/rgba values in component files
|
||||
2. **Replace with CSS variables**: create new variables where needed (e.g., `--edge-color`, `--edge-color-dim`)
|
||||
3. **Glass classes**: update `.glass` and `.glass-heavy` in `index.css` to use variables
|
||||
4. **Scrollbar**: update scrollbar styles to use variables
|
||||
5. **Glow effects**: update `.node-glow`, `.diff-changed-glow`, `.diff-affected-glow` to use variables
|
||||
|
||||
Key hardcoded patterns to consolidate:
|
||||
|
||||
| Hardcoded Value | Replace With |
|
||||
|-----------------|-------------|
|
||||
| `rgba(212,165,116,X)` | `var(--color-accent)` with opacity modifier or dedicated variable |
|
||||
| `rgba(20,20,20,0.8)` | `var(--glass-bg)` |
|
||||
| `rgba(20,20,20,0.95)` | `var(--glass-bg-heavy)` |
|
||||
| `color="rgba(212,165,116,0.15)"` in React Flow | Variable reference |
|
||||
| Amber colors in WarningBanner | Keep as-is (semantic warning color, theme-independent) |
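
The audit step could be scripted rather than done by hand with grep. The sketch below is a heuristic under assumptions: it reports hex literals and numeric `rgb()`/`rgba()` calls line by line, makes no attempt to skip comments, and cannot tell semantic colors (such as the WarningBanner amber) from theme colors. `findHardcodedColors` is a hypothetical name.

```typescript
// Matches #rgb / #rrggbb / #rrggbbaa hex literals and rgb()/rgba() calls with numeric arguments.
const HARDCODED_COLOR = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b|rgba?\([\d\s.,%\/]+\)/g;

interface ColorHit {
  line: number; // 1-based line number
  value: string; // the literal as written in the source
}

function findHardcodedColors(source: string): ColorHit[] {
  const hits: ColorHit[] = [];
  source.split("\n").forEach((text, index) => {
    const matches = text.match(HARDCODED_COLOR);
    if (matches !== null) {
      for (const value of matches) {
        hits.push({ line: index + 1, value });
      }
    }
  });
  return hits;
}
```

Each hit still needs a manual decision: replace it with an existing variable, introduce a new one, or keep it because the color is semantic.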
|
||||
|
||||
### 5.3 CSS Variable Rename
|
||||
|
||||
Rename throughout codebase:
|
||||
- `--color-gold` -> `--color-accent`
|
||||
- `--color-gold-dim` -> `--color-accent-dim`
|
||||
- `--color-gold-bright` -> `--color-accent-bright`
|
||||
- All Tailwind class usages: `text-gold` -> `text-accent`, `bg-gold` -> `bg-accent`, etc.
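
Since the rename is mechanical, it can be sketched as two regular-expression passes over each file's contents. The Tailwind prefix list (`text`, `bg`, `border`) is an assumption; the real codebase may use other utilities (for example `ring-` or `from-`) that would need to be added.

```typescript
// Hypothetical helper: rewrites gold-named tokens to accent-named ones in a source string.
function renameGoldToAccent(source: string): string {
  return source
    // CSS variables: --color-gold, --color-gold-dim, --color-gold-bright
    .replace(/--color-gold\b/g, "--color-accent")
    // Tailwind utilities such as text-gold, bg-gold, border-gold-dim
    .replace(/\b(text|bg|border)-gold\b/g, "$1-accent");
}
```

The trailing `\b` keeps unrelated identifiers such as `--color-golden` untouched while still matching the `-dim` and `-bright` variants.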
|
||||
|
||||
---
|
||||
|
||||
## 6. Light Theme Considerations
|
||||
|
||||
The Light Minimal theme requires special attention:
|
||||
|
||||
### 6.1 Inverted Contrast
|
||||
|
||||
- Text is dark on light backgrounds (flipped from dark themes)
|
||||
- Borders need lower opacity to avoid looking harsh
|
||||
- Glass effects use white-based rgba instead of black-based
|
||||
|
||||
### 6.2 Node Colors
|
||||
|
||||
Slightly darker/desaturated variants for readability on light backgrounds (see Section 1.4).
|
||||
|
||||
### 6.3 data-theme Attribute
|
||||
|
||||
Set `data-theme="light"` on `<html>` for any styles that can't be handled purely through CSS variables (e.g., third-party component overrides, box-shadow directions).
|
||||
|
||||
### 6.4 React Flow
|
||||
|
||||
React Flow's background, minimap, and edge colors all need to respect the theme. The existing `!important` override on `.react-flow__background` already uses `var(--color-root)`, which is good. MiniMap colors in GraphView.tsx are currently hardcoded and need to be updated.
|
||||
|
||||
---
|
||||
|
||||
## 7. Summary of Changes by Package
|
||||
|
||||
### packages/core
|
||||
- Extend `AnalysisMeta` type with optional `theme?: ThemeConfig`
|
||||
- Export `ThemeConfig` and `PresetId` types from `./types` subpath
|
||||
|
||||
### packages/dashboard
|
||||
- New `themes/` directory with types, presets, engine, and context
|
||||
- New `ThemePicker` component in header
|
||||
- Rename `--color-gold*` to `--color-accent*` across all files
|
||||
- Consolidate hardcoded RGBA values into CSS variables
|
||||
- Update `index.css`: glass classes, scrollbar, glow effects to use variables
|
||||
- Update `App.tsx`: wrap with ThemeProvider, add ThemePicker to header, fetch meta.json
|
||||
- Update components with hardcoded colors: GraphView, CustomNode, LayerLegend, etc.
|
||||
|
||||
---
|
||||
|
||||
## 8. Out of Scope
|
||||
|
||||
- Theme import/export
|
||||
- Custom theme creation UI
|
||||
- Per-node color customization
|
||||
- Animated theme transitions beyond simple CSS transitions
|
||||
- Syncing theme across browser tabs (nice-to-have for later)
|
||||
@@ -0,0 +1,395 @@
|
||||
# Token Reduction Design
|
||||
|
||||
**Date:** 2026-03-27
|
||||
**Status:** Draft
|
||||
**Goal:** Reduce total token cost of `/understand` by ~85-90% on large codebases (200+ files)
|
||||
|
||||
---
|
||||
|
||||
## Problem
|
||||
|
||||
For large codebases, the `/understand` pipeline spends the vast majority of its tokens on **repeated context injection**. The same data is sent to every subagent independently, even when that data could be computed once and shared.
|
||||
|
||||
### Token cost breakdown (500-file TypeScript+React project, baseline)
|
||||
|
||||
| Source | Phase | Tokens (input) | % of total |
|---|---|---|---|
| `allProjectFiles` list × 67 batches | Phase 2 | ~167,000 | ~32% |
| `file-analyzer-prompt.md` × 67 batches | Phase 2 | ~134,000 | ~25% |
| Language/framework addendums × 67 batches | Phase 2 | ~68,000 | ~13% |
| Tour builder payload (all nodes + edges) | Phase 5 | ~80,000 | ~15% |
| Graph reviewer (assembled graph + inventory) | Phase 6 | ~58,000 | ~11% |
| Architecture analyzer payload | Phase 4 | ~22,000 | ~4% |
| **Total** | | **~529,000** | **100%** |
|
||||
|
||||
The root cause: **Phase 2 runs 67 batches (at 5-10 files each), and every single batch receives the full 500-file list for import resolution.** The file list alone costs ~2,500 tokens × 67 repetitions = 167,000 tokens on input, doing work that is entirely redundant between batches.
|
||||
|
||||
---
|
||||
|
||||
## Goals
|
||||
|
||||
- Reduce total input tokens by 85%+ on a 500-file project
|
||||
- No degradation in graph quality for standard projects
|
||||
- Preserve the `--full` / incremental / scope flags
|
||||
- Maintain backward compatibility with existing `knowledge-graph.json` output schema
|
||||
|
||||
---
|
||||
|
||||
## Changes
|
||||
|
||||
Five changes compose the full approach (C1–C5). Each is independent and can be shipped separately, but all five are needed for the full reduction.
|
||||
|
||||
---
|
||||
|
||||
### C1 — Pre-resolve imports in the project scanner
|
||||
|
||||
**Root cause addressed:** `allProjectFiles` (the entire file list) is injected into every file-analyzer batch solely so each batch's extraction script can resolve relative imports. This is redundant: the full file list is available during Phase 1, and import resolution is deterministic. It should happen once, not 67 times.
|
||||
|
||||
**Change:** Extend the Phase 1 scanner script to also parse import statements from every source file and resolve relative imports against the discovered file list. The resolved results are written into `scan-result.json` as a new `importMap` field. File-analyzer batches then receive only their own batch's pre-resolved imports — not the full file list.
|
||||
|
||||
#### Scanner output addition
|
||||
|
||||
`scan-result.json` gains:
|
||||
|
||||
```json
{
  "importMap": {
    "src/index.ts": ["src/utils.ts", "src/config.ts"],
    "src/utils.ts": [],
    "src/components/App.tsx": ["src/hooks/useAuth.ts", "src/store/index.ts"]
  }
}
```
|
||||
|
||||
- Keys are project-relative paths (matching `files[*].path`)
|
||||
- Values contain resolved project-relative paths only; external imports (`node_modules`) and unresolvable paths are omitted from the map entirely
|
||||
|
||||
#### Scanner script additions (Phase 1 Step 8)
|
||||
|
||||
After the existing 7 steps, the scanner script adds a new step:
|
||||
|
||||
```
|
||||
Step 8 — Import Resolution
|
||||
|
||||
For each file in the discovered source list:
|
||||
1. Read the file content
|
||||
2. Extract import statements (language-specific patterns per Step 3's language detection):
|
||||
- TypeScript/JavaScript: `import ... from '...'`, `require('...')`
|
||||
- Python: `import ...`, `from ... import ...`
|
||||
- Go: `import "..."` blocks
|
||||
- Rust: `use ...` statements
|
||||
- Java/Kotlin: `import ...` statements
|
||||
- Ruby: `require`, `require_relative`
|
||||
3. For each relative import (starts with `./` or `../`):
|
||||
a. Compute the resolved path from the current file's directory
|
||||
b. Normalize to project-relative format
|
||||
c. Try common extension variants if the import has no extension:
|
||||
`.ts`, `.tsx`, `.js`, `.jsx`, `/index.ts`, `/index.js`, `/index.tsx`
|
||||
d. If any variant exists in the discovered file list, record it; otherwise skip
|
||||
4. For non-relative imports (no `./` or `../` prefix): skip (treated as external packages)
|
||||
|
||||
Output the full importMap in the JSON result.
|
||||
```
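The resolution rules in Step 8 could be sketched roughly as follows. This is a minimal illustration, not the scanner's actual script: the function names (`joinRelative`, `resolveImport`, `buildImportMap`) are hypothetical, import specifiers are assumed to have been extracted already, and only the TypeScript/JavaScript extension variants listed above are tried.

```typescript
// Extension variants from Step 8, tried in order when an import has no extension.
const EXTENSION_VARIANTS = [".ts", ".tsx", ".js", ".jsx", "/index.ts", "/index.js", "/index.tsx"];

/** Join a relative specifier onto the importing file's directory, collapsing "." and "..". */
function joinRelative(fromFile: string, specifier: string): string | null {
  const parts = fromFile.split("/").slice(0, -1); // directory of the importing file
  for (const segment of specifier.split("/")) {
    if (segment === "." || segment === "") continue;
    if (segment === "..") {
      if (parts.length === 0) return null; // would escape the project root
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  return parts.join("/");
}

/** Resolve one specifier to a project-relative path, or null if external/unresolvable. */
function resolveImport(fromFile: string, specifier: string, knownFiles: Set<string>): string | null {
  if (!specifier.startsWith("./") && !specifier.startsWith("../")) return null; // non-relative: skip
  const base = joinRelative(fromFile, specifier);
  if (base === null) return null;
  if (knownFiles.has(base)) return base;
  for (const variant of EXTENSION_VARIANTS) {
    if (knownFiles.has(base + variant)) return base + variant;
  }
  return null;
}

/** Build the importMap: every discovered file gets an entry, possibly empty. */
function buildImportMap(
  specifiersByFile: Record<string, string[]>,
  files: string[],
): Record<string, string[]> {
  const known = new Set(files);
  const importMap: Record<string, string[]> = {};
  for (const file of files) {
    const resolved = (specifiersByFile[file] ?? [])
      .map((specifier) => resolveImport(file, specifier, known))
      .filter((path): path is string => path !== null);
    importMap[file] = Array.from(new Set(resolved)); // de-duplicate repeated imports
  }
  return importMap;
}
```

Keeping the lookup against a `Set` of discovered files means an import is recorded only when it matches a file the scanner actually found, which mirrors rule 3d.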
|
||||
|
||||
#### File-analyzer input schema change
|
||||
|
||||
**Before:**
|
||||
```json
{
  "projectRoot": "/path/to/project",
  "allProjectFiles": ["src/index.ts", "src/utils.ts", "...500 paths..."],
  "batchFiles": [
    {"path": "src/index.ts", "language": "typescript", "sizeLines": 150}
  ]
}
```
|
||||
|
||||
**After:**
|
||||
```json
{
  "projectRoot": "/path/to/project",
  "batchFiles": [
    {"path": "src/index.ts", "language": "typescript", "sizeLines": 150}
  ],
  "batchImportData": {
    "src/index.ts": ["src/utils.ts", "src/config.ts"],
    "src/components/App.tsx": ["src/hooks/useAuth.ts"]
  }
}
```
|
||||
|
||||
`allProjectFiles` is removed entirely. `batchImportData` contains only the pre-resolved imports for the files in this batch (sliced from `importMap` by the orchestrator).
|
||||
|
||||
#### File-analyzer extraction script change
|
||||
|
||||
The extraction script no longer performs import resolution. It:
|
||||
- Still extracts: functions, classes, exports, metrics (unchanged)
|
||||
- For imports: reads `batchImportData[file.path]` from the input JSON — no cross-referencing needed
|
||||
- The `imports` array in each file result becomes: `batchImportData[file.path]` mapped to import edge objects with `resolvedPath` already populated, `isExternal: false`
|
||||
|
||||
#### SKILL.md Phase 2 change
|
||||
|
||||
Remove the `allProjectFiles` injection from the batch dispatch prompt. Replace with a per-batch `batchImportData` slice:
|
||||
|
||||
```
|
||||
For each batch, slice importData from the importMap read in Phase 1:
|
||||
batchImportData = { [file.path]: importMap[file.path] ?? [] }
|
||||
for each file in this batch
|
||||
```
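As a rough TypeScript rendering of the slicing rule above (the `sliceImportData` name and the `BatchFile` type are illustrative; only the field names come from the input schema shown earlier):

```typescript
type ImportMap = Record<string, string[]>;

interface BatchFile {
  path: string;
  language: string;
  sizeLines: number;
}

/** Keep only the importMap entries for files in this batch; missing entries become []. */
function sliceImportData(batch: BatchFile[], importMap: ImportMap): ImportMap {
  const batchImportData: ImportMap = {};
  for (const file of batch) {
    batchImportData[file.path] = importMap[file.path] ?? [];
  }
  return batchImportData;
}
```

Each batch therefore carries a payload proportional to its own size rather than to the size of the whole project.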
|
||||
|
||||
#### Token savings estimate
|
||||
|
||||
| | Batches | Tokens/batch | Total |
|---|---|---|---|
| Before | 67 | ~2,500 (file list) | ~167,500 |
| After (C1 alone) | 67 | ~200 (batch importData) | ~13,400 |
| **Savings** | | | **~154,100** |
|
||||
|
||||
---
|
||||
|
||||
### C2 — Increase batch size from 5-10 to 20-30 files
|
||||
|
||||
**Root cause addressed:** Every batch incurs the full cost of `file-analyzer-prompt.md` (~2,000 tokens) plus the batch dispatch overhead. With 67 batches, this adds up even without `allProjectFiles`. Fewer, larger batches directly reduce this repetition.
|
||||
|
||||
**Change:** In SKILL.md Phase 2, change the batch size guidance:
|
||||
|
||||
- **Before:** "Batch the file list from Phase 1 into groups of **5-10 files each**"
|
||||
- **After:** "Batch the file list from Phase 1 into groups of **20-30 files each** (aim for ~25 per batch)"
|
||||
|
||||
Also update the concurrency limit from 3 to **5** concurrent batches. Fewer total batches means we can afford more parallelism without overwhelming the system.
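For illustration only, the batching and concurrency settings could look like this in TypeScript. In the actual pipeline the orchestrator follows prose instructions in SKILL.md, so `chunk` and `runWithLimit` are hypothetical helpers, and the batch size of 25 is the "aim for" value quoted above.

```typescript
/** Split items into consecutive batches of at most `size` elements. */
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

/** Run `worker` over all items with at most `limit` calls in flight at once. */
async function runWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each lane pulls the next unclaimed index until the queue is drained.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}

// 500 files at ~25 per batch gives 20 batches, dispatched 5 at a time.
const exampleFiles = Array.from({ length: 500 }, (_, i) => `src/file${i}.ts`);
const exampleBatches = chunk(exampleFiles, 25);
console.log(exampleBatches.length); // 20
```

A call such as `runWithLimit(exampleBatches, 5, dispatchBatch)` would then process the 20 batches five at a time, where `dispatchBatch` stands in for whatever actually launches a file-analyzer subagent.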
|
||||
|
||||
#### Trade-offs
|
||||
|
||||
| | Smaller batches (current) | Larger batches (new) |
|---|---|---|
| Files per batch | 5-10 | 20-30 |
| Total batches (500 files) | ~67 | ~20 |
| Prompt repetition | 67× | 20× |
| Quality risk | Lower (focused) | Slightly higher (more files per subagent) |
| Concurrency | 3 | 5 |
|
||||
|
||||
Quality risk is low: each subagent still operates on distinct, non-overlapping file groups. The extraction script is deterministic regardless of batch size. Semantic analysis (summaries, tags) may be marginally less focused, but the quality difference is negligible in practice for well-structured files.
|
||||
|
||||
#### Token savings estimate (combined with C1)
|
||||
|
||||
| | Batches | Tokens/batch (prompt) | Total |
|---|---|---|---|
| Before (C1 only) | 67 | ~2,000 | ~134,000 |
| After (C1+C2) | 20 | ~2,000 | ~40,000 |
| **Savings from C2** | | | **~94,000** |
|
||||
|
||||
C1+C2 combined eliminate ~248,000 tokens from Phase 2 (down from ~301,500 to ~53,500, a ~82% Phase 2 reduction).
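The arithmetic behind these Phase 2 figures can be reproduced with a few lines of TypeScript. The helper and field names are illustrative, the token counts are the estimates quoted above, and the assumption that the ~13,400 tokens of import data are spread evenly over the 20 larger batches (~670 each) is made here for the sake of the calculation.

```typescript
interface Phase2Scenario {
  batches: number;            // number of file-analyzer batches
  promptTokens: number;       // file-analyzer-prompt.md, repeated per batch
  perBatchDataTokens: number; // file list or import data injected per batch
}

/** Total Phase 2 input tokens spent on repeated context. */
function phase2InputTokens(s: Phase2Scenario): number {
  return s.batches * (s.promptTokens + s.perBatchDataTokens);
}

// Baseline: 67 batches, each with the prompt (~2,000) and the full file list (~2,500).
const baselineTokens = phase2InputTokens({ batches: 67, promptTokens: 2000, perBatchDataTokens: 2500 });

// After C1+C2: 20 batches, each with the prompt and ~670 tokens of import data (assumed split).
const afterC1C2Tokens = phase2InputTokens({ batches: 20, promptTokens: 2000, perBatchDataTokens: 670 });

const phase2Reduction = (baselineTokens - afterC1C2Tokens) / baselineTokens;

console.log(baselineTokens);             // 301500
console.log(afterC1C2Tokens);            // 53400
console.log(phase2Reduction.toFixed(2)); // "0.82"
```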
|
||||
|
||||
---
|
||||
|
||||
### C3 — Remove language/framework addendums from file-analyzer batches
|
||||
|
||||
**Root cause addressed:** `languages/typescript.md` (~600 tokens) and `frameworks/react.md` (~700 tokens) are read and injected into every file-analyzer batch prompt. For a TypeScript+React project with 20 batches (after C2), this costs 20 × 1,300 = 26,000 additional tokens — and the model already has deep knowledge of these languages from training.
|
||||
|
||||
**Change:** Stop injecting addendum files into Phase 2 batch prompts entirely. The addendums remain injected into Phase 4 (architecture analyzer) where there is only **one** subagent call, making the cost acceptable.
|
||||
|
||||
Instead, add a compact "Language and Framework Hints" reference section directly into `file-analyzer-prompt.md`. This section is a distilled, one-time addition (~150 tokens total) that captures the most useful patterns from all addendums in a concise lookup table.
|
||||
|
||||
#### New section in `file-analyzer-prompt.md` (replace addendum injection)
|
||||
|
||||
```markdown
## Language and Framework Quick Reference

Use these hints to improve tag and edge accuracy. These supplement your training knowledge.

| Signal | Tag(s) | Note |
|---|---|---|
| File in `hooks/`, exports function starting with `use` | `hook`, `service` | React custom hook |
| File in `contexts/`, exports a Provider | `service`, `state` | React context |
| File in `pages/` or `views/` | `ui`, `routing` | Page-level component |
| File in `store/`, `slices/`, `reducers/` | `state` | State management |
| File in `services/`, `api/` | `service` | Data-fetching / API client |
| `__init__.py` with re-exports | `entry-point`, `barrel` | Python package root |
| `manage.py` at project root | `entry-point` | Django management entry |
| File named `mod.rs` | `barrel` | Rust module barrel |
| File named `main.go` in `cmd/` | `entry-point` | Go binary entry |

For React: create `depends_on` edges from components to hooks they call. Create `publishes`/`subscribes` edges for Context provider/consumer patterns.
```
|
||||
|
||||
#### SKILL.md Phase 2 change
|
||||
|
||||
Remove steps 2 and 3 from the "Build the combined prompt template" block:
|
||||
- **Remove:** Step 2 (Language context injection — read `./languages/<language-id>.md` per detected language)
|
||||
- **Remove:** Step 3 (Framework addendum injection — read `./frameworks/<framework-id>.md` per detected framework)
|
||||
- **Keep:** Step 1 (Read the base template at `./file-analyzer-prompt.md`)
|
||||
|
||||
The addendum injection steps **remain unchanged** in Phase 4 (architecture analyzer), since they run once.
|
||||
|
||||
#### Token savings estimate
|
||||
|
||||
| | Batches | Addendum tokens/batch | Total |
|---|---|---|---|
| Before (after C2) | 20 | ~1,300 (TS+React) | ~26,000 |
| After | 20 | ~150 (inline hints) | ~3,000 |
| **Savings** | | | **~23,000** |
|
||||
|
||||
---
|
||||
|
||||
### C4 — Slim Phase 4 and Phase 5 payloads
|
||||
|
||||
**Root cause addressed:** Phase 5 (tour builder) receives all nodes (file + function + class) and all edges (imports + contains + calls + exports + ...). For a 500-file project, this can include 1,500+ nodes and 3,000+ edges. Most of this data is not needed for tour design.
|
||||
|
||||
#### Phase 4 (Architecture Analyzer) — minor trim
|
||||
|
||||
Phase 4 already sends only file-type nodes, which is correct. Minor change: explicitly strip `languageNotes` from each node object in the payload (it is not useful for layer assignment and can be verbose). Also strip `name`, which is always derivable as the basename of `filePath`. The slimmed format below additionally omits `complexity`.
|
||||
|
||||
**Before per node:** `{id, name, filePath, summary, tags, complexity, languageNotes?}`
|
||||
**After per node:** `{id, filePath, summary, tags}`
|
||||
|
||||
Savings: ~15-20% fewer tokens per node, ~3,000–5,000 tokens total for Phase 4.
|
||||
|
||||
#### Phase 5 (Tour Builder) — major trim
|
||||
|
||||
Three changes to what the orchestrator injects into the tour-builder subagent:
|
||||
|
||||
**1. File nodes only (strip function/class nodes)**
|
||||
|
||||
The tour references node IDs for wayfinding. In practice the tour always references `file:` nodes — function and class nodes are visible in the dashboard's NodeInfo sidebar once a file is selected, but the tour itself navigates at the file level.
|
||||
|
||||
- **Before:** all nodes (file + function + class) — for 500 files, maybe 1,500+ nodes
|
||||
- **After:** file-type nodes only — 500 nodes
|
||||
|
||||
**2. Slim node format**
|
||||
|
||||
The tour builder script uses only node IDs, names, and types for graph computation. Summaries and tags feed the tour builder's own Phase 2 (pedagogical narrative writing), but only summaries are kept; tags are dropped along with the other heavy optional fields in the injected payload:
|
||||
|
||||
- **Before per node:** `{id, name, filePath, summary, type, tags, complexity, languageNotes?}`
|
||||
- **After per node:** `{id, name, filePath, summary, type}` (drop tags, complexity, languageNotes)
|
||||
|
||||
**3. Slim edges (imports + calls only) and slim layers**
|
||||
|
||||
The tour's BFS traversal only traverses `imports` and `calls` edges. `contains`, `exports`, `tested_by`, `depends_on`, and other edge types add no value to the traversal and inflate the payload.
|
||||
|
||||
- **Before edges:** all edge types (~3,000+ edges including all `contains` edges to function/class nodes)
|
||||
- **After edges:** only `imports` and `calls` edge types (~400–800 edges for typical projects)
|
||||
|
||||
For layers, the tour builder uses layer data only to inform the tour's narrative arc (which layer to introduce first, second, etc.). It does not need the full `nodeIds` arrays — those can be very large.
|
||||
|
||||
- **Before per layer:** `{id, name, description, nodeIds: [...hundreds of IDs]}`
|
||||
- **After per layer:** `{id, name, description}` (drop nodeIds)
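Taken together, the three Phase 5 trims amount to a filter-and-project step over the assembled graph. The sketch below assumes the node and layer fields listed above and an edge shape of `{source, target, type}`; the interface names and `slimTourPayload` are hypothetical, and the types of `tags`, `complexity`, and `languageNotes` are guesses.

```typescript
interface GraphNode {
  id: string;
  name: string;
  filePath: string;
  summary: string;
  type: string;           // "file", "function", "class", ...
  tags?: string[];        // assumed type
  complexity?: string;    // assumed type
  languageNotes?: string; // assumed type
}

interface GraphEdge {
  source: string;
  target: string;
  type: string; // "imports", "calls", "contains", ...
}

interface Layer {
  id: string;
  name: string;
  description: string;
  nodeIds: string[];
}

/** Build the slimmed tour-builder payload: file nodes, imports/calls edges, layers without nodeIds. */
function slimTourPayload(nodes: GraphNode[], edges: GraphEdge[], layers: Layer[]) {
  return {
    nodes: nodes
      .filter((node) => node.type === "file")
      .map(({ id, name, filePath, summary, type }) => ({ id, name, filePath, summary, type })),
    edges: edges.filter((edge) => edge.type === "imports" || edge.type === "calls"),
    layers: layers.map(({ id, name, description }) => ({ id, name, description })),
  };
}
```

Projecting each object onto an explicit field list, rather than deleting known-heavy fields, keeps the payload stable if new optional fields are added to the graph schema later.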
|
||||
|
||||
#### Token savings estimate (Phase 5)
|
||||
|
||||
| Data | Before | After |
|---|---|---|
| Nodes (count × avg. size) | ~1,500 × ~180 chars | ~500 × ~120 chars |
| Node tokens | ~67,500 | ~15,000 |
| Edges (count × avg. size) | ~3,000 × ~80 chars | ~600 × ~80 chars |
| Edge tokens | ~60,000 | ~12,000 |
| Layer tokens | ~5,000 | ~500 |
| **Phase 5 total** | **~132,500** | **~27,500** |
| **Savings** | | **~105,000** |
|
||||
|
||||
#### SKILL.md changes
|
||||
|
||||
In **Phase 4** dispatch prompt template, update the file node format:
|
||||
```
|
||||
File nodes:
|
||||
[list of {id, filePath, summary, tags} for all file-type nodes]
|
||||
```
|
||||
|
||||
In **Phase 5** dispatch prompt template, update all three payload specs:
|
||||
```
|
||||
Nodes (file nodes only):
|
||||
[list of {id, name, filePath, summary, type} for all file-type nodes only — do NOT include function or class nodes]
|
||||
|
||||
Key edges (imports and calls only):
|
||||
[list of edges where type is "imports" or "calls" only]
|
||||
|
||||
Layers:
|
||||
[list of {id, name, description} — omit nodeIds]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### C5 — Gate the graph-reviewer subagent behind `--review`
|
||||
|
||||
**Root cause addressed:** The graph-reviewer subagent (Phase 6) reads the entire assembled graph (~500 nodes, all edges, layers, tour) and runs an LLM-powered validation. However, its Phase 1 is entirely a deterministic script, and its Phase 2 is a simple threshold decision: if `issues.length === 0`, approve. There is no LLM judgment needed for the happy path.
|
||||
|
||||
**Change:** By default, skip the graph-reviewer subagent. The orchestrator performs inline deterministic validation using a pre-written script. Only when `--review` is explicitly passed in `$ARGUMENTS` does the full LLM reviewer subagent run.
|
||||
|
||||
#### Default path (no `--review`)
|
||||
|
||||
In Phase 6, instead of dispatching the graph-reviewer subagent, the orchestrator:
|
||||
|
||||
1. Writes a compact validation script inline (embedded in SKILL.md, ~50 lines of Node.js):
|
||||
- Check: every edge source/target references a real node ID
|
||||
- Check: every file node appears in exactly one layer
|
||||
- Check: every tour step nodeId exists
|
||||
- Check: no duplicate node IDs
|
||||
- Check: required fields present on nodes and edges
|
||||
2. Runs the script against `assembled-graph.json`
|
||||
3. If `issues.length === 0`: proceed to Phase 7 (save)
|
||||
4. If `issues.length > 0`: apply the same automated fixes as before (remove dangling edges, fill defaults), then save
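A compact validator covering the five checks might look like the sketch below. It is not the script embedded in SKILL.md: the graph shape (`nodes`, `edges`, `layers`, `tour`, and a single `nodeId` per tour step) is an assumption inferred from the checks listed above, and "required fields" is reduced to `id` and `type` on nodes and `source`, `target`, and `type` on edges.

```typescript
interface VNode { id: string; type: string }
interface VEdge { source: string; target: string; type: string }
interface VLayer { id: string; nodeIds: string[] }
interface VTourStep { nodeId: string } // assumed shape of a tour step

interface AssembledGraph {
  nodes: VNode[];
  edges: VEdge[];
  layers: VLayer[];
  tour: VTourStep[];
}

/** Return a list of structural issues; an empty list means the graph passes. */
function validateGraph(graph: AssembledGraph): string[] {
  const issues: string[] = [];
  const ids = new Set<string>();

  for (const node of graph.nodes) {
    if (!node.id || !node.type) issues.push(`node missing required field: ${JSON.stringify(node)}`);
    if (ids.has(node.id)) issues.push(`duplicate node id: ${node.id}`);
    ids.add(node.id);
  }

  for (const edge of graph.edges) {
    if (!edge.source || !edge.target || !edge.type) {
      issues.push(`edge missing required field: ${JSON.stringify(edge)}`);
    } else if (!ids.has(edge.source) || !ids.has(edge.target)) {
      issues.push(`dangling edge: ${edge.source} -> ${edge.target}`);
    }
  }

  // Count how many layers each node ID appears in.
  const layerCount = new Map<string, number>();
  for (const layer of graph.layers) {
    for (const id of layer.nodeIds) layerCount.set(id, (layerCount.get(id) ?? 0) + 1);
  }
  // Iterate over unique IDs so a duplicated node is not reported twice here.
  const fileIds = new Set(graph.nodes.filter((n) => n.type === "file").map((n) => n.id));
  fileIds.forEach((id) => {
    const count = layerCount.get(id) ?? 0;
    if (count !== 1) issues.push(`file node ${id} appears in ${count} layers`);
  });

  for (const step of graph.tour) {
    if (!ids.has(step.nodeId)) issues.push(`tour step references unknown node: ${step.nodeId}`);
  }
  return issues;
}
```

The orchestrator would proceed to Phase 7 when the returned array is empty and apply the automated fixes otherwise.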
|
||||
|
||||
This is sufficient for standard runs. The LLM reviewer adds value for catching subtle quality issues (generic summaries, orphan nodes, tour step coherence) — but those are nice-to-have, not blocking.
|
||||
|
||||
#### `--review` path
|
||||
|
||||
When `--review` is in `$ARGUMENTS`, the full graph-reviewer subagent runs as it does today. No change to that code path.
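The gate itself is a one-line check. As a hedged sketch, assuming `$ARGUMENTS` arrives as a single whitespace-separated string (the `shouldRunReviewer` name is illustrative):

```typescript
/** True only when `--review` appears as its own token in the argument string. */
function shouldRunReviewer(argumentsString: string): boolean {
  return argumentsString
    .split(/\s+/)
    .some((token) => token === "--review");
}
```

Matching whole tokens rather than substrings avoids triggering the reviewer on a hypothetical flag such as `--reviewer`.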
|
||||
|
||||
#### Token savings estimate
|
||||
|
||||
| Path | Tokens |
|---|---|
| Current (always runs LLM reviewer) | ~58,000 input + ~500 output |
| Default (inline script, no LLM) | ~0 |
| `--review` (unchanged) | ~58,000 (same as current) |
| **Savings for default runs** | **~58,500** |
|
||||
|
||||
---
|
||||
|
||||
## Combined savings summary
|
||||
|
||||
| Change | Tokens before | Tokens after | Savings |
|---|---|---|---|
| C1+C2: import map + batch consolidation | ~301,500 | ~53,500 | ~248,000 |
| C3: remove addendums from batches | ~26,000 | ~3,000 | ~23,000 |
| C4: slim Phase 4+5 payloads | ~154,500 | ~33,000 | ~121,500 |
| C5: gate reviewer (default path) | ~58,500 | ~0 | ~58,500 |
| **Total** | **~540,500** | **~89,500** | **~451,000 (~83%)** |
|
||||
|
||||
Estimates are for a 500-file TypeScript+React project. Actual savings scale with project size — a 1,000-file project would see proportionally larger savings from C1+C2 (more batches = more repetition eliminated).
|
||||
|
||||
---
|
||||
|
||||
## File changes
|
||||
|
||||
| File | Change |
|---|---|
| `skills/understand/project-scanner-prompt.md` | Add Step 8 (import resolution); add `importMap` to output schema |
| `skills/understand/file-analyzer-prompt.md` | Replace `allProjectFiles` with `batchImportData` in input schema; update extraction script to use pre-resolved imports; add compact Language/Framework Quick Reference section; remove addendum injection steps |
| `skills/understand/SKILL.md` | Phase 1: note importMap in scan result; Phase 2: remove addendum injection (steps 2+3), increase batch size 5-10→20-30, increase concurrency 3→5, replace `allProjectFiles` injection with `batchImportData` slice; Phase 4: slim node format in dispatch; Phase 5: file nodes only + slim edges + slim layers in dispatch; Phase 6: conditional reviewer (default inline script, `--review` flag for LLM reviewer) |
| `skills/understand/architecture-analyzer-prompt.md` | No change (addendums still injected here) |
| `skills/understand/tour-builder-prompt.md` | Update input schema to reflect file-only nodes, imports+calls-only edges, slim layer format |
| `skills/understand/graph-reviewer-prompt.md` | No change (only used when `--review` flag is passed) |
|
||||
|
||||
---
|
||||
|
||||
## Risks and mitigations
|
||||
|
||||
| Risk | Likelihood | Mitigation |
|---|---|---|
| Scanner import resolution misses edge cases (complex re-exports, dynamic imports) | Medium | Log unresolved imports; file-analyzer still uses resolved data and creates edges only for confirmed matches. Missed imports = missing edges, which is the same behavior as before for unresolvable imports |
| Larger batches (C2) reduce summary quality | Low | Summary quality is driven by the model's analysis of individual files. Batch size mainly affects how many files share one subagent's context window, not per-file quality. 20-30 files remains well within context limits |
| Stripping function/class nodes from tour (C4) breaks existing tour steps | None | Tour steps reference `file:` node IDs. No existing tour data references function/class nodes at the step level |
| Removing reviewer by default (C5) misses graph errors | Low | The inline deterministic script catches all critical structural issues (dangling refs, missing layers, duplicate IDs). The LLM reviewer's additional value is quality warnings (orphan nodes, generic summaries), which are non-blocking |
| Import map generation slows down Phase 1 | Low | The scanner script already reads all files for line counting. Import parsing adds one regex pass per file, which is negligible overhead |
|
||||
|
||||
---
|
||||
|
||||
## Phased rollout recommendation
|
||||
|
||||
Given the risk profile, implement in this order:
|
||||
|
||||
1. **C5 first** — gate the reviewer, lowest risk, immediate 58K token savings per run
|
||||
2. **C4** — slim Phase 5 payload, no scanner changes, no quality risk
|
||||
3. **C3** — remove addendums from batches, add inline hints
|
||||
4. **C1+C2 together** — scanner changes and batch consolidation, test thoroughly on small/medium/large projects before releasing
|
||||
@@ -0,0 +1,266 @@
|
||||
# Understand Anything: Universal File Type Support
|
||||
|
||||
**Date**: 2026-03-28
|
||||
**Status**: Approved
|
||||
**Approach**: Big Bang — all file types in one release
|
||||
|
||||
## Goals
|
||||
|
||||
1. Extend Understand Anything to analyze **any** file type, not just code
|
||||
2. Support both holistic project enrichment (non-code files enrich code graphs) and standalone analysis (docs-only repos, SQL schema collections, IaC projects)
|
||||
3. Maintain backward compatibility with existing code-only analysis
|
||||
|
||||
## Supported File Types (26 new)
|
||||
|
||||
### Documentation (3)
|
||||
|
||||
| Type | Extensions | Parser | Node Types |
|------|-----------|--------|------------|
| Markdown | `.md`, `.mdx` | LLM + regex heading extraction | `document` |
| reStructuredText | `.rst` | LLM | `document` |
| Plain text | `.txt` | LLM | `document` |
|
||||
|
||||
### Configuration (5)
|
||||
|
||||
| Type | Extensions | Parser | Node Types |
|------|-----------|--------|------------|
| YAML | `.yaml`, `.yml` | `yaml` npm package | `config` |
| JSON | `.json`, `.jsonc` | `JSON.parse` / `jsonc-parser` | `config`, `schema` |
| TOML | `.toml` | `@iarna/toml` or similar | `config` |
| .env | `.env`, `.env.*` | Regex line parser | `config` |
| XML | `.xml` | LLM (optionally `fast-xml-parser`) | `config` |
|
||||
|
||||
### Infrastructure & DevOps (7)
|
||||
|
||||
| Type | Extensions | Parser | Node Types |
|------|-----------|--------|------------|
| Dockerfile | `Dockerfile`, `Dockerfile.*`, `.dockerfile` | Custom instruction parser | `service`, `pipeline` |
| Docker Compose | `docker-compose.yml`, `compose.yml` | YAML parser + service extraction | `service` |
| Terraform | `.tf`, `.tfvars` | Regex block parser | `resource` |
| Kubernetes | K8s YAML (detected by `apiVersion` field) | YAML + kind detection | `service`, `resource` |
| GitHub Actions | `.github/workflows/*.yml` | YAML + job/step extraction | `pipeline` |
| Jenkinsfile | `Jenkinsfile` | LLM (Groovy DSL) | `pipeline` |
| Makefile | `Makefile`, `*.mk` | Regex target parser | `pipeline` |
|
||||
|
||||
### Data & Schema (6)
|
||||
|
||||
| Type | Extensions | Parser | Node Types |
|------|-----------|--------|------------|
| SQL | `.sql` | Simple DDL parser | `table`, `endpoint` |
| GraphQL | `.graphql`, `.gql` | Regex type/query parser | `schema`, `endpoint` |
| OpenAPI/Swagger | `openapi.yaml`, `swagger.json` | YAML/JSON + path extraction | `endpoint`, `schema` |
| Protocol Buffers | `.proto` | Regex message/service parser | `schema` |
| JSON Schema | `*.schema.json` | JSON + `$ref`/`$defs` extraction | `schema` |
| CSV/TSV | `.csv`, `.tsv` | Header row extraction | `table` |
|
||||
|
||||
### Shell & Scripts (3)
|
||||
|
||||
| Type | Extensions | Parser | Node Types |
|------|-----------|--------|------------|
| Shell | `.sh`, `.bash`, `.zsh` | Regex function parser | `file`, `function` |
| PowerShell | `.ps1`, `.psm1` | LLM | `file`, `function` |
| Batch | `.bat`, `.cmd` | LLM | `file` |
|
||||
|
||||
### Markup (2)
|
||||
|
||||
| Type | Extensions | Parser | Node Types |
|------|-----------|--------|------------|
| HTML | `.html`, `.htm` | LLM (tag structure) | `document` |
| CSS/SCSS/Less | `.css`, `.scss`, `.less` | LLM | `file` |
|
||||
|
||||
## Schema Extensions
|
||||
|
||||
### New Node Types (8)
|
||||
|
||||
Added to the existing `file | function | class | module | concept`:
|
||||
|
||||
| Node Type | Purpose | Example |
|-----------|---------|---------|
| `config` | Configuration files and key settings | `package.json`, `tsconfig.json`, env vars |
| `document` | Documentation, prose, guides | `README.md`, API docs |
| `service` | Deployable services/containers | Docker containers, K8s Deployments |
| `table` | Data tables, database objects | SQL tables, CSV datasets |
| `endpoint` | API routes, queries, mutations | REST paths, GraphQL queries |
| `pipeline` | CI/CD workflows, build steps | GitHub Actions jobs, Makefile targets |
| `schema` | Type definitions for data interchange | Protobuf messages, JSON Schema |
| `resource` | Infrastructure resources | Terraform resources, K8s ConfigMaps |
|
||||
|
||||
### New Edge Types (8)
|
||||
|
||||
Added to the existing 18 edge types:
|
||||
|
||||
| Edge Type | Category | Meaning | Example |
|-----------|----------|---------|---------|
| `deploys` | Infrastructure | Service deploys code | Dockerfile -> app source |
| `serves` | Infrastructure | Service exposes endpoint | K8s Service -> API endpoint |
| `migrates` | Data flow | Migration modifies table | SQL migration -> table |
| `documents` | Semantic | Doc describes code | README -> module |
| `provisions` | Infrastructure | IaC creates resource | Terraform -> AWS resource |
| `routes` | Behavioral | Routes traffic to service | nginx config -> service |
| `defines_schema` | Data flow | Defines data shape | Protobuf -> endpoint |
| `triggers` | Behavioral | Triggers pipeline/action | Git push -> GitHub Actions |
|
||||
|
||||
### Schema Validation Auto-Fix Aliases
|
||||
|
||||
New node type aliases:
|
||||
- `container` -> `service`, `migration` -> `table`, `workflow` -> `pipeline`
|
||||
- `route` -> `endpoint`, `doc` -> `document`, `setting` -> `config`, `infra` -> `resource`
|
||||
|
||||
New edge type aliases:
|
||||
- `describes` -> `documents`, `creates` -> `provisions`, `exposes` -> `serves`
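One way the auto-fix step could apply these aliases is a plain lookup that falls back to the original value. The alias pairs below are the ones listed above; the `normalizeType` helper and the use of `Map` are illustrative choices, not the project's actual validation code.

```typescript
const NODE_TYPE_ALIASES = new Map<string, string>([
  ["container", "service"],
  ["migration", "table"],
  ["workflow", "pipeline"],
  ["route", "endpoint"],
  ["doc", "document"],
  ["setting", "config"],
  ["infra", "resource"],
]);

const EDGE_TYPE_ALIASES = new Map<string, string>([
  ["describes", "documents"],
  ["creates", "provisions"],
  ["exposes", "serves"],
]);

/** Map an alias to its canonical type; values that are not aliases pass through unchanged. */
function normalizeType(raw: string, aliases: Map<string, string>): string {
  return aliases.get(raw) ?? raw;
}

console.log(normalizeType("container", NODE_TYPE_ALIASES)); // "service"
console.log(normalizeType("service", NODE_TYPE_ALIASES));   // "service"
console.log(normalizeType("exposes", EDGE_TYPE_ALIASES));   // "serves"
```

A `Map` is used instead of an object literal so that inherited keys such as `constructor` can never be mistaken for aliases.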
|
||||
|
||||
## Plugin Architecture Changes
|
||||
|
||||
### Generalized AnalyzerPlugin Interface
|
||||
|
||||
```typescript
interface AnalyzerPlugin {
  name: string;
  languages: string[];
  analyzeFile(filePath: string, content: string): StructuralAnalysis;
  resolveImports?(filePath: string, content: string): ImportResolution[]; // Now optional
  extractCallGraph?(filePath: string, content: string): CallGraphEntry[];
  extractReferences?(filePath: string, content: string): ReferenceResolution[]; // NEW
}

interface ReferenceResolution {
  source: string; // File making the reference
  target: string; // Referenced file or identifier
  type: string; // Reference type: "file", "image", "schema", "service"
  line?: number;
}
```
|
||||
|
||||
### Extended StructuralAnalysis
|
||||
|
||||
```typescript
interface StructuralAnalysis {
  // Existing (unchanged)
  functions: FunctionInfo[];
  classes: ClassInfo[];
  imports: ImportInfo[];
  exports: ExportInfo[];
  // New (all optional for backward compat)
  sections?: SectionInfo[]; // Documents: headings, chapters
  definitions?: DefinitionInfo[]; // Schemas: types, messages, tables
  services?: ServiceInfo[]; // Infra: containers, deployments
  endpoints?: EndpointInfo[]; // APIs: routes, queries
  steps?: StepInfo[]; // Pipelines: jobs, stages, targets
  resources?: ResourceInfo[]; // IaC: terraform resources, K8s objects
}
```
|
||||
|
||||
### Custom Parsers (12)
|
||||
|
||||
All lightweight — mostly regex-based, minimal dependencies:
|
||||
|
||||
| Parser | Implementation | Extracts |
|--------|---------------|----------|
| `MarkdownParser` | Regex | Headings, links, code blocks, front matter |
| `YAMLParser` | `yaml` npm | Key hierarchy, anchors, multi-doc |
| `JSONParser` | Built-in `JSON.parse` | Key structure, `$ref`/`$defs` |
| `TOMLParser` | `@iarna/toml` | Section structure |
| `EnvParser` | Regex | Variable names and references |
| `DockerfileParser` | Regex | FROM stages, EXPOSE ports, COPY sources |
| `SQLParser` | Regex | CREATE TABLE/VIEW/INDEX, columns, foreign keys |
| `GraphQLParser` | Regex | Types, queries, mutations, subscriptions |
| `ProtobufParser` | Regex | Messages, services, enums, RPCs |
| `TerraformParser` | Regex | Resources, modules, variables, outputs |
| `MakefileParser` | Regex | Targets, dependencies, variables |
| `ShellParser` | Regex | Functions, sourced files |
|
||||
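To give a sense of how lightweight a regex-based parser can be, here is a sketch in the spirit of `DockerfileParser`, extracting FROM stages, EXPOSE ports, and COPY sources. The `DockerfileSummary` type and `parseDockerfile` function are hypothetical and do not reflect the plugin's real `StructuralAnalysis` output; line continuations and the JSON-array form of `COPY` are deliberately not handled.

```typescript
interface DockerStage {
  image: string;
  alias?: string;
}

interface DockerfileSummary {
  stages: DockerStage[];
  exposedPorts: number[];
  copySources: string[];
}

function parseDockerfile(content: string): DockerfileSummary {
  const summary: DockerfileSummary = { stages: [], exposedPorts: [], copySources: [] };

  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments

    // FROM [--platform=...] image [AS alias]
    const from = /^FROM\s+(?:--\S+\s+)*(\S+)(?:\s+AS\s+(\S+))?/i.exec(line);
    if (from) {
      const stage: DockerStage = { image: from[1] };
      if (from[2]) stage.alias = from[2];
      summary.stages.push(stage);
      continue;
    }

    // EXPOSE 80 443/tcp ... (parseInt stops at the "/protocol" suffix)
    const expose = /^EXPOSE\s+(.+)$/i.exec(line);
    if (expose) {
      for (const token of expose[1].split(/\s+/)) {
        const port = parseInt(token, 10);
        if (!isNaN(port)) summary.exposedPorts.push(port);
      }
      continue;
    }

    // COPY [flags] src... dest
    const copy = /^COPY\s+(.+)$/i.exec(line);
    if (copy) {
      const tokens = copy[1].split(/\s+/);
      // Copies from another build stage do not reference project sources.
      if (tokens.some((token) => token.startsWith("--from="))) continue;
      const paths = tokens.filter((token) => !token.startsWith("--"));
      summary.copySources.push(...paths.slice(0, -1)); // last path is the destination
    }
  }
  return summary;
}
```

The `copySources` list is the kind of data the cross-type reference step could later match against source directories.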
|
||||
## Agent Pipeline Changes
|
||||
|
||||
### Project Scanner
|
||||
|
||||
1. Scan ALL file types (remove code-only filter)
|
||||
2. Tag each file with category: `code`, `config`, `docs`, `infra`, `data`, `script`, `markup`
|
||||
3. Smart batch grouping: keep related files together (e.g., Dockerfile + docker-compose.yml)
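The category tagging in step 2 could be approximated with name and extension rules like the following. The extension lists come from the "Supported File Types" tables; which table maps to which category name is an assumption of this sketch, as is the `"code"` fallback. Kubernetes manifests are not distinguished from plain YAML here because detecting them requires reading the file's `apiVersion` field.

```typescript
type FileCategory = "code" | "config" | "docs" | "infra" | "data" | "script" | "markup";

const EXTENSION_CATEGORIES: Array<[FileCategory, string[]]> = [
  ["docs", [".md", ".mdx", ".rst", ".txt"]],
  ["config", [".yaml", ".yml", ".json", ".jsonc", ".toml", ".xml"]],
  ["infra", [".dockerfile", ".tf", ".tfvars", ".mk"]],
  ["data", [".sql", ".graphql", ".gql", ".proto", ".csv", ".tsv"]],
  ["script", [".sh", ".bash", ".zsh", ".ps1", ".psm1", ".bat", ".cmd"]],
  ["markup", [".html", ".htm", ".css", ".scss", ".less"]],
];

const INFRA_NAMES = ["Dockerfile", "docker-compose.yml", "compose.yml", "Jenkinsfile", "Makefile"];
const DATA_NAMES = ["openapi.yaml", "swagger.json"];

function categorize(path: string): FileCategory {
  const name = path.split("/").pop() ?? path;

  // Path- and name-based rules run first because they are more specific than extensions.
  if (/(^|\/)\.github\/workflows\/[^/]+\.yml$/.test(path)) return "infra";
  if (INFRA_NAMES.indexOf(name) !== -1 || name.startsWith("Dockerfile.")) return "infra";
  if (DATA_NAMES.indexOf(name) !== -1 || name.endsWith(".schema.json")) return "data";
  if (name === ".env" || name.startsWith(".env.")) return "config";

  const dot = name.lastIndexOf(".");
  const extension = dot > 0 ? name.slice(dot).toLowerCase() : "";
  for (const [category, extensions] of EXTENSION_CATEGORIES) {
    if (extensions.indexOf(extension) !== -1) return category;
  }
  return "code"; // assumed fallback; a real scanner would check code extensions explicitly
}
```

Ordering matters: `user.schema.json` must be tagged as data before the generic `.json` rule would tag it as config.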
|
||||
|
||||
### File Analyzer
|
||||
|
||||
Type-aware prompt templates by category:
|
||||
|
||||
- **Code**: Current behavior (functions, classes, imports, call graph)
|
||||
- **Config**: Extract key settings, what they configure, which code files they affect
|
||||
- **Documentation**: Extract sections, key concepts, which code components are documented
|
||||
- **Infrastructure**: Extract services, ports, volumes, dependencies, which code they deploy
|
||||
- **Data/Schema**: Extract tables, columns, types, relationships, which code consumes this data
|
||||
- **Pipelines**: Extract jobs, steps, triggers, which code/infra they build/deploy
|
||||
|
||||
### Cross-Type Reference Resolution
|
||||
|
||||
Post-analysis step connecting:
|
||||
- Dockerfile `COPY` -> source code directories
|
||||
- CI config `run: npm test` -> test files
|
||||
- K8s manifest `image:` -> Dockerfile
|
||||
- SQL foreign keys -> other tables
|
||||
- OpenAPI `$ref` -> schema definitions
|
||||
- Markdown links -> referenced files
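As one concrete case, the "Markdown links -> referenced files" connection could start from an `extractReferences`-style function like the sketch below. It reuses the `ReferenceResolution` shape defined earlier in this document; the function name and the regex are illustrative, and targets are returned as written in the link (resolving them against the file list would be a separate step).

```typescript
interface ReferenceResolution {
  source: string; // File making the reference
  target: string; // Referenced file or identifier
  type: string;   // Reference type: "file", "image", "schema", "service"
  line?: number;
}

function extractMarkdownReferences(filePath: string, content: string): ReferenceResolution[] {
  const references: ReferenceResolution[] = [];
  // Matches [text](target) and ![alt](target "optional title").
  const linkPattern = /(!?)\[[^\]]*\]\(([^)\s]+)[^)]*\)/g;

  content.split(/\r?\n/).forEach((text, index) => {
    linkPattern.lastIndex = 0;
    let match: RegExpExecArray | null;
    while ((match = linkPattern.exec(text)) !== null) {
      const target = match[2];
      // Skip external URLs (anything with a scheme) and in-page anchors.
      if (/^[a-z][a-z0-9+.-]*:/i.test(target) || target.startsWith("#")) continue;
      references.push({
        source: filePath,
        target: target.split("#")[0], // drop any fragment
        type: match[1] === "!" ? "image" : "file",
        line: index + 1,
      });
    }
  });
  return references;
}
```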
|
||||
|
||||
### Architecture Analyzer
|
||||
|
||||
New pattern detection:
|
||||
- Deployment topology: Dockerfile -> compose -> K8s chain
|
||||
- Data flow: Schema -> migration -> API endpoint -> client code
|
||||
- Documentation coverage: which modules have docs vs. not
|
||||
- Configuration dependency: which config files affect which code paths
|
||||
|
||||
### Tour Builder
|
||||
|
||||
Include non-code tour stops:
|
||||
- Project README overview
|
||||
- Dockerfile containerization
|
||||
- SQL migration database schema
|
||||
- CI/CD pipeline explanation
|
||||
|
||||
## Dashboard Visualization
|
||||
|
||||
### New Node Visual Styles
|
||||
|
||||
| Node Type | Shape | Color | Icon |
|
||||
|-----------|-------|-------|------|
|
||||
| `config` | Rounded rect | Teal (#5eead4) | Gear |
|
||||
| `document` | Rounded rect | Sky blue (#7dd3fc) | Document |
|
||||
| `service` | Hexagon | Violet (#a78bfa) | Container/Box |
|
||||
| `table` | Rectangle | Emerald (#6ee7b7) | Grid |
|
||||
| `endpoint` | Pill/Stadium | Orange (#fdba74) | Arrow-right |
|
||||
| `pipeline` | Rounded rect | Rose (#fda4af) | Play/Workflow |
|
||||
| `schema` | Diamond | Amber (#fcd34d) | Blueprint |
|
||||
| `resource` | Cloud shape | Indigo (#a5b4fc) | Cloud |
|
||||
|
||||
### Graph Layout
|
||||
|
||||
1. Layer grouping by category — non-code nodes cluster separately from code nodes
|
||||
2. Legend update with 8 new node types
|
||||
3. Filter controls — checkboxes to show/hide each file category
|
||||
|
||||
### Sidebar Enhancements
|
||||
|
||||
NodeInfo panel updates per node type:
|
||||
- **Config**: key-value pairs, referencing code files
|
||||
- **Document**: heading outline, linked code components
|
||||
- **Service**: ports, volumes, dependencies, deployed code
|
||||
- **Table**: columns, types, foreign key relationships
|
||||
- **Endpoint**: HTTP method, path, request/response schema
|
||||
- **Pipeline**: jobs, triggers, deployed targets
|
||||
- **Schema**: fields, nested types, consumers
|
||||
- **Resource**: provider, type, dependencies
|
||||
|
||||
ProjectOverview panel: add "File Types" breakdown (code vs. non-code distribution).
|
||||
|
||||
## New Dependencies
|
||||
|
||||
- `yaml` — YAML parsing (already common, ~50KB)
|
||||
- `@iarna/toml` — TOML parsing (~30KB)
|
||||
- `jsonc-parser` — JSON with comments (~20KB)
|
||||
|
||||
No tree-sitter WASM additions. All other parsers are regex-based with zero dependencies.
|
||||
|
||||
## Backward Compatibility
|
||||
|
||||
- All new `StructuralAnalysis` fields are optional
|
||||
- `resolveImports` becomes optional on `AnalyzerPlugin`
|
||||
- Existing `LanguageConfig` entries unchanged
|
||||
- Schema validation auto-fixes new type aliases
|
||||
- Existing knowledge graphs remain valid (new types are additive)
|
||||
@@ -0,0 +1,53 @@
|
||||
# Homepage Update Design — 2026-03-29
|
||||
|
||||
## Goal
|
||||
|
||||
Update the Astro homepage (`homepage/`) to reflect features added across v1.2.0, v1.3.0, and v2.0.0 releases. The README and homepage structure/layout stay unchanged.
|
||||
|
||||
## Scope
|
||||
|
||||
Three areas to update:
|
||||
|
||||
### 1. Features Section — Expand from 3 to 6 Cards
|
||||
|
||||
Current (3 cards):
|
||||
- Interactive Knowledge Graph
|
||||
- Plain-English Summaries
|
||||
- Guided Tours
|
||||
|
||||
Updated (6 cards, 2 rows of 3):
|
||||
|
||||
| # | Title | Icon | Description |
|
||||
|---|-------|------|-------------|
|
||||
| 1 | Interactive Knowledge Graph | `◈` | Visualize files, functions, and dependencies as an explorable graph with hierarchical drill-down and smart layout. |
|
||||
| 2 | Beyond Code Analysis | `⬡` | Analyze your entire project — Dockerfiles, Terraform, SQL, Markdown, and 26+ file types mapped into one unified graph. |
|
||||
| 3 | Smart Filtering & Search | `⊘` | Filter by node type, complexity, layer, or edge category. Fuzzy and semantic search to find anything instantly. |
|
||||
| 4 | Export & Share | `⎙` | Export your knowledge graph as high-quality PNG, SVG, or filtered JSON — ready for docs, presentations, or further analysis. |
|
||||
| 5 | Dependency Path Finder | `⟿` | Find the shortest path between any two components. Understand how parts of your system connect at a glance. |
|
||||
| 6 | Guided Tours & Onboarding | `⟐` | AI-generated walkthroughs that teach the codebase step by step, plus onboarding guides for new team members. |
|
||||
|
||||
### 2. Install Section
|
||||
|
||||
Update the note from Claude Code-only to multi-platform:
|
||||
- Before: "Works with Claude Code — Anthropic's official CLI for Claude."
|
||||
- After: "Works with Claude Code, Codex, OpenCode, Gemini CLI, and more."
|
||||
|
||||
### 3. Footer
|
||||
|
||||
Update tagline:
|
||||
- Before: "Built as a Claude Code plugin"
|
||||
- After: "Built for AI coding assistants"
|
||||
|
||||
## Files to Modify
|
||||
|
||||
- `homepage/src/components/Features.astro` — replace 3 cards with 6
|
||||
- `homepage/src/components/Install.astro` — update platform note
|
||||
- `homepage/src/components/Footer.astro` — update tagline
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- README.md updates
|
||||
- Showcase section / screenshot
|
||||
- Nav component
|
||||
- Hero section
|
||||
- Layout / global CSS structure changes
|
||||
@@ -0,0 +1,335 @@
|
||||
# Business Domain Knowledge Extraction — Design Spec
|
||||
|
||||
**Issue:** [#61](https://github.com/Lum1104/Understand-Anything/issues/61)
|
||||
**Date:** 2026-04-01
|
||||
|
||||
## Problem
|
||||
|
||||
The current knowledge graph shows file-level dependency relationships, but this has limited value: an IDE already shows imports. In a project with many files, listing dependency edges doesn't reduce cognitive load; you still have to mentally reconstruct what the code *does*. What's needed is business domain knowledge: the logic and domain concepts embedded in the code, not its structural wiring.
|
||||
|
||||
## Solution Overview
|
||||
|
||||
A new `/understand-domain` skill that extracts business domain knowledge and renders it as a horizontal flow graph in the dashboard. Two viewing modes: a high-level **Domain view** (default when available) and the existing **Structural view**, with a toggle to switch between them.
|
||||
|
||||
## Architecture: Separate File, Shared Schema (Approach C)
|
||||
|
||||
Domain data lives in a **separate file** (`domain-graph.json`) using the **same `KnowledgeGraph` type system** — extended with new node/edge types. The dashboard detects both files and offers a view toggle. Domain nodes can reference structural nodes by ID for drill-down.
|
||||
|
||||
**Why separate files:**
|
||||
- `/understand-domain` works standalone (lightweight) or alongside full graph
|
||||
- Shared schema means search, validation, and filtering work for both
|
||||
- No risk of polluting the structural graph
|
||||
- Each file is independently valid
|
||||
|
||||
## Section 1: Domain Graph Schema
|
||||
|
||||
### Three-Level Hierarchy
|
||||
|
||||
1. **Business Domain** (top) — e.g., "Purchasing", "Logistics", "Warehouse Management"
|
||||
2. **Business Flow** (mid) — e.g., "Create Order", "Process Refund"
|
||||
3. **Business Step** (leaf) — e.g., "Validate input", "Check inventory", "Persist order"
|
||||
|
||||
### New Node Types (3)
|
||||
|
||||
| Type | Purpose | Example |
|
||||
|------|---------|---------|
|
||||
| `domain` | Business domain cluster | "Order Management", "Logistics" |
|
||||
| `flow` | A business process within a domain | "Create Order", "Process Refund" |
|
||||
| `step` | A single step in a flow | "Validate order input" |
|
||||
|
||||
### New Edge Types (4)
|
||||
|
||||
| Type | Purpose |
|
||||
|------|---------|
|
||||
| `contains_flow` | domain → flow |
|
||||
| `flow_step` | flow → step (ordered via `weight` field, e.g., 0.1, 0.2, ...) |
|
||||
| `cross_domain` | domain → domain (interaction between domains) |
|
||||
| `implements` | step → file/function node ID (reference into structural graph) |
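
Since step order is carried by the `weight` on `flow_step` edges, reading a flow in order amounts to a filter and a sort. A minimal sketch, where `FlowStepEdge` and `orderedStepIds` are hypothetical names and edges without a weight are assumed to sort last:

```typescript
// Minimal edge shape for this sketch; the real GraphEdge type has more fields.
interface FlowStepEdge {
  source: string;
  target: string;
  type: string;
  weight?: number;
}

// Returns the step IDs of one flow in execution order.
function orderedStepIds(flowId: string, edges: FlowStepEdge[]): string[] {
  return edges
    .filter((e) => e.type === "flow_step" && e.source === flowId)
    .sort((a, b) => (a.weight ?? Number.MAX_VALUE) - (b.weight ?? Number.MAX_VALUE))
    .map((e) => e.target);
}
```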
|
||||
|
||||
### Domain Node Structure
|
||||
|
||||
```typescript
|
||||
// domain node
|
||||
{
|
||||
id: "domain:order-management",
|
||||
type: "domain",
|
||||
name: "Order Management",
|
||||
summary: "Handles the complete order lifecycle...",
|
||||
tags: ["e-commerce", "core-business"],
|
||||
complexity: "complex",
|
||||
domainMeta: { // optional field
|
||||
entities: ["Order", "LineItem", "OrderStatus"],
|
||||
businessRules: ["Orders require inventory check before confirmation"],
|
||||
crossDomainInteractions: ["Triggers Logistics on order confirmed", "Reads from Customer Service for buyer info"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Flow Node Structure
|
||||
|
||||
```typescript
|
||||
{
|
||||
id: "flow:create-order",
|
||||
type: "flow",
|
||||
name: "Create Order",
|
||||
summary: "Customer submits a new order through the API",
|
||||
tags: ["write-path", "api"],
|
||||
complexity: "moderate",
|
||||
domainMeta: { // optional field
|
||||
entryPoint: "POST /api/orders",
|
||||
entryType: "http" // one of "http" | "cli" | "event" | "cron" | "manual"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Step Node Structure
|
||||
|
||||
```typescript
|
||||
{
|
||||
id: "step:create-order:validate-input",
|
||||
type: "step",
|
||||
name: "Validate order input",
|
||||
summary: "Checks request body against order schema, rejects invalid payloads",
|
||||
tags: ["validation"],
|
||||
complexity: "simple",
|
||||
filePath: "src/validators/order-validator.ts",
|
||||
lineRange: [12, 45]
|
||||
}
|
||||
```
|
||||
|
||||
### File Output
|
||||
|
||||
Saved to `.understand-anything/domain-graph.json` — same `KnowledgeGraph` shape, valid on its own.
|
||||
|
||||
## Section 2: Analysis Pipeline
|
||||
|
||||
### Two Paths, Same Output
|
||||
|
||||
**Path 1: Lightweight scan (no existing graph)**
|
||||
|
||||
```
|
||||
File tree scan
|
||||
→ Static entry point detection (preprocessing script: ast for Python, regex for other languages)
|
||||
→ Route definitions, exported handlers, main(), event listeners, cron decorators
|
||||
→ Feed to LLM: file tree + detected entry points + sampled file contents
|
||||
→ LLM outputs: domains, flows, steps, cross-domain interactions
|
||||
→ Build domain-graph.json
|
||||
```
|
||||
|
||||
Token cost: ~10-20% of a full `/understand` scan.
|
||||
|
||||
**Path 2: Derive from existing graph**
|
||||
|
||||
```
|
||||
Load knowledge-graph.json
|
||||
→ Extract: all nodes, edges, layers, summaries, tour
|
||||
→ Feed to LLM: graph data as structured context
|
||||
→ LLM outputs: domains, flows, steps, cross-domain interactions
|
||||
→ Build domain-graph.json
|
||||
```
|
||||
|
||||
This path is very cheap: no file reading is needed, because the LLM reasons over the existing summaries and call edges.
|
||||
|
||||
**Path Selection:** `/understand-domain` checks if `.understand-anything/knowledge-graph.json` exists. If yes → Path 2. If no → Path 1.
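
The selection rule, combined with the `--full` flag described under the skill definition, reduces to a small pure function. A sketch in which `selectAnalysisPath` and the path labels are hypothetical names:

```typescript
type AnalysisPath = "lightweight-scan" | "derive-from-graph";

// graphExists: whether .understand-anything/knowledge-graph.json is present.
// forceFull: whether the user passed --full.
function selectAnalysisPath(graphExists: boolean, forceFull: boolean): AnalysisPath {
  if (forceFull || !graphExists) return "lightweight-scan"; // Path 1
  return "derive-from-graph"; // Path 2
}
```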
|
||||
|
||||
### Agent Structure
|
||||
|
||||
One new agent: **`domain-analyzer`** (opus model). It handles both paths and, for large codebases, can batch work by detected entry point group.
|
||||
|
||||
## Section 3: Preprocessing Script & Skill Integration
|
||||
|
||||
### Script: `understand-anything-plugin/skills/understand-domain/extract-domain-context.py`
|
||||
|
||||
The script is bundled with the skill (not placed in `scripts/`, which is for development tooling). It runs before the LLM agent and writes `.understand-anything/intermediate/domain-context.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"fileTree": ["src/api/orders.ts", "src/services/...", "..."],
|
||||
"entryPoints": [
|
||||
{
|
||||
"file": "src/api/orders.ts",
|
||||
"type": "http",
|
||||
"method": "POST",
|
||||
"path": "/api/orders",
|
||||
"handler": "createOrder",
|
||||
"lineRange": [15, 45],
|
||||
"snippet": "async function createOrder(req, res) { ... }"
|
||||
}
|
||||
],
|
||||
"fileSignatures": {
|
||||
"src/services/order-service.ts": {
|
||||
"exports": ["createOrder", "cancelOrder", "getOrderById"],
|
||||
"imports": ["inventory-service", "pricing-service", "order-repo"],
|
||||
"summary": null
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The script is written in Python with no heavy dependencies (`ast` for Python sources, regex for other languages). It will:
|
||||
- Walk the file tree (respecting `.gitignore`)
|
||||
- Detect entry points by pattern: route decorators, `app.get/post`, `export default handler`, `main()`, event listeners
|
||||
- Extract function signatures and import/export lists per file
|
||||
- Keep code snippets short (signature + first few lines, not full bodies)
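
To make the regex-based detection concrete, here is a sketch of matching Express-style `app.get/post` routes. It is written in TypeScript only to match the other snippets in this doc (the script itself is Python); `ROUTE_PATTERN` and `detectHttpEntryPoints` are hypothetical, and the output is a simplified subset of the `entryPoints` shape shown above:

```typescript
interface DetectedEntryPoint {
  file: string;
  type: "http";
  method: string;
  path: string;
  line: number; // 1-based line of the route registration
}

// Matches e.g. app.post("/api/orders", ...) or router.get('/health', ...).
const ROUTE_PATTERN = /\b(?:app|router)\.(get|post|put|patch|delete)\(\s*["'`]([^"'`]+)["'`]/;

function detectHttpEntryPoints(file: string, source: string): DetectedEntryPoint[] {
  const found: DetectedEntryPoint[] = [];
  source.split("\n").forEach((text, index) => {
    const match = ROUTE_PATTERN.exec(text);
    if (match) {
      found.push({
        file,
        type: "http",
        method: (match[1] ?? "").toUpperCase(),
        path: match[2] ?? "",
        line: index + 1,
      });
    }
  });
  return found;
}
```

A line-by-line regex like this misses routes split across lines, which fits the best-effort nature of entry point detection.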
|
||||
|
||||
### Skill Integration
|
||||
|
||||
The `/understand-domain` skill markdown:
|
||||
|
||||
1. Runs `understand-anything-plugin/skills/understand-domain/extract-domain-context.py`
|
||||
2. Checks for existing `knowledge-graph.json`
|
||||
3. If exists → passes both `domain-context.json` + graph data to domain-analyzer agent
|
||||
4. If not → passes only `domain-context.json`
|
||||
5. Agent outputs `domain-graph.json`
|
||||
6. Cleans up intermediate files
|
||||
7. Auto-triggers `/understand-dashboard`
|
||||
|
||||
## Section 4: Dashboard — Domain View
|
||||
|
||||
### View Toggle
|
||||
|
||||
- Top-left corner: pill toggle — **"Domain" / "Structural"**
|
||||
- Domain view is default when `domain-graph.json` exists
|
||||
- If only one graph file exists, no toggle shown
|
||||
- Switching views preserves sidebar state
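
These rules can be captured in one pure function that the dashboard evaluates after loading. A sketch, where `ViewState` and `resolveViewState` are hypothetical names and the "no graph at all" case is an assumption:

```typescript
type ViewMode = "domain" | "structural";

interface ViewState {
  defaultView: ViewMode | null; // null when no graph file was found
  showToggle: boolean;
}

function resolveViewState(hasDomainGraph: boolean, hasStructuralGraph: boolean): ViewState {
  if (hasDomainGraph && hasStructuralGraph) return { defaultView: "domain", showToggle: true };
  if (hasDomainGraph) return { defaultView: "domain", showToggle: false };
  if (hasStructuralGraph) return { defaultView: "structural", showToggle: false };
  return { defaultView: null, showToggle: false };
}
```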
|
||||
|
||||
### Horizontal Flow Layout
|
||||
|
||||
- **Layout engine:** Dagre with `rankdir: "LR"` (left-to-right)
|
||||
- **Zoom levels:**
|
||||
- **Zoomed out:** Domain clusters as large rounded rectangles, `cross_domain` edges between them
|
||||
- **Click domain:** Expands to show flows as horizontal lanes
|
||||
- **Click flow:** Shows step-by-step trace left-to-right
|
||||
|
||||
### Domain Cluster Rendering
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────┐
|
||||
│ Order Management │
|
||||
│ "Handles the complete order..." │
|
||||
│ │
|
||||
│ Entities: Order, LineItem, Status │
|
||||
│ Flows: Create Order, Cancel Order │
|
||||
│ Rules: "Requires inventory check" │
|
||||
└─────────────────────────────────────┘
|
||||
──cross_domain──→ [Logistics]
|
||||
```
|
||||
|
||||
- Gold/amber border for domain clusters (matches existing theme)
|
||||
- Shows summary, entity list, flow count on the cluster face
|
||||
- Cross-domain edges: thick dashed lines with labels
|
||||
|
||||
### Flow Trace Rendering
|
||||
|
||||
```
|
||||
POST /api/orders
|
||||
┌──────────┐ ┌──────────────┐ ┌───────────┐ ┌──────────┐ ┌────────────┐
|
||||
│ Validate │───→│ Check │───→│ Calculate │───→│ Persist │───→│ Send │
|
||||
│ Input │ │ Inventory │ │ Pricing │ │ Order │ │ Confirm │
|
||||
└──────────┘ └──────────────┘ └───────────┘ └──────────┘ └────────────┘
|
||||
```
|
||||
|
||||
- Steps connected left-to-right by `flow_step` edges (ordered by `weight`)
|
||||
- Entry point label at the left as flow trigger
|
||||
- Clicking a step → sidebar shows detail + link to structural view
|
||||
|
||||
### Sidebar Adaptations
|
||||
|
||||
**Domain node selected:** Summary, business rules, entities, cross-domain interactions, list of flows (clickable)
|
||||
|
||||
**Flow node selected:** Entry point info, step list in order, complexity
|
||||
|
||||
**Step node selected:** Description, "View in code" link (switches to structural view + navigates to file/function), previous/next step links
|
||||
|
||||
### Drill-Down: Domain → Structural
|
||||
|
||||
When a step has an `implements` edge referencing a structural node ID:
|
||||
- "View implementation" button in sidebar
|
||||
- Switches to structural view and navigates to that node
|
||||
- Breadcrumb: `Domain: Order Management > Flow: Create Order > Step: Validate Input → [structural view]`
|
||||
|
||||
## Section 5: Skill Definition
|
||||
|
||||
### `/understand-domain` Skill
|
||||
|
||||
- **File:** `skills/understand-domain.md`
|
||||
- **Arguments:** Optional `--full` flag to force Path 1 (rescan even if graph exists)
|
||||
|
||||
### Execution Flow
|
||||
|
||||
```
|
||||
1. Run understand-anything-plugin/skills/understand-domain/extract-domain-context.py
|
||||
2. Check for .understand-anything/knowledge-graph.json
|
||||
├── Exists → Path 2: load graph + domain-context.json
|
||||
└── Missing → Path 1: domain-context.json only
|
||||
3. Invoke domain-analyzer agent (opus)
|
||||
4. Validate output against schema
|
||||
5. Save .understand-anything/domain-graph.json
|
||||
6. Clean up intermediate/domain-context.json
|
||||
7. Auto-trigger /understand-dashboard
|
||||
```
|
||||
|
||||
### Domain Analyzer Agent
|
||||
|
||||
- **File:** `agents/domain-analyzer.md`
|
||||
- **Model:** opus
|
||||
- **Input:** Either (file tree + entry points) or (existing knowledge graph)
|
||||
- **Output:** Complete domain graph JSON
|
||||
|
||||
### Change Map
|
||||
|
||||
| Area | Changes |
|
||||
|------|---------|
|
||||
| `packages/core/src/types.ts` | Add 3 node types, 4 edge types, `domainMeta` optional field |
|
||||
| `packages/core/src/schema.ts` | Extend Zod schemas + aliases for new types |
|
||||
| `packages/core/src/persistence/` | Add `loadDomainGraph()` / `saveDomainGraph()` |
|
||||
| `understand-anything-plugin/skills/understand-domain/extract-domain-context.py` | New preprocessing script (bundled with skill) |
|
||||
| `agents/domain-analyzer.md` | New agent definition |
|
||||
| `skills/understand-domain.md` | New skill definition |
|
||||
| `packages/dashboard/src/store.ts` | Add `domainGraph`, `viewMode` state |
|
||||
| `packages/dashboard/src/components/` | New: `DomainGraphView.tsx`, `DomainClusterNode.tsx`, `FlowTraceNode.tsx`, `StepNode.tsx` |
|
||||
| `packages/dashboard/src/components/` | Modify: `App.tsx` (view toggle), `NodeInfo.tsx` (domain sidebar), `FilterPanel.tsx` (domain filters) |
|
||||
| `packages/dashboard/src/utils/` | New: `domain-layout.ts` (horizontal Dagre config) |
|
||||
|
||||
## Section 6: Error Tolerance
|
||||
|
||||
### Pipeline-Level Tolerance
|
||||
|
||||
| Stage | Error Handling |
|
||||
|-------|---------------|
|
||||
| Preprocessing script | If parsing (`ast` or regex) fails on a file, skip it and continue. Log skipped files. Entry point detection is best-effort. |
|
||||
| LLM output parsing | Same strategy as existing `parseTourGenerationResponse()` — extract JSON from markdown, handle partial responses. |
|
||||
| Schema validation | Existing auto-fix pipeline: sanitize → normalize (aliases) → apply defaults → validate. Drop broken nodes/edges, don't fail the whole graph. |
|
||||
| Cross-graph references | `implements` edges pointing to non-existent structural node IDs → keep edge but mark as `unresolved`. Dashboard shows step without drill-down link. |
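
The cross-graph rule in the last row could look roughly like this. Representing `unresolved` as a boolean property on the edge is an assumption, as are the names `DomainEdgeLike` and `markUnresolvedImplements`:

```typescript
interface DomainEdgeLike {
  source: string;
  target: string;
  type: string;
  unresolved?: boolean; // assumed representation of the "unresolved" mark
}

// Keeps every edge; flags implements edges whose target is not a known structural node.
function markUnresolvedImplements(
  edges: DomainEdgeLike[],
  structuralNodeIds: Set<string>,
): DomainEdgeLike[] {
  return edges.map((e) =>
    e.type === "implements" && !structuralNodeIds.has(e.target)
      ? { ...e, unresolved: true }
      : e,
  );
}
```

When only `domain-graph.json` exists, the ID set is empty and every `implements` edge is flagged, so steps simply render without a drill-down link.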
|
||||
|
||||
### Domain-Specific Validation Rules
|
||||
|
||||
- **Domain with no flows:** Warn, keep (summary/entities still useful)
|
||||
- **Flow with no steps:** Warn, keep (entry point info still valuable)
|
||||
- **Steps with broken ordering:** Re-number sequentially by array position if `weight` values missing/duplicate
|
||||
- **Orphan steps:** Steps not connected to any flow → attach to synthetic "Uncategorized" flow
|
||||
- **Duplicate domains:** Merge by name similarity (fuzzy match), combine flows
|
||||
- **Empty domain graph:** Error banner in dashboard: "Domain extraction failed — try running `/understand` first for richer context, then `/understand-domain`"
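
As an example of one of these rules, the "broken ordering" repair might be sketched as follows. `renumberIfBroken` is a hypothetical name, and spreading the new weights evenly between 0 and 1 is an assumption; the spec only requires sequential renumbering by array position:

```typescript
interface StepEdgeLike {
  source: string;
  target: string;
  type: string;
  weight?: number;
}

// Takes the flow_step edges of a single flow, in array order.
function renumberIfBroken(stepEdges: StepEdgeLike[]): StepEdgeLike[] {
  const weights = stepEdges.map((e) => e.weight);
  const hasMissing = weights.some((w) => w === undefined);
  const hasDuplicate = new Set(weights).size !== weights.length;
  if (!hasMissing && !hasDuplicate) return stepEdges; // ordering is usable as-is
  const n = stepEdges.length;
  // Re-number by array position, keeping weights strictly between 0 and 1.
  return stepEdges.map((e, i) => ({ ...e, weight: (i + 1) / (n + 1) }));
}
```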
|
||||
|
||||
### Dashboard Resilience
|
||||
|
||||
- If `domainMeta` missing on a domain node, sidebar shows only summary/tags
|
||||
- If `domain-graph.json` fails validation entirely, fall back to structural view with warning banner
|
||||
- Partial graphs render what's valid
|
||||
|
||||
### Normalization Aliases for Domain Types
|
||||
|
||||
```typescript
|
||||
// Node type aliases
|
||||
"business_domain" → "domain"
|
||||
"process" → "flow"
|
||||
"workflow" → "flow"
|
||||
"action" → "step"
|
||||
"task" → "step"
|
||||
|
||||
// Edge type aliases
|
||||
"has_flow" → "contains_flow"
|
||||
"next_step" → "flow_step"
|
||||
"interacts_with" → "cross_domain"
|
||||
"implemented_by" → "implements"
|
||||
```
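
The arrows above are shorthand for lookup tables. A sketch of how the normalize stage might apply them, assuming case-insensitive matching (not stated in this spec); the constant and function names are hypothetical:

```typescript
// Maps are used instead of plain objects so keys like "constructor" cannot hit prototype members.
const DOMAIN_NODE_TYPE_ALIASES = new Map<string, string>([
  ["business_domain", "domain"],
  ["process", "flow"],
  ["workflow", "flow"],
  ["action", "step"],
  ["task", "step"],
]);

const DOMAIN_EDGE_TYPE_ALIASES = new Map<string, string>([
  ["has_flow", "contains_flow"],
  ["next_step", "flow_step"],
  ["interacts_with", "cross_domain"],
  ["implemented_by", "implements"],
]);

// Returns the canonical type, or the cleaned-up input when no alias applies.
function normalizeTypeName(raw: string, aliases: Map<string, string>): string {
  const key = raw.trim().toLowerCase();
  return aliases.get(key) ?? key;
}
```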
|
||||
@@ -0,0 +1,335 @@
|
||||
# /understand-knowledge — Personal Knowledge Base Plugin Design
|
||||
|
||||
## Overview
|
||||
|
||||
A new `/understand-knowledge` skill within the existing Understand Anything plugin that takes any folder of markdown notes and produces an interactive knowledge graph visualized in the existing dashboard.
|
||||
|
||||
Inspired by Andrej Karpathy's LLM Wiki pattern — where an LLM compiles and maintains a structured wiki from raw sources — this plugin goes further by adding typed relationship discovery and interactive graph visualization that tools like Obsidian and Logseq cannot provide.
|
||||
|
||||
### Goals
|
||||
|
||||
- Accept any markdown-based knowledge base (Obsidian vault, Logseq graph, Dendron workspace, Foam, Karpathy-style LLM wiki, Zettelkasten, or plain markdown)
|
||||
- Auto-detect the format and adapt parsing accordingly
|
||||
- Use LLM analysis to discover implicit relationships beyond explicit links
|
||||
- Produce a knowledge graph with typed nodes and edges
|
||||
- Visualize in the existing dashboard with knowledge-specific layout, sidebar, and reading mode
|
||||
|
||||
### Non-Goals
|
||||
|
||||
- Real-time sync with the knowledge base tool (Obsidian, Logseq, etc.)
|
||||
- Replacing the user's existing PKM tool — this is a visualization/analysis layer on top
|
||||
- Supporting non-markdown formats (PDFs, bookmarks) in v1
|
||||
|
||||
---
|
||||
|
||||
## Schema Extensions
|
||||
|
||||
### New Node Types (5)
|
||||
|
||||
Added to the existing `NodeType` union (currently 16 types):
|
||||
|
||||
```typescript
|
||||
export type NodeType =
|
||||
// existing (16)
|
||||
| "file" | "function" | "class" | "module" | "concept"
|
||||
| "config" | "document" | "service" | "table" | "endpoint"
|
||||
| "pipeline" | "schema" | "resource"
|
||||
| "domain" | "flow" | "step"
|
||||
// knowledge (5 new → 21 total)
|
||||
| "article" | "entity" | "topic" | "claim" | "source";
|
||||
```
|
||||
|
||||
| Type | What it represents | Example |
|
||||
|------|-------------------|---------|
|
||||
| `article` | A wiki/note page — the primary content unit | "LLM Knowledge Bases.md" |
|
||||
| `entity` | A named thing: person, tool, paper, org, project | "Andrej Karpathy", "Obsidian" |
|
||||
| `topic` | A thematic cluster grouping related articles | "Personal Knowledge Management" |
|
||||
| `claim` | A specific assertion, insight, or takeaway | "RAG loses context at chunk boundaries" |
|
||||
| `source` | Raw/reference material that articles are compiled from | A paper URL, a raw PDF reference |
|
||||
|
||||
### New Edge Types (6)
|
||||
|
||||
Added to the existing `EdgeType` union (currently 29 types):
|
||||
|
||||
```typescript
|
||||
export type EdgeType =
|
||||
// existing (29)
|
||||
| ...
|
||||
// knowledge (6 new → 35 total)
|
||||
| "cites" | "contradicts" | "builds_on"
|
||||
| "exemplifies" | "categorized_under" | "authored_by";
|
||||
```
|
||||
|
||||
| Type | Direction | Meaning |
|
||||
|------|-----------|---------|
|
||||
| `cites` | article → source | References or draws from |
|
||||
| `contradicts` | claim → claim | Conflicts or disagrees with |
|
||||
| `builds_on` | article → article | Extends, refines, or deepens |
|
||||
| `exemplifies` | entity → concept/topic | Is a concrete example of |
|
||||
| `categorized_under` | article/entity → topic | Belongs to this theme |
|
||||
| `authored_by` | article → entity | Written or created by |
|
||||
|
||||
### New Metadata Interface
|
||||
|
||||
```typescript
|
||||
export interface KnowledgeMeta {
|
||||
format?: "obsidian" | "logseq" | "dendron" | "foam" | "karpathy" | "zettelkasten" | "plain";
|
||||
wikilinks?: string[];
|
||||
backlinks?: string[];
|
||||
frontmatter?: Record<string, unknown>;
|
||||
sourceUrl?: string;
|
||||
confidence?: number; // 0-1, for LLM-inferred relationships
|
||||
}
|
||||
```
|
||||
|
||||
Added as an optional field on `GraphNode`:
|
||||
|
||||
```typescript
|
||||
export interface GraphNode {
|
||||
// ...existing fields
|
||||
knowledgeMeta?: KnowledgeMeta;
|
||||
}
|
||||
```
|
||||
|
||||
### Graph-Level Kind Flag
|
||||
|
||||
```typescript
|
||||
export interface KnowledgeGraph {
|
||||
version: string;
|
||||
kind?: "codebase" | "knowledge"; // NEW; optional, absent means "codebase"
|
||||
project: ProjectMeta;
|
||||
nodes: GraphNode[];
|
||||
edges: GraphEdge[];
|
||||
layers: Layer[];
|
||||
tour: TourStep[];
|
||||
}
|
||||
```
|
||||
|
||||
The `kind` field tells the dashboard which layout, sidebar, and visual styling to use. For backward compatibility, graphs without a `kind` field default to `"codebase"`.
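
The backward-compatible default can be isolated in one helper so the dashboard never reads `kind` directly. A sketch, where `resolveGraphKind` is a hypothetical name and treating unrecognized values as `"codebase"` is an assumption:

```typescript
type GraphKind = "codebase" | "knowledge";

// Accepts any loaded graph object; only the kind field is inspected.
function resolveGraphKind(graph: { kind?: string }): GraphKind {
  return graph.kind === "knowledge" ? "knowledge" : "codebase";
}
```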
|
||||
|
||||
---
|
||||
|
||||
## Format Detection & Format Guides
|
||||
|
||||
### Auto-Detection Logic
|
||||
|
||||
Scans the target directory for signature files/patterns. Priority order (first match wins):
|
||||
|
||||
| Priority | Signal | Detected Format |
|
||||
|----------|--------|----------------|
|
||||
| 1 | `.obsidian/` directory | Obsidian |
|
||||
| 2 | `logseq/` + `pages/` directories | Logseq |
|
||||
| 3 | `.dendron.yml` or `*.schema.yml` | Dendron |
|
||||
| 4 | `.foam/` or `.vscode/foam.json` | Foam |
|
||||
| 5 | `raw/` + `wiki/` + `index.md` | Karpathy |
|
||||
| 6 | `[[wikilinks]]` + unique ID prefixes in filenames | Zettelkasten |
|
||||
| 7 | Fallback | Plain markdown |
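
The priority table translates directly into a first-match chain. In this sketch, `detectKnowledgeFormat` is a hypothetical name, `files` is assumed to be a list of paths relative to the target directory (so empty directories are invisible), and the Zettelkasten signal is passed in as a precomputed flag because it requires inspecting note contents:

```typescript
type KnowledgeFormat =
  | "obsidian" | "logseq" | "dendron" | "foam" | "karpathy" | "zettelkasten" | "plain";

function detectKnowledgeFormat(files: string[], hasWikilinksAndIdPrefixes: boolean): KnowledgeFormat {
  const hasDir = (dir: string): boolean => files.some((f) => f.startsWith(dir + "/"));
  const hasFile = (name: string): boolean => files.some((f) => f === name);

  if (hasDir(".obsidian")) return "obsidian";
  if (hasDir("logseq") && hasDir("pages")) return "logseq";
  if (hasFile(".dendron.yml") || files.some((f) => f.endsWith(".schema.yml"))) return "dendron";
  if (hasDir(".foam") || hasFile(".vscode/foam.json")) return "foam";
  if (hasDir("raw") && hasDir("wiki") && hasFile("index.md")) return "karpathy";
  if (hasWikilinksAndIdPrefixes) return "zettelkasten";
  return "plain";
}
```

Because the first match wins, a vault containing both `.obsidian/` and `logseq/` is reported as Obsidian.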
|
||||
|
||||
### Format Guides
|
||||
|
||||
Located at `skills/understand-knowledge/formats/`. Each guide tells the LLM agents how to parse that format:
|
||||
|
||||
```
|
||||
skills/understand-knowledge/
|
||||
SKILL.md
|
||||
formats/
|
||||
obsidian.md — [[wikilinks]], [[note|alias]], [[note#heading]],
|
||||
#tags, YAML frontmatter, .obsidian/ config,
|
||||
dataview annotations, canvas files
|
||||
logseq.md — block-based outliner, ((block-refs)),
|
||||
journals/YYYY_MM_DD.md, pages/,
|
||||
property:: value syntax, TODO/DONE states
|
||||
dendron.md — dot-delimited hierarchy (a.b.c.md),
|
||||
.schema.yml for structure validation,
|
||||
cross-vault links, refactoring rules
|
||||
foam.md — [[wikilinks]] + link reference definitions
|
||||
at file bottom, .foam/config, placeholder links
|
||||
karpathy.md — raw/ → wiki/ pipeline, index.md master map,
|
||||
log.md append-only record, _meta/ state,
|
||||
LLM-maintained cross-references
|
||||
zettelkasten.md — atomic notes, unique ID prefixes (timestamps),
|
||||
typed semantic links, one idea per note
|
||||
plain.md — standard [markdown](links), folder hierarchy,
|
||||
heading structure, no special conventions
|
||||
```
|
||||
|
||||
Each format guide covers:
|
||||
- How to parse links (wikilinks vs standard vs block refs)
|
||||
- Where metadata lives (frontmatter vs inline properties vs block properties)
|
||||
- What the folder structure means (journals/ = daily notes, pages/ = permanent notes)
|
||||
- What conventions to respect vs what to infer
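
For the link-parsing point, the Obsidian-style forms listed above (`[[note]]`, `[[note|alias]]`, `[[note#heading]]`) can be extracted with a single regex. This is a sketch, not the analyzer's parser: `parseWikilinks` is a hypothetical name, and it does not skip embeds (`![[...]]`) or links inside code blocks:

```typescript
interface ParsedWikilink {
  target: string;
  heading?: string;
  alias?: string;
}

function parseWikilinks(markdown: string): ParsedWikilink[] {
  // Group 1: note name, group 2: optional #heading, group 3: optional |alias.
  const pattern = /\[\[([^\[\]|#]+)(?:#([^\[\]|]+))?(?:\|([^\[\]]+))?\]\]/g;
  const links: ParsedWikilink[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const link: ParsedWikilink = { target: (match[1] ?? "").trim() };
    if (match[2] !== undefined) link.heading = match[2].trim();
    if (match[3] !== undefined) link.alias = match[3].trim();
    links.push(link);
  }
  return links;
}
```

The parsed targets would feed the `wikilinks` field of `KnowledgeMeta`; backlinks follow by inverting the same data across all files.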
|
||||
|
||||
### Format Guide Authoring Process
|
||||
|
||||
Format guides must be research-backed. During implementation, the agent building each format guide must:
|
||||
1. Read the official documentation for that format (Obsidian Help, Logseq docs, Dendron wiki, Foam docs, etc.)
|
||||
2. Study real-world examples of that format's structure
|
||||
3. Write the guide based on verified behavior, not assumptions
|
||||
|
||||
---
|
||||
|
||||
## Agent Pipeline
|
||||
|
||||
```
|
||||
knowledge-scanner → format-detector → article-analyzer → relationship-builder → graph-reviewer
|
||||
```
|
||||
|
||||
### Agent Definitions
|
||||
|
||||
| Agent | Input | Output | Model |
|
||||
|-------|-------|--------|-------|
|
||||
| `knowledge-scanner` | Target directory path | File manifest: all `.md` files with paths, sizes, first 20 lines preview | `inherit` |
|
||||
| `format-detector` | File manifest + directory structure | Detected format + format-specific parsing hints | `inherit` |
|
||||
| `article-analyzer` | Individual `.md` file + format guide | Per-file nodes (article, entities, claims) + explicit edges (wikilinks, tags) | `inherit` |
|
||||
| `relationship-builder` | All per-file results | Cross-file implicit edges (builds_on, contradicts, categorized_under) + topic clustering + layers | `inherit` |
|
||||
| `graph-reviewer` | Assembled graph | Validated graph — deduped entities, consistent edge weights, orphan detection | `inherit` |
|
||||
|
||||
### Key Differences from Codebase Pipeline
|
||||
|
||||
- **No tree-sitter** — markdown parsing is simpler, mostly regex + LLM interpretation
|
||||
- **format-detector** replaces framework detection — picks the right format guide
|
||||
- **article-analyzer** replaces file-analyzer — extracts knowledge concepts instead of code structure
|
||||
- **relationship-builder** is the heavy LLM step — discovers implicit connections across files that explicit links miss
|
||||
- **graph-reviewer** stays similar — validates the assembled graph for consistency
|
||||
|
||||
### Intermediate Files
|
||||
|
||||
Same pattern as codebase analysis:
|
||||
|
||||
```
|
||||
.understand-anything/intermediate/
|
||||
knowledge-manifest.json — scanner output
|
||||
format-detection.json — detected format + hints
|
||||
article-*.json — per-file analysis
|
||||
relationships.json — cross-file edges
|
||||
knowledge-graph.json — final assembled graph
|
||||
```
|
||||
|
||||
Intermediate files are cleaned up after graph assembly (same as codebase flow).
|
||||
|
||||
### Incremental Mode (`--ingest`)
|
||||
|
||||
When the user runs `/understand-knowledge --ingest path/to/new-source.md`:
|
||||
|
||||
1. **knowledge-scanner** — runs on just the new file(s)
|
||||
2. **format-detector** — skipped (format already known from initial scan)
|
||||
3. **article-analyzer** — processes only new/changed files
|
||||
4. **relationship-builder** — runs on new nodes against the existing graph, finds connections to what's already there
|
||||
5. **graph-reviewer** — validates the merged result
|
||||
|
||||
Existing nodes are preserved; only new nodes/edges are added or updated.
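
The merge rule in the last sentence can be sketched as an ID-keyed upsert. `IngestNode`, `mergeIngestedNodes`, and the `article:` ID format are hypothetical; the real node type carries more fields:

```typescript
interface IngestNode {
  id: string;
  summary: string;
}

// Existing nodes are kept; incoming nodes are added, or replace a node with the same ID.
function mergeIngestedNodes(existing: IngestNode[], incoming: IngestNode[]): IngestNode[] {
  const byId = new Map<string, IngestNode>();
  for (const node of existing) byId.set(node.id, node);
  for (const node of incoming) byId.set(node.id, node);
  // Map iteration follows first-insertion order, so updated nodes keep their position.
  return Array.from(byId.values());
}
```

Edges could be merged the same way with a key built from source, target, and type.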
|
||||
|
||||
---
|
||||
|
||||
## Dashboard Changes
|
||||
|
||||
All changes are scoped to graphs with `"kind": "knowledge"`.
|
||||
|
||||
### Vertical Flow Layout
|
||||
|
||||
- Default to top-down vertical layout (like existing domain/business flow view)
|
||||
- Topics at top → articles in middle → entities/claims/sources at bottom
|
||||
- Reads like a knowledge hierarchy: broad themes flow down into specifics
|
||||
- User can still switch to horizontal or force-directed layout via controls
|
||||
|
||||
### Knowledge Sidebar
|
||||
|
||||
Replaces NodeInfo when a knowledge graph is loaded:
|
||||
|
||||
| Selection | Sidebar Shows |
|
||||
|-----------|---------------|
|
||||
| Nothing selected | ProjectOverview: format detected, total articles/entities/topics/claims/sources |
|
||||
| Article node | Title, summary, tags, frontmatter metadata, backlinks list (clickable), outgoing links, related topics |
|
||||
| Entity node | Name, type (person/tool/paper/org), articles that mention it, relationships to other entities |
|
||||
| Topic node | Description, child articles, child entities, cross-topic connections |
|
||||
| Claim node | Assertion text, supporting articles, contradicting claims (if any), confidence score |
|
||||
| Source node | Original URL/path, articles that cite it, ingestion date |
|
||||
|
||||
### Reading Mode
|
||||
|
||||
- Clicking an article node triggers a reading panel that slides up from the bottom (same pattern as current code viewer overlay)
|
||||
- Shows the full compiled markdown rendered as HTML
|
||||
- Includes a mini backlinks sidebar within the panel
|
||||
- Clicking a `[[wikilink]]` or entity reference in the reading panel navigates the graph to that node
|
||||
|
||||
### Node Visual Styling
|
||||
|
||||
| Node Type | Shape | Color Accent |
|
||||
|-----------|-------|-------------|
|
||||
| `article` | Rounded rectangle | Warm amber |
|
||||
| `entity` | Circle | Soft blue |
|
||||
| `topic` | Large rounded rectangle | Muted gold |
|
||||
| `claim` | Diamond | Green/red depending on contradictions |
|
||||
| `source` | Small square | Gray |
|
||||
|
||||
### Edge Visual Styling
|
||||
|
||||
| Edge Type | Style |
|-----------|-------|
| `cites` | Dashed line |
| `contradicts` | Red line |
| `builds_on` | Solid with arrow |
| `categorized_under` | Thin gray |
| `authored_by` | Dotted blue |
| `exemplifies` | Dotted green |
|
||||
|
||||
---
|
||||
|
||||
## Skill Interface
|
||||
|
||||
### Usage
|
||||
|
||||
```bash
# Full scan — first time or rescan
/understand-knowledge

# Point at a specific directory
/understand-knowledge path/to/my-notes

# Incremental ingest — add new sources to existing graph
/understand-knowledge --ingest path/to/new-note.md
/understand-knowledge --ingest path/to/new-folder/
```
|
||||
|
||||
### Behavior
|
||||
|
||||
1. Auto-detects format (Obsidian, Logseq, Karpathy, etc.)
|
||||
2. Announces: "Detected Obsidian vault with 342 notes. Scanning..."
|
||||
3. Runs the agent pipeline (scanner → detector → analyzer → relationship-builder → reviewer)
|
||||
4. Writes `knowledge-graph.json` to `.understand-anything/` with `"kind": "knowledge"`
|
||||
5. Auto-triggers `/understand-dashboard` after completion
|
||||
|
||||
### File Structure
|
||||
|
||||
```
skills/understand-knowledge/
  SKILL.md              — skill entry point, orchestration logic
  formats/
    obsidian.md
    logseq.md
    dendron.md
    foam.md
    karpathy.md
    zettelkasten.md
    plain.md
```
|
||||
|
||||
### Coexistence with `/understand`
|
||||
|
||||
- `/understand` produces `"kind": "codebase"` graphs
|
||||
- `/understand-knowledge` produces `"kind": "knowledge"` graphs
|
||||
- Both write to `.understand-anything/knowledge-graph.json`
|
||||
- Running one overwrites the graph produced by the other, since both write to the same file
|
||||
- To scope knowledge analysis to a subdirectory (e.g., `docs/` within a code repo), use `/understand-knowledge path/to/docs`
|
||||
|
||||
---
|
||||
|
||||
## What This Enables That Nothing Else Does
|
||||
|
||||
| Existing Tools | Limitation | Our Advantage |
|---------------|-----------|---------------|
| Obsidian graph view | Untyped edges — all links look the same | Typed edges: cites, contradicts, builds_on |
| Logseq graph | Only shows explicit links | LLM discovers implicit relationships |
| All PKM tools | Single-format only | Cross-format support with auto-detection |
| Karpathy LLM Wiki | Flat text wiki, no visualization | Interactive graph dashboard with guided tours |
| None | No knowledge graph tours | Tour mode walks through a knowledge base step by step |
|
||||
@@ -0,0 +1,258 @@
|
||||
# .understandignore Design Spec
|
||||
|
||||
## Overview
|
||||
|
||||
Add user-configurable file exclusion via `.understandignore` files, using `.gitignore` syntax. This makes analysis faster by skipping irrelevant files (vendor code, generated output, test fixtures) without modifying hardcoded defaults.
|
||||
|
||||
## Goals
|
||||
|
||||
- Let users exclude files/directories from analysis via `.understandignore`
|
||||
- Use `.gitignore` syntax (familiar, no learning curve)
|
||||
- Keep hardcoded defaults as built-in — `.understandignore` adds patterns on top
|
||||
- Allow `!` negation to force-include files excluded by defaults
|
||||
- Auto-generate a commented-out starter file on first run (deterministic code, not LLM)
|
||||
- Pause before analysis to let user review the ignore file
|
||||
|
||||
## Non-Goals
|
||||
|
||||
- Replacing `.gitignore` — this is analysis-specific
|
||||
- Per-directory `.understandignore` files (only the project root and `.understand-anything/` locations are read)
|
||||
- GUI for editing ignore patterns
|
||||
|
||||
---
|
||||
|
||||
## IgnoreFilter Module
|
||||
|
||||
New file: `packages/core/src/ignore-filter.ts`
|
||||
|
||||
Uses the [`ignore`](https://www.npmjs.com/package/ignore) npm package for gitignore-compatible pattern matching.
|
||||
|
||||
### API
|
||||
|
||||
```typescript
export interface IgnoreFilter {
  isIgnored(relativePath: string): boolean;
}

export function createIgnoreFilter(projectRoot: string): IgnoreFilter;
```
|
||||
|
||||
### Behavior
|
||||
|
||||
`createIgnoreFilter` loads patterns in this order (later entries can override earlier ones):
|
||||
|
||||
1. **Hardcoded defaults** — the existing exclusion patterns from project-scanner (node_modules/, .git/, dist/, build/, bin/, obj/, *.lock, *.min.js, etc.)
|
||||
2. **`.understand-anything/.understandignore`** — project-level, lives alongside the output
|
||||
3. **`.understandignore`** at project root — alternative location for visibility
|
||||
|
||||
Patterns merge additively. `!` negation in user files can override hardcoded defaults (e.g., `!dist/` force-includes dist/).
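The layering above can be illustrated with a minimal sketch. It is not the planned implementation: it does not use the `ignore` package, and it supports only three pattern forms (`dir/`, `*.ext`, exact file names) plus `!` negation. The names `compileRule` and `createLayeredFilter` are hypothetical. The point is only to show that rules are evaluated in load order and the last matching rule wins.

```typescript
// Simplified stand-in for gitignore matching (NOT the `ignore` package).
// Supported forms: "dir/", "*.ext", exact file name, each with optional "!".
type Rule = { negated: boolean; matches: (path: string) => boolean };

function compileRule(line: string): Rule | null {
  const trimmed = line.trim();
  if (trimmed === "" || trimmed.startsWith("#")) return null; // blank or comment
  const negated = trimmed.startsWith("!");
  const body = negated ? trimmed.slice(1) : trimmed;
  if (body.endsWith("/")) {
    const dir = body.slice(0, -1);
    // Matches when any directory segment of the path equals `dir`.
    return { negated, matches: (p) => p.split("/").slice(0, -1).includes(dir) };
  }
  if (body.startsWith("*.")) {
    const suffix = body.slice(1); // "*.log" -> ".log"
    return { negated, matches: (p) => p.endsWith(suffix) };
  }
  return { negated, matches: (p) => p.split("/").pop() === body };
}

// Layers are passed in load order: defaults first, user files after.
function createLayeredFilter(...layers: string[][]) {
  const rules: Rule[] = [];
  for (const layer of layers) {
    for (const line of layer) {
      const rule = compileRule(line);
      if (rule) rules.push(rule);
    }
  }
  return {
    isIgnored(relativePath: string): boolean {
      let ignored = false;
      for (const rule of rules) {
        // Last matching rule wins, so a later "!dist/" undoes an earlier "dist/".
        if (rule.matches(relativePath)) ignored = !rule.negated;
      }
      return ignored;
    },
  };
}

const filter = createLayeredFilter(
  ["node_modules/", "dist/", "*.log"],      // 1. hardcoded defaults
  ["# project-level file", "fixtures/"],    // 2. .understand-anything/.understandignore
  ["!dist/"],                               // 3. .understandignore at project root
);
```

With these three layers, `filter.isIgnored("dist/app.js")` is `false` because the root-level `!dist/` is loaded last, while `node_modules/` and `fixtures/` paths stay excluded.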
|
||||
|
||||
### Hardcoded Default Patterns
|
||||
|
||||
These are the built-in defaults (matching current project-scanner behavior, plus bin/obj for .NET):
|
||||
|
||||
```
# Dependency directories
node_modules/
.git/
vendor/
venv/
.venv/
__pycache__/

# Build output
dist/
build/
out/
coverage/
.next/
.cache/
.turbo/
target/
bin/
obj/

# Lock files
*.lock
package-lock.json
yarn.lock
pnpm-lock.yaml

# Binary/asset files
*.png
*.jpg
*.jpeg
*.gif
*.svg
*.ico
*.woff
*.woff2
*.ttf
*.eot
*.mp3
*.mp4
*.pdf
*.zip
*.tar
*.gz

# Generated files
*.min.js
*.min.css
*.map
*.generated.*

# IDE/editor
.idea/
.vscode/

# Misc
LICENSE
.gitignore
.editorconfig
.prettierrc
.eslintrc*
*.log
```
|
||||
|
||||
---
|
||||
|
||||
## Starter File Generator
|
||||
|
||||
New file: `packages/core/src/ignore-generator.ts`
|
||||
|
||||
### API
|
||||
|
||||
```typescript
export function generateStarterIgnoreFile(projectRoot: string): string;
```
|
||||
|
||||
### Behavior
|
||||
|
||||
- Deterministic code — scans the project directory for common patterns
|
||||
- Returns the file content as a string (caller writes it to disk)
|
||||
- All suggestions are **commented out** — user must uncomment to activate
|
||||
- Header comment explains the file, syntax, and built-in defaults
|
||||
|
||||
### Detection Logic
|
||||
|
||||
| If exists | Suggest |
|-----------|---------|
| `__tests__/` or `*.test.*` files | `# __tests__/`, `# *.test.*`, `# *.spec.*` |
| `fixtures/` or `testdata/` | `# fixtures/`, `# testdata/` |
| `test/` or `tests/` | `# test/`, `# tests/` |
| `.storybook/` | `# .storybook/` |
| `docs/` | `# docs/` |
| `examples/` | `# examples/` |
| `scripts/` | `# scripts/` |
| `migrations/` | `# migrations/` |
| `*.snap` files | `# *.snap` |
| `bin/` (non-.NET, i.e. shell scripts) | `# bin/` |
| `obj/` | `# obj/` |
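A few rows of the detection table can be sketched as a pure function. This is an illustration under assumptions: the planned `generateStarterIgnoreFile` takes a `projectRoot` and reads the disk, whereas the hypothetical `suggestIgnoreLines` below takes an already-collected list of relative paths so the logic can be shown without file I/O. Only five of the table's rows are covered.

```typescript
// Hypothetical helper: maps paths found in a project to commented-out suggestions.
function suggestIgnoreLines(relativePaths: string[]): string[] {
  const dirs = new Set<string>();
  const fileNames: string[] = [];
  for (const path of relativePaths) {
    const segments = path.split("/");
    fileNames.push(segments[segments.length - 1]);
    for (const dir of segments.slice(0, -1)) dirs.add(dir);
  }
  const hasDir = (...names: string[]) => names.some((n) => dirs.has(n));
  const hasFile = (pattern: RegExp) => fileNames.some((f) => pattern.test(f));

  const lines: string[] = [];
  if (hasDir("__tests__") || hasFile(/\.test\./)) {
    lines.push("# __tests__/", "# *.test.*", "# *.spec.*");
  }
  if (hasDir("fixtures", "testdata")) lines.push("# fixtures/", "# testdata/");
  if (hasDir("test", "tests")) lines.push("# test/", "# tests/");
  if (hasDir("docs")) lines.push("# docs/");
  if (hasFile(/\.snap$/)) lines.push("# *.snap");
  return lines; // every entry is commented out; the user opts in by uncommenting
}

const suggestions = suggestIgnoreLines([
  "src/index.ts",
  "src/utils/math.test.ts",
  "docs/guide.md",
]);
```

Because the function is deterministic in its input, the same project layout always yields the same starter file, which matches the "deterministic code, not LLM" goal.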
|
||||
|
||||
### Generated File Format
|
||||
|
||||
```
# .understandignore — patterns for files/dirs to exclude from analysis
# Syntax: same as .gitignore (globs, # comments, ! negation, trailing / for dirs)
# Lines below are suggestions — uncomment to activate.
# Use ! prefix to force-include something excluded by defaults.
#
# Built-in defaults (always excluded unless negated):
# node_modules/, .git/, dist/, build/, bin/, obj/, *.lock, *.min.js, etc.
#

# --- Suggested exclusions (uncomment to activate) ---

# Test files
# __tests__/
# *.test.*
# *.spec.*

# Test data
# fixtures/
# testdata/

# Documentation
# docs/

# ... (more suggestions based on detection)
```
|
||||
|
||||
The starter file is generated only if `.understand-anything/.understandignore` doesn't already exist.
|
||||
|
||||
---
|
||||
|
||||
## Skill Integration
|
||||
|
||||
### Phase 0.5: Ignore Setup (new phase in SKILL.md)
|
||||
|
||||
Added between Pre-flight (Phase 0) and SCAN (Phase 1):
|
||||
|
||||
1. Check if `.understand-anything/.understandignore` exists
|
||||
2. If not, run `generateStarterIgnoreFile(projectRoot)` and write the result to `.understand-anything/.understandignore`
|
||||
3. Report to user:
|
||||
- **First run:** "Generated `.understand-anything/.understandignore` with suggested exclusions. Please review it and uncomment any patterns you'd like to exclude. When ready, confirm to continue."
|
||||
- **Subsequent runs:** "Found `.understand-anything/.understandignore`. Review it if needed, then confirm to continue."
|
||||
4. Wait for user confirmation before proceeding
|
||||
|
||||
### Phase 1: SCAN changes
|
||||
|
||||
The `project-scanner` agent's scan script is updated to:
|
||||
|
||||
1. Collect files via `git ls-files` (or fallback)
|
||||
2. Apply agent's hardcoded pattern filter (Layer 1 — existing behavior)
|
||||
3. Apply `IgnoreFilter` from core (Layer 2 — user patterns)
|
||||
4. Add `filteredByIgnore` count to scan output
|
||||
5. Report: "Scanned {totalFiles} files ({filteredByIgnore} excluded by .understandignore)"
|
||||
|
||||
Two-layer filtering:
|
||||
- **Layer 1:** Agent's hardcoded patterns in the prompt (fast, coarse filter)
|
||||
- **Layer 2:** `IgnoreFilter` from core (deterministic code, user-configurable)
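The two layers and the `filteredByIgnore` count can be sketched as follows. The function name `applyTwoLayerFilter` and the `layer1Excludes` callback are hypothetical, and the sketch assumes that `totalFiles` counts the files remaining after both layers; the spec does not say whether it is measured before or after filtering.

```typescript
interface IgnoreFilter {
  isIgnored(relativePath: string): boolean;
}

interface ScanCounts {
  files: string[];
  totalFiles: number;        // assumption: files left after both layers
  filteredByIgnore: number;  // files removed by Layer 2 only
}

function applyTwoLayerFilter(
  candidates: string[],
  layer1Excludes: (path: string) => boolean, // stand-in for the agent's coarse filter
  layer2: IgnoreFilter,
): ScanCounts {
  const afterLayer1 = candidates.filter((p) => !layer1Excludes(p));
  const files = afterLayer1.filter((p) => !layer2.isIgnored(p));
  return {
    files,
    totalFiles: files.length,
    filteredByIgnore: afterLayer1.length - files.length,
  };
}

const counts = applyTwoLayerFilter(
  ["src/a.ts", "node_modules/x/index.js", "fixtures/big.json", "src/b.ts"],
  (p) => p.startsWith("node_modules/"),
  { isIgnored: (p) => p.startsWith("fixtures/") },
);
console.log(
  `Scanned ${counts.totalFiles} files (${counts.filteredByIgnore} excluded by .understandignore)`,
);
```

Counting only Layer 2 removals keeps the reported number tied to what the user configured, rather than mixing in files the built-in filter would have dropped anyway.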
|
||||
|
||||
---
|
||||
|
||||
## Project Scanner Agent Update
|
||||
|
||||
Changes to `understand-anything-plugin/agents/project-scanner.md`:
|
||||
|
||||
- After the file list is built and Layer 1 filtering is applied, the agent runs a Node.js script that imports `createIgnoreFilter` from `@understand-anything/core` and filters the remaining paths
|
||||
- The scan result JSON includes a new `filteredByIgnore: number` field
|
||||
- Existing hardcoded exclusion patterns in the agent prompt remain for backward compatibility
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
### `packages/core/src/__tests__/ignore-filter.test.ts`
|
||||
|
||||
- Parses basic glob patterns (`*.log`, `dist/`)
|
||||
- Handles `#` comments and blank lines
|
||||
- Handles `!` negation (force-include)
|
||||
- Handles `**/` recursive matching
|
||||
- Handles trailing `/` for directory-only patterns
|
||||
- Merges defaults + user patterns correctly
|
||||
- `!` in user file overrides hardcoded defaults
|
||||
- Returns `false` for paths not matching any pattern
|
||||
|
||||
### `packages/core/src/__tests__/ignore-generator.test.ts`
|
||||
|
||||
- Generates starter file with header comment
|
||||
- Detects existing directories and suggests relevant patterns
|
||||
- All suggestions are commented out (prefixed with `# `)
|
||||
- Doesn't overwrite existing file
|
||||
- Includes bin/obj suggestions when relevant
|
||||
|
||||
---
|
||||
|
||||
## File Structure
|
||||
|
||||
| File | Purpose |
|------|---------|
| `packages/core/src/ignore-filter.ts` | Parse .understandignore, merge with defaults, filter paths |
| `packages/core/src/ignore-generator.ts` | Generate starter file by scanning project structure |
| `packages/core/src/__tests__/ignore-filter.test.ts` | Filter logic tests |
| `packages/core/src/__tests__/ignore-generator.test.ts` | Generator tests |
| `agents/project-scanner.md` | Add Layer 2 filtering via IgnoreFilter |
| `skills/understand/SKILL.md` | Add Phase 0.5 (generate + pause for review) |
| `packages/core/package.json` | Add `ignore` npm dependency |
|
||||
@@ -0,0 +1,488 @@
|
||||
# Dashboard Graph Layout Scaling — Design
|
||||
|
||||
## Problem
|
||||
|
||||
When a structural-graph layer contains many nodes, the current `applyDagreLayout` (TB direction) places same-rank nodes in a single horizontal row. With 50+ nodes per rank, the row stretches into thousands of pixels and the view becomes unreadable: nodes shrink, labels disappear, edges tangle, and there are no visual anchors to orient the reader.
|
||||
|
||||
This design replaces dagre with ELK across all structural-style views, introduces folder/community-based **containers** for the layer-detail view, and computes layout in **two lazy stages**: a single pass over containers, then per-container child layout on demand.
|
||||
|
||||
The graph schema and pipeline output (`graph.json`) are unchanged. All improvements derive from existing data.
|
||||
|
||||
## Goals
|
||||
|
||||
- Eliminate horizontal sprawl in layer-detail views at ≤100 nodes per layer (current target), and remain workable up to 1000+ nodes (future scaling).
|
||||
- Give each layer-detail view explicit visual anchors so structure is readable at a glance.
|
||||
- Aggregate cross-cluster edges by default; surface individual edges on demand.
|
||||
- Keep visual style continuous with the existing layer-cluster (overview-level) presentation.
|
||||
- Treat layout failures with the same `GraphIssue` model already used for schema validation.
|
||||
|
||||
## Non-Goals
|
||||
|
||||
- No regeneration of `graph.json`. All grouping is derived client-side.
|
||||
- No change to KnowledgeGraphView (already force-directed; out of scope).
|
||||
- No multi-level container nesting (single depth only in v1).
|
||||
- No remote error reporting (Sentry-style) — open-source plugin, no default telemetry.
|
||||
- No persona-specific grouping behavior beyond the existing node-type filter.
|
||||
|
||||
## Scope
|
||||
|
||||
Three views are affected:
|
||||
|
||||
| View | Change |
|---|---|
| Overview (layer clusters) | Replace dagre → ELK. No new grouping (layers are already groups). |
| DomainGraphView | Replace dagre → ELK with domain-as-parent of flow/step. |
| Layer-detail | Replace dagre → ELK + new folder/community containers + edge aggregation + lazy two-stage layout. |
|
||||
|
||||
KnowledgeGraphView remains on `applyForceLayout` and is not touched.
|
||||
|
||||
---
|
||||
|
||||
## §1. Architecture
|
||||
|
||||
```
existing graph (immutable)
        │
        ▼
deriveContainers(nodes, edges)         // §2 — folder strategy with community fallback
        │
        ▼
buildCompoundGraph()                   // §4 — aggregate inter-container edges, keep intra-container
        │
        ▼
runStage1Layout(containers, aggEdges)  // §6 — ELK on containers only; uses size memory
        │
        ▼          ┌──────────────────────────────┐
        │          │ render: containers laid      │
        │          │ out, children unrendered     │
        │          └──────────────────────────────┘
        │
        │ triggered by: click | zoom > 1.0 | search/focus/tour hit child
        ▼
runStage2Layout(container)             // §6 — ELK on one container's children; cached
        │
        ▼
React Flow render (parentId for parent-child) + visual overlay (selection/diff/search/tour)
```
|
||||
|
||||
Two invariants preserved from current code:
|
||||
|
||||
1. **Layout computation is pure and memoized.** It only re-runs when graph topology / persona / diff / focus / nodeTypeFilters change.
|
||||
2. **Visual state is a separate O(n) overlay pass.** Selection, search highlight, tour highlight, hover do not trigger relayout.
|
||||
|
||||
This matches the existing `useLayerDetailTopology` / `useLayerDetailGraph` split in `GraphView.tsx`.
|
||||
|
||||
---
|
||||
|
||||
## §2. Container Derivation (Layer-Detail Only)
|
||||
|
||||
### 2.1 Folder strategy (default)
|
||||
|
||||
1. Collect every node's `filePath` in the layer.
|
||||
2. Compute longest common prefix (LCP) across all paths and strip it.
|
||||
3. Group by the **first path segment after the LCP**.
|
||||
- `auth/login.go` → container `auth`
|
||||
- `auth/handlers/oauth.go` → container `auth`
|
||||
- `cart/cart.go` → container `cart`
|
||||
4. Single-depth grouping only; no recursive nesting in v1.
|
||||
5. Nodes with no `filePath` (e.g. `concept` type) → container `~` (rendered as `(root)`, dimmed).
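The folder strategy above can be sketched as below. Two points are assumptions rather than part of the spec: the common prefix is computed per path segment (not per character), and files that sit directly under the common prefix are placed in the `~` bucket alongside nodes without a `filePath`. The names `commonDirPrefix` and `groupByFolder` are hypothetical.

```typescript
interface PathNode {
  id: string;
  filePath?: string;
}

// Number of leading directory segments shared by every path.
// The final segment (the file name) is never counted as part of the prefix.
function commonDirPrefix(paths: string[][]): number {
  if (paths.length === 0) return 0;
  let n = 0;
  for (;;) {
    const segment = paths[0][n];
    if (paths.some((p) => n >= p.length - 1 || p[n] !== segment)) return n;
    n++;
  }
}

function groupByFolder(nodes: PathNode[]): Map<string, string[]> {
  const splitPaths = nodes
    .filter((n) => n.filePath)
    .map((n) => n.filePath!.split("/"));
  const skip = commonDirPrefix(splitPaths);

  const groups = new Map<string, string[]>();
  const add = (key: string, id: string) => {
    groups.set(key, [...(groups.get(key) ?? []), id]);
  };

  for (const node of nodes) {
    if (!node.filePath) {
      add("~", node.id); // no filePath, e.g. a `concept` node
      continue;
    }
    const rest = node.filePath.split("/").slice(skip);
    // rest.length === 1 means the file sits directly under the common prefix.
    add(rest.length > 1 ? rest[0] : "~", node.id);
  }
  return groups;
}

const groups = groupByFolder([
  { id: "a", filePath: "src/auth/login.go" },
  { id: "b", filePath: "src/auth/handlers/oauth.go" },
  { id: "c", filePath: "src/cart/cart.go" },
  { id: "d" },
]);
```

Here the shared `src/` prefix is stripped, so nodes `a` and `b` land in `auth`, `c` in `cart`, and the path-less node `d` in `~`.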
|
||||
|
||||
### 2.2 Community fallback (Louvain)
|
||||
|
||||
Triggered when **any** of:
|
||||
|
||||
- All nodes share the same single folder after LCP stripping.
|
||||
- Bucket count (folders + rooted) `< 2`.
|
||||
- Any single bucket (folder or rooted) holds `> 70%` of nodes.
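The three triggers reduce to two checks, since "all nodes in one folder" is the one-bucket case of "bucket count `< 2`". A minimal sketch, with the hypothetical name `shouldUseCommunityFallback`, taking the size of each bucket (folders plus the root bucket) as input:

```typescript
function shouldUseCommunityFallback(bucketSizes: number[]): boolean {
  // Fewer than two buckets: folder grouping gives no structure at all.
  if (bucketSizes.length < 2) return true;

  const total = bucketSizes.reduce((sum, n) => sum + n, 0);
  const largest = Math.max(...bucketSizes);
  // largest / total > 0.7, written with integers to avoid float rounding.
  return largest * 10 > total * 7;
}
```

For example, bucket sizes `[8, 1, 1]` trigger the fallback (80% in one bucket), while `[5, 5]` and `[7, 3]` do not, because the threshold is strictly greater than 70%.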
|
||||
|
||||
Run Louvain modularity-based community detection on the layer's internal edges. Each community becomes a container. Names are placeholders (`Cluster A`, `Cluster B`, ...) since no semantic name is available.
|
||||
|
||||
Implementation: use `graphology` + `graphology-communities-louvain` (~30KB total). Pure JS, no native deps, runs on main thread synchronously for layer-internal edges.
|
||||
|
||||
### 2.3 Edge cases
|
||||
|
||||
| Case | Behavior |
|---|---|
| Container has 1 child (only when layer total ≥ 3) | No container box rendered; child becomes a top-level node in Stage 1 layout |
| Container has 2 children | Container rendered; label dimmed |
| All nodes lack `filePath` | All go to `~` container; if it would become single-child, fall back to flat |
|
||||
|
||||
### 2.4 Function signature
|
||||
|
||||
```ts
function deriveContainers(
  nodes: GraphNode[],
  edges: GraphEdge[],
): {
  containers: Array<{
    id: string;          // e.g. "container:auth" or "container:cluster-0"
    name: string;        // "auth" or "Cluster A"
    nodeIds: string[];
    strategy: "folder" | "community";
  }>;
  ungrouped: string[];   // nodes that bypass containerization
};
```
|
||||
|
||||
The `strategy` field is exposed in the UI ("Grouped by folder" vs "Grouped by edge density") so the user knows how a particular layer was organized.
|
||||
|
||||
---
|
||||
|
||||
## §3. ELK Integration
|
||||
|
||||
### 3.1 Package
|
||||
|
||||
- `elkjs` ^0.9 (~250KB gzipped). Use `elk.bundled.js`, not the worker variant.
|
||||
- Promise-based API. Runs on main thread for graphs ≤500 nodes; <100ms typical.
|
||||
|
||||
### 3.2 Configuration
|
||||
|
||||
```ts
{
  algorithm: "layered",
  "elk.direction": "DOWN",                            // matches dagre TB
  "elk.layered.spacing.nodeNodeBetweenLayers": 80,
  "elk.spacing.nodeNode": 60,
  "elk.layered.crossingMinimization.strategy": "LAYER_SWEEP",
  "elk.edgeRouting": "ORTHOGONAL",
  "elk.layered.compaction.postCompaction.strategy": "LEFT",
  "elk.padding": "[top=40,left=20,right=20,bottom=20]", // container internal padding
}
```
|
||||
|
||||
`hierarchyHandling: INCLUDE_CHILDREN` is **not** used — the two-stage approach (§6) issues separate ELK calls for top-level containers and per-container children, so a single compound graph is never assembled.
|
||||
|
||||
### 3.3 Per-view input shaping
|
||||
|
||||
| View | ELK input |
|---|---|
| Overview | Flat. Children = layer-cluster nodes. |
| DomainGraphView | Flat in v1 (domain stays as the only grouping; flow/step nodes positioned within). |
| Layer-detail Stage 1 | Flat. Children = containers (treated as opaque atoms). |
| Layer-detail Stage 2 | Flat per container. Children = files within. |
|
||||
|
||||
A single `runElk(input): Promise<positioned>` function services all four cases.
|
||||
|
||||
### 3.4 Boundaries with existing `utils/layout.ts`
|
||||
|
||||
| Function | Status |
|---|---|
| `applyDagreLayout` | Kept temporarily; removed in the version after layout migration is verified stable |
| `applyForceLayout` | Untouched (KnowledgeGraphView only) |
| `applyElkLayout` (new) | Wrapper that handles repair → ELK → result coercion |
|
||||
|
||||
### 3.5 Async + loading state
|
||||
|
||||
Stage 1 runs in a `useEffect` with cancellation on dependency change:
|
||||
|
||||
```ts
useEffect(() => {
  let cancelled = false;
  setLayoutStatus("computing");
  applyElkLayout(input).then(result => {
    if (!cancelled) {
      setLayout(result);
      setLayoutStatus("ready");
    }
  });
  return () => { cancelled = true };
}, [graph, activeLayerId, persona, diffMode, nodeTypeFilters]);
```
|
||||
|
||||
While `layoutStatus === "computing"`, render a `"Computing layout…"` overlay (semi-transparent, centered). Stale layout from the previous state is kept underneath so the viewport doesn't blink.
|
||||
|
||||
### 3.6 Failure handling — reuses existing GraphIssue model
|
||||
|
||||
Before invoking ELK, run `repairElkInput()` over the assembled input. Each repair emits a `GraphIssue` consumed by the existing `WarningBanner`.
|
||||
|
||||
| Repair function | Triggered by | Issue level |
|---|---|---|
| `ensureNodeDimensions` | Node missing width/height | `auto-corrected` |
| `dedupeNodeIds` | Duplicate child id under same parent | `auto-corrected` |
| `dropOrphanEdges` | Edge source/target not in node set | `dropped` |
| `dropOrphanChildren` | Child references a non-existent parent | `dropped` |
| `dropCircularContainment` | Container containment cycle | `dropped` |
|
||||
|
||||
If ELK still rejects after repair → emit a `fatal` `GraphIssue`, render an empty graph + the existing fatal banner. The fatal copy text is augmented with "this looks like a dashboard rendering bug — please file an issue with the copied error" so the user knows to direct the report at the dashboard, not the graph data.
|
||||
|
||||
### 3.7 Dev mode strict failures
|
||||
|
||||
Both `repairElkInput` and `runElk` accept a `strict: boolean`. In `import.meta.env.DEV`, strict is on — repairs and ELK errors throw immediately rather than producing graceful issues. This catches input-construction bugs during development before they ship as silent fallbacks.
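One repair step with the `strict` switch might look like the sketch below. The `GraphIssue` and `ElkEdgeInput` shapes here are assumptions made for illustration; the real `GraphIssue` model lives in the existing schema-validation code and may carry different fields.

```typescript
// Assumed shapes, for illustration only.
type IssueLevel = "auto-corrected" | "dropped" | "fatal";
interface GraphIssue {
  level: IssueLevel;
  message: string;
}
interface ElkEdgeInput {
  id: string;
  source: string;
  target: string;
}

function dropOrphanEdges(
  nodeIds: Set<string>,
  edges: ElkEdgeInput[],
  strict: boolean,
): { edges: ElkEdgeInput[]; issues: GraphIssue[] } {
  const kept: ElkEdgeInput[] = [];
  const issues: GraphIssue[] = [];
  for (const edge of edges) {
    if (nodeIds.has(edge.source) && nodeIds.has(edge.target)) {
      kept.push(edge);
      continue;
    }
    const message = `Edge ${edge.id} references a node that is not in the layout input`;
    // Dev mode: fail loudly so the input-construction bug is fixed at the source.
    if (strict) throw new Error(message);
    // Production: drop the edge and surface the repair through the warning banner.
    issues.push({ level: "dropped", message });
  }
  return { edges: kept, issues };
}

const repaired = dropOrphanEdges(
  new Set(["a", "b"]),
  [
    { id: "e1", source: "a", target: "b" },
    { id: "e2", source: "a", target: "missing" },
  ],
  false,
);
```

In a Vite build the caller would presumably pass `import.meta.env.DEV` as `strict`, so the same function throws during development and degrades gracefully in production.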
|
||||
|
||||
---
|
||||
|
||||
## §4. Edge Aggregation
|
||||
|
||||
### 4.1 Algorithm
|
||||
|
||||
Performed inside `buildCompoundGraph()`, before either ELK stage.
|
||||
|
||||
```ts
function aggregateContainerEdges(
  nodes: GraphNode[],
  edges: GraphEdge[],
  nodeToContainer: Map<string, string>,
): {
  intraContainer: Edge[];                     // preserved as-is
  interContainerAggregated: AggregatedEdge[]; // one per (sourceContainer, targetContainer)
};
```
|
||||
|
||||
Rules:
|
||||
|
||||
- For each edge, look up source/target containers.
|
||||
- Same container → intra (unchanged).
|
||||
- Different containers → bucket by `(sourceContainer, targetContainer)`. Direction matters: A→B and B→A are independent.
|
||||
- Each aggregated edge carries `count` and `types` (set of edge types appearing in the bucket).
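The rules above can be sketched as a single pass over the edges. The edge shapes are simplified assumptions, the function is named `bucketEdgesByContainer` to mark it as illustrative, and nodes without a container mapping are treated as their own top-level endpoint, which the spec does not state explicitly.

```typescript
// Simplified, assumed shapes.
interface SimpleEdge {
  source: string;
  target: string;
  type: string;
}
interface AggregatedEdge {
  source: string; // source container id
  target: string; // target container id
  count: number;
  types: Set<string>;
}

function bucketEdgesByContainer(
  edges: SimpleEdge[],
  nodeToContainer: Map<string, string>,
): { intraContainer: SimpleEdge[]; interContainerAggregated: AggregatedEdge[] } {
  const intraContainer: SimpleEdge[] = [];
  const buckets = new Map<string, AggregatedEdge>();

  for (const edge of edges) {
    // Assumption: an ungrouped node acts as its own endpoint.
    const from = nodeToContainer.get(edge.source) ?? edge.source;
    const to = nodeToContainer.get(edge.target) ?? edge.target;
    if (from === to) {
      intraContainer.push(edge); // same container: kept unchanged
      continue;
    }
    // Ordered pair as key, so A→B and B→A land in different buckets.
    const key = JSON.stringify([from, to]);
    let bucket = buckets.get(key);
    if (!bucket) {
      bucket = { source: from, target: to, count: 0, types: new Set<string>() };
      buckets.set(key, bucket);
    }
    bucket.count += 1;
    bucket.types.add(edge.type);
  }

  return { intraContainer, interContainerAggregated: Array.from(buckets.values()) };
}

const aggregated = bucketEdgesByContainer(
  [
    { source: "login", target: "oauth", type: "imports" },
    { source: "login", target: "cart", type: "imports" },
    { source: "login", target: "checkout", type: "calls" },
    { source: "cart", target: "login", type: "imports" },
  ],
  new Map([
    ["login", "container:auth"],
    ["oauth", "container:auth"],
    ["cart", "container:cart"],
    ["checkout", "container:cart"],
  ]),
);
```

In this example one edge stays intra-container, the two `auth → cart` edges merge into a bucket with `count` 2 and two types, and the single `cart → auth` edge forms its own bucket.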
|
||||
|
||||
### 4.2 Visual
|
||||
|
||||
Reuse the styling pattern already in overview-level edge aggregation (`GraphView.tsx` line ~186):
|
||||
|
||||
- `strokeWidth: Math.min(1 + Math.log2(count + 1), 5)`
|
||||
- Label: count number
|
||||
- Color: existing `rgba(212,165,116,0.4)`
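The stroke-width formula grows logarithmically and is capped at 5, which the following small sketch makes concrete (the function name `edgeStrokeWidth` is hypothetical):

```typescript
function edgeStrokeWidth(count: number): number {
  // 1px base, plus log2 growth, capped at 5px.
  return Math.min(1 + Math.log2(count + 1), 5);
}

for (const count of [1, 3, 7, 15, 100]) {
  console.log(count, edgeStrokeWidth(count).toFixed(2));
}
```

Since `log2(16) = 4`, the cap is reached at 15 underlying edges; any larger bucket renders at the same 5px width, so the count label is what distinguishes very dense connections.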
|
||||
|
||||
### 4.3 Expand / collapse
|
||||
|
||||
State (zustand store):
|
||||
|
||||
```ts
expandedContainers: Set<string>;  // currently expanded container ids
```
|
||||
|
||||
Triggers:
|
||||
|
||||
- **Click container** → toggle membership.
|
||||
- **Click empty canvas** or `Esc` → clear all.
|
||||
- **Multi-container expansion is allowed** (user comparing two folders' relationships).
|
||||
|
||||
When a container is expanded:
|
||||
|
||||
- Its inter-container aggregated edges (both directions) are replaced with the underlying file→file individual edges.
|
||||
- Other containers' aggregated edges remain aggregated.
|
||||
- Position re-layout is **not** triggered. Only React Flow's edge array changes.
|
||||
|
||||
### 4.4 Interactions with persona / diff
|
||||
|
||||
- **Persona filter** changes `count` (post-filter edges only). Aggregated edge re-derived in the memoized pipeline.
|
||||
- **Diff mode**: aggregated edge containing any changed node → red stroke + animated; on expand, individual edges follow normal diff styling.
|
||||
|
||||
---
|
||||
|
||||
## §5. Container Visual
|
||||
|
||||
### 5.1 New component: `ContainerNode`
|
||||
|
||||
A new React Flow node type `"container"` registered alongside the existing `custom` / `layer-cluster` / `portal`.
|
||||
|
||||
It does **not** reuse `LayerClusterNode` because:
|
||||
|
||||
- Click semantics differ (`LayerClusterNode` drills into a layer; `ContainerNode` toggles edge expansion).
|
||||
- Metadata differs (`ContainerNode` does not carry `aggregateComplexity`).
|
||||
|
||||
Visual language is shared: rounded translucent box, gold border, DM Serif title.
|
||||
|
||||
### 5.2 Spec
|
||||
|
||||
| Element | Style |
|---|---|
| Border (default) | `1px solid rgba(212,165,116,0.25)` |
| Border (hover / expanded) | `1.5px rgba(212,165,116,0.6)`, expanded adds chevron `▾` |
| Background | `rgba(255,255,255,0.02)` |
| Corner radius | `12px` |
| Title | DM Serif, 14px, `#d4a574`, top-left padding `12px 16px` |
| Child-count badge | top-right chip, `#a39787`, 11px |
| Internal padding (around children) | `40px top / 20px L,R,B` |
|
||||
|
||||
### 5.3 Color coding
|
||||
|
||||
Container index modulo 12-color palette (same palette used for `layerColorIndex` in `LayerClusterNode`). Hue is applied at low saturation to border + title only — never to the body fill — so the palette doesn't overpower individual nodes inside.
|
||||
|
||||
### 5.4 State styles
|
||||
|
||||
| State | Visual |
|---|---|
| `default` | Base spec |
| `hover` | Brighter border, title underline |
| `expanded` | 1.5px gold border + chevron `▾` |
| `search-hit-inside` | Search badge in title row showing match count |
| `diff-affected` | Border swaps to `rgba(224,82,82,0.5)` |
| `focused-via-child` | Same as expanded plus brightness boost |
|
||||
|
||||
### 5.5 Label source
|
||||
|
||||
| Strategy | Label |
|---|---|
| `folder` | First path segment after LCP (e.g. `auth`) |
| `community` | `Cluster A`, `Cluster B`, ... ordered by community id |
| `~` (root) | `(root)` in dimmed style |
|
||||
|
||||
---
|
||||
|
||||
## §6. Lazy Two-Stage Layout
|
||||
|
||||
### 6.1 State machine
|
||||
|
||||
```
[layer entered]
      │
      │  Stage 1: ELK on containers (always runs)
      ▼
[containers laid out, children unrendered]
      │
      ├── click container ─────────────────────────────────────┐
      ├── zoom > 1.0 in viewport (200ms debounce, hysteresis) ─┤
      └── search / focus / tour hit a child ───────────────────┘
                                                               ▼
                                                    Stage 2 (per container)
                                                               │
                                                               ▼
                         [container expanded, children laid out + rendered]
```
|
||||
|
||||
### 6.2 Store extensions
|
||||
|
||||
```ts
expandedContainers: Set<string>;
containerLayoutCache: Map<string, {
  childPositions: Map<string, { x: number; y: number }>;
  actualSize: { width: number; height: number };
}>;
containerSizeMemory: Map<string, { width: number; height: number }>;
```
|
||||
|
||||
- `containerLayoutCache` invalidated by `(graphHash, containerId)`.
|
||||
- `containerSizeMemory` persists across container collapses to prevent jitter on next expand.
|
||||
|
||||
### 6.3 Stage 1
|
||||
|
||||
```ts
async function runStage1Layout(containers, aggregatedInterEdges, sizeMemory) {
  const elkInput = {
    id: "root",
    children: containers.map(c => ({
      id: c.id,
      width: sizeMemory.get(c.id)?.width
        ?? Math.sqrt(c.nodeIds.length) * NODE_WIDTH * 1.2,
      height: sizeMemory.get(c.id)?.height
        ?? Math.sqrt(c.nodeIds.length) * NODE_HEIGHT * 1.2,
    })),
    edges: aggregatedInterEdges.map(toElkEdge),
  };
  return runElk(elkInput);
}
```
|
||||
|
||||
Container size is estimated from `sqrt(childCount)` so it grows sub-linearly with content. If memory has the actual size from a previous run, that wins.
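To make the estimate concrete, here is a small sketch of the size rule on its own. The `NODE_WIDTH` and `NODE_HEIGHT` values are placeholders chosen for illustration; the spec does not give the dashboard's actual node dimensions.

```typescript
interface Size {
  width: number;
  height: number;
}

// Placeholder dimensions, assumed for this example only.
const NODE_WIDTH = 200;
const NODE_HEIGHT = 60;

function estimateContainerSize(childCount: number, remembered?: Size): Size {
  // A remembered size from an earlier Stage 2 run always wins over the estimate.
  if (remembered) return remembered;
  const scale = Math.sqrt(childCount) * 1.2;
  return { width: scale * NODE_WIDTH, height: scale * NODE_HEIGHT };
}

const small = estimateContainerSize(4);    // scale = 2.4
const large = estimateContainerSize(100);  // scale = 12
```

Going from 4 to 100 children multiplies the child count by 25 but the estimated side lengths only by 5, which is the sub-linear growth the paragraph above describes.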
|
||||
|
||||
### 6.4 Stage 2
|
||||
|
||||
```ts
async function runStage2Layout(container, intraEdges) {
  if (containerLayoutCache.has(container.id)) {
    return containerLayoutCache.get(container.id)!;
  }
  const elkInput = {
    id: container.id,
    children: container.nodeIds.map(toElkChild),
    edges: intraEdges.filter(e => isWithin(container, e)).map(toElkEdge),
  };
  const result = await runElk(elkInput);
  containerLayoutCache.set(container.id, result);
  containerSizeMemory.set(container.id, result.actualSize);
  return result;
}
```
|
||||
|
||||
If `result.actualSize` differs from the Stage 1 estimate by **> 20%** in either dimension, trigger a Stage 1 re-layout (full re-run; <100ms at this scale, so the user perceives a small reflow rather than two distinct layouts).
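The deviation check can be sketched as a small predicate. It assumes the 20% is measured relative to the Stage 1 estimate (not the actual size), which the text implies but does not state; `needsStage1Relayout` is a hypothetical name.

```typescript
interface Size {
  width: number;
  height: number;
}

function needsStage1Relayout(estimate: Size, actual: Size, tolerance = 0.2): boolean {
  // Relative deviation of one dimension, measured against the estimate.
  const deviates = (estimated: number, measured: number) =>
    Math.abs(measured - estimated) / estimated > tolerance;
  return (
    deviates(estimate.width, actual.width) ||
    deviates(estimate.height, actual.height)
  );
}
```

For an estimate of 1000 × 500, an actual size of 1100 × 500 (10% wider) keeps the Stage 1 positions, while 1000 × 650 (30% taller) triggers the re-layout even though the width matched.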
|
||||
|
||||
### 6.5 Auto-expand triggers
|
||||
|
||||
| Trigger | Implementation |
|---|---|
| Click | `onClick` toggles `expandedContainers` |
| Zoom | React Flow `onMove` listener (200ms debounce). When viewport zoom > 1.0, all containers in viewport added to `expandedContainers`. Hysteresis: containers don't auto-collapse until zoom < 0.6, preventing flapping. |
| Search / focus / tour | `useEffect` watches `searchResults` / `focusNodeId` / `tourHighlightedNodeIds`; finds the parent container of any matched leaf node and adds to `expandedContainers` |
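The zoom hysteresis in the table can be sketched as a pure state transition. The function name and constants are hypothetical, and the sketch tracks only the zoom-driven state of one container; how click-expanded containers interact with auto-collapse is not specified here.

```typescript
const EXPAND_ABOVE = 1.0;   // auto-expand when zoom rises above this
const COLLAPSE_BELOW = 0.6; // auto-collapse only when zoom falls below this

function nextAutoExpanded(zoom: number, currentlyExpanded: boolean): boolean {
  if (zoom > EXPAND_ABOVE) return true;
  if (zoom < COLLAPSE_BELOW) return false;
  // Inside the band [0.6, 1.0] the previous state is kept, which prevents flapping.
  return currentlyExpanded;
}

// Zooming in past 1.0, back out to 0.8, then further out to 0.5:
let expanded = false;
for (const zoom of [0.9, 1.2, 0.8, 0.5]) {
  expanded = nextAutoExpanded(zoom, expanded);
  console.log(zoom, expanded);
}
```

The gap between the two thresholds means small zoom adjustments around 1.0 do not repeatedly expand and collapse containers; the state only flips once the user clearly zooms in or out.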
|
||||
|
||||
### 6.6 Performance budget
|
||||
|
||||
| Operation | Target |
|---|---|
| Stage 1 (any layer) | < 100ms |
| Stage 2 (first expand of a container) | < 100ms |
| Stage 2 (cache hit) | < 5ms |
| Zoom-driven auto-expand | 200ms debounce |
| Stage 1 re-layout after >20% deviation | < 100ms (re-uses Stage 1 path) |
|
||||
|
||||
---
|
||||
|
||||
## §7. Interaction Matrix
|
||||
|
||||
| Existing feature | Behavior with new layout |
|---|---|
| Persona filter | Drives `nodeTypeFilters` dependency in Stage 1 memo. Filtered-out nodes don't enter container derivation; containers with all-filtered children disappear. |
| Diff mode | Container with a changed child gets red border (§5.4); aggregated edges containing a changed node animate red; on expand, individual diff styling applies. |
| Focus mode (1-hop) | Focus node's container auto-expands. Non-neighbor containers fade to opacity 0.2; their children remain unrendered. |
| Search | Container with a hit gets search badge in title; container does **not** auto-expand to avoid expanding many at once. Clicking the badge expands and `fitView`s. |
| Tour | Tour-highlighted child auto-expands its container. `TourFitView` fits to the highlighted leaf positions (cached after expand). |
| Drill-in (`overview → layer-detail`) | Unchanged. After drill-in, Stage 1 runs on the new layer's containers. |
| Breadcrumb | Containers do not enter the breadcrumb. Path remains `Project > LAYER`. |
| Code viewer | Unchanged. Click a file node inside a container → existing slide-up viewer. |
| WarningBanner | Layout repair issues feed the same banner. Fatal copy text augmented to differentiate render bugs from data bugs. |
| Export (PNG/SVG) | Captures current state including expanded containers. Filename includes layer name. |
|
||||
|
||||
---
|
||||
|
||||
## §8. Files & Test Plan
|
||||
|
||||
### 8.1 Files
|
||||
|
||||
```
packages/dashboard/src/
├── utils/
│   ├── layout.ts            [modify] add applyElkLayout export
│   ├── elk-layout.ts        [new]    runElk + repairElkInput + GraphIssue mapping
│   ├── containers.ts        [new]    deriveContainers (folder + community fallback)
│   ├── louvain.ts           [new]    thin wrapper around graphology-communities-louvain
│   └── edgeAggregation.ts   [modify] add aggregateContainerEdges
├── components/
│   ├── ContainerNode.tsx    [new]    container box visual
│   ├── GraphView.tsx        [modify] Stage 1 / Stage 2 wiring, expand state, auto-expand triggers
│   └── DomainGraphView.tsx  [modify] dagre → ELK
├── store.ts                 [modify] expandedContainers, containerLayoutCache, containerSizeMemory
└── package.json             [modify] add elkjs ^0.9, graphology, graphology-communities-louvain
```
|
||||
|
||||
### 8.2 Test matrix
|
||||
|
||||
| Type | Target | Cases |
|---|---|---|
| Unit | `deriveContainers` | folder grouping happy path; all-in-root fallback; <2 buckets fallback; >70% concentration fallback; no-`filePath` nodes; single-child container suppression (gated by layer ≥ 3) |
| Unit | `aggregateContainerEdges` | empty edges; multiple same-direction edges merge; bidirectional edges split; intra + inter mix; types deduped |
| Unit | `repairElkInput` | each repair function in isolation; validates correct `GraphIssue` level emitted |
| Unit | `runElk` | minimal valid input; dev-mode strict throw; production graceful fatal; cancellation on dependency change |
| Integration | Stage 1 + Stage 2 flow | 50-node fixture; click → cache miss; second click → cache hit; size-deviation >20% → re-layout |
| Integration | Persona / focus / search interactions | switching persona reruns Stage 1; focusing a child auto-expands its container; search hit adds badge without auto-expanding |
| Visual regression (optional) | Playwright + microservices-demo fixture | baseline screenshots for overview, layer-detail, domain views |
|
||||
|
||||
### 8.3 Performance benchmarks
|
||||
|
||||
Generate fixtures with `scripts/generate-large-graph.mjs` at 500 / 1000 / 3000 nodes. Verify:
|
||||
|
||||
- Stage 1 < 200ms at 500 nodes; < 500ms at 3000 nodes.
|
||||
- Stage 2 any container < 100ms.
|
||||
|
||||
If 3000-node Stage 1 misses the budget, revisit container size estimation or ELK config — do not lower the budget.
|
||||
|
||||
---
|
||||
|
||||
## Open Questions
|
||||
|
||||
None at this point. All decisions made during brainstorming are captured above.
|
||||
|
||||
## Migration Notes
|
||||
|
||||
- `applyDagreLayout` stays in the codebase for one release after this lands and is removed in the following release. This gives a fallback path during the rollout and a clean removal once the ELK layout is stable.
|
||||
- No graph data migration needed.
|
||||
- New dependencies (elkjs, graphology, graphology-communities-louvain) are pure JS, no native bindings — safe across the supported platform matrix.
|
||||
@@ -0,0 +1,587 @@
|
||||
# Semantic Batching and Output Chunking Design
|
||||
|
||||
**Date:** 2026-05-24
|
||||
**Status:** Draft
|
||||
**Branch:** `feat/semantic-batching-and-output-chunking`
|
||||
**Issue:** [#159](https://github.com/Lum1104/Understand-Anything/issues/159) — Frequently seeing output limit exceeded
|
||||
|
||||
---
|
||||
|
||||
## Problem
|
||||
|
||||
The `/understand` skill's Phase 2 dispatches `file-analyzer` subagents in batches of 20-30 files each (`skills/understand/SKILL.md:282`). Two issues compound on output-constrained LLM backends (notably Bedrock OPUS with default max_tokens of 4096-8192):
|
||||
|
||||
1. **Output cap pressure.** Each `file-analyzer` writes one `batch-<N>.json` containing all nodes (file + functions + classes) and edges for its batch. For 25 dense files the JSON content easily exceeds the per-turn `Write(content=...)` token budget. The agent improvises by entering an undefined "minimal output mode" and drops nodes/edges silently. Issue #159 reports this for OPUS on Bedrock at the 100-file scale.
|
||||
|
||||
2. **Count-based batching breaks module semantics.** Files are batched by count, not by logical relationship. Files that import each other (and would together form an `auth` module, an `api` module, etc.) get split across batches. The file-analyzer only sees within-batch edges confidently; `calls`/`related`/`inherits`/`implements` edges between modules get dropped at batch boundaries.
|
||||
|
||||
The existing `recover_imports_from_scan` in `merge-batch-graphs.py:913` is a deterministic safety net for `imports` edges — but it cannot recover semantic edges (calls / related / inherits / implements). Those are lost.
|
||||
|
||||
---
|
||||
|
||||
## Goals
|
||||
|
||||
- Eliminate "Batch X failed (output limit)" from `/understand` runs on Bedrock OPUS for projects up to 500 files.
|
||||
- Improve cross-batch semantic edge coverage by replacing count-based batching with Louvain community detection on the import graph.
|
||||
- Maintain `imports` edge coverage parity (no regression on existing safety net).
|
||||
- Stay within one PR — defer broader refactors to follow-ups (Section "Out of scope").
|
||||
|
||||
## Non-goals
|
||||
|
||||
- Refactoring Phase 1 / 2 tree-sitter usage to deduplicate per-batch extraction.
|
||||
- Adding LLM-generated file summaries to neighborMap.
|
||||
- Auto-tuning output thresholds per provider.
|
||||
|
||||
---
|
||||
|
||||
## Architecture
|
||||
|
||||
Pipeline before:
|
||||
|
||||
```
Phase 1       project-scanner            → scan-result.json (files + importMap)
Phase 2       file-analyzer (×N concur)  → batch-<i>.json (one per batch; SKILL.md prose batching)
Phase 2 end   merge-batch-graphs.py      → assembled-graph.json
```
|
||||
|
||||
Pipeline after:
|
||||
|
||||
```
Phase 1       project-scanner            → scan-result.json (unchanged)
Phase 1.5     compute-batches.mjs        → batches.json (NEW — semantic batching + neighborMap)
Phase 2       file-analyzer (×N concur)  → batch-<i>.json (single) OR batch-<i>-part-<k>.json (split)
Phase 2 end   merge-batch-graphs.py      → assembled-graph.json (verified, no code change)
```
|
||||
|
||||
**Phase 1.5 single responsibility:** topology decision + neighborMap construction. Pure algorithm — reads `scan-result.json`, writes `batches.json`, no LLM calls.
|
||||
|
||||
**Phase 2 changes:** SKILL.md stops doing prose batching; iterates `batches.json` and dispatches one file-analyzer per batch.
|
||||
|
||||
**file-analyzer changes:** consumes neighborMap; self-checks output size before writing; splits into `batch-<i>-part-<k>.json` when above thresholds.
|
||||
|
||||
**merge-batch-graphs.py:** no code changes — the `batch-*.json` glob and sort-key regex already accept multi-part naming. Test fixture and stderr report enhancement added.
|
||||
|
||||
---
|
||||
|
||||
## Component 1 — `compute-batches.mjs`
|
||||
|
||||
**Location:** `understand-anything-plugin/skills/understand/compute-batches.mjs`
|
||||
|
||||
**Invocation:** `node <SKILL_DIR>/compute-batches.mjs $PROJECT_ROOT [--changed-files=<path>]`
|
||||
|
||||
**Input:** `$PROJECT_ROOT/.understand-anything/intermediate/scan-result.json`
|
||||
|
||||
**Output:** `$PROJECT_ROOT/.understand-anything/intermediate/batches.json`
|
||||
|
||||
### Dependencies
|
||||
|
||||
Added to `understand-anything-plugin/package.json`:
|
||||
|
||||
- `graphology` (~10KB)
|
||||
- `graphology-communities-louvain` (~30KB)
|
||||
|
||||
Reuses `@understand-anything/core`'s `TreeSitterPlugin` and `PluginRegistry` (already imported by `extract-structure.mjs`).
|
||||
|
||||
### Algorithm
|
||||
|
||||
```
1. Load scan-result.json.

2. Partition files by fileCategory:
   - codeFiles    = files where fileCategory === "code"
   - nonCodeFiles = the rest

3. Code batching (Louvain on import graph):
   a. Build undirected graph: nodes = codeFiles, edges = importMap relations
      (weight=1, undirected so import and imported-by both count).
   b. Run graphology-communities-louvain → community assignment per file.
   c. For any community with size > 35 (max): split via edge-betweenness greedy
      cut (or simpler weakly-connected-component partition) until each
      sub-community ≤ 35. Log warning per split.
      (Whether this branch fires is decided by the implementation prototype
      step — see "Prototype-first implementation" below.)
   d. Communities with size < 5 are kept as-is. Wasted dispatches are
      bounded by the 5-concurrent cap, and the alternative ("merge small")
      adds edge cases without proportional value.

4. Non-code batching (hardcoded heuristics, moved from SKILL.md prose):
   - Group A: For each directory containing a `Dockerfile`, bundle that
     directory's `Dockerfile` + any `docker-compose.*` + any
     `.dockerignore` → one batch per such directory (so multi-service
     repos with several Dockerfiles get one batch per service).
   - Group B: `.github/workflows/*.yml` files → one batch.
   - Group C: `.gitlab-ci.yml` + files under `.circleci/` → one batch.
   - Group D: SQL files under any `migrations/` or `migration/` directory,
     sorted by filename → one batch per directory.
   - Group E: All other non-code files grouped by their immediate parent
     directory, max 20 per batch.

5. Assign batchIndex: code communities first (1..N), non-code groups
   second (N+1..M).

6. Exports extraction:
   - For each code file, run TreeSitterPlugin.extract() and collect
     top-level exports (function names, class names, exported const names).
   - Per-file failures: catch, set exports = [], emit warning.
   - Non-code files: exports = [].

7. Construct neighborMap (1-hop):
   For each file F in batch B:
     neighborMap[F.path] = [
       { path: G.path, batchIndex: G.batch, symbols: G.exports }
       for G in importMap[F.path] ∪ reverseImportMap[F.path]
       where G.batch ≠ B
     ]
   If neighborMap[F.path].length > 50, truncate to top 50 by neighbor
   degree (highest-imported neighbors kept), emit warning.

8. Construct batchImportData:
   For each batch B:
     batchImportData[F.path] = importMap[F.path] for F in B.files

9. Write batches.json.

Fallback (script-internal): If steps 3a-3c throw, catch → emit warning
→ assign batches by alphabetical chunking (12 files per code batch).
Steps 4, 6, 7, 8 still run normally. Set `algorithm: "count-fallback"`
in the output.
```
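Step 7 is the part most likely to hide off-by-one mistakes, so a small sketch may help. The following is an illustrative Python version only (the script itself is planned as Node.js). The function name `build_neighbor_map`, its parameter shapes, and the sample data in the usage note are assumptions made for this example, and "neighbor degree" is interpreted here as imports plus imported-by.

```python
import sys
from collections import defaultdict

MAX_NEIGHBORS = 50  # truncation limit from step 7


def build_neighbor_map(batch_of, import_map, exports, max_neighbors=MAX_NEIGHBORS):
    """Return {path: [neighbor entries]} for 1-hop neighbors in OTHER batches.

    batch_of:   {path: batchIndex}        (hypothetical input shape)
    import_map: {path: [imported paths]}  (mirrors importMap in scan-result.json)
    exports:    {path: [symbol names]}    (result of step 6)
    """
    # Reverse map so "imported-by" neighbors are found as well.
    reverse = defaultdict(list)
    for src, targets in import_map.items():
        for tgt in targets:
            reverse[tgt].append(src)

    def degree(path):
        # Assumed ranking metric: imports + imported-by.
        return len(import_map.get(path, [])) + len(reverse.get(path, []))

    neighbor_map = {}
    for path, batch in batch_of.items():
        candidates = set(import_map.get(path, [])) | set(reverse.get(path, []))
        # Keep only project files that landed in a different batch.
        cross = [g for g in candidates if g in batch_of and batch_of[g] != batch]
        cross.sort(key=lambda g: (-degree(g), g))  # highest degree first, stable on path
        if len(cross) > max_neighbors:
            print(
                f"Warning: compute-batches: neighborMap for {path} truncated "
                f"from {len(cross)} to top {max_neighbors} (by neighbor degree)",
                file=sys.stderr,
            )
            cross = cross[:max_neighbors]
        neighbor_map[path] = [
            {"path": g, "batchIndex": batch_of[g], "symbols": exports.get(g, [])}
            for g in cross
        ]
    return neighbor_map
```

With `batch_of = {"a.ts": 1, "b.ts": 2, "c.ts": 1}` and `a.ts` importing both other files, only `b.ts` shows up in `neighbor_map["a.ts"]`, because `c.ts` shares batch 1 and is already visible to the analyzer.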
|
||||
|
||||
### Louvain implementation
|
||||
|
||||
Use `graphology-communities-louvain`'s default modularity-greedy algorithm:
|
||||
|
||||
```js
import Graph from 'graphology';
import louvain from 'graphology-communities-louvain';

// `codeFiles` and `importMap` come from scan-result.json (Algorithm steps 1-2).
const graph = new Graph({ type: 'undirected' });
for (const file of codeFiles) graph.addNode(file.path);

// One unweighted edge per importing pair. Targets outside codeFiles are
// skipped, and the hasEdge check avoids duplicates when A and B import each other.
for (const [src, targets] of Object.entries(importMap)) {
  for (const tgt of targets) {
    if (graph.hasNode(src) && graph.hasNode(tgt) && !graph.hasEdge(src, tgt)) {
      graph.addEdge(src, tgt);
    }
  }
}
const communities = louvain(graph); // { nodeId: communityId }
```
|
||||
|
||||
### Output schema (`batches.json`)
|
||||
|
||||
```json
{
  "schemaVersion": 1,
  "algorithm": "louvain",
  "totalFiles": 100,
  "totalBatches": 7,
  "batches": [
    {
      "batchIndex": 1,
      "files": [
        { "path": "src/auth/login.ts", "language": "typescript",
          "sizeLines": 120, "fileCategory": "code" }
      ],
      "batchImportData": {
        "src/auth/login.ts": ["src/auth/session.ts", "src/db/users.ts"]
      },
      "neighborMap": {
        "src/auth/login.ts": [
          { "path": "src/db/users.ts", "batchIndex": 3,
            "symbols": ["User", "findById", "createUser"] }
        ]
      }
    }
  ]
}
```
|
||||
|
||||
`algorithm` is `"louvain"` on the happy path, `"count-fallback"` when the Louvain branch crashed.
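A consumer of `batches.json` may want a cheap structural check before dispatching subagents. The sketch below is one hedged way to do that in Python. The helper name `check_batches` is hypothetical, and the required-key lists are read off the example schema above rather than taken from a formal specification.

```python
REQUIRED_TOP = ("schemaVersion", "algorithm", "totalFiles", "totalBatches", "batches")
REQUIRED_BATCH = ("batchIndex", "files", "batchImportData", "neighborMap")
KNOWN_ALGORITHMS = {"louvain", "count-fallback"}


def check_batches(doc):
    """Return a list of human-readable problems found in a parsed batches.json."""
    problems = [f"missing top-level key: {key}" for key in REQUIRED_TOP if key not in doc]
    if doc.get("algorithm") not in KNOWN_ALGORITHMS:
        problems.append(f"unknown algorithm: {doc.get('algorithm')!r}")
    for batch in doc.get("batches", []):
        for key in REQUIRED_BATCH:
            if key not in batch:
                problems.append(f"batch {batch.get('batchIndex')}: missing {key}")
    return problems
```

An empty list means the document has the expected shape. The check deliberately does not compare `totalBatches` with the length of `batches`, since that relationship is not stated for `--changed-files` runs.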
|
||||
|
||||
### `--changed-files` mode
|
||||
|
||||
When invoked with `--changed-files=<path>`, the script:
|
||||
|
||||
- Loads file paths from `<path>` (one per line).
|
||||
- Still builds the full project import graph (for accurate neighborMap construction).
|
||||
- Only emits batches containing changed files.
|
||||
- neighborMap entries reference unchanged files with their batchIndex from the deterministic full-graph Louvain re-run. The seed is fixed so the assignment is reproducible across incremental invocations.
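The filtering step can be sketched as follows. This is an illustrative Python version under two assumptions that the text above does not settle: a batch that contains at least one changed file is emitted whole, and blank lines in the changed-files list are ignored. Both function names are hypothetical.

```python
def read_changed_files(text):
    """Parse the --changed-files payload: one path per line, blanks ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def select_changed_batches(batches, changed_paths):
    """Keep only batches that contain at least one changed file.

    `batches` follows the batches.json shape: each batch has a `files` list
    whose entries carry a `path`. Batches are returned unmodified, so their
    neighborMap entries can still point at unchanged files.
    """
    changed = set(changed_paths)
    return [
        batch for batch in batches
        if any(entry["path"] in changed for entry in batch["files"])
    ]
```

Because batch assignment is computed on the full graph before filtering, the `batchIndex` values that survive are the same ones a full run would have produced.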
|
||||
|
||||
### Prototype-first implementation
|
||||
|
||||
Before writing the full script, build a minimal skeleton:
|
||||
|
||||
1. Load `scan-result.json` from this repo's `.understand-anything/` directory (if absent, generate via `/understand --full`).
|
||||
2. Run Louvain only — no size enforcement, no neighborMap.
|
||||
3. Print community size distribution.
|
||||
4. Decide: do real-world communities cluster in [5, 35]? If yes, size enforcement branch may be unnecessary or trivially defensive. If no, implement edge-betweenness split.
|
||||
|
||||
This gates the more speculative code (size enforcement) on empirical observation rather than upfront design.
|
||||
|
||||
---
|
||||
|
||||
## Component 2 — `skills/understand/SKILL.md` changes
|
||||
|
||||
### Add — Phase 1.5 section (after Phase 1)
|
||||
|
||||
```markdown
|
||||
## Phase 1.5 — BATCH
|
||||
|
||||
Report: `[Phase 1.5/7] Computing semantic batches...`
|
||||
|
||||
Run the bundled batching script:
|
||||
\`\`\`bash
|
||||
node <SKILL_DIR>/compute-batches.mjs $PROJECT_ROOT
|
||||
\`\`\`
|
||||
|
||||
Reads `.understand-anything/intermediate/scan-result.json`, writes
|
||||
`.understand-anything/intermediate/batches.json`.
|
||||
|
||||
Capture stderr. Append any line starting with `Warning:` to
|
||||
$PHASE_WARNINGS for the final report.
|
||||
|
||||
If the script exits non-zero, the failure is hard — relay the full
|
||||
stderr to the user as a Phase 1.5 failure. Do not attempt to recover;
|
||||
the script's internal fallback (count-based) already handles recoverable
|
||||
issues. A non-zero exit means a fundamental problem (missing input file,
|
||||
malformed JSON, etc.).
|
||||
```
|
||||
|
||||
### Replace — Phase 2 ANALYZE section (current SKILL.md:280-332)
|
||||
|
||||
Delete the existing "Batch the file list from Phase 1 into groups of 20-30 files each" prose, the non-code grouping prose (now in compute-batches), and the dispatch-time `batchImportData` construction prose (now provided in batches.json). Replace with:
|
||||
|
||||
```markdown
|
||||
## Phase 2 — ANALYZE
|
||||
|
||||
### Full analysis path
|
||||
|
||||
Load `.understand-anything/intermediate/batches.json` (produced by
|
||||
Phase 1.5). Iterate the `batches[]` array.
|
||||
|
||||
Report: `[Phase 2/7] Analyzing files — <totalFiles> files in
|
||||
<totalBatches> batches (up to 5 concurrent)...`
|
||||
|
||||
For each batch, dispatch a `file-analyzer` subagent (up to 5
|
||||
concurrent). Dispatch prompt template:
|
||||
|
||||
> Analyze these files and produce GraphNode and GraphEdge objects.
|
||||
> Project root: `$PROJECT_ROOT`
|
||||
> Project: `<projectName>`
|
||||
> Languages: `<languages>`
|
||||
> Batch: `<batchIndex>/<totalBatches>`
|
||||
> Skill directory: `<SKILL_DIR>`
|
||||
> Output: write to
|
||||
> `$PROJECT_ROOT/.understand-anything/intermediate/batch-<batchIndex>.json`
|
||||
> (single-file mode) OR `batch-<batchIndex>-part-<k>.json` (split mode,
|
||||
> per Step B of your output protocol).
|
||||
>
|
||||
> Pre-resolved import data (use directly — do NOT re-resolve from source):
|
||||
> \`\`\`json
|
||||
> <batchImportData JSON inline from batches.json[i].batchImportData>
|
||||
> \`\`\`
|
||||
>
|
||||
> Cross-batch neighbors with their exported symbols (confidence boost
|
||||
> for cross-batch edges):
|
||||
> \`\`\`json
|
||||
> <neighborMap JSON inline from batches.json[i].neighborMap>
|
||||
> \`\`\`
|
||||
>
|
||||
> Files to analyze:
|
||||
> 1. `<path>` (<sizeLines> lines, language: `<language>`,
|
||||
> fileCategory: `<fileCategory>`)
|
||||
> ...
|
||||
|
||||
$LANGUAGE_DIRECTIVE
|
||||
|
||||
After ALL batches complete, run the merge-and-normalize script:
|
||||
\`\`\`bash
|
||||
python <SKILL_DIR>/merge-batch-graphs.py $PROJECT_ROOT
|
||||
\`\`\`
|
||||
|
||||
(Rest of Phase 2 unchanged.)
|
||||
```
|
||||
|
||||
### Replace — Incremental update path (current SKILL.md:355-366)
|
||||
|
||||
```markdown
|
||||
### Incremental update path
|
||||
|
||||
Run compute-batches.mjs with `--changed-files=<path>`, where `<path>`
|
||||
is a temp file listing changed file paths (one per line). The script
|
||||
reuses the full project's import graph for neighborMap computation
|
||||
but only emits batches containing changed files. Dispatch file-analyzer
|
||||
subagents per the same template as the full path.
|
||||
```
|
||||
|
||||
### Line budget
|
||||
|
||||
Net added LLM-context prose: Phase 1.5 (~12 lines) + Phase 2 template clarifications (~5 lines) − removed batching prose (~15 lines) − removed batchImportData construction prose (~6 lines) ≈ **−4 lines**.
|
||||
|
||||
---
|
||||
|
||||
## Component 3 — `agents/file-analyzer.md` changes
|
||||
|
||||
### Add — Cross-batch context section
|
||||
|
||||
Insert after "Step 1: Input file construction":
|
||||
|
||||
```markdown
|
||||
### Cross-batch context (neighborMap)
|
||||
|
||||
Your dispatch prompt includes a `neighborMap` — for each file in your
|
||||
batch, it lists project-internal neighbors in OTHER batches (files that
|
||||
import yours or that you import), with their exported symbols.
|
||||
|
||||
Use neighborMap as a confidence boost for cross-batch edges (`calls`,
|
||||
`related`, `inherits`, `implements` to nodes outside your batch):
|
||||
|
||||
- If your source clearly references a symbol that appears in some
|
||||
`neighbor.symbols`, emit the edge to
|
||||
`function:<neighbor.path>:<symbol>` or
|
||||
`class:<neighbor.path>:<symbol>` with confidence.
|
||||
- If your source references a cross-batch symbol that is NOT in
|
||||
neighborMap (the project-scanner may not have extracted it), you may
|
||||
still emit the edge if you saw it explicitly in the imported file's
|
||||
surface — but prefer matching neighborMap symbols when available.
|
||||
- Imports continue to use `batchImportData` (fully resolved), not
|
||||
neighborMap.
|
||||
|
||||
The merge script's dangling-edge dropper is the safety net for
|
||||
genuinely unresolvable targets.
|
||||
```
|
||||
|
||||
### Replace — Writing Results section (current file-analyzer.md:467-475)
|
||||
|
||||
```markdown
|
||||
## Writing Results — single or multi-part
|
||||
|
||||
**Step A — Compute totals.**
|
||||
\`\`\`
|
||||
nodeCount = nodes.length
|
||||
edgeCount = edges.length
|
||||
\`\`\`
|
||||
|
||||
**Step B — Decide split.**
|
||||
- If `nodeCount ≤ 60` AND `edgeCount ≤ 120`: write ONE file to
|
||||
`.understand-anything/intermediate/batch-<batchIndex>.json`. Done.
|
||||
Skip to Step E.
|
||||
- Otherwise: `parts = ceil(max(nodeCount / 60, edgeCount / 120))`.
|
||||
|
||||
**Step C — Partition.**
|
||||
Sort files in your batch alphabetically by path. Chunk them sequentially
|
||||
into `parts` groups of size `ceil(N / parts)`. For each part:
|
||||
- All nodes whose `filePath` is in this part's files (for non-file
|
||||
nodes like `module`/`concept`, use the file they belong to).
|
||||
- All edges whose `source` is in this part's nodes (target may be
|
||||
anywhere — same part, different part of same batch, different batch).
|
||||
|
||||
**Step D — Write each part.**
|
||||
Write part `k` (1-indexed) to
|
||||
`.understand-anything/intermediate/batch-<batchIndex>-part-<k>.json`.
|
||||
Each part is a valid GraphFragment: `{ "nodes": [...], "edges": [...] }`.
|
||||
|
||||
**Step E — Self-validate.**
|
||||
For each file written, verify:
|
||||
- Valid JSON.
|
||||
- `nodes` array exists and is well-formed.
|
||||
- For every edge: `source` and `target` both appear as either (a) a
|
||||
node `id` in this part's nodes, OR (b) a `file:<path>` reference
|
||||
where `<path>` is in `neighborMap` or `batchImportData`, OR (c) a
|
||||
`function:<path>:<symbol>` / `class:<path>:<symbol>` reference where
|
||||
`<symbol>` is in some `neighbor.symbols`.
|
||||
|
||||
If validation fails on a part, do NOT silently rebuild. Respond with
|
||||
an explicit error stating which part failed, which edge(s) failed
|
||||
validation, and why. The dispatching session can then retry.
|
||||
|
||||
**Step F — Respond.**
|
||||
Respond with ONLY a brief text summary: parts written (1 or more),
|
||||
total nodes/edges across all parts, any files skipped. Do NOT include
|
||||
JSON content in the response.
|
||||
```
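Steps B and C reduce to a small amount of arithmetic, sketched here in Python for reference. The thresholds are the ones stated in the prose above. The function names `part_count` and `partition_files` are hypothetical, and the agent is expected to follow the prose rather than run this code.

```python
import math

NODE_LIMIT = 60   # per-part node threshold from Step B
EDGE_LIMIT = 120  # per-part edge threshold from Step B


def part_count(node_count, edge_count):
    """Step B: 1 when both totals fit, else ceil(max(nodes/60, edges/120))."""
    if node_count <= NODE_LIMIT and edge_count <= EDGE_LIMIT:
        return 1
    return math.ceil(max(node_count / NODE_LIMIT, edge_count / EDGE_LIMIT))


def partition_files(paths, parts):
    """Step C: sort paths alphabetically, then chunk sequentially."""
    ordered = sorted(paths)
    if not ordered:
        return []
    size = math.ceil(len(ordered) / parts)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

For example, 150 nodes and 500 edges give `max(2.5, 4.17)`, so five parts. Note that chunking by `ceil(N / parts)` can yield fewer groups than `parts` (four files split three ways produce two groups of two), so part numbering should follow the groups actually produced.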
|
||||
|
||||
### Threshold rationale
|
||||
|
||||
`60 nodes / 120 edges per part` derives from:
|
||||
|
||||
- File node JSON serialized ≈ 150-300 chars; function/class ≈ 80-150 chars; edge ≈ 100-150 chars.
|
||||
- 60 nodes + 120 edges ≈ 25-35KB JSON ≈ 7000-9000 output tokens (JSON tokenization is dense).
|
||||
- Bedrock OPUS default `max_tokens` 4096-8192 → ~10% safety margin.
|
||||
|
||||
These constants live as file-analyzer.md prose for now. Auto-tuning per provider is deferred to follow-up.
|
||||
|
||||
---
|
||||
|
||||
## Component 4 — `merge-batch-graphs.py` (verify-only)
|
||||
|
||||
### Confirmed compatibility
|
||||
|
||||
The existing glob and sort-key already handle multi-part files transparently:
|
||||
|
||||
- `intermediate_dir.glob("batch-*.json")` matches `batch-3-part-1.json`.
|
||||
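- `re.search(r"batch-(\d+)", p.stem)` extracts `3` from `batch-3-part-1`, giving the same sort key as `batch-3.json`. Python's `sorted` is stable, so parts that tie on this key keep the order in which `glob` yielded them, which depends on the filesystem and is not guaranteed to be lexicographic.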
|
||||
- `merge_and_normalize` walks `all_nodes.extend(...)` / `all_edges.extend(...)`; load order does not affect dedup correctness.
|
||||
- `recover_imports_from_scan` operates on the merged graph — transparent to multi-part inputs.
|
||||
- `link_tests` operates on the merged node pool — transparent.
|
||||
|
||||
No code change required for correctness.
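The glob and sort-key claims can be checked in isolation with a few lines of Python. This is a standalone sketch, not the code in `merge-batch-graphs.py`: the helper name `batch_sort_key` and its fallback of `0` for non-matching names are assumptions, and `fnmatch` stands in for the pattern matching that `Path.glob` performs.

```python
import re
from fnmatch import fnmatch
from pathlib import Path


def batch_sort_key(path):
    """Numeric batch index taken from the stem; part suffixes are ignored."""
    match = re.search(r"batch-(\d+)", path.stem)
    return int(match.group(1)) if match else 0  # fallback value is an assumption


names = ["batch-10.json", "batch-3-part-2.json", "batch-3-part-1.json", "batch-2.json"]

# Every name, single or multi-part, matches the existing glob pattern.
matches_glob = all(fnmatch(name, "batch-*.json") for name in names)

# Numeric ordering puts batch 10 last; the two batch-3 parts tie on key 3
# and therefore keep their relative order from the input list.
ordered = [p.name for p in sorted((Path(n) for n in names), key=batch_sort_key)]
```

Here the two `batch-3` parts come out in the same relative order they went in, which is harmless given that load order does not affect dedup correctness.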
|
||||
|
||||
### Add — Multi-part awareness in stderr report
|
||||
|
||||
`merge-batch-graphs.py:1026` currently prints `Found {N} batch files:`. Enhance:
|
||||
|
||||
```python
from collections import defaultdict

# Group physical files by logical batch index.
by_batch = defaultdict(list)
for f in batch_files:
    m = re.match(r"batch-(\d+)(?:-part-(\d+))?\.json", f.name)
    if m:
        by_batch[int(m.group(1))].append(f.name)

logical_count = len(by_batch)
multi_part = sum(1 for files in by_batch.values() if len(files) > 1)
print(
    f"Found {len(batch_files)} batch files "
    f"({logical_count} logical batches, {multi_part} multi-part)",
    file=sys.stderr,
)
```
|
||||
|
||||
### Add — Missing-part warning
|
||||
|
||||
After grouping, detect logical batches with non-contiguous part numbers (e.g. parts `{2, 3}` present but `1` missing) and emit:
|
||||
|
||||
```
|
||||
Warning: merge: batch <i> has parts {<set>} but missing part {<missing>}
|
||||
— possible truncated write — affected nodes/edges may be lost
|
||||
```
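A possible shape for the detection logic is sketched below. It assumes part numbers start at 1 and treats any gap below the highest part seen as missing. The function name `missing_part_warnings` is hypothetical, and the em dash separator is written as an escape so the output matches the required warning text.

```python
import re
from collections import defaultdict

PART_RE = re.compile(r"batch-(\d+)-part-(\d+)\.json$")
SEP = " \u2014 "  # em dash separator used by the warning format


def _braces(numbers):
    return "{" + ", ".join(str(n) for n in sorted(numbers)) + "}"


def missing_part_warnings(file_names):
    """Return one warning per logical batch whose part numbers have a gap."""
    parts_by_batch = defaultdict(set)
    for name in file_names:
        match = PART_RE.match(name)
        if match:  # single-file batches (batch-<i>.json) are ignored
            parts_by_batch[int(match.group(1))].add(int(match.group(2)))

    warnings = []
    for batch, parts in sorted(parts_by_batch.items()):
        missing = set(range(1, max(parts) + 1)) - parts
        if missing:
            warnings.append(
                f"Warning: merge: batch {batch} has parts {_braces(parts)} "
                f"but missing part {_braces(missing)}{SEP}possible truncated write"
                f"{SEP}affected nodes/edges may be lost"
            )
    return warnings
```

One limitation of this approach: a missing final part cannot be detected from file names alone, because the total part count is not recorded anywhere the merge script can read.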
|
||||
|
||||
---
|
||||
|
||||
## Failure modes & observability
|
||||
|
||||
| Failure point | Behavior | Safety net | Required warning text |
|---|---|---|---|
| Louvain library throws | exception | Script-internal: catch → count-based fallback (12 files/batch); neighborMap still built | `Warning: compute-batches: Louvain failed (<msg>) — falling back to count-based grouping (12 files/batch) — module semantic boundaries lost` |
| tree-sitter exports per-file failure | empty exports | symbols=[] in neighborMap | `Warning: compute-batches: exports extraction failed for <path> (<msg>) — symbols=[] in neighborMap — cross-batch edges to this file limited to file-level` |
| Louvain produces oversized community | size > 35 | Edge-betweenness split | `Warning: compute-batches: community size <N> > max 35 — splitting via edge-betweenness — modularity may decrease` |
| compute-batches complete crash | exit non-zero, no batches.json | SKILL.md surfaces full stderr to user; no Phase 2 fallback | (script's own error to stderr; SKILL.md relays verbatim) |
| neighborMap truncation | > 50 neighbors | Top-50 by degree kept | `Warning: compute-batches: neighborMap for <path> truncated from <N> to top 50 (by neighbor degree)` |
| file-analyzer part JSON malformed | `load_batch` skips | Existing `load_batch:139` warns and skips | (existing — verify the warning is not swallowed) |
| Missing part in multi-part batch | gap in parts | merge detects and warns | `Warning: merge: batch <i> has parts {<set>} but missing part {<missing>} — possible truncated write — affected nodes/edges may be lost` |
| file-analyzer dangling edges | source/target missing | merge drops, adds to `unfixable` (existing) | (existing) |
| file-analyzer dispatch fails | subagent error | existing retry-once mechanism | (existing) |
|
||||
|
||||
### Observability invariant
|
||||
|
||||
Every fallback / degrade / drop MUST:
|
||||
|
||||
1. Write a stderr line in `Warning: <component>: <what happened> — <why> — <impact>` format.
|
||||
2. Bubble up to `$PHASE_WARNINGS` (SKILL.md existing mechanism) → user-facing Phase 7 final report.
|
||||
3. Never use silent `catch {}` / `except: pass`. Code review treats this as a blocker.
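One way to make the invariant hard to violate is to route every degrade path through a single helper. The sketch below is a Python illustration of that idea. The names `format_warning`, `warn`, and `safe_exports` are hypothetical, and the `sink` list is a stand-in for whatever collects lines destined for `$PHASE_WARNINGS`.

```python
import sys

SEP = " \u2014 "  # em dash separator required by the warning format


def format_warning(component, what, why, impact):
    """Build a line in the `Warning: <component>: <what> - <why> - <impact>` shape."""
    return f"Warning: {component}: {what}{SEP}{why}{SEP}{impact}"


def warn(component, what, why, impact, sink=None):
    """Write the warning to stderr and optionally record it for later reporting."""
    line = format_warning(component, what, why, impact)
    print(line, file=sys.stderr)
    if sink is not None:
        sink.append(line)
    return line


def safe_exports(extract, path, sink):
    """Run an extractor; on failure degrade to [] but always say so."""
    try:
        return extract(path)
    except Exception as exc:  # broad on purpose: any failure degrades, none is silent
        warn(
            "compute-batches",
            f"exports extraction failed for {path} ({exc})",
            "symbols=[] in neighborMap",
            "cross-batch edges to this file limited to file-level",
            sink,
        )
        return []
```

The point of the design is that the fallback value and the warning are produced in the same place, so a reviewer never has to hunt for a matching log line.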
|
||||
|
||||
### Invariants
|
||||
|
||||
1. **scan-result.json is source of truth.** Any batching/topology change preserves importMap; `recover_imports_from_scan` always restores `imports` edges.
|
||||
2. **Dangling-edge dropper is final defense.** No batch-generated edge can connect to a nonexistent node in the assembled graph.
|
||||
3. **No silent fallback.** `batches.json` missing → loud failure. Internal compute-batches fallback → loud warning that bubbles to user.
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
### Unit tests — `compute-batches.mjs`
|
||||
|
||||
New file: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs` (Vitest).
|
||||
|
||||
Required cases:
|
||||
|
||||
- **Louvain basic:** 3 disjoint cliques → 3 batches.
|
||||
- **Empty importMap:** independent files → count-fallback batches by alphabetical chunking.
|
||||
- **Oversized community:** 50-node complete graph → split triggered, all sub-batches ≤ 35.
|
||||
- **Non-code grouping A:** `Dockerfile` + `docker-compose.yml` + `.dockerignore` siblings → one batch per directory cluster.
|
||||
- **Non-code grouping B:** `.github/workflows/*.yml` → one batch.
|
||||
- **Non-code grouping D:** SQL migrations under `migrations/` → one batch per directory.
|
||||
- **Mixed code + non-code:** non-code batchIndex follows code batches.
|
||||
- **neighborMap correctness:** file A imports file B across batches → `neighborMap[A]` contains `{path: B, batchIndex: B's, symbols: B's exports}`.
|
||||
- **neighborMap excludes same-batch:** A and C in same batch → `neighborMap[A]` does not contain C.
|
||||
- **Exports failure tolerance:** mock TreeSitter to throw on one file → `exports = []` for that file, others unaffected.
|
||||
- **`--changed-files`:** input subset → output contains only batches with changed files; neighborMap may reference unchanged files.
|
||||
- **Fallback triggers:** mock Louvain throw → `algorithm` field = `"count-fallback"`, warning in stderr.
|
||||
- **Warning assertion per fallback:** for each of {Louvain crash, exports failure, oversize split, neighborMap truncation}, assert the exact warning string appears in stderr.
|
||||
|
||||
### Unit tests — `merge-batch-graphs.py`
|
||||
|
||||
New test class `TestMultiPart` in `test_merge_batch_graphs.py`:
|
||||
|
||||
- Two parts of one logical batch: `batch-1-part-1.json` + `batch-1-part-2.json` → assembled contains all nodes/edges from both.
|
||||
- Three parts of one logical batch.
|
||||
- Cross-part edges: edge with source in part-1, target node in part-2 → connected after merge.
|
||||
- Malformed part-1 + valid part-2: part-1 skipped with warning, part-2 contents present.
|
||||
- Mixed single-batch and multi-part inputs.
|
||||
- Missing part detection: `batch-1-part-2.json` + `batch-1-part-3.json` (no part-1) → warning emitted with exact text.
|
||||
- stderr format: assert `"X logical batches, Y multi-part"` appears.
|
||||
|
||||
### Integration — PR acceptance gate (manual)
|
||||
|
||||
Documented in the PR's Test plan:
|
||||
|
||||
- [ ] `pnpm install` (graphology installs cleanly).
|
||||
- [ ] `pnpm --filter @understand-anything/core build`.
|
||||
- [ ] Run `/understand --full` on this repo (Understand-Anything itself):
|
||||
- `batches.json` generated; community size distribution sanity-check (mix of small and medium batches).
|
||||
- At least one batch produces multi-part output.
|
||||
- `assembled-graph.json` node/edge counts within expected range vs current main.
|
||||
- Dashboard renders normally.
|
||||
- Phase 7 final report includes any `$PHASE_WARNINGS` from compute-batches (visually verify warnings reach user-facing output, not just stderr).
|
||||
- [ ] Run on a ~100-file repo matching ayushghosh's scenario; confirm no "output limit" errors.
|
||||
- [ ] Run on a 5-10 file small repo: fallback path (all one batch) works correctly.
|
||||
|
||||
### Not tested
|
||||
|
||||
- Louvain algorithm correctness (trust `graphology-communities-louvain`'s own tests).
|
||||
- Performance benchmarks (sub-second on 100-500 files is empirical; not gated).
|
||||
- Multiple LLM provider output-cap variations (thresholds are conservative for Bedrock OPUS; first-party Anthropic is more permissive).
|
||||
|
||||
---
|
||||
|
||||
## Out of scope (tracked for follow-up)
|
||||
|
||||
### Tree-sitter deduplication
|
||||
|
||||
Currently Phase 1 (project-scanner), Phase 1.5 (compute-batches), and Phase 2 (file-analyzer per-batch) each run tree-sitter independently. Consolidating into a single Phase 1.5 structure extraction would simplify file-analyzer and save time on large projects. Defer because it requires reorganizing file-analyzer's protocol significantly.
|
||||
|
||||
### neighborMap LLM summaries
|
||||
|
||||
Adding one-sentence summaries per file to neighborMap would enable file-analyzer to emit `related` edges across batches with semantic justification. Requires a new lightweight summary-pass agent; defer until the tree-sitter dedup lands (Phase 1.5 will already have full structure → cheaper to add).
|
||||
|
||||
### Adaptive thresholds
|
||||
|
||||
`60 nodes / 120 edges` are conservative for Bedrock OPUS. Anthropic first-party supports much larger output caps. Adding a `--output-cap=<N>` CLI to compute-batches and propagating to file-analyzer would unlock larger parts on permissive backends. Track real-world part counts before implementing.
|
||||
|
||||
### Cross-batch edge audit
|
||||
|
||||
A post-merge audit comparing neighborMap-suggested edges vs actually-emitted edges would surface gaps. Mirror the existing `recover_imports_from_scan` pattern. Requires preserving `batches.json` for merge-time consumption.
|
||||
|
||||
### Multi-language monorepo handling
|
||||
|
||||
Multi-language repos (TS + Python) tend to naturally split via Louvain (no cross-language imports). Bridge files (OpenAPI, protobuf) might create odd communities. Address only if real reports surface.
|
||||
|
||||
---
|
||||
|
||||
## Implementation order
|
||||
|
||||
1. **Prototype:** minimal `compute-batches.mjs` skeleton — load scan-result.json, run Louvain, print community sizes. Run against this repo's `scan-result.json` (generate if missing via `/understand --full`). Decide whether size-enforcement branch is needed; if needed, choose between edge-betweenness and weakly-connected-component split.
|
||||
2. Add exports extraction (reuse TreeSitterPlugin).
|
||||
3. Add neighborMap construction + batchImportData passthrough.
|
||||
4. Add non-code grouping heuristics (Groups A-E).
|
||||
5. Add fallback path + warning emissions for every failure mode listed in the Failure modes table.
|
||||
6. Write unit tests for compute-batches (per Testing section), including warning-text assertions.
|
||||
7. Modify `agents/file-analyzer.md` — add Cross-batch context section, replace Writing Results.
|
||||
8. Modify `skills/understand/SKILL.md` — add Phase 1.5, replace Phase 2 ANALYZE batching prose, replace incremental path.
|
||||
9. Add multi-part stderr report + missing-part warning to `merge-batch-graphs.py`.
|
||||
10. Write unit tests for `merge-batch-graphs.py` multi-part handling.
|
||||
11. Add `graphology` + `graphology-communities-louvain` to `understand-anything-plugin/package.json`.
|
||||
12. Run integration acceptance gate.
|
||||
13. Bump version in all five `package.json` / `plugin.json` files per the project's CLAUDE.md versioning rule.
|
||||