Business Domain Knowledge Extraction — Design Spec
Issue: #61 Date: 2026-04-01
Problem
The current knowledge graph shows file-level dependency relationships, but this has limited value: an IDE already shows imports. In a codebase with many files, listing dependency edges doesn't reduce cognitive load; you still have to mentally reconstruct what the code does. What's needed is business domain knowledge: the logic and domain concepts embedded within the code, not the structural wiring.
Solution Overview
A new /understand-domain skill that extracts business domain knowledge and renders it as a horizontal flow graph in the dashboard. Two viewing modes: a high-level Domain view (default when available) and the existing Structural view, with a toggle to switch between them.
Architecture: Separate File, Shared Schema (Approach C)
Domain data lives in a separate file (domain-graph.json) using the same KnowledgeGraph type system — extended with new node/edge types. The dashboard detects both files and offers a view toggle. Domain nodes can reference structural nodes by ID for drill-down.
Why separate files:
- `/understand-domain` works standalone (lightweight) or alongside the full graph
- Shared schema means search, validation, and filtering work for both
- No risk of polluting the structural graph
- Each file is independently valid
Section 1: Domain Graph Schema
Three-Level Hierarchy
- Business Domain (top) — e.g., "Purchasing", "Logistics", "Warehouse Management"
- Business Flow (mid) — e.g., "Create Order", "Process Refund"
- Business Step (leaf) — e.g., "Validate input", "Check inventory", "Persist order"
New Node Types (3)
| Type | Purpose | Example |
|---|---|---|
| `domain` | Business domain cluster | "Order Management", "Logistics" |
| `flow` | A business process within a domain | "Create Order", "Process Refund" |
| `step` | A single step in a flow | "Validate order input" |
New Edge Types (4)
| Type | Purpose |
|---|---|
| `contains_flow` | domain → flow |
| `flow_step` | flow → step (ordered via `weight` field, e.g., 0.1, 0.2, ...) |
| `cross_domain` | domain → domain (interaction between domains) |
| `implements` | step → file/function node ID (reference into structural graph) |
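To make the `flow_step` ordering concrete, here is a minimal Python sketch that recovers a flow's steps in order. It is illustrative only: the dict-based edge shape with `source`, `target`, `type`, and `weight` keys is an assumption, not the actual KnowledgeGraph type, and the helper name `ordered_steps` is hypothetical.

```python
from typing import Dict, List


def ordered_steps(flow_id: str, edges: List[Dict]) -> List[str]:
    """Return step IDs for one flow, sorted by the weight on flow_step edges."""
    step_edges = [
        e for e in edges
        if e["type"] == "flow_step" and e["source"] == flow_id
    ]
    # Lower weight means earlier in the flow (0.1, 0.2, ...).
    step_edges.sort(key=lambda e: e["weight"])
    return [e["target"] for e in step_edges]


# Assumed edge shape, deliberately listed out of order.
edges = [
    {"source": "flow:create-order", "target": "step:create-order:persist-order",
     "type": "flow_step", "weight": 0.3},
    {"source": "flow:create-order", "target": "step:create-order:validate-input",
     "type": "flow_step", "weight": 0.1},
    {"source": "flow:create-order", "target": "step:create-order:check-inventory",
     "type": "flow_step", "weight": 0.2},
    {"source": "domain:order-management", "target": "flow:create-order",
     "type": "contains_flow", "weight": 1.0},
]

print(ordered_steps("flow:create-order", edges))
```

Only `flow_step` edges whose source is the given flow are considered, so `contains_flow` and other edge types do not affect the ordering.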
Domain Node Structure
// domain node
{
id: "domain:order-management",
type: "domain",
name: "Order Management",
summary: "Handles the complete order lifecycle...",
tags: ["e-commerce", "core-business"],
complexity: "complex",
domainMeta: { // optional field
entities: ["Order", "LineItem", "OrderStatus"],
businessRules: ["Orders require inventory check before confirmation"],
crossDomainInteractions: ["Triggers Logistics on order confirmed", "Reads from Customer Service for buyer info"]
}
}
Flow Node Structure
{
id: "flow:create-order",
type: "flow",
name: "Create Order",
summary: "Customer submits a new order through the API",
tags: ["write-path", "api"],
complexity: "moderate",
domainMeta: { // optional field
entryPoint: "POST /api/orders",
entryType: "http" // one of "http" | "cli" | "event" | "cron" | "manual"
}
}
Step Node Structure
{
id: "step:create-order:validate-input",
type: "step",
name: "Validate order input",
summary: "Checks request body against order schema, rejects invalid payloads",
tags: ["validation"],
complexity: "simple",
filePath: "src/validators/order-validator.ts",
lineRange: [12, 45]
}
File Output
Saved to .understand-anything/domain-graph.json — same KnowledgeGraph shape, valid on its own.
Section 2: Analysis Pipeline
Two Paths, Same Output
Path 1: Lightweight scan (no existing graph)
File tree scan
→ Static entry point detection (tree-sitter)
→ Route definitions, exported handlers, main(), event listeners, cron decorators
→ Feed to LLM: file tree + detected entry points + sampled file contents
→ LLM outputs: domains, flows, steps, cross-domain interactions
→ Build domain-graph.json
Token cost: ~10-20% of a full /understand scan.
Path 2: Derive from existing graph
Load knowledge-graph.json
→ Extract: all nodes, edges, layers, summaries, tour
→ Feed to LLM: graph data as structured context
→ LLM outputs: domains, flows, steps, cross-domain interactions
→ Build domain-graph.json
This path is very cheap: no file reading is needed, because the LLM reasons over the existing summaries and call edges.
Path Selection: /understand-domain checks if .understand-anything/knowledge-graph.json exists. If yes → Path 2. If no → Path 1.
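The selection rule can be sketched in a few lines of Python. This is a sketch under assumptions: the function name `select_path` and the `force_full` parameter are hypothetical, with `force_full` standing in for the optional `--full` flag described in the skill definition.

```python
from pathlib import Path

# Location of the structural graph, relative to the project root.
GRAPH_FILE = Path(".understand-anything") / "knowledge-graph.json"


def select_path(project_root: Path, force_full: bool = False) -> int:
    """Return 2 when an existing graph can be reused, otherwise 1."""
    if force_full:
        # Hypothetical switch: always rescan (Path 1) even if a graph exists.
        return 1
    return 2 if (project_root / GRAPH_FILE).is_file() else 1
```

Checking with `is_file()` rather than `exists()` means a stray directory with the same name does not trigger Path 2.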
Agent Structure
One new agent: domain-analyzer (opus model). Handles both paths. For large codebases, can batch by detected entry point groups.
Section 3: Preprocessing Script & Skill Integration
Script: understand-anything-plugin/skills/understand-domain/extract-domain-context.py
Bundled with the skill (not in scripts/ which is for development tooling). Runs before the LLM agent. Outputs .understand-anything/intermediate/domain-context.json:
{
"fileTree": ["src/api/orders.ts", "src/services/...", "..."],
"entryPoints": [
{
"file": "src/api/orders.ts",
"type": "http",
"method": "POST",
"path": "/api/orders",
"handler": "createOrder",
"lineRange": [15, 45],
"snippet": "async function createOrder(req, res) { ... }"
}
],
"fileSignatures": {
"src/services/order-service.ts": {
"exports": ["createOrder", "cancelOrder", "getOrderById"],
"imports": ["inventory-service", "pricing-service", "order-repo"],
"summary": null
}
}
}
A Python script with no heavy dependencies (uses `ast` for Python, regex for other languages). It will:
- Walk the file tree (respecting `.gitignore`)
- Detect entry points by pattern: route decorators, `app.get/post`, `export default handler`, `main()`, event listeners
- Extract function signatures and import/export lists per file
- Keep code snippets short (signature + first few lines, not full bodies)
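Two of these responsibilities can be sketched as follows. The sketch makes assumptions: the function names, the regex, and the single `line` field (used instead of a full `lineRange`, which a one-line regex match cannot determine) are illustrative and not taken from the actual script.

```python
import ast
import re
from typing import Dict, List

# Matches Express-style registrations such as: app.post("/api/orders", createOrder)
ROUTE_RE = re.compile(
    r"""\b(?:app|router)\.(get|post|put|patch|delete)\(\s*["']([^"']+)["']\s*,\s*(\w+)"""
)


def detect_js_routes(file: str, source: str) -> List[Dict]:
    """Best-effort regex scan for HTTP route registrations in JS/TS source."""
    entries = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = ROUTE_RE.search(line)
        if m:
            entries.append({
                "file": file,
                "type": "http",
                "method": m.group(1).upper(),
                "path": m.group(2),
                "handler": m.group(3),
                "line": lineno,
            })
    return entries


def python_signature(source: str) -> Dict[str, List[str]]:
    """Collect top-level function names and imported modules using ast."""
    tree = ast.parse(source)
    exports, imports = [], []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            exports.append(node.name)
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.append(node.module)
    return {"exports": exports, "imports": imports}
```

The regex path only catches handlers passed by name on a single line; inline arrow functions and multi-line registrations would need additional patterns, which fits the best-effort framing of entry point detection.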
Skill Integration
The /understand-domain skill markdown:
- Runs `understand-anything-plugin/skills/understand-domain/extract-domain-context.py`
- Checks for existing `knowledge-graph.json`
- If exists → passes both `domain-context.json` + graph data to domain-analyzer agent
- If not → passes only `domain-context.json`
- Agent outputs `domain-graph.json`
- Cleans up intermediate files
- Auto-triggers `/understand-dashboard`
Section 4: Dashboard — Domain View
View Toggle
- Top-left corner: pill toggle — "Domain" / "Structural"
- Domain view is default when `domain-graph.json` exists
- If only one graph file exists, no toggle shown
- Switching views preserves sidebar state
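The toggle rules above reduce to a small decision function. A minimal sketch, written in Python for illustration only (the dashboard itself is a separate package, and the name `initial_view` is hypothetical):

```python
from typing import Optional, Tuple


def initial_view(has_domain: bool, has_structural: bool) -> Tuple[Optional[str], bool]:
    """Return (default view, whether to show the toggle)."""
    if has_domain and has_structural:
        # Both graph files present: default to Domain and offer the toggle.
        return "domain", True
    if has_domain:
        return "domain", False
    if has_structural:
        return "structural", False
    # Neither file exists, so there is nothing to render.
    return None, False
```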
Horizontal Flow Layout
- Layout engine: Dagre with `rankdir: "LR"` (left-to-right)
- Zoom levels:
  - Zoomed out: Domain clusters as large rounded rectangles, `cross_domain` edges between them
  - Click domain: Expands to show flows as horizontal lanes
  - Click flow: Shows step-by-step trace left-to-right
Domain Cluster Rendering
┌─────────────────────────────────────┐
│ Order Management │
│ "Handles the complete order..." │
│ │
│ Entities: Order, LineItem, Status │
│ Flows: Create Order, Cancel Order │
│ Rules: "Requires inventory check" │
└─────────────────────────────────────┘
──cross_domain──→ [Logistics]
- Gold/amber border for domain clusters (matches existing theme)
- Shows summary, entity list, flow count on the cluster face
- Cross-domain edges: thick dashed lines with labels
Flow Trace Rendering
POST /api/orders
┌──────────┐ ┌──────────────┐ ┌───────────┐ ┌──────────┐ ┌────────────┐
│ Validate │───→│ Check │───→│ Calculate │───→│ Persist │───→│ Send │
│ Input │ │ Inventory │ │ Pricing │ │ Order │ │ Confirm │
└──────────┘ └──────────────┘ └───────────┘ └──────────┘ └────────────┘
- Steps connected left-to-right by `flow_step` edges (ordered by `weight`)
- Entry point label at the left as flow trigger
- Clicking a step → sidebar shows detail + link to structural view
Sidebar Adaptations
Domain node selected: Summary, business rules, entities, cross-domain interactions, list of flows (clickable)
Flow node selected: Entry point info, step list in order, complexity
Step node selected: Description, "View in code" link (switches to structural view + navigates to file/function), previous/next step links
Drill-Down: Domain → Structural
When a step has an implements edge referencing a structural node ID:
- "View implementation" button in sidebar
- Switches to structural view and navigates to that node
- Breadcrumb: `Domain: Order Management > Flow: Create Order > Step: Validate Input → [structural view]`
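Deciding whether a step gets a "View implementation" button could look roughly like this. The sketch assumes dict-shaped edges and a set of known structural node IDs; the helper name `drilldown_target` and the structural ID format in the usage example are hypothetical.

```python
from typing import Dict, List, Optional, Set


def drilldown_target(
    step_id: str, edges: List[Dict], structural_ids: Set[str]
) -> Optional[str]:
    """Return the structural node ID a step drills down to, or None."""
    for edge in edges:
        if edge["type"] == "implements" and edge["source"] == step_id:
            # Only offer drill-down when the referenced node actually exists.
            if edge["target"] in structural_ids:
                return edge["target"]
    return None


# Hypothetical structural node ID, for illustration only.
structural_ids = {"file:src/validators/order-validator.ts"}
edges = [
    {"source": "step:create-order:validate-input",
     "target": "file:src/validators/order-validator.ts", "type": "implements"},
    {"source": "step:create-order:persist-order",
     "target": "file:src/missing.ts", "type": "implements"},
]
```

A `None` result corresponds to rendering the step without a drill-down link.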
Section 5: Skill Definition
/understand-domain Skill
- File: `skills/understand-domain.md`
- Arguments: Optional `--full` flag to force Path 1 (rescan even if graph exists)
Execution Flow
1. Run understand-anything-plugin/skills/understand-domain/extract-domain-context.py
2. Check for .understand-anything/knowledge-graph.json
├── Exists → Path 2: load graph + domain-context.json
└── Missing → Path 1: domain-context.json only
3. Invoke domain-analyzer agent (opus)
4. Validate output against schema
5. Save .understand-anything/domain-graph.json
6. Clean up intermediate/domain-context.json
7. Auto-trigger /understand-dashboard
Domain Analyzer Agent
- File: `agents/domain-analyzer.md`
- Model: opus
- Input: Either (file tree + entry points) or (existing knowledge graph)
- Output: Complete domain graph JSON
Change Map
| Area | Changes |
|---|---|
| `packages/core/src/types.ts` | Add 3 node types, 4 edge types, `domainMeta` optional field |
| `packages/core/src/schema.ts` | Extend Zod schemas + aliases for new types |
| `packages/core/src/persistence/` | Add `loadDomainGraph()` / `saveDomainGraph()` |
| `understand-anything-plugin/skills/understand-domain/extract-domain-context.py` | New preprocessing script (bundled with skill) |
| `agents/domain-analyzer.md` | New agent definition |
| `skills/understand-domain.md` | New skill definition |
| `packages/dashboard/src/store.ts` | Add `domainGraph`, `viewMode` state |
| `packages/dashboard/src/components/` | New: `DomainGraphView.tsx`, `DomainClusterNode.tsx`, `FlowTraceNode.tsx`, `StepNode.tsx` |
| `packages/dashboard/src/components/` | Modify: `App.tsx` (view toggle), `NodeInfo.tsx` (domain sidebar), `FilterPanel.tsx` (domain filters) |
| `packages/dashboard/src/utils/` | New: `domain-layout.ts` (horizontal Dagre config) |
Section 6: Error Tolerance
Pipeline-Level Tolerance
| Stage | Error Handling |
|---|---|
| Preprocessing script | If tree-sitter fails on a file, skip and continue. Log skipped files. Entry point detection is best-effort. |
| LLM output parsing | Same strategy as existing parseTourGenerationResponse() — extract JSON from markdown, handle partial responses. |
| Schema validation | Existing auto-fix pipeline: sanitize → normalize (aliases) → apply defaults → validate. Drop broken nodes/edges, don't fail the whole graph. |
| Cross-graph references | implements edges pointing to non-existent structural node IDs → keep edge but mark as unresolved. Dashboard shows step without drill-down link. |
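The "extract JSON from markdown" step in the LLM output parsing row can be illustrated with a small Python sketch. It is not a reproduction of `parseTourGenerationResponse()`; the function name `extract_json` is hypothetical, and truncated (partial) responses are not repaired here.

```python
import json
import re
from typing import Any, Optional

# Build the fence marker programmatically so this snippet can itself sit in a fence.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_json(response: str) -> Optional[Any]:
    """Pull the first parseable JSON value out of an LLM reply."""
    candidates = FENCE_RE.findall(response)
    # Fall back to the outermost brace span when no fenced block parses.
    start, end = response.find("{"), response.rfind("}")
    if start != -1 and end > start:
        candidates.append(response[start:end + 1])
    for text in candidates:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue
    return None
```

Returning `None` instead of raising lets the caller decide whether to retry, fall back, or surface the empty-graph error.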
Domain-Specific Validation Rules
- Domain with no flows: Warn, keep (summary/entities still useful)
- Flow with no steps: Warn, keep (entry point info still valuable)
- Steps with broken ordering: Re-number sequentially by array position if `weight` values missing/duplicate
- Orphan steps: Steps not connected to any flow → attach to synthetic "Uncategorized" flow
- Duplicate domains: Merge by name similarity (fuzzy match), combine flows
- Empty domain graph: Error banner in dashboard: "Domain extraction failed: try running `/understand` first for richer context, then `/understand-domain`"
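Two of these rules, step re-numbering and orphan attachment, can be sketched in Python as below. The dict shapes, the function names, and the `flow:uncategorized` ID are assumptions for illustration; creating the synthetic flow node itself is omitted.

```python
from typing import Dict, List

UNCATEGORIZED_FLOW_ID = "flow:uncategorized"  # hypothetical ID for the synthetic flow


def fix_step_order(step_edges: List[Dict]) -> List[Dict]:
    """Renumber weights by array position when any weight is missing or duplicated."""
    weights = [e.get("weight") for e in step_edges]
    broken = any(w is None for w in weights) or len(set(weights)) != len(weights)
    if not broken:
        return step_edges
    # Array position becomes the new order: 0.1, 0.2, 0.3, ...
    return [
        {**edge, "weight": round((i + 1) / 10, 1)}
        for i, edge in enumerate(step_edges)
    ]


def attach_orphans(nodes: List[Dict], edges: List[Dict]) -> List[Dict]:
    """Return extra flow_step edges linking orphan steps to the synthetic flow."""
    attached = {e["target"] for e in edges if e["type"] == "flow_step"}
    orphans = [
        n["id"] for n in nodes
        if n["type"] == "step" and n["id"] not in attached
    ]
    return [
        {"source": UNCATEGORIZED_FLOW_ID, "target": step_id,
         "type": "flow_step", "weight": round((i + 1) / 10, 1)}
        for i, step_id in enumerate(orphans)
    ]
```

Both helpers return new data rather than mutating their inputs, which keeps the "drop or repair, don't fail the whole graph" pipeline easy to reason about.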
Dashboard Resilience
- If `domainMeta` missing on a domain node, sidebar shows only summary/tags
- If `domain-graph.json` fails validation entirely, fall back to structural view with warning banner
- Partial graphs render what's valid
Normalization Aliases for Domain Types
// Node type aliases
"business_domain" → "domain"
"process" → "flow"
"workflow" → "flow"
"action" → "step"
"task" → "step"
// Edge type aliases
"has_flow" → "contains_flow"
"next_step" → "flow_step"
"interacts_with" → "cross_domain"
"implemented_by" → "implements"