qiaoxinjiu/Fulfilled-Knowledge

Fork 0

Files

qiaoxinjiu e31a75d2bb Add under-anything knowledge dashboard

2026-05-27 15:40:32 +08:00

13 KiB

Raw Blame History

Business Domain Knowledge Extraction — Design Spec

Issue: #61 Date: 2026-04-01

Problem

The current knowledge graph shows file-level dependency relationships, but this has limited value — you can already see imports in an IDE. When files are many, listing dependency edges doesn't reduce cognitive load; you still mentally reconstruct what the code does. What's needed is business domain knowledge: the logic and domain concepts embedded within the code, not the structural wiring.

Solution Overview

A new /understand-domain skill that extracts business domain knowledge and renders it as a horizontal flow graph in the dashboard. Two viewing modes: a high-level Domain view (default when available) and the existing Structural view, with a toggle to switch between them.

Architecture: Separate File, Shared Schema (Approach C)

Domain data lives in a separate file (domain-graph.json) using the same KnowledgeGraph type system — extended with new node/edge types. The dashboard detects both files and offers a view toggle. Domain nodes can reference structural nodes by ID for drill-down.

Why separate files:

/understand-domain works standalone (lightweight) or alongside full graph
Shared schema means search, validation, and filtering work for both
No risk of polluting the structural graph
Each file is independently valid

Section 1: Domain Graph Schema

Three-Level Hierarchy

Business Domain (top) — e.g., "Purchasing", "Logistics", "Warehouse Management"
Business Flow (mid) — e.g., "Create Order", "Process Refund"
Business Step (leaf) — e.g., "Validate input", "Check inventory", "Persist order"

New Node Types (3)

Type	Purpose	Example
`domain`	Business domain cluster	"Order Management", "Logistics"
`flow`	A business process within a domain	"Create Order", "Process Refund"
`step`	A single step in a flow	"Validate order input"

New Edge Types (4)

Type	Purpose
`contains_flow`	domain → flow
`flow_step`	flow → step (ordered via `weight` field, e.g., 0.1, 0.2, ...)
`cross_domain`	domain → domain (interaction between domains)
`implements`	step → file/function node ID (reference into structural graph)

Domain Node Structure

// domain node
{
  id: "domain:order-management",
  type: "domain",
  name: "Order Management",
  summary: "Handles the complete order lifecycle...",
  tags: ["e-commerce", "core-business"],
  complexity: "complex",
  domainMeta?: {
    entities: ["Order", "LineItem", "OrderStatus"],
    businessRules: ["Orders require inventory check before confirmation"],
    crossDomainInteractions: ["Triggers Logistics on order confirmed", "Reads from Customer Service for buyer info"]
  }
}

Flow Node Structure

{
  id: "flow:create-order",
  type: "flow",
  name: "Create Order",
  summary: "Customer submits a new order through the API",
  tags: ["write-path", "api"],
  complexity: "moderate",
  domainMeta?: {
    entryPoint: "POST /api/orders",
    entryType: "http" | "cli" | "event" | "cron" | "manual"
  }
}

Step Node Structure

{
  id: "step:create-order:validate-input",
  type: "step",
  name: "Validate order input",
  summary: "Checks request body against order schema, rejects invalid payloads",
  tags: ["validation"],
  complexity: "simple",
  filePath: "src/validators/order-validator.ts",
  lineRange: [12, 45]
}

File Output

Saved to .understand-anything/domain-graph.json — same KnowledgeGraph shape, valid on its own.

Section 2: Analysis Pipeline

Two Paths, Same Output

Path 1: Lightweight scan (no existing graph)

File tree scan
    → Static entry point detection (tree-sitter)
        → Route definitions, exported handlers, main(), event listeners, cron decorators
    → Feed to LLM: file tree + detected entry points + sampled file contents
        → LLM outputs: domains, flows, steps, cross-domain interactions
    → Build domain-graph.json

Token cost: ~10-20% of a full /understand scan.

Path 2: Derive from existing graph

Load knowledge-graph.json
    → Extract: all nodes, edges, layers, summaries, tour
    → Feed to LLM: graph data as structured context
        → LLM outputs: domains, flows, steps, cross-domain interactions
    → Build domain-graph.json

Very cheap — no file reading needed, LLM reasons over existing summaries and call edges.

Path Selection: /understand-domain checks if .understand-anything/knowledge-graph.json exists. If yes → Path 2. If no → Path 1.

Agent Structure

One new agent: domain-analyzer (opus model). Handles both paths. For large codebases, can batch by detected entry point groups.

Section 3: Preprocessing Script & Skill Integration

Script: `understand-anything-plugin/skills/understand-domain/extract-domain-context.py`

Bundled with the skill (not in scripts/ which is for development tooling). Runs before the LLM agent. Outputs .understand-anything/intermediate/domain-context.json:

{
  "fileTree": ["src/api/orders.ts", "src/services/...", "..."],
  "entryPoints": [
    {
      "file": "src/api/orders.ts",
      "type": "http",
      "method": "POST",
      "path": "/api/orders",
      "handler": "createOrder",
      "lineRange": [15, 45],
      "snippet": "async function createOrder(req, res) { ... }"
    }
  ],
  "fileSignatures": {
    "src/services/order-service.ts": {
      "exports": ["createOrder", "cancelOrder", "getOrderById"],
      "imports": ["inventory-service", "pricing-service", "order-repo"],
      "summary": null
    }
  }
}

Python script (no heavy dependencies — uses ast for Python, regex for other languages). Uses:

Walk the file tree (respecting .gitignore)
Detect entry points by pattern: route decorators, app.get/post, export default handler, main(), event listeners
Extract function signatures and import/export lists per file
Keep code snippets short (signature + first few lines, not full bodies)

Skill Integration

The /understand-domain skill markdown:

Runs understand-anything-plugin/skills/understand-domain/extract-domain-context.py
Checks for existing knowledge-graph.json
If exists → passes both domain-context.json + graph data to domain-analyzer agent
If not → passes only domain-context.json
Agent outputs domain-graph.json
Cleans up intermediate files
Auto-triggers /understand-dashboard

Section 4: Dashboard — Domain View

View Toggle

Top-left corner: pill toggle — "Domain" / "Structural"
Domain view is default when domain-graph.json exists
If only one graph file exists, no toggle shown
Switching views preserves sidebar state

Horizontal Flow Layout

Layout engine: Dagre with rankdir: "LR" (left-to-right)
Zoom levels:
- Zoomed out: Domain clusters as large rounded rectangles, cross_domain edges between them
- Click domain: Expands to show flows as horizontal lanes
- Click flow: Shows step-by-step trace left-to-right

Domain Cluster Rendering

┌─────────────────────────────────────┐
│  Order Management                   │
│  "Handles the complete order..."    │
│                                     │
│  Entities: Order, LineItem, Status  │
│  Flows: Create Order, Cancel Order  │
│  Rules: "Requires inventory check"  │
└─────────────────────────────────────┘
          ──cross_domain──→  [Logistics]

Gold/amber border for domain clusters (matches existing theme)
Shows summary, entity list, flow count on the cluster face
Cross-domain edges: thick dashed lines with labels

Flow Trace Rendering

POST /api/orders
  ┌──────────┐    ┌──────────────┐    ┌───────────┐    ┌──────────┐    ┌────────────┐
  │ Validate  │───→│ Check        │───→│ Calculate  │───→│ Persist   │───→│ Send       │
  │ Input     │    │ Inventory    │    │ Pricing    │    │ Order     │    │ Confirm    │
  └──────────┘    └──────────────┘    └───────────┘    └──────────┘    └────────────┘

Steps connected left-to-right by flow_step edges (ordered by weight)
Entry point label at the left as flow trigger
Clicking a step → sidebar shows detail + link to structural view

Sidebar Adaptations

Domain node selected: Summary, business rules, entities, cross-domain interactions, list of flows (clickable)

Flow node selected: Entry point info, step list in order, complexity

Step node selected: Description, "View in code" link (switches to structural view + navigates to file/function), previous/next step links

Drill-Down: Domain → Structural

When a step has an implements edge referencing a structural node ID:

"View implementation" button in sidebar
Switches to structural view and navigates to that node
Breadcrumb: Domain: Order Management > Flow: Create Order > Step: Validate Input → [structural view]

Section 5: Skill Definition

`/understand-domain` Skill

File: skills/understand-domain.md
Arguments: Optional --full flag to force Path 1 (rescan even if graph exists)

Execution Flow

1. Run scripts/extract-domain-context.mjs
2. Check for .understand-anything/knowledge-graph.json
   ├── Exists → Path 2: load graph + domain-context.json
   └── Missing → Path 1: domain-context.json only
3. Invoke domain-analyzer agent (opus)
4. Validate output against schema
5. Save .understand-anything/domain-graph.json
6. Clean up intermediate/domain-context.json
7. Auto-trigger /understand-dashboard

Domain Analyzer Agent

File: agents/domain-analyzer.md
Model: opus
Input: Either (file tree + entry points) or (existing knowledge graph)
Output: Complete domain graph JSON

Change Map

Area	Changes
`packages/core/src/types.ts`	Add 3 node types, 4 edge types, `domainMeta` optional field
`packages/core/src/schema.ts`	Extend Zod schemas + aliases for new types
`packages/core/src/persistence/`	Add `loadDomainGraph()` / `saveDomainGraph()`
`understand-anything-plugin/skills/understand-domain/extract-domain-context.py`	New preprocessing script (bundled with skill)
`agents/domain-analyzer.md`	New agent definition
`skills/understand-domain.md`	New skill definition
`packages/dashboard/src/store.ts`	Add `domainGraph`, `viewMode` state
`packages/dashboard/src/components/`	New: `DomainGraphView.tsx`, `DomainClusterNode.tsx`, `FlowTraceNode.tsx`, `StepNode.tsx`
`packages/dashboard/src/components/`	Modify: `App.tsx` (view toggle), `NodeInfo.tsx` (domain sidebar), `FilterPanel.tsx` (domain filters)
`packages/dashboard/src/utils/`	New: `domain-layout.ts` (horizontal Dagre config)

Section 6: Error Tolerance

Pipeline-Level Tolerance

Stage	Error Handling
Preprocessing script	If tree-sitter fails on a file, skip and continue. Log skipped files. Entry point detection is best-effort.
LLM output parsing	Same strategy as existing `parseTourGenerationResponse()` — extract JSON from markdown, handle partial responses.
Schema validation	Existing auto-fix pipeline: sanitize → normalize (aliases) → apply defaults → validate. Drop broken nodes/edges, don't fail the whole graph.
Cross-graph references	`implements` edges pointing to non-existent structural node IDs → keep edge but mark as `unresolved`. Dashboard shows step without drill-down link.

Domain-Specific Validation Rules

Domain with no flows: Warn, keep (summary/entities still useful)
Flow with no steps: Warn, keep (entry point info still valuable)
Steps with broken ordering: Re-number sequentially by array position if weight values missing/duplicate
Orphan steps: Steps not connected to any flow → attach to synthetic "Uncategorized" flow
Duplicate domains: Merge by name similarity (fuzzy match), combine flows
Empty domain graph: Error banner in dashboard: "Domain extraction failed — try running /understand first for richer context, then /understand-domain"

Dashboard Resilience

If domainMeta missing on a domain node, sidebar shows only summary/tags
If domain-graph.json fails validation entirely, fall back to structural view with warning banner
Partial graphs render what's valid

Normalization Aliases for Domain Types

// Node type aliases
"business_domain" → "domain"
"process" → "flow"
"workflow" → "flow"
"action" → "step"
"task" → "step"

// Edge type aliases
"has_flow" → "contains_flow"
"next_step" → "flow_step"
"interacts_with" → "cross_domain"
"implemented_by" → "implements"

13 KiB Raw Blame History