Files
Fulfilled-Knowledge/Understand-Anything-main/docs/superpowers/specs/2026-03-28-understand-anything-extension-design.md
2026-05-27 15:40:32 +08:00

11 KiB

Understand Anything: Universal File Type Support

Date: 2026-03-28 Status: Approved Approach: Big Bang — all file types in one release

Goals

  1. Extend Understand Anything to analyze any file type, not just code
  2. Support both holistic project enrichment (non-code files enrich code graphs) and standalone analysis (docs-only repos, SQL schema collections, IaC projects)
  3. Maintain backward compatibility with existing code-only analysis

Supported File Types (26 new)

Documentation (3)

Type Extensions Parser Node Types
Markdown .md, .mdx LLM + regex heading extraction document
reStructuredText .rst LLM document
Plain text .txt LLM document

Configuration (5)

Type Extensions Parser Node Types
YAML .yaml, .yml yaml npm package config
JSON .json, .jsonc JSON.parse / jsonc-parser config, schema
TOML .toml @iarna/toml or similar config
.env .env, .env.* Regex line parser config
XML .xml LLM (optionally fast-xml-parser) config

Infrastructure & DevOps (7)

Type Extensions Parser Node Types
Dockerfile Dockerfile, Dockerfile.*, .dockerfile Custom instruction parser service, pipeline
Docker Compose docker-compose.yml, compose.yml YAML parser + service extraction service
Terraform .tf, .tfvars Regex block parser resource
Kubernetes K8s YAML (detected by apiVersion field) YAML + kind detection service, resource
GitHub Actions .github/workflows/*.yml YAML + job/step extraction pipeline
Jenkinsfile Jenkinsfile LLM (Groovy DSL) pipeline
Makefile Makefile, *.mk Regex target parser pipeline

Data & Schema (6)

Type Extensions Parser Node Types
SQL .sql Simple DDL parser table, endpoint
GraphQL .graphql, .gql Regex type/query parser schema, endpoint
OpenAPI/Swagger openapi.yaml, swagger.json YAML/JSON + path extraction endpoint, schema
Protocol Buffers .proto Regex message/service parser schema
JSON Schema *.schema.json JSON + $ref/$defs extraction schema
CSV/TSV .csv, .tsv Header row extraction table

Shell & Scripts (3)

Type Extensions Parser Node Types
Shell .sh, .bash, .zsh Regex function parser file, function
PowerShell .ps1, .psm1 LLM file, function
Batch .bat, .cmd LLM file

Markup (2)

Type Extensions Parser Node Types
HTML .html, .htm LLM (tag structure) document
CSS/SCSS/Less .css, .scss, .less LLM file

Schema Extensions

New Node Types (8)

Added to the existing file | function | class | module | concept:

Node Type Purpose Example
config Configuration files and key settings package.json, tsconfig.json, env vars
document Documentation, prose, guides README.md, API docs
service Deployable services/containers Docker containers, K8s Deployments
table Data tables, database objects SQL tables, CSV datasets
endpoint API routes, queries, mutations REST paths, GraphQL queries
pipeline CI/CD workflows, build steps GitHub Actions jobs, Makefile targets
schema Type definitions for data interchange Protobuf messages, JSON Schema
resource Infrastructure resources Terraform resources, K8s ConfigMaps

New Edge Types (8)

Added to the existing 18 edge types:

Edge Type Category Meaning Example
deploys Infrastructure Service deploys code Dockerfile -> app source
serves Infrastructure Service exposes endpoint K8s Service -> API endpoint
migrates Data flow Migration modifies table SQL migration -> table
documents Semantic Doc describes code README -> module
provisions Infrastructure IaC creates resource Terraform -> AWS resource
routes Behavioral Routes traffic to service nginx config -> service
defines_schema Data flow Defines data shape Protobuf -> endpoint
triggers Behavioral Triggers pipeline/action Git push -> GitHub Actions

Schema Validation Auto-Fix Aliases

New node type aliases:

  • container -> service, migration -> table, workflow -> pipeline
  • route -> endpoint, doc -> document, setting -> config, infra -> resource

New edge type aliases:

  • describes -> documents, creates -> provisions, exposes -> serves

Plugin Architecture Changes

Generalized AnalyzerPlugin Interface

interface AnalyzerPlugin {
  name: string;
  languages: string[];
  analyzeFile(filePath: string, content: string): StructuralAnalysis;
  resolveImports?(filePath: string, content: string): ImportResolution[];  // Now optional
  extractCallGraph?(filePath: string, content: string): CallGraphEntry[];
  extractReferences?(filePath: string, content: string): ReferenceResolution[];  // NEW
}

interface ReferenceResolution {
  source: string;      // File making the reference
  target: string;      // Referenced file or identifier
  type: string;        // Reference type: "file", "image", "schema", "service"
  line?: number;
}

Extended StructuralAnalysis

interface StructuralAnalysis {
  // Existing (unchanged)
  functions: FunctionInfo[];
  classes: ClassInfo[];
  imports: ImportInfo[];
  exports: ExportInfo[];
  // New (all optional for backward compat)
  sections?: SectionInfo[];      // Documents: headings, chapters
  definitions?: DefinitionInfo[]; // Schemas: types, messages, tables
  services?: ServiceInfo[];      // Infra: containers, deployments
  endpoints?: EndpointInfo[];    // APIs: routes, queries
  steps?: StepInfo[];            // Pipelines: jobs, stages, targets
  resources?: ResourceInfo[];    // IaC: terraform resources, K8s objects
}

Custom Parsers (12)

All lightweight — mostly regex-based, minimal dependencies:

Parser Implementation Extracts
MarkdownParser Regex Headings, links, code blocks, front matter
YAMLParser yaml npm Key hierarchy, anchors, multi-doc
JSONParser Built-in JSON.parse Key structure, $ref/$defs
TOMLParser @iarna/toml Section structure
EnvParser Regex Variable names and references
DockerfileParser Regex FROM stages, EXPOSE ports, COPY sources
SQLParser Regex CREATE TABLE/VIEW/INDEX, columns, foreign keys
GraphQLParser Regex Types, queries, mutations, subscriptions
ProtobufParser Regex Messages, services, enums, RPCs
TerraformParser Regex Resources, modules, variables, outputs
MakefileParser Regex Targets, dependencies, variables
ShellParser Regex Functions, sourced files

Agent Pipeline Changes

Project Scanner

  1. Scan ALL file types (remove code-only filter)
  2. Tag each file with category: code, config, docs, infra, data, script, markup
  3. Smart batch grouping: keep related files together (e.g., Dockerfile + docker-compose.yml)

File Analyzer

Type-aware prompt templates by category:

  • Code: Current behavior (functions, classes, imports, call graph)
  • Config: Extract key settings, what they configure, which code files they affect
  • Documentation: Extract sections, key concepts, which code components are documented
  • Infrastructure: Extract services, ports, volumes, dependencies, which code they deploy
  • Data/Schema: Extract tables, columns, types, relationships, which code consumes this data
  • Pipelines: Extract jobs, steps, triggers, which code/infra they build/deploy

Cross-Type Reference Resolution

Post-analysis step connecting:

  • Dockerfile COPY -> source code directories
  • CI config run: npm test -> test files
  • K8s manifest image: -> Dockerfile
  • SQL foreign keys -> other tables
  • OpenAPI $ref -> schema definitions
  • Markdown links -> referenced files

Architecture Analyzer

New pattern detection:

  • Deployment topology: Dockerfile -> compose -> K8s chain
  • Data flow: Schema -> migration -> API endpoint -> client code
  • Documentation coverage: which modules have docs vs. not
  • Configuration dependency: which config files affect which code paths

Tour Builder

Include non-code tour stops:

  • Project README overview
  • Dockerfile containerization
  • SQL migration database schema
  • CI/CD pipeline explanation

Dashboard Visualization

New Node Visual Styles

Node Type Shape Color Icon
config Rounded rect Teal (#5eead4) Gear
document Rounded rect Sky blue (#7dd3fc) Document
service Hexagon Violet (#a78bfa) Container/Box
table Rectangle Emerald (#6ee7b7) Grid
endpoint Pill/Stadium Orange (#fdba74) Arrow-right
pipeline Rounded rect Rose (#fda4af) Play/Workflow
schema Diamond Amber (#fcd34d) Blueprint
resource Cloud shape Indigo (#a5b4fc) Cloud

Graph Layout

  1. Layer grouping by category — non-code nodes cluster separately from code nodes
  2. Legend update with 8 new node types
  3. Filter controls — checkboxes to show/hide each file category

Sidebar Enhancements

NodeInfo panel updates per node type:

  • Config: key-value pairs, referencing code files
  • Document: heading outline, linked code components
  • Service: ports, volumes, dependencies, deployed code
  • Table: columns, types, foreign key relationships
  • Endpoint: HTTP method, path, request/response schema
  • Pipeline: jobs, triggers, deployed targets
  • Schema: fields, nested types, consumers
  • Resource: provider, type, dependencies

ProjectOverview panel: add "File Types" breakdown (code vs. non-code distribution).

New Dependencies

  • yaml — YAML parsing (already common, ~50KB)
  • @iarna/toml — TOML parsing (~30KB)
  • jsonc-parser — JSON with comments (~20KB)

No tree-sitter WASM additions. All other parsers are regex-based with zero dependencies.

Backward Compatibility

  • All new StructuralAnalysis fields are optional
  • resolveImports becomes optional on AnalyzerPlugin
  • Existing LanguageConfig entries unchanged
  • Schema validation auto-fixes new type aliases
  • Existing knowledge graphs remain valid (new types are additive)