Add under-anything knowledge dashboard
# Understand Anything: Universal File Type Support

**Date**: 2026-03-28
**Status**: Approved
**Approach**: Big Bang, with all file types in one release

## Goals

1. Extend Understand Anything to analyze **any** file type, not just code
2. Support both holistic project enrichment (non-code files enrich code graphs) and standalone analysis (docs-only repos, SQL schema collections, IaC projects)
3. Maintain backward compatibility with existing code-only analysis

## Supported File Types (26 new)

### Documentation (3)

| Type | Extensions | Parser | Node Types |
|------|-----------|--------|------------|
| Markdown | `.md`, `.mdx` | LLM + regex heading extraction | `document` |
| reStructuredText | `.rst` | LLM | `document` |
| Plain text | `.txt` | LLM | `document` |

### Configuration (5)

| Type | Extensions | Parser | Node Types |
|------|-----------|--------|------------|
| YAML | `.yaml`, `.yml` | `yaml` npm package | `config` |
| JSON | `.json`, `.jsonc` | `JSON.parse` / `jsonc-parser` | `config`, `schema` |
| TOML | `.toml` | `@iarna/toml` or similar | `config` |
| .env | `.env`, `.env.*` | Regex line parser | `config` |
| XML | `.xml` | LLM (optionally `fast-xml-parser`) | `config` |
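
The "regex line parser" for `.env` files could look roughly like the sketch below. It is illustrative only: the `EnvEntry` shape, the `parseEnv` name, and the handling of `export` prefixes and quoted values are assumptions, not part of the approved design.

```typescript
// Hypothetical result shape; the design only states that variable names
// and references are extracted.
interface EnvEntry {
  name: string;
  value: string;
  references: string[]; // other variables used via $VAR or ${VAR}
  line: number; // 1-based line number
}

const ENV_LINE = /^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$/;
const ENV_REF = /\$\{?([A-Za-z_][A-Za-z0-9_]*)\}?/g;

function parseEnv(content: string): EnvEntry[] {
  const entries: EnvEntry[] = [];
  const lines = content.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const raw = lines[i];
    if (/^\s*(#|$)/.test(raw)) continue; // skip comments and blank lines
    const match = ENV_LINE.exec(raw);
    if (!match) continue; // ignore anything that is not KEY=value
    // Strip one pair of matching surrounding quotes, if present.
    const value = match[2].trim().replace(/^(['"])(.*)\1$/, "$2");
    const references: string[] = [];
    ENV_REF.lastIndex = 0;
    let ref: RegExpExecArray | null;
    while ((ref = ENV_REF.exec(value)) !== null) references.push(ref[1]);
    entries.push({ name: match[1], value, references, line: i + 1 });
  }
  return entries;
}
```

Each entry could then back one `config` node, with `references` feeding edges between variables; how that mapping is done is left open here.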

### Infrastructure & DevOps (7)

| Type | Extensions | Parser | Node Types |
|------|-----------|--------|------------|
| Dockerfile | `Dockerfile`, `Dockerfile.*`, `.dockerfile` | Custom instruction parser | `service`, `pipeline` |
| Docker Compose | `docker-compose.yml`, `compose.yml` | YAML parser + service extraction | `service` |
| Terraform | `.tf`, `.tfvars` | Regex block parser | `resource` |
| Kubernetes | K8s YAML (detected by `apiVersion` field) | YAML + kind detection | `service`, `resource` |
| GitHub Actions | `.github/workflows/*.yml` | YAML + job/step extraction | `pipeline` |
| Jenkinsfile | `Jenkinsfile` | LLM (Groovy DSL) | `pipeline` |
| Makefile | `Makefile`, `*.mk` | Regex target parser | `pipeline` |

### Data & Schema (6)

| Type | Extensions | Parser | Node Types |
|------|-----------|--------|------------|
| SQL | `.sql` | Simple DDL parser | `table`, `endpoint` |
| GraphQL | `.graphql`, `.gql` | Regex type/query parser | `schema`, `endpoint` |
| OpenAPI/Swagger | `openapi.yaml`, `swagger.json` | YAML/JSON + path extraction | `endpoint`, `schema` |
| Protocol Buffers | `.proto` | Regex message/service parser | `schema` |
| JSON Schema | `*.schema.json` | JSON + `$ref`/`$defs` extraction | `schema` |
| CSV/TSV | `.csv`, `.tsv` | Header row extraction | `table` |
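
As an illustration of what a "simple DDL parser" might do, the sketch below pulls table names and foreign-key targets out of `CREATE TABLE` statements. The `TableInfo` shape and the `parseCreateTables` name are hypothetical, and the approach (split on `;`, then regex) is an assumption: it ignores views, indexes, and column details, and would be confused by `;` or `--` inside string literals.

```typescript
// Hypothetical result shape for one CREATE TABLE statement.
interface TableInfo {
  name: string;
  references: string[]; // tables named in REFERENCES clauses
}

function parseCreateTables(sql: string): TableInfo[] {
  const tables: TableInfo[] = [];
  const withoutComments = sql.replace(/--.*$/gm, ""); // drop line comments
  for (const statement of withoutComments.split(";")) {
    const create =
      /^\s*CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?["`]?([A-Za-z_][\w.]*)["`]?\s*\(/i.exec(statement);
    if (!create) continue; // not a CREATE TABLE statement
    const references: string[] = [];
    const refPattern = /\bREFERENCES\s+["`]?([A-Za-z_][\w.]*)["`]?/gi;
    let ref: RegExpExecArray | null;
    while ((ref = refPattern.exec(statement)) !== null) {
      if (references.indexOf(ref[1]) === -1) references.push(ref[1]);
    }
    tables.push({ name: create[1], references });
  }
  return tables;
}
```

Each `TableInfo` would correspond to one `table` node, and each entry in `references` is a candidate for the "SQL foreign keys -> other tables" link described under Cross-Type Reference Resolution.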

### Shell & Scripts (3)

| Type | Extensions | Parser | Node Types |
|------|-----------|--------|------------|
| Shell | `.sh`, `.bash`, `.zsh` | Regex function parser | `file`, `function` |
| PowerShell | `.ps1`, `.psm1` | LLM | `file`, `function` |
| Batch | `.bat`, `.cmd` | LLM | `file` |

### Markup (2)

| Type | Extensions | Parser | Node Types |
|------|-----------|--------|------------|
| HTML | `.html`, `.htm` | LLM (tag structure) | `document` |
| CSS/SCSS/Less | `.css`, `.scss`, `.less` | LLM | `file` |

## Schema Extensions

### New Node Types (8)

Added to the existing node types (`file`, `function`, `class`, `module`, `concept`):

| Node Type | Purpose | Example |
|-----------|---------|---------|
| `config` | Configuration files and key settings | `package.json`, `tsconfig.json`, env vars |
| `document` | Documentation, prose, guides | `README.md`, API docs |
| `service` | Deployable services/containers | Docker containers, K8s Deployments |
| `table` | Data tables, database objects | SQL tables, CSV datasets |
| `endpoint` | API routes, queries, mutations | REST paths, GraphQL queries |
| `pipeline` | CI/CD workflows, build steps | GitHub Actions jobs, Makefile targets |
| `schema` | Type definitions for data interchange | Protobuf messages, JSON Schema |
| `resource` | Infrastructure resources | Terraform resources, K8s ConfigMaps |

### New Edge Types (8)

Added to the existing 18 edge types:

| Edge Type | Category | Meaning | Example |
|-----------|----------|---------|---------|
| `deploys` | Infrastructure | Service deploys code | Dockerfile -> app source |
| `serves` | Infrastructure | Service exposes endpoint | K8s Service -> API endpoint |
| `migrates` | Data flow | Migration modifies table | SQL migration -> table |
| `documents` | Semantic | Doc describes code | README -> module |
| `provisions` | Infrastructure | IaC creates resource | Terraform -> AWS resource |
| `routes` | Behavioral | Routes traffic to service | nginx config -> service |
| `defines_schema` | Data flow | Defines data shape | Protobuf -> endpoint |
| `triggers` | Behavioral | Triggers pipeline/action | Git push -> GitHub Actions |

### Schema Validation Auto-Fix Aliases

New node type aliases:
- `container` -> `service`, `migration` -> `table`, `workflow` -> `pipeline`
- `route` -> `endpoint`, `doc` -> `document`, `setting` -> `config`, `infra` -> `resource`

New edge type aliases:
- `describes` -> `documents`, `creates` -> `provisions`, `exposes` -> `serves`
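
The alias lists above can be expressed as plain lookup tables. This is a minimal sketch: the `applyAlias` helper and the constant names are hypothetical, and matching aliases case-insensitively is an assumption the design does not state.

```typescript
// Alias tables copied from the lists above.
const NODE_TYPE_ALIASES: Record<string, string> = {
  container: "service",
  migration: "table",
  workflow: "pipeline",
  route: "endpoint",
  doc: "document",
  setting: "config",
  infra: "resource",
};

const EDGE_TYPE_ALIASES: Record<string, string> = {
  describes: "documents",
  creates: "provisions",
  exposes: "serves",
};

// Hypothetical helper: returns the canonical type for a known alias and
// leaves any other value untouched for later validation.
function applyAlias(type: string, aliases: Record<string, string>): string {
  const key = type.trim().toLowerCase(); // assumption: case-insensitive match
  return Object.prototype.hasOwnProperty.call(aliases, key) ? aliases[key] : type;
}
```

For example, `applyAlias("container", NODE_TYPE_ALIASES)` yields `"service"`, while an already canonical value such as `"function"` passes through unchanged.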

## Plugin Architecture Changes

### Generalized AnalyzerPlugin Interface

```typescript
interface AnalyzerPlugin {
  name: string;
  languages: string[];
  analyzeFile(filePath: string, content: string): StructuralAnalysis;
  resolveImports?(filePath: string, content: string): ImportResolution[]; // Now optional
  extractCallGraph?(filePath: string, content: string): CallGraphEntry[];
  extractReferences?(filePath: string, content: string): ReferenceResolution[]; // NEW
}

interface ReferenceResolution {
  source: string; // File making the reference
  target: string; // Referenced file or identifier
  type: string; // Reference type: "file", "image", "schema", "service"
  line?: number;
}
```
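
To make the new optional hook concrete, here is a sketch of a non-code plugin that implements `analyzeFile` and `extractReferences` but not `resolveImports`. The plugin name, the link regex, and the trimmed-down interface copies (included only so the snippet compiles on its own) are assumptions for illustration.

```typescript
// Reduced copies of the interfaces above; element types are placeholders.
interface ReferenceResolution {
  source: string;
  target: string;
  type: string;
  line?: number;
}
interface StructuralAnalysis {
  functions: unknown[];
  classes: unknown[];
  imports: unknown[];
  exports: unknown[];
}
interface AnalyzerPlugin {
  name: string;
  languages: string[];
  analyzeFile(filePath: string, content: string): StructuralAnalysis;
  extractReferences?(filePath: string, content: string): ReferenceResolution[];
}

// Hypothetical Markdown plugin: reports local links and images as references.
const markdownPlugin: AnalyzerPlugin = {
  name: "markdown",
  languages: ["markdown"],
  analyzeFile() {
    // A document has no functions or classes, so the code fields stay empty.
    return { functions: [], classes: [], imports: [], exports: [] };
  },
  extractReferences(filePath, content) {
    const refs: ReferenceResolution[] = [];
    const link = /(!?)\[[^\]]*\]\(([^)\s]+)[^)]*\)/g; // [text](target) or ![alt](target)
    content.split(/\r?\n/).forEach((text, index) => {
      link.lastIndex = 0;
      let m: RegExpExecArray | null;
      while ((m = link.exec(text)) !== null) {
        if (/^(https?:|mailto:|#)/.test(m[2])) continue; // skip external links and anchors
        refs.push({
          source: filePath,
          target: m[2],
          type: m[1] ? "image" : "file",
          line: index + 1,
        });
      }
    });
    return refs;
  },
};
```

Because `extractReferences` is optional, callers need a guard such as `plugin.extractReferences?.(path, content) ?? []` before using the result.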

### Extended StructuralAnalysis

```typescript
interface StructuralAnalysis {
  // Existing (unchanged)
  functions: FunctionInfo[];
  classes: ClassInfo[];
  imports: ImportInfo[];
  exports: ExportInfo[];
  // New (all optional for backward compat)
  sections?: SectionInfo[]; // Documents: headings, chapters
  definitions?: DefinitionInfo[]; // Schemas: types, messages, tables
  services?: ServiceInfo[]; // Infra: containers, deployments
  endpoints?: EndpointInfo[]; // APIs: routes, queries
  steps?: StepInfo[]; // Pipelines: jobs, stages, targets
  resources?: ResourceInfo[]; // IaC: terraform resources, K8s objects
}
```
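
Since every new field is optional, consumers have to treat a missing array as empty. The sketch below shows one way to do that. The `Named` placeholder stands in for `FunctionInfo`, `SectionInfo`, and the other element types, which this document does not define, and `countEntities` is a hypothetical helper.

```typescript
// Placeholder element type (assumption); the real *Info types are not shown here.
type Named = { name: string };

interface StructuralAnalysis {
  functions: Named[];
  classes: Named[];
  imports: Named[];
  exports: Named[];
  sections?: Named[];
  definitions?: Named[];
  services?: Named[];
  endpoints?: Named[];
  steps?: Named[];
  resources?: Named[];
}

// Hypothetical helper: counts extracted entities, treating absent arrays as empty.
function countEntities(a: StructuralAnalysis): number {
  const optional = [a.sections, a.definitions, a.services, a.endpoints, a.steps, a.resources];
  let total = a.functions.length + a.classes.length;
  for (const list of optional) total += list ? list.length : 0;
  return total;
}

// An analysis produced by an existing code plugin still satisfies the interface...
const legacy: StructuralAnalysis = {
  functions: [{ name: "main" }],
  classes: [],
  imports: [],
  exports: [],
};

// ...and a document analysis only fills in the fields it needs.
const readme: StructuralAnalysis = {
  functions: [],
  classes: [],
  imports: [],
  exports: [],
  sections: [{ name: "Install" }, { name: "Usage" }],
};
```

Here `countEntities(legacy)` counts the single function and `countEntities(readme)` counts the two sections, without either object needing the other's fields.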

### Custom Parsers (12)

All are lightweight: mostly regex-based, with minimal dependencies.

| Parser | Implementation | Extracts |
|--------|---------------|----------|
| `MarkdownParser` | Regex | Headings, links, code blocks, front matter |
| `YAMLParser` | `yaml` npm | Key hierarchy, anchors, multi-doc |
| `JSONParser` | Built-in `JSON.parse` | Key structure, `$ref`/`$defs` |
| `TOMLParser` | `@iarna/toml` | Section structure |
| `EnvParser` | Regex | Variable names and references |
| `DockerfileParser` | Regex | FROM stages, EXPOSE ports, COPY sources |
| `SQLParser` | Regex | CREATE TABLE/VIEW/INDEX, columns, foreign keys |
| `GraphQLParser` | Regex | Types, queries, mutations, subscriptions |
| `ProtobufParser` | Regex | Messages, services, enums, RPCs |
| `TerraformParser` | Regex | Resources, modules, variables, outputs |
| `MakefileParser` | Regex | Targets, dependencies, variables |
| `ShellParser` | Regex | Functions, sourced files |
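
As one example of the regex-based style, a `MakefileParser` could be sketched as follows. The `MakefileInfo` shape, the function name, and the two patterns are assumptions. The sketch deliberately skips special targets such as `.PHONY` and does not handle line continuations, pattern rules, or `include` directives.

```typescript
// Hypothetical result shape for the targets, dependencies, and variables row above.
interface MakefileInfo {
  targets: { name: string; dependencies: string[]; line: number }[];
  variables: string[];
}

// NAME =, NAME :=, NAME ?=, NAME += at the start of a line.
const MAKE_VARIABLE = /^([A-Za-z_][A-Za-z0-9_.-]*)\s*(?::=|\?=|\+=|=)/;
// "target: deps", where the colon is not part of ":=".
const MAKE_TARGET = /^([^\s:#=][^:#=]*):(?!=)\s*([^#]*)/;

function parseMakefile(content: string): MakefileInfo {
  const info: MakefileInfo = { targets: [], variables: [] };
  content.split(/\r?\n/).forEach((line, index) => {
    if (/^\t/.test(line)) return; // recipe lines belong to the preceding target
    const variable = MAKE_VARIABLE.exec(line);
    if (variable) {
      info.variables.push(variable[1]);
      return;
    }
    const target = MAKE_TARGET.exec(line);
    if (!target) return;
    const name = target[1].trim();
    if (name.charAt(0) === ".") return; // skip special targets such as .PHONY
    info.targets.push({
      name,
      dependencies: target[2].trim().split(/\s+/).filter(Boolean),
      line: index + 1,
    });
  });
  return info;
}
```

Each target would map naturally onto a `pipeline` node, with its dependency list providing edges between targets.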

## Agent Pipeline Changes

### Project Scanner

1. Scan ALL file types (remove code-only filter)
2. Tag each file with category: `code`, `config`, `docs`, `infra`, `data`, `script`, `markup`
3. Smart batch grouping: keep related files together (e.g., Dockerfile + docker-compose.yml)
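
The tagging step could be sketched as a path-based classifier like the one below. The rule tables are assembled from the Supported File Types tables, but the mapping of each table to a category, the rule order, and the `code` fallback for unknown extensions are assumptions. Kubernetes manifests are not covered because detecting them requires reading the `apiVersion` field from the file content.

```typescript
type FileCategory = "code" | "config" | "docs" | "infra" | "data" | "script" | "markup";

// Path patterns run before extensions so that, for example,
// docker-compose.yml is tagged infra rather than config.
const PATH_RULES: { pattern: RegExp; category: FileCategory }[] = [
  { pattern: /(^|\/)(Dockerfile(\.[^/]+)?|[^/]+\.dockerfile)$/, category: "infra" },
  { pattern: /(^|\/)(docker-compose|compose)\.yml$/, category: "infra" },
  { pattern: /(^|\/)\.github\/workflows\/[^/]+\.yml$/, category: "infra" },
  { pattern: /(^|\/)(Jenkinsfile|Makefile)$/, category: "infra" },
  { pattern: /(^|\/)(openapi\.yaml|swagger\.json|[^/]+\.schema\.json)$/, category: "data" },
  { pattern: /(^|\/)\.env(\.[^/]+)?$/, category: "config" },
];

const EXTENSION_CATEGORIES: Record<string, FileCategory> = {
  md: "docs", mdx: "docs", rst: "docs", txt: "docs",
  yaml: "config", yml: "config", json: "config", jsonc: "config", toml: "config", xml: "config",
  tf: "infra", tfvars: "infra", mk: "infra",
  sql: "data", graphql: "data", gql: "data", proto: "data", csv: "data", tsv: "data",
  sh: "script", bash: "script", zsh: "script", ps1: "script", psm1: "script", bat: "script", cmd: "script",
  html: "markup", htm: "markup", css: "markup", scss: "markup", less: "markup",
};

function categorize(filePath: string): FileCategory {
  const normalized = filePath.replace(/\\/g, "/"); // tolerate Windows separators
  for (const rule of PATH_RULES) {
    if (rule.pattern.test(normalized)) return rule.category;
  }
  const ext = /\.([^./]+)$/.exec(normalized);
  const key = ext ? ext[1].toLowerCase() : "";
  // Assumption: anything unrecognized keeps going through the existing code path.
  return Object.prototype.hasOwnProperty.call(EXTENSION_CATEGORIES, key)
    ? EXTENSION_CATEGORIES[key]
    : "code";
}
```

With these rules, `categorize("docker-compose.yml")` gives `"infra"`, `categorize("config/app.yml")` gives `"config"`, and `categorize("src/index.ts")` falls back to `"code"`.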

### File Analyzer

Type-aware prompt templates by category:

- **Code**: Current behavior (functions, classes, imports, call graph)
- **Config**: Extract key settings, what they configure, which code files they affect
- **Documentation**: Extract sections, key concepts, which code components are documented
- **Infrastructure**: Extract services, ports, volumes, dependencies, which code they deploy
- **Data/Schema**: Extract tables, columns, types, relationships, which code consumes this data
- **Pipelines**: Extract jobs, steps, triggers, which code/infra they build/deploy

### Cross-Type Reference Resolution

Post-analysis step connecting:
- Dockerfile `COPY` -> source code directories
- CI config `run: npm test` -> test files
- K8s manifest `image:` -> Dockerfile
- SQL foreign keys -> other tables
- OpenAPI `$ref` -> schema definitions
- Markdown links -> referenced files
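
For the path-based cases above (Markdown links and Dockerfile `COPY` sources), resolution amounts to normalizing a relative target against the referencing file and checking it against the scanned file list. The sketch below assumes `/`-separated project-relative paths. `ResolvedEdge`, `resolveRelative`, and `linkReferences` are hypothetical names, and the choice of graph edge type for each reference is left open.

```typescript
interface ReferenceResolution {
  source: string;
  target: string;
  type: string;
  line?: number;
}

// Hypothetical output shape: one candidate edge per resolved reference.
interface ResolvedEdge {
  from: string;
  to: string;
  type: string;
}

// Resolve `target` relative to the directory containing `source`.
function resolveRelative(source: string, target: string): string {
  const base: string[] = target.charAt(0) === "/" ? [] : source.split("/").slice(0, -1);
  for (const part of target.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") base.pop();
    else base.push(part);
  }
  return base.join("/");
}

function linkReferences(refs: ReferenceResolution[], knownFiles: string[]): ResolvedEdge[] {
  const edges: ResolvedEdge[] = [];
  for (const ref of refs) {
    // Drop fragments and query strings such as "#L10" before resolving.
    const path = resolveRelative(ref.source, ref.target.replace(/[#?].*$/, ""));
    // Accept the file itself or, for directory targets like `COPY src/`, any file below it.
    const hit = knownFiles.some((f) => f === path || f.indexOf(path + "/") === 0);
    if (hit) edges.push({ from: ref.source, to: path, type: ref.type });
  }
  return edges;
}
```

References that do not resolve to a scanned file are simply dropped in this sketch; a real implementation might keep them as unresolved for reporting.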

### Architecture Analyzer

New pattern detection:
- Deployment topology: Dockerfile -> compose -> K8s chain
- Data flow: Schema -> migration -> API endpoint -> client code
- Documentation coverage: which modules have docs and which do not
- Configuration dependency: which config files affect which code paths

### Tour Builder

Include non-code tour stops:
- Project overview from the README
- Containerization via the Dockerfile
- Database schema from SQL migrations
- CI/CD pipeline explanation

## Dashboard Visualization

### New Node Visual Styles

| Node Type | Shape | Color | Icon |
|-----------|-------|-------|------|
| `config` | Rounded rect | Teal (#5eead4) | Gear |
| `document` | Rounded rect | Sky blue (#7dd3fc) | Document |
| `service` | Hexagon | Violet (#a78bfa) | Container/Box |
| `table` | Rectangle | Emerald (#6ee7b7) | Grid |
| `endpoint` | Pill/Stadium | Orange (#fdba74) | Arrow-right |
| `pipeline` | Rounded rect | Rose (#fda4af) | Play/Workflow |
| `schema` | Diamond | Amber (#fcd34d) | Blueprint |
| `resource` | Cloud shape | Indigo (#a5b4fc) | Cloud |

### Graph Layout

1. Layer grouping by category: non-code nodes cluster separately from code nodes
2. Legend update with 8 new node types
3. Filter controls: checkboxes to show/hide each file category
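
The show/hide filter in item 3 boils down to keeping nodes whose category is checked and dropping any edge that loses an endpoint. A minimal sketch follows. The `GraphNode` and `GraphEdge` shapes here are reduced, hypothetical stand-ins for the dashboard's real types, and carrying a `category` on each node is an assumption.

```typescript
// Reduced, hypothetical graph shapes for illustration.
interface GraphNode {
  id: string;
  category: string; // e.g. "code", "config", "docs", "infra"
}
interface GraphEdge {
  source: string;
  target: string;
}

function filterGraph(
  nodes: GraphNode[],
  edges: GraphEdge[],
  visibleCategories: string[]
): { nodes: GraphNode[]; edges: GraphEdge[] } {
  const shown = nodes.filter((n) => visibleCategories.indexOf(n.category) !== -1);
  const visibleIds: Record<string, boolean> = Object.create(null);
  shown.forEach((n) => {
    visibleIds[n.id] = true;
  });
  // An edge stays only if both of its endpoints are still visible.
  const keptEdges = edges.filter((e) => visibleIds[e.source] === true && visibleIds[e.target] === true);
  return { nodes: shown, edges: keptEdges };
}
```

Unchecking a category therefore never leaves dangling edges in the rendered graph.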

### Sidebar Enhancements

NodeInfo panel updates per node type:
- **Config**: key-value pairs, referencing code files
- **Document**: heading outline, linked code components
- **Service**: ports, volumes, dependencies, deployed code
- **Table**: columns, types, foreign key relationships
- **Endpoint**: HTTP method, path, request/response schema
- **Pipeline**: jobs, triggers, deployed targets
- **Schema**: fields, nested types, consumers
- **Resource**: provider, type, dependencies

ProjectOverview panel: add "File Types" breakdown (code vs. non-code distribution).
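
The "File Types" breakdown could be computed from the scanner's category tags along these lines. `fileTypeBreakdown` and the `Breakdown` shape are hypothetical, and expressing the code vs. non-code split as a single `codeShare` fraction is an assumption about how the panel would present it.

```typescript
// Hypothetical summary shape for the ProjectOverview panel.
interface Breakdown {
  counts: Record<string, number>; // files per category
  codeShare: number; // fraction of files tagged "code", between 0 and 1
}

function fileTypeBreakdown(categories: string[]): Breakdown {
  // A prototype-free object keeps unusual category names from colliding with built-ins.
  const counts: Record<string, number> = Object.create(null);
  for (const category of categories) {
    counts[category] = (counts[category] || 0) + 1;
  }
  const code = counts["code"] || 0;
  return {
    counts,
    codeShare: categories.length === 0 ? 0 : code / categories.length,
  };
}
```

The input is one category string per scanned file, so the result can be recomputed cheaply whenever the graph is reloaded.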

## New Dependencies

- `yaml`: YAML parsing (already common, ~50KB)
- `@iarna/toml`: TOML parsing (~30KB)
- `jsonc-parser`: JSON with comments (~20KB)

No tree-sitter WASM additions. All other custom parsers rely on regex or the built-in `JSON.parse` and add no dependencies.

## Backward Compatibility

- All new `StructuralAnalysis` fields are optional
- `resolveImports` becomes optional on `AnalyzerPlugin`
- Existing `LanguageConfig` entries unchanged
- Schema validation auto-fixes new type aliases
- Existing knowledge graphs remain valid (new types are additive)