# Language-Agnostic Support Design

**Date:** 2026-03-21

**Status:** Approved

**Issue:** Make Understand-Anything codebase-aware and language-agnostic instead of TypeScript-heavy

## Problem

The tool's agent prompts, tree-sitter plugin, and language lesson system are heavily biased toward TypeScript/JavaScript. Non-TS codebases get degraded analysis because:

1. Agent prompts use TS-specific examples and concepts (e.g., "barrel files", "type guards", "generics").
2. The tree-sitter plugin ships only TS/JS grammar support, so structural analysis silently fails for other languages.
3. Language lesson detection hardcodes TS-specific concept patterns and display names.

The architecture (PluginRegistry, GraphBuilder, dashboard, search) is already language-neutral. The bias is in the shipped content, not the framework.

## Decisions

- **Scope:** All three layers (prompts, tree-sitter plugins, language framework)
- **Languages (v1):** TypeScript, JavaScript, Python, Go, Java, Rust, C/C++, C#, Ruby, PHP, Swift, Kotlin
- **Architecture:** Config-first with a code escape hatch (hybrid)
- **Prompt strategy:** Base prompt + per-language markdown snippet files in a `languages/` folder
- **Config location:** Prompt snippets in `skills/understand/languages/`, tree-sitter configs in `packages/core/src/languages/`
- **Multi-language projects:** Per-file language analysis + project-level multi-language summary
- **Language detection:** Auto-detect from file extensions only (no manual override for v1)
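
Because detection relies on file extensions alone, it reduces to a table lookup plus grouping. The sketch below is illustrative only: `EXTENSION_TO_LANGUAGE`, `extensionOf`, and `detectLanguages` are hypothetical names, the table lists just a few of the v1 languages, and paths are assumed to use `/` separators. Grouping by language id yields both the per-file language needed for analysis and the language set for the project-level summary.

```typescript
// Hypothetical extension table; only a few v1 languages are listed for illustration.
const EXTENSION_TO_LANGUAGE: Record<string, string> = {
  ".ts": "typescript",
  ".tsx": "typescript",
  ".js": "javascript",
  ".mjs": "javascript",
  ".py": "python",
  ".pyi": "python",
  ".go": "go",
  ".rs": "rust",
};

// Lowercased extension including the dot ("" if none); dotfiles like ".gitignore" have no extension.
function extensionOf(filePath: string): string {
  const base = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

interface DetectionResult {
  byLanguage: Map<string, string[]>; // language id -> file paths
  unknown: string[]; // files with no matching language
}

// Group file paths by language id using the extension table.
function detectLanguages(filePaths: string[]): DetectionResult {
  const byLanguage = new Map<string, string[]>();
  const unknown: string[] = [];
  for (const filePath of filePaths) {
    const ext = extensionOf(filePath);
    const id: string | undefined = ext === "" ? undefined : EXTENSION_TO_LANGUAGE[ext];
    if (id === undefined) {
      unknown.push(filePath);
      continue;
    }
    const files = byLanguage.get(id) ?? [];
    files.push(filePath);
    byLanguage.set(id, files);
  }
  return { byLanguage, unknown };
}
```

Files with unrecognized extensions land in `unknown` rather than being dropped, which fits the rule that every file is still analyzed even without structural support.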

## Design

### 1. LanguageConfig Type & Registry

#### LanguageConfig Interface

```typescript
// packages/core/src/languages/types.ts
interface LanguageConfig {
  id: string; // e.g., "python"
  displayName: string; // e.g., "Python"
  extensions: string[]; // e.g., [".py", ".pyi"]
  treeSitter: {
    grammarPackage: string; // npm package name
    nodeTypes: {
      function: string[]; // e.g., ["function_definition"]
      class: string[]; // e.g., ["class_definition"]
      import: string[]; // e.g., ["import_statement", "import_from_statement"]
      export: string[]; // e.g., ["export_statement"] or [] for languages without exports
      typeAnnotation: string[]; // e.g., ["type"] for Python type hints
    };
  };
  concepts: string[]; // e.g., ["decorators", "list comprehensions", "generators"]
  filePatterns?: Record<string, string>; // special files, e.g., {"config": "pyproject.toml"}
  customAnalyzer?: (node: SyntaxNode) => AnalysisResult; // escape hatch for unusual AST shapes
}
```
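
To make the config-first approach concrete, `configs/python.ts` might look like the following. This is a sketch, not a shipped config: the interface is re-declared without `customAnalyzer` (which needs tree-sitter's `SyntaxNode` type) so the snippet stands alone, the `filePatterns` keys are made up for illustration, and the node-type names reflect our understanding of the `tree-sitter-python` grammar and should be checked against its `node-types.json`.

```typescript
// Trimmed re-declaration of LanguageConfig so this sketch is self-contained
// (customAnalyzer omitted because it needs tree-sitter's SyntaxNode type).
interface LanguageConfig {
  id: string;
  displayName: string;
  extensions: string[];
  treeSitter: {
    grammarPackage: string;
    nodeTypes: {
      function: string[];
      class: string[];
      import: string[];
      export: string[];
      typeAnnotation: string[];
    };
  };
  concepts: string[];
  filePatterns?: Record<string, string>;
}

// Sketch of packages/core/src/languages/configs/python.ts (values are illustrative).
const pythonConfig: LanguageConfig = {
  id: "python",
  displayName: "Python",
  extensions: [".py", ".pyi"],
  treeSitter: {
    grammarPackage: "tree-sitter-python",
    nodeTypes: {
      function: ["function_definition"],
      class: ["class_definition"],
      import: ["import_statement", "import_from_statement"],
      export: [], // Python has no export statements; module-level names are public by convention
      typeAnnotation: ["type"],
    },
  },
  concepts: ["decorators", "list comprehensions", "generators", "context managers", "type hints"],
  filePatterns: {
    config: "pyproject.toml", // hypothetical role keys
    testSetup: "conftest.py",
    packageInit: "__init__.py",
  },
};
```

Leaving `export` empty is how a language without export syntax opts out; the generic extractor then simply finds nothing for that category.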

#### Language Registry

```typescript
// packages/core/src/languages/registry.ts
class LanguageRegistry {
  private configs: Map<string, LanguageConfig>;

  register(config: LanguageConfig): void;
  getByExtension(ext: string): LanguageConfig | null;
  getById(id: string): LanguageConfig;
  getAll(): LanguageConfig[];
}
```
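
The registry above is given as signatures only. One possible implementation follows, under assumptions the design leaves open: extensions match case-insensitively, a duplicate id or an extension already claimed by another language is rejected at registration, and `getById` throws for unknown ids (consistent with its non-nullable return type). The config type is trimmed to the fields the registry reads.

```typescript
// Minimal subset of LanguageConfig that the registry reads.
interface LanguageConfig {
  id: string;
  displayName: string;
  extensions: string[];
}

class LanguageRegistry {
  private configs = new Map<string, LanguageConfig>();
  // Secondary index: lowercased extension (e.g. ".py") -> language id.
  private extensionIndex = new Map<string, string>();

  register(config: LanguageConfig): void {
    if (this.configs.has(config.id)) {
      throw new Error(`Language "${config.id}" is already registered`);
    }
    const extensions = config.extensions.map((ext) => ext.toLowerCase());
    for (const ext of extensions) {
      const owner = this.extensionIndex.get(ext);
      if (owner !== undefined) {
        throw new Error(`Extension "${ext}" is already claimed by "${owner}"`);
      }
    }
    // Mutate only after every check passes, so a rejected config leaves no partial state.
    this.configs.set(config.id, config);
    for (const ext of extensions) {
      this.extensionIndex.set(ext, config.id);
    }
  }

  getByExtension(ext: string): LanguageConfig | null {
    const id = this.extensionIndex.get(ext.toLowerCase());
    return id === undefined ? null : (this.configs.get(id) ?? null);
  }

  getById(id: string): LanguageConfig {
    const config = this.configs.get(id);
    if (config === undefined) {
      throw new Error(`Unknown language id "${id}"`);
    }
    return config;
  }

  getAll(): LanguageConfig[] {
    return Array.from(this.configs.values());
  }
}
```

Rejecting duplicate extensions makes collisions explicit rather than dependent on registration order; since C and C++ share the `cpp.ts` config, an ambiguous extension like `.h` only needs to be claimed once.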

#### File Structure

```
packages/core/src/languages/
├── types.ts
├── registry.ts
├── index.ts
├── configs/
│   ├── typescript.ts
│   ├── javascript.ts
│   ├── python.ts
│   ├── go.ts
│   ├── java.ts
│   ├── rust.ts
│   ├── cpp.ts
│   ├── csharp.ts
│   ├── ruby.ts
│   ├── php.ts
│   ├── swift.ts
│   └── kotlin.ts
```

All built-in configs are registered automatically when the `languages` module is imported.

### 2. GenericTreeSitterPlugin

`GenericTreeSitterPlugin` replaces the current TS-only `TreeSitterPlugin` with a config-driven version.

```typescript
// packages/core/src/plugins/generic-tree-sitter-plugin.ts
import * as path from "node:path";

class GenericTreeSitterPlugin implements AnalyzerPlugin {
  constructor(private readonly registry: LanguageRegistry) {}

  canAnalyze(filePath: string): boolean {
    return this.registry.getByExtension(path.extname(filePath)) !== null;
  }

  async analyzeFile(filePath: string, content: string): Promise<FileAnalysis | null> {
    const config = this.registry.getByExtension(path.extname(filePath));
    if (!config) {
      return null; // Unknown language: skip structural analysis (see Error Handling)
    }

    // Parse with this language's grammar (loaded lazily and cached)
    const tree = await this.parse(config, content);

    // Custom analyzer escape hatch
    if (config.customAnalyzer) {
      return config.customAnalyzer(tree.rootNode);
    }

    // Generic extraction driven by config.treeSitter.nodeTypes
    const functions = this.extractNodes(tree, config.treeSitter.nodeTypes.function);
    const classes = this.extractNodes(tree, config.treeSitter.nodeTypes.class);
    const imports = this.extractNodes(tree, config.treeSitter.nodeTypes.import);
    const exports = this.extractNodes(tree, config.treeSitter.nodeTypes.export);
    // ...
  }

  private async parse(config: LanguageConfig, content: string): Promise<Tree> {
    // Hypothetical helper: load (or reuse the cached) WASM grammar for config.id, then parse content
  }

  private extractNodes(tree: Tree, nodeTypes: string[]): NodeInfo[] {
    // Walk AST, collect all nodes matching any of the given types
  }
}
```
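
The `extractNodes` body is only described above. A minimal sketch of the walk follows, with two assumptions: it takes the root node rather than the `Tree`, and it works on a hypothetical `AstNode` interface modeled on the `type`, `children`, `startPosition`, and `endPosition` properties of tree-sitter's JavaScript node objects, so it runs without loading a grammar. `NodeInfo` here records only the node type and a 1-based line span; name extraction is left out.

```typescript
// Hypothetical minimal node shape, modeled on tree-sitter's JS node properties.
interface AstNode {
  type: string;
  startPosition: { row: number; column: number };
  endPosition: { row: number; column: number };
  children: AstNode[];
}

// Hypothetical summary of a matched node (1-based line numbers).
interface NodeInfo {
  type: string;
  startLine: number;
  endLine: number;
}

// Collect every node whose type is in nodeTypes, in document (pre-order) order.
// An explicit stack replaces recursion so deeply nested files cannot overflow the call stack.
function extractNodes(root: AstNode, nodeTypes: string[]): NodeInfo[] {
  const wanted = new Set(nodeTypes);
  const results: NodeInfo[] = [];
  if (wanted.size === 0) return results; // e.g. export: [] for languages without exports
  const stack: AstNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (wanted.has(node.type)) {
      results.push({
        type: node.type,
        startLine: node.startPosition.row + 1,
        endLine: node.endPosition.row + 1,
      });
    }
    // Push children in reverse so the leftmost child is visited first.
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push(node.children[i]);
    }
  }
  return results;
}
```

Matched nodes are still descended into, so a method nested inside a class is also reported as a function.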

#### Migration

- The current `TreeSitterPlugin` is deleted and replaced by `GenericTreeSitterPlugin` plus the TS/JS configs
- `PluginRegistry` is unchanged
- Existing tests are updated to use the new plugin

#### WASM Grammar Loading

- Each grammar is loaded lazily on first use and cached
- WASM files are bundled in `packages/core/src/languages/grammars/` or fetched from tree-sitter's official WASM builds
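
Lazy loading with a cache fits in a small memoizing wrapper. This is one possible approach, not the planned implementation: `GrammarCache` is hypothetical, and its `loader` callback stands in for whatever WASM-loading call the chosen tree-sitter binding provides. The cache stores the promise rather than the loaded grammar, so concurrent requests for the same language share a single load, and a failed load is cached as `null` so the warning is logged once and that language falls back to LLM-only analysis.

```typescript
// Hypothetical memoizing loader: each grammar is loaded at most once, on first request.
class GrammarCache<T> {
  private cache = new Map<string, Promise<T | null>>();

  constructor(
    private readonly loader: (languageId: string) => Promise<T>,
    private readonly warn: (message: string) => void = (message) => console.warn(message),
  ) {}

  // Resolves to the grammar, or null if loading failed (that language degrades to LLM-only).
  get(languageId: string): Promise<T | null> {
    let pending = this.cache.get(languageId);
    if (pending === undefined) {
      // Cache the promise itself so concurrent callers share a single load.
      pending = this.loader(languageId).catch((err: unknown) => {
        this.warn(`Failed to load grammar for ${languageId}: ${String(err)}`);
        return null; // cached as well, so the warning is logged once per language
      });
      this.cache.set(languageId, pending);
    }
    return pending;
  }
}
```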

### 3. Language-Aware Prompts

#### File Structure

```
skills/understand/
├── file-analyzer-prompt.md # Base prompt (language-neutral)
├── tour-builder-prompt.md
├── project-scanner-prompt.md
├── languages/
│   ├── typescript.md
│   ├── javascript.md
│   ├── python.md
│   ├── go.md
│   ├── java.md
│   ├── rust.md
│   ├── cpp.md
│   ├── csharp.md
│   ├── ruby.md
│   ├── php.md
│   ├── swift.md
│   └── kotlin.md
```

#### Base Prompt Changes

All TS-specific examples are removed from the base prompts and replaced with an injection point:

```markdown
## Language-Specific Guidance

{{LANGUAGE_CONTEXT}}
```
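
Filling the injection point is plain string substitution, with one pitfall worth a sketch. `injectLanguageContext` is a hypothetical helper; it assumes the placeholder appears verbatim and fails loudly if a base prompt has lost it. It uses `split`/`join` instead of `String.prototype.replace` because a replacement string passed to `replace` treats sequences such as `$$` and `$&` specially, which would corrupt snippets that quote PHP variables.

```typescript
const LANGUAGE_CONTEXT_PLACEHOLDER = "{{LANGUAGE_CONTEXT}}";

// Replace every occurrence of the placeholder with the language context.
// Throws if the base prompt has no injection point, so a broken template fails loudly.
function injectLanguageContext(basePrompt: string, languageContext: string): string {
  if (!basePrompt.includes(LANGUAGE_CONTEXT_PLACEHOLDER)) {
    throw new Error(`Base prompt is missing ${LANGUAGE_CONTEXT_PLACEHOLDER}`);
  }
  // split/join inserts the text literally; replace() would expand "$$", "$&", and similar.
  return basePrompt.split(LANGUAGE_CONTEXT_PLACEHOLDER).join(languageContext.trim());
}
```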

#### Language Markdown Format

Each language file follows this structure (Python shown):

```markdown
# Python

## Key Concepts
- Decorators, comprehensions, generators, context managers, type hints, dunder methods

## Import Patterns
- `import module`, `from module import name`, relative imports

## Notable File Patterns
- `__init__.py` (package initializer), `conftest.py` (pytest), `pyproject.toml` (config)

## Example Summary Style
> "FastAPI route handler that accepts a Pydantic model, validates input..."
```

#### Injection Logic

1. The project scanner detects the languages present in the codebase.
2. File-analyzer: inject the language `.md` matching that file's language.
3. Tour-builder: inject the `.md` files of all detected languages.
4. Project-scanner: inject the key concepts of all detected languages for the project-level summary.
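
The four rules reduce to choosing what replaces `{{LANGUAGE_CONTEXT}}` for each prompt. A sketch, in which `PromptKind`, `LanguageSnippet`, and `selectLanguageContext` are hypothetical names, snippets are assumed to be preloaded from `languages/*.md`, and the bullet format of the combined project-scanner section is invented for illustration:

```typescript
type PromptKind = "file-analyzer" | "tour-builder" | "project-scanner";

// Hypothetical preloaded content for one detected language.
interface LanguageSnippet {
  id: string;
  displayName: string;
  markdown: string; // contents of languages/<id>.md
  concepts: string[]; // key concepts, e.g. from LanguageConfig.concepts
}

// Build the text that replaces {{LANGUAGE_CONTEXT}} for the given prompt.
function selectLanguageContext(
  kind: PromptKind,
  detected: LanguageSnippet[],
  fileLanguageId?: string,
): string {
  switch (kind) {
    case "file-analyzer": {
      // Only the snippet for this file's language; empty if the language is unknown.
      const match = detected.find((s) => s.id === fileLanguageId);
      return match ? match.markdown : "";
    }
    case "tour-builder":
      // Every detected language's full snippet.
      return detected.map((s) => s.markdown).join("\n\n");
    case "project-scanner":
      // One combined section: each detected language with its key concepts.
      return detected.map((s) => `- **${s.displayName}:** ${s.concepts.join(", ")}`).join("\n");
  }
}
```

Returning an empty string for a file whose language has no snippet keeps the file-analyzer prompt valid, so unsupported files still go through LLM analysis.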

#### Multi-Language Projects

The project-scanner prompt gets a combined section listing every detected language with its key concepts.

### 4. Language Lesson Updates

- Delete `LANGUAGE_DISPLAY_NAMES`; use `LanguageRegistry.getById(id).displayName` instead
- Delete the hardcoded concept patterns; use `LanguageConfig.concepts` from the registry instead
- Language lesson generation becomes config-driven

### 5. Testing Strategy

#### Unit Tests

1. **LanguageConfig validation:** Each config has all required fields and non-empty nodeTypes
2. **LanguageRegistry:** Registration, lookup by extension/id, duplicate handling
3. **GenericTreeSitterPlugin per language:** A small fixture file per language verifying function/class/import extraction
4. **Language lesson generation:** Concepts sourced from config

#### Integration Tests

5. **Multi-language project:** Mixed TS + Python fixture; verify the graph contains nodes from both languages
6. **Prompt injection:** The correct language `.md` is injected based on the detected language

#### Migration Tests

- Current tree-sitter-plugin tests are rewritten for GenericTreeSitterPlugin with the TS config
- They must produce results identical to the current plugin's, confirming the migration is non-breaking
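
Checking for identical results is easier with a helper that reports exactly what changed. A minimal sketch, assuming both plugins' output can be reduced to a hypothetical `AnalysisSnapshot` of extracted names per category. The comparison ignores order but counts duplicates; that is an assumption, and if extraction order is part of the contract, compare the arrays directly instead.

```typescript
// Hypothetical normalized view of one file's analysis: extracted names per category.
interface AnalysisSnapshot {
  functions: string[];
  classes: string[];
  imports: string[];
  exports: string[];
}

const CATEGORIES: (keyof AnalysisSnapshot)[] = ["functions", "classes", "imports", "exports"];

// List differences between the old plugin's output and the new one's; an empty list means parity.
// Order-insensitive, but duplicates are counted, so an entry extracted twice is still reported.
function diffSnapshots(before: AnalysisSnapshot, after: AnalysisSnapshot): string[] {
  const diffs: string[] = [];
  for (const category of CATEGORIES) {
    const counts = new Map<string, number>();
    for (const name of before[category]) counts.set(name, (counts.get(name) ?? 0) + 1);
    for (const name of after[category]) counts.set(name, (counts.get(name) ?? 0) - 1);
    counts.forEach((delta, name) => {
      if (delta > 0) diffs.push(`${category}: missing ${name}`);
      if (delta < 0) diffs.push(`${category}: unexpected ${name}`);
    });
  }
  return diffs;
}
```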

### 6. Error Handling & Graceful Degradation

#### Key Principle

**Every file always gets analyzed.** Tree-sitter is an enhancement, not a gate. The LLM is the primary analyzer; structural analysis enriches it.

#### Unknown Language

- Tree-sitter is skipped (the plugin returns `null`)
- LLM analysis still runs: the file still gets a summary, tags, and a graph node
- Debug log: `"No language config for .xyz, skipping structural analysis"`

#### Missing WASM Grammar

- A warning is logged and that language degrades to LLM-only analysis
- Other languages are unaffected
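
These rules amount to treating structural analysis as optional input to an LLM call that always runs. A sketch of that control flow, assuming the plugin resolves to `null` for an unknown language and rejects when a grammar cannot be loaded; `StructuralAnalyzer`, `LlmAnalyzer`, `Logger`, and `analyzeWithFallback` are hypothetical, and the debug message follows the design's wording but logs the file path rather than the bare extension.

```typescript
// Hypothetical result shapes; the real types would come from the core package.
interface StructuralInfo {
  functions: string[];
  classes: string[];
}

interface FileResult {
  filePath: string;
  summary: string;
  structure: StructuralInfo | null; // null when structural analysis was skipped or failed
}

// Resolves to null for an unknown language; rejects if, e.g., the grammar cannot be loaded.
type StructuralAnalyzer = (filePath: string, content: string) => Promise<StructuralInfo | null>;
type LlmAnalyzer = (filePath: string, content: string, structure: StructuralInfo | null) => Promise<string>;

interface Logger {
  debug(message: string): void;
  warn(message: string): void;
}

// Every file reaches the LLM; tree-sitter output is attached only when available.
async function analyzeWithFallback(
  filePath: string,
  content: string,
  structural: StructuralAnalyzer,
  llm: LlmAnalyzer,
  log: Logger,
): Promise<FileResult> {
  let structure: StructuralInfo | null = null;
  try {
    structure = await structural(filePath, content);
    if (structure === null) {
      log.debug(`No language config for ${filePath}, skipping structural analysis`);
    }
  } catch (err) {
    // Degrade this file to LLM-only instead of failing the whole run.
    log.warn(`Structural analysis failed for ${filePath}: ${String(err)}; using LLM-only analysis`);
  }
  const summary = await llm(filePath, content, structure);
  return { filePath, summary, structure };
}
```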

#### Malformed Language Config

- Validated at registration time via a Zod schema
- An invalid config throws at startup (fail fast)
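
The design specifies a Zod schema. To keep this sketch dependency-free, it substitutes a hand-written validator with the same fail-fast intent; it is not the Zod schema itself, and `validateLanguageConfig` is a hypothetical name. It collects every problem before throwing so one error lists them all, and it assumes individual node-type lists may be empty (as the interface allows for `export`) as long as at least one is non-empty or a `customAnalyzer` is supplied.

```typescript
const NODE_TYPE_KEYS = ["function", "class", "import", "export", "typeAnnotation"] as const;

// True for an array whose items are all non-empty strings (an empty array passes).
function isNameList(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string" && item.length > 0);
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Dependency-free stand-in for the Zod schema: collects every problem, then throws once.
function validateLanguageConfig(input: unknown): void {
  const errors: string[] = [];
  const config = (typeof input === "object" && input !== null ? input : {}) as Record<string, unknown>;

  if (!isNonEmptyString(config.id)) errors.push("id must be a non-empty string");
  if (!isNonEmptyString(config.displayName)) errors.push("displayName must be a non-empty string");
  const extensions = config.extensions;
  if (!isNameList(extensions) || extensions.length === 0 || !extensions.every((ext) => ext.startsWith("."))) {
    errors.push('extensions must be a non-empty list such as [".py"]');
  }

  const treeSitter = (config.treeSitter ?? {}) as Record<string, unknown>;
  if (!isNonEmptyString(treeSitter.grammarPackage)) errors.push("treeSitter.grammarPackage must be a non-empty string");
  const nodeTypes = (treeSitter.nodeTypes ?? {}) as Record<string, unknown>;
  let nodeTypeCount = 0;
  for (const key of NODE_TYPE_KEYS) {
    const list = nodeTypes[key];
    if (isNameList(list)) nodeTypeCount += list.length;
    else errors.push(`treeSitter.nodeTypes.${key} must be a list of node type names`);
  }

  const hasCustomAnalyzer = typeof config.customAnalyzer === "function";
  if (config.customAnalyzer !== undefined && !hasCustomAnalyzer) errors.push("customAnalyzer must be a function");
  if (nodeTypeCount === 0 && !hasCustomAnalyzer) errors.push("all nodeTypes lists are empty and no customAnalyzer is given");
  if (!isNameList(config.concepts)) errors.push("concepts must be a list of strings");

  if (errors.length > 0) {
    const id = isNonEmptyString(config.id) ? config.id : "<unknown>";
    throw new Error(`Invalid language config "${id}": ${errors.join("; ")}`);
  }
}
```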