20 KiB
Dashboard Graph Layout Scaling — Design
Problem
When a structural-graph layer contains many nodes, the current applyDagreLayout (TB direction) places same-rank nodes in a single horizontal row. With 50+ nodes per rank, the row stretches into thousands of pixels and the view becomes unreadable: nodes shrink, labels disappear, edges tangle, and there are no visual anchors to orient the reader.
This design replaces dagre with ELK across all structural-style views, introduces folder/community-based containers for the layer-detail view, and computes layout in two lazy stages — a single-pass over containers, then per-container child layout on demand.
The graph schema and pipeline output (graph.json) are unchanged. All improvements derive from existing data.
Goals
- Eliminate horizontal sprawl in layer-detail views at ≤100 nodes per layer (current target), and remain workable up to 1000+ nodes (future scaling).
- Give each layer-detail view explicit visual anchors so structure is readable at a glance.
- Aggregate cross-cluster edges by default; surface individual edges on demand.
- Keep visual style continuous with the existing layer-cluster (overview-level) presentation.
- Treat layout failures with the same
GraphIssuemodel already used for schema validation.
Non-Goals
- No regeneration of
graph.json. All grouping is derived client-side. - No change to KnowledgeGraphView (already force-directed; out of scope).
- No multi-level container nesting (single depth only in v1).
- No remote error reporting (Sentry-style) — open-source plugin, no default telemetry.
- No persona-specific grouping behavior beyond the existing node-type filter.
Scope
Three views are affected:
| View | Change |
|---|---|
| Overview (layer clusters) | Replace dagre → ELK. No new grouping (layers are already groups). |
| DomainGraphView | Replace dagre → ELK with domain-as-parent of flow/step. |
| Layer-detail | Replace dagre → ELK + new folder/community containers + edge aggregation + lazy two-stage layout. |
KnowledgeGraphView remains on applyForceLayout and is not touched.
§1. Architecture
existing graph (immutable)
│
▼
deriveContainers(nodes, edges) // §2 — folder strategy with community fallback
│
▼
buildCompoundGraph() // §4 — aggregate inter-container edges, keep intra-container
│
▼
runStage1Layout(containers, aggEdges) // §6 — ELK on containers only; uses size memory
│
▼ ┌──────────────────────────────┐
│ │ render: containers laid │
│ │ out, children unrendered │
│ └──────────────────────────────┘
│
│ triggered by: click | zoom > 1.0 | search/focus/tour hit child
▼
runStage2Layout(container) // §6 — ELK on one container's children; cached
│
▼
React Flow render (parentId for parent-child) + visual overlay (selection/diff/search/tour)
Two invariants preserved from current code:
- Layout computation is pure and memoized. It only re-runs when graph topology / persona / diff / focus / nodeTypeFilters change.
- Visual state is a separate O(n) overlay pass. Selection, search highlight, tour highlight, hover do not trigger relayout.
This matches the existing useLayerDetailTopology / useLayerDetailGraph split in GraphView.tsx.
§2. Container Derivation (Layer-Detail Only)
2.1 Folder strategy (default)
- Collect every node's
filePathin the layer. - Compute longest common prefix (LCP) across all paths and strip it.
- Group by the first path segment after the LCP.
auth/login.go→ containerauthauth/handlers/oauth.go→ containerauthcart/cart.go→ containercart
- Single-depth grouping only; no recursive nesting in v1.
- Nodes with no
filePath(e.g.concepttype) → container~(rendered as(root), dimmed).
2.2 Community fallback (Louvain)
Triggered when any of:
- All nodes share the same single folder after LCP stripping.
- Bucket count (folders + rooted)
< 2. - Any single bucket (folder or rooted) holds
> 70%of nodes.
Run Louvain modularity-based community detection on the layer's internal edges. Each community becomes a container. Names are placeholders (Cluster A, Cluster B, ...) since no semantic name is available.
Implementation: use graphology + graphology-communities-louvain (~30KB total). Pure JS, no native deps, runs on main thread synchronously for layer-internal edges.
2.3 Edge cases
| Case | Behavior |
|---|---|
| Container has 1 child (only when layer total ≥ 3) | No container box rendered; child becomes a top-level node in Stage 1 layout |
| Container has 2 children | Container rendered; label dimmed |
All nodes lack filePath |
All go to ~ container; if it would become single-child, fall back to flat |
2.4 Function signature
function deriveContainers(
nodes: GraphNode[],
edges: GraphEdge[],
): {
containers: Array<{
id: string; // e.g. "container:auth" or "container:cluster-0"
name: string; // "auth" or "Cluster A"
nodeIds: string[];
strategy: "folder" | "community";
}>;
ungrouped: string[]; // nodes that bypass containerization
};
The strategy field is exposed in the UI ("Grouped by folder" vs "Grouped by edge density") so the user knows how a particular layer was organized.
§3. ELK Integration
3.1 Package
elkjs^0.9 (~250KB gzipped). Useelk.bundled.js, not the worker variant.- Promise-based API. Runs on main thread for graphs ≤500 nodes; <100ms typical.
3.2 Configuration
{
algorithm: "layered",
"elk.direction": "DOWN", // matches dagre TB
"elk.layered.spacing.nodeNodeBetweenLayers": 80,
"elk.spacing.nodeNode": 60,
"elk.layered.crossingMinimization.strategy": "LAYER_SWEEP",
"elk.edgeRouting": "ORTHOGONAL",
"elk.layered.compaction.postCompaction.strategy": "LEFT",
"elk.padding": "[top=40,left=20,right=20,bottom=20]", // container internal padding
}
hierarchyHandling: INCLUDE_CHILDREN is not used — the two-stage approach (§6) issues separate ELK calls for top-level containers and per-container children, so a single compound graph is never assembled.
3.3 Per-view input shaping
| View | ELK input |
|---|---|
| Overview | Flat. Children = layer-cluster nodes. |
| DomainGraphView | Flat in v1 (domain stays as the only grouping; flow/step nodes positioned within). |
| Layer-detail Stage 1 | Flat. Children = containers (treated as opaque atoms). |
| Layer-detail Stage 2 | Flat per container. Children = files within. |
A single runElk(input): Promise<positioned> function services all four cases.
3.4 Boundaries with existing utils/layout.ts
| Function | Status |
|---|---|
applyDagreLayout |
Kept temporarily; removed in the version after layout migration is verified stable |
applyForceLayout |
Untouched (KnowledgeGraphView only) |
applyElkLayout (new) |
Wrapper that handles repair → ELK → result coercion |
3.5 Async + loading state
Stage 1 runs in a useEffect with cancellation on dependency change:
useEffect(() => {
let cancelled = false;
setLayoutStatus("computing");
applyElkLayout(input).then(result => {
if (!cancelled) {
setLayout(result);
setLayoutStatus("ready");
}
});
return () => { cancelled = true };
}, [graph, activeLayerId, persona, diffMode, nodeTypeFilters]);
While layoutStatus === "computing", render a "Computing layout…" overlay (semi-transparent, centered). Stale layout from the previous state is kept underneath so the viewport doesn't blink.
3.6 Failure handling — reuses existing GraphIssue model
Before invoking ELK, run repairElkInput() over the assembled input. Each repair emits a GraphIssue consumed by the existing WarningBanner.
| Repair function | Triggered by | Issue level |
|---|---|---|
ensureNodeDimensions |
Node missing width/height | auto-corrected |
dedupeNodeIds |
Duplicate child id under same parent | auto-corrected |
dropOrphanEdges |
Edge source/target not in node set | dropped |
dropOrphanChildren |
Child references a non-existent parent | dropped |
dropCircularContainment |
Container containment cycle | dropped |
If ELK still rejects after repair → emit a fatal GraphIssue, render an empty graph + the existing fatal banner. The fatal copy text is augmented with "this looks like a dashboard rendering bug — please file an issue with the copied error" so the user knows to direct the report at the dashboard, not the graph data.
3.7 Dev mode strict failures
Both repairElkInput and runElk accept a strict: boolean. In import.meta.env.DEV, strict is on — repairs and ELK errors throw immediately rather than producing graceful issues. This catches input-construction bugs during development before they ship as silent fallbacks.
§4. Edge Aggregation
4.1 Algorithm
Performed inside buildCompoundGraph(), before either ELK stage.
function aggregateContainerEdges(
nodes: GraphNode[],
edges: GraphEdge[],
nodeToContainer: Map<string, string>,
): {
intraContainer: Edge[]; // preserved as-is
interContainerAggregated: AggregatedEdge[]; // one per (sourceContainer, targetContainer)
};
Rules:
- For each edge, look up source/target containers.
- Same container → intra (unchanged).
- Different containers → bucket by
(sourceContainer, targetContainer). Direction matters: A→B and B→A are independent. - Each aggregated edge carries
countandtypes(set of edge types appearing in the bucket).
4.2 Visual
Reuse the styling pattern already in overview-level edge aggregation (GraphView.tsx line ~186):
strokeWidth: Math.min(1 + Math.log2(count + 1), 5)- Label: count number
- Color: existing
rgba(212,165,116,0.4)
4.3 Expand / collapse
State (zustand store):
expandedContainers: Set<string>; // currently expanded container ids
Triggers:
- Click container → toggle membership.
- Click empty canvas or
Esc→ clear all. - Multi-container expansion is allowed (user comparing two folders' relationships).
When a container is expanded:
- Its inter-container aggregated edges (both directions) are replaced with the underlying file→file individual edges.
- Other containers' aggregated edges remain aggregated.
- Position re-layout is not triggered. Only React Flow's edge array changes.
4.4 Interactions with persona / diff
- Persona filter changes
count(post-filter edges only). Aggregated edge re-derived in the memoized pipeline. - Diff mode: aggregated edge containing any changed node → red stroke + animated; on expand, individual edges follow normal diff styling.
§5. Container Visual
5.1 New component: ContainerNode
A new React Flow node type "container" registered alongside the existing custom / layer-cluster / portal.
It does not reuse LayerClusterNode because:
- Click semantics differ (
LayerClusterNodedrills into a layer;ContainerNodetoggles edge expansion). - Metadata differs (
ContainerNodedoes not carryaggregateComplexity).
Visual language is shared: rounded translucent box, gold border, DM Serif title.
5.2 Spec
| Element | Style |
|---|---|
| Border (default) | 1px solid rgba(212,165,116,0.25) |
| Border (hover / expanded) | 1.5px rgba(212,165,116,0.6), expanded adds chevron ▾ |
| Background | rgba(255,255,255,0.02) |
| Corner radius | 12px |
| Title | DM Serif, 14px, #d4a574, top-left padding 12px 16px |
| Child-count badge | top-right chip, #a39787, 11px |
| Internal padding (around children) | 40px top / 20px L,R,B |
5.3 Color coding
Container index modulo 12-color palette (same palette used for layerColorIndex in LayerClusterNode). Hue is applied at low saturation to border + title only — never to the body fill — so the palette doesn't overpower individual nodes inside.
5.4 State styles
| State | Visual |
|---|---|
default |
Base spec |
hover |
Brighter border, title underline |
expanded |
1.5px gold border + chevron ▾ |
search-hit-inside |
Search badge in title row showing match count |
diff-affected |
Border swaps to rgba(224,82,82,0.5) |
focused-via-child |
Same as expanded plus brightness boost |
5.5 Label source
| Strategy | Label |
|---|---|
folder |
First path segment after LCP (e.g. auth) |
community |
Cluster A, Cluster B, ... ordered by community id |
~ (root) |
(root) in dimmed style |
§6. Lazy Two-Stage Layout
6.1 State machine
[layer entered]
│
│ Stage 1: ELK on containers (always runs)
▼
[containers laid out, children unrendered]
│
├── click container ─────┐
├── zoom > 1.0 in viewport (200ms debounce, hysteresis) ─┤
└── search / focus / tour hit a child ─┘
▼
Stage 2 (per container)
│
▼
[container expanded, children laid out + rendered]
6.2 Store extensions
expandedContainers: Set<string>;
containerLayoutCache: Map<string, {
childPositions: Map<string, { x: number; y: number }>;
actualSize: { width: number; height: number };
}>;
containerSizeMemory: Map<string, { width: number; height: number }>;
containerLayoutCacheinvalidated by(graphHash, containerId).containerSizeMemorypersists across container collapses to prevent jitter on next expand.
6.3 Stage 1
async function runStage1Layout(containers, aggregatedInterEdges, sizeMemory) {
const elkInput = {
id: "root",
children: containers.map(c => ({
id: c.id,
width: sizeMemory.get(c.id)?.width
?? Math.sqrt(c.nodeIds.length) * NODE_WIDTH * 1.2,
height: sizeMemory.get(c.id)?.height
?? Math.sqrt(c.nodeIds.length) * NODE_HEIGHT * 1.2,
})),
edges: aggregatedInterEdges.map(toElkEdge),
};
return runElk(elkInput);
}
Container size is estimated from sqrt(childCount) so it grows sub-linearly with content. If memory has the actual size from a previous run, that wins.
6.4 Stage 2
async function runStage2Layout(container, intraEdges) {
if (containerLayoutCache.has(container.id)) {
return containerLayoutCache.get(container.id)!;
}
const elkInput = {
id: container.id,
children: container.nodeIds.map(toElkChild),
edges: intraEdges.filter(e => isWithin(container, e)).map(toElkEdge),
};
const result = await runElk(elkInput);
containerLayoutCache.set(container.id, result);
containerSizeMemory.set(container.id, result.actualSize);
return result;
}
If result.actualSize differs from the Stage 1 estimate by > 20% in either dimension, trigger a Stage 1 re-layout (full re-run; <100ms at this scale, so the user perceives a small reflow rather than two distinct layouts).
6.5 Auto-expand triggers
| Trigger | Implementation |
|---|---|
| Click | onClick toggles expandedContainers |
| Zoom | React Flow onMove listener (200ms debounce). When viewport zoom > 1.0, all containers in viewport added to expandedContainers. Hysteresis: containers don't auto-collapse until zoom < 0.6, preventing flapping. |
| Search / focus / tour | useEffect watches searchResults / focusNodeId / tourHighlightedNodeIds; finds the parent container of any matched leaf node and adds to expandedContainers |
6.6 Performance budget
| Operation | Target |
|---|---|
| Stage 1 (any layer) | < 100ms |
| Stage 2 (first expand of a container) | < 100ms |
| Stage 2 (cache hit) | < 5ms |
| Zoom-driven auto-expand | 200ms debounce |
| Stage 1 re-layout after >20% deviation | < 100ms (re-uses Stage 1 path) |
§7. Interaction Matrix
| Existing feature | Behavior with new layout |
|---|---|
| Persona filter | Drives nodeTypeFilters dependency in Stage 1 memo. Filtered-out nodes don't enter container derivation; containers with all-filtered children disappear. |
| Diff mode | Container with a changed child gets red border (§5.4); aggregated edges containing a changed node animate red; on expand, individual diff styling applies. |
| Focus mode (1-hop) | Focus node's container auto-expands. Non-neighbor containers fade to opacity 0.2; their children remain unrendered. |
| Search | Container with a hit gets search badge in title; container does not auto-expand to avoid expanding many at once. Clicking the badge expands and fitViews. |
| Tour | Tour-highlighted child auto-expands its container. TourFitView fits to the highlighted leaf positions (cached after expand). |
Drill-in (overview → layer-detail) |
Unchanged. After drill-in, Stage 1 runs on the new layer's containers. |
| Breadcrumb | Containers do not enter the breadcrumb. Path remains Project > LAYER. |
| Code viewer | Unchanged. Click a file node inside a container → existing slide-up viewer. |
| WarningBanner | Layout repair issues feed the same banner. Fatal copy text augmented to differentiate render bugs from data bugs. |
| Export (PNG/SVG) | Captures current state including expanded containers. Filename includes layer name. |
§8. Files & Test Plan
8.1 Files
packages/dashboard/src/
├── utils/
│ ├── layout.ts [modify] add applyElkLayout export
│ ├── elk-layout.ts [new] runElk + repairElkInput + GraphIssue mapping
│ ├── containers.ts [new] deriveContainers (folder + community fallback)
│ ├── louvain.ts [new] thin wrapper around graphology-communities-louvain
│ └── edgeAggregation.ts [modify] add aggregateContainerEdges
├── components/
│ ├── ContainerNode.tsx [new] container box visual
│ ├── GraphView.tsx [modify] Stage 1 / Stage 2 wiring, expand state, auto-expand triggers
│ └── DomainGraphView.tsx [modify] dagre → ELK
├── store.ts [modify] expandedContainers, containerLayoutCache, containerSizeMemory
└── package.json [modify] add elkjs ^0.9, graphology, graphology-communities-louvain
8.2 Test matrix
| Type | Target | Cases |
|---|---|---|
| Unit | deriveContainers |
folder grouping happy path; all-in-root fallback; <2 buckets fallback; >70% concentration fallback; no-filePath nodes; single-child container suppression (gated by layer ≥ 3) |
| Unit | aggregateContainerEdges |
empty edges; multiple same-direction edges merge; bidirectional edges split; intra + inter mix; types deduped |
| Unit | repairElkInput |
each repair function in isolation; validates correct GraphIssue level emitted |
| Unit | runElk |
minimal valid input; dev-mode strict throw; production graceful fatal; cancellation on dependency change |
| Integration | Stage 1 + Stage 2 flow | 50-node fixture; click → cache miss; second click → cache hit; size-deviation >20% → re-layout |
| Integration | Persona / focus / search interactions | switching persona reruns Stage 1; focusing a child auto-expands its container; search hit adds badge without auto-expanding |
| Visual regression (optional) | Playwright + microservices-demo fixture | baseline screenshots for overview, layer-detail, domain views |
8.3 Performance benchmarks
Generate fixtures with scripts/generate-large-graph.mjs at 500 / 1000 / 3000 nodes. Verify:
- Stage 1 < 200ms at 500 nodes; < 500ms at 3000 nodes.
- Stage 2 any container < 100ms.
If 3000-node Stage 1 misses the budget, revisit container size estimation or ELK config — do not lower the budget.
Open Questions
None at this point. All decisions made during brainstorming are captured above.
Migration Notes
applyDagreLayoutis kept in the codebase for one release after this lands, then removed in the next. This gives a fallback path during the rollout and a clean uninstall once stable.- No graph data migration needed.
- New dependencies (elkjs, graphology, graphology-communities-louvain) are pure JS, no native bindings — safe across the supported platform matrix.