CAD-AI Evaluation & Prompt Engineering Standards
Evaluation metrics and standards for CAD-AI systems in Tau.
🎯 Benchmark Metrics (MVP)
| Category | Metric | Target |
|---|---|---|
| Geometry | Kernel success rate | ≥ 99% successful boolean/tessellation |
| Geometry | Manifold solid | ShapeMesh.triangles.length > 0 & Euler check |
| Intent match | Parametric robustness | ≥ 90% of params tolerate a ±5% change without failure |
| Intent match | Visual similarity (SSIM/CLIP) | ≥ 0.85 vs reference render |
| Code quality | Clone ratio | ≤ 15% duplicate lines |
| Code quality | RMS layer order | 100% (static rule-checker) |
| Efficiency | Prompt tokens/turn (p95) | ≤ 2,000 |
Note: "Kernel success" is measured by running the generated JS headless under Replicad/OpenCascade; kernel failures surface as thrown exceptions.
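The "Euler check" in the manifold-solid row can be illustrated with a small sketch. It assumes a welded, indexed triangle mesh, meaning vertices shared between faces reuse one index. The `IndexedMesh` shape and the `eulerReport` helper are hypothetical names for illustration and are not taken from Replicad. A tessellation that duplicates vertices per face would need welding first.

```typescript
// Hypothetical welded mesh: a flat list of vertex indices, three per triangle.
interface IndexedMesh {
  triangles: number[];
}

interface EulerReport {
  vertices: number;
  edges: number;
  faces: number;
  euler: number; // V - E + F
  watertight: boolean; // true when every edge is shared by exactly two triangles
}

function eulerReport(mesh: IndexedMesh): EulerReport {
  const t = mesh.triangles;
  if (t.length === 0 || t.length % 3 !== 0) {
    throw new Error("triangles must be a non-empty list of index triples");
  }
  const vertices = new Set<number>();
  const edgeUse = new Map<string, number>();
  for (let i = 0; i < t.length; i += 3) {
    const tri = [t[i], t[i + 1], t[i + 2]];
    for (let k = 0; k < 3; k++) {
      const a = tri[k];
      const b = tri[(k + 1) % 3];
      vertices.add(a);
      // Undirected edge key: smaller index first.
      const key = a < b ? `${a}_${b}` : `${b}_${a}`;
      edgeUse.set(key, (edgeUse.get(key) ?? 0) + 1);
    }
  }
  let watertight = true;
  edgeUse.forEach((count) => {
    if (count !== 2) watertight = false;
  });
  const faces = t.length / 3;
  return {
    vertices: vertices.size,
    edges: edgeUse.size,
    faces,
    euler: vertices.size - edgeUse.size + faces,
    watertight,
  };
}

// A tetrahedron is the smallest closed solid: V = 4, E = 6, F = 4.
const tetra = eulerReport({ triangles: [0, 1, 2, 0, 3, 1, 1, 3, 2, 2, 3, 0] });
```

For a closed surface without holes or handles, V - E + F equals 2. Solids with through-holes have a lower value, so a harness would likely need the expected genus per reference model rather than a fixed constant.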
🔄 Replicad API → LLM Format
Processing Pipeline
- Start from `gen/api/replicad/replicad-clean-with-jsdoc.d.ts`
- Strip compiler-only noise with regex: `/^.*\b(gp_|TopoDS_|BRep|Adaptor3d_|Handle_|Bnd_).*$/gm`
- Remove generics, `private|protected`, overload duplicates (keep max-arity)
- Save each top-level `export` with its JSDoc into JSON chunks:

      {
        "id": "drawRoundedRectangle",
        "signature": "export declare function drawRoundedRectangle(width: number, height: number, r?: number | { rx?: number; ry?: number; }): Drawing;",
        "jsDoc": "Creates the Drawing of a rectangle with rounded corners…"
      }

- Persist array to `gen/api/replicad/replicad-chunks.json` (≈ 900 objects)
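Two of the pipeline steps above, noise stripping and overload de-duplication, can be sketched as pure functions. This is an illustration under assumptions, not the actual build script: `stripKernelNoise`, `countParams`, and `keepMaxArity` are hypothetical names, the noise pattern is applied line by line (so matching lines are dropped instead of left blank), and arity is estimated by counting top-level commas in the parameter list.

```typescript
// Chunk shape taken from the JSON example in the pipeline description.
interface ApiChunk {
  id: string;
  signature: string;
  jsDoc: string;
}

// Same alternatives as the documented regex, tested against one line at a time.
const KERNEL_NOISE = /\b(gp_|TopoDS_|BRep|Adaptor3d_|Handle_|Bnd_)/;

function stripKernelNoise(dts: string): string {
  return dts
    .split("\n")
    .filter((line) => !KERNEL_NOISE.test(line))
    .join("\n");
}

// Rough arity estimate: commas at bracket depth 1 of the first parameter list.
function countParams(signature: string): number {
  const open = signature.indexOf("(");
  if (open === -1) return 0;
  let depth = 0;
  let commas = 0;
  let sawParam = false;
  for (let i = open; i < signature.length; i++) {
    const ch = signature[i];
    // "=>" inside a function type is an arrow, not a closing bracket.
    if (ch === ">" && signature[i - 1] === "=") continue;
    if ("([{<".indexOf(ch) >= 0) {
      depth++;
    } else if (")]}>".indexOf(ch) >= 0) {
      depth--;
      if (depth === 0) break; // end of the parameter list
    } else if (depth === 1) {
      if (ch === ",") commas++;
      else if (ch.trim() !== "") sawParam = true;
    }
  }
  return sawParam ? commas + 1 : 0;
}

// Keep one chunk per id, preferring the overload with the most parameters.
function keepMaxArity(chunks: ApiChunk[]): ApiChunk[] {
  const best = new Map<string, ApiChunk>();
  for (const chunk of chunks) {
    const seen = best.get(chunk.id);
    if (!seen || countParams(chunk.signature) > countParams(seen.signature)) {
      best.set(chunk.id, chunk);
    }
  }
  return Array.from(best.values());
}
```

A production version would more likely walk the TypeScript AST than count characters. The comma heuristic is only meant to show what "keep max-arity" means.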
🗃️ Vector Store (RAG)
Database Setup
- Database: PostgreSQL 15 + `pgvector` extension

      CREATE EXTENSION IF NOT EXISTS vector;
Schema

    CREATE TABLE replicad_chunks (
      id text PRIMARY KEY,
      signature text,
      jsdoc text,
      embedding vector(1536)
    );

Usage
- CLI build script (`scripts/build-replicad-chunks.ts`) inserts with `INSERT … ON CONFLICT`
- Runtime retrieval:

      const top = await db
        .select()
        .from('replicad_chunks')
        .orderBy(sql`embedding <-> $1`, [queryEmbedding])
        .limit(8);
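As a rough illustration of the two usage paths, the sketch below builds parameterized SQL for the upsert and for the nearest-neighbour lookup without committing to a particular query builder. The helper names (`toVectorLiteral`, `upsertChunkQuery`, `nearestChunksQuery`) are hypothetical, and the `DO UPDATE` clause is an assumption: the source only says `INSERT … ON CONFLICT`, so `DO NOTHING` may be what the build script actually uses. The `{ text, values }` shape is meant to be passed to a client such as `pg`.

```typescript
interface ChunkRow {
  id: string;
  signature: string;
  jsdoc: string;
}

interface QueryConfig {
  text: string;
  values: unknown[];
}

const EMBEDDING_DIMS = 1536; // matches vector(1536) in the schema

// pgvector accepts vectors in text form as "[x1,x2,...]".
function toVectorLiteral(embedding: number[], dims: number = EMBEDDING_DIMS): string {
  if (embedding.length !== dims) {
    throw new Error(`expected ${dims} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

// Assumed upsert semantics: overwrite the row when the id already exists.
function upsertChunkQuery(row: ChunkRow, embedding: number[]): QueryConfig {
  return {
    text:
      "INSERT INTO replicad_chunks (id, signature, jsdoc, embedding) " +
      "VALUES ($1, $2, $3, $4::vector) " +
      "ON CONFLICT (id) DO UPDATE SET " +
      "signature = EXCLUDED.signature, jsdoc = EXCLUDED.jsdoc, embedding = EXCLUDED.embedding",
    values: [row.id, row.signature, row.jsdoc, toVectorLiteral(embedding)],
  };
}

// "<->" is pgvector's Euclidean (L2) distance operator; smaller means closer.
function nearestChunksQuery(queryEmbedding: number[], k: number = 8): QueryConfig {
  return {
    text:
      "SELECT id, signature, jsdoc FROM replicad_chunks " +
      "ORDER BY embedding <-> $1::vector LIMIT $2",
    values: [toVectorLiteral(queryEmbedding), k],
  };
}
```

Validating the dimension count before the query is sent turns a database-side type error into an early, readable failure in the build script.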
🧠 Prompt-side Memory Strategy
- Vector memory: Store `(summary, embedding)` for every completed assistant/cad message
- Sliding window: Keep last 4 messages verbatim
- Image captions: Base64 never sent; use CLIP caption + embedding
- Tool call summaries: 1-sentence description replaces raw JSON
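One way the sliding window and tool-call summaries could combine when assembling a prompt is sketched below. The `ChatMessage` shape, the `buildContext` helper, and the `[memory]`/`[tool]` line prefixes are hypothetical. How summaries recalled from vector memory are placed in the prompt is not specified above, so prepending them is an assumption.

```typescript
type Role = "user" | "assistant" | "tool";

interface ChatMessage {
  role: Role;
  content: string; // raw text, or raw JSON for a tool call
  summary?: string; // 1-sentence description, expected for tool calls
}

function buildContext(
  history: ChatMessage[],
  recalled: string[], // summaries retrieved from vector memory
  windowSize: number = 4
): string[] {
  // Only the last `windowSize` messages are kept verbatim.
  const recent = windowSize > 0 ? history.slice(-windowSize) : [];
  const memoryLines = recalled.map((s) => `[memory] ${s}`);
  const recentLines = recent.map((m) =>
    m.role === "tool"
      ? // Raw tool JSON is never forwarded, only its summary.
        `[tool] ${m.summary ?? "tool call (no summary available)"}`
      : `[${m.role}] ${m.content}`
  );
  return memoryLines.concat(recentLines);
}
```

With six messages in the history and the default window, the two oldest messages reach the model only if their summaries come back from the vector lookup.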
📦 Package Setup

    pnpm install pg pgvector # DB client + vector helpers

The server must enable the pgvector extension once per database.
These standards are the source-of-truth for future evaluation harnesses and prompt refactors.