CAD-AI Evaluation & Prompt Engineering Standards

Evaluation metrics and standards for CAD-AI systems in Tau.

🎯 Benchmark Metrics (MVP)

| Category     | Metric                        | Target                                        |
|--------------|-------------------------------|-----------------------------------------------|
| Geometry     | Kernel success rate           | ≥ 99% successful boolean/tessellation         |
| Geometry     | Manifold solid                | ShapeMesh.triangles.length > 0 & Euler check  |
| Intent match | Parametric robustness         | ≥ 90% of params can be ±5% without failure    |
| Intent match | Visual similarity (SSIM/CLIP) | ≥ 0.85 vs reference render                    |
| Code quality | Clone ratio                   | ≤ 15% duplicate lines                         |
| Code quality | RMS layer order               | 100% (static rule-checker)                    |
| Efficiency   | Prompt tokens/turn (p95)      | ≤ 2,000                                       |
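
The "Manifold solid" target does not spell out the Euler check, so the following is only a minimal sketch of one way it could be done. It assumes `triangles` is a flat array of vertex indices (three per triangle) and that coincident vertices have already been merged; `checkClosedMesh` and `MeshCheck` are hypothetical names, not part of Replicad.

```typescript
// Result of the check: `closed` means every edge is shared by exactly two
// triangles; `euler` is the Euler characteristic V - E + F.
interface MeshCheck {
  closed: boolean;
  euler: number;
}

// Assumption: `triangles` holds welded vertex indices, three per triangle.
function checkClosedMesh(triangles: number[]): MeshCheck {
  if (triangles.length === 0 || triangles.length % 3 !== 0) {
    return { closed: false, euler: 0 };
  }
  const vertices = new Set<number>();
  const edgeUse = new Map<string, number>();
  for (let i = 0; i < triangles.length; i += 3) {
    const corners = [triangles[i], triangles[i + 1], triangles[i + 2]];
    for (let k = 0; k < 3; k++) {
      const u = corners[k];
      const v = corners[(k + 1) % 3];
      vertices.add(u);
      // Undirected edge key, so (u, v) and (v, u) count as the same edge.
      const key = u < v ? `${u}-${v}` : `${v}-${u}`;
      edgeUse.set(key, (edgeUse.get(key) ?? 0) + 1);
    }
  }
  const faces = triangles.length / 3;
  const euler = vertices.size - edgeUse.size + faces;
  const closed = Array.from(edgeUse.values()).every((n) => n === 2);
  return { closed, euler };
}
```

For a part without through-holes, a reasonable pass condition is `closed && euler === 2`. A solid with g through-holes has Euler characteristic 2 − 2g, so the expected value would need to come from the reference model.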

Note: "Kernel success rate" is measured by running the generated JS headlessly under Replicad/OpenCascade; a run counts as a failure if it throws.
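
Because failures surface as thrown exceptions, the parametric robustness target can be scored with a small harness. The sketch below assumes each parameter is perturbed on its own by ±5% while the others keep their base values; that reading of the metric, and the names `Params`, `Build`, and `parametricRobustness`, are assumptions rather than anything defined by these standards.

```typescript
type Params = Record<string, number>;

// Hypothetical build callback: runs the generated model with the given
// parameters and throws if the kernel fails.
type Build = (params: Params) => unknown;

// Returns the fraction of parameters that survive both -delta and +delta.
function parametricRobustness(build: Build, base: Params, delta = 0.05): number {
  const names = Object.keys(base);
  if (names.length === 0) return 1; // nothing to perturb
  let robust = 0;
  for (const name of names) {
    const survives = [1 - delta, 1 + delta].every((factor) => {
      try {
        build({ ...base, [name]: base[name] * factor });
        return true;
      } catch {
        return false; // a throw counts as a kernel failure
      }
    });
    if (survives) robust++;
  }
  return robust / names.length;
}
```

A model would meet the target when `parametricRobustness(build, baseParams) >= 0.9`.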

🔄 Replicad API → LLM Format

Processing Pipeline

  1. Start from gen/api/replicad/replicad-clean-with-jsdoc.d.ts
  2. Strip compiler-only noise with regex:
    /^.*\b(gp_|TopoDS_|BRep|Adaptor3d_|Handle_|Bnd_).*$/gm
  3. Remove generics, private/protected members, and duplicate overloads (keep the highest-arity signature)
  4. Save each top-level export with its JSDoc into JSON chunks:
    {
      "id": "drawRoundedRectangle",
      "signature": "export declare function drawRoundedRectangle(width: number, height: number, r?: number | { rx?: number; ry?: number; }): Drawing;",
      "jsDoc": "Creates the Drawing of a rectangle with rounded corners…"
    }
  5. Persist array to gen/api/replicad/replicad-chunks.json (≈ 900 objects)
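
Steps 2 and 3 can be sketched as two pure functions. This is an illustration only: it reuses the regex from step 2 and the chunk shape from step 4, but `ApiChunk`, `stripCompilerNoise`, `countParams`, and `keepMaxArity` are hypothetical names, and the arity count is a simple top-level comma count that assumes generics have already been removed.

```typescript
// Mirrors the JSON chunk shape shown in step 4.
interface ApiChunk {
  id: string;
  signature: string;
  jsDoc: string;
}

// Regex from step 2: matches whole lines that mention OpenCascade internals.
const NOISE = /^.*\b(gp_|TopoDS_|BRep|Adaptor3d_|Handle_|Bnd_).*$/gm;

function stripCompilerNoise(dts: string): string {
  // Drop matching lines, then collapse the blank runs they leave behind.
  return dts.replace(NOISE, "").replace(/\n{3,}/g, "\n\n");
}

// Counts parameters in the first parenthesised list of a signature by
// counting commas at nesting depth 1 (object and tuple types are skipped).
function countParams(signature: string): number {
  const open = signature.indexOf("(");
  if (open < 0) return 0;
  let depth = 0;
  let commas = 0;
  let sawContent = false;
  for (let i = open; i < signature.length; i++) {
    const ch = signature[i];
    if (ch === "(" || ch === "{" || ch === "[") {
      depth++;
    } else if (ch === ")" || ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) break; // end of the parameter list
    } else if (depth === 1) {
      if (ch === ",") commas++;
      else if (ch.trim() !== "") sawContent = true;
    }
  }
  return sawContent ? commas + 1 : 0;
}

// Keeps one chunk per id: the overload with the most parameters.
function keepMaxArity(chunks: ApiChunk[]): ApiChunk[] {
  const best = new Map<string, ApiChunk>();
  for (const chunk of chunks) {
    const current = best.get(chunk.id);
    if (!current || countParams(chunk.signature) > countParams(current.signature)) {
      best.set(chunk.id, chunk);
    }
  }
  return Array.from(best.values());
}
```

A real implementation would more likely walk the declaration file with the TypeScript compiler API than rely on string scanning; the string version is used here only to keep the example self-contained.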

🗃️ Vector Store (RAG)

Database Setup

  • Database: PostgreSQL 15 + pgvector extension
    CREATE EXTENSION IF NOT EXISTS vector;

Schema

CREATE TABLE replicad_chunks (
  id text PRIMARY KEY,
  signature text,
  jsdoc text,
  embedding vector(1536)
);

Usage

  • CLI build script (scripts/build-replicad-chunks.ts) inserts with INSERT … ON CONFLICT
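  • Runtime retrieval (8 nearest chunks by L2 distance, pgvector's <-> operator):
    // Assumes `pool` is a pg Pool and `pgvector` is imported from 'pgvector/pg'.
    const { rows: top } = await pool.query(
      'SELECT id, signature, jsdoc FROM replicad_chunks ORDER BY embedding <-> $1 LIMIT 8',
      [pgvector.toSql(queryEmbedding)]
    );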
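
The build script's insert statement is not shown in this document, so the following is only one plausible form of the upsert, written as a pure function that prepares the query text and values. `ApiChunk`, `toVectorLiteral`, and `buildUpsert` are hypothetical names; the dimension check mirrors `vector(1536)` from the schema, and the `[x1,x2,...]` string is the text form pgvector accepts for vector values.

```typescript
// Mirrors the JSON chunk shape produced by the processing pipeline.
interface ApiChunk {
  id: string;
  signature: string;
  jsDoc: string;
}

const EMBEDDING_DIM = 1536; // matches vector(1536) in the schema

// Serialises an embedding into pgvector's text input form, e.g. "[0.1,0.2]".
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

// Assumed upsert: re-running the build overwrites the row with the same id.
function buildUpsert(chunk: ApiChunk, embedding: number[]): { text: string; values: string[] } {
  return {
    text:
      "INSERT INTO replicad_chunks (id, signature, jsdoc, embedding) " +
      "VALUES ($1, $2, $3, $4) " +
      "ON CONFLICT (id) DO UPDATE SET " +
      "signature = EXCLUDED.signature, jsdoc = EXCLUDED.jsdoc, embedding = EXCLUDED.embedding",
    values: [chunk.id, chunk.signature, chunk.jsDoc, toVectorLiteral(embedding)],
  };
}
```

With node-postgres, the returned object could be passed straight to `pool.query(...)`, which accepts a `{ text, values }` query config. Keeping the SQL construction separate from the database call also makes it possible to unit-test the build script without a running PostgreSQL instance.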

🧠 Prompt-side Memory Strategy

  • Vector memory: Store (summary, embedding) for every completed assistant/cad message
  • Sliding window: Keep the last 4 messages verbatim
  • Image captions: Never send Base64 image data; use a CLIP caption + embedding instead
  • Tool call summaries: A 1-sentence description replaces the raw JSON
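
As an illustration of how these rules might combine when a prompt is assembled, the sketch below prepends summaries recalled from vector memory, keeps the last 4 turns, and substitutes the 1-sentence summary for raw tool-call JSON. The `Turn` shape, the `[memory]` and role prefixes, and `buildPromptContext` are assumptions made for the example; the vector search that produces `recalledSummaries` is out of scope here.

```typescript
interface Turn {
  role: "user" | "assistant" | "tool";
  content: string;   // verbatim text, or raw JSON for tool calls
  summary?: string;  // 1-sentence description, if one was generated
}

const WINDOW_SIZE = 4; // "keep last 4 messages verbatim"

function buildPromptContext(history: Turn[], recalledSummaries: string[]): string[] {
  // Older context is represented only by summaries recalled from vector memory.
  const memory = recalledSummaries.map((s) => `[memory] ${s}`);
  // The most recent turns are kept, with tool-call JSON replaced by its summary.
  const recent = history.slice(-WINDOW_SIZE).map((turn) => {
    const body = turn.role === "tool" && turn.summary ? turn.summary : turn.content;
    return `[${turn.role}] ${body}`;
  });
  return memory.concat(recent);
}
```

Image handling would follow the same pattern: a turn that carried an image would store its caption in `content`, so no Base64 payload ever reaches the prompt.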

📦 Package Setup

pnpm install pg pgvector           # DB client + vector helpers

The server must enable the pgvector extension once per database.


These standards are the source-of-truth for future evaluation harnesses and prompt refactors.