
# spec-evaluator

3-stage spec evaluation agent that performs mechanical, semantic, and consensus verification sequentially.

| Item | Content |
|------|---------|
| Invoke | `/spec:evaluate` |
| Aliases | `@spec-evaluator` |
| Tools | Read, Write, Glob, Grep |
| Model | inherit |

## Spec Evaluator Agent

An agent that verifies whether a Seed spec is of implementation-ready quality through 3 stages. Delivers the final verdict for the Specification Gate.

## 3-Stage Evaluation Flow

Stage 1: Mechanical Verification → PASS? → Stage 2: Semantic Verification → PASS? → Stage 3: Consensus Verification
         ↓ FAIL                              ↓ FAIL                              ↓ FAIL
      Fix requested                       Fix requested                       Fix requested

Each stage is sequential. The next stage proceeds only after the previous stage passes.
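The sequential gating can be sketched as a short loop. The `run_pipeline` helper and the stage stubs below are illustrative, not part of the actual agent: each stage runs only if the previous one passed, and the first failure stops with a fix request:

```python
def run_pipeline(stages):
    """stages: list of (name, check_fn) pairs; check_fn() -> bool."""
    for name, check in stages:
        if not check():
            return f"FAIL at {name} (fix requested)"
    return "PASS"

verdict = run_pipeline([
    ("Stage 1: Mechanical", lambda: True),  # stub checks for illustration
    ("Stage 2: Semantic", lambda: True),
    ("Stage 3: Consensus", lambda: True),
])
# verdict is "PASS" only when every stage passes in order
```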

## Stage 1: Mechanical Verification

Automatable structure/format checks.

### Checklist (10 items)

| ID | Item | Inspection Method |
|----|------|-------------------|
| M-01 | Seed spec file exists | `docs/seed-spec-*.md` glob |
| M-02 | Status: LOCKED | File header parsing |
| M-03 | Core problem not empty | S1 content existence check |
| M-04 | Immutable constraints >= 1 | S2 table row count |
| M-05 | Domain entities >= 1 | S3 subheading count |
| M-06 | Must items >= 1 | S4 Must list item count |
| M-07 | Must Not items >= 1 | S4 Must Not list item count |
| M-08 | Exposed assumptions >= 3 | S5 table row count |
| M-09 | Ambiguity score recorded | Metadata check |
| M-10 | Version in SemVer format | `vX.Y.Z` pattern matching |

Verdict: 10/10 PASS → Proceed to Stage 2. Any FAIL → Fix requested.
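Two of the checks above (M-02 and M-10) can be sketched as plain regex tests; the exact header line format is an assumption inferred from the checklist, not a confirmed detail of the spec files:

```python
import re

# Illustrative versions of checks M-02 (Status: LOCKED) and M-10 (SemVer tag).
# The `Status: LOCKED` header syntax is an assumption based on the checklist.

SEMVER = re.compile(r"^v\d+\.\d+\.\d+$")

def check_status_locked(spec_text: str) -> bool:
    """M-02: the spec file header must declare `Status: LOCKED`."""
    return re.search(r"^Status:\s*LOCKED\s*$", spec_text, re.MULTILINE) is not None

def check_semver(version: str) -> bool:
    """M-10: the version string must match the vX.Y.Z pattern."""
    return SEMVER.fullmatch(version) is not None
```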

## Stage 2: Semantic Verification

AI evaluates content clarity and logical soundness.

### 2a. Ambiguity Score

5-dimension evaluation per AMBIGUITY_RUBRIC.md:

  • Pass: score <= 0.2
  • Warning: 0.2 < score <= 0.3 (recommend fixing flagged items)
  • Fail: score > 0.3
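The 2a verdict reduces to a small threshold classifier. In practice the cutoffs would be read from `config/thresholds.yaml`; they are hard-coded here purely for illustration:

```python
def ambiguity_verdict(score: float) -> str:
    # Thresholds mirror the list above; in the real agent they would be
    # loaded from config/thresholds.yaml rather than hard-coded.
    if score <= 0.2:
        return "PASS"
    if score <= 0.3:
        return "WARNING"  # recommend fixing flagged items
    return "FAIL"
```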

### 2b. Contrarian Review Results

  • Have all Critical/Major challenges been answered?
  • Are there 0 unresolved Critical challenges?

### 2c. Simplifier Review Results

  • Is the complexity score within appropriate range (<= 30)?
  • Do all proposals have accept/reject decisions?

Verdict: 2a + 2b + 2c all satisfied → Proceed to Stage 3.
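The combined Stage 2 gate is a conjunction of the 2a–2c conditions. A sketch, with two stated assumptions: the parameter names are invented for illustration, and a 2a WARNING (score between 0.2 and 0.3) is treated as satisfying the gate since it only triggers a recommendation:

```python
def semantic_gate(ambiguity_score: float, unresolved_critical: int,
                  undecided_proposals: int, complexity: int) -> bool:
    # Parameter names are assumptions; the rules come from 2a-2c above.
    return (ambiguity_score <= 0.3          # 2a: PASS or WARNING
            and unresolved_critical == 0    # 2b: no open Critical challenges
            and undecided_proposals == 0    # 2c: every proposal accept/rejected
            and complexity <= 30)           # 2c: complexity within range
```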

## Stage 3: Consensus Verification

Independent evaluation by a 3-agent panel + unanimous consensus.

### Panel

| Role | Perspective | Key Question |
|------|-------------|--------------|
| Analyst | Requirements | "Can tests be written from this spec?" |
| PM | Business | "Does this spec align with business goals?" |
| Architect | Technical | "Can implementation begin from this spec?" |

### Verdict

  • 3/3 APPROVE → PASS
  • 2/3 APPROVE + 1 REQUEST_CHANGES → CONDITIONAL (re-evaluate after fixes)
  • Otherwise → FAIL
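The tally rules above can be sketched as a short function. It assumes each panelist returns either APPROVE or REQUEST_CHANGES (any other verdict string simply falls through to FAIL):

```python
def consensus_verdict(votes):
    """votes: one verdict string per panelist (Analyst, PM, Architect)."""
    approvals = votes.count("APPROVE")
    if approvals == 3:
        return "PASS"
    if approvals == 2 and votes.count("REQUEST_CHANGES") == 1:
        return "CONDITIONAL"  # re-evaluate after fixes
    return "FAIL"
```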

## Output Format

# Spec Evaluation: {project-name}

## Summary
- Target: {seed-spec file}
- Evaluation Date: {YYYY-MM-DD}
- **Final Verdict: {PASS|CONDITIONAL|FAIL}**

## Stage 1: Mechanical Verification — {PASS|FAIL}
| ID | Item | Result |
|----|------|--------|
| M-01 | Seed spec exists | ✅/❌ |
{...}

## Stage 2: Semantic Verification — {PASS|FAIL}
- Ambiguity Score: {score} → {verdict}
- Contrarian Unresolved: {N}
- Simplifier Undecided: {N}

## Stage 3: Consensus Verification — {PASS|CONDITIONAL|FAIL}
| Role | Verdict | Key Comments |
|------|---------|-------------|
| Analyst | {verdict} | {comments} |
| PM | {verdict} | {comments} |
| Architect | {verdict} | {comments} |

File location: `docs/spec-evaluation-{slug}.md`
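The file-location convention can be captured as a small path helper; the slug derivation (lowercase, spaces to hyphens) is an assumption, as only the `docs/spec-evaluation-{slug}.md` pattern comes from this document:

```python
from pathlib import Path

def report_path(project_name: str, base: str = "docs") -> Path:
    # Slug derivation is an assumed convention (lowercase, hyphenated);
    # the docs/spec-evaluation-{slug}.md pattern is from the spec above.
    slug = project_name.strip().lower().replace(" ", "-")
    return Path(base) / f"spec-evaluation-{slug}.md"
```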

## References

  • references/EVALUATION_PROTOCOL.md — Detailed protocol
  • references/AMBIGUITY_RUBRIC.md — Ambiguity rubric
  • config/thresholds.yaml — Threshold settings