Ghost Prompt Triage audits a folder of LLM prompts and surfaces 16 categories of defects that make AI behave unpredictably, burn extra tokens, and hand you bigger API bills.
Ghost is a pre-engagement triage platform for codebases you didn't write. It maps risk, finds conflicts, sizes engagements, and now audits prompts. Same install. Same CLI. Six modes. See everything Ghost does →
If you ship AI features, your prompts are running in production. They route customer interactions, shape outputs, and quietly burn API tokens every time they misbehave. Most teams have no review process for prompts at all. The same engineer who writes a prompt usually deploys it. There's no second set of eyes on the file that's driving thousands of LLM calls a day.
Prompt Triage is the second set of eyes. It reads every prompt file you point it at and flags the defects that the academic literature has identified as causes of unpredictable AI behavior. Ambiguity. Conflicting instructions. Token overflows. Undefined output formats. Injection patterns. The kinds of bugs that make your AI hallucinate, hedge, or run away with your API budget.
Three tiers of detection. Tier 1 runs free (pure regex and token counting). Tier 2 uses LLM-assisted judgment for semantic defects. Tier 3 is hybrid — regex narrows the search, LLM verifies the match.
Unclosed code blocks, unclosed inline code, broken tags, unmatched close tags, multiple system blocks, mixed role conventions, stray line markers.
Prompts that are too short to be useful or too long to be efficient. Severity scales with magnitude.
Output instructions with no length cap, no format constraint, no stop conditions. The pattern that produces runaway token bills.
Known prompt injection patterns: instruction overrides, jailbreak phrases, exfiltration attempts. Pre-filter catches the obvious cases without an LLM call.
System/user/assistant boundaries that bleed into each other. Mixed message conventions that confuse the model about who is speaking.
Prompts plus expected output that exceed your target model's context window. Flagged before runtime.
Prompts that consume an unreasonable share of the context window relative to their job. Indicates bloat.
Instructions that a careful reader could interpret two ways. Vague verbs, hedge words, missing acceptance criteria.
Tasks that lack the boundary conditions the model needs to succeed. Missing edge cases, missing failure modes, missing examples of done.
Two parts of the same prompt that ask for opposite things. The model hedges, the output gets longer, your token bill climbs. See the example after this list.
You expect JSON or a structured response but never said so explicitly. The model picks whatever format it likes. Your parser breaks. You retry.
A single prompt asking the model to do five things at once. Splitting it into a chain almost always lowers cost and improves quality.
Instructions scattered, context buried, important constraints at the bottom. The model is more likely to follow rules it sees clearly.
Examples that don't demonstrate what you want, that contradict each other, or that bloat the prompt without teaching the model anything.
Prompts written for the person who wrote them, not for the next engineer who has to debug them. A real maintenance liability at scale.
The prompt declares one output format but the few-shot examples use another. Tool schema field names drift from instruction text. Envelope shape mismatches. The bugs that break integrations silently.
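To make conflicting instructions and undefined output formats concrete, here's an invented before/after (illustrative only, not taken from a real audit). The first version asks for opposite things and never names a format; the second resolves the conflict and pins the output contract.

```
# Before: conflicting instructions + undefined output format
Summarize the ticket in one short sentence.
Explain every relevant detail so nothing is lost.
Return the result.

# After: one instruction wins, the format is explicit
Summarize the ticket in at most two sentences.
If details would be lost, list them under a "details" key.
Return JSON: {"summary": string, "details": string[]}
```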
The 16 detectors are derived from Tian et al. 2025, an academic survey of prompt defect categories that cause unpredictable LLM behavior. Each detector targets a distinct defect class identified in the research, with an explicit reference to its taxonomy entry. This means Prompt Triage isn't producing opinions about your prompts. It's identifying patterns that academic researchers have already shown cause measurable problems.
No setup. No config files. No accounts. Point Ghost at a folder, pick a target model, get a report.
One npm command. npm install -g ghost-architect-open. Works on macOS, Windows, Linux.
Type ghost, choose Prompt Triage, and point it at a folder of prompts. Files in .md, .txt, .yaml, and .json are all supported.
If you select a target model, Ghost shows the estimated cost band before any LLM call fires. Y to proceed, N to cancel.
Findings printed to terminal. PDF, Markdown, and TXT reports saved to ~/Ghost Architect Reports/prompt-triage/.
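Put together, a first run looks like this. Everything here comes from the steps above except the final `ls`, which is just one way to open the report folder:

```sh
# One-time install; works on macOS, Windows, and Linux
npm install -g ghost-architect-open

# Launch the interactive CLI, choose Prompt Triage,
# and point it at a folder of .md/.txt/.yaml/.json prompts
ghost

# PDF, Markdown, and TXT reports are written here
ls ~/"Ghost Architect Reports"/prompt-triage/
```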
Prompt defects are not abstract quality issues. Each one is a direct cost driver. Ambiguity causes retries. Conflicting instructions cause hedged output. Undefined output formats cause parse failures and retries. Unbounded outputs cause runaway token bills. Token overflows cause failed requests and retries. The math compounds across every prompt in your stack.
Cleaner prompts burn fewer tokens. Bounded outputs cap runaway costs. Fewer retries from broken parses. Every defect Prompt Triage fixes is a defect that costs your team money on every LLM call.
Catch ambiguity and conflicting instructions before they ship. Less hallucination in production. Less customer-facing weirdness. The same prompt produces the same behavior tomorrow that it does today.
Every Tier 2 scan shows the cost band first. Confirm or cancel before you spend a cent. No mystery LLM calls. No bill shock at the end of the month.
Bring your own Anthropic API key. Your prompts never leave your machine. No third party stores them, indexes them, or trains on them.
Prompts that pass Triage are easier to review. Documentation, structure, and constraints have already been audited. Your seniors spend less time decoding what a prompt is supposed to do.
Telling stakeholders "our prompts are good" is hard to back up. "Our prompts pass a 16-detector triage backed by published academic research" is not.
If you're auditing prompts, you probably also need to understand the codebase running them. Or the codebase you're about to inherit. Or the one you're trying to scope for a migration. Ghost Architect ships with five additional analysis modes that all use the same install, the same CLI, and the same bring-your-own-API-key model.
Ask the codebase anything in plain English. Ghost reads, indexes, and answers using only the files in your project. No hallucinations from elsewhere.
Auto-map red flags, dead zones, fault lines, and landmarks across the entire codebase. The first scan a senior architect runs on a project they didn't write.
Pick a file or function. Ghost traces every dependency, every affected flow, every downstream caller. Includes a rollback plan you can hand to a stakeholder.
Contract mismatches, schema conflicts, config disagreements between services. The integration bugs that show up at runtime instead of compile time.
Engagement sizing before you sign the SOW. File count, complexity gauge, scan cost projection, multi-pass plan. Walk into the scoping call with data.
Capabilities, security model, agency workflows, real proof-of-concept reports, pricing.
ghostarchitect.dev →

Every tier runs Prompt Triage with all 16 detectors at full accuracy. The gating is on workflow features (baseline comparison, velocity tracking, team-sync), not on detector access or report quality.
See full tier breakdown on the pricing page.
Tier 1 detectors are pure regex and token-counting. They're deterministic, so they're exactly as accurate as the rules behind them. Tier 2 detectors use LLM judgment, which means accuracy depends on the target model you pick. We recommend Claude Haiku 4.5 for the best balance of accuracy and cost (around $0.05 to $0.08 per prompt with all Tier 2 detectors running). For mission-critical audits, run with Sonnet 4 for higher accuracy at roughly 5x the cost.
If you don't pick a target model, scans are free. Only Tier 1 detectors run, and they don't use the LLM. If you pick a target model, expect roughly $0.05 to $0.08 per prompt on Haiku 4.5. Our live test of 7 prompts came in at $0.41 total, inside the expected $0.35 to $0.56 band (7 × $0.05 through 7 × $0.08). Ghost shows you the cost band before any LLM call fires, and reports actual spend after the scan finishes.
Markdown (.md, .markdown), plain text (.txt), YAML (.yaml, .yml), and JSON (.json) files containing prompt content. The loader filters out non-prompt files automatically (package.json, tsconfig, lockfiles, etc.) so you can point at a project root without setup.
No. Prompt Triage runs entirely on your local machine. Your prompt files never leave your filesystem. LLM-backed detectors (Tier 2) make API calls to Anthropic using your own API key, under your own data agreement. Ghost Architect has no server. There is nothing to upload to.
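One practical note on the key itself. Ghost is bring-your-own-key; assuming it reads the ANTHROPIC_API_KEY environment variable that Anthropic's own SDKs use (an assumption, so check the Ghost docs if your setup differs), setup is one line:

```sh
# Assumption: Ghost picks up the standard ANTHROPIC_API_KEY variable
# used by Anthropic's SDKs. If it uses its own config, see the Ghost docs.
export ANTHROPIC_API_KEY="sk-ant-..."   # placeholder; your key, your data agreement
```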
Yes. Pass --non-interactive on the CLI to skip prompts. Cost pre-flight is auto-confirmed when a budget cap is set. The exit code reflects whether any Critical findings were surfaced, so CI can fail builds on regressions.
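A minimal CI step might look like the sketch below. The --non-interactive flag and the exit-code behavior are documented above; the positional folder argument is an assumption, so match it to your actual invocation:

```sh
#!/usr/bin/env bash
set -euo pipefail

# --non-interactive skips interactive prompts; with a budget cap set,
# the cost pre-flight auto-confirms. Ghost exits non-zero when the scan
# surfaces Critical findings, which fails this step under `set -e`.
ghost --non-interactive ./prompts   # folder argument assumed; adjust as needed
```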
Open runs Prompt Triage as a one-shot audit. Every scan stands alone, every report is full. Pro adds Project Intelligence: label a scan with a project name, and subsequent scans on that label produce a baseline comparison (which findings were resolved, which remain, which are new). Pro also shows velocity trends and surfaces prompts in the Project Dashboard alongside code projects. If you run Prompt Triage weekly or monthly to track prompt-quality drift, Pro is the upgrade. If you run one-off audits, Open is enough.
Linters catch syntax issues. Prompt Triage catches semantic defects (ambiguity, conflicting instructions, undefined contracts, integration mismatches) that don't show up in syntax. The 7 Tier 1 detectors are linter-style. The 9 Tier 2 and Tier 3 detectors use LLM judgment to identify problems a regex cannot see.
npm install, point at a folder, get a report. Free in Ghost Architect Open.