This capstone is where theory stops and judgment starts. You will review three real prompts, diagnose defects line by line, rewrite each prompt as a clear spec, and defend your choices in a team review.
The standard is simple: your rewrite should be easier to review, easier to evaluate, and safer to ship than the original.
Keep prompt design as product design open while you work. Then connect your rewrite choices to user-facing behavior with AI UX patterns that work. A prompt is not done when it reads well in a document. It is done when output quality supports the interface decisions users actually experience.
For this capstone, you can use your team’s prompts. If you do not have access, use canonical scenarios inspired by these cases:
Notion’s rollout discipline for sensitive in-product assistance.
Perplexity’s answer-plus-citation challenge for trust and grounding.
Harvey’s legal context constraints for high-stakes accuracy posture.
Linear’s quiet utility summaries for concise productivity support.
Choose three prompts from distinct contexts. Do not pick three variants of the same task. Diversity reveals whether your method is robust.
Recommended trio:
Prompt A: generation assistant in a sensitive user workflow.
Prompt B: research or synthesis assistant that risks overconfident claims.
Prompt C: productivity assistant where concise utility matters more than flourish.
Now follow this workflow.
Phase one: establish task and risk frame.
For each prompt, document user moment, desired decision outcome, and failure cost. Keep this short and specific. If you cannot state failure cost clearly, your review will drift into style debates.
Phase two: annotate the original prompt.
Read each line and tag it as one of three statuses.
Keep: clear and necessary.
Rewrite: necessary but ambiguous or weak.
Delete: unnecessary, redundant, or contradictory.
Then label defect type where relevant: ambiguity, scope, constraint, format, or failure-mode gap.
Phase three: design the rewritten prompt spec.
Use the structure from lesson three: role, task, constraints, examples when justified, and format contract. Do not add complexity by default. Add only what reduces a known defect.
Phase four: define output interface implications.
This is where many capstones fail. Prompt rewrites are not isolated text edits. They change what users see.
For each rewrite, specify the expected UI behavior.
Should the response include explicit confidence language?
Should citations be required and rendered visibly?
Should uncertain answers trigger a clarifying question flow?
Should long responses be chunked into sections for scanability?
These choices connect directly to AI UX patterns that work. If your prompt asks for caution but your interface hides uncertainty cues, trust still breaks.
Phase five: run qualitative evaluation.
Use a compact golden set for each prompt: baseline, edge, adversarial, and policy-sensitive inputs where relevant. Compare original and rewritten outputs side by side.
Document three findings for each prompt.
What improved clearly.
What stayed flat.
What new risk appeared.
Be honest about tradeoffs. A rewrite that improves safety may reduce warmth. A rewrite that enforces structure may reduce creativity. Good product judgment is explicit about these exchanges.
Phase six: prepare the review packet.
Your deliverable should include for each prompt:
Original prompt.
Annotated defect map.
Rewritten prompt.
Observed output differences on test inputs.
Ship recommendation: ship, ship with guardrails, or iterate.
This packet is your artifact. It should let another teammate understand your decisions without a live explanation.
Now, how do you judge quality in the final presentation?
Use five scoring dimensions, each on a simple pass-needs-work basis.
Spec clarity: can reviewers explain what each major line buys?
Risk handling: are failure behaviors explicit and appropriate?
Usability: does output format support fast user decisions?
Trust posture: are confidence, uncertainty, and evidence handled honestly?
Maintainability: can this prompt be versioned and reviewed without guesswork?
If two or more dimensions are weak, do not ship.
A common capstone mistake is optimizing only for answer quality on happy-path inputs. That is demo thinking. Production quality includes edge behavior, uncertainty handling, and consistency under pressure.
Another mistake is overfitting to known examples. You get perfect scores on your tiny set and poor generalization in the wild. Keep examples representative and avoid teaching one rigid response pattern.
You should also narrate what you intentionally did not do.
Did you avoid chain-of-thought scaffolding because it added latency without quality gain?
Did you skip heavy few-shot examples because the task did not require stylistic imitation?
Did you preserve concise format because user context demanded speed?
These decisions show maturity. Prompt quality is as much about subtraction as addition.
If you are doing this exercise with your team, run one final step: cross-review.
Have another teammate score your rewrite packet without seeing your verbal defense. If they cannot follow your logic, your artifact is not review-ready.
The point of this capstone is not to produce a perfect prompt. The point is to prove you can run a repeatable quality process.
When a team can repeatedly diagnose defects, propose focused rewrites, and tie prompt changes to user-facing behavior, prompt work stops being mystical. It becomes product craft.
That is the bar for shipping AI features that people trust more after the tenth use, not less.
Rules from this lesson
- Rewrite prompts only after defining user moment, failure cost, and defect map.
- Tie every prompt decision to surfaced user experience, not just hidden model behavior.
- Judge rewrites on trust, usability, and maintainability, not happy-path fluency alone.
- Document tradeoffs explicitly so ship decisions are defensible.
- A capstone artifact is complete only when another reviewer can validate it without you in the room.