Template · 5 min

GenAI Project Scoping Worksheet

Twenty questions we ask before any GenAI build — copy it, fill it in, and hand it to your team before the first sprint.


Scoping a GenAI project is not like scoping a web app. The failure surface is different: non-deterministic outputs, unknown cost curves, unlabeled data, and stakeholders who've seen a demo and expect magic. This worksheet is the 20-question kickoff we run on every engagement. Fill it in honestly — the questions you can't answer tell you where the project will get stuck.

If you can't answer 80% of this worksheet, you're not ready to write a prompt. You're ready for a discovery week.

Section 1: Use case clarity

Q1: In one sentence, what does this system do, for whom?

"A summarization feature" is not an answer. "It generates a 3-sentence summary of a closed sales call for the AE, surfaced in Salesforce" is. Write the sentence, then read it aloud. If you hesitate, revise.

Q2: What does the user do before this system exists? What do they do after?

Map the current workflow and the future workflow. If the future workflow doesn't save time, money, or unlock something new, stop.

Q3: What would a human expert produce for the same input?

If no human does this task today, that's a red flag — you have no ground truth and no quality bar. Consider whether to hire a human first to establish baseline.

Q4: What's the blast radius of a wrong answer?

Rate it on a scale from "mildly embarrassing" to "regulatory incident." The answer determines your guardrail and review requirements.

Q5: Who reviews the output before it reaches the end user?

Human-in-the-loop, human-on-the-loop, or fully autonomous. Get this explicit — "a human will check it" without naming the workflow is hand-waving.

Section 2: Success metrics

Q6: What's the single primary metric and its target?

"Better responses" is not a metric. "75% of summaries rated 4+/5 by the AE who made the call, measured weekly" is. One number, one target, one cadence.

Q7: What's the baseline today?

Measure it before you build. If you can't measure it, you can't prove ROI later. If no baseline exists, instrument the current process for 2 weeks before starting.

Q8: What metric would cause you to kill the project?

A stop-loss. "If accuracy is below 60% after 6 weeks" or "if cost per request exceeds $0.05." Name it now, while you're rational.

Section 3: Data and guardrails

Q9: What data does the model read? Where does it live? Who owns it?

Source systems, access paths, refresh cadence. Identify the political / compliance gatekeepers for each data source early — they become the critical path.

Q10: How much of it is labeled, structured, or eval-ready?

Be honest. "We have lots of data" usually means "we have lots of unstructured documents no one has read." Plan labeling effort accordingly.

Q11: What PII or sensitive data is in the inputs or outputs?

PII categories (name, email, SSN, health, financial), handling plan (redact, tokenize, pass through with DLP), and the legal sign-off owner.
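A redaction pass is the simplest of the three handling options. A minimal sketch, assuming regex-based redaction on two PII categories — the patterns here are illustrative examples, not a production DLP solution:

```python
import re

# Illustrative redaction patterns (assumptions, not a complete PII taxonomy).
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched PII category with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Whichever option you pick, write down who signed off on it — that's the part auditors ask about.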

Q12: What's the refusal policy?

List the categories of input the system should refuse to answer: off-topic, legal advice, medical advice, political, competitor queries, etc. Each with an example refusal response.
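The refusal list can live as a simple lookup table that the team reviews in code review. A sketch with hypothetical categories and wording — your actual categories and responses will differ:

```python
# Hypothetical refusal policy table; categories and wording are examples only.
REFUSAL_POLICY = {
    "legal_advice": "I can't provide legal advice. Please consult your legal team.",
    "medical_advice": "I can't provide medical advice. Please consult a clinician.",
    "off_topic": "That's outside the scope of this assistant.",
    "competitor_query": "I can't discuss other vendors' products.",
}

def refuse(category: str) -> str:
    """Return the canned refusal for a category, with a generic fallback."""
    return REFUSAL_POLICY.get(category, "I can't help with that request.")
```

Keeping refusals in one table means the policy is diffable and reviewable, not scattered across prompts.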

Section 4: Constraints

Q13: Latency budget?

P50 and P95, in milliseconds. "Real-time" is not a budget. A 500 ms budget drives very different architecture decisions than a 5-second one.
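If you're not already tracking percentiles, they're cheap to compute from a sample of measured latencies. A minimal sketch using the standard library (the synthetic sample here stands in for real measurements):

```python
import random
import statistics

# Synthetic latency sample in ms — stands in for real instrumented measurements.
random.seed(0)
latencies_ms = [random.lognormvariate(6.2, 0.5) for _ in range(1_000)]

p50 = statistics.median(latencies_ms)
p95 = statistics.quantiles(latencies_ms, n=100)[94]  # 95th percentile
print(f"P50={p50:.0f}ms  P95={p95:.0f}ms")
```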

Q14: Cost budget per request and per month?

A number. If unknown, run a back-of-envelope: expected requests/month × target $/request. See our GenAI Cost Estimation Template.
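The back-of-envelope above fits in ten lines. All numbers below are illustrative assumptions — swap in your own volumes and your provider's token prices:

```python
# Back-of-envelope monthly cost estimate. Every number is an assumption.
requests_per_month = 500_000
input_tokens = 2_000       # avg prompt + retrieved context per request
output_tokens = 150        # avg completion per request
price_in_per_1k = 0.0005   # $/1k input tokens (assumed rate)
price_out_per_1k = 0.0015  # $/1k output tokens (assumed rate)

cost_per_request = (input_tokens / 1000) * price_in_per_1k \
                 + (output_tokens / 1000) * price_out_per_1k
monthly_cost = requests_per_month * cost_per_request
print(f"${cost_per_request:.5f}/request, ${monthly_cost:,.0f}/month")  # → $0.00123/request, $612/month
```

If the result blows past your budget, you now know before the build — not after the first invoice.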

Q15: Data residency, deployment, and compliance constraints?

Cloud region, on-prem, air-gapped, SOC 2, HIPAA, GDPR, or sovereign cloud requirements. These constrain your model shortlist before you start.

Section 5: Team and timeline

Q16: Who owns the system after launch? Name the person.

If the answer is "the AI team" or "TBD" — the project is already fragile. Ownership must be a person with a calendar, not an org.

Q17: What skills does the build team have? What's missing?

Prompt engineering, eval design, ML ops, frontend, domain expertise. Missing skills are the honest timeline risk.

Q18: What's the ship date, and what's driving it?

Is the date pegged to a board meeting, a trade show, a customer contract, or wishful thinking? Each implies a very different tradeoff posture.

Section 6: Go-live readiness

Q19: What does the rollout look like? Who are the first 10 users?

Named pilot group, not "internal employees." The first 10 users should be reachable by Slack, tolerant of bugs, and diverse enough to surface edge cases.

Q20: What's the rollback plan if the system misbehaves in production?

Kill switch, feature flag, fallback to prior workflow. If the answer is "we'll push a hotfix" — you're not ready for production.
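The kill switch can be a one-line check in front of the model call. A minimal sketch, assuming a flag store and a pre-existing legacy workflow — the function names and the dict-based flag are placeholders for whatever flag service and fallback you actually have:

```python
# Minimal kill-switch sketch. FLAGS stands in for a real feature-flag service.
FLAGS = {"genai_summaries_enabled": True}

def legacy_summary(text: str) -> str:
    """Placeholder for the pre-GenAI workflow."""
    return text[:200]

def genai_summary(text: str) -> str:
    """Stand-in for a model call that may fail in production."""
    raise RuntimeError("model unavailable")

def summarize(call_transcript: str) -> str:
    if not FLAGS["genai_summaries_enabled"]:
        return legacy_summary(call_transcript)  # flag off: prior workflow, untouched
    try:
        return genai_summary(call_transcript)
    except Exception:
        return legacy_summary(call_transcript)  # degrade gracefully, don't fail
```

The point is that both the flag path and the error path land on the prior workflow, so flipping the flag never needs a deploy.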

How to use this

  1. Run the worksheet in a 90-minute session with the sponsor, tech lead, and one domain expert.
  2. Fill in what you can. Mark unknowns clearly.
  3. The unknowns are your first sprint. Every unknown is a discovery task, not a build task.
  4. Re-run the worksheet at the end of the discovery sprint. The answers will be shorter and sharper.
  5. Archive the filled-in worksheet in the project repo. It's the closest thing you have to a north star when scope creeps in month 3.

The questions you skipped are the ones that will take the project down. Go back and answer them.

Next step

If you want help running this worksheet with your team or pressure-testing the answers, reach out. We've run this session ~150 times — patterns emerge fast.
