
11 Free Claude API Assessment Practice Questions (Updated June 2026)
Try 20 Free Questions
Question 1 of 20You're building a document analysis system that needs to process 500-page legal contracts and extract key clauses. Your API quota allows for 2M tokens/month. The extracted insights must be returned within 30 seconds. Which Claude model should you select and why?
Select your answer below
What Is the "Building with the Claude API" Assessment?
"Building with the Claude API" is one of Anthropic's free developer courses on Anthropic Academy (anthropic.skilljar.com). The course walks through the practical surface of the Claude API — Messages endpoint, system prompts, multi-turn conversations, streaming, tool use, vision and PDF input, prompt engineering, error handling and rate limits, model selection across the Haiku / Sonnet / Opus tiers, and Anthropic's responsible-use policies. It ends with a graded final assessment that mixes multiple-choice and short scenario questions covering the same surface.
The assessment is not timed in the harshly-proctored way an enterprise certification exam is — you take it inside the Academy portal at your own pace, and most developers finish in 30 to 45 minutes. Anthropic rotates and regenerates the actual items, so any blog claiming to be a "leak" of the assessment is either out of date or fabricating. The honest way to prepare is to study the published course topics at the right difficulty band — which is what the 11 questions below are calibrated to.
Completing the course (and passing the assessment) is the single best free preparation for the paid Claude Certified Architect (CCA-F) exam, which goes deeper on agentic architecture, Claude Code, and Model Context Protocol on top of the same Claude API foundation. If you are aiming at CCA-F, this post pairs with our CCA practice questions and our free 540-question CCA-F pack.
Topic 1: Messages API — Structure, Roles, and System Prompts
The Messages API is the single entry point for everything Claude does. A request is an array of messages with role "user" or "assistant", an optional top-level system prompt, and a model selector. The course assessment leans heavily on knowing the difference between a system prompt (instructions about who Claude is and how to behave, set once at the top level) and a user message (the actual turn). The most common mistake is putting persona/behaviour instructions inside the first user message — it works but it dilutes the model's adherence and wastes tokens on every multi-turn request.
Other Messages-API specifics that show up: messages must alternate user → assistant → user → assistant; the API will error if you send two user messages back to back outside of tool-result flows. The first message in any conversation must be a user message. The model parameter is required; max_tokens is required; temperature is optional and defaults to 1.0. The response carries a top-level stop_reason field — "end_turn", "max_tokens", "tool_use", or "stop_sequence" — and you should always inspect it before treating the response as complete.
Topic 2: Streaming and Server-Sent Events
Streaming responses are opt-in via stream: true on the request. The API returns server-sent events (SSE) where each event has a type field — message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop, plus ping and error events. To reconstruct the assistant's text you accumulate the text fields from content_block_delta events; you do NOT use the message_delta event for text (that one carries usage and stop_reason updates).
The most common streaming bug developers hit is treating each SSE line as a complete JSON object instead of parsing the SSE envelope first ("event: ..." then "data: ..." separated by blank lines). The second most common bug is closing the connection on the first content_block_stop without waiting for message_stop — you lose the final usage numbers and stop_reason. Always read the stream to message_stop.
Topic 3: Tool Use (Function Calling)
Tool use is how Claude calls back into your application to execute code, fetch data, or trigger side effects. You declare tools as a list on the request, each with a name, description, and JSON-Schema input_schema. Claude responds with a tool_use content block (carrying tool name and input JSON), your application executes the tool, and you send the result back as a user message containing a tool_result content block — referenced by tool_use_id so Claude can match it to the original call.
The course assessment often probes the round-trip: the assistant message carrying a tool_use block has stop_reason: "tool_use", which tells you to execute and respond. If you skip the tool_result and just send a new user question, the model will hallucinate what the tool returned. Forcing structured output via a tool (instead of asking for JSON in the prompt and parsing free text) is the canonical pattern for schema-conformant responses — defining the output shape as the tool's input_schema beats regex-stripping a markdown-fenced JSON blob every time.
Topic 4: Prompt Engineering — XML Tags, CoT, Few-Shot
Three prompt-engineering patterns dominate the assessment: XML-tagged structure (wrap inputs, examples, and instructions in distinct tags like <document>, <example>, <task> so the model can reliably reference them), chain-of-thought (ask the model to think step-by-step before answering, often inside a <thinking> tag you instruct it to use), and few-shot examples (show 2-5 worked examples in the prompt to anchor format and reasoning style).
Claude responds particularly well to XML tags because they are unambiguous delimiters that survive long contexts — much more reliably than triple-backtick or "BEGIN INPUT" markers. The course also covers prompt-caching as a separate primitive: setting cache_control on a content block lets you re-use long shared prefixes across many requests at a fraction of the per-token cost. Cache hits and writes appear in the usage field of every response.
Topic 5: Vision and Document Input
Claude accepts images directly in user messages as content blocks of type "image" with either a base64-encoded source or a public URL. Supported formats are JPEG, PNG, GIF, and WebP, with a per-image size limit. PDFs are a distinct content-block type ("document") that Claude processes natively — both the text layer and the visual layout. The model can answer questions about specific pages, extract tables, and reason about figures without you having to OCR or chunk the PDF first.
Two assessment-relevant gotchas: large images count against your token budget at a documented rate (the API returns the image token count in the usage field), and mixing image and text content blocks in the same user message is supported and expected — you place the image first for context and the question second.
Topic 6: Errors, Rate Limits, and Retries
The Claude API returns standard HTTP status codes plus an error envelope with a type field. The codes the assessment cares about: 400 (invalid_request_error — your fault, fix the request), 401 (authentication_error — bad or missing API key), 403 (permission_error), 404 (not_found_error), 413 (request_too_large), 429 (rate_limit_error), 500 (api_error — Anthropic's side, retry), 529 (overloaded_error — back off and retry). The 429 response carries Retry-After-Ms or anthropic-ratelimit-* headers that tell you how long to wait.
The canonical retry policy is exponential backoff with jitter, retrying on 429, 500, 503, and 529 but NOT on 400 / 401 / 403 / 404 / 413 — retrying those just wastes tokens and quota. The Anthropic SDKs implement this for you, but the assessment expects you to know the shape.
Topic 7: Model Selection — Haiku, Sonnet, Opus
Anthropic publishes three model tiers and the course expects you to pick the right one for each scenario. Haiku is the fastest and cheapest — use it for high-volume classification, simple routing, summary tasks, and latency-sensitive endpoints. Sonnet is the balanced workhorse — use it for general reasoning, coding, RAG, and most agentic loops. Opus is the most capable — use it for deep multi-step reasoning, complex code generation, and tasks where wrong answers cost more than the marginal token spend.
A good heuristic the assessment tests: start with Sonnet, drop to Haiku once you've measured that your task succeeds at the cheaper tier, escalate to Opus only when measurement shows Sonnet is failing on real traffic. Picking Opus by default is a common cost mistake; picking Haiku by default is a common quality mistake. Always re-evaluate model choice when a new generation ships.
Topic 8: Safety and Responsible Use
Anthropic's Acceptable Use Policy (AUP) is the single source of truth for what is and is not allowed. The assessment doesn't quiz you on individual prohibited use cases, but it does expect you to know that you, the developer, are responsible for your product's use of Claude — Anthropic is not the last line of defence. That means implementing your own content moderation, abuse detection, and rate limiting on top of the API.
The course also covers responsible behaviour the model itself enforces — refusing certain categories of content, declining to impersonate real living people without consent, and being transparent that it is an AI when asked. Prompts that try to override these ("ignore all previous instructions...") are not a path to passing the assessment or to building a production product; the model has been hardened against the common patterns and you'll get refusals and a reputation hit for trying.
11 Practice Questions (Quick Answer Key Below)
Work through each question, pick your answer, then read the explanation. The questions match the difficulty band and the topic mix of the actual assessment without claiming to be its answers. If you can confidently answer 9 of 11, you are exam-ready.
Quick answer key (try the questions first, this is here for scanning afterwards): 1-B, 2-C, 3-A, 4-B, 5-A, 6-C, 7-D, 8-B, 9-C, 10-B, 11-D. The two most often missed are the streaming reconstruction question (Q2) and the tool_use round-trip (Q4) — if you got both of those right, your understanding of the request/response loop is solid.
Questions 1-3: Messages API and Streaming
Topic: Messages API | Difficulty: Moderate 1. You are building a customer-support chatbot that must always answer in the user's language and never reveal that it is built on Claude. Where do you put these two instructions? A) As the first user message in every conversation, so they appear right before the question B) In the top-level system prompt, so they apply to every turn without being re-sent C) In a sequence of assistant messages prepended to the conversation, demonstrating the desired tone D) As a tool that the model must call before responding, returning the instructions
Correct Answer: B Persona, language, and behavioural instructions belong in the top-level system prompt. They are set once and apply to every turn without consuming a slot in the user/assistant alternation or being repeated on every request. Putting them in user messages (A) works but wastes tokens, dilutes adherence, and makes the conversation history confusing. Faking assistant turns (C) is brittle and the model will sometimes treat the faked turns as user-supplied context to question. A tool (D) misuses the tool-use mechanism, which is for actions and structured outputs, not configuration.
Topic: Streaming | Difficulty: Challenging 2. Your application streams a response from the Messages API and needs to display the assistant's text incrementally to the user. Which event type carries the text deltas you should accumulate? A) message_start B) message_delta C) content_block_delta D) content_block_stop
Correct Answer: C The content_block_delta event carries the actual text increments inside its delta.text field — accumulate those to reconstruct the assistant's reply. The message_delta event (B) carries updates to top-level fields like stop_reason and usage, NOT text. message_start (A) opens the stream with metadata and content_block_stop (D) closes a block — neither contains text. A common bug is closing the connection at the first content_block_stop without waiting for message_stop, which loses the final usage and stop_reason values.
Topic: Messages API | Difficulty: Moderate 3. You receive a response from the Messages API where the last content block has the expected output. Before treating the response as complete, which top-level field should you inspect? A) stop_reason B) usage.output_tokens C) role D) model
Correct Answer: A stop_reason tells you why generation halted: "end_turn" (clean finish), "max_tokens" (truncated — you may need to continue), "tool_use" (the model wants you to execute a tool and respond), or "stop_sequence" (a stop sequence you provided was emitted). Skipping this check is how applications ship truncated answers to users without realising. usage.output_tokens (B) is informational. role (C) is always "assistant" on a response. model (D) is the model that handled the call.
Questions 4-6: Tool Use and Prompt Engineering
Topic: Tool Use | Difficulty: Challenging 4. The model responds with a stop_reason of "tool_use" and an assistant message containing a tool_use content block requesting your `get_weather` tool with input {"city": "Sydney"}. What should your application send next? A) A new user message asking the model to retry with different input B) A new user message containing a tool_result content block with the tool's output, referenced by tool_use_id C) A new system prompt update that includes the weather data D) A new assistant message containing a tool_result content block with the tool's output
Correct Answer: B Tool results are returned as content blocks of type "tool_result" inside a new USER message, with tool_use_id matching the original tool_use block's id. The model then sees the result and produces its next assistant message. Sending a fresh question (A) makes the model hallucinate what the tool returned. tool_result does NOT belong in an assistant message (D) — Claude does not call its own tools. System prompts (C) are not the channel for tool results; they are set once and not the dynamic data plane.
Topic: Prompt Engineering | Difficulty: Moderate 5. You need to ground a long answer in a 30-page document and have the model quote relevant passages directly. Which prompt structure is most reliable? A) Place the document inside <document> XML tags at the top of the user message, then ask the question with explicit instruction to quote from <document> in <quote> tags B) Concatenate the document text immediately before the question with no delimiters, relying on the model to figure out where the document ends C) Send the document as a system prompt and the question as the user message D) Send the document and the question as two separate user messages in sequence
Correct Answer: A XML-tagged structure is the most reliable way to delimit inputs for Claude, especially for long documents. The model has been trained to recognise tags as unambiguous boundaries and to quote from named tagged regions on request. Concatenating without delimiters (B) routinely produces answers that conflate the document with the question. Putting documents in the system prompt (C) works but limits caching and audit-trail benefits and is awkward for documents that change per request. Two user messages in a row (D) violates the alternation rule and the API will error.
Topic: Tool Use | Difficulty: Moderate 6. Your service must return strictly valid JSON matching a fixed schema for every Claude response, because a downstream system parses it programmatically. Which approach is most reliable? A) Ask for JSON in the prompt and run a regex to strip any extra text before parsing B) Lower the temperature to 0 and tell the model not to include explanations C) Define the response shape as a tool input_schema and have Claude respond by calling that tool D) Post-process every response with a second Claude call that reformats it into JSON
Correct Answer: C Defining the target shape as a tool input_schema and routing the response through tool use is the most reliable way to enforce schema-conformant output. The model fills the schema rather than generating free text that you parse, so you avoid stray prose, markdown fences, and trailing commentary. Regex stripping (A) is brittle. Temperature alone (B) does not guarantee structure. A second reformatting call (D) doubles cost and latency without guaranteeing validity.
Questions 7-9: Vision, Errors, and Rate Limits
Topic: Vision | Difficulty: Moderate 7. You want Claude to answer a question about a scanned PDF invoice. How should you structure the user message? A) Convert the PDF to plain text yourself and send only the text, since Claude cannot read PDFs B) Send the PDF as an image content block, page by page C) Upload the PDF to your own storage and send the URL as a tool_result D) Send the PDF as a content block of type "document" alongside the text question
Correct Answer: D Claude accepts PDFs natively as a content block of type "document" — both the text layer and visual layout are processed. You do NOT need to OCR (A) or chunk into images per page (B). Tool_result (C) is for tool round-trips, not for delivering source documents to the model. The document block sits alongside a text block carrying the question, mirroring the image-plus-question pattern used for vision.
Topic: Errors | Difficulty: Moderate 8. Your application receives a 401 authentication_error response from the Messages API. Which response is correct? A) Retry the request with exponential backoff up to 5 attempts B) Stop immediately, surface the error, and check the API key — do not retry C) Lower max_tokens and retry, since 401 sometimes means the request was too large D) Switch to a different model and retry
Correct Answer: B 401 means the API key is missing, malformed, or revoked — retrying with the same key just wastes attempts and triggers downstream alarms. The correct path is to surface the error, log a meaningful diagnostic, and verify the ANTHROPIC_API_KEY environment variable in the failing environment. Exponential backoff (A) is for transient errors like 429 (rate limit), 500 (api_error), and 529 (overloaded_error). Request size mismatches (C) return 413, not 401. Model swaps (D) do not change auth status.
Topic: Rate Limits | Difficulty: Challenging 9. Your batch job receives 429 rate_limit_error responses intermittently. The response includes a "Retry-After-Ms" header. What is the correct retry behaviour? A) Retry immediately — the header is informational and not enforced B) Sleep for a fixed 1 second and retry, ignoring the header value C) Sleep for the duration specified in Retry-After-Ms, then retry — adding small jitter when many workers are retrying D) Stop the job and notify a human on the first 429
Correct Answer: C The Retry-After-Ms (or Retry-After) header is the API's instruction on how long to wait before the next attempt. Honour it; adding small randomised jitter prevents the thundering-herd problem when many concurrent workers all wake up at the same instant. Immediate retry (A) just earns another 429 and counts against your quota. A fixed sleep that ignores the header (B) is too short on heavy congestion and wastes time when the limit clears sooner. Failing on first 429 (D) is over-reactive — 429s are routine in high-throughput workloads.
Questions 10-11: Model Selection and Safety
Topic: Model Selection | Difficulty: Moderate 10. You're building a high-volume classification endpoint that tags incoming support tickets as billing, technical, or account. Latency-per-call matters and the task is well-defined. Which Claude tier is the most appropriate starting point? A) Opus, to get the highest possible accuracy regardless of cost B) Haiku, because the task is bounded and high-volume C) Sonnet, because it is the default and always the right choice D) Whichever model the last tutorial you read used
Correct Answer: B Haiku is built for high-volume, bounded, latency-sensitive tasks like classification, routing, and simple summarisation. Opus (A) gives you accuracy you don't need at a cost you don't want to pay on this workload. Sonnet (C) is a good default but it is not always the right choice — measure your task at the cheaper tier first. (D) is the most common real-world mistake: people pick whichever model the snippet they copy-pasted used, without re-evaluating for their workload. Always start cheap, measure, and escalate only when measurement says you need to.
Topic: Safety | Difficulty: Moderate 11. Your product allows users to send arbitrary prompts to Claude. Who is responsible for moderating the content your users send and for the outputs they consume? A) Anthropic, since they trained the model and operate the API B) Nobody — Claude has internal safety training, so additional moderation is redundant C) The end user, who agrees to terms of service when signing up D) You, the developer — Anthropic provides the model and a safety baseline, but you are responsible for moderation, abuse detection, rate limiting, and AUP compliance in your product
Correct Answer: D Anthropic's Acceptable Use Policy makes clear that the developer is responsible for their product's use of Claude. Anthropic provides a strong safety baseline (training-time alignment, refusals on certain categories), but you cannot offload moderation, abuse detection, or rate limiting onto the model. (A) misattributes responsibility. (B) is the dangerous misread that ships unmoderated apps. (C) is true that users agree to your terms, but agreement does not absolve you of operational responsibility. Plan for moderation from day one rather than retrofitting it after a public incident.
How Did You Score?
9-11 correct: Exam-ready on the Claude API assessment. Sit it in Anthropic Academy with confidence, then start working through the CCA-F practice questions and the free 540-question CCA-F pack for the next step on the Claude certification ladder.
5-8 correct: Solid base. Re-read the topic sections above for whichever questions you missed — especially streaming, tool use, and error handling, which are where most candidates lose marks. Then re-take these questions cold a few days later before sitting the assessment.
0-4 correct: Work through the "Building with the Claude API" course end-to-end first (it is free at anthropic.skilljar.com), then come back to these questions. Skipping the course and trying to brute-force the assessment from practice questions is the slow path; the course is genuinely good.
If you are preparing for the CCA-F exam in addition to the Claude API course, our free CCA-Foundations pack covers 540 scenario-based questions across all five exam domains with the same difficulty band as the real assessment.
Frequently Asked Questions
Is the "Building with the Claude API" assessment timed?
The assessment sits inside Anthropic Academy at anthropic.skilljar.com and is not strictly timed in the proctored-exam sense — you take it at your own pace. Most developers complete it in 30 to 45 minutes. You should plan to take it in a single sitting because the portal does not always preserve mid-assessment state.
Can I retake the Claude API course assessment?
Yes. Anthropic Academy allows retakes on the course assessment. We recommend reviewing the modules you found weakest before retaking rather than re-attempting cold — the items are regenerated and rotated, so memorising answers from a previous attempt is not a useful strategy.
Do I get a certificate for passing the "Building with the Claude API" course?
Yes — Anthropic Academy issues a completion certificate when you pass the final assessment. The certificate is a credential of having completed the free course; it is distinct from the paid Claude Certified Architect (CCA-F) certification, which is a deeper, scenario-based exam covering agentic architecture, Claude Code, and MCP on top of the Claude API foundation.
How is the Claude API course assessment different from the CCA-F exam?
The "Building with the Claude API" assessment is a course-completion check that validates you understood the Claude API surface — Messages, tool use, streaming, prompts, vision, errors, model selection. The Claude Certified Architect Foundations (CCA-F) exam goes deeper and adds substantial material on agentic architecture and orchestration (27% of the exam), Claude Code workflows (20%), Model Context Protocol (18%), and context management and reliability (15%). The CCA-F is the right next certification after the course assessment if you want a credential that proves you can ship production systems with Claude. See our CCA-F practice questions and our free 540-question CCA-F pack for prep material.
Are these the actual Claude API course assessment answers?
No, and you should be suspicious of any post claiming to be. Anthropic rotates and regenerates the assessment items, so a static "answer key" goes stale fast — and republishing live assessment content would violate the course terms anyway. The 11 practice questions above are scenario-based and mapped to the same topics at the same difficulty band as the assessment, which is the honest and useful way to prepare.
Master Your Exams with ReadRoost
Practice questions, flashcards, and timed exams for 57 certifications.
Related Articles
Security+ or CySA+ first? The job ads disagree with the study subs.
Most people treat CySA+ as the automatic next cert after Security+. The study subs reinforce it: pass Security+, line up CySA+, keep the streak going. But while I was deciding the same thing, I went through a stack of actual SOC analyst and security-tier-1 job ads, and they told a different story. Security+ was the hard requirement, the line in the filter that screens you out if it is missing. CySA+ kept showing up under 'nice to have' or 'or equivalent'. That gap is the whole decision. If you already hold Security+, the honest question is not 'is CySA+ the next step', it is 'is CySA+ the next step for the job I actually want, or am I about to spend two months on a cert hiring managers treat as optional'. Here is how I would order them in 2026, and the one situation where doing CySA+ first is genuinely the right call.
AZ-900 for non-technical people: what it actually proves (and what it does not)
AZ-900 gets dismissed a lot as the cert you do before the real certs, and that framing misses the people who have no intention of becoming cloud engineers. I have watched project managers, pre-sales consultants, and procurement officers use AZ-900 as the thing that lets them sit in a technical meeting and follow it, rather than nodding along and Googling terms afterwards. For that group the question is not 'is this a stepping stone to AZ-104'. It is 'will 40 hours of study let me stop bluffing in conversations that matter to my job'. The answer is usually yes, with a couple of honest caveats about what the cert does not do. Here is how to decide if you are in a non-technical role and wondering whether AZ-900 is for you.
The CISSP CAT tripped me up until I stopped studying for a multiple-choice exam
I put off the CISSP for years, and when I finally sat down to prepare I made the mistake almost everyone makes: I studied it like a big multiple-choice test. Cram the eight domains, drill the definitions, walk in and recall. The adaptive format quietly punishes that approach, and realising it late in my prep was a frustrating moment I would rather other people skip. The CISSP CAT does not just change how many questions you answer. It changes which questions you see based on how you are doing, and it is built to test how you think, not how much you can recall. If you are a mid-career security professional finally taking the CISSP seriously, here is how the adaptive format should change the way you study, from someone who holds it.
