Exam Guides

11 Free Claude API Assessment Practice Questions (Updated June 2026)

Anthropic's free "Building with the Claude API" course on Anthropic Academy ends with a graded final assessment that covers the practical surface of the Claude API: the Messages endpoint and roles, streaming, tool use, prompt engineering, vision and document input, error handling, model selection, and safety. The 11 scenario-based practice questions below are mapped to those topics at the same difficulty band as the assessment itself. They are practice questions, not the assessment answers — Anthropic regenerates and rotates the actual items, and any post claiming to have the live questions is misleading. Work through these, read the explanations, and you will be in good shape on exam day.

ReadRoost Team

Study & certification team

June 6, 202618 min read

11 Free Claude API Assessment Practice Questions (Updated June 2026)

Try 5 Free Questions

Question 1 of 5

Context Management & Reliability

You're building a document analysis system that processes 500-page legal contracts and extracts key clauses. Each document is large, and the extracted insights must be returned within about 30 seconds. Which Claude model tier best fits, and why?

Select your answer below

What Is the "Building with the Claude API" Assessment?

"Building with the Claude API" is one of Anthropic's free developer courses on Anthropic Academy (anthropic.skilljar.com). The course walks through the practical surface of the Claude API — Messages endpoint, system prompts, multi-turn conversations, streaming, tool use, vision and PDF input, prompt engineering, error handling and rate limits, model selection across the Haiku / Sonnet / Opus tiers, and Anthropic's responsible-use policies. It ends with a graded final assessment that mixes multiple-choice and short scenario questions covering the same surface.

The assessment is not timed in the harshly-proctored way an enterprise certification exam is — you take it inside the Academy portal at your own pace, and most developers finish in 30 to 45 minutes. Anthropic rotates and regenerates the actual items, so any blog claiming to be a "leak" of the assessment is either out of date or fabricating. The honest way to prepare is to study the published course topics at the right difficulty band — which is what the 11 questions below are calibrated to.

Completing the course (and passing the assessment) is the single best free preparation for the paid Claude Certified Architect (CCA-F) exam, which goes deeper on agentic architecture, Claude Code, and Model Context Protocol on top of the same Claude API foundation. If you are aiming at CCA-F, this post pairs with our CCA practice questions and our free 540-question CCA-F pack.

Topic 1: Messages API — Structure, Roles, and System Prompts

The Messages API is the single entry point for everything Claude does. A request is an array of messages with role "user" or "assistant", an optional top-level system prompt, and a model selector. The course assessment leans heavily on knowing the difference between a system prompt (instructions about who Claude is and how to behave, set once at the top level) and a user message (the actual turn). The most common mistake is putting persona/behaviour instructions inside the first user message — it works but it dilutes the model's adherence and wastes tokens on every multi-turn request.

Other Messages-API specifics that show up: messages must alternate user → assistant → user → assistant; the API will error if you send two user messages back to back outside of tool-result flows. The first message in any conversation must be a user message. The model parameter is required; max_tokens is required; temperature is optional and defaults to 1.0. The response carries a top-level stop_reason field — "end_turn", "max_tokens", "tool_use", or "stop_sequence" — and you should always inspect it before treating the response as complete.

Topic 2: Streaming and Server-Sent Events

Streaming responses are opt-in via stream: true on the request. The API returns server-sent events (SSE) where each event has a type field — message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop, plus ping and error events. To reconstruct the assistant's text you accumulate the text fields from content_block_delta events; you do NOT use the message_delta event for text (that one carries usage and stop_reason updates).

The most common streaming bug developers hit is treating each SSE line as a complete JSON object instead of parsing the SSE envelope first ("event: ..." then "data: ..." separated by blank lines). The second most common bug is closing the connection on the first content_block_stop without waiting for message_stop — you lose the final usage numbers and stop_reason. Always read the stream to message_stop.

Topic 3: Tool Use (Function Calling)

Tool use is how Claude calls back into your application to execute code, fetch data, or trigger side effects. You declare tools as a list on the request, each with a name, description, and JSON-Schema input_schema. Claude responds with a tool_use content block (carrying tool name and input JSON), your application executes the tool, and you send the result back as a user message containing a tool_result content block — referenced by tool_use_id so Claude can match it to the original call.

The course assessment often probes the round-trip: the assistant message carrying a tool_use block has stop_reason: "tool_use", which tells you to execute and respond. If you skip the tool_result and just send a new user question, the model will hallucinate what the tool returned. Forcing structured output via a tool (instead of asking for JSON in the prompt and parsing free text) is the canonical pattern for schema-conformant responses — defining the output shape as the tool's input_schema beats regex-stripping a markdown-fenced JSON blob every time.

Topic 4: Prompt Engineering — XML Tags, CoT, Few-Shot

Three prompt-engineering patterns dominate the assessment: XML-tagged structure (wrap inputs, examples, and instructions in distinct tags like <document>, <example>, <task> so the model can reliably reference them), chain-of-thought (ask the model to think step-by-step before answering, often inside a <thinking> tag you instruct it to use), and few-shot examples (show 2-5 worked examples in the prompt to anchor format and reasoning style).

Claude responds particularly well to XML tags because they are unambiguous delimiters that survive long contexts — much more reliably than triple-backtick or "BEGIN INPUT" markers. The course also covers prompt-caching as a separate primitive: setting cache_control on a content block lets you re-use long shared prefixes across many requests at a fraction of the per-token cost. Cache hits and writes appear in the usage field of every response.

Topic 5: Vision and Document Input

Claude accepts images directly in user messages as content blocks of type "image" with either a base64-encoded source or a public URL. Supported formats are JPEG, PNG, GIF, and WebP, with a per-image size limit. PDFs are a distinct content-block type ("document") that Claude processes natively — both the text layer and the visual layout. The model can answer questions about specific pages, extract tables, and reason about figures without you having to OCR or chunk the PDF first.

Two assessment-relevant gotchas: large images count against your token budget at a documented rate (the API returns the image token count in the usage field), and mixing image and text content blocks in the same user message is supported and expected — you place the image first for context and the question second.

Topic 6: Errors, Rate Limits, and Retries

The Claude API returns standard HTTP status codes plus an error envelope with a type field. The codes the assessment cares about: 400 (invalid_request_error — your fault, fix the request), 401 (authentication_error — bad or missing API key), 403 (permission_error), 404 (not_found_error), 413 (request_too_large), 429 (rate_limit_error), 500 (api_error — Anthropic's side, retry), 529 (overloaded_error — back off and retry). The 429 response carries Retry-After-Ms or anthropic-ratelimit-* headers that tell you how long to wait.

The canonical retry policy is exponential backoff with jitter, retrying on 429, 500, 503, and 529 but NOT on 400 / 401 / 403 / 404 / 413 — retrying those just wastes tokens and quota. The Anthropic SDKs implement this for you, but the assessment expects you to know the shape.

Topic 7: Model Selection — Haiku, Sonnet, Opus

Anthropic publishes three model tiers and the course expects you to pick the right one for each scenario. Haiku is the fastest and cheapest — use it for high-volume classification, simple routing, summary tasks, and latency-sensitive endpoints. Sonnet is the balanced workhorse — use it for general reasoning, coding, RAG, and most agentic loops. Opus is the most capable — use it for deep multi-step reasoning, complex code generation, and tasks where wrong answers cost more than the marginal token spend.

A good heuristic the assessment tests: start with Sonnet, drop to Haiku once you've measured that your task succeeds at the cheaper tier, escalate to Opus only when measurement shows Sonnet is failing on real traffic. Picking Opus by default is a common cost mistake; picking Haiku by default is a common quality mistake. Always re-evaluate model choice when a new generation ships.

Topic 8: Safety and Responsible Use

Anthropic's Acceptable Use Policy (AUP) is the single source of truth for what is and is not allowed. The assessment doesn't quiz you on individual prohibited use cases, but it does expect you to know that you, the developer, are responsible for your product's use of Claude — Anthropic is not the last line of defence. That means implementing your own content moderation, abuse detection, and rate limiting on top of the API.

The course also covers responsible behaviour the model itself enforces — refusing certain categories of content, declining to impersonate real living people without consent, and being transparent that it is an AI when asked. Prompts that try to override these ("ignore all previous instructions...") are not a path to passing the assessment or to building a production product; the model has been hardened against the common patterns and you'll get refusals and a reputation hit for trying.

11 Practice Questions (Quick Answer Key Below)

Work through each question, pick your answer, then read the explanation. The questions match the difficulty band and the topic mix of the actual assessment without claiming to be its answers. If you can confidently answer 9 of 11, you are exam-ready.

Quick answer key (try the questions first, this is here for scanning afterwards): 1-B, 2-C, 3-A, 4-B, 5-A, 6-C, 7-D, 8-B, 9-C, 10-B, 11-D. The two most often missed are the streaming reconstruction question (Q2) and the tool_use round-trip (Q4) — if you got both of those right, your understanding of the request/response loop is solid.

Questions 1-3: Messages API and Streaming

Topic: Messages API | Difficulty: Moderate 1. You are building a customer-support chatbot that must always answer in the user's language and never reveal that it is built on Claude. Where do you put these two instructions? A) As the first user message in every conversation, so they appear right before the question B) In the top-level system prompt, so they apply to every turn without being re-sent C) In a sequence of assistant messages prepended to the conversation, demonstrating the desired tone D) As a tool that the model must call before responding, returning the instructions

Correct Answer: B Persona, language, and behavioural instructions belong in the top-level system prompt. They are set once and apply to every turn without consuming a slot in the user/assistant alternation or being repeated on every request. Putting them in user messages (A) works but wastes tokens, dilutes adherence, and makes the conversation history confusing. Faking assistant turns (C) is brittle and the model will sometimes treat the faked turns as user-supplied context to question. A tool (D) misuses the tool-use mechanism, which is for actions and structured outputs, not configuration.

Topic: Streaming | Difficulty: Challenging 2. Your application streams a response from the Messages API and needs to display the assistant's text incrementally to the user. Which event type carries the text deltas you should accumulate? A) message_start B) message_delta C) content_block_delta D) content_block_stop

Correct Answer: C The content_block_delta event carries the actual text increments inside its delta.text field — accumulate those to reconstruct the assistant's reply. The message_delta event (B) carries updates to top-level fields like stop_reason and usage, NOT text. message_start (A) opens the stream with metadata and content_block_stop (D) closes a block — neither contains text. A common bug is closing the connection at the first content_block_stop without waiting for message_stop, which loses the final usage and stop_reason values.

Topic: Messages API | Difficulty: Moderate 3. You receive a response from the Messages API where the last content block has the expected output. Before treating the response as complete, which top-level field should you inspect? A) stop_reason B) usage.output_tokens C) role D) model

Correct Answer: A stop_reason tells you why generation halted: "end_turn" (clean finish), "max_tokens" (truncated — you may need to continue), "tool_use" (the model wants you to execute a tool and respond), or "stop_sequence" (a stop sequence you provided was emitted). Skipping this check is how applications ship truncated answers to users without realising. usage.output_tokens (B) is informational. role (C) is always "assistant" on a response. model (D) is the model that handled the call.

Questions 4-6: Tool Use and Prompt Engineering

Topic: Tool Use | Difficulty: Challenging 4. The model responds with a stop_reason of "tool_use" and an assistant message containing a tool_use content block requesting your `get_weather` tool with input {"city": "Sydney"}. What should your application send next? A) A new user message asking the model to retry with different input B) A new user message containing a tool_result content block with the tool's output, referenced by tool_use_id C) A new system prompt update that includes the weather data D) A new assistant message containing a tool_result content block with the tool's output

Correct Answer: B Tool results are returned as content blocks of type "tool_result" inside a new USER message, with tool_use_id matching the original tool_use block's id. The model then sees the result and produces its next assistant message. Sending a fresh question (A) makes the model hallucinate what the tool returned. tool_result does NOT belong in an assistant message (D) — Claude does not call its own tools. System prompts (C) are not the channel for tool results; they are set once and not the dynamic data plane.

Topic: Prompt Engineering | Difficulty: Moderate 5. You need to ground a long answer in a 30-page document and have the model quote relevant passages directly. Which prompt structure is most reliable? A) Place the document inside <document> XML tags at the top of the user message, then ask the question with explicit instruction to quote from <document> in <quote> tags B) Concatenate the document text immediately before the question with no delimiters, relying on the model to figure out where the document ends C) Send the document as a system prompt and the question as the user message D) Send the document and the question as two separate user messages in sequence

Correct Answer: A XML-tagged structure is the most reliable way to delimit inputs for Claude, especially for long documents. The model has been trained to recognise tags as unambiguous boundaries and to quote from named tagged regions on request. Concatenating without delimiters (B) routinely produces answers that conflate the document with the question. Putting documents in the system prompt (C) works but limits caching and audit-trail benefits and is awkward for documents that change per request. Two user messages in a row (D) violates the alternation rule and the API will error.

Topic: Tool Use | Difficulty: Moderate 6. Your service must return strictly valid JSON matching a fixed schema for every Claude response, because a downstream system parses it programmatically. Which approach is most reliable? A) Ask for JSON in the prompt and run a regex to strip any extra text before parsing B) Lower the temperature to 0 and tell the model not to include explanations C) Define the response shape as a tool input_schema and have Claude respond by calling that tool D) Post-process every response with a second Claude call that reformats it into JSON

Correct Answer: C Defining the target shape as a tool input_schema and routing the response through tool use is the most reliable way to enforce schema-conformant output. The model fills the schema rather than generating free text that you parse, so you avoid stray prose, markdown fences, and trailing commentary. Regex stripping (A) is brittle. Temperature alone (B) does not guarantee structure. A second reformatting call (D) doubles cost and latency without guaranteeing validity.

Questions 7-9: Vision, Errors, and Rate Limits

Topic: Vision | Difficulty: Moderate 7. You want Claude to answer a question about a scanned PDF invoice. How should you structure the user message? A) Convert the PDF to plain text yourself and send only the text, since Claude cannot read PDFs B) Send the PDF as an image content block, page by page C) Upload the PDF to your own storage and send the URL as a tool_result D) Send the PDF as a content block of type "document" alongside the text question

Correct Answer: D Claude accepts PDFs natively as a content block of type "document" — both the text layer and visual layout are processed. You do NOT need to OCR (A) or chunk into images per page (B). Tool_result (C) is for tool round-trips, not for delivering source documents to the model. The document block sits alongside a text block carrying the question, mirroring the image-plus-question pattern used for vision.

Topic: Errors | Difficulty: Moderate 8. Your application receives a 401 authentication_error response from the Messages API. Which response is correct? A) Retry the request with exponential backoff up to 5 attempts B) Stop immediately, surface the error, and check the API key — do not retry C) Lower max_tokens and retry, since 401 sometimes means the request was too large D) Switch to a different model and retry

Correct Answer: B 401 means the API key is missing, malformed, or revoked — retrying with the same key just wastes attempts and triggers downstream alarms. The correct path is to surface the error, log a meaningful diagnostic, and verify the ANTHROPIC_API_KEY environment variable in the failing environment. Exponential backoff (A) is for transient errors like 429 (rate limit), 500 (api_error), and 529 (overloaded_error). Request size mismatches (C) return 413, not 401. Model swaps (D) do not change auth status.

Topic: Rate Limits | Difficulty: Challenging 9. Your batch job receives 429 rate_limit_error responses intermittently. The response includes a "Retry-After-Ms" header. What is the correct retry behaviour? A) Retry immediately — the header is informational and not enforced B) Sleep for a fixed 1 second and retry, ignoring the header value C) Sleep for the duration specified in Retry-After-Ms, then retry — adding small jitter when many workers are retrying D) Stop the job and notify a human on the first 429

Correct Answer: C The Retry-After-Ms (or Retry-After) header is the API's instruction on how long to wait before the next attempt. Honour it; adding small randomised jitter prevents the thundering-herd problem when many concurrent workers all wake up at the same instant. Immediate retry (A) just earns another 429 and counts against your quota. A fixed sleep that ignores the header (B) is too short on heavy congestion and wastes time when the limit clears sooner. Failing on first 429 (D) is over-reactive — 429s are routine in high-throughput workloads.

Questions 10-11: Model Selection and Safety

Topic: Model Selection | Difficulty: Moderate 10. You're building a high-volume classification endpoint that tags incoming support tickets as billing, technical, or account. Latency-per-call matters and the task is well-defined. Which Claude tier is the most appropriate starting point? A) Opus, to get the highest possible accuracy regardless of cost B) Haiku, because the task is bounded and high-volume C) Sonnet, because it is the default and always the right choice D) Whichever model the last tutorial you read used

Correct Answer: B Haiku is built for high-volume, bounded, latency-sensitive tasks like classification, routing, and simple summarisation. Opus (A) gives you accuracy you don't need at a cost you don't want to pay on this workload. Sonnet (C) is a good default but it is not always the right choice — measure your task at the cheaper tier first. (D) is the most common real-world mistake: people pick whichever model the snippet they copy-pasted used, without re-evaluating for their workload. Always start cheap, measure, and escalate only when measurement says you need to.

Topic: Safety | Difficulty: Moderate 11. Your product allows users to send arbitrary prompts to Claude. Who is responsible for moderating the content your users send and for the outputs they consume? A) Anthropic, since they trained the model and operate the API B) Nobody — Claude has internal safety training, so additional moderation is redundant C) The end user, who agrees to terms of service when signing up D) You, the developer — Anthropic provides the model and a safety baseline, but you are responsible for moderation, abuse detection, rate limiting, and AUP compliance in your product

Correct Answer: D Anthropic's Acceptable Use Policy makes clear that the developer is responsible for their product's use of Claude. Anthropic provides a strong safety baseline (training-time alignment, refusals on certain categories), but you cannot offload moderation, abuse detection, or rate limiting onto the model. (A) misattributes responsibility. (B) is the dangerous misread that ships unmoderated apps. (C) is true that users agree to your terms, but agreement does not absolve you of operational responsibility. Plan for moderation from day one rather than retrofitting it after a public incident.

How Did You Score?

9-11 correct: Exam-ready on the Claude API assessment. Sit it in Anthropic Academy with confidence, then start working through the CCA-F practice questions and the free 540-question CCA-F pack for the next step on the Claude certification ladder.

5-8 correct: Solid base. Re-read the topic sections above for whichever questions you missed — especially streaming, tool use, and error handling, which are where most candidates lose marks. Then re-take these questions cold a few days later before sitting the assessment.

0-4 correct: Work through the "Building with the Claude API" course end-to-end first (it is free at anthropic.skilljar.com), then come back to these questions. Skipping the course and trying to brute-force the assessment from practice questions is the slow path; the course is genuinely good.

If you are preparing for the CCA-F exam in addition to the Claude API course, our free CCA-Foundations pack covers 540 scenario-based questions across all five exam domains with the same difficulty band as the real assessment.

ReadRoost Team

We turn crowdsourced pass reports and official exam objectives into practice questions, flashcards and timed exams — so you study what the exam actually tests. New guides every week.

Frequently Asked Questions

Is the "Building with the Claude API" assessment timed?

The assessment sits inside Anthropic Academy at anthropic.skilljar.com and is not strictly timed in the proctored-exam sense — you take it at your own pace. Most developers complete it in 30 to 45 minutes. You should plan to take it in a single sitting because the portal does not always preserve mid-assessment state.

Can I retake the Claude API course assessment?

Yes. Anthropic Academy allows retakes on the course assessment. We recommend reviewing the modules you found weakest before retaking rather than re-attempting cold — the items are regenerated and rotated, so memorising answers from a previous attempt is not a useful strategy.

Do I get a certificate for passing the "Building with the Claude API" course?

Yes — Anthropic Academy issues a completion certificate when you pass the final assessment. The certificate is a credential of having completed the free course; it is distinct from the paid Claude Certified Architect (CCA-F) certification, which is a deeper, scenario-based exam covering agentic architecture, Claude Code, and MCP on top of the Claude API foundation.

How is the Claude API course assessment different from the CCA-F exam?

The "Building with the Claude API" assessment is a course-completion check that validates you understood the Claude API surface — Messages, tool use, streaming, prompts, vision, errors, model selection. The Claude Certified Architect Foundations (CCA-F) exam goes deeper and adds substantial material on agentic architecture and orchestration (27% of the exam), Claude Code workflows (20%), Model Context Protocol (18%), and context management and reliability (15%). The CCA-F is the right next certification after the course assessment if you want a credential that proves you can ship production systems with Claude. See our CCA-F practice questions and our free 540-question CCA-F pack for prep material.

Are these the actual Claude API course assessment answers?

No, and you should be suspicious of any post claiming to be. Anthropic rotates and regenerates the assessment items, so a static "answer key" goes stale fast — and republishing live assessment content would violate the course terms anyway. The 11 practice questions above are scenario-based and mapped to the same topics at the same difficulty band as the assessment, which is the honest and useful way to prepare.

Master your exam

Reading is good. Practising is better.

Practice questions, flashcards and timed exams for 57 certifications. Start with a free starter pack — no card needed.

Start free →View pricing

14-day money-back guarantee.