Hands-On Guide to Anthropic Citations

Verifiable AI Answers for PDFs, Text, and Custom Content


Introduction


Anthropic recently released a powerful new feature called Citations. Currently available via the API only, Citations lets you attach documents (plain text, PDFs, or custom-chunked content) and receive verifiable references in the model's responses. This boosts transparency in AI-generated answers, allowing you to trace each piece of information back to its original source.


Why Citations?

Previously, developers relied on complex prompt engineering to coax the model into including source information in its answers. This often led to inconsistent performance and significant time investment in testing and refinement. With Citations, you can now simply attach your source documents to the context window. When you query the model, Claude automatically cites from those sources.

Beyond streamlining prompt design, Citations offer several important benefits:

  1. Verified Sources: Large language models can produce answers that sound correct but lack tangible references. With Citations, each answer is paired with a source location, increasing reliability and trustworthiness.
  2. Flexible Document Handling: Anthropic supports plain text, PDF, and custom content documents, giving you multiple ways to upload and manage data. With plain text and PDFs, each chunk is a sentence by default, while the custom content approach lets you define your own chunks, providing much greater flexibility in how you organize, label, and reference specific sections within your documents.
  3. Higher Recall Accuracy: Anthropic claims this feature can increase recall accuracy by up to 15%. From Anthropic's blog:

Our internal evaluations show that Claude’s built-in citation capabilities outperform most custom implementations, increasing recall accuracy by up to 15%


In a Hurry?

Feel free to take a look at the code on GitHub; the README includes setup instructions. If you face any issues getting it running, let me know by leaving a comment on this post.

Document Types and Their Citation Formats

1. Plain Text Documents

  • Format: Character-indexed.
  • Use Case: Any text-based content.
  • Citation Style: Typically references character ranges or brief labels in your text.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "Space exploration stands as one of humanity’s greatest achievements. For centuries, people have gazed at the skies with both wonder and aspiration. The 20th century finally saw these dreams take shape through the launch of the first satellites, orbital missions carrying human crews, and the landmark Apollo 11 moon landing in 1969. These monumental strides pushed the boundaries of scientific knowledge, revealing insights into planetary bodies, asteroids, and distant galaxies. Today, exploration efforts continue through international collaborations and advanced robotics, even hinting at a future of commercial space tourism"
                    },
                    "title": "My Document",
                    "context": "This is a trustworthy document.",
                    "citations": {"enabled": True}
                },
                {
                    "type": "text",
                    "text": "Which event took place in 1969?"
                }
            ]
        }
    ]
)

In the above example, I provided a passage about space exploration and asked:

“Which event took place in 1969?”

In response, the model identified the Apollo 11 moon landing and cited the relevant sentence from the document.
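If you want to inspect those citations programmatically: for plain-text documents they come back as character-indexed (char_location) entries attached to the response's text blocks. Below is a minimal sketch of reading them, with field names taken from the API documentation:

# Print each cited snippet with its character range in the source text
for block in response.content:
    if hasattr(block, "citations") and block.citations:
        for citation in block.citations:
            print(f"{citation.cited_text!r} "
                  f"(chars {citation.start_char_index}-{citation.end_char_index})")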

2. PDF Documents

  • Format: Page-indexed. Per the documentation, the PDF’s text is extracted and citations point to locations in that text. Even though the model can understand images in the PDF and base its responses on them, it won’t include citations for those images. I saw this behaviour in my experiment; see below for more details.
  • Use Case: Any PDF document.
  • Citation Style: Text only; images are not cited.
messages=[
    {
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": pdf_base64
                },
                "title": "My Document",
                "context": "This is a trustworthy document.",
                "citations": {"enabled": True}
            },
            {
                "type": "text",
                "text": "Tell me about the mirror?"
            }
        ]
    }
]
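The pdf_base64 variable above holds the base64-encoded contents of the file. A minimal sketch of producing it (the file name is just an example):

import base64

# Read the PDF bytes and encode them as a base64 string for the API
with open("my_document.pdf", "rb") as f:
    pdf_base64 = base64.standard_b64encode(f.read()).decode("utf-8")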

In the above example, I provided the PDF as base64-encoded data. The PDF I used lists some interesting facts about the Webb space telescope, and I asked the question below:

“Tell me about the mirror?”

Comparing the model’s response and its citations side by side with the actual PDF: the red block in the response section shows that the model understands the image and bases part of its answer on that understanding, but it cannot cite the image, since that is a limitation of this feature. The other two cited blocks, orange and yellow, correctly point to the actual sections in the PDF.

3. Custom Content Documents

  • Format: Your own chunked data (block-indexed).
  • Use Case: Transcripts, bullet points, or structured sections where you need precise, user-defined grouping of text.
  • Citation Style: The model references the specific chunks provided by the user.
messages=[
    {
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "content",
                    "content": [
                        {
                            "type": "text",
                            "text": "Space exploration is a pioneering field that has captured human imagination for centuries. Early astronomers used rudimentary telescopes to observe the heavens, sparking our enduring quest to explore beyond Earth's atmosphere."
                        },
                        {
                            "type": "text",
                            "text": "Modern missions often rely on international cooperation. The International Space Station (ISS) exemplifies this collaborative spirit, orbiting Earth with astronauts from diverse nations conducting research on microgravity and life support systems."
                        },
                        {
                            "type": "text",
                            "text": "Private companies also play a growing role, designing reusable rockets to reduce launch costs and offering commercial flights for space tourism. This surge in entrepreneurship fuels innovation and raises big questions about sustainability and governance beyond our planet."
                        },
                        {
                            "type": "text",
                            "text": "Looking ahead, missions to the Moon and Mars aim to establish permanent research outposts. Eventually, these efforts might serve as stepping stones for crewed missions deeper into the solar system, expanding our horizons and scientific knowledge alike."
                        },
                        {
                            "type": "text",
                            "text": "This article was written by John Doe."
                        },
                        {
                            "type": "text",
                            "text": "who is the author of this article?"
                        }
                    ]
                },
                "title": "Space Exploration Overview",
                "context": "A set of short chunks illustrating key aspects of space travel and collaboration. Not cited directly in answers.",
                "citations": {"enabled": True}
            }
        ]
    }
]
In the above example, I broke my text into several content blocks and provided them in a list; the last item in the list is the question I mean to ask.

I asked it two questions:

“Who is the author of this article?”

“What is the aim of the missions?”

For both queries, the model responded correctly and provided the right citations.
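For custom content, the returned citations are block-indexed (content_block_location), pointing back at the positions of the chunks you supplied. A sketch of reading them, assuming the same response handling as in the plain-text example:

# Print each cited snippet with the block range it came from
for block in response.content:
    if hasattr(block, "citations") and block.citations:
        for citation in block.citations:
            print(f"{citation.cited_text!r} "
                  f"(blocks {citation.start_block_index}-{citation.end_block_index})")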


Putting It All Together: Running the Code

The code is available on GitHub, and the README includes setup instructions. If you face any issues getting it running, let me know by leaving a comment on this post.

Below is a minimal Python script illustrating how to use this feature with custom content and ask a question with citations enabled:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "content",
                        "content": [
                            {
                                "type": "text",
                                "text": "Space exploration is a pioneering field that has captured human imagination for centuries. Early astronomers used rudimentary telescopes to observe the heavens, sparking our enduring quest to explore beyond Earth's atmosphere."
                            },
                            {
                                "type": "text",
                                "text": "Modern missions often rely on international cooperation. The International Space Station (ISS) exemplifies this collaborative spirit, orbiting Earth with astronauts from diverse nations conducting research on microgravity and life support systems."
                            },
                            {
                                "type": "text",
                                "text": "Private companies also play a growing role, designing reusable rockets to reduce launch costs and offering commercial flights for space tourism. This surge in entrepreneurship fuels innovation and raises big questions about sustainability and governance beyond our planet."
                            },
                            {
                                "type": "text",
                                "text": "Looking ahead, missions to the Moon and Mars aim to establish permanent research outposts. Eventually, these efforts might serve as stepping stones for crewed missions deeper into the solar system, expanding our horizons and scientific knowledge alike."
                            },
                            {
                                "type": "text",
                                "text": "This article was written by John Doe."
                            },
                            {
                                # The last chunk carries the question to ask
                                "type": "text",
                                "text": "who is the author of this article?"
                            }
                        ]
                    },
                    "title": "Space Exploration Overview",
                    "context": "A set of short chunks illustrating key aspects of space travel and collaboration. Not cited directly in answers.",
                    "citations": {"enabled": True}
                }
            ]
        }
    ]
)

# Extract and concatenate all text values
text_blocks = [block.text for block in response.content]
full_text = ''.join(text_blocks)
print("Full text:", full_text)

# Print the snippet behind each citation
print("\nCited text snippets:")
for block in response.content:
    if hasattr(block, 'citations') and block.citations:
        for citation in block.citations:
            print(citation.cited_text)
  1. content list: Contains both the document’s custom chunks and the user’s question (as the last item).
  2. model: Specify the Anthropic model you wish to use.
  3. citations: Set it to {"enabled": True} to enable citations.

Suggestions

  • Regular or Custom: If you are happy with each sentence being a chunk, go with the text/PDF approach; if you want a different chunking strategy, give the custom content approach a try and see whether it improves retrieval accuracy.
  • Validate Your Output: Always confirm the returned citations match the correct source text.
  • Handle Images via Descriptions: If you need to cite images or figures, provide textual descriptions or alt-text within the document. Citations can then reference those descriptions, effectively linking any mention of an image back to the correct content block (see the sketch below). I haven’t personally tried this, but I don’t see a reason why it wouldn’t work; let me know if it does or doesn’t work for you.
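For instance, a figure could get its own chunk in a custom content document, so that any answer drawing on it cites the description. The wording here is purely illustrative:

# A hypothetical alt-text chunk standing in for an image; it can be cited
# like any other content block
image_chunk = {
    "type": "text",
    "text": "Figure 1: The telescope's primary mirror, made of 18 gold-coated "
            "hexagonal segments."
}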

Conclusion

The new Citations API from Anthropic streamlines the process of linking AI-generated answers to their original sources. By eliminating the need for complex prompt engineering, it offers more consistency, transparency, and time savings than previous methods. Whether you work with plain text, PDFs, or custom content, Citations ensures your model’s responses are both informative and easy to verify — helping you build AI applications that users can trust.


🌟 Stay Connected! 🌟

I love sharing ideas and stories here, but the conversation doesn’t have to end when the last paragraph does. Let’s keep it going!

🔹Website : https://madhavarora.net

🔹 LinkedIn for professional insights and networking: https://www.linkedin.com/in/madhav-arora-0730a718/

🔹 Twitter for daily thoughts and interactions: https://twitter.com/MadhavAror

🔹 YouTube for engaging videos and deeper dives into topics: https://www.youtube.com/@aidiscoverylab

Got questions or want to say hello? Feel free to reach out to me at madhavarorabusiness@gmail.com. I’m always open to discussions, opportunities, or just a friendly chat. Let’s make the digital world a little more connected!
