
PDF Document Info Dictionary: Understanding Basic PDF Metadata

April 18, 2026 • 5 min read

Most PDFs contain a Document Info Dictionary, a collection of basic metadata fields that describe the document. The dictionary is optional, but nearly every authoring tool writes one, so understanding its fields is essential for anyone working with PDF forensics or privacy.

What Is the Document Info Dictionary?

The Document Info Dictionary is the original metadata storage mechanism in PDF. It is a simple dictionary of key-value pairs that describe the document's properties. PDF 2.0 deprecates most of its keys in favor of XMP, but tools still read and write it widely.

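Location: Referenced by the /Info entry in the PDF's trailer dictionary (or in the cross-reference stream dictionary in newer files). XMP metadata, by contrast, is referenced from the catalog object.

Format: A dictionary with predefined keys, stored as an indirect object.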
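
As a rough illustration of where the dictionary lives, the sketch below searches raw PDF bytes for the trailer's /Info entry and returns the object number it points to. The function name find_info_reference and the sample bytes are made up for this example, and the regex approach assumes a classic, uncompressed trailer. Files that use cross-reference streams or encryption need a real PDF parser.

```python
import re

# Matches "/Info 5 0 R": an indirect reference (object number, generation).
INFO_REF = re.compile(rb"/Info\s+(\d+)\s+(\d+)\s+R")

def find_info_reference(pdf_bytes):
    """Return (object_number, generation) of the last /Info reference, or None."""
    matches = INFO_REF.findall(pdf_bytes)
    if not matches:
        return None
    # Incremental saves append new trailers, so the last match is the current one.
    obj, gen = matches[-1]
    return int(obj), int(gen)

# Hypothetical trailer fragment, for illustration only.
SAMPLE_TRAILER = (
    b"trailer\n<< /Size 12 /Root 1 0 R /Info 5 0 R >>\n"
    b"startxref\n1234\n%%EOF"
)
print(find_info_reference(SAMPLE_TRAILER))  # (5, 0)
```

The returned pair identifies the object ("5 0 obj" in this sample) that holds the actual key-value pairs.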

Standard Document Info Fields

Title

The document's title, which may differ from the filename.

  • Often shows the original document name
  • May reveal internal project names
  • Can be empty or generic

Author

The person or entity who created the document.

  • Often filled in automatically from the operating system or application user profile
  • May contain full names or email addresses
  • Often reveals employee information

Subject

A description of the document's subject matter.

  • Set by the creating application or user
  • May contain project or department information
  • Often left empty

Keywords

Search keywords associated with the document.

  • Can reveal document categorization
  • May include internal tags
  • Useful for document discovery

Creator

The application that created the original content.

  • Shows the source application (Word, InDesign, etc.)
  • Reveals the workflow used
  • Different from Producer

Producer

The software that generated the PDF.

  • Shows the PDF creation tool
  • May differ from Creator if converted
  • Reveals processing history

CreationDate

When the PDF was first created.

  • Timestamp in PDF date format
  • Important for authenticity verification
  • Compare with ModDate for edit detection
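
The PDF date format is D:YYYYMMDDHHmmSS followed by an optional UTC offset such as -05'00', and every part after the year may be omitted. A minimal parsing sketch using only the standard library might look like the following. The function name parse_pdf_date is our own, and the regex is deliberately lenient about the apostrophes because real files vary.

```python
import re
from datetime import datetime, timedelta, timezone

PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(?:(\d{2})'?(\d{2})?'?)?)?$"
)

def parse_pdf_date(value):
    """Parse a PDF date string into a datetime; raise ValueError if malformed."""
    m = PDF_DATE.match(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date: {value!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = m.groups()
    tz = None  # no offset given: the relationship to UTC is unknown
    if sign == "Z":
        tz = timezone.utc
    elif sign in ("+", "-"):
        delta = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(delta if sign == "+" else -delta)
    # Omitted month and day default to 1; omitted time parts default to 0.
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0), tzinfo=tz,
    )

created = parse_pdf_date("D:20231015093022-05'00'")
print(created.isoformat())  # 2023-10-15T09:30:22-05:00
```

Parsing into timezone-aware datetimes makes it safe to compare CreationDate and ModDate even when the two were written with different offsets.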

ModDate

When the PDF was last modified.

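  • Usually updated on each save, although not every tool does so
  • Key indicator of document editing
  • A value later than CreationDate suggests the file was saved again after creation, not necessarily that its content changed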

Trapped

Indicates whether the document has been "trapped" for printing, a prepress adjustment that compensates for misregistration between ink colors.

  • Values: True, False, or Unknown (the default)
  • Relevant for print production
  • Rarely privacy-relevant

Document Info vs. XMP Metadata

Aspect        Document Info                              XMP Metadata
Format        Simple key-value                           XML-based
Size          Small                                      Can be large
Fields        Mostly a fixed set                         Extensible
Custom Data   Limited (custom keys allowed, rarely used) Yes
History       Limited                                    Can include history
Support       Universal                                  Varies by tool

Important: A PDF can have both, and they may contain different information.

Privacy Implications

What Can Leak

  1. Author names - Full names, usernames, email addresses
  2. Organization info - Through usernames or paths
  3. Software names and versions - Can help an attacker target known vulnerabilities
  4. Timestamps - Work patterns, document history
  5. Internal titles - Project names, codenames

Example Exposure

Title: Q4 Financial Report - CONFIDENTIAL DRAFT
Author: John.Smith@company.com
Creator: Microsoft® Word for Microsoft 365
Producer: Microsoft® Word for Microsoft 365
CreationDate: D:20231015093022-05'00'
ModDate: D:20231018142315-05'00'

This reveals:

  • Document confidentiality status
  • Employee email address
  • Software used
  • Exact edit timestamps
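
To make this kind of review repeatable, a small script can flag the obvious cases. The sketch below assumes the fields have already been extracted into a plain dict of strings. The name flag_exposures and the marker list are illustrative choices, not part of any tool mentioned here.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Assumed list of sensitivity markers; adjust to your organization's vocabulary.
MARKERS = ("confidential", "draft", "internal")

def flag_exposures(info):
    """Return human-readable findings for a dict of Document Info fields."""
    findings = []
    for key, value in info.items():
        if EMAIL.search(value):
            findings.append(f"{key}: contains an email address")
        lowered = value.lower()
        for marker in MARKERS:
            if marker in lowered:
                findings.append(f"{key}: contains the marker '{marker}'")
    return findings

example = {
    "Title": "Q4 Financial Report - CONFIDENTIAL DRAFT",
    "Author": "John.Smith@company.com",
    "Creator": "Microsoft® Word for Microsoft 365",
    "Producer": "Microsoft® Word for Microsoft 365",
}
for finding in flag_exposures(example):
    print(finding)
```

A check like this catches only the patterns it is told about, so it complements a manual review rather than replacing one.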

Viewing Document Info

In PDF Readers

Most PDF readers show Document Info:

  • Adobe Reader: File → Properties → Description
  • Preview (Mac): Tools → Show Inspector → General
  • Foxit: File → Properties → Description

Programmatically

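Using a library such as pypdf (Python), the maintained successor to the now-deprecated PyPDF2:

# Install with: pip install pypdf
from pypdf import PdfReader

reader = PdfReader("document.pdf")
info = reader.metadata  # None if the PDF has no Document Info Dictionary

if info is not None:
    print(info.author)         # /Author, or None if the key is absent
    print(info.creator)        # /Creator
    print(info.creation_date)  # /CreationDate parsed into a datetime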

Raw View

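In a text editor, look for entries like the ones below. Values may instead appear as hex strings (for example, <FEFF...> for UTF-16 text), and the dictionary will not be readable if it is stored in a compressed object stream or the file is encrypted: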

/Title (Document Title)
/Author (John Smith)
/Creator (Microsoft Word)
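
The same lookup can be scripted. The following is a simplified sketch that pulls a few string-valued keys out of raw, uncompressed PDF bytes and decodes both literal strings and hex strings. The names extract_info_strings and decode_pdf_hex_string and the sample bytes are invented for illustration. The sketch ignores nested parentheses, octal escapes, and the non-Latin-1 corners of PDFDocEncoding, and it matches these key names anywhere in the bytes, so it is no substitute for a real parser.

```python
import re

ENTRY = re.compile(
    rb"/(Title|Author|Subject|Keywords|Creator|Producer)\s*"
    rb"(\((?:\\.|[^\\()])*\)|<[0-9A-Fa-f\s]*>)"
)

def decode_pdf_hex_string(token):
    """Decode a hex string token such as b'<FEFF004A...>' into text."""
    hex_digits = re.sub(rb"\s+", b"", token.strip()[1:-1])
    if len(hex_digits) % 2:
        hex_digits += b"0"  # an odd final digit is treated as if followed by 0
    raw = bytes.fromhex(hex_digits.decode("ascii"))
    if raw.startswith(b"\xfe\xff"):  # byte order mark: UTF-16BE text
        return raw[2:].decode("utf-16-be")
    # Assumption: fall back to Latin-1, which only approximates PDFDocEncoding.
    return raw.decode("latin-1")

def extract_info_strings(pdf_bytes):
    """Return {key: text} for simple string entries found in the raw bytes."""
    result = {}
    for key, token in ENTRY.findall(pdf_bytes):
        if token.startswith(b"<"):
            text = decode_pdf_hex_string(token)
        else:
            # Strip the parentheses and undo the \( \) \\ escapes.
            body = re.sub(rb"\\([()\\])", rb"\1", token[1:-1])
            text = body.decode("latin-1")
        result[key.decode("ascii")] = text
    return result

# Hypothetical object, for illustration only.
SAMPLE = (
    b"5 0 obj\n<< /Title (Document Title)\n"
    b"/Author <FEFF004A006F0068006E00200053006D006900740068>\n"
    b"/Creator (Microsoft Word) >>\nendobj\n"
)
print(extract_info_strings(SAMPLE))
```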

Forensic Analysis

What to Check

  1. CreationDate vs. ModDate - A later ModDate suggests the file was saved again after creation
  2. Creator vs. Producer - Different values suggest conversion or post-processing by a second tool
  3. Author consistency - Does it match expected source?
  4. Timestamp reasonableness - Do dates make sense?
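
The checks above can be sketched as a small function that returns neutral observations for an analyst to interpret. It assumes the dates have already been parsed into timezone-aware datetimes. The name forensic_observations, the expected_author parameter, and the one-minute tolerance are arbitrary choices for this example.

```python
from datetime import datetime, timedelta, timezone

def forensic_observations(info, expected_author=None, now=None):
    """Return observations about a dict of Document Info fields.

    Assumes CreationDate and ModDate, if present, are timezone-aware
    datetimes and that all other values are strings.
    """
    now = now or datetime.now(timezone.utc)
    notes = []
    created, modified = info.get("CreationDate"), info.get("ModDate")
    if created and modified:
        if modified < created:
            notes.append("ModDate precedes CreationDate")
        elif modified - created > timedelta(minutes=1):  # assumed tolerance
            notes.append(f"saved again {modified - created} after creation")
    for key in ("CreationDate", "ModDate"):
        stamp = info.get(key)
        if stamp and stamp > now:
            notes.append(f"{key} is in the future")
    creator, producer = info.get("Creator"), info.get("Producer")
    if creator and producer and creator != producer:
        notes.append("Creator and Producer differ")
    if expected_author and info.get("Author") != expected_author:
        notes.append("Author does not match the expected source")
    return notes

tz = timezone(timedelta(hours=-5))
sample = {
    "Author": "John.Smith@company.com",
    "Creator": "Microsoft® Word for Microsoft 365",
    "Producer": "Microsoft® Word for Microsoft 365",
    "CreationDate": datetime(2023, 10, 15, 9, 30, 22, tzinfo=tz),
    "ModDate": datetime(2023, 10, 18, 14, 23, 15, tzinfo=tz),
}
for note in forensic_observations(sample, expected_author="Jane Doe"):
    print(note)
```

None of these observations proves tampering on its own. Each one is a prompt to look more closely at how the file was produced.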

Red Flags

  • ModDate significantly later than CreationDate
  • Author doesn't match expected source
  • Creator or Producer naming consumer or online conversion tools on documents that claim an official origin
  • Missing or cleared fields (may indicate sanitization)

Removing Document Info

Using Adobe Acrobat

  1. File → Properties → Description
  2. Clear each field manually
  3. Or use File → Save As Other → Optimized PDF and enable "Discard document information and metadata" under Discard User Data

Using Sanitization Tools

CleanPDF and similar tools automatically remove:

  • All Document Info fields
  • Associated XMP metadata
  • Additional hidden data

Verification

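After removal, verify that the fields are empty by checking the properties again. Because PDFs can be saved incrementally, an older copy of the dictionary may still sit in the file even when the properties dialog shows nothing, so inspect the raw file as well.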
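
One rough way to look beyond the properties dialog is to count how often the standard key names still appear anywhere in the file's bytes. The sketch below does that with the standard library only. The name residual_info_keys and the sample bytes are hypothetical, and the scan cannot see inside compressed streams. Hits are leads rather than proof, since names such as /Title also occur legitimately elsewhere, for example in bookmarks.

```python
import re

# The nine standard Document Info keys.
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords", b"Creator",
             b"Producer", b"CreationDate", b"ModDate", b"Trapped")

def residual_info_keys(pdf_bytes):
    """Count occurrences of each standard key name in the raw bytes."""
    counts = {}
    for key in INFO_KEYS:
        # The lookahead stops /Creator from matching a longer name
        # such as /CreatorTool.
        hits = re.findall(rb"/" + key + rb"(?![A-Za-z0-9])", pdf_bytes)
        if hits:
            counts[key.decode("ascii")] = len(hits)
    return counts

# Hypothetical bytes from an incrementally saved file: the old dictionary
# (object 5) is still present although the new one (object 9) is empty.
SAMPLE = (
    b"5 0 obj\n<< /Author (John Smith) /Title (Draft) >>\nendobj\n"
    b"9 0 obj\n<< >>\nendobj\n"
)
print(residual_info_keys(SAMPLE))  # {'Title': 1, 'Author': 1}
```

For a real file, the bytes could be read with pathlib.Path(...).read_bytes() and passed to the same function.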

Common Issues

Inconsistent Metadata

Document Info and XMP may disagree:

  • Different authors listed
  • Conflicting dates
  • Mismatched titles

This can indicate:

  • Editing by different tools
  • Incomplete metadata updates
  • Potential manipulation
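
A comparison like this can be automated once both sources are in hand. The sketch below takes Document Info values as a dict and an XMP packet as an XML string, then reports fields that are present in both but differ. The key-to-property mapping follows the conventional equivalents (for example, Author and dc:creator). The function names and sample packet are illustrative, and only properties written as XML elements are handled, not those written as attributes.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}
# Document Info key -> XMP property that conventionally mirrors it.
MAPPING = {"Title": "dc:title", "Author": "dc:creator",
           "Creator": "xmp:CreatorTool", "Producer": "pdf:Producer"}

def xmp_value(root, prop):
    """Return the text of an XMP property written as an element, or None."""
    node = root.find(f".//{prop}", NS)
    if node is None:
        return None
    # dc:title and dc:creator wrap their values in rdf:li items.
    item = node.find(".//rdf:li", NS)
    text = item.text if item is not None else node.text
    return text.strip() if text else None

def compare_info_with_xmp(info, xmp_xml):
    """Return {key: (info_value, xmp_value)} for fields that disagree."""
    root = ET.fromstring(xmp_xml)
    diffs = {}
    for key, prop in MAPPING.items():
        a, b = info.get(key), xmp_value(root, prop)
        if a is not None and b is not None and a != b:
            diffs[key] = (a, b)
    return diffs

# Hypothetical XMP packet, trimmed for illustration.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:xmp="http://ns.adobe.com/xap/1.0/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Q4 Report</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>Jane Doe</rdf:li></rdf:Seq></dc:creator>
   <xmp:CreatorTool>Microsoft Word</xmp:CreatorTool>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

info = {"Title": "Q4 Report", "Author": "John Smith", "Creator": "Microsoft Word"}
print(compare_info_with_xmp(info, SAMPLE_XMP))
```

In this sample only the author differs, which is the kind of mismatch that is worth tracing back to the tools that touched the file.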

Partial Removal

Some tools only clear certain fields:

  • Title and Author removed
  • Dates still present
  • Creator/Producer unchanged

Always verify complete removal.

Best Practices

For Privacy

  1. Always check Document Info before sharing
  2. Use comprehensive sanitization tools
  3. Verify removal after cleaning
  4. Consider XMP metadata too

For Forensics

  1. Document all metadata findings
  2. Compare Document Info with XMP
  3. Check for inconsistencies
  4. Analyze timestamps carefully

For Document Creation

  1. Be aware of what metadata is added
  2. Configure applications to minimize metadata
  3. Sanitize as part of release workflow
  4. Establish organizational policies

Conclusion

The Document Info Dictionary is basic but important:

  • Contains key metadata like author, dates, and software info
  • Privacy risk if not managed properly
  • Forensic value for document verification
  • Often overlooked in favor of visible content

Understanding Document Info is the foundation of PDF metadata awareness—but don't forget that XMP metadata may contain additional information.


Want to see what's in your PDF's Document Info Dictionary? Analyze it with CleanPDF for a complete metadata breakdown.
