PDF Document Info Dictionary: Understanding Basic PDF Metadata
Most PDFs contain a Document Info Dictionary, a collection of basic metadata fields that describe the document (the dictionary is optional, so some files omit it). Understanding these fields is essential for anyone working with PDF forensics or privacy.
What Is the Document Info Dictionary?
The Document Info Dictionary is the original metadata storage system in PDFs. It's a simple structure containing key-value pairs that describe the document's properties.
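Location: Referenced by the /Info entry of the trailer dictionary (or, in files that use cross-reference streams, of the cross-reference stream dictionary). It is not stored in the document catalog; the catalog's /Metadata entry points to the XMP metadata stream instead.
Format: A dictionary, normally stored as an indirect object, with predefined keys. Most values are text strings, the two dates use the PDF date format, and Trapped is a name.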
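To make the location concrete: the trailer holds an entry such as `/Info 5 0 R`, meaning "object 5, generation 0". The sketch below pulls that reference out of raw file bytes with a regular expression. It is a heuristic, not a PDF parser, and the function name `find_info_reference` is hypothetical; it assumes the last `/Info` reference in the file belongs to the most recent incremental update.

```python
import re
from typing import Optional, Tuple

# An indirect reference such as "/Info 5 0 R". Trailer dictionaries and
# cross-reference stream dictionaries are stored uncompressed, so the
# reference is normally visible in the raw bytes.
_INFO_REF = re.compile(rb"/Info\s+(\d+)\s+(\d+)\s+R")


def find_info_reference(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (object number, generation) of the Info dictionary, or None.

    Heuristic sketch: it does not follow the cross-reference chain.
    Incremental updates append a new trailer per save, so the last
    match is assumed to be the current one.
    """
    matches = _INFO_REF.findall(data)
    if not matches:
        return None
    number, generation = matches[-1]
    return int(number), int(generation)
```

Pass it the bytes of a file opened in binary mode; a `None` result means no reference was visible, not necessarily that the file has no Info dictionary.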
Standard Document Info Fields
Title
The document's title, which may differ from the filename.
- Often shows the original document name
- May reveal internal project names
- Can be empty or generic
Author
The person or entity who created the document.
- Usually the username of the creator
- May contain full names or email addresses
- Often reveals employee information
Subject
A description of the document's subject matter.
- Set by the creating application or user
- May contain project or department information
- Often left empty
Keywords
Search keywords associated with the document.
- Can reveal document categorization
- May include internal tags
- Useful for document discovery
Creator
The application that created the original content.
- Shows the source application (Word, InDesign, etc.)
- Reveals the workflow used
- A separate field from Producer, though both may hold the same value
Producer
The software that generated the PDF.
- Shows the PDF creation tool
- May differ from Creator if converted
- Hints at processing history (typically only the last tool that wrote the file)
CreationDate
When the PDF was first created.
- Stored as a PDF date string, e.g. D:20231015093022-05'00' (year, month, day, hour, minute, second, then the offset from UTC)
- Important for authenticity verification
- Compare with ModDate for edit detection
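Comparing dates requires turning the PDF date string into a real timestamp. A minimal parser, assuming the `D:YYYYMMDDHHmmSSOHH'mm'` layout where every field after the year is optional (the name `parse_pdf_date` is hypothetical, not from any library), could look like this:

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm' where O is Z, + or -; every field after the
# year is optional, and a missing "D:" prefix is tolerated.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Z+\-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string; returns a naive datetime if no offset is given."""
    m = _PDF_DATE.match(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year, month, day, hour, minute, second, sign, tz_h, tz_m = m.groups()
    tz = None
    if sign == "Z":
        tz = timezone.utc
    elif sign in ("+", "-"):
        offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
        tzinfo=tz,
    )
```

For example, `parse_pdf_date("D:20231015093022-05'00'")` prints as `2023-10-15 09:30:22-05:00`. Keeping the offset matters: two dates written in different time zones can only be compared once both are timezone-aware.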
ModDate
When the PDF was last modified.
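- Normally updated by the application that saves the file (not every tool does this)
- Key indicator of document editing
- A ModDate later than CreationDate suggests the file was re-saved or modified, but not what changed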
Trapped
Indicates if the document has been "trapped" for printing.
- Values: True, False, or Unknown
- Relevant for print production
- Rarely privacy-relevant
Document Info vs. XMP Metadata
| Aspect | Document Info | XMP Metadata |
|---|---|---|
| Format | Simple key-value | XML-based |
| Size | Small | Can be large |
| Fields | Standard set (extra keys allowed) | Extensible |
| Custom Data | Limited (custom keys) | Yes (custom schemas) |
| History | Limited | Can include history |
| Support | Universal | Varies by tool |
Important: A PDF can have both, and they may contain different information.
Privacy Implications
What Can Leak
- Author names - Full names, usernames, email addresses
- Organization info - Through usernames or paths
- Software versions - Can help attackers target known vulnerabilities in the tools used
- Timestamps - Work patterns, document history
- Internal titles - Project names, codenames
Example Exposure
Title: Q4 Financial Report - CONFIDENTIAL DRAFT
Author: John.Smith@company.com
Creator: Microsoft® Word for Microsoft 365
Producer: Microsoft® Word for Microsoft 365
CreationDate: D:20231015093022-05'00'
ModDate: D:20231018142315-05'00'
This reveals:
- Document confidentiality status
- Employee email address
- Software used
- Exact edit timestamps
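The kinds of exposure listed above can be screened for automatically. The sketch below assumes the Info fields have already been read into a plain dictionary keyed by field name without the leading slash; the function name `privacy_findings` and the `SENSITIVE_WORDS` watch-list are hypothetical and should be adapted to your own vocabulary.

```python
import re

# Rough email pattern: good enough for flagging, not for validation.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Hypothetical watch-list of sensitive words; tune it to your organization.
SENSITIVE_WORDS = ("confidential", "draft", "internal", "codename")


def privacy_findings(info: dict[str, str]) -> list[str]:
    """Flag Document Info values that commonly leak information."""
    findings = []
    for key, value in info.items():
        if EMAIL.search(value):
            findings.append(f"{key}: contains an email address")
        lowered = value.lower()
        for word in SENSITIVE_WORDS:
            if word in lowered:
                findings.append(f"{key}: mentions '{word}'")
    # Any software or timestamp value is itself a disclosure.
    for key in ("Creator", "Producer"):
        if info.get(key):
            findings.append(f"{key}: discloses software ({info[key]})")
    for key in ("CreationDate", "ModDate"):
        if info.get(key):
            findings.append(f"{key}: discloses a timestamp")
    return findings


example = {
    "Title": "Q4 Financial Report - CONFIDENTIAL DRAFT",
    "Author": "John.Smith@company.com",
    "Creator": "Microsoft® Word for Microsoft 365",
    "Producer": "Microsoft® Word for Microsoft 365",
    "CreationDate": "D:20231015093022-05'00'",
    "ModDate": "D:20231018142315-05'00'",
}
for finding in privacy_findings(example):
    print(finding)
```

Run on the example above, it reports the confidentiality markers in the title, the email address in Author, both software fields and both timestamps.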
Viewing Document Info
In PDF Readers
Most PDF readers show Document Info:
- Adobe Reader: File → Properties → Description
- Preview (Mac): Tools → Show Inspector → General
- Foxit: File → Properties → Description
Programmatically
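Using pypdf (Python), the maintained successor to PyPDF2:
```python
from pypdf import PdfReader

reader = PdfReader("document.pdf")
info = reader.metadata  # None when the file has no Info dictionary

if info is not None:
    print(info.author)
    print(info.creator)
    print(info.creation_date)  # parsed datetime, or None if absent
```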
Raw View
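In a text editor, look for entries like these. They are only readable when the dictionary sits in an uncompressed object; many PDF 1.5+ files store it inside a compressed object stream, and values may also appear as hex or UTF-16 strings: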
/Title (Document Title)
/Author (John Smith)
/Creator (Microsoft Word)
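The same raw search can be scripted for uncompressed files. This is a minimal sketch under stated assumptions: it only understands literal strings in parentheses, undoes only the `\(`, `\)` and `\\` escapes, and does not handle unescaped nested parentheses, hex strings or compressed object streams. The name `scan_raw_info` is hypothetical.

```python
import re

STANDARD_KEYS = ("Title", "Author", "Subject", "Keywords",
                 "Creator", "Producer", "CreationDate", "ModDate")

# "/Key (literal string)": escaped characters are skipped over, but
# unescaped nested parentheses and hex strings <...> are NOT handled.
_LITERAL = re.compile(
    rb"/(" + b"|".join(k.encode() for k in STANDARD_KEYS) + rb")\s*\(((?:\\.|[^\\)])*)\)",
    re.DOTALL,
)


def _decode(raw: bytes) -> str:
    # Undo only the \( \) and \\ escapes; \n-style and octal escapes stay as-is.
    raw = re.sub(rb"\\([()\\])", rb"\1", raw)
    if raw.startswith(b"\xfe\xff"):  # UTF-16BE text string with byte-order mark
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")  # rough stand-in for PDFDocEncoding


def scan_raw_info(data: bytes) -> dict[str, str]:
    """Collect standard Info keys found as literal strings in raw PDF bytes."""
    found = {}
    for key, raw in _LITERAL.findall(data):
        found[key.decode()] = _decode(raw)  # later (newer) occurrences win
    return found
```

An empty result on a modern file often just means the dictionary is compressed; a library such as pypdf is the reliable route in that case.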
Forensic Analysis
What to Check
- CreationDate vs. ModDate - A later ModDate suggests the file was re-saved or edited
- Creator vs. Producer - Different values usually mean a second tool converted or processed the file
- Author consistency - Does it match expected source?
- Timestamp reasonableness - Do dates make sense?
Red Flags
- ModDate significantly later than CreationDate
- Author doesn't match expected source
- Consumer software for "official" documents
- Missing or cleared fields (may indicate sanitization)
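Several of these checks can be turned into a first-pass script. The sketch below assumes the fields are in a plain dictionary, that both dates use full precision (`D:YYYYMMDDHHmmSS` plus `Z` or an offset), and that a 30-day gap counts as "significantly later"; that threshold and the names `red_flags` and `expected_author` are illustrative choices, not a standard.

```python
from datetime import datetime, timedelta
from typing import Optional


def _to_datetime(pdf_date: str) -> datetime:
    # Assumes full precision: D:YYYYMMDDHHmmSS followed by Z or +/-HH'mm'.
    cleaned = pdf_date.removeprefix("D:").replace("'", "")
    return datetime.strptime(cleaned, "%Y%m%d%H%M%S%z")


def red_flags(info: dict[str, str],
              expected_author: Optional[str] = None,
              max_gap: timedelta = timedelta(days=30)) -> list[str]:
    """Return human-readable warnings for common forensic red flags."""
    flags = []
    created, modified = info.get("CreationDate"), info.get("ModDate")
    if created and modified:
        gap = _to_datetime(modified) - _to_datetime(created)
        if gap < timedelta(0):
            flags.append("ModDate precedes CreationDate")
        elif gap > max_gap:
            flags.append(f"ModDate is {gap.days} days after CreationDate")
    creator, producer = info.get("Creator"), info.get("Producer")
    if creator and producer and creator != producer:
        flags.append("Creator and Producer differ (conversion or post-processing)")
    if expected_author is not None and info.get("Author") != expected_author:
        flags.append("Author does not match the expected source")
    missing = [k for k in ("Author", "Creator", "Producer", "CreationDate", "ModDate")
               if not info.get(k)]
    if missing:
        flags.append("missing or cleared: " + ", ".join(missing))
    return flags
```

A flag is a prompt for closer inspection, not proof of tampering: legitimate workflows routinely convert files or re-save them weeks later.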
Removing Document Info
Using Adobe Acrobat
- File → Properties → Description
- Clear each field manually
- Or use File → Save As Other → Optimized PDF with the Discard User Data option "Discard document information and metadata" enabled
Using Sanitization Tools
CleanPDF and similar tools automatically remove:
- All Document Info fields
- Associated XMP metadata
- Additional hidden data
Verification
After removal, verify fields are empty by checking properties again.
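A quick byte-level check can complement the properties dialog. This sketch assumes the cleaned file is uncompressed enough for keys to be visible; because compressed object streams can hide them, an empty result is encouraging but not conclusive. The name `leftover_metadata` is hypothetical, and the XMP markers are the usual packet wrapper strings.

```python
import re

STANDARD_KEYS = (b"/Title", b"/Author", b"/Subject", b"/Keywords",
                 b"/Creator", b"/Producer", b"/CreationDate", b"/ModDate")
XMP_MARKERS = (b"<x:xmpmeta", b"<?xpacket")


def leftover_metadata(data: bytes) -> list[str]:
    """List metadata that still appears in the raw bytes of a cleaned PDF."""
    leftovers = []
    for key in STANDARD_KEYS:
        # A non-empty literal "(x", a hex string "<4", or an indirect
        # reference "12 0 R" after the key counts as a remaining value.
        pattern = re.escape(key) + rb"\s*(?:\([^)]|<[0-9A-Fa-f]|\d)"
        if re.search(pattern, data):
            leftovers.append(key.decode())
    if any(marker in data for marker in XMP_MARKERS):
        leftovers.append("XMP packet")
    return leftovers
```

Empty values such as `/Title ()` are treated as cleared, which matches the goal of verifying that fields are empty rather than merely present.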
Common Issues
Inconsistent Metadata
Document Info and XMP may disagree:
- Different authors listed
- Conflicting dates
- Mismatched titles
This can indicate:
- Editing by different tools
- Incomplete metadata updates
- Potential manipulation
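Spotting these disagreements means pairing each Info key with the XMP property that mirrors it (Title with dc:title, Author with dc:creator, and so on). The sketch below assumes both sides were already extracted into plain string dictionaries, with XMP's multi-valued dc:title and dc:creator flattened to a single string; the function name `compare_info_xmp` is hypothetical. Dates are compared as instants, since Info uses PDF date strings and XMP uses ISO 8601.

```python
from datetime import datetime

# Correspondence between Info keys and the XMP properties that mirror them.
INFO_TO_XMP = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Creator": "xmp:CreatorTool",
    "Producer": "pdf:Producer",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}


def _pdf_date(value: str) -> datetime:
    # Full-precision PDF dates only, e.g. D:20231015093022-05'00'
    return datetime.strptime(value.removeprefix("D:").replace("'", ""),
                             "%Y%m%d%H%M%S%z")


def _xmp_date(value: str) -> datetime:
    # ISO 8601; "Z" is rewritten because fromisoformat rejects it before Python 3.11
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def compare_info_xmp(info: dict[str, str], xmp: dict[str, str]) -> list[str]:
    """Report fields where Info and XMP disagree or only one side has a value."""
    mismatches = []
    for info_key, xmp_key in INFO_TO_XMP.items():
        a, b = info.get(info_key), xmp.get(xmp_key)
        if a is None and b is None:
            continue
        if a is None or b is None:
            side = "Info" if b is None else "XMP"
            mismatches.append(f"{info_key} only in {side}")
            continue
        if info_key in ("CreationDate", "ModDate"):
            same = _pdf_date(a) == _xmp_date(b)
        else:
            same = a.strip() == b.strip()
        if not same:
            mismatches.append(f"{info_key}: Info={a!r} XMP={b!r}")
    return mismatches
```

Because dates are compared as timezone-aware instants, `D:20231015093022-05'00'` and `2023-10-15T14:30:22Z` count as the same moment rather than a conflict.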
Partial Removal
Some tools only clear certain fields:
- Title and Author removed
- Dates still present
- Creator/Producer unchanged
Always verify complete removal.
Best Practices
For Privacy
- Always check Document Info before sharing
- Use comprehensive sanitization tools
- Verify removal after cleaning
- Consider XMP metadata too
For Forensics
- Document all metadata findings
- Compare Document Info with XMP
- Check for inconsistencies
- Analyze timestamps carefully
For Document Creation
- Be aware of what metadata is added
- Configure applications to minimize metadata
- Sanitize as part of release workflow
- Establish organizational policies
Conclusion
The Document Info Dictionary is basic but important:
- Contains key metadata like author, dates, and software info
- Privacy risk if not managed properly
- Forensic value for document verification
- Often overlooked in favor of visible content
Understanding Document Info is the foundation of PDF metadata awareness—but don't forget that XMP metadata may contain additional information.
Want to see what's in your PDF's Document Info Dictionary? Analyze it with CleanPDF for a complete metadata breakdown.