How to Remove Metadata from a PDF Before Sharing It
Every PDF you send carries hidden data — author name, edit history, software version, and more. Learn how to find and strip that metadata before it reaches the wrong eyes.
The Hidden Layer Inside Every PDF
When you save a PDF, you are saving far more than the visible content. Quietly embedded in the file is a metadata layer that can include your full name, your organisation, the software you used to create the document, the date you first drafted it, and sometimes a revision history stretching back months.
For most internal workflows this is harmless. But when you share a PDF with a client, publish it on your website, or submit it as part of a tender or legal process, that hidden data becomes a liability. This guide explains exactly what PDF metadata contains, why removing it matters, and how to clean it before the file leaves your hands.
What Metadata Is Stored in a PDF?
A PDF stores metadata in two places: the Document Information Dictionary (a legacy key-value store) and the XMP stream (an XML-based format introduced by Adobe that most modern applications use). Together they can contain:
Identity Fields
- Author (often the OS account name)
- Creator application and version
- PDF producer (e.g. "Acrobat Distiller 24.0")
- Organisation or company name
Timestamp Fields
- Creation date and time
- Last modification date
- Last printed date (in some applications)
- XMP history timestamps
Document Fields
- Title, subject, and keywords
- Document description
- Language settings
- Custom application-defined properties
Structural Data
- Embedded fonts and ICC colour profiles
- PDF version and conformance level
- Tagged PDF accessibility structure
- Digital signature certificates (if signed)
Why This Matters: Real Privacy Risks
Leaking Internal Names and Roles
The Author field is populated automatically by most applications using your operating system username or the name registered in your Office suite. A file drafted by "j.smith@acmecorp.com" reveals both the employee and the organisation. In competitive or sensitive negotiations, this kind of accidental disclosure can be exploited.
Revealing Your Toolchain
The Creator and Producer fields disclose exactly which software generated the file — down to the version number. Knowing that a company is running an unpatched, three-year-old version of a PDF editor can be useful intelligence for someone probing for vulnerabilities.
Exposing Timeline and Revision History
Timestamps tell a story. If a document was "created" six months ago but the current version was saved yesterday, that gap raises questions. In legal contexts, metadata timestamps have been used as evidence in disputes over when a document was actually drafted or altered.
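To make the timestamp gap concrete, here is a minimal sketch that parses the date strings found in the info dictionary (the `D:YYYYMMDDHHmmSS` form with an optional UTC offset) and measures the distance between creation and modification. The function names are illustrative, and treating a missing offset as UTC is an assumption made for simplicity.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date strings look like D:20240115093000+01'00'; everything after the year is optional.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF date string into a timezone-aware datetime."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year, month, day, hour, minute, second, _zulu, sign, tz_h, tz_m = match.groups()
    if sign:
        offset = timedelta(hours=int(tz_h), minutes=int(tz_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    else:
        tz = timezone.utc  # assumption: no offset (or Z) is read as UTC
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
        tzinfo=tz,
    )


def edit_gap(creation_date: str, mod_date: str) -> timedelta:
    """Time elapsed between the CreationDate and ModDate values."""
    return parse_pdf_date(mod_date) - parse_pdf_date(creation_date)
```

Calling `edit_gap("D:20240115093000+01'00'", "D:20240715170000Z")` yields a gap of roughly six months, which is the kind of discrepancy a recipient could notice.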
Unintended Keywords and Tags
Document management systems often write internal project codes, workflow tags, or classification labels into the Keywords or Subject fields. These are invisible to the reader but trivially extractable — a reminder that "hidden" in PDF parlance means hidden from the default viewer, not from anyone who looks.
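To illustrate how little effort extraction can take, the sketch below scans the raw bytes of a file for info dictionary entries and XMP packets using only the standard library. It is a rough heuristic rather than a PDF parser: it assumes the metadata is stored uncompressed and unencrypted, it will miss hex-encoded or UTF-16 strings, and a key such as /Title can also match bookmark entries. The function name is illustrative.

```python
import re

INFO_KEYS = (b"Author", b"Creator", b"Producer", b"Title", b"Subject", b"Keywords")

# A literal PDF string: /Key (value), allowing backslash escapes but not nested parentheses.
_INFO_ENTRY = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\()])*)\)"
)
# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?> markers.
_XMP_PACKET = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=.*?\?>", re.DOTALL)


def scan_raw_pdf(data: bytes) -> dict[str, list[str]]:
    """Return info-dictionary entries and XMP packets visible in the raw bytes."""
    found: dict[str, list[str]] = {"info": [], "xmp": []}
    for key, value in _INFO_ENTRY.findall(data):
        found["info"].append(f"{key.decode()}={value.decode('latin-1')}")
    for packet in _XMP_PACKET.findall(data):
        found["xmp"].append(packet.decode("utf-8", errors="replace"))
    return found
```

Passing the result of `Path("report.pdf").read_bytes()` (a hypothetical file) to `scan_raw_pdf` is enough to list any keywords or tags stored in plain form.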
Step 1 — Audit What Is Already There
Before removing anything, you need to know what you are dealing with. Use PDFCheck's metadata viewer to get an instant, complete readout of every field in your document — both the legacy dictionary and the XMP stream. Processing happens entirely in your browser, so the file contents stay on your device.
What you are looking for: any field that contains a real person's name, an internal username, a company-specific code, a software version you would rather not disclose, or timestamps that do not match the story the document is supposed to tell.
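The checklist above can be partly automated once the fields have been exported as name/value pairs. The following sketch flags values that look like email addresses, version numbers, or internal codes. The patterns are assumptions, and the project-code format in particular is hypothetical, so adapt them to whatever your organisation actually uses.

```python
import re

# Each check is a label plus a pattern; all three are illustrative assumptions.
CHECKS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "version number": re.compile(r"\b\d+\.\d+(?:\.\d+)*\b"),
    "project code": re.compile(r"\b[A-Z]{2,5}-\d{3,}\b"),  # hypothetical code format
}


def flag_fields(fields: dict[str, str]) -> list[tuple[str, str]]:
    """Return (field name, reason) pairs for values that deserve a second look."""
    findings = []
    for name, value in fields.items():
        for label, pattern in CHECKS.items():
            if pattern.search(value):
                findings.append((name, label))
    return findings
```

A field that is flagged is not necessarily a problem; the output is meant as a prompt for a human reviewer, not a verdict.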
Step 2 — Choose Your Removal Method
Option A: Adobe Acrobat (Sanitize Document)
Acrobat Pro includes a Sanitize Document function (Tools → Redact → Sanitize Document) that strips metadata, hidden layers, embedded content, and scripts in a single pass. This is the most thorough option if you have Acrobat. Note that Document Properties → Description editing alone is not sufficient — it clears the visible fields but may leave the XMP stream partially intact.
Option B: Print to PDF
The blunt-instrument approach: open the PDF in any viewer and print it to a virtual PDF printer (the built-in "Save as PDF" on macOS, "Microsoft Print to PDF" on Windows, or a utility like PDF24). The resulting file is a fresh render that carries over none of the original metadata, although the printer writes its own Producer and creation date and may fill in Author or Title from your user account or the file name, so check the output. The trade-off is that you lose interactive elements, bookmarks, form fields, and any accessibility tagging. Good for simple read-only documents; not suitable for forms or signed files.
Option C: LibreOffice / Open Source Tools
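LibreOffice can open a PDF (it loads in Draw by default) and re-export it via File → Export as PDF. Clear the Personal Data checkbox on the Security tab if your version offers it; the location of this option varies between releases, so verify the exported file. ExifTool is a powerful command-line option that can remove individual fields or, as here, all of them: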
exiftool -all= -overwrite_original document.pdf
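This command removes every metadata field ExifTool can write; add -XMP:all= to be explicit about the XMP stream. Be aware that ExifTool edits PDFs by appending an incremental update, so the original values stay in the file and can be recovered until the file is rewritten, for example with qpdf --linearize. ExifTool also leaves other internal PDF structures untouched, so for a full sanitisation use Acrobat or a dedicated PDF sanitiser.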
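Since ExifTool appends its PDF changes as an incremental update rather than rewriting the file, a pipeline would typically follow it with a rewrite step. The sketch below chains the two commands from Python. It assumes both exiftool and qpdf are installed and on the PATH, and the function names are illustrative.

```python
import subprocess
from pathlib import Path


def sanitise_commands(work_copy: Path, output: Path) -> list[list[str]]:
    """Build the two commands: strip metadata in place, then rewrite the file."""
    return [
        # Step 1: delete all writable metadata (appended as an incremental update).
        ["exiftool", "-all=", "-overwrite_original", str(work_copy)],
        # Step 2: rewrite the file so the superseded objects are dropped.
        ["qpdf", "--linearize", str(work_copy), str(output)],
    ]


def run_sanitise(work_copy: Path, output: Path, runner=subprocess.run) -> None:
    """Run both steps, stopping at the first command that fails."""
    for argv in sanitise_commands(work_copy, output):
        runner(argv, check=True)
```

Run it against a working copy rather than the only original, because the first step modifies its input in place. The `runner` parameter exists so the sequence can be exercised in tests without invoking the real tools.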
Option D: Python / Automated Pipelines
For bulk processing in a CI/CD or document pipeline, pypdf lets you blank the info dictionary fields programmatically:
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

# Copy the pages into a fresh writer
for page in reader.pages:
    writer.add_page(page)

# Overwrite the standard info fields with empty strings
# (this also replaces the default Producer value that pypdf writes)
writer.add_metadata({
    "/Author": "",
    "/Creator": "",
    "/Producer": "",
    "/Title": "",
    "/Subject": "",
    "/Keywords": "",
})

with open("output.pdf", "wb") as f:
    writer.write(f)
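For XMP streams, use pikepdf, which provides direct access to the XMP packet:
import pikepdf

with pikepdf.open("input.pdf") as pdf:
    # set_pikepdf_as_editor=False stops pikepdf stamping itself into the metadata on exit
    with pdf.open_metadata(set_pikepdf_as_editor=False) as meta:
        meta.clear()  # remove every XMP property
    # Save while the file is still open
    pdf.save("output.pdf")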
Step 3 — Verify the Result
After applying your chosen removal method, run the file through the metadata viewer again. Confirm that all sensitive fields are now empty or absent. Pay particular attention to:
- The XMP xmp:CreateDate and xmp:ModifyDate fields, which survive many "simple" metadata clear operations
- The xmpMM:InstanceID and xmpMM:DocumentID fields, unique identifiers that can be used to link revisions of the same document
- The Producer field, often reset to the name of the tool used for sanitisation itself, which may reveal your internal toolchain
- Any custom application namespaces, since some enterprise document management systems write proprietary XMP schemas that generic cleaners miss
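The checks in this list can be expressed as a small function over the fields reported after cleaning. This is a sketch under stated assumptions: the fields are supplied as a plain name/value mapping, the set of "known" namespace prefixes is an illustrative choice, and the function name is hypothetical.

```python
# XMP properties the list above singles out as frequent survivors.
WATCHED = ("xmp:CreateDate", "xmp:ModifyDate", "xmpMM:InstanceID", "xmpMM:DocumentID")

# Prefixes treated as standard; anything else is reported as a custom namespace.
KNOWN_PREFIXES = ("dc", "xmp", "xmpMM", "pdf", "pdfaid")  # assumption: adjust as needed


def verify_clean(fields: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means nothing was found."""
    problems = []
    for key in WATCHED:
        if fields.get(key):
            problems.append(f"{key} still present")
    producer = fields.get("pdf:Producer") or fields.get("Producer")
    if producer:
        problems.append(f"Producer names a tool: {producer}")
    for key in fields:
        prefix, sep, _ = key.partition(":")
        if sep and prefix not in KNOWN_PREFIXES:
            problems.append(f"custom namespace: {key}")
    return problems
```

An empty result only means these particular checks passed, so it complements a visual review of the full field list rather than replacing it.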
What You Cannot Remove Without Destroying the Document
- Digital signatures — A signature cryptographically covers the file content; removing metadata after signing will invalidate the signature. If you need a clean-metadata file, strip metadata before signing.
- PDF/A conformance data — Archival-format PDFs embed conformance markers in their XMP that are required for validity. Removing them breaks the archival standard.
- Tagged PDF accessibility structure — Accessibility tags live in the document structure tree, not in metadata. They describe reading order and element roles; removing them harms screen reader users without improving privacy.
- Font fingerprinting — Custom or embedded fonts can sometimes be used to identify a document's origin. Metadata removal does not address this — only font subsetting or substitution does.
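Because stripping metadata from a signed file invalidates the signature, an automated step may want to refuse signed inputs. The sketch below is a raw-byte heuristic, not a validator: it assumes the signature dictionary is stored uncompressed and simply looks for the /ByteRange entry or a /Type /Sig marker. The function names are illustrative.

```python
import re

# Signature dictionaries normally carry a /ByteRange array and often /Type /Sig.
_BYTE_RANGE = re.compile(rb"/ByteRange\s*\[")
_SIG_TYPE = re.compile(rb"/Type\s*/Sig\b")


def looks_signed(data: bytes) -> bool:
    """Heuristic: True if the raw bytes contain signs of a digital signature."""
    return bool(_BYTE_RANGE.search(data) or _SIG_TYPE.search(data))


def safe_to_strip(data: bytes) -> bool:
    """Only allow metadata removal when no signature marker was found."""
    return not looks_signed(data)
```

A file that fails this check is better routed to a person, who can decide whether to re-sign after cleaning.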
Build Metadata Hygiene Into Your Workflow
The most reliable approach is to make metadata removal part of your export or publish step rather than an ad-hoc action. Practical ways to do this:
Set neutral author defaults in your applications
In Microsoft Word, LibreOffice, and Adobe applications you can set the default author name to a generic value (e.g. your company name) so that every new document starts clean. Go to application preferences → user info and replace personal details with your organisation name.
Add a sanitisation step to your CI/CD pipeline
If your organisation publishes PDFs from an automated pipeline (reports, invoices, generated documents), add an ExifTool or pikepdf sanitisation step before the upload or distribution stage. Paired with a verification check, this makes it far less likely that a file leaves with internal metadata, regardless of how it was created.
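A pipeline step of this kind can also act as a gate that fails the build when something slips through. The sketch below walks an output directory and reports any PDF whose raw bytes still contain a forbidden string. The strings shown are hypothetical placeholders, and a raw-byte search is only a backstop: it will not see text inside compressed streams or UTF-16 encoded values.

```python
from pathlib import Path

# Hypothetical placeholders: replace with names or codes that must never ship.
FORBIDDEN = (b"acmecorp", b"j.smith")


def find_leaks(root: Path, needles=FORBIDDEN) -> dict[str, list[str]]:
    """Map each offending PDF (path relative to root) to the strings found in it."""
    leaks: dict[str, list[str]] = {}
    for pdf in sorted(root.rglob("*.pdf")):
        data = pdf.read_bytes().lower()  # case-insensitive match
        hits = [needle.decode() for needle in needles if needle.lower() in data]
        if hits:
            leaks[pdf.relative_to(root).as_posix()] = hits
    return leaks


def main(argv: list[str]) -> int:
    """Return 1 if any PDF under the given directory leaks, else 0."""
    root = Path(argv[1]) if len(argv) > 1 else Path(".")
    leaks = find_leaks(root)
    for name, hits in leaks.items():
        print(f"{name}: found {', '.join(hits)}")
    return 1 if leaks else 0
```

In a CI job, passing the return value of `main(sys.argv)` to `sys.exit` would make a non-zero result block the upload or distribution stage.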
Audit shared documents before distribution
Make a pre-send metadata check part of your document review checklist, especially for client deliverables, public-facing downloads, or regulatory submissions. A 30-second check with a metadata viewer can prevent embarrassing or sensitive disclosures.
Key Takeaways
- Every PDF embeds metadata that can reveal author identity, software versions, and revision timestamps
- Audit first with a metadata viewer: you cannot clean what you have not seen
- Use Acrobat Sanitize, ExifTool, or pikepdf depending on your workflow — always verify the result
- Strip metadata before signing — removing it after will invalidate digital signatures
- Automate sanitisation in any pipeline that publishes PDFs to ensure no file slips through with internal data intact
See What Metadata Your PDF Is Hiding
Instantly inspect every metadata field in your PDF — author, software, timestamps, and more — without uploading anything to a server.
Check Your PDF Metadata
PDFCheck Team
We build tools that make PDF analysis accessible to everyone.