Of 75 national security agencies across 47 countries, only seven properly strip metadata from their public PDF documents. The other 68 are leaking internal usernames, email addresses, operating system versions, file paths, and software configurations — embedded silently in files they publish to the open internet every day.

That finding comes from a team of European security researchers who downloaded and analyzed 39,664 PDFs from the public websites of intelligence services, defense ministries, and national cybersecurity centers. The results are striking: 65% of documents that appeared to have been "cleaned" still contained exploitable metadata. Across 19 agencies, researchers identified 159 individual employees by name and email address — people working inside organizations whose entire mission is keeping secrets.

What PDF Metadata Actually Contains

Every PDF carries hidden fields that most users never see. Open any document's properties and you will find some of them. But the deeper metadata — the kind that leaked from these agencies — is embedded in the file's internal structure and requires deliberate effort to remove.

Here is what a typical government PDF reveals before metadata is scrubbed:

    /Author j.analyst@defense.gov.example
    /Creator Word Processor 2019
    /Producer PDF Library 15.0
    /Title DRAFT - Threat Assessment Q3 (v4-FINAL-reviewed)
    /Subject Internal circulation only
    /CreationDate D:20250314091547+00'00'
    /ModDate D:20250318163022+00'00'
    /Keywords classified, nato, sigint

    %% XMP metadata stream:
    xmp:CreatorTool Microsoft Office Word
    dc:creator J.Analyst
    pdf:Producer macOS 14.2 Quartz PDFContext

That fictional example is mild compared to what the researchers actually found. Some files contained full Windows file paths like C:\Users\jdoe\Documents\Ministry\Classified\, revealing both a username and an internal directory structure. Others embedded GPS coordinates from scanned documents, or contained revision histories showing which sections had been added or deleted and by whom.
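To make the idea concrete, here is a minimal Python sketch that pulls the Info-dictionary fields out of a PDF's raw bytes and flags values that look like email addresses or Windows paths. It is an illustration, not the researchers' tooling: it assumes the fields are stored as uncompressed literal strings, and it will miss anything inside compressed object streams, hex strings, or encrypted files, where a real PDF parser is needed. The names extract_info, find_leaks, and SAMPLE are hypothetical.

```python
import re

# Info-dictionary keys of interest (the same ones shown in the example listing).
INFO_KEYS = (b"Author", b"Creator", b"Producer", b"Title",
             b"Subject", b"Keywords", b"CreationDate", b"ModDate")

# Matches "/Key (literal string)". Backslash escapes are allowed inside the
# string; nested, unescaped parentheses are NOT handled (sketch assumption).
INFO_RE = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\()])*)\)"
)

# Rough patterns for identifying strings; tune these for real use.
LEAK_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "windows_user_path": re.compile(r"[A-Za-z]:\\Users\\[^\\\s]+"),
    "unc_path": re.compile(r"\\\\[\w.-]+\\[\w$.-]+"),
}


def extract_info(pdf_bytes: bytes) -> dict[str, str]:
    """Return Info fields found as literal strings in the raw bytes."""
    fields = {}
    for key, raw in INFO_RE.findall(pdf_bytes):
        text = raw.decode("latin-1")
        # Undo the three simplest PDF escapes: \\  \(  \)
        fields[key.decode("ascii")] = re.sub(r"\\([\\()])", r"\1", text)
    return fields


def find_leaks(fields: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (field, kind, matched text) for every suspicious value."""
    hits = []
    for key, value in fields.items():
        for kind, pattern in LEAK_PATTERNS.items():
            for match in pattern.findall(value):
                hits.append((key, kind, match))
    return hits


# Fabricated bytes standing in for part of a PDF file.
SAMPLE = (
    b"1 0 obj\n<< /Author (j.analyst@defense.gov.example)\n"
    b"/Title (C:\\\\Users\\\\jdoe\\\\Documents\\\\draft.docx)\n"
    b"/Producer (PDF Library 15.0) >>\nendobj\n"
)

if __name__ == "__main__":
    for field, kind, text in find_leaks(extract_info(SAMPLE)):
        print(f"{field}: {kind} -> {text}")
```

On the fabricated sample, the sketch reports the email address in the Author field and the C:\Users\jdoe prefix in the Title field, which is already enough to tie a username to a mailbox.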

Why This Matters Beyond Embarrassment

Metadata leaks from security agencies are not abstract risks. Each piece of information is an intelligence vector:

Employee identification. An email address confirms that a specific person works for a specific agency. That is information adversary intelligence services pay to obtain. The 159 employees identified in this study are now linkable to their agencies by anyone who downloads these PDFs.
Software reconnaissance. Knowing that an agency runs Word 2019 on Windows 10, or macOS 14.2, narrows down which known vulnerabilities (CVEs) an attacker should try first. The creator and producer fields amount to a free software inventory.
Network mapping. File paths expose internal naming conventions, shared drive structures, and domain names. A path like \\INTSERV04\shared\briefs\ tells you the internal hostname and folder hierarchy.
Workflow analysis. Creation and modification dates, revision counts, and draft titles reveal how documents move through an organization — how many reviews they go through, how long approval takes, and who is in the chain.
Spear-phishing fuel. Combine an employee's name, email format, software version, and department into one profile and you have a tailor-made spear-phishing target.
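The workflow point is easy to underestimate, so here is a short Python sketch of how little effort it takes. It parses PDF date strings of the form shown in the earlier example and computes how long a document spent between creation and last modification. It assumes every date component down to the second is present, which the PDF format does not require, and the function name parse_pdf_date is illustrative.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by an optional "Z" or an offset such as +01'00'
PDF_DATE_RE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})'?)?"
)


def parse_pdf_date(value: str) -> datetime:
    """Convert a fully specified PDF date string to an aware datetime."""
    m = PDF_DATE_RE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unsupported PDF date: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    sign, off_h, off_m = m.group(8), m.group(9), m.group(10)
    offset = timedelta(0)  # "Z" or no offset is treated as UTC here
    if sign:
        offset = timedelta(hours=int(off_h), minutes=int(off_m))
        if sign == "-":
            offset = -offset
    return datetime(year, month, day, hour, minute, second,
                    tzinfo=timezone(offset))


# The two timestamps from the fictional example document.
created = parse_pdf_date("D:20250314091547+00'00'")
modified = parse_pdf_date("D:20250318163022+00'00'")
review_window = modified - created

if __name__ == "__main__":
    print(review_window)  # 4 days, 7:14:35
```

Run across a few hundred published files, the same subtraction would give an outsider a rough picture of how long an organization's review cycle usually takes.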

The 7 Agencies That Got It Right

The researchers noted that the handful of agencies which consistently scrubbed metadata shared a common trait: they had automated the process. Rather than relying on individual employees to remember to clean each document, these agencies deployed pipeline-level sanitization — every PDF passed through a metadata stripping step before it could be published. The pattern suggests that manual hygiene does not scale. When 68 out of 75 agencies whose core mission is security still leak metadata through manual processes, automation appears to be the only reliable approach.
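The study does not describe how those agencies built their pipelines, so the following Python sketch only illustrates the general shape of a pipeline-level gate. Everything in it is an assumption: publish, PublishBlocked, and the three injected callables (sanitize, inspect, upload) are hypothetical names, and the actual stripping and inspection logic is left to whatever tool an organization trusts.

```python
from typing import Callable


class PublishBlocked(Exception):
    """Raised when a document still carries metadata after sanitization."""


def publish(
    pdf_bytes: bytes,
    sanitize: Callable[[bytes], bytes],
    inspect: Callable[[bytes], list[str]],
    upload: Callable[[bytes], None],
) -> None:
    """Sanitize, re-inspect, and only then hand the file to the uploader.

    The gate never trusts the sanitizer: it inspects the sanitized bytes
    and refuses to publish if anything is still reported.
    """
    cleaned = sanitize(pdf_bytes)
    findings = inspect(cleaned)
    if findings:
        raise PublishBlocked("metadata still present: " + ", ".join(findings))
    upload(cleaned)
```

The design choice that matters is that no code path reaches upload without passing through both sanitize and inspect, so a forgetful author or a failing sanitizer blocks publication instead of leaking.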

How to Check and Remove PDF Metadata

Metadata exposure is not limited to security agencies. PDFs exported from Word, Google Docs, macOS Preview, and most other applications embed metadata by default. Here is how to address the problem:

Inspect before sharing. Open the PDF in PDFb2's Metadata tool to see exactly what the document contains: author, creator application, timestamps, keywords, and any XMP data streams.
Strip all metadata fields. Remove the author, title, subject, keywords, creator, and producer fields. Also remove XMP metadata if present. PDFb2 handles this in one step, entirely in your browser.
Re-inspect the output. Open the cleaned PDF and verify the metadata fields are empty or generic. Verification catches edge cases that automated tools occasionally miss.
Policy over habit. The agencies that succeeded built metadata removal into their publishing pipeline. Organizations that share PDFs regularly can apply the same approach — a consistent step in every workflow.
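For readers who want to see the strip-then-verify loop in code, here is a deliberately naive Python sketch. It is not how PDFb2 or any other tool works. It locates the Info dictionary through the trailer's /Info reference and overwrites the six descriptive fields with spaces of the same length, so byte offsets elsewhere in the file stay unchanged. It assumes an unencrypted file with an uncompressed Info dictionary and literal strings, it only detects XMP rather than removing it, and it ignores older revisions left behind by incremental saves. A maintained PDF library is the right choice for production use. All function names are hypothetical.

```python
import re

INFO_REF_RE = re.compile(rb"/Info\s+(\d+)\s+(\d+)\s+R")

# Group 1: "/Key (", group 2: key name, group 3: the string value.
FIELD_RE = re.compile(
    rb"(/(Author|Creator|Producer|Title|Subject|Keywords)\s*\()"
    rb"((?:\\.|[^\\()])*)\)"
)


def info_span(pdf_bytes: bytes):
    """Return (start, end) of the Info object, or None if not found."""
    ref = INFO_REF_RE.search(pdf_bytes)
    if ref is None:
        return None
    obj_re = re.compile(
        rb"(?<!\d)" + ref.group(1) + rb"\s+" + ref.group(2)
        + rb"\s+obj\b.*?endobj",
        re.DOTALL,
    )
    obj = obj_re.search(pdf_bytes)
    return obj.span() if obj else None


def blank_info_values(pdf_bytes: bytes) -> bytes:
    """Overwrite Info field values with spaces, keeping the file length."""
    span = info_span(pdf_bytes)
    if span is None:
        return pdf_bytes
    start, end = span
    blanked = FIELD_RE.sub(
        lambda m: m.group(1) + b" " * len(m.group(3)) + b")",
        pdf_bytes[start:end],
    )
    return pdf_bytes[:start] + blanked + pdf_bytes[end:]


def remaining_fields(pdf_bytes: bytes) -> list[str]:
    """Re-inspection step: list anything that still looks populated."""
    found = []
    span = info_span(pdf_bytes)
    if span is not None:
        for m in FIELD_RE.finditer(pdf_bytes[span[0]:span[1]]):
            if m.group(3).strip():
                found.append(m.group(2).decode("ascii"))
    if b"<?xpacket begin" in pdf_bytes:
        found.append("XMP packet")  # detected only, not removed
    return found


# Fabricated bytes: object 1 mimics a bookmark, object 2 is the Info dictionary.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Title (Bookmark that must survive) >>\nendobj\n"
    b"2 0 obj\n<< /Author (J.Analyst) /Producer (PDF Library 15.0) >>\nendobj\n"
    b"trailer\n<< /Info 2 0 R >>\n"
)
```

Scoping the edit to the Info object matters because /Title also appears in bookmarks, and a blanket search-and-replace would wipe those too. Calling remaining_fields on the output is the "re-inspect" step: an empty list means this sketch found nothing left, not that the file is guaranteed clean.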

The Uncomfortable Bottom Line

Intelligence agencies have classified document handling procedures, dedicated information security teams, and legal mandates to protect sensitive data. If 68 of the 75 agencies studied still fail to strip PDF metadata, most other organizations are likely doing no better. A PDF exported from Word, Google Docs, or macOS Preview typically records the software that produced it and its timestamps, often the author's name, and sometimes an operating system version or a file path. That information is invisible in normal use but readable by anyone who inspects the file's properties or its internal structure.

Stripping metadata takes seconds. The exposure from an unstripped document lasts as long as that file exists on the internet.

93% of the World's Security Agencies Are Leaking Secrets Through Their PDFs
