Your PDF's Author Field Could Cost You 4% of Revenue

In May 2023, a European data protection authority fined a major technology company EUR 1.2 billion for transferring personal data from the EU to the United States without adequate safeguards. It was the largest GDPR fine ever imposed. The regulation defines "personal data" broadly: any information relating to an identified or identifiable natural person. That definition does not care whether the data is in a database, a spreadsheet, or the metadata fields of a PDF. An organization that sends PDFs outside the EU/EEA without scrubbing metadata that identifies a person is transferring personal data across borders. The maximum penalty is EUR 20 million or 4% of global annual revenue, whichever is higher.

What Personal Data Lives Inside a PDF

Most people think of a PDF as a static document — the text and images you see on screen. But every PDF carries hidden data structures that can contain personal information the author never intended to share. Under GDPR, all of the following qualify as personal data when they identify or can be used to identify a natural person:

Author and creator fields. The document properties typically contain the name of the person who created or last modified the file, most visibly in the Author entry of the document information dictionary and in the XMP dc:creator property. This is often pulled automatically from the operating system's user account or the application's license registration. A full name and department in the Author field is personal data.
Creation and modification timestamps. These reveal when someone worked on a document, which can identify individuals when combined with other data. A PDF modified at 2:00 AM on a Saturday narrows the field of possible authors considerably.
Software and system information. The Producer and Creator fields record the software used (including version numbers) and sometimes the operating system. Combined with other metadata, this can fingerprint a specific workstation or user.
GPS coordinates in embedded images. If a PDF contains images taken with a smartphone or camera with location services enabled, the EXIF data in those images can contain precise GPS coordinates. A photo of a signed contract may reveal the exact address where it was signed.
Tracked changes and comments. Revision history can preserve names and email addresses of every person who edited or reviewed the document, along with what they changed and when.
Embedded file attachments. PDFs can contain attached files that carry their own metadata, including documents, spreadsheets, or images with additional personal data.
Form field data. Interactive PDF forms may contain filled-in data from previous users, cached auto-fill information, or hidden field values not visible on the printed page.
XMP metadata blocks. The Extensible Metadata Platform standard allows arbitrary metadata to be embedded in a PDF. This can include email addresses, organizational structures, project identifiers, and other data linking the document to specific individuals.
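
As a rough illustration of how exposed these fields can be, the following is a minimal sketch that searches a PDF's raw bytes for document information entries written as plain literal strings. It uses only the Python standard library. The function name find_info_fields and the sample bytes are illustrative assumptions, and the approach deliberately ignores compressed object streams, hex-encoded strings, nested parentheses, and encrypted files, so an empty result does not show that a file is clean.

```python
import re

# Document information keys that commonly carry personal data.
INFO_KEYS = ("Author", "Creator", "Producer", "Title", "Subject",
             "Keywords", "CreationDate", "ModDate")

# Matches "/Key (literal string)"; assumes no nested parentheses in the value.
_FIELD_RE = re.compile(
    rb"/(" + "|".join(INFO_KEYS).encode() + rb")\s*\(((?:\\.|[^\\()])*)\)"
)

def find_info_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Return fields found as plain literal strings in the raw bytes."""
    found = {}
    for key, value in _FIELD_RE.findall(pdf_bytes):
        found[key.decode()] = value.decode("latin-1")
    return found

# Hypothetical sample standing in for the contents of a real file.
sample = (
    b"%PDF-1.7\n1 0 obj\n<< /Author (Jane Doe, Finance) "
    b"/Producer (ExamplePDF 4.2) /ModDate (D:20230513020000+02'00') >>\nendobj\n"
)
fields = find_info_fields(sample)
for name, value in fields.items():
    print(f"{name}: {value}")
```

For a real file, the bytes would come from something like pathlib.Path("report.pdf").read_bytes(), read locally. A dedicated PDF library is the better choice for anything beyond a quick look, since it can decode the structures this sketch skips.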

Why This Matters Under GDPR

GDPR Article 4 defines personal data as "any information relating to an identified or identifiable natural person." Article 44 restricts transfers of personal data to countries outside the EU/EEA that do not provide an adequate level of protection. Emailing a PDF with an employee's name in the Author field to a contact in a non-adequate country constitutes a cross-border transfer of personal data.

The enforcement risk is not theoretical. Data protection authorities across Europe have increasingly focused on incidental data exposure — personal data shared not through databases or APIs, but through ordinary business documents that no one thought to inspect. EU-level data protection guidance has explicitly confirmed that metadata in electronic documents falls within the scope of the regulation.

Under GDPR Article 83, violations of the data transfer provisions carry fines of up to EUR 20 million or 4% of total worldwide annual turnover, whichever is higher. The EUR 1.2 billion fine mentioned above was for systematic, deliberate transfers. But the transfer rules themselves apply whether a transfer is intentional or accidental. Article 83(2) does direct authorities to weigh the intentional or negligent character of an infringement when setting the amount of a fine, so an unscrubbed Author field is unlikely to be penalized like a database export deliberately sent overseas, but it is still an infringement.
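
The "whichever is higher" rule is easy to misread as a flat 4%, so a small worked calculation may help. This sketch computes only the statutory upper limit described above, not a likely fine. The function name and the turnover figures are illustrative assumptions.

```python
def max_transfer_fine_eur(worldwide_turnover_eur: int) -> int:
    """Upper limit: EUR 20 million or 4% of turnover, whichever is higher."""
    four_percent = worldwide_turnover_eur * 4 // 100  # integer math, whole euros
    return max(20_000_000, four_percent)

# Hypothetical turnover figures, chosen to fall on both sides of the threshold.
for turnover in (50_000_000, 500_000_000, 10_000_000_000):
    cap = max_transfer_fine_eur(turnover)
    print(f"turnover EUR {turnover:,} -> cap EUR {cap:,}")
```

The two limits meet at EUR 500 million of turnover. Below that, the EUR 20 million figure is the ceiling, which for a small company can far exceed 4% of its revenue.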

The Metadata Scrubbing Checklist

Before sharing a PDF with an external party — especially across borders — the following fields are worth checking and scrubbing or setting to generic values:

1. Author: Remove or replace with organization name
2. Creator: Remove or replace with generic application name
3. Producer: Remove software-specific identifying strings
4. Subject and Keywords: Review for project names or identifiers linked to individuals
5. Creation/Modification dates: Consider whether timestamps reveal individual work patterns
6. Embedded image EXIF data: Strip GPS coordinates, camera serial numbers, and device identifiers
7. Comments and tracked changes: Remove all reviewer names and revision history
8. XMP metadata: Inspect and remove the full XMP block, which may contain data not visible in standard property fields
9. Form field data: Clear any cached or hidden form values
10. Embedded attachments: Inspect or remove any files attached to the PDF
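
The checklist can be turned into a first-pass triage, sketched below under loud assumptions. Each item is mapped to byte markers whose presence in the raw file suggests the item needs review: PDF names such as /Annots, /AcroForm, and /EmbeddedFiles, the XMP packet wrapper, and the Exif header that JPEG images carry. The mapping and the name audit_checklist are illustrative, not a standard. Because compressed object streams can hide these markers, a clean report means "nothing found", not "nothing there".

```python
# Checklist item -> byte markers whose presence suggests the item needs review.
CHECKLIST_MARKERS = {
    "Author": [b"/Author"],
    "Creator": [b"/Creator"],
    "Producer": [b"/Producer"],
    "Subject and Keywords": [b"/Subject", b"/Keywords"],
    "Creation/Modification dates": [b"/CreationDate", b"/ModDate"],
    "Embedded image EXIF data": [b"Exif\x00\x00"],
    "Comments and tracked changes": [b"/Annots"],
    "XMP metadata": [b"<x:xmpmeta", b"<?xpacket"],
    "Form field data": [b"/AcroForm"],
    "Embedded attachments": [b"/EmbeddedFiles"],
}

def audit_checklist(pdf_bytes: bytes) -> dict[str, bool]:
    """Flag each checklist item whose marker bytes appear anywhere in the file."""
    return {
        item: any(marker in pdf_bytes for marker in markers)
        for item, markers in CHECKLIST_MARKERS.items()
    }

# Hypothetical sample standing in for the contents of a real file.
sample = (
    b"%PDF-1.7\n1 0 obj\n<< /Author (J. Doe) /ModDate (D:20240101120000Z) >>\nendobj\n"
    b"<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?><x:xmpmeta></x:xmpmeta>\n"
)
report = audit_checklist(sample)
for item, flagged in report.items():
    print(f"[{'REVIEW' if flagged else 'ok':6}] {item}")
```

A flagged item only says where to look. Deciding whether the value actually identifies a person still takes a human or a proper PDF parser.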

The Problem with "We Didn't Know"

Most organizations that share PDFs internationally have no idea these fields exist. A marketing team sends a brochure PDF to a partner in Singapore. A legal department emails a contract to counsel in New York. A consulting firm delivers a report to a client in Brazil. In every case, the Author field, timestamps, and potentially GPS-tagged images are riding along silently.

GDPR does not accept ignorance as a defense. Article 5(1)(f) requires "appropriate security" of personal data, which includes protecting it against inadvertent disclosure. Article 25 requires "data protection by design and by default", which for document workflows points to building metadata scrubbing in from the start rather than treating it as an afterthought.

How to Inspect and Remove PDF Metadata

The first step is visibility. It is not possible to scrub what is not visible. Opening a PDF in a metadata inspection tool and reviewing every field is a good starting point. Most people are surprised by what they find: their full name, their company's licensed PDF editor, and exact timestamps of their late-night editing sessions.

For one-off documents, manual inspection and removal is sufficient. For organizations that produce hundreds or thousands of PDFs, metadata scrubbing should be automated as part of the document publication workflow. Every PDF that leaves the organization should pass through a metadata removal step before reaching any external recipient.
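
One way to picture the automated step is a release gate that refuses to pass any PDF that still has findings. The sketch below assumes a hypothetical "outbox" folder of files awaiting release and a pluggable inspect function. The simple_inspect placeholder checks only a few raw byte markers and would be replaced by a real inspector in practice. All names here are illustrative.

```python
import tempfile
from pathlib import Path
from typing import Callable

def simple_inspect(pdf_bytes: bytes) -> list[str]:
    """Placeholder check: report a few raw markers; swap in a real inspector."""
    markers = (b"/Author", b"<x:xmpmeta", b"/EmbeddedFiles")
    return [m.decode() for m in markers if m in pdf_bytes]

def release_gate(
    outbox: Path,
    inspect: Callable[[bytes], list[str]] = simple_inspect,
) -> dict[str, list[str]]:
    """Map each PDF in the outbox that still has findings to those findings."""
    blocked = {}
    for path in sorted(outbox.glob("*.pdf")):
        findings = inspect(path.read_bytes())
        if findings:
            blocked[path.name] = findings
    return blocked

# Demonstration with two hypothetical files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    outbox = Path(tmp)
    (outbox / "brochure.pdf").write_bytes(b"%PDF-1.7\n<< /Author (Jane Doe) >>\n")
    (outbox / "clean.pdf").write_bytes(b"%PDF-1.7\n<< /Title (Brochure) >>\n")
    blocked = release_gate(outbox)

for name, findings in blocked.items():
    print(f"BLOCKED {name}: {', '.join(findings)}")
```

Making the gate fail closed, so that a document is held whenever the inspector reports anything, keeps the default on the side of not sending.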

There is an important caveat: metadata scrubbing ideally happens client-side. Uploading a PDF to a cloud-based metadata removal service means the personal data has already been disclosed to a third party, potentially across borders, which is the exact problem the scrubbing is meant to solve. Tools that process documents locally, within the user's browser or on their machine, avoid this circular risk.
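
To show that local processing needs nothing beyond the machine the file already sits on, here is a minimal sketch that blanks a few document information values in memory using only the Python standard library. It overwrites each value with spaces of the same length, so the file size and the byte offsets in the cross-reference table stay unchanged. The name blank_info_fields is illustrative. The sketch assumes unencrypted files with plain literal strings, leaves XMP, attachments, and earlier revisions from incremental saves untouched, and would invalidate a digital signature, so it is not a complete scrubber.

```python
import re

# "/Key (value)" for a few sensitive keys; assumes no nested parentheses.
_SENSITIVE = re.compile(
    rb"(/(?:Author|Creator|Producer)\s*\()((?:\\.|[^\\()])*)(\))"
)

def blank_info_fields(pdf_bytes: bytes) -> bytes:
    """Replace matched values with spaces of equal length, preserving offsets."""
    def _blank(match: re.Match) -> bytes:
        return match.group(1) + b" " * len(match.group(2)) + match.group(3)
    return _SENSITIVE.sub(_blank, pdf_bytes)

# Hypothetical sample standing in for the contents of a real file.
original = (
    b"%PDF-1.7\n1 0 obj\n<< /Title (Report) /Author (Jane Doe) "
    b"/Producer (ExamplePDF 4.2) >>\nendobj\n"
)
scrubbed = blank_info_fields(original)
print(scrubbed.decode("latin-1"))
print("same length:", len(scrubbed) == len(original))
```

Whatever tool does the scrubbing, re-inspecting the output before sending remains worthwhile, since the same name often appears in more than one place.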

The Bottom Line

Every PDF an organization sends externally is a potential GDPR compliance event. The Author field is the most obvious exposure, but it is far from the only one. GPS coordinates in embedded images, tracked changes with reviewer names, and XMP metadata blocks all constitute personal data under the regulation. The fines are real, the enforcement is increasing, and the fix is straightforward: inspect, scrub, and verify every document before it leaves the organization's control.