Case Study · 7 min read

The PDF Feature Nobody Checks: How Bookmarks Exposed a Redacted Government Contract

What makes this case notable is that the team who redacted the contract did the hard part right. The text redaction was real. Not a black rectangle over selectable text. Not a highlight annotation. The underlying character data was genuinely removed from the content stream. Selecting the blacked-out sections returned nothing. Searching for the hidden terms produced no results. By every standard test for proper redaction, the document passed.

And then someone clicked the bookmarks panel.

What the Bookmarks Revealed

In early 2021, amid public pressure over vaccine pricing transparency, a regulatory body published a redacted version of its advance purchase agreement with a major pharmaceutical manufacturer. The contract ran over 40 pages and had large sections blacked out — the financial terms, delivery schedules, indemnification clauses, and liability caps that were at the center of the public debate.

But the PDF had a navigation panel. Bookmarks — the clickable table of contents that sits in the left sidebar of most PDF readers — were generated from the document's heading structure. And those bookmarks contained the text of section headings, sub-headings, and clause titles that had been redacted in the body.

A member of parliament was among the first to flag the issue publicly. The bookmarks referenced a total contract value in the hundreds of millions of euros. They revealed section headings for delivery schedules that had been specifically blacked out. Clause titles describing liability limitations and indemnification terms were fully readable in the bookmark panel, even though the corresponding text on the page had been properly redacted.

The redaction team addressed the page text but overlooked the bookmarks — the equivalent of shredding a confidential report while leaving the table of contents pinned to the bulletin board.

What Most People Do Not Know Lives Inside a PDF

This bookmark leak illustrates a broader problem: most people think of a PDF as a flat, inert document, the digital equivalent of a printed page. In practice, a PDF is a structured container that can carry far more information than what appears on the page. Here is what the format can store beyond the visible page content:

Bookmarks

A hierarchical outline, usually generated from the document's heading structure and often created automatically by Word, LaTeX, or InDesign on export to PDF. Bookmarks persist independently of the page content: redacting text on a page does not automatically update or remove the corresponding bookmark entry.
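To see why the outline survives page-level redaction, it helps to look at how a file stores it. The sketch below is an illustration only, not a redaction tool. `SAMPLE` is a hand-written fragment of PDF object syntax rather than a complete file, the bookmark title in it is invented, and `literal_titles` is a hypothetical helper that handles only the simplest string encoding.

```python
import re

# Hand-written fragment of PDF object syntax (not a complete, valid PDF).
# Object 4 is a page content stream after redaction: only harmless text is left.
# Object 7 is an outline (bookmark) item that nobody updated.
SAMPLE = b"""
4 0 obj
<< /Length 40 >>
stream
BT /F1 12 Tf 72 700 Td (Section 5) Tj ET
endstream
endobj
7 0 obj
<< /Title (5.2 Price per dose and total contract value) /Parent 6 0 R /Dest [3 0 R /Fit] >>
endobj
"""

# Matches /Title followed by a literal string with no nested or escaped
# parentheses. Hex strings and UTF-16 titles are deliberately not handled.
# The document information dictionary also uses /Title, so a real file may
# return the document title alongside bookmark titles.
TITLE_RE = re.compile(rb"/Title\s*\(([^()\\]*)\)")


def literal_titles(raw: bytes) -> list[str]:
    """Return every simple literal /Title string found in the raw bytes."""
    return [match.decode("latin-1") for match in TITLE_RE.findall(raw)]


print(literal_titles(SAMPLE))
```

The page stream in object 4 no longer contains the sensitive wording, yet the title in object 7 is still there, because the two live in separate objects. Real files often store titles as hex or UTF-16 strings, or inside compressed object streams, so an empty result from a scan like this is not evidence that the outline is clean.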

Comments and Annotations

Sticky notes, review comments, text highlights, and markup. These are stored as separate annotation objects, not as part of the page content. A comment that says "Legal approved this liability cap at $50M" will survive page-level redaction unless annotations are explicitly stripped.

Layers (Optional Content Groups)

PDFs can contain togglable layers, commonly used in architectural drawings, maps, and translated documents. A layer that is turned off is still in the file. Anyone can turn it back on. If redacted content was placed on a hidden layer instead of being deleted, it is one click away from being visible.

Metadata

Author names, email addresses, creation software, revision dates, file paths, GPS coordinates from scanned documents, and custom properties. The document properties dialog shows some of this, but XMP metadata streams buried in the file structure can contain far more. A title field that reads "DRAFT — Contract v7 — FINAL pricing included" can reveal precisely what the redactors intended to withhold.
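As a rough illustration of what an XMP stream can hold, the sketch below pulls property values out of an XMP packet found in raw bytes. It assumes the packet is stored uncompressed and uses the conventional `x:` prefix on its root element, and it ignores properties written as XML attributes. `SAMPLE` is a hand-written fragment with invented values (the title reuses the example above with plain hyphens), and `xmp_properties` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Assumes the conventional "x:" prefix on the packet's root element.
XMP_RE = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)

# Hand-written metadata stream object; /Length is omitted for brevity,
# so this is a fragment and not a valid PDF object.
SAMPLE = b"""9 0 obj
<< /Type /Metadata /Subtype /XML >>
stream
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:xmp="http://ns.adobe.com/xap/1.0/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">DRAFT - Contract v7 - FINAL pricing included</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>J. Doe</rdf:li></rdf:Seq></dc:creator>
   <xmp:CreatorTool>Hypothetical Word Processor</xmp:CreatorTool>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
endstream
endobj
"""


def xmp_properties(raw: bytes) -> dict[str, str]:
    """Map each XMP property's local name to its text content."""
    props: dict[str, str] = {}
    for packet in XMP_RE.findall(raw):
        root = ET.fromstring(packet)
        for desc in root.iter(RDF_NS + "Description"):
            for prop in desc:
                # Tags look like "{namespace-uri}local"; keep the local part.
                name = prop.tag.rsplit("}", 1)[-1]
                text = " ".join(t.strip() for t in prop.itertext() if t.strip())
                if text:
                    props[name] = text
    return props


for name, value in xmp_properties(SAMPLE).items():
    print(f"{name}: {value}")
```

None of these values appear on any page, which is why a page-by-page review never encounters them. A packet that is compressed or uses a different prefix would be missed by this sketch, so it shows the kind of data involved and does not replace a proper metadata tool.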

Thumbnails

Page thumbnail images are sometimes stored as separate image objects. If the thumbnail was generated before redaction, it may contain a small but readable image of the original, unredacted page. Some tools update thumbnails automatically. Others do not.

Embedded Files and Attachments

PDFs can carry other files inside them — spreadsheets, images, earlier drafts, or source documents. The attachments panel is rarely checked. An embedded Excel file containing the full pricing breakdown would survive any amount of page-level redaction.

Form Fields

Interactive form fields carry their own data, separate from the visible page content. A form field's value, default value, tooltip text, and export value can all contain information that does not appear on the rendered page.

Why This Keeps Happening

The leak occurred because the redaction workflow was page-centric. The team opened the document, identified the sensitive text on each page, and applied proper redaction to that text. The page content was handled correctly. But the PDF was treated like a printed page — as if the page were the entire document.

PDF viewers open to the page view by default. The bookmarks panel, the attachments panel, the layers panel, and the metadata dialog are all secondary interfaces that most users never open. When the mental model of a PDF is "a stack of pages," the pages get cleaned and the job is considered done. But the stack of pages is only one part of the container, and everything else in the container ships with the file.

The Redaction Checklist Nobody Uses (But Should)

Proper PDF sanitization requires checking every part of the container, not just the visible pages:

  1. Redact page content. Remove sensitive text from the content stream (the part most people remember).
  2. Delete or edit bookmarks. Open the bookmarks panel and verify that no bookmark text references redacted content.
  3. Remove comments and annotations. Strip all sticky notes, review comments, highlights, and markup.
  4. Check for layers. Open the layers panel. Delete or flatten any layers that contain sensitive content.
  5. Strip metadata. Remove author, title, subject, keywords, creation software, and XMP data. Use PDFb2's Metadata tool to inspect and clean all hidden fields.
  6. Regenerate or remove thumbnails. Ensure page thumbnails reflect the redacted content, not the original.
  7. Check for attachments. Open the attachments panel. Remove any embedded files that should not be distributed.
  8. Inspect form fields. If the document contains forms, check field values, default values, and tooltips.
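Items 2 through 8 above can be given a rough first pass in code. The sketch below searches a file's raw bytes for the PDF dictionary keys that usually signal each feature. It is a triage heuristic under stated assumptions, not a sanitizer: the `MARKERS` table and the `scan` helper are illustrative names, `SAMPLE` is a hand-written fragment, and a key hidden inside a compressed object stream or written with `#xx` escapes will not be found.

```python
import re

# Checklist item -> PDF name tokens that usually indicate the feature exists.
MARKERS = {
    "bookmarks": [b"/Outlines"],
    "annotations": [b"/Annots"],
    "layers": [b"/OCProperties"],
    "metadata": [b"/Metadata", b"/Info"],
    "thumbnails": [b"/Thumb"],
    "attachments": [b"/EmbeddedFiles", b"/Filespec"],
    "form fields": [b"/AcroForm"],
}


def scan(raw: bytes) -> dict[str, bool]:
    """Report which checklist items have a matching key in the raw bytes."""
    result = {}
    for item, tokens in MARKERS.items():
        result[item] = any(
            # The lookahead stops /Thumb from matching a longer name
            # such as /Thumbnail.
            re.search(re.escape(token) + rb"(?![A-Za-z0-9])", raw)
            for token in tokens
        )
    return result


# Hand-written fragment: a catalog with an outline and a metadata stream,
# and a page carrying annotations.
SAMPLE = b"""
1 0 obj
<< /Type /Catalog /Pages 2 0 R /Outlines 6 0 R /Metadata 9 0 R >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R /Contents 4 0 R /Annots [10 0 R] >>
endobj
"""

for item, present in scan(SAMPLE).items():
    print(f"{item}: {'review needed' if present else 'no marker found'}")

# For a real file, read its bytes first (the filename is hypothetical):
# from pathlib import Path
# scan(Path("contract.pdf").read_bytes())
```

A "no marker found" result only means the key was not visible in the uncompressed bytes. Opening each panel in a viewer, as the checklist describes, remains the authoritative check.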

The Uncomfortable Truth About PDFs

The PDF format was designed to be a faithful digital representation of a printed page. Over 30 years, it accumulated features — interactivity, multimedia, forms, JavaScript, 3D content, embedded files — that have nothing to do with that original mission. Each feature is a potential information channel. Each channel is a potential leak.

The team responsible for this contract knew how to redact text properly — a skill most organizations still get wrong. But knowing how to redact text is not the same as knowing everything a PDF contains. This contract was not exposed by a spy or a whistleblower. It was exposed by a table of contents. In document security, the features you overlook are the ones that create exposure.

See Everything Hidden in Your PDFs

PDFb2's Metadata tool reveals all the hidden information in your PDF files — author names, creation software, timestamps, keywords, and more. Inspect and strip it all, entirely in your browser.
