The PDF tool market is worth several billion dollars. The free tiers of the leading services are not paid for by the people using them: they are funded by advertising, data licensing, and conversion funnels. Your file becomes the product.
This isn't tinfoil-hat paranoia. It's the straightforward business model documented in their own privacy policies. This article walks through what happens when you drag a PDF onto a typical free online tool, what gets logged, what gets shared, and what you can do about it.
What Happens When You Upload a PDF
The journey of a typical upload to a major "free" PDF tool goes something like this:
- You drag a file onto the upload area.
- The browser sends the file to the tool's server (often via S3 multipart upload).
- The server logs: your IP address, browser fingerprint, file size, file hash, filename, and inferred device type.
- The file is queued for processing on a worker. If the tool is a thin wrapper around an open-source library, the open-source binary processes the file and returns the result.
- The processed file is written to a download bucket. You get a temporary URL.
- The "delete after 2 hours" policy is enforced — usually. Sometimes it's lifecycle policies on the bucket, sometimes it's a scheduled job, sometimes it's "best effort". The original file is what's deleted, not necessarily the logs, the hashes, the file metadata, or the analytics events.
- Trackers fire: Google Analytics, Facebook Pixel, LinkedIn Insight, sometimes specialist ad networks. They get your IP, screen size, referrer, and any user IDs the site has assigned.
That's the BEST case. In the WORST case, files are retained indefinitely, used for ML training, or sold in aggregate to data brokers.
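To make the logging step concrete, here is a minimal Python sketch of the kind of record a server could write at upload time. The function name, field names, and sample values are all hypothetical and not taken from any vendor's code. The only point is that the record outlives the file it describes.

```python
import hashlib
import os
import tempfile


def build_upload_record(filename: str, data: bytes, ip: str, user_agent: str) -> dict:
    """Build a hypothetical request-level log record for one upload."""
    return {
        "ip": ip,
        "user_agent": user_agent,
        "filename": filename,
        "size_bytes": len(data),
        # A content hash can identify the same document across uploads.
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Simulate an upload: store the file, log it, then apply the deletion policy.
payload = b"%PDF-1.7 example contents"
fd, stored_path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, "wb") as handle:
    handle.write(payload)

access_log = [
    build_upload_record("tax-return-2023.pdf", payload, "203.0.113.7", "Mozilla/5.0")
]

os.remove(stored_path)  # the uploaded file is gone
print(os.path.exists(stored_path))
print(access_log[0]["filename"], access_log[0]["sha256"][:12])  # the record remains
```

Even in this toy version, the filename and hash alone say a fair amount about the document after the file has been removed.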
What's In Their Privacy Policies (You Should Read Them)
Typical language from major PDF tool privacy policies (paraphrased for length):
- "We retain content for as long as necessary to provide our services." Translation: indefinitely, at our discretion.
- "We may use your content to improve our services." Translation: training data.
- "We share data with third-party providers." Translation: AWS, GCP, Cloudflare, plus ad networks.
- "We may retain logs and metadata." Translation: even after we 'delete' your file, we still know you used the tool, what kind of document it was, and how often.
None of this is illegal. Most of it is in the privacy policy you clicked "Accept" on without reading. But it adds up to a meaningful loss of privacy that most users never notice.
The Cookies and Pixels
Visit a major PDF tool homepage. Open browser DevTools → Network, leave the type filter on "All" (the "Doc" filter shows only the HTML page itself), and reload. Among the requests to domains other than the tool's own, you will typically see:
- google-analytics.com/collect: page-view + event analytics.
- googletagmanager.com: orchestrates other tags.
- doubleclick.net: Google's ad network.
- facebook.com/tr/: Facebook conversion pixel.
- linkedin.com/li.lms-analytics: LinkedIn Insight tag.
- hotjar.com or fullstory.com: session replay (yes, they record what you click).
- intercom.io: chat widget that captures your interactions.
By the time you've uploaded a file, 5-10 third parties have your IP, browser fingerprint, and a signal that you were doing something with PDFs.
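If you would rather count than eyeball, the Network panel can export a session as a HAR file, which is plain JSON. The sketch below tallies requests per third-party hostname. The sample data and the pdf-tool.example domain are made up for illustration; a real run would load your own exported file instead.

```python
from collections import Counter
from urllib.parse import urlsplit


def third_party_hosts(har: dict, first_party: str) -> Counter:
    """Count requests per hostname that is not the site's own domain."""
    counts: Counter = Counter()
    for entry in har["log"]["entries"]:
        host = urlsplit(entry["request"]["url"]).hostname or ""
        is_first_party = host == first_party or host.endswith("." + first_party)
        if host and not is_first_party:
            counts[host] += 1
    return counts


# Tiny HAR-shaped sample. With a real export you would parse the file with
# the json module and pass the resulting dict in instead.
sample_har = {"log": {"entries": [
    {"request": {"url": "https://pdf-tool.example/upload"}},
    {"request": {"url": "https://cdn.pdf-tool.example/app.js"}},
    {"request": {"url": "https://www.google-analytics.com/collect?v=2"}},
    {"request": {"url": "https://www.googletagmanager.com/gtm.js"}},
    {"request": {"url": "https://www.google-analytics.com/collect?v=2"}},
]}}

for host, count in third_party_hosts(sample_har, "pdf-tool.example").most_common():
    print(host, count)
```

Not every third-party host is a tracker (CDNs and font services show up too), so treat the output as a list to investigate rather than a verdict.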
The "Open Source" Test
A simple test for whether a tool actually does what it claims: is the source code public?
- If yes, you can audit what happens to your file.
- If no, you have to take their word for it.
The major PDF tool vendors (iLovePDF, Smallpdf, PDF24, Sejda, Adobe Acrobat Online) are all closed source. The open-source alternatives include:
- PrivaTools — MIT-licensed full-stack, both online and self-hosted.
- Stirling-PDF — Java/Spring; self-host only.
- Mozilla pdf.js — viewer only.
- qpdf / pdftk — command line.
What Privacy-Respecting Tools Look Like
A genuinely privacy-respecting PDF tool has these properties:
- Open source. You can read the code.
- No account required. No identity to log against.
- Minimal logging. Aggregate metrics, not request-level identifiable logs.
- Aggressive deletion. Files removed immediately after response, not "after 2 hours".
- Browser-side processing where possible. Tools that don't need a server should run in WebAssembly.
- No third-party trackers. Or, if any, anonymized analytics with explicit disclosure.
- Self-host option. So you can run the tools on your own infrastructure if you don't want to trust ANY hosted service.
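The "minimal logging" property is easier to judge with an example. Here is a rough Python sketch of aggregate metrics: a counter per day and per tool, with nothing that ties an event to a person. The names usage_counts and record_usage are illustrative only and do not come from any particular project.

```python
from collections import Counter
from datetime import date

# Aggregate counters keyed by (day, tool). No IP, filename, hash, or user ID is kept.
usage_counts: Counter = Counter()


def record_usage(tool: str, day: date) -> None:
    """Increment a per-day counter for a tool."""
    usage_counts[(day.isoformat(), tool)] += 1


record_usage("merge", date(2024, 5, 1))
record_usage("merge", date(2024, 5, 1))
record_usage("compress", date(2024, 5, 1))

for (day, tool), count in sorted(usage_counts.items()):
    print(day, tool, count)
```

The operator still learns which tools are popular, which is usually all the product decision needs, while an individual request leaves no row behind.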
How PrivaTools Handles It
For full transparency, here's exactly what happens when you use PrivaTools:
- Files are processed inside an isolated Docker container. The container has no network egress; it can't phone home.
- Files are deleted immediately after the HTTP response. No "2 hours" retention. The cleanup is a background task that fires within seconds.
- No account, ever. The site has no login mechanism.
- Only anonymous Google Analytics 4 page-view telemetry. No identifiable events; IP anonymization is on; blockable by any standard extension. We're considering removing GA4 entirely.
- No third-party ad pixels, no remarketing, no session replay.
- Source code is MIT-licensed at github.com/taiyeba-dg/privatools. Audit it yourself.
- Browser-side tools run entirely in your browser. Files never reach our servers for tools like Summarize, Smart Redact, JWT Decoder, Regex Tester, Password Generator, and more.
- Self-hostable. Run docker compose up --build and you're running your own instance.
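For readers wondering what "deleted immediately" can look like in code, here is one simple pattern, sketched in Python. It is not the PrivaTools implementation (which, as described above, uses a background cleanup task). It is a simpler, synchronous variant in which the scratch directory disappears before the function even returns. The function names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def process_and_discard(data: bytes, transform: Callable[[Path, Path], None]) -> bytes:
    """Run transform(input_path, output_path) in a scratch directory that is
    removed as soon as the result has been read back into memory."""
    with tempfile.TemporaryDirectory() as scratch:
        src = Path(scratch) / "input.pdf"
        dst = Path(scratch) / "output.pdf"
        src.write_bytes(data)
        transform(src, dst)
        result = dst.read_bytes()
    # Leaving the with-block deletes the directory and both files,
    # including when transform() raises an exception.
    return result


def copy_unchanged(src: Path, dst: Path) -> None:
    """Stand-in for a real PDF operation such as merge or compress."""
    dst.write_bytes(src.read_bytes())


output = process_and_discard(b"%PDF-1.7 demo", copy_unchanged)
print(len(output))
```

The trade-off is that the whole result is held in memory, which is fine for typical documents but would need rethinking for very large files.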
What You Can Do Right Now
- Use browser-side tools when possible. Look for "client-side" or "browser-only" badges.
- Install uBlock Origin. It blocks most ad pixels and analytics scripts before they fire.
- Read privacy policies. Search them for "retain", "share", "improve our services". The honest ones are short and specific.
- Self-host the tools you use most. Open-source projects make this trivial.
- Don't upload anything you wouldn't want stored. If it's truly sensitive (medical, legal, financial), use a desktop tool or a self-hosted instance.
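Reading privacy policies is tedious, so a small script can do the first pass. The Python sketch below pulls out the sentences that contain the phrases suggested above. The sentence splitting is deliberately naive, and the sample text is the paraphrased policy language from earlier in this article, not a quote from any specific vendor.

```python
import re

WATCH_PHRASES = ("retain", "share", "improve our services")


def flag_policy_sentences(policy_text: str, phrases=WATCH_PHRASES) -> list[str]:
    """Return the sentences that contain any watched phrase (case-insensitive)."""
    sentences = re.split(r"(?<=[.!?])\s+", policy_text.strip())
    return [s for s in sentences if any(p in s.lower() for p in phrases)]


sample_policy = (
    "We retain content for as long as necessary to provide our services. "
    "You can contact us by email. "
    "We may use your content to improve our services."
)

for sentence in flag_policy_sentences(sample_policy):
    print("-", sentence)
```

Paste the full policy text into sample_policy and read only what comes out. If the flagged sentences are vague about how long and with whom, that tells you most of what you need to know.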
FAQ
Are the trackers actually a problem if I'm not doing anything secret?
The trackers themselves aren't dangerous. The aggregation problem is. Every site sees a slice; advertisers and data brokers stitch them together. You don't get to see your composite profile or correct it.
Is "we delete after 2 hours" enough?
It's better than retaining indefinitely. It's worse than not uploading in the first place. Two hours is plenty of time for a misconfigured backup, a debugging engineer, or an internal log query to copy the file somewhere it won't be deleted.
What's the safest way to use online PDF tools?
From safest to least safe: (1) use a desktop tool, (2) self-host an open-source one, (3) use a browser-side tool that doesn't upload, (4) use an open-source online tool with aggressive deletion, (5) as a last resort, use a free closed-source tool, and only one whose retention promise comes with no caveats.