Guide

How Safe-Doc pseudonymizes or anonymizes documents (visually).

Safe-Doc secures external AI usage with a simple process: pseudonymize (or anonymize), analyze, then restore the original values when the mode is reversible.
A controllable flow: import → detect → replace → review → export (with optional restore).

Overview (1 minute)

Input

Single document, pasted text, or multi-document processing with Data Room.

Protection

Choose a replacement type and a protection level, N1 or N2 (N3 is on the roadmap for Q3 2026), based on risk.

Outputs

Copy-ready text + mapping (if reversible) + residual risk indicators.

1) Import: File, Paste, or Data Room

Safe-Doc screenshot: import area and mode choices

Import

Upload a file, paste text, or use Data Room to process multiple documents.

Safe-Doc screenshot: restore module with JSON mapping

Restore

Re-inject values locally using mapping_xxx.json and a tokenized AI answer.

Safe-Doc screenshot: Data Room multi-documents section

Data Room

Multi-document analysis with consistent pseudonyms across the whole batch.

Import options

  • Upload a file: drag & drop or select a document (PDF/DOCX/TXT).
  • Paste text: ideal for an email, an excerpt, or a note.
  • Data Room (multi-docs): import multiple documents at once for multi-document analysis.

Why Data Room helps

With pseudonymization, the same entity keeps the same pseudonym across every document in the batch.

Document   Original    Pseudonym
Doc 1      Carrefour   [ORG_2]
Doc 2      Carrefour   [ORG_2]

Well suited to due diligence, contract review, and multi-document case files.
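
As an illustration of how batch-wide consistency can work, here is a minimal Python sketch. It is not Safe-Doc's implementation: the function name `pseudonymize_batch`, the pre-detected `entities` dictionary, and the token numbering order are all assumptions made for this example.

```python
import re

def pseudonymize_batch(docs, entities):
    """Replace each known entity with one numbered token shared by every document.

    `entities` maps an entity string to its type label, e.g. {"Carrefour": "ORG"}.
    It stands in for the detection step, which is out of scope here (assumption).
    """
    counters = {}  # per-type running counter, e.g. {"ORG": 2}
    mapping = {}   # original value -> token, built once for the whole batch
    for value, label in entities.items():
        counters[label] = counters.get(label, 0) + 1
        mapping[value] = f"[{label}_{counters[label]}]"

    if not mapping:
        return list(docs), mapping

    # Longest values first so a longer name is not split by a shorter one.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in ordered))
    replaced = [pattern.sub(lambda m: mapping[m.group(0)], d) for d in docs]
    return replaced, mapping


docs = ["Carrefour signed the lease.", "The supplier invoiced Carrefour."]
entities = {"Acme Corp": "ORG", "Carrefour": "ORG"}  # hypothetical detections
result, mapping = pseudonymize_batch(docs, entities)
print(result)  # both documents now contain [ORG_2] in place of Carrefour
```

Because the mapping is built once for the whole batch rather than per document, the same entity cannot receive two different tokens.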

2) Choose a mode: Anonymize, Pseudonymize, or Fake data

Anonymized

Generic tokens: [PERSON], [LOCATION]

When you don't need to keep links between occurrences.

Pseudonymized

Numbered, consistent tokens: [PERSON_1], [ORG_2].

Best to preserve narrative consistency across a document set.

Fake data

Readable replacements (invented names and addresses) that keep the text natural.

Useful for review and presentation while masking real values.
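
To make the difference between the three modes concrete, the sketch below applies each one to the same sentence. Everything in it is hypothetical: `apply_mode`, the mode strings, and the tiny `FAKE_VALUES` pool are illustrative names, not Safe-Doc's API, and the entities are assumed to be already detected.

```python
import re

# Hypothetical pool of invented values; a real tool would generate these.
FAKE_VALUES = {"PERSON": ["Alex Martin", "Sam Durand"]}

def apply_mode(text, entities, mode):
    """Replace detected entities according to the chosen mode.

    `entities` is a list of (value, label) pairs, assumed to be already detected.
    """
    counters, replacements = {}, {}
    for value, label in entities:
        if value in replacements:
            continue
        counters[label] = counters.get(label, 0) + 1
        n = counters[label]
        if mode == "anonymize":
            replacements[value] = f"[{label}]"        # generic, no link kept
        elif mode == "pseudonymize":
            replacements[value] = f"[{label}_{n}]"    # numbered, consistent
        elif mode == "fake":
            pool = FAKE_VALUES[label]
            replacements[value] = pool[(n - 1) % len(pool)]  # readable stand-in
        else:
            raise ValueError(f"unknown mode: {mode}")
    if not replacements:
        return text
    ordered = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in ordered))
    return pattern.sub(lambda m: replacements[m.group(0)], text)


text = "Marie Dupont met Paul Leroy. Marie Dupont signed."
entities = [("Marie Dupont", "PERSON"), ("Paul Leroy", "PERSON")]
for mode in ("anonymize", "pseudonymize", "fake"):
    print(mode, "->", apply_mode(text, entities, mode))
```

In the anonymized output the two people become indistinguishable, while the pseudonymized and fake-data outputs still show that the first and third mentions refer to the same person.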

3) Increase protection: N1–N2, N3 (roadmap)

Principle

Higher levels reduce re-identification through context (dates, amounts, locations, writing style).

N1–N2: Standard · Advanced

Covers direct identifiers (PII) and risky context through tokenization, cleanup, and generalization of dates, amounts, locations, and references.

N3: High security

Targets stylistic fingerprints and weak signals. On the roadmap for Q3 2026.

Visual example

Before: "-17.3M – Feb 12, 2026 – Rouen"
After (N1–N2 contextual reduction; N3 on the roadmap): "mid-teen millions – Q1 2026 – North France"

Goal: keep useful meaning while reducing identifiability.
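
The kind of generalization shown in the visual example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the amount bands (chosen so that 17.3M lands in "mid-teen millions", as in the example), and the one-entry `REGIONS` lookup are all hypothetical and do not describe Safe-Doc's actual rules.

```python
from datetime import date

# Hypothetical city -> region lookup; a real system would need a full gazetteer.
REGIONS = {"Rouen": "North France"}

def generalize_date(d: date) -> str:
    """Reduce an exact date to its calendar quarter."""
    return f"Q{(d.month - 1) // 3 + 1} {d.year}"

def generalize_amount(amount: float) -> str:
    """Reduce an exact amount to a coarse magnitude band (bands are arbitrary)."""
    millions = abs(amount) / 1_000_000  # the sign is dropped on purpose
    if millions < 1:
        return "under one million"
    if millions < 10:
        return "single-digit millions"
    if millions < 14:
        return "low-teen millions"
    if millions < 18:
        return "mid-teen millions"
    if millions < 20:
        return "high-teen millions"
    return "tens of millions or more"

def generalize_location(city: str) -> str:
    """Replace a city with a broader region, or a neutral token if unknown."""
    return REGIONS.get(city, "[LOCATION]")


parts = [
    generalize_amount(-17_300_000),
    generalize_date(date(2026, 2, 12)),
    generalize_location("Rouen"),
]
print(" – ".join(parts))
```

Wider bands remove more identifying detail but also more analytical value, which is the trade-off the protection level is meant to control.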

4) Review: detected entities and residual scan

Safe-Doc screenshot: detected entities table and review

Detected entities

Review, filter and adjust what must be masked.

Safe-Doc screenshot: pseudonymized output and copy/mapping options

Copy-ready

Copy the pseudonymized result and export mapping/report if needed.

What you control

  • Detected entities: people, orgs, locations, emails, phones, IBAN, amounts…
  • User choice: uncheck what you don't want to mask.
  • Comparisons: view detections by source (e.g., model vs. regex), where the UI offers it.

Why human review matters

Automated detection can produce false positives and false negatives. Indicators (residual scan / leakage score) help assess risk, but don't replace a final review.
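
A residual scan can be pictured as a second pass over the already-masked output that looks for identifier-shaped strings the first pass missed. The sketch below is a simplified stand-in, not Safe-Doc's scanner or its leakage score: `residual_scan`, `RESIDUAL_PATTERNS`, and both regular expressions are assumptions, and the patterns are deliberately loose.

```python
import re

# Loose, illustrative patterns; real detectors are broader and validated.
RESIDUAL_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IBAN": re.compile(
        r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
    ),
}

def residual_scan(masked_text):
    """Return (label, matched_text) pairs for identifier-like strings left behind."""
    findings = []
    for label, pattern in RESIDUAL_PATTERNS.items():
        for match in pattern.finditer(masked_text):
            findings.append((label, match.group(0)))
    return findings


masked = "Contact [PERSON_1] at jane.doe@example.com about [IBAN_1]."
for label, value in residual_scan(masked):
    print(f"possible leftover {label}: {value}")
```

An empty result only means these particular patterns found nothing, which is why the scan supports the final human review rather than replacing it.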

5) Export + Restore (optional)

Primary outputs

  • Pseudonymized text (or anonymized depending on mode): ready to paste into an AI tool.
  • JSON mapping: original → token mapping (reversible mode).
  • Report: audit-friendly indicators (depending on level).

Restore an AI answer

Paste the AI answer containing tokens, then import your mapping to re-inject values locally.
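
The restore step amounts to a local token lookup. The sketch below assumes the mapping file is a flat JSON object of original → token pairs, following the description of the JSON mapping above; the real schema of mapping_xxx.json is not documented here, and `restore` is a hypothetical function name.

```python
import json
import re

# Matches numbered tokens such as [PERSON_1] or [ORG_2].
TOKEN_PATTERN = re.compile(r"\[[A-Z]+_\d+\]")

def restore(answer: str, mapping_json: str) -> str:
    """Re-inject original values into a tokenized AI answer, entirely locally.

    `mapping_json` is assumed to look like {"Carrefour": "[ORG_2]"}.
    """
    original_to_token = json.loads(mapping_json)
    token_to_original = {tok: orig for orig, tok in original_to_token.items()}
    # Tokens missing from the mapping are left untouched rather than guessed.
    return TOKEN_PATTERN.sub(
        lambda m: token_to_original.get(m.group(0), m.group(0)), answer
    )


mapping_json = '{"Carrefour": "[ORG_2]", "Marie Dupont": "[PERSON_1]"}'
answer = "[PERSON_1] should renegotiate with [ORG_2]; [ORG_9] is unknown."
print(restore(answer, mapping_json))
```

Restoration like this is only possible in a reversible mode: generic tokens such as [PERSON] carry no index, so there is nothing to look up.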

Ready to try it on a real document?

Pseudonymize or anonymize in seconds.