Formats

Enterprise-ready formats

Safe-Doc is built for the sensitive documents teams actually use: contracts, reports, legal case files, and HR and finance documents. The goal isn't "document parsing"; it's secure AI usage on these files.

What teams really share with AI

PDF and Word cover most sensitive enterprise workflows. Safe-Doc pseudonymizes or anonymizes sensitive information, reduces risky context, and produces a copy-ready version for secure AI analysis (single document or Data Room).

PDF (native text): Full support

Typical documents

  • Contracts and appendices
  • Reports, audits, memos
  • Legal case files
  • Finance documents

What Safe-Doc does for AI

  • Anonymization or pseudonymization of sensitive information
  • Metadata removal (author, tools, timestamps)
  • Context precision reduction (dates, amounts, locations)
  • Invisible layer removal
  • Clean document reconstruction
  • Audit summary (depending on level)
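
Context precision reduction can be pictured with a small sketch. The patterns, function names, and bucket sizes below are illustrative assumptions, not Safe-Doc's actual rules: ISO dates are reduced to their year and exact amounts are replaced by an order-of-magnitude range.

```python
import re

# Illustrative patterns only (assumptions): ISO dates, and amounts
# followed by the currency codes EUR or USD.
DATE_RE = re.compile(r"\b(\d{4})-\d{2}-\d{2}\b")
AMOUNT_RE = re.compile(r"\b(\d[\d,]*(?:\.\d+)?)\s?(EUR|USD)\b")


def _amount_to_range(match: re.Match) -> str:
    """Replace an exact amount with its order-of-magnitude range."""
    value = float(match.group(1).replace(",", ""))
    currency = match.group(2)
    if value < 1:
        return f"under 1 {currency}"
    # Number of integer digits gives the power of ten just below the value.
    lower = 10 ** (len(str(int(value))) - 1)
    return f"{lower:,}-{lower * 10:,} {currency}"


def reduce_context(text: str) -> str:
    """Keep only the year of each date, then bucket each amount."""
    text = DATE_RE.sub(r"\1", text)
    return AMOUNT_RE.sub(_amount_to_range, text)


print(reduce_context("Signed on 2024-03-15 for 48,250.00 EUR."))
```

The point of the sketch is the trade-off: the text stays usable for AI analysis (a year and a rough amount are still there), while the details that make a document easy to re-identify are blurred.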

Advanced mode

  • Stronger context reduction and generalization
  • Advanced reconstruction (cleaned structure)

Note: Scanned PDFs require text extraction (OCR).

DOCX (Microsoft Word): Full support

Typical documents

  • Contracts and clauses
  • Internal memos
  • HR procedures and case files
  • Legal notes and summaries

What Safe-Doc does for AI

  • Anonymization or pseudonymization of sensitive information
  • Author properties removal
  • Comments removal
  • Revision history removal
  • Internal metadata cleanup
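
To see why these items matter, note that a .docx file is a ZIP package: author properties typically live in docProps/core.xml, comments in word/comments.xml, and tracked changes appear as w:ins and w:del elements in word/document.xml. The sketch below only inspects a file for these traces. The function name and report keys are hypothetical, and it does not represent Safe-Doc's cleanup pipeline.

```python
import io
import re
import zipfile


def inspect_docx(data: bytes) -> dict:
    """Report metadata traces found in a .docx package (inspection only)."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        names = set(package.namelist())

        def read(part: str) -> str:
            # Missing parts are treated as empty rather than as errors.
            return package.read(part).decode("utf-8") if part in names else ""

        core = read("docProps/core.xml")
        document = read("word/document.xml")

    creator = re.search(r"<dc:creator>(.*?)</dc:creator>", core)
    return {
        "author": creator.group(1) if creator else None,
        "has_comments": "word/comments.xml" in names,
        # Counts opening tags of tracked insertions and deletions.
        "tracked_changes": len(re.findall(r"<w:(?:ins|del)\b", document)),
    }
```

A real cleanup has to do more than delete these parts (relationships and content types must stay consistent), which is one reason a clean rebuild is listed under Advanced mode.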

Advanced mode

  • Clean rebuild
  • Deeper structural cleanup

Limitation: Legacy .doc format is not supported at this stage.

OCR (option)

For scanned PDFs (text inside images):

  • Text extraction
  • Then pseudonymization or anonymization and context reduction
  • Same copy-ready output for secure AI analysis

Limitation: Quality depends on the original scan.

55+ entity types detected - multinational coverage

Safe-Doc detects and can pseudonymize or anonymize 55+ categories of sensitive information, a level of coverage that competing solutions rarely publish openly.

Identities

PERSON, ORGANIZATION, PSEUDO

Contact & location

EMAIL, PHONE, ADDRESS, LOCATION, REGION, POSTAL_CODE, GPS_COORDINATES

Financial

AMOUNT, IBAN, RIB, BIC, BANK_ACCOUNT, CARD_NUMBER, CARD_FRAGMENT, CVV, CARD_EXPIRY

France

SIRET, SIREN, VAT, RNA, SOCIAL_SECURITY

EU / International

PASSPORT, REISEPASS, DRIVER_LICENSE, PATENTE, LICENSE_PLATE, ID_CARD, PERSONALAUSWEIS, DNI, NIE, CODICE_FISCALE, SSN, NI_NUMBER, EIN, STEUER_ID, STEUERNUMMER, CIF_NIF, PARTITA_IVA, SEGURIDAD_SOCIAL

Digital

URL, IP, MAC_ADDRESS, API_KEY, PASSWORD

Other

UNIQUE_REFERENCE, DATE, MISC

Countries: France, Germany, Spain, Italy, UK, USA. This list is indicative: the actual scope depends on the document and on the level, N1–N2 (N3 is on the roadmap).
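
As an illustration of how two of these entity types might be handled, here is a minimal sketch covering EMAIL and IBAN only. The regular expressions, function names, and the [LABEL_n] placeholder format are assumptions made for the example, not Safe-Doc's detection engine. IBAN candidates are confirmed with the standard mod-97 checksum, and each distinct value always maps to the same placeholder, which is what keeps a pseudonymized document readable.

```python
import re

# Simplified, illustrative patterns (assumptions, not production rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b")


def iban_is_valid(candidate: str) -> bool:
    """Standard IBAN check: move 4 chars to the end, map letters, mod 97 == 1."""
    compact = candidate.replace(" ", "").upper()
    if not 15 <= len(compact) <= 34:
        return False
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def pseudonymize(text: str):
    """Replace detected values with stable placeholders; return text and mapping."""
    mapping = {}
    counters = {}

    def placeholder(label: str, value: str) -> str:
        if value not in mapping:
            counters[label] = counters.get(label, 0) + 1
            mapping[value] = f"[{label}_{counters[label]}]"
        return mapping[value]

    def replace_iban(match: re.Match) -> str:
        value = match.group(0)
        # Leave the text untouched when the checksum fails.
        return placeholder("IBAN", value) if iban_is_valid(value) else value

    text = IBAN_RE.sub(replace_iban, text)
    text = EMAIL_RE.sub(lambda m: placeholder("EMAIL", m.group(0).lower()), text)
    return text, mapping
```

Keeping the mapping separate from the output is what distinguishes pseudonymization from anonymization in this sketch: whoever holds the mapping can restore the original values, whereas discarding it makes the replacement one-way.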

Enterprise roadmap

Goal: cover real enterprise formats without compromising document security.

XLSX (Excel)

Useful for:

  • Finance exports
  • HR datasets
  • Diligence tables

Excel involves complex structures (formulas, links, macros), requiring a dedicated pipeline for robustness and control.

PPTX (PowerPoint)

Useful for:

  • Committee decks
  • Internal presentations
  • Notes and comments

CSV / Data exports

Useful for:

  • Tool exports
  • Sensitive columns and identifiers
  • Structured data generalization

Batch processing (ZIP)

Useful for:

  • Document batches
  • Consolidated reporting

Current limitations

  • Password-protected files must be unlocked before upload.
  • Embedded scripts are removed.
  • Text contained only in images requires OCR.
  • Safe-Doc does not guarantee absolute anonymity; it significantly reduces exposure and re-identification risk through pseudonymization and de-identification.