Formats

Enterprise-ready formats

SafeDoc is built for the sensitive documents teams actually use: contracts, reports, legal case files, and HR and finance documents. The goal isn’t “document parsing”; it’s secure AI usage on these files.

What teams really share with AI

PDF and Word cover most sensitive enterprise workflows. SafeDoc anonymizes, reduces risky context, and produces a copy-ready version for secure AI analysis (single doc or Data Room).

✓ PDF (native text): Full support

Typical documents

  • Contracts and appendices
  • Reports, audits, memos
  • Legal case files
  • Finance documents

What SafeDoc does for AI

  • Anonymization of sensitive information
  • Metadata removal (author, tools, timestamps)
  • Context precision reduction (dates, amounts, locations)
  • Invisible layer removal
  • Clean document reconstruction
  • Audit summary (depending on level)
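
To make "context precision reduction" concrete, here is a minimal Python sketch of the idea for dates and amounts. It is an illustration only, not SafeDoc's implementation: the function names, the quarter-level date granularity, and the order-of-magnitude amount bands are all assumptions chosen for the example.

```python
import re
from datetime import date

# Hypothetical helpers; names and granularity are assumptions for illustration.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def generalize_date(d: date) -> str:
    """Keep only the quarter and the year of a date."""
    quarter = (d.month - 1) // 3 + 1
    return f"Q{quarter} {d.year}"


def generalize_amount(value: float) -> str:
    """Replace an exact amount with its order-of-magnitude band."""
    if value < 1:
        return "under 1"
    lower = 10 ** (len(str(int(value))) - 1)
    return f"{lower:,} to {lower * 10:,}"


def reduce_dates(text: str) -> str:
    """Rewrite every valid ISO date (YYYY-MM-DD) in the text as a quarter."""
    def repl(match: re.Match) -> str:
        try:
            d = date(int(match[1]), int(match[2]), int(match[3]))
        except ValueError:
            # Not a real calendar date: leave the text untouched.
            return match[0]
        return generalize_date(d)

    return ISO_DATE.sub(repl, text)


if __name__ == "__main__":
    print(reduce_dates("Signed on 2024-11-05 in Lyon"))
    print(generalize_amount(48250))
```

The exact day and the exact figure are often what make a document re-identifiable, so coarser values keep the text useful for analysis while revealing less.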

Advanced mode

  • Stronger context reduction and generalization
  • Advanced reconstruction (cleaned structure)

Note: Scanned PDFs require text extraction (OCR).

✓ DOCX (Microsoft Word): Full support

Typical documents

  • Contracts and clauses
  • Internal memos
  • HR procedures and case files
  • Legal notes and summaries

What SafeDoc does for AI

  • Anonymization of sensitive information
  • Author properties removal
  • Comments removal
  • Revision history removal
  • Internal metadata cleanup
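
As an illustration of author properties removal, the sketch below blanks the author fields of a DOCX file using only the Python standard library. A DOCX file is a ZIP archive, and its author properties are stored in the `docProps/core.xml` part. The function names are hypothetical, and this is not SafeDoc's actual pipeline.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in docProps/core.xml; registering them keeps the
# original prefixes when the XML is serialized again.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

AUTHOR_FIELDS = ("dc:creator", "cp:lastModifiedBy")


def blank_author_properties(core_xml: bytes) -> bytes:
    """Empty the author-related elements of a core properties part."""
    root = ET.fromstring(core_xml)
    for field in AUTHOR_FIELDS:
        for element in root.findall(field, NS):
            element.text = ""
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def scrub_docx(data: bytes) -> bytes:
    """Return a copy of the DOCX archive with author properties blanked."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            payload = src.read(item.filename)
            if item.filename == "docProps/core.xml":
                payload = blank_author_properties(payload)
            # Every other part is copied unchanged.
            dst.writestr(item, payload)
    return out.getvalue()
```

This covers author properties only. Comments are stored in `word/comments.xml` and tracked changes sit inside `word/document.xml`, so removing them also means updating the parts that reference them, which is beyond this sketch.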

Advanced mode

  • Clean rebuild
  • Deeper structural cleanup

Limitation: Legacy .doc format is not supported at this stage.

⚙️ OCR (option)

For scanned PDFs (text inside images):

  • Text extraction
  • Then anonymization and context reduction
  • Same copy-ready output for secure AI analysis

Limitation: Quality depends on the original scan.

Enterprise roadmap

Goal: cover real enterprise formats without compromising document security.

📌 XLSX (Excel)

Useful for:

  • Finance exports
  • HR datasets
  • Diligence tables

Excel files contain complex structures (formulas, links, macros) that require a dedicated pipeline to process robustly and with full control.

📌 PPTX (PowerPoint)

Useful for:

  • Committee decks
  • Internal presentations
  • Notes and comments

📌 CSV / Data exports

Useful for:

  • Tool exports
  • Sensitive columns and identifiers
  • Structured data generalization

📌 Batch processing (ZIP)

Useful for:

  • Document batches
  • Consolidated reporting

Current limitations

  • Password-protected files must be unlocked before upload.
  • Embedded scripts are removed.
  • Text contained only in images requires OCR.
  • SafeDoc does not guarantee absolute anonymity; it significantly reduces exposure and re-identification risk.