Scrubbing personal data out of text
DeID-Anonymizer detects sensitive data in text and masks it, behind a FastAPI service with a job queue, object storage, and a React UI. The detection runs inline; the async pipeline is built out and ready to carry it.
What it does
The shape is a de-identification platform: point it at text, and it detects personal data and masks it by policy, with an API, background jobs, and object storage behind it.
What runs today is deliberately narrow. Detection is regex-only, covering email addresses and phone numbers, and it masks whatever it matches inline in the endpoint. The FastAPI service, the Celery queue, and the MinIO storage are wired as scaffolding for the next layer: spaCy and Presidio doing real named-entity recognition, and files moving through the queue instead of the request. The frame is built; the detector is the part that grows.
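As a rough illustration of that regex pass, here is a minimal sketch using only the standard library. The patterns, the placeholder tokens, and the name mask_text are assumptions made for the example, not the project's actual code, and the phone pattern in particular is loose enough to catch other long digit runs.

```python
import re

# Hypothetical patterns for illustration; deliberately simple and not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Emails are masked first so digits inside an address are not picked up as a phone number.
PATTERNS = [("EMAIL", EMAIL_RE), ("PHONE", PHONE_RE)]


def mask_text(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [LABEL] token and count matches per label."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS:
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


if __name__ == "__main__":
    masked, counts = mask_text("Mail ana@example.com or call +1 (555) 010-9999.")
    print(masked)  # Mail [EMAIL] or call [PHONE].
    print(counts)  # {'EMAIL': 1, 'PHONE': 1}
```

Keeping the patterns in a list of (label, pattern) pairs means a later detector, such as an NER model, could sit behind the same function signature without the endpoint changing.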
The pipeline around the detector
[Diagram: /deid/text, async jobs, object storage]
The bones
It is where I started building services, and the bones are the same ones I still reach for: a typed API, a queue, object storage, a small React front end. The detector is regex today, and the platform around it is ready for the model that replaces it.