Automatically inventory personal information
Metadata Extractors
Constellio's metadata extractors let you detect personal information contained in your files. Because they rely on regular expressions, a single extractor can recognize several formats of the same information, such as a number written with or without separators. If your Constellio server is configured with OCR, detection covers both text documents and image-based documents (e.g. scanned PDFs).
Here are example regular expressions for different types of sensitive information:
Title | Regular Expression | Detected Formats |
---|---|---|
Social Insurance Number | `\b((\d{3}[- ]\d{3}[- ]\d{3})\|(\d{9}))\b` | 999999999, 999 999 999, 999-999-999 |
Credit card | `\b((\d{4}[- ]\d{4}[- ]\d{4}[- ]\d{4})\|(\d{16}))\b` | |
Telephone number | `(\b[1-9][-\s]\|\b)([(]\d{3}[)]\|\d{3})[-\s]?\d{3}[-\s]?\d{4}\b` | |
E-mail address | `\b[_A-Za-z0-9-\+]+(\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,})\b` | |
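To check a pattern before configuring it in an extractor, the expressions from the table can be tried with Java's java.util.regex package. This is a minimal sketch: it assumes Constellio interprets the patterns the way java.util.regex does and that the patterns contain no literal spaces between groups or inside character classes. The class name, method name, and sample values are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class PersonalInfoPatterns {
    // Patterns from the examples table, written as Java string literals
    // (every regex backslash is doubled). Assumption: no literal spaces in the patterns.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("Social Insurance Number",
            Pattern.compile("\\b((\\d{3}[- ]\\d{3}[- ]\\d{3})|(\\d{9}))\\b"));
        PATTERNS.put("Credit card",
            Pattern.compile("\\b((\\d{4}[- ]\\d{4}[- ]\\d{4}[- ]\\d{4})|(\\d{16}))\\b"));
        PATTERNS.put("Telephone number",
            Pattern.compile("(\\b[1-9][-\\s]|\\b)([(]\\d{3}[)]|\\d{3})[-\\s]?\\d{3}[-\\s]?\\d{4}\\b"));
        PATTERNS.put("E-mail address",
            Pattern.compile("\\b[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})\\b"));
    }

    // Returns true if the pattern with the given title occurs anywhere in the text.
    static boolean contains(String title, String text) {
        return PATTERNS.get(title).matcher(text).find();
    }

    public static void main(String[] args) {
        String sample = "SIN 999-999-999, card 9999 9999 9999 9999, "
            + "tel 514-555-0199, mail jane.doe@example.com";
        // Print whether each kind of information was found in the sample text.
        for (Map.Entry<String, Pattern> e : PATTERNS.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().matcher(sample).find());
        }
    }
}
```

Testing like this can reveal quirks. For example, under java.util.regex the telephone pattern does not match "Tel (514) 555-0199", because `\b` cannot succeed between a space and an opening parenthesis; it does match "1 (514) 555-0199" and "514-555-0199".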
data:image/s3,"s3://crabby-images/c244d/c244d9c40e2bfdfde91ec1e40f9a279288fd07c8" alt=""
An extractor can either simply detect the presence of personal information or extract its value.
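The difference between the two modes can be illustrated outside Constellio with java.util.regex: detecting only asks whether the pattern occurs at least once, while extracting collects the matched text. This is a minimal sketch; the class and method names are hypothetical and do not describe how Constellio implements either mode internally. It reuses the Social Insurance Number pattern from the examples table, assumed to contain no literal spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DetectOrExtract {
    // SIN pattern from the examples table (assumption: no literal spaces in the pattern).
    static final Pattern SIN = Pattern.compile("\\b((\\d{3}[- ]\\d{3}[- ]\\d{3})|(\\d{9}))\\b");

    // Detection: only answers "does the text contain at least one match?"
    static boolean detect(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    // Extraction: returns every matched value, in order of appearance.
    static List<String> extract(Pattern pattern, String text) {
        List<String> values = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            values.add(m.group());
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "Employee A: 123 456 789. Employee B: 987-654-321.";
        System.out.println(detect(SIN, text));  // true
        System.out.println(extract(SIN, text)); // [123 456 789, 987-654-321]
    }
}
```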
The available parameters are:
Field | Definition | Possible Values |
---|---|---|
Metadata | The metadata on which the analysis is performed | To parse the text of a PDF, DOCX, or similar file, select the File metadata |
Regex | Regular expression that detects the targeted data | See the examples above |
Type | Determines whether the information is detected or extracted | |
Value | Determines what is written to the metadata | |
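To make the role of each parameter concrete, the sketch below models one extractor rule as a plain Java object holding Metadata, Regex, Type, and Value. Everything here is hypothetical: the class, enum, and field names are not Constellio's API, and the behavior of Value is an assumption (in this sketch, DETECT writes the configured Value when a match is found, while EXTRACT writes the first matched text). The e-mail pattern comes from the examples table, assumed to contain no literal spaces.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractorRuleSketch {
    // E-mail pattern from the examples table (assumption: no literal spaces in the pattern).
    static final Pattern EMAIL = Pattern.compile(
        "\\b[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})\\b");

    // Hypothetical stand-in for the Type parameter.
    enum Type { DETECT, EXTRACT }

    // Hypothetical model of one extractor rule; not Constellio's API.
    static final class Rule {
        final String metadata; // Metadata parameter: which metadata's text is analysed, e.g. "File"
        final Pattern regex;   // Regex parameter
        final Type type;       // Type parameter
        final String value;    // Value parameter; only used by DETECT in this sketch

        Rule(String metadata, Pattern regex, Type type, String value) {
            this.metadata = metadata;
            this.regex = regex;
            this.type = type;
            this.value = value;
        }

        // Returns what would be written to the target metadata, or empty if nothing matches.
        Optional<String> apply(String text) {
            Matcher m = regex.matcher(text);
            if (!m.find()) {
                return Optional.empty();
            }
            // Assumption: DETECT writes the configured value, EXTRACT writes the first matched text.
            return Optional.of(type == Type.DETECT ? value : m.group());
        }
    }

    public static void main(String[] args) {
        String fileText = "Contact: jane.doe@example.com";
        Rule detect = new Rule("File", EMAIL, Type.DETECT, "Contains an e-mail address");
        Rule extract = new Rule("File", EMAIL, Type.EXTRACT, null);
        System.out.println(detect.apply(fileText).orElse("-"));  // Contains an e-mail address
        System.out.println(extract.apply(fileText).orElse("-")); // jane.doe@example.com
    }
}
```

Keeping detection and extraction in one rule type mirrors the single Type parameter in the table; a DETECT rule in this sketch needs a non-null Value.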
For more information on metadata extractors, see the Metadata Extractor documentation.