Automatically inventory personal information
  • 29 Jan 2025


Metadata Extractors

Constellio's metadata extractors detect the personal information contained in your files. Because they are based on regular expressions, a single extractor can recognize several formats of the same information. If your Constellio server is configured with OCR, detection works on both text documents and image documents (e.g. scanned PDF documents).
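As a rough illustration of this kind of detection, the Python sketch below scans a document's text with several patterns and reports which kinds of personal information it contains. It is not Constellio's implementation: the `PATTERNS` dictionary and the `detect_personal_info` function are hypothetical, the text is assumed to have already been extracted from the file (or produced by OCR), and Python's `re` module is assumed to interpret the patterns the same way as Constellio's regex engine. The two patterns are the Social Insurance Number and e-mail address examples listed below.

```python
import re

# Hypothetical detectors: label -> compiled pattern. The patterns are the
# Social Insurance Number and e-mail examples from this article; Python's re
# module is assumed to read them like Constellio's regex engine.
PATTERNS = {
    "Social Insurance Number": re.compile(r"\b((\d{3}[- ]\d{3}[- ]\d{3})|(\d{9}))\b"),
    "E-mail address": re.compile(
        r"\b[_A-Za-z0-9-\+]+(\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,})\b"
    ),
}


def detect_personal_info(text: str) -> list[str]:
    """Return the label of every pattern found at least once in the text.

    `text` stands for the content extracted from a file, or the OCR output
    of a scanned document.
    """
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]


print(detect_personal_info("Employee file: SIN 123-456-789, contact jane.doe@example.com"))
# → ['Social Insurance Number', 'E-mail address']
```

Only the presence of each type is reported here; the extracted text itself is not kept.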

Here are some example regular expressions for common types of sensitive information:

Social Insurance Number
  • Regular expression: \b((\d{3}[- ]\d{3}[- ]\d{3})|(\d{9}))\b
  • Detected formats:
    • 999999999
    • 999 999 999
    • 999-999-999

Credit card
  • Regular expression: \b((\d{4}[- ]\d{4}[- ]\d{4}[- ]\d{4})|(\d{16}))\b
  • Detected formats:
    • 9999999999999999
    • 9999 9999 9999 9999
    • 9999-9999-9999-9999

Telephone number
  • Regular expression: (\b[1-9][-\s])?([(]\d{3}[)]|\b\d{3})[-\s]?\d{3}[-\s]?\d{4}\b
  • Detected formats:
    • 9 (999) 999-9999
    • (999) 999-9999
    • 999-999-9999
    • 999 999 9999
    • 9-999-999-9999
    • 9 999 999 9999

E-mail address
  • Regular expression: \b[_A-Za-z0-9-\+]+(\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,})\b
  • Detected formats:
    • XXXXXXXXXXX@XXXXX.XXX
    • XXXXXXXXXXX@XXXXX.XXX.XXX
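A pattern can be checked against the formats it is supposed to detect before it is used in an extractor. The Python sketch below does this for the four patterns from the table above, written without stray spaces and with an optional leading prefix group in the telephone pattern. It assumes Python's `re` module behaves like Constellio's regex engine for these patterns; the `PATTERNS` and `EXAMPLES` tables and the `find_mismatches` function are hypothetical. Each sample is embedded in a sentence because the `\b` word boundaries depend on the surrounding characters: for example, a number written as "(999) 999-9999" is only found if the pattern does not require a word boundary just before the opening parenthesis.

```python
import re

# Patterns from the table above. Assumption: Python's re module interprets
# them like the regex engine used by Constellio.
PATTERNS = {
    "Social Insurance Number": r"\b((\d{3}[- ]\d{3}[- ]\d{3})|(\d{9}))\b",
    "Credit card": r"\b((\d{4}[- ]\d{4}[- ]\d{4}[- ]\d{4})|(\d{16}))\b",
    "Telephone number": r"(\b[1-9][-\s])?([(]\d{3}[)]|\b\d{3})[-\s]?\d{3}[-\s]?\d{4}\b",
    "E-mail address": r"\b[_A-Za-z0-9-\+]+(\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,})\b",
}

# Hypothetical sample values, one per documented format.
EXAMPLES = {
    "Social Insurance Number": ["123456789", "123 456 789", "123-456-789"],
    "Credit card": ["4111111111111111", "4111 1111 1111 1111", "4111-1111-1111-1111"],
    "Telephone number": [
        "1 (514) 555-0123", "(514) 555-0123", "514-555-0123",
        "514 555 0123", "1-514-555-0123", "1 514 555 0123",
    ],
    "E-mail address": ["jane.doe@example.com", "jane.doe@mail.example.ca"],
}


def find_mismatches():
    """Return (label, sample, found) for each sample its pattern does not match exactly."""
    mismatches = []
    for label, samples in EXAMPLES.items():
        pattern = re.compile(PATTERNS[label])
        for sample in samples:
            # Embed the sample in a sentence, since \b depends on neighbouring characters.
            match = pattern.search(f"Contact: {sample}.")
            found = match.group(0) if match else None
            if found != sample:
                mismatches.append((label, sample, found))
    return mismatches


print(find_mismatches())  # → []
```

An empty list means every documented format was matched exactly; otherwise each tuple shows the type, the sample and what, if anything, was found instead.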



An extractor can either simply detect the presence of personal information or extract its value.

The extractor parameters and their possible values are as follows:

Metadata
  • Definition: The metadata on which the analysis is performed.
  • Possible values: To analyze the text of a PDF, DOCX, or similar file, select the File metadata.

Regex
  • Definition: The regular expression that detects the targeted data.
  • Possible values: See the examples above.

Type
  • Definition: Determines whether the information is only detected or also extracted.
  • Possible values:
    • Substitution: If the information is detected, a predefined value is written to the metadata, e.g. "Contains a Social Insurance Number".
    • Transformation: If the information is detected, the detected value is extracted from the analyzed metadata.

Value
  • Definition: Determines what is written to the metadata.
  • Possible values:
    • Substitution: Enter a preset value, such as "Contains a Social Insurance Number".
    • Transformation: Enter the position of the detected value to write, starting at $0. For example, if a credit card number is detected three times in the text, enter:
      • $0 for the first match
      • $1 for the second match
      • $2 for the third match
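The two extractor types can be modelled in a few lines of Python to make the difference concrete. This is a sketch of the behaviour described in the table above, not Constellio's code: `ExtractorConfig`, `apply_extractor` and their field names are hypothetical, Python's `re` module stands in for Constellio's regex engine, and a Transformation value such as `$1` is read as the position of the detected match, as this article describes it (starting at `$0`).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractorConfig:
    """Hypothetical model of the Regex, Type and Value parameters."""
    regex: str
    extractor_type: str  # "Substitution" or "Transformation"
    value: str           # preset text (Substitution) or "$n" position (Transformation)


def apply_extractor(text: str, config: ExtractorConfig) -> Optional[str]:
    """Return what would be written to the target metadata, or None if nothing applies."""
    matches = [m.group(0) for m in re.finditer(config.regex, text)]
    if not matches:
        return None  # nothing detected: nothing is written
    if config.extractor_type == "Substitution":
        return config.value  # write the predefined text, never the personal data
    if config.extractor_type == "Transformation":
        position = int(config.value.lstrip("$"))  # "$0" = first match, "$1" = second, ...
        return matches[position] if position < len(matches) else None
    raise ValueError(f"Unknown extractor type: {config.extractor_type}")


CARD = r"\b((\d{4}[- ]\d{4}[- ]\d{4}[- ]\d{4})|(\d{16}))\b"
sample = "Cards on file: 4111 1111 1111 1111, 5500-0000-0000-0004 and 6011000000000004."

print(apply_extractor(sample, ExtractorConfig(CARD, "Substitution", "Contains a credit card number")))
# → Contains a credit card number
print(apply_extractor(sample, ExtractorConfig(CARD, "Transformation", "$1")))
# → 5500-0000-0000-0004
```

A Substitution never copies the personal data into the metadata, which suits simply flagging documents; a Transformation writes the detected value itself.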

For more information on metadata extractors, see the Metadata Extractor article.





