Metadata extractor
  • 29 Jan 2025

1. General

The metadata extractor collects information from documents deposited in Constellio and populates metadata automatically. For example, it can populate custom metadata even when users add documents by drag and drop.

There are three ways to use the extractor:

  1. By document properties
  2. By the styles applied in the content
  3. By the recognition of regular expressions

Types of metadata

Be careful: the metadata to be populated must be of the string, text, or reference type.


2. Properties

2.1 Adding the metadata to be extracted

  1. Select the type of schema from which you want to extract the metadata (here, "Document");
  2. Select the schema; for the default schema, choose "Document";
  3. Select the specific metadata to populate;
  4. Go to "Property Analyzer";
  5. Drag and drop a template document;
  6. Click on the desired property.
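To make "document properties" concrete, here is a minimal sketch of where such properties live in a file. It assumes the template document is a Word .docx file, which stores its core properties (title, creator, keywords, and so on) in the docProps/core.xml part of a ZIP package. The helper names `read_core_properties` and `build_sample_docx` and the sample values are illustrative only; this is not Constellio's implementation.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(docx_file):
    """Return the core properties of a .docx file as a dict (tag name -> text)."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    properties = {}
    for element in root:
        name = element.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        properties[name] = element.text or ""
    return properties


def build_sample_docx():
    """Build a tiny in-memory package holding only docProps/core.xml (demo data)."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:title>Service contract</dc:title>"
        "<dc:creator>Example Author</dc:creator>"
        "<cp:keywords>contract; 2025</cp:keywords>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


props = read_core_properties(build_sample_docx())
print(props["title"])    # Service contract
print(props["creator"])  # Example Author
```

Each key of the returned dictionary corresponds to one property that could be mapped to a metadata of the string or text type.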



3. Style

3.1 Adding the metadata to be extracted

  1. Select the type of schema from which you want to extract the metadata (here, "Document");
  2. Select the schema; for the default schema, choose "Document";
  3. Select the specific metadata to populate;
  4. Go to "Property Analyzer";
  5. Drag and drop a template document;
  6. Click on the desired style.
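As an illustration of what extraction by style means, the sketch below collects the text of every paragraph that carries a given style. It assumes a Word .docx file, where each paragraph in word/document.xml may reference a style identifier. The helper names `text_by_style` and `build_sample_docx` and the sample content are hypothetical; Constellio's own analyzer may work differently.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def text_by_style(docx_file, style_id):
    """Return the text of every paragraph whose style id equals `style_id`."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    results = []
    for paragraph in root.iter(f"{{{W}}}p"):
        style = paragraph.find("w:pPr/w:pStyle", NS)
        if style is not None and style.get(f"{{{W}}}val") == style_id:
            # A paragraph's text is split across one or more <w:t> elements.
            runs = paragraph.iter(f"{{{W}}}t")
            results.append("".join(run.text or "" for run in runs))
    return results


def build_sample_docx():
    """Build a tiny in-memory package holding only word/document.xml (demo data)."""
    document_xml = (
        f'<w:document xmlns:w="{W}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Title"/></w:pPr>'
        "<w:r><w:t>Service contract</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Body text without a named style.</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", document_xml)
    buffer.seek(0)
    return buffer


print(text_by_style(build_sample_docx(), "Title"))  # ['Service contract']
```

Only the paragraph styled "Title" is returned; the unstyled paragraph is ignored, which is why applying styles consistently in the source documents matters for this method.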


4. Regular Expression

4.1 Adding the metadata to be extracted

  1. Select the type of schema from which you want to extract the metadata (here, "Document");
  2. Select the schema; for the default schema, choose "Document";
  3. Select the specific metadata to populate;
  4. Enter the regular expression:
    1. Metadata: the metadata in which the analysis is made. To scan the content of the document, choose "File".
    2. Regular expression: the pattern to detect.
    3. Type: choose the input type.
      1. Substitution: replaces the detected information with a predefined value, such as "Contains a date".
      2. Transformation: the value written in the content is the value that will end up in the metadata.
    4. Value: how the metadata will be populated.
      1. A predefined item, such as "Contains a Social Insurance Number".
      2. The position of the detected value: $0 for the first match, $1 for the second match, etc.

Tip

AI tools such as ChatGPT can help you determine the regular expression of the desired information.
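
The difference between the Substitution and Transformation input types can be sketched as follows. This is a minimal illustration in Python, not Constellio's implementation: the `extract` function, the date pattern, and the sample sentence are assumptions made for the example. Python writes group references as `\g<0>`, `\g<1>`, and so on, where `\g<0>` is the whole match; this notation is not meant to describe how Constellio numbers its `$` positions.

```python
import re

# Hypothetical pattern for illustration: a date written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract(content, pattern, input_type, value):
    """Return the value to store in the metadata, or None if nothing matches."""
    match = pattern.search(content)
    if match is None:
        return None
    if input_type == "substitution":
        # Store the predefined value, whatever text was actually detected.
        return value
    if input_type == "transformation":
        # Store text taken from the match; `value` is a template of group references.
        return match.expand(value)
    raise ValueError(f"unknown input type: {input_type}")


text = "Contract signed on 2025-01-29 in Quebec City."
print(extract(text, DATE_PATTERN, "substitution", "Contains a date"))  # Contains a date
print(extract(text, DATE_PATTERN, "transformation", r"\g<0>"))         # 2025-01-29
print(extract(text, DATE_PATTERN, "transformation", r"\g<1>"))         # 2025
```

With Substitution, every matching document receives the same label. With Transformation, each document receives the text that was actually found in its content.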



