Metadata extractor
1. General
The metadata extractor collects information from documents deposited in Constellio and populates metadata automatically. For example, it can populate custom metadata even when users add documents by drag and drop.
There are three ways to use the extractor:
- By document properties
- By the styles applied in the content
- By the recognition of regular expressions
Types of metadata
Note that the metadata you want to populate must be of the string, text, or reference type.
2. Properties
2.1 Adding the metadata to be extracted
- Choose the type of schema from which you want to extract the metadata (here, "Document");
- Choose the schema; for the default schema, choose "Document";
- Select the specific metadata to populate;
- Go to "Property Analyzer";
- Drag and drop a template document;
- Click on the desired property.
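To make "document properties" more concrete, here is a minimal sketch of what such properties look like inside a file. It assumes the template is a .docx file, where core properties such as the title and author are stored in `docProps/core.xml` inside the package. The function names and the sample values are hypothetical, and this only illustrates the idea; it is not how Constellio reads properties internally.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(docx_bytes: bytes) -> dict:
    """Return the core properties (title, creator, ...) found in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    properties = {}
    for child in root:
        # Tags look like "{namespace}title"; keep only the local name.
        name = child.tag.split("}", 1)[-1]
        properties[name] = (child.text or "").strip()
    return properties


def build_sample_package() -> bytes:
    """Hypothetical helper: build a minimal package holding only core.xml.

    This is enough for the demonstration but is not a complete Word document.
    """
    core = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<dc:title>Service contract</dc:title>"
        "<dc:creator>Jane Doe</dc:creator>"
        "</cp:coreProperties>" % (NS["cp"], NS["dc"])
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
    return buffer.getvalue()


props = read_core_properties(build_sample_package())
print(props["title"])    # Service contract
print(props["creator"])  # Jane Doe
```

Each key in the returned dictionary corresponds to one property that could be mapped to a metadata, in the same spirit as clicking a property in the analyzer.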
3. Style
3.1 Adding the metadata to be extracted
- Choose the type of schema from which you want to extract the metadata (here, "Document");
- Choose the schema; for the default schema, choose "Document";
- Select the specific metadata to populate;
- Go to "Property Analyzer";
- Drag and drop a template document;
- Click on the desired style.
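As an illustration of extraction by style, the sketch below collects the text of every paragraph that carries a given style. It assumes a .docx template, where the body is stored as `word/document.xml` and a paragraph's style appears as a `w:pStyle` element. The function name, the sample XML, and the "Title" style are assumptions made for the example; Constellio's own implementation may differ.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def text_with_style(document_xml: str, style_id: str) -> list:
    """Return the text of each paragraph whose style identifier equals style_id."""
    root = ET.fromstring(document_xml)
    results = []
    for paragraph in root.iter("{%s}p" % W):
        style = paragraph.find("{%s}pPr/{%s}pStyle" % (W, W))
        if style is not None and style.get("{%s}val" % W) == style_id:
            # A paragraph's text is split across one or more w:t elements.
            text = "".join(t.text or "" for t in paragraph.iter("{%s}t" % W))
            results.append(text)
    return results


# Hypothetical document body: one paragraph styled "Title", one unstyled.
SAMPLE = (
    '<w:document xmlns:w="%s"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="Title"/></w:pPr>'
    "<w:r><w:t>Annual report 2023</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Body text.</w:t></w:r></w:p>"
    "</w:body></w:document>" % W
)

print(text_with_style(SAMPLE, "Title"))  # ['Annual report 2023']
```

This is why a consistently styled template matters: only text that uses the selected style can be picked up.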
4. Regular Expression
4.1 Adding the metadata to be extracted
- Choose the type of schema from which you want to extract the metadata (here, "Document");
- Choose the schema; for the default schema, choose "Document";
- Select the specific metadata to populate;
- Enter the regular expression settings:
  - The metadata in which the analysis is performed. To scan the content of the document, choose "File";
  - Regular expression: the pattern to detect;
  - Type: choose the input type:
    - Substitution: replaces the detected information with a predefined value, such as "Contains a date";
    - Transformation: the value found in the content is the value that ends up in the metadata;
  - Value: determines how the metadata will be populated:
    - For a substitution, a predefined item such as "Contains a Social Insurance Number";
    - For a transformation, the position of the detected value: $0 for the first match, $1 for the second match, etc.
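The difference between the two input types can be sketched in a few lines. This is a minimal illustration, not Constellio's implementation: the `populate` function, the sample text, and the patterns are hypothetical, and the position numbering follows Python's `re` module (0 is the whole match, 1 is the first capture group), which may not correspond exactly to how Constellio numbers $0 and $1.

```python
import re


def populate(content: str, pattern: str, input_type: str, value: str):
    """Return the value to write to the metadata, or None if nothing matches."""
    match = re.search(pattern, content)
    if match is None:
        return None
    if input_type == "substitution":
        # The metadata receives the predefined value, not the detected text.
        return value
    if input_type == "transformation":
        # The value is a position such as "$0" or "$1" (assumed numbering).
        return match.group(int(value.lstrip("$")))
    raise ValueError("unknown input type: %s" % input_type)


content = "Invoice no. INV-2024-0042 issued on 2024-03-15."

# Substitution: any detected date yields the same predefined value.
print(populate(content, r"\d{4}-\d{2}-\d{2}", "substitution", "Contains a date"))
# Contains a date

# Transformation: the detected text itself is kept.
print(populate(content, r"INV-(\d{4})-(\d{4})", "transformation", "$0"))
# INV-2024-0042
print(populate(content, r"INV-(\d{4})-(\d{4})", "transformation", "$1"))
# 2024
```

Testing a pattern on a few sample sentences like this before entering it in the form helps catch expressions that match too much or too little.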
Tip
AI tools such as ChatGPT can help you write a regular expression that matches the desired information.