1. Metadata extractors
Metadata extractors let you add items to Constellio and extract their metadata automatically, using styles, regular expressions, or properties. The order of priority for populating a metadata is defined in the system configurations. If the metadata extractor defines no data for styles and regular expressions, Constellio automatically extracts the property data.
Here is the form to fill out to define styles, properties, and regular expressions for a specific metadata:
For accurate metadata, it is useful to specify extraction information for all three methods (styles, properties, and regular expressions). However, for a specific Word template that uses styles, it may also be useful to create a metadata schema specific to that template and define precisely how to retrieve each metadata with one or more Word styles or templates.
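
The fallback between the three methods can be pictured with a short sketch. This is only an illustration of the priority idea described above, not Constellio's implementation: the function `first_available`, the source names, and the dictionary layout of `doc` are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical type: a source takes a document and returns a value or None.
Source = Callable[[dict], Optional[str]]


def first_available(document: dict, priority: List[str],
                    sources: Dict[str, Source]) -> Optional[str]:
    """Return the value from the first source in `priority` that yields data."""
    for name in priority:
        value = sources[name](document)
        if value:  # skip None and empty strings
            return value
    return None


# Hypothetical sources, each reading one part of an already-parsed document.
sources: Dict[str, Source] = {
    "styles": lambda d: d.get("styles", {}).get("title"),
    "regex": lambda d: d.get("regex_matches", {}).get("title"),
    "properties": lambda d: d.get("properties", {}).get("title"),
}

# Assumed configuration: Styles -> Regular expressions -> Properties.
priority = ["styles", "regex", "properties"]

# No style or regex data here, so the property value is used.
doc = {"styles": {}, "regex_matches": {}, "properties": {"title": "Annual report"}}
print(first_available(doc, priority, sources))
```

With a document that also carries a style value for the title, the same call would return the style value, because "styles" comes first in the assumed priority list.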
2. Create a metadata extractor
- Click on "Administration" in the navigation menu;
- Click on "Metadata Extractor";
- In the "Metadata Extractor" window, click "Add";
- In the second "Metadata Extractor" window, fill in the fields needed to create the metadata extractor. You can fill in styles, properties, and regular expressions, or only the elements you need. Click on "Save".
Pane 1: Metadata selection

Field Name | Type | Description |
---|---|---|
Schema Type | Required | Select a schema type. |
Schema | Required | If multiple schemas exist for the selected schema type, choose the precise schema that contains the metadata for which the extractor is to be created. |
Metadata | Required | Select the precise metadata (e.g., title, author, description). |
Pane 2: Define styles, properties, and regular expressions

Field Name | Type | Description |
---|---|---|
Styles | Optional | Enter the name given to the style in Word. The name must be written in lowercase and without spaces (e.g., if the style is named Proper Title, you must write propertitle). It is possible to register several styles for a metadata. |
Properties | Optional | Enter the name of the property that is equivalent to the metadata. For the document and email schemas, the properties that are equivalent to metadata are already specified by default in the metadata extractor. If you add a new schema, you can rely on the ones indicated for the document. |
Regexes (regular expressions) | Optional | Allows you to define one or more regular expressions, each for a specific metadata. For each regular expression, when the target metadata matches, the extractor can be configured to use the value found or another, specified value. Each regular expression has the four sub-fields below. |
Regexes: Metadata | | The metadata in which the analysis is done. To analyze the text of a PDF, DOCX, or similar file, select the File metadata. |
Regexes: Regex | | The regular expression itself. |
Regexes: Type | | Determines whether the regular expression is used to detect the information or to extract it. |
Regexes: Value | | |
Enabled only at creation | Optional | Specifies whether the extraction is done only when the document is created, or each time it is modified. |
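
The style-name rule and the two regular expression types can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Constellio's code: the function names and the `"extract"` / `"detect"` labels are hypothetical, and the reading that "detect" returns the configured value while "extract" returns the matched text is an interpretation of the table above.

```python
import re
from typing import Optional


def normalize_style_name(name: str) -> str:
    """Lowercase a Word style name and remove all whitespace."""
    return "".join(name.split()).lower()


def apply_regex_rule(text: str, pattern: str, rule_type: str,
                     value: Optional[str] = None) -> Optional[str]:
    """Apply one hypothetical regex rule to the analyzed text.

    "extract": return the text matched by the pattern.
    "detect":  return the configured `value` when the pattern matches.
    Returns None when the pattern does not match.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    if rule_type == "extract":
        return match.group(0)
    if rule_type == "detect":
        return value
    raise ValueError(f"unknown rule type: {rule_type!r}")


print(normalize_style_name("Proper Title"))
print(apply_regex_rule("Invoice INV-2041 issued", r"INV-\d+", "extract"))
print(apply_regex_rule("CONFIDENTIAL memo", r"CONFIDENTIAL", "detect", "Restricted"))
```

Testing a pattern this way against sample text before entering it in the "Regex" field can help catch expressions that match too much or too little.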
2.1 Property Analyzer
The property analyzer lets you select a document of your choice, analyze its properties, and choose the metadata you want to extract automatically.
- Click on "Administration" in the navigation menu;
- Click on "Metadata Extractor";
- In the "Metadata Extractor" window, click "Add";
- Click on the "Properties analyzer" option;
- Select a document using the button, or drag it onto the page;
- The properties and styles found in the document are displayed. Click on the one of your choice;
- A confirmation appears, stating that the property has been added to the list;
- Close the window to return to the metadata extractor page. In this example, the "Page Count" property has been added to the properties list as "Page-Count".
- You must now fill in the other fields to determine into which existing metadata "Page Count" should be extracted.
- The metadata is now defined as an extracted metadata.
- The metadata is now extracted automatically as soon as a document is added to Constellio.
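
Conceptually, the result of the steps above is a mapping from a document property name to a target metadata. A minimal sketch of that idea follows; the function `map_properties`, the target code `pageCount`, and the sample property values are hypothetical and only illustrate the mapping, not how Constellio stores it.

```python
from typing import Dict


def map_properties(properties: Dict[str, str],
                   mapping: Dict[str, str]) -> Dict[str, str]:
    """Copy each mapped property value into its target metadata.

    `mapping` goes from property name to target metadata code.
    Properties missing from the document are skipped.
    """
    return {
        target: properties[prop]
        for prop, target in mapping.items()
        if prop in properties
    }


# Hypothetical properties reported for an analyzed document.
properties = {"Page-Count": "12", "Author": "Records team"}

# Hypothetical extractor: the "Page-Count" property feeds the "pageCount" metadata.
mapping = {"Page-Count": "pageCount"}

print(map_properties(properties, mapping))
```

Properties that are not listed in the mapping, such as "Author" here, are left out of the result.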
3. Edit a metadata extractor
- Click on "Administration" in the navigation menu;
- Click on "Metadata Extractor";
- In the "Metadata Extractor" window, click on the notebook icon to the right of the item to be modified;
- Make the changes and click "Save".
4. Delete a metadata extractor
- Click on "Administration" in the navigation menu;
- Click on "Metadata Extractor";
- In the "Metadata Extractor" window, click on the red X to the right of the item to be deleted;
- A confirmation window appears. Click on "Save".
5. Configurations
In this section, you will find all the system configurations impacting metadata extractors. To learn more about configurations, see the "Systems configurations" article.
Advanced tab

Configuration | Description | Possible values | Impacts |
---|---|---|---|
Remove the extension in the title of a document | Removes the extension (e.g., .txt, .doc) from the "Title" field of a document when the title is populated by metadata extractors (extraction by properties). | Activated, Disabled | Activated: the title on the metadata card does not include the file extension. Disabled: the title on the metadata card includes the file extension. |
Priority when populating metadata | Determines the order of priority for populating the metadata during the automatic extraction of the title when documents are imported. | Styles: for an imported Word document, the style created in the Word document takes priority. File name: the file name is used. Properties: the title defined in the properties is used. | Example: with the choice Styles -> Regular Expressions -> Properties, Constellio checks the sources in that order, where data is available. If there is no data for styles and regular expressions, Constellio automatically extracts the property data. |
Priority when populating the title | Specifies the order in which the title metadata is extracted when documents are imported. To use it, you must configure the Metadata Extractors module. | Styles: for an imported Word document, the style created in the Word document takes priority. File name: the file name is used. Properties: the title defined in the properties is used. | Example: with the choice Styles -> Regular Expressions -> Properties, Constellio checks the sources in that order, where data is available. If there is no data for styles and regular expressions, Constellio automatically extracts the property data. |
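
The effect of "Remove the extension in the title of a document" can be illustrated with a small sketch. The function name and the `remove_extension` flag are hypothetical stand-ins for the configuration. How Constellio handles file names containing several dots is not stated in this article, so the behavior shown (only the last extension is removed) is an assumption that comes from Python's `os.path.splitext`.

```python
import os


def title_from_filename(filename: str, remove_extension: bool) -> str:
    """Build a title from a file name, optionally dropping the extension."""
    if not remove_extension:
        return filename
    # splitext splits off only the last extension, e.g. ".docx".
    stem, _extension = os.path.splitext(filename)
    return stem


print(title_from_filename("minutes.docx", remove_extension=True))
print(title_from_filename("minutes.docx", remove_extension=False))
```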