Documentation of the parameter file in XML format
Summary
Introduction
An XML file is the skeleton of a data workbook. It contains the criteria to be met for verification during validation and for data insertion into the database.
The criteria includes:
- The name of the schema in which the data will be inserted (Note: the schema name is case-sensitive).
- The number of the first line of data for all sheets in the workbook.
- Sheet names in the workbook and corresponding table names in the database.
- Column names in each sheet of the workbook and their corresponding attributes in the database tables.
- For each column, you can specify:
- if the value is unique
- If the value is mandatory
- A data type (âintegerâ, ânumericâ, âdateâ, âtextâ, âenumâ)
- A format (for dates)
- A reference to another column in a worksheet or a default value.
- An interval (for numeric and date values)
Note:
- The workbook must contain at least one sheet.
- The consistency of a sheet column value can be checked with up to two reference sheets.
- All sheets in the workbook must be declared in the XML file.
- The structure can be generated using
sido-cli.jar. Seejava -jar sido-cli.jar createxml --help.
Example of a parameter file
<!DOCTYPE workbook>
<workbook schemaNameBD="foret" startLine="4">
<worksheet name="Données" tableNameBD="donnee" ignored="false">
<fieldname name="Code du référentiel"
columnNameBD="ref_taxonomique_code_ref">
<unique>yes</unique>
<missingValuesAccepted>no</missingValuesAccepted>
<fieldType>text</fieldType>
</fieldname>
<fieldname name="Nom du référentiel"
columnNameBD="ref_taxonomique_nom_ref">
<unique>no</unique>
<missingValuesAccepted>no</missingValuesAccepted>
<fieldType>text</fieldType>
</fieldname>
<fieldname name="Date" columnNameBD="donnee_date">
<unique>no</unique>
<missingValuesAccepted>y/n</missingValuesAccepted>
<fieldType>date</fieldType>
<fieldFormat>yyyy-MM-dd</fieldFormat>
<rang min="2000-02-01" max="2005-02-01"></rang>
</fieldname>
<fieldname name="Année" columnNameBD="donnee_annee">
<unique>no</unique>
<missingValuesAccepted>y/n</missingValuesAccepted>
<fieldType>integer</fieldType>
</fieldname>
<fieldname name="Version"
columnNameBD="ref_taxonomique_version">
<unique>no</unique>
<missingValuesAccepted>no</missingValuesAccepted>
<fieldType>text</fieldType>
</fieldname>
<fieldname name="Sigle de la licence d'utilisation"
columnNameBD="licence_sigle">
<unique>yes</unique>
<missingValuesAccepted>no</missingValuesAccepted>
<fieldType>enum</fieldType>
<fieldValues>
<value>LO2.0</value>
<value>OdbL</value>
</fieldValues>
</fieldname>
</worksheet>
</workbook>
Parameter file description
(*) means that the keyword is mandatory for validation of the XML parameter file.
Workbook keywords:
- workbook(*) -> Workbook
- worksheet(*) -> Worksheet
- fieldname(*) -> Column in a worksheet
- startLine(*) -> Number of first line of data (for all sheets)
- ignored(*) -> Boolean (true, false), sheet to be ignored during workbook processing
Database keywords:
- schemaNameBD(*) -> Schema name in database
- tableNameBD(*) -> Name of table in database
- columnNameBD(*) -> Database attribute name, by convention = tableNameBD_columnName
Keywords for values:
-
fieldType -> Value type {âintegerâ, ânumericâ, âdateâ, âtextâ, âenumâ}.
-
fieldFormat -> Date format (e.g.
dd/MM/yyyy). The program uses the [Java DateTimeFormatter] API (https://docs.oracle.com/en/java/javase/11/docs/api/java/time/format/DateTimeFormatter.html) for date format. -
fieldValues -> For the âenumâ type, this tag contains the list of permitted values.
-
unique -> Value uniqueness {âyesâ, ânoâ}.
-
rank -> Check that the value belongs to an interval [min - max].
-
missingValuesAccepted -> Presence of missing value {âyesâ, ânoâ, ây/nâ}. The value ây/nâ means that one and only one of the two columns must be filled in for each row of the sheet. ây/nâ can be used between two columns only and only once in a sheet.
-
refSheet -> Reference(s) to another column in a worksheet.
- value1 -> Sheet and column reference number 1 or default value.
- value2 -> Sheet and column reference number 2
How it works:
- If the format of âvalue1â looks like âSheet:Columnâ.
- then the indicated value must correspond to one of the column values in the reference sheet.
- otherwise, the indicated value must correspond to the default value stated in âvalue1â.
- If the format of âvalue2â is similar to âSheet:Columnâ.
- then the indicated value must correspond to one of the column values in the reference sheet.
Only âvalue1â can designate a default value. âvalue2â if present always refers to a column.
Exemples
ignored:
- Sheet to be skipped during binder processing:
<worksheet name="INFOS" ignored="true"/>
- Sheet to be processed:
<worksheet name="INFOS" ignored="false"/>
fieldValues:
<fieldname name="Sigle de la licence d'utilisation"
columnNameBD="licence_sigle">
<unique>yes</unique>
<missingValuesAccepted>no</missingValuesAccepted>
<fieldType>enum</fieldType>
<fieldValues>
<value>LO2.0</value>
<value>OdbL</value>
</fieldValues>
</fieldname>
In this example, the type is enum, i.e. a list of values defined in the fieldValues in this case (LO2.0, OdbL).
rang:
Data interval:
- Numeric type:
<rang min="47.8" max="48.1"/>
- Date type (format âyyyy-MM-ddâ):
<rang min="2000-02-01" max="2005-02-01"/>
missingValuesAccepted:
<fieldname name="Date" columnNameBD="donnee_date">
<unique>no</unique>
<missingValuesAccepted>y/n</missingValuesAccepted>
<fieldType>date</fieldType>
<fieldFormat>yyyy-MM-dd</fieldFormat>
</fieldname>
<fieldname name="Année" columnNameBD="donnee_annee">
<unique>no</unique>
<missingValuesAccepted>y/n</missingValuesAccepted>
<fieldType>integer</fieldType>
</fieldname>
In this example, the program detects that the two columns âDateâ and âYearâ contain ây/nâ values in the âmissingValuesAcceptedâ criterion. It checks that for each row of the sheet, there is only one value entered for the two columns.
refSheet:
Example of âvalue1â with separator â:â.
<fieldname name="Nom de l'échelle phénologique"
columnNameBD="donnee_nom_echelle_phenologique">
<unique>no</unique>
<missingValuesAccepted>no</missingValuesAccepted>
<fieldType>text</fieldType>
<refSheet>
<value1>Stades BBCH:Nom du référentiel</value1>
<value2>Autres stades:Nom de l'échelle de référence</value2>
</refSheet>
</fieldname>
In this example, the program will check that the value indicated in the âPhenological scale nameâ column is present in the âReference scale nameâ column of the âBBCH stagesâ sheet or in the âReference scale nameâ column of the âOther stagesâ sheet.
Example of âvalue1â without the separator â:â.
âvalue1â without the separator â:â means that this value is the default.
<fieldname name="Nom de l'échelle phénologique"
columnNameBD="donnee_nom_echelle_phenologique">
<unique>no</unique>
<missingValuesAccepted>no</missingValuesAccepted>
<fieldType>text</fieldType>
<refSheet>
<value1>BBCH</value1>
<value2>Autres stades:Nom de l'échelle de référence</value2>
</refSheet>
</fieldname>
In this example, the program will check that the value indicated in the âPhenological scale nameâ column is âBBCHâ or that its value is present in the âReference scale nameâ column of the âOther stagesâ sheet.