SIDO User Guide
Summary
- I. Introduction
- II. First connection
- III. Authentication
- IV. Associate with a data source
- V. Roles
- VI. Data source management
- VII. Data set management
- VIII. Workbook management
- IX. Formats
- X. Attached
I. Introduction
SIDO (Système d'Information pour les Données Orphelines) is a platform for depositing data collected by orphan data sources, which do not have an information system, in a dedicated database.
Each data source contains its own schema in the database.
Each schema is built from an XML file with parameters provided by the users associated with a data source (or sources). (XML Documentation)
SIDO has been in production since 07/2020: sido.pheno.fr
In this document and on the website, ERDG refers to the multidisciplinary Recherche Data Gouv repository, a sovereign solution for sharing and opening up research data produced by communities that do not have a recognized disciplinary repository. It is based on the Dataverse open-source software.
II. First connection
On your first connection, SIDO asks you to accept its use of cookies. If you refuse, access to SIDO is impossible. Once you have agreed to the conditions described on the "cookies" page, SIDO no longer asks for acceptance on later connections. If you had already given your agreement and the explanatory cookie window appears again, the browser's cookies or all of the machine's local data have been deleted, or you are using a different machine.
III. Authentication
After you accept cookies, SIDO presents the authentication page. It offers several identity providers (Google, INRAE, ORCID and B2Access). The first time you choose one of these providers and log in to it, an account is created automatically on your return to SIDO. This account is unique and depends on both the identifiers and the identity provider. Your name is displayed in SIDO as returned by the identity provider.
Changing identity provider creates a different account.
IV. Associate with a data source
To use SIDO, you must be associated with a data source (or several). If you are not yet associated with a data source, the home page displays a message indicating that you must contact the site administrator. This person is responsible for changing roles after verification, distributing the administered data source(s), and providing explanations of the procedure for entering data in SIDO.
The user associated with one or more data sources is then called the data provider.
Once the account is associated with one or more data sources, the SIDO home page presents a list of them. If only one data source is present, SIDO automatically redirects to its management page.
V. Roles
Each SIDO user has a role, which is restricted to one data source. This role can be:
- Data Provider: this role allows you to create datasets, add files to them, delete files from them, or delete the datasets. A Data Provider cannot remove their own access to their datasets, but can grant or revoke access for other Providers of the same data source.
- Data Manager: in addition to the rights of the Provider role, the Data Manager can modify the description of the data source, has access to all the datasets of the data source, and can grant the Data Provider role to people with an account in SIDO.
A data source has at least one Manager, and potentially several. The Manager role is granted by the SIDO administration team, and the Provider role is granted by the Managers or the SIDO administration team.
VI. Data source management
All actions related to the data source are located on the source management page, which is accessible immediately after choosing a source on the home page.
In the first part of the page, a table represents each dataset in the data source.
This table presents a summary of each dataset, and action buttons to:
- add a workbook in the dataset
- visualize the workbooks of the dataset
- manage the dataset (change of status, metadata file if present, etc.)
- display more information about the dataset
Below this table, a button lets you create a new dataset in the data source.
The second part of the page contains two fields for modifying the data source descriptions, in English and in French, which are used by TEMPO in the "Description of data sources" page.
In the third part of the page, SIDO presents the model file and the parameter file associated with the data source by the administration. They can be downloaded at any time.
VII. Data set management
Data management in SIDO is organized by dataset, a subset of the data source, which includes at least a status, a creation date and a last modification date.
These datasets are divided into 2 types:
- Datasets whose sole purpose is to be exported by webservice
- Datasets that also benefit from the additions relating to the ERDG
1/ Data set for webservices only
Datasets whose sole purpose is to be exported to TEMPO have 2 statuses:
- Draft: the dataset is editable
- Published: the dataset is no longer editable and the data is available for harvesting by webservices
The objective of a dataset can be changed from publication via webservices only to publication on the ERDG. If the dataset is published, its status is changed to Draft to allow integration of the metadata file.
2/ Data set for webservices and ERDG
The datasets that benefit from the additions relating to the ERDG have 3 statuses:
- Draft: the dataset is editable and an empty metadata sheet can be downloaded. The provider fills it in and uploads it through the dedicated field in SIDO, which checks that the sheet is correctly filled in.
- Publishable: when the user switches the dataset from Draft to Publishable, a DRAFT of the dataset is created on the ERDG (or a new DRAFT version, if a published version already exists in the ERDG), with its metadata sheet and inferred metadata. Its files are transferred to this DRAFT dataset (or to the current DRAFT version). The dataset is no longer editable. A DOI is created for the dataset on the ERDG if it did not already have one.
- Published: the dataset is no longer editable, and the dataset (or its current DRAFT version) is published on the ERDG. The dataset is available for harvesting by TEMPO. The DOI becomes public and the link is functional.
If the user changes the status from Published back to another status, the dataset regains the possibilities associated with that status. It is no longer harvestable by webservices until it returns to Published status.
A new draft version is only created when going from Draft to Publishable, so going from Published to Publishable only prevents harvesting by webservices.
Going back through the Publishable step (new reading of metadata, new upload of files into a DRAFT version) and then the Published step (publication on the ERDG) puts the new version online.
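The status rules above can be summarized in a small Java sketch. This is only an illustrative model under stated assumptions: the enum DatasetStatusSketch and its methods are hypothetical names, not classes from SIDO, and the sketch covers only the three statuses of datasets intended for the ERDG.

```java
// Hypothetical model of the three statuses described above (not SIDO's own code).
enum DatasetStatusSketch {
    DRAFT(true, false),        // editable, not harvestable
    PUBLISHABLE(false, false), // no longer editable, not yet harvestable
    PUBLISHED(false, true);    // not editable, harvestable by webservices

    private final boolean editable;
    private final boolean harvestable;

    DatasetStatusSketch(boolean editable, boolean harvestable) {
        this.editable = editable;
        this.harvestable = harvestable;
    }

    boolean isEditable() {
        return editable;
    }

    boolean isHarvestable() {
        return harvestable;
    }

    // A new DRAFT version is created on the ERDG only when going from Draft to Publishable.
    static boolean createsErdgDraft(DatasetStatusSketch from, DatasetStatusSketch to) {
        return from == DRAFT && to == PUBLISHABLE;
    }

    public static void main(String[] args) {
        for (DatasetStatusSketch status : values()) {
            System.out.println(status + ": editable=" + status.isEditable()
                    + ", harvestable=" + status.isHarvestable());
        }
        // Going from Published back to Publishable only stops harvesting.
        System.out.println(createsErdgDraft(PUBLISHED, PUBLISHABLE)); // false
        System.out.println(createsErdgDraft(DRAFT, PUBLISHABLE));     // true
    }
}
```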
A metadata file will be automatically added by SIDO when a dataset is published in ERDG. It will be used to facilitate the display of the dataset on the PNDB catalog. It uses the Ecological Metadata Language (EML), a comprehensive vocabulary and a readable XML markup syntax for documenting research data.
If the dataset has not been published on the ERDG, its objective can be changed from publication on the ERDG to publication via webservices only.
Update a dataset for ERDG
When you want to replace one or more files in the dataset, for example following a new measurement campaign, follow this procedure:
- Set the dataset to Draft status (click on the button).
- Return to the dataset home page
- Click on the button to view and manage the files
- Use the existing DOI in the DOI column
- Insert the updated file
- Delete the old file
- Set the dataset to Publishable status
- Set the dataset to Published status
VIII. Workbook management
1/ Insert a workbook
Workbooks can only be inserted into a dataset that is in "Draft" status, whether it has just been created or already exists. One or more workbooks can be inserted at a time, on the insertion page reached through the "+" next to the corresponding dataset.
For example, suppose you are associated with the fictitious data source "Forest". Create a "Pins 2017 Dordogne" dataset; it is in "Draft" status, with the objective of harvesting by webservices only.
Click on the "+" in the actions corresponding to the dataset. From here it is possible to re-download the template file as defined with the SIDO team. Let's say the workbook is already ready. Browse local folders, select the workbook and insert it (it is possible to insert several workbooks at the same time).
- Case 1: the workbook is invalid
The list of errors encountered is displayed.
The error report is downloadable with the button below.
After correction, repeat the insertion step described above.
- Case 2: the workbook is valid (it respects the parameters XML file)
A message indicating the success of the operation is displayed. It is then possible to insert new workbooks, or return to the data source management page. To view the contents of the dataset, click on the "eye" icon in the actions corresponding to the dataset: the workbook now appears there.
2/ Check inserted workbooks
To return to the data source management page, simply click on the link at the top of the page, or return to the home page and click on the data source's name again.
The list of datasets is displayed again. To consult the workbooks inserted in a dataset, click on the "eye" icon in the actions corresponding to the dataset.
A list shows the name of each workbook, who uploaded it and when.
On each workbook, it is possible to view part of the data inserted in the database, to download the workbook deposited on the server, or to delete the workbook and the data from the database with the buttons present to the right of the table.
By clicking on the "eye" icon, SIDO allows you to preview the first 20 lines of a workbook sheet.
Tooltips explaining the function of each button appear when the mouse hovers over it.
IX. Formats
Real numbers can be written with either "." or "," as the decimal separator; SIDO converts them into an understandable format if necessary. Both can be used, including in the same column and the same file.
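To illustrate the separator handling, here is a minimal Java sketch. It is an assumption about one simple way to accept both notations, not SIDO's actual code: the class DecimalSeparatorSketch and the method parseReal are hypothetical names, and the sketch assumes values carry no thousands separators.

```java
// Hypothetical helper, not taken from the SIDO code base.
final class DecimalSeparatorSketch {

    // Accepts "3.14" and "3,14" alike by turning a comma into a dot before parsing.
    // Assumption: the value contains no thousands separators.
    static double parseReal(String raw) {
        String normalized = raw.trim().replace(',', '.');
        return Double.parseDouble(normalized);
    }

    public static void main(String[] args) {
        // Both notations mixed in the same column.
        String[] column = {"3.14", "3,14", "12,5"};
        for (String cell : column) {
            System.out.println(cell + " -> " + parseReal(cell));
        }
    }
}
```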
SIDO can also read formulas and compute their result; only the result is taken into account. The library used is https://poi.apache.org/.
SIDO can read any date format, as long as it is correctly defined in the model file corresponding to the inserted data.
The format of the dates must be exactly the same in the column of data entered and in the corresponding template file. This format can take any form accepted by DateTimeFormatter, which the Verification.java class uses: https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html. The text declared in the template file field is passed as an argument to the DateTimeFormatter as is.
As a reminder, a date is declared in the .xml as follows (the tag that interests us being fieldFormat):
<fieldname name="****" columnNameBD="******">
<unique>no</unique>
<missingValuesAccepted>****</missingValuesAccepted>
<fieldType>date</fieldType>
<fieldFormat>yyyy-MM-dd</fieldFormat>
</fieldname>
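As a rough illustration of how such a pattern behaves, the following Java sketch passes the fieldFormat text to DateTimeFormatter as is and checks a cell value against it. The class DateFormatSketch and the method matches are hypothetical; the real check lives in SIDO's Verification.java, which is not reproduced here. The sketch assumes date-only patterns containing a day, a month and a year.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not the actual Verification.java from SIDO.
final class DateFormatSketch {

    // Returns true when the cell text can be parsed with the pattern declared in fieldFormat.
    static boolean matches(String cellText, String fieldFormat) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(fieldFormat);
        try {
            LocalDate.parse(cellText, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("2017-06-21", "yyyy-MM-dd")); // true
        System.out.println(matches("21/06/2017", "yyyy-MM-dd")); // false: format differs from the declaration
        System.out.println(matches("21/06/2017", "dd/MM/yyyy")); // true
    }
}
```

A date written as 21/06/2017 is therefore rejected when the template declares yyyy-MM-dd, which is why the column and the template must use exactly the same format.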
X. Attached
Languages
To change the language, click on the flag at the top right.
Contact
The contact section is in the footer.
Credits
The credits section is in the footer banner.
Legal notices and conditions of use
The legal notices and terms of use are in the footer.
Release notes
The current release notes are in the footer.