Data Structure¶
The Data Tracker is based on a few main components:
Order
Dataset
Project
User
Log
DOI
Terminology¶
Fields:
Fields in the documents for the datatype/collection
Computed fields:
Values that are either calculated or retrieved from documents in other collection(s)
Included when the entity is requested via API
Order¶
Added automatically when e.g. an order in the order portal changes to accepted
Import data from order portal
Can have any number of associated datasets
Fields¶
_id
Uuid for the order
Title
Name
Description
Description in markdown
Creator (facility name)
Creator can be set to e.g. external if a non-facility wants to add a dataset
Receiver
Email or uuid of the user who made the order
Input to add will be an email address, which is mapped to the user collection
Uuid saved: user exists
Email saved: user does not exist
Datasets
All datasets generated for the order
Extra fields
Custom fields in the style
{'key': 'key_name', 'value': 'data value'}
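The receiver mapping described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the lowercase key names (`title`, `receiver`, `extra`, ...), the helper names `resolve_receiver` and `new_order`, and the use of an in-memory list instead of a database collection are all assumptions.

```python
import uuid


def resolve_receiver(email, users):
    """Return the uuid of the user with this email, or the email itself.

    `users` is a list of user documents (dicts with `_id` and `email`);
    in a real deployment this would be a database lookup (assumption).
    """
    for user in users:
        if user["email"] == email:
            return user["_id"]  # user exists: save the uuid
    return email  # user does not exist yet: save the email


def new_order(title, description, creator, receiver_email, users):
    """Build an order document with the fields listed above."""
    return {
        "_id": str(uuid.uuid4()),
        "title": title,
        "description": description,  # markdown text
        "creator": creator,  # facility name, or e.g. "external"
        "receiver": resolve_receiver(receiver_email, users),
        "datasets": [],  # filled in as datasets are generated
        "extra": [],  # list of {'key': ..., 'value': ...} entries
    }
```

Saving the email when no user matches means the entry can be claimed later, once the receiver logs in for the first time.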
Dataset¶
Data generated by e.g. facility
One per “data delivery” from facility
Can have identifier(s) (e.g. DOIs)
Data links can be added by receiver
Receiver and creator can edit the entry (inherited from order)
Fields¶
_id
Uuid of the dataset
Title
Name
Description
Description in markdown
Links
List of links to where the dataset can be found.
List entry:
{ 'title': 'name', 'url': 'https://place', 'hashes': { 'type': 'sha256', 'files': [ { 'name': 'filename', 'hash': 'FEDCBA9...' } ] } }
title and url are mandatory for each link; hashes is optional
Extra
Custom fields in the style
{'key': 'key_name', 'value': 'data value'}
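The rule for link entries (title and url mandatory, hashes optional) could be checked with something like the sketch below. The function name `validate_link` is hypothetical, and the checks on the inner structure of `hashes` (a `type` key, and `name`/`hash` for each file) are assumptions based on the example entry, not a documented schema.

```python
def validate_link(link):
    """Return a list of problems with a link entry; an empty list means valid."""
    problems = []
    # title and url are mandatory for each link
    for key in ("title", "url"):
        if not link.get(key):
            problems.append(f"missing mandatory field: {key}")
    # hashes is optional; if present, check its shape (assumed from the example)
    hashes = link.get("hashes")
    if hashes is not None:
        if "type" not in hashes:
            problems.append("hashes present but missing 'type'")
        for entry in hashes.get("files", []):
            if "name" not in entry or "hash" not in entry:
                problems.append("each file entry needs 'name' and 'hash'")
                break
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report everything that is wrong with an entry in a single response.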
Computed fields¶
Related
All other datasets from the same order
Projects
Identifiers
Local identifier
DOIs
Creator
Inherited from order
Name of e.g. facility that generated the dataset
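As an illustration of how the related and creator computed fields might be derived from the parent order, consider the sketch below. It assumes that each order document stores the uuids of its datasets in a `datasets` list and that orders are available as an in-memory list; the function name is hypothetical.

```python
def dataset_computed_fields(dataset_id, orders):
    """Compute `related` and `creator` for a dataset from its parent order."""
    for order in orders:
        if dataset_id in order["datasets"]:
            return {
                # all other datasets from the same order
                "related": [ds for ds in order["datasets"] if ds != dataset_id],
                # inherited from the order
                "creator": order["creator"],
            }
    # no parent order found (should not happen for valid data)
    return {"related": [], "creator": None}
```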
Project¶
Created by users
Can have multiple owners
Can have identifiers
Intended as a way for a user to have a page to show off their data and be able to get an identifier (DOI)
Fields¶
_id
Uuid for the project
Title
Name
Description
Description in markdown
Contact
Contact information (email) for the project
Datasets
Datasets connected to the project
Can be added by receiver/creator of dataset
Can be removed by any user listed in owners
Publications
List of publications related to the project
Entry:
{ 'title': 'name', 'doi': 'doi-id' }
title + doi mandatory, but maybe include ability to add e.g. journal, year, etc.
DMP
Data management plan
URL
Owners
List of uuids/emails
Just like with order; email can be used if user not in db yet
Allow facilities to prepare project pages
Extra fields
Custom fields in the style
{'key': 'key_name', 'value': 'data value'}
Computed fields¶
Identifiers
Local identifier
DOIs
User¶
Everyone using the system is a user
Login via Elixir AAI
On first login, the user will be added to db
Use auth_id to recognize user
Read e.g. email from the login info
API can also be accessed using an API key
may be created by any user
“admin” can create user for facility
A user can “claim entries”
Will check all order receivers/project owners for whether the user's email is listed
Email will be replaced with user uuid
Facilities cannot log in via Elixir, but must do so via api_key
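The "claim entries" step could look roughly like the sketch below. It assumes lowercase `receiver` and `owners` keys and works on in-memory lists of documents; the name `claim_entries` and the returned count are illustrative choices, not part of the specification.

```python
def claim_entries(user, orders, projects):
    """Replace the user's email with their uuid in receivers and owners.

    Documents are modified in place. Returns the number of entries claimed.
    """
    claimed = 0
    for order in orders:
        # an order has a single receiver: either an email or a uuid
        if order.get("receiver") == user["email"]:
            order["receiver"] = user["_id"]
            claimed += 1
    for project in projects:
        owners = project.get("owners", [])
        # a project can have multiple owners, mixing emails and uuids
        if user["email"] in owners:
            project["owners"] = [
                user["_id"] if owner == user["email"] else owner for owner in owners
            ]
            claimed += 1
    return claimed
```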
Fields¶
_id
Uuid for the user
Email
Email address of the user
Auth_id
Identifier received from Elixir
Is set to --facility-- for facilities to avoid Elixir login
Api_key
Key that can be used as an alternative to login for authentication
Name
Name of the user (can be e.g. name of facility for facility accounts)
Affiliation
University/company etc
Country
The country of the user
Permissions
A list of the extra permissions the user has (see Permissions)
Log¶
Whenever an entry (order, dataset, project, or user) is changed, a log should be written
All logs are in the same collection
A function is required to show changes between different versions of an entry
Fields¶
_id
Uuid for the log
Action
Type of action (add, edit, or delete)
Data_type
The collection that was modified (order, dataset, project, or user)
Data
Add/edit: complete copy of document
Delete: empty
Timestamp
The time the action was performed
User
Uuid of the user performing the action
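A log entry and the function for showing changes between versions might be sketched as below. The lowercase key names and the helper names `make_log` and `diff_versions` are assumptions, and the diff only compares top-level fields, which is a simplification.

```python
import copy
import uuid
from datetime import datetime, timezone


def make_log(action, data_type, data, user_id):
    """Build a log document for a change to an entry."""
    if action not in ("add", "edit", "delete"):
        raise ValueError(f"unknown action: {action}")
    return {
        "_id": str(uuid.uuid4()),
        "action": action,
        "data_type": data_type,  # order, dataset, project, or user
        # add/edit: complete copy of the document; delete: empty
        "data": {} if action == "delete" else copy.deepcopy(data),
        "timestamp": datetime.now(timezone.utc),
        "user": user_id,
    }


def diff_versions(old, new):
    """Return {field: (old_value, new_value)} for top-level fields that differ."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes
```

Because every add/edit log stores a complete copy of the document, the history of an entry can be reconstructed by applying `diff_versions` to the `data` of consecutive logs.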
DOI¶
Two collections
doi_req - Requests for a DOI
doi - Accepted DOIs
Users can request a DOI for datasets and projects
Upon request, data is copied to doi_req
A reviewer will need to check the data for the request
Required fields
File hashes
If accepted, the data will be copied to doi
Each DOI document is a complete copy of the entire data structure that was accepted for the DOI
Fields (request)¶
_id
Uuid for the request
Data
A complete copy of all relevant data
A project with associated datasets will include copies of the datasets in datasets instead of only uuids
Status
Requested, Accepted, Rejected
User
User that made the request
Updates
Mini log system
{ 'timestamp': <current time>, 'new_status': 'new_status' }
Type
dataset or project
Comments
Comments from the reviewer
Computed Fields (request)¶
Other_requests
Other requests that have been made for the same entry
To allow the reviewer to see e.g. earlier comments
Fields (doi entry)¶
_id
The DOI identifier
timestamp
When the entry was created
Data
The complete entry that has been accepted
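The request workflow (status change, mini log, copy to doi on acceptance) can be illustrated with the following sketch. The lowercase keys, the function names, and the DOI value passed in by the caller are assumptions; how the DOI itself is obtained is outside the scope of this example.

```python
import copy
from datetime import datetime, timezone

STATUSES = ("Requested", "Accepted", "Rejected")


def set_status(request, new_status):
    """Change the status of a DOI request and record it in `updates`."""
    if new_status not in STATUSES:
        raise ValueError(f"unknown status: {new_status}")
    request["status"] = new_status
    request["updates"].append(
        {"timestamp": datetime.now(timezone.utc), "new_status": new_status}
    )


def accept_request(request, doi_identifier):
    """Mark a request as accepted and return the document for the doi collection."""
    set_status(request, "Accepted")
    return {
        "_id": doi_identifier,  # the DOI identifier, supplied by the caller
        "timestamp": datetime.now(timezone.utc),
        # complete copy, so later edits to the request do not alter the DOI entry
        "data": copy.deepcopy(request["data"]),
    }
```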
Other topics¶
Identifiers¶
Only uuid initially
Can request a “fancier” local identifier for dataset/project
Style similar to:
scilifelab.facility.orderxyz.dataset1
scilifelab.projects.title1
All datasets and projects can request DOI
The required fields will be checked to make sure none are empty. If none are empty, the request will be sent for evaluation by e.g. an admin
Permissions¶
“Permission classes” used to evaluate what a user may do
CREATE_ORDERS
MANAGE_USERS
EDIT_ANY_DATA
READ_OWNERS
DOI_REVIEWER
“Default groups”
Template for user, giving a specific set of permissions
Admin - “all”
Facility - “create orders” + “read ownerships”
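One possible way to represent the permission classes and default groups is sketched below. The mapping of “create orders” and “read ownerships” to CREATE_ORDERS and READ_OWNERS is an assumption based on the names, and `DEFAULT_GROUPS` and `has_permission` are hypothetical helpers.

```python
PERMISSIONS = {
    "CREATE_ORDERS",
    "MANAGE_USERS",
    "EDIT_ANY_DATA",
    "READ_OWNERS",
    "DOI_REVIEWER",
}

# Templates giving a specific set of permissions to a new user
DEFAULT_GROUPS = {
    "admin": set(PERMISSIONS),  # "all"
    "facility": {"CREATE_ORDERS", "READ_OWNERS"},
}


def has_permission(user, permission):
    """Check whether a user document lists the given permission."""
    if permission not in PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    return permission in user.get("permissions", [])
```

A facility account created from the template would then store `sorted(DEFAULT_GROUPS["facility"])` in its permissions field.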