Data Tracker¶
Data Structure¶
The Data Tracker is based on a few main components:
Order
Dataset
Project
User
Log
DOI
Terminology¶
Fields:
Fields in the documents for the datatype/collection
Computed fields:
Values that are either calculated or retrieved from documents in other collection(s)
Included when the entity is requested via API
Order¶
Added automatically when e.g. order in order portal changes to accepted
Import data from order portal
Can have any number of associated datasets
Fields¶
_id
Uuid for the order
Title
Name
Description
Description in markdown
Creator (facility name)
Creator can be set to e.g. external if a non-facility wants to add a dataset
Receiver
Email or uuid of the user who made the order
Input to add will be an email address, which is mapped to the user collection
Uuid saved: user exists
Email saved: user does not exist
Datasets
All datasets generated for the order
Extra fields
Custom fields in the style
{'key': 'key_name', 'value': 'data value'}
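The receiver mapping described above (uuid saved if the user exists, email saved otherwise) can be sketched as follows. This is a minimal illustration, not the Data Tracker implementation: `resolve_receiver` is a hypothetical helper, and the user collection is assumed to be a plain list of dicts with `_id` and `email` keys.

```python
import uuid


def resolve_receiver(email, users):
    """Return the uuid of the user with this email, or the email itself.

    `users` stands in for the user collection (assumption: a list of
    dicts with `_id` and `email` keys).
    """
    for user in users:
        if user["email"] == email:
            return user["_id"]  # user exists: save the uuid
    return email  # user does not exist: save the email


# Example data (hypothetical)
known = {"_id": uuid.uuid4(), "email": "known@example.com"}
users = [known]
```

With this data, `resolve_receiver("known@example.com", users)` gives the stored uuid, while an unknown address is returned unchanged.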
Dataset¶
Data generated by e.g. facility
One per “data delivery” from facility
Can have identifier(s) (e.g. DOIs)
Data links can be added by receiver
Receiver and creator can edit the entry (inherited from order)
Fields¶
_id
Uuid of the dataset
Title
Name
Description
Description in markdown
Links
List of links to where the dataset can be found.
List entry:
{ 'title': 'name', 'url': 'https://place', 'hashes': { 'type': 'sha256', 'files': [ { 'name': 'filename', 'hash': 'FEDCBA9...' } ] } }
title and url are mandatory for each link, hashes is optional
Extra
Custom fields in the style
{'key': 'key_name', 'value': 'data value'}
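A link entry in the style shown under Links could be built roughly like this. The sketch assumes file contents are available as bytes; `make_link` is a hypothetical helper, and the lowercase hex digest is a choice made here, not something the description above prescribes.

```python
import hashlib


def make_link(title, url, files=None):
    """Build a link entry; `files` maps file names to their bytes content."""
    if not title or not url:
        raise ValueError("title and url are mandatory for each link")
    link = {"title": title, "url": url}
    if files:  # hashes is optional
        link["hashes"] = {
            "type": "sha256",
            "files": [
                {"name": name, "hash": hashlib.sha256(content).hexdigest()}
                for name, content in files.items()
            ],
        }
    return link
```

Calling `make_link("name", "https://place")` gives an entry without `hashes`; passing a `files` mapping adds one hash per file.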
Computed fields¶
Related
All other datasets from the same order
Projects
Identifiers
Local identifier
DOIs
Creator
Inherited from order
Name of e.g. facility that generated the dataset
Project¶
Created by users
Can have multiple owners
Can have identifiers
Intended as a way for a user to have a page to show off their data and be able to get an identifier (DOI)
Fields¶
_id
Uuid for the project
Title
Name
Description
Description in markdown
Contact
Contact information (email) for the project
Datasets
Datasets connected to the project
Can be added by receiver/creator of dataset
Can be removed by any user listed in owners
Publications
List of publications related to the project
Entry:
{ 'title': 'name', 'doi': 'doi-id' }
title + doi mandatory, but maybe include ability to add e.g. journal, year etc
DMP
Data management plan
URL
Owners
List of uuids/emails
Just like with order; email can be used if user not in db yet
Allow facilities to prepare project pages
Extra fields
Custom fields in the style
{'key': 'key_name', 'value': 'data value'}
Computed fields¶
Identifiers
Local identifier
DOIs
User¶
Everyone using the system is a user
Login via Elixir AAI
On first login, the user will be added to db
Use auth_id to recognize user
Read e.g. email from the login info
API can also be accessed using an API key
An API key may be created by any user
“admin” can create user for facility
A user can “claim entries”
Will check all order receivers/project owners for whether the user's email is listed
Email will be replaced with user uuid
Facilities cannot log in via Elixir, but must do so via api_key
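The "claim entries" step could look something like the sketch below. It assumes orders and projects are available as in-memory lists of dicts with lowercase `receiver` and `owners` keys; `claim_entries` is a hypothetical name, and a real version would update the database collections instead.

```python
def claim_entries(user, orders, projects):
    """Replace the user's email with the user's uuid where it is listed."""
    email, user_id = user["email"], user["_id"]
    for order in orders:
        # An order has a single receiver (email or uuid)
        if order["receiver"] == email:
            order["receiver"] = user_id
    for project in projects:
        # A project can have multiple owners
        project["owners"] = [
            user_id if owner == email else owner for owner in project["owners"]
        ]
```

Entries that list a different email, or already hold a uuid, are left untouched.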
Fields¶
_id
Uuid for the user
Email
Email address of the user
Auth_id
Identifier received from Elixir
Is set to --facility-- for facilities to avoid Elixir login
Api_key
Key that can be used as an alternative to login for authentication
Name
Name of the user (can be e.g. name of facility for facility accounts)
Affiliation
University/company etc
Country
The country of the user
Permissions
A list of the extra permissions the user has (see Permissions)
Log¶
Whenever an entry (order, dataset, project, or user) is changed, a log should be written
All logs are in the same collection
A function is required to show changes between different versions of an entry
Fields¶
_id
Uuid for the log
Action
Type of action (add, edit, or delete)
Data_type
The collection that was modified (order, dataset, project, or user)
Data
Add/edit: complete copy of document
Delete: empty
Timestamp
The time the action was performed
User
Uuid of the user performing the action
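Since each add/edit log holds a complete copy of the document, the function for showing changes between versions could be as simple as a field-by-field comparison of two copies. The sketch below is one possible approach, not the project's implementation; `diff_versions` is a hypothetical name, and a missing field is treated the same as a field set to `None`.

```python
def diff_versions(old, new):
    """Return {field: (old_value, new_value)} for fields that differ."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes
```

For a deleted entry, where the log data is empty, comparing against `{}` lists every field of the previous version as removed.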
DOI¶
Two collections
doi_req - Requests for a DOI
doi - Accepted DOIs
Users can request a DOI for datasets and projects
Upon request, data is copied to doi_req
A reviewer will need to check the data for the request
Required fields
File hashes
If accepted, the data will be copied to doi
Each DOI document is a complete copy of the entire data structure that was accepted for the DOI
Fields (request)¶
_id
Uuid for the request
Data
A complete copy of all relevant data
A project with associated datasets will include copies of the datasets in datasets instead of only uuids
Status
Requested, Accepted, Rejected
User
User that made the request
Updates
Mini log system
{ 'timestamp': <current time>, 'new_status': 'new_status' }
Type
dataset or project
Comments
Comments from the reviewer
Computed Fields (request)¶
Other_requests
Other requests that have been made for the same entry
To allow the reviewer to see e.g. earlier comments
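The status field and the "mini log" in Updates could be kept in sync along these lines. This is a sketch under assumptions: the lowercase key names `status` and `updates`, the helper name `set_status`, and the use of a timezone-aware UTC timestamp are all choices made for illustration.

```python
import datetime

STATUSES = ("Requested", "Accepted", "Rejected")


def set_status(request, new_status):
    """Change the status of a DOI request and record it in the updates list."""
    if new_status not in STATUSES:
        raise ValueError(f"Unknown status: {new_status}")
    request["status"] = new_status
    request.setdefault("updates", []).append(
        {
            "timestamp": datetime.datetime.now(datetime.timezone.utc),
            "new_status": new_status,
        }
    )
    return request
```

Each call appends one entry, so the list preserves the order in which the statuses were set.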
Fields (doi entry)¶
_id
The DOI identifier
timestamp
When the entry was created
Data
The complete entry that has been accepted
Other topics¶
Identifiers¶
Only uuid initially
Can request a “fancier” local identifier for dataset/project
Style similar to:
scilifelab.facility.orderxyz.dataset1
scilifelab.projects.title1
All datasets and projects can request DOI
The required fields will be checked for empty values. If none are empty, the request will be sent for evaluation by e.g. admin
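The empty-field check before a DOI request is forwarded might be sketched like this. Which fields are required is not stated here, so the list is passed in by the caller; `missing_fields` is a hypothetical helper.

```python
def missing_fields(entry, required):
    """Return the required fields that are absent or empty in the entry."""
    return [field for field in required if not entry.get(field)]
```

An empty result would mean the request can be sent on for evaluation; otherwise the returned names can be reported back to the user.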
Permissions¶
“Permission classes” used to evaluate what a user may do
CREATE_ORDERS
MANAGE_USERS
EDIT_ANY_DATA
READ_OWNERS
DOI_REVIEWER
“Default groups”
Template for user, giving a specific set of permissions
Admin - “all”
Facility - “create orders” + “read ownerships”
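The permission classes and default groups above could be represented as follows. This is an illustrative sketch: mapping “create orders” and “read ownerships” to `CREATE_ORDERS` and `READ_OWNERS` is an assumption, as are the `permissions` key on the user and the helper name `has_permission`.

```python
PERMISSIONS = {
    "CREATE_ORDERS",
    "MANAGE_USERS",
    "EDIT_ANY_DATA",
    "READ_OWNERS",
    "DOI_REVIEWER",
}

# Templates giving a specific set of permissions (assumed mapping)
DEFAULT_GROUPS = {
    "Admin": set(PERMISSIONS),  # "all"
    "Facility": {"CREATE_ORDERS", "READ_OWNERS"},
}


def has_permission(user, required):
    """Check whether the user document lists the required permission."""
    return required in user.get("permissions", [])
```

A new facility account could then be created with `sorted(DEFAULT_GROUPS["Facility"])` as its permissions list.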
API¶
Base URL for the API is <url>/api. All API descriptions have the base implied before the first /.
Order¶
- /order/<identifier>
GET Get information about the order with uuid identifier.
DELETE Delete the order with uuid identifier.
PUT Update the order with uuid identifier.
- /order/add
GET Get an object describing the input fields for POST.
POST Add a new order.
- /order/<identifier>/addDataset
GET Get an object describing the input fields for POST.
POST Add a new dataset belonging to order with uuid identifier.
- /order/user
GET Get a list of orders created or received by current user or username (if provided as parameter).
Dataset¶
- /dataset/<identifier>
GET Get information about the dataset with uuid identifier.
DELETE Delete the dataset with uuid identifier.
PUT Update the dataset with uuid identifier.
- /dataset/all
GET Get a list of all datasets. Can be limited by parameters.
- /dataset/user
GET Get a list of datasets created or received by current user or username (if provided as parameter).
Project¶
- /project/<identifier>
GET Get information about the project with uuid identifier.
DELETE Delete the project with uuid identifier.
PUT Update the project with uuid identifier.
- /project/all
GET Get a list of all projects. Can be limited by parameters.
- /project/user
GET Get a list of projects created or received by current user or username (if provided as parameter).
User¶
- /user/me
GET Get information about the current user
- /user/edit
GET Update information of current user
- /user/edit/<uuid>
GET Update information of user with uuid uuid
- /user/logout
GET Log out current user
- /user/login
GET Log in user via Elixir
- /user/all
GET Get a list of all users
- /user/countries
GET Get a list of countries
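To show how the implied base combines with the endpoint paths, here is a small sketch that builds (but does not send) a request using only the standard library. `api_request` is a hypothetical helper; authentication is left out because how the API key is passed is not described here.

```python
import urllib.request


def api_request(base_url, path, method="GET"):
    """Build a request for an API endpoint; the /api base is added here."""
    url = base_url.rstrip("/") + "/api" + path
    return urllib.request.Request(url, method=method)


# Example: list all datasets on a local development instance
req = api_request("http://localhost:5000", "/dataset/all")
```

The request could then be sent with `urllib.request.urlopen(req)` against a running instance.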
System for development¶
Build and activate the containers:
docker-compose up
The system can be accessed in a web browser at localhost:5000.
Randomized test data can be generated by test/gen_test_db.py. Run it using e.g.:
PYTHONPATH=backend python3 test/gen_test_db.py
Code reference¶
app.py¶
config.py¶
Settings manager for the data tracker.
Read settings from ./config.yaml, ../config.yaml or from the provided path.
config.init(app)
Read settings and add them to the app config.
- Parameters
app – the Flask app
config.read_config(path: str = '')
Look for settings.yaml and parse the settings from there.
The file is expected to be found in the current, parent or provided folder.
- Parameters
path (str) – The yaml file to use
- Returns
The loaded settings
- Return type
dict
- Raises
FileNotFoundError – No settings file found
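The lookup order described for the settings file (provided path, else current folder, else parent folder) could be sketched as below. Only the file search is shown; parsing the YAML is left out. The file name `config.yaml` follows the module description, and `find_config` is a hypothetical name, not the actual function.

```python
from pathlib import Path


def find_config(path=""):
    """Return the first settings file that exists, or raise FileNotFoundError."""
    if path:
        candidates = [Path(path)]  # use only the provided file
    else:
        candidates = [Path("config.yaml"), Path("../config.yaml")]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError("No settings file found")
```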
dataset.py¶
developer.py¶
order.py¶
project.py¶
structure.py¶
Required fields for the different data types.
structure.dataset()
Provide a basic data structure for a dataset.
- Returns
the data structure for datasets
- Return type
dict
structure.order()
Provide a basic data structure for an order.
- Returns
the data structure for orders
- Return type
dict
structure.order_validator(data: dict)
Validate the content of the fields of an incoming order.
- Parameters
data (dict) – order to check
- Raises
ValueError – bad incoming data
user.py¶
utils.py¶
General helper functions.
utils.check_csrf_token()
Compare the csrf token from the request (header) with the one in the cookie.session.
utils.check_mongo_update(document: dict)
Make sure that some fields in a document are not changed during an update.
Also make sure indata is not empty.
- Parameters
document (dict) – received input to update a document
utils.convert_keys_to_camel(chunk)
Convert keys given in snake_case to camelCase.
The capitalization of the first letter is preserved.
- Parameters
chunk – Object to convert
- Returns
chunk converted to camelCase dict, otherwise chunk
- Return type
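A key conversion of this kind could be written roughly as follows. This is not the code from utils.py: the names `snake_to_camel` and `to_camel_keys` are hypothetical, and recursing into nested dicts and lists as well as keeping leading underscores (so that `_id` stays `_id`) are assumptions.

```python
def snake_to_camel(key):
    """Convert one snake_case key; the first letter keeps its case."""
    stripped = key.lstrip("_")
    prefix = key[: len(key) - len(stripped)]  # keep leading underscores
    first, *rest = stripped.split("_")
    return prefix + first + "".join(part[:1].upper() + part[1:] for part in rest)


def to_camel_keys(chunk):
    """Convert dict keys (recursively); other objects are returned as-is."""
    if isinstance(chunk, dict):
        return {snake_to_camel(key): to_camel_keys(value) for key, value in chunk.items()}
    if isinstance(chunk, list):
        return [to_camel_keys(item) for item in chunk]
    return chunk
```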
utils.country_list()
Provide a list of countries.
- Returns
A selection of countries.
- Return type
list
utils.gen_csrf_token() → str
Generate a csrf token.
- Returns
the csrf token
- Return type
str
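One common way to generate and compare csrf tokens with the standard library is shown below. Whether utils.py does it this way is not stated; the function names and the token length are assumptions made for this sketch.

```python
import secrets


def gen_token(nbytes=32):
    """Generate a random token as a hex string (two characters per byte)."""
    return secrets.token_hex(nbytes)


def tokens_match(request_token, session_token):
    """Compare two tokens in constant time to avoid timing leaks."""
    return secrets.compare_digest(request_token, session_token)
```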
utils.get_dataset(identifier: str)
Query for a dataset from the database.
- Parameters
identifier (str) – the uuid of the dataset
- Returns
the dataset
- Return type
dict
utils.get_db(dbserver: pymongo.mongo_client.MongoClient) → pymongo.database.Database
Get the connection to the MongoDB database.
- Parameters
dbserver – connection to the db
- Returns
the database connection
- Return type
pymongo.database.Database
utils.get_dbserver() → pymongo.mongo_client.MongoClient
Get the connection to the MongoDB database server.
- Returns
the client connection
- Return type
pymongo.mongo_client.MongoClient
utils.get_project(identifier: str)
Query for a project from the database.
- Parameters
identifier (str) – the uuid of the project
- Returns
the project
- Return type
dict
utils.is_email(indata: str)
Check whether a string seems to be an email address or not.
- Parameters
indata (str) – data to check
- Returns
is the indata an email address or not
- Return type
bool
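A loose "seems to be an email address" check could look like the sketch below. The pattern is an assumption and deliberately permissive (something, an @, something with a dot); it is enough to tell an email apart from a uuid string, not to validate addresses. `looks_like_email` is a hypothetical name.

```python
import re

# One "@", no whitespace, and at least one dot in the domain part
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(indata):
    """Return True if the string has the rough shape of an email address."""
    return bool(EMAIL_PATTERN.fullmatch(indata))
```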
utils.is_owner(dataset: str = None, project: str = None)
Check if the current user owns the given dataset or project.
If both a dataset and a project are provided, an exception will be raised.
- Parameters
dataset (str) – the dataset to check
project (str) – the project to check
- Returns
whether the current user owns the dataset/project
- Return type
bool
- Raises
ValueError – one of dataset or project must be set, and not both
utils.make_log(data_type: str, action: str, data: dict = None)
Log a change in the system.
Saves a complete copy of the new object.
It is assumed that all values are curated, e.g. that data only contains permitted fields.
- Parameters
action (str) – type of action (insert, update etc)
data_type (str) – the collection name
data (dict) – the new data for the entry
utils.make_timestamp()
Generate a timestamp of the current time.
- Returns
the current time
- Return type
datetime.datetime
utils.new_uuid() → uuid.UUID
Generate a uuid for a field in a MongoDB document.
- Returns
the new uuid in binary format
- Return type
uuid.UUID
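For orientation, helpers matching the last two signatures could be as small as the sketch below. The use of random (version 4) uuids and a timezone-aware UTC timestamp are assumptions; the actual functions may differ. The names are hypothetical.

```python
import datetime
import uuid


def new_id():
    """Generate a random uuid (assumption: version 4)."""
    return uuid.uuid4()


def timestamp_now():
    """Return the current time (assumption: timezone-aware, UTC)."""
    return datetime.datetime.now(datetime.timezone.utc)
```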