Data Structure

The Data Tracker is based on a few main components:

  • Order

  • Dataset

  • Project

  • User

  • Log

  • DOI

Terminology

  • Fields:

    • Fields in the documents for the datatype/collection

  • Computed fields:

    • Values that are either calculated or retrieved from documents in other collection(s)

    • Included when the entity is requested via API

Order

  • Added automatically, e.g. when an order in the order portal changes to accepted

    • Import data from order portal

  • Can have any number of associated datasets

Fields

  • _id

    • Uuid for the order

  • Title

    • Name

  • Description

    • Description in markdown

  • Creator (facility name)

    • Creator can be set to e.g. external if a non-facility wants to add a dataset

  • Receiver

    • Email or uuid of the user who made the order

    • Input to add will be an email address, which is mapped to the user collection

    • Uuid saved: user exists

    • Email saved: user does not exist

  • Datasets

    • All datasets generated for the order

  • Extra fields

    • Custom fields in the style {'key': 'key_name', 'value': 'data value'}
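
The receiver rule above (uuid saved if the user exists, email saved otherwise) can be sketched as follows. This is a minimal illustration using in-memory lists in place of database collections; the function name `resolve_receiver` and the lowercase key names are assumptions, not part of the specification.

```python
import uuid


def resolve_receiver(email, users):
    """Return the uuid of the user with this email, or the email if no such user exists."""
    for user in users:
        if user["email"] == email:
            return user["_id"]
    return email


# Hypothetical stand-in for the user collection.
users = [{"_id": str(uuid.uuid4()), "email": "known@example.com"}]

# Key names are assumed; the specification lists the fields but not their exact spelling.
order = {
    "_id": str(uuid.uuid4()),
    "title": "Example order",
    "description": "Description in *markdown*",
    "creator": "Example facility",
    "receiver": resolve_receiver("known@example.com", users),
    "datasets": [],
    "extra": [{"key": "key_name", "value": "data value"}],
}
```

Storing the email as a fallback is what later allows a user to claim the order once they have logged in for the first time.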

Dataset

  • Data generated by e.g. facility

  • One per “data delivery” from facility

  • Can have identifier(s) (e.g. DOIs)

  • Data links can be added by receiver

  • Receiver and creator can edit the entry (inherited from order)

Fields

  • _id

    • Uuid of the dataset

  • Title

    • Name

  • Description

    • Description in markdown

  • Links

    • List of links to where the dataset can be found.

      • List entry:

      {
        'title': 'name',
        'url': 'https://place',
        'hashes': {
          'type': 'sha256',
          'files': [
            {
              'name': 'filename',
              'hash': 'FEDCBA9...'
            }
          ]
        }
      }
      
      • title and url are mandatory for each link, hashes is optional

  • Extra

    • Custom fields in the style {'key': 'key_name', 'value': 'data value'}
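
As an illustration of the link entry rule above (title and url mandatory, hashes optional), a check could look roughly like this. The function name `validate_link` and the exact checks applied to `hashes` are assumptions.

```python
def validate_link(link):
    """Return a list of problems found in a link entry (an empty list means valid)."""
    problems = []
    # title and url are mandatory for each link.
    for field in ("title", "url"):
        if not link.get(field):
            problems.append(f"missing mandatory field: {field}")
    # hashes is optional; if present, assume it needs a type and name/hash pairs.
    hashes = link.get("hashes")
    if hashes is not None:
        if "type" not in hashes:
            problems.append("hashes is missing 'type'")
        for entry in hashes.get("files", []):
            if "name" not in entry or "hash" not in entry:
                problems.append("each file needs 'name' and 'hash'")
    return problems
```

Returning a list of problems rather than raising on the first one lets an API report everything wrong with an entry in one response.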

Computed fields

  • Related

    • All other datasets from the same order

  • Projects

  • Identifiers

    • Local identifier

    • DOIs

  • Creator

    • Inherited from order

    • Name of e.g. facility that generated the dataset
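
The Related and Creator computed fields could be derived from the owning order roughly as shown below. This is a sketch under the assumption that an order stores the uuids of its datasets in a `datasets` list; `add_computed_fields` is a hypothetical helper name.

```python
def add_computed_fields(dataset, orders):
    """Return a copy of the dataset with 'related' and 'creator' taken from its order."""
    result = dict(dataset)
    for order in orders:
        if dataset["_id"] in order["datasets"]:
            # Related: all other datasets from the same order.
            result["related"] = [ds for ds in order["datasets"] if ds != dataset["_id"]]
            # Creator: inherited from the order.
            result["creator"] = order["creator"]
            break
    return result
```

The stored document is left untouched; the computed values only exist in the copy returned to the caller.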

Project

  • Created by users

  • Can have multiple owners

  • Can have identifiers

  • Intended to give users a page for presenting their data and a way to obtain an identifier (DOI)

Fields

  • _id

    • Uuid for the project

  • Title

    • Name

  • Description

    • Description in markdown

  • Contact

    • Contact information (email) for the project

  • Datasets

    • Datasets connected to the project

    • Can be added by receiver/creator of dataset

    • Can be removed by any user listed in owners

  • Publications

    • List of publications related to the project

      • Entry:

      {
        'title': 'name',
        'doi': 'doi-id'
      }
      
      • title and doi are mandatory; the ability to add e.g. journal, year etc. may be included later

  • DMP

    • Data management plan

    • URL

  • Owners

    • List of uuids/emails

    • Just like with order; email can be used if user not in db yet

      • Allow facilities to prepare project pages

  • Extra fields

    • Custom fields in the style {'key': 'key_name', 'value': 'data value'}

Computed fields

  • Identifiers

    • Local identifier

    • DOIs

User

  • Everyone using the system is a user

  • Login via Elixir AAI

  • On first login, the user will be added to db

    • Use auth_id to recognize user

    • Read e.g. email from the login info

  • API can also be accessed using an API key

    • May be created by any user

  • “admin” can create user for facility

  • A user can “claim entries”

    • Checks all order receivers and project owners to see whether the user's email is listed

      • Email will be replaced with user uuid

  • Facilities cannot log in via Elixir, but must do so via api_key
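
The "claim entries" step described above can be sketched like this, again with plain lists standing in for the collections. The function name `claim_entries` and the key names are assumptions.

```python
def claim_entries(user, orders, projects):
    """Replace the user's email with the user's uuid in receivers and owners.

    Returns the number of entries that were claimed.
    """
    claimed = 0
    for order in orders:
        if order["receiver"] == user["email"]:
            order["receiver"] = user["_id"]
            claimed += 1
    for project in projects:
        if user["email"] in project["owners"]:
            project["owners"] = [
                user["_id"] if owner == user["email"] else owner
                for owner in project["owners"]
            ]
            claimed += 1
    return claimed
```

Entries belonging to other users, whether stored as an email or as a uuid, are left unchanged.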

Fields

  • _id

    • Uuid for the user

  • Email

    • Email address of the user

  • Auth_id

    • Identifier received from Elixir

    • Is set to --facility-- for facilities to avoid Elixir login

  • Api_key

    • Key that can be used as an alternative to login for authentication

  • Name

    • Name of the user (can be e.g. name of facility for facility accounts)

  • Affiliation

    • University/company etc

  • Country

    • The country of the user

  • Permissions

    • A list of the extra permissions the user has (see Permissions)

Log

  • Whenever an entry (order, dataset, project, or user) is changed, a log should be written

  • All logs are in the same collection

  • A function is required to show changes between different versions of an entry

Fields

  • _id

    • Uuid for the log

  • Action

    • Type of action (add, edit, or delete)

  • Data_type

    • The collection that was modified (order, dataset, project, or user)

  • Data

    • Add/edit: complete copy of document

    • Delete: empty

  • Timestamp

    • The time the action was performed

  • User

    • Uuid of the user performing the action
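
A log entry with the fields above, together with one possible way to show changes between two versions of an entry, might look like the following. This is a sketch only: `make_log` and `diff_versions` are hypothetical names, "empty" for deletions is interpreted as an empty dict, and the comparison covers top-level fields only.

```python
import copy
import uuid
from datetime import datetime, timezone


def make_log(action, data_type, data, user_id):
    """Build a log document for an add, edit, or delete action."""
    return {
        "_id": str(uuid.uuid4()),
        "action": action,
        "data_type": data_type,
        # Add/edit: complete copy of the document. Delete: empty (assumed to be {}).
        "data": {} if action == "delete" else copy.deepcopy(data),
        "timestamp": datetime.now(timezone.utc),
        "user": user_id,
    }


def diff_versions(old, new):
    """Map each changed top-level field to its (old, new) values."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Because each add/edit log holds a complete copy of the document, any two logs for the same entry can be compared directly through their `data` fields.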

DOI

  • Two collections

    • doi_req - Requests for a DOI

    • doi - Accepted DOIs

  • Users can request a DOI for datasets and projects

  • Upon request, data is copied to doi_req

  • A reviewer will need to check the data for the request

    • Required fields

    • File hashes

  • If accepted, the data will be copied to doi

  • Each DOI document is a complete copy of the entire data structure that was accepted for the DOI

Fields (request)

  • _id

    • Uuid for the request

  • Data

    • A complete copy of all relevant data

    • A project with associated datasets will include copies of the datasets in datasets instead of only uuids

  • Status

    • Requested, Accepted, Rejected

  • User

    • User that made the request

  • Updates

    • Mini log system

    {
      'timestamp': <current time>,
      'new_status': 'new_status'
    }
    
  • Type

    • dataset or project

  • Comments

    • Comments from the reviewer

Computed fields (request)

  • Other_requests

    • Other requests that have been made for the same entry

    • To allow the reviewer to see e.g. earlier comments

Fields (doi entry)

  • _id

    • The DOI identifier

  • Timestamp

    • When the entry was created

  • Data

    • The complete entry that has been accepted
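
The request and review flow for DOIs could be sketched as below, with two lists standing in for the doi_req and doi collections. The function names, the lowercase key names, and the `doi_id` parameter are assumptions; how the DOI itself is obtained is not covered by this document.

```python
import copy
import uuid
from datetime import datetime, timezone


def request_doi(entry, entry_type, user_id, doi_req):
    """Copy the entry into doi_req as a new request with status Requested."""
    request = {
        "_id": str(uuid.uuid4()),
        "data": copy.deepcopy(entry),  # complete copy, including embedded datasets
        "status": "Requested",
        "user": user_id,
        "updates": [],
        "type": entry_type,  # "dataset" or "project"
        "comments": "",
    }
    doi_req.append(request)
    return request


def review_request(request, accepted, comments, doi, doi_id=None):
    """Record the reviewer's decision; on acceptance, copy the data to doi."""
    now = datetime.now(timezone.utc)
    request["status"] = "Accepted" if accepted else "Rejected"
    request["comments"] = comments
    request["updates"].append({"timestamp": now, "new_status": request["status"]})
    if accepted:
        # doi_id is a hypothetical parameter holding the assigned DOI.
        doi.append({"_id": doi_id, "timestamp": now, "data": copy.deepcopy(request["data"])})
```

Taking a deep copy at request time means later edits to the original entry do not alter what the reviewer evaluates or what is stored with the DOI.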

Other topics

Identifiers

  • Only uuid initially

  • Can request a “fancier” local identifier for dataset/project

    • Style similar to:

      • scilifelab.facility.orderxyz.dataset1

      • scilifelab.projects.title1

  • All datasets and projects can request DOI

    • The required fields are checked first; if none of them are empty, the request is sent for evaluation by e.g. an admin
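
The check on required fields before a DOI request is forwarded might be as simple as the sketch below. Which fields are required is not stated in this document, so `REQUIRED_FIELDS` is a purely hypothetical placeholder, as is the function name.

```python
# Hypothetical list; the actual required fields are not specified here.
REQUIRED_FIELDS = ("title", "description")


def missing_required_fields(entry, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the entry."""
    return [field for field in required if not entry.get(field)]
```

A request would only be passed on for evaluation when the returned list is empty.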

Permissions

  • “Permission classes” used to evaluate what a user may do

    • CREATE_ORDERS

    • MANAGE_USERS

    • EDIT_ANY_DATA

    • READ_OWNERS

    • DOI_REVIEWER

  • “Default groups”

    • Template for user, giving a specific set of permissions

    • Admin - “all”

    • Facility - “create orders” + “read ownerships”
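
The permission classes and default groups can be illustrated with a small lookup. This is a sketch: the function names are hypothetical, and mapping “read ownerships” to READ_OWNERS is an assumption based on the list of permission classes above.

```python
PERMISSIONS = (
    "CREATE_ORDERS",
    "MANAGE_USERS",
    "EDIT_ANY_DATA",
    "READ_OWNERS",
    "DOI_REVIEWER",
)

# Default groups act as templates giving a specific set of permissions.
DEFAULT_GROUPS = {
    "admin": set(PERMISSIONS),  # "all"
    "facility": {"CREATE_ORDERS", "READ_OWNERS"},  # READ_OWNERS assumed for "read ownerships"
}


def user_from_template(name, group):
    """Create a minimal user document with the permissions of a default group."""
    return {"name": name, "permissions": sorted(DEFAULT_GROUPS[group])}


def has_permission(user, permission):
    """Check whether the user holds the given extra permission."""
    return permission in user.get("permissions", [])
```

A user created without a template simply has no extra permissions, so `has_permission` returns False for every class.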