Skip to content

participants.tsv and .json

About

The participants.json and participants.tsv files are metadata files that contain information about the participants in the study. The participant.tsv file contains the information while the participant.json file acts as a sidecar that describes the columns in the participant.tsv file.

Specifications

Requirement

This metadata is mandatory for all datasets.

Name and format

The participants.json file must be in JSON format and the participants.tsv must be in TSV format.

Content

participants.tsv

The content of the participants.tsv file must be as follows:

  • All column labels must follow the following naming convention: Only a-z characters (lowercase English alphabet) and 0-9 (numerical characters) are allowed with the exception of underscores that must be used to separate words (no white space allowed).
  • The first column must be labeled participant_id and must list the IDs used in the study for all participants in the dataset.
  • There must be one label for each datatype directory named exactly as as the corresponding directory. A boolean value (true/false) must be assigned for each participant based on if data from that datytype is included in the dataset for the participant or not.
  • If participants are organized into multiple group/cohort, a column labeled group must be included with the group name specified for each participant
  • Optionally include other columns as deemed adequate to facilitate data reuse such as age, etc.

participants.json

The participants.json is intended to document the meaning and allowable values of each column in the the participants.tsv file. Its content must be as follows:

  • There must be one key for each column label in the participants.tsv file except for the labels corresponding to the root-level datatype directories
  • All sub-keys must follow the same naming convention as the column labels in the participants.tsv file
  • A description sub-key must be included for each key to describe what the corresponding column represents
  • A data_type sub-key must be included to indicate what type of data is associated with that column (e.g., string, integer, etc.)
  • If there are a limited number of possible values for a given column, a sub-key called levels must be included to list and define the different possible values
  • If a key value is expressed in a given unit, a sub-key named unit must be included to specify the unit
  • Other sub-keys can be included as deemed necessary for understanding the content of the corresponding column.