5 Open Data
Open data should include all variables, treatment conditions, and observations described in the manuscript, and should be accompanied by a full account of the procedures used to collect, preprocess, clean, or generate them. These data should allow for the reproduction of any plots, tables, or analyses reported in the manuscript.
If data are secondary, they should be cited rather than shared directly (see Section 25). Authors should also explain how others can obtain the data.
Refer authors to Section 14.1 for tips and resources to make data open.
5.1 Resources
- Data Management and Data Sharing in Psychological Science: Revision of the DGPs Recommendations (Gollwitzer et al., 2020)
- Is It Time to Share Qualitative Research Data? (DuBois et al., 2018)
- APA Statement on data sharing
5.2 Data sharing issues
Ideally, authors share the raw data exactly as collected, together with the instructions or code needed to clean and process them into the analysed data. This may not be possible for privacy or ethical reasons. However, some solutions allow for partial sharing. For example, if individual-level or trial-level data cannot be shared, authors could provide item-level and scale-level descriptive statistics and variance-covariance matrices, both for the full data set and stratified by subgroup.
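As a sketch of the partial-sharing option above, the Python snippet below computes item-level descriptives (n, mean, SD) and a sample variance-covariance matrix, for the full data set and for each level of a grouping variable, using only the standard library. It assumes complete numeric item scores (no missing values) and at least two rows per subgroup; the function names and toy data are hypothetical, not a prescribed format.

```python
"""Sketch: summaries that can be shared when row-level data cannot."""
from statistics import mean, stdev


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator) of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def summarise(rows, items):
    """Descriptives per item plus the full variance-covariance matrix."""
    columns = {item: [row[item] for row in rows] for item in items}
    descriptives = {
        item: {"n": len(vals), "mean": mean(vals), "sd": stdev(vals)}
        for item, vals in columns.items()
    }
    # Diagonal entries are the item variances.
    cov = {a: {b: covariance(columns[a], columns[b]) for b in items} for a in items}
    return {"descriptives": descriptives, "covariance": cov}


def summarise_by_group(rows, items, group_var):
    """Summaries for the full data set and for each level of group_var."""
    by_group = {}
    for level in sorted({row[group_var] for row in rows}):
        subset = [row for row in rows if row[group_var] == level]
        by_group[level] = summarise(subset, items)
    return {"all": summarise(rows, items), "by_group": by_group}


if __name__ == "__main__":
    # Hypothetical toy data: two questionnaire items and a grouping variable.
    data = [
        {"group": "A", "item1": 3, "item2": 4},
        {"group": "A", "item1": 4, "item2": 5},
        {"group": "A", "item1": 2, "item2": 2},
        {"group": "B", "item1": 5, "item2": 4},
        {"group": "B", "item1": 4, "item2": 3},
        {"group": "B", "item1": 3, "item2": 3},
    ]
    summary = summarise_by_group(data, ["item1", "item2"], "group")
    print(summary["all"]["descriptives"])
    print(summary["by_group"]["A"]["covariance"])
```

Summaries of very small subgroups can themselves reveal individual values, so stratification may need to stop at a minimum group size.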
5.2.1 Data are not anonymizable
Data may be in a format, such as video recordings of interviews, that cannot be anonymized. Consider recommending that they be archived at a managed-access data archiving service, such as the UK Data Service or Databrary. See Appendix C for a list of data repositories.
5.2.2 Anonymity could be compromised by data triangulation
Triangulation occurs when a combination of variables can uniquely identify some subjects, for example when only one person at the university where the data were collected has a given combination of age, gender, and ethnicity. This risk can be reduced by omitting some data or redacting some levels of variables (e.g., grouping small-n groups under "Other"). An unredacted version can be stored in a managed-access archive.
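One way to screen for this risk is to count how many rows share each combination of potentially identifying variables and flag combinations shared by fewer than k rows (the idea behind k-anonymity). The Python sketch below does that and collapses rare levels of one variable into "Other". The threshold values, variable names, and toy data are assumptions for illustration; an appropriate k depends on the data and the relevant ethics approval.

```python
"""Sketch: flag and reduce triangulation risk before sharing tabular data."""
from collections import Counter


def risky_combinations(rows, quasi_identifiers, k=5):
    """Return each combination of quasi-identifier values shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


def collapse_rare_levels(rows, variable, min_count=5, other="Other"):
    """Return copies of rows in which levels of `variable` seen fewer than
    min_count times are replaced by `other`; the input rows are not modified."""
    counts = Counter(row[variable] for row in rows)
    collapsed = []
    for row in rows:
        new_row = dict(row)
        if counts[row[variable]] < min_count:
            new_row[variable] = other
        collapsed.append(new_row)
    return collapsed


if __name__ == "__main__":
    # Hypothetical toy data with two quasi-identifiers.
    rows = (
        [{"age_band": "18-25", "ethnicity": "A"}] * 6
        + [{"age_band": "18-25", "ethnicity": "B"}]
        + [{"age_band": "26-35", "ethnicity": "C"}] * 2
    )
    print(risky_combinations(rows, ["age_band", "ethnicity"], k=3))
    cleaned = collapse_rare_levels(rows, "ethnicity", min_count=3)
    # Re-check: collapsing one variable does not guarantee every combination is safe.
    print(risky_combinations(cleaned, ["age_band", "ethnicity"], k=3))
```

In the toy data, collapsing "B" and "C" into "Other" still leaves combinations shared by only one or two people, which is why the check should be re-run after each redaction step.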
5.2.3 Data are owned by others
If all or a subset of the original data are secondary and do not have a license compatible with re-sharing, the authors should detail the steps needed to access the original data from the source. Authors should be encouraged to determine whether processed data can be shared at a higher level of aggregation. For example, if it is permissible to show a scatter plot of the data in a publication, it should also be permissible to archive the plotted values in tabular form.
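For the scatter-plot case, archiving the plotted values can be as simple as writing the (x, y) pairs behind the figure to a CSV file with a header row. The Python sketch below uses the standard `csv` module; the function name, column names, and values are hypothetical.

```python
"""Sketch: archive the values plotted in a figure as a CSV table."""
import csv
import io


def write_plotted_points(points, x_name, y_name, out):
    """Write (x, y) pairs to the text stream `out` as CSV with a header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([x_name, y_name])
    writer.writerows(points)


if __name__ == "__main__":
    # Hypothetical values underlying a published scatter plot.
    points = [(0.5, 1.2), (1.0, 2.3), (1.5, 2.9)]
    buffer = io.StringIO()
    write_plotted_points(points, "dose", "response", buffer)
    print(buffer.getvalue())
    # To write a file instead, pass a handle from open(path, "w", newline="").
```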
5.3 Checklist
- Does the TOP statement have the correct link to a publicly accessible version of the data (or a statement that the data are not available or N/A)?
- Does the dataset contain the data required to produce all tables, plots, and analyses in the manuscript?
- If no, is any omission noted and explained?
- If raw data are shared, is there code or instructions for processing it?
- Is the dataset saved in an accessible format? (An order of preference for accessibility is CSV > Excel > SPSS)
- Are the variables clearly explained?
- Is there a codebook?
- Are level values understandable?
- Are units clear?
- Is the dataset being shared ethically? Is there any personally identifiable data?
- Is the dataset clearly licensed for reuse? (see Appendix B)
- Is the dataset hosted on a persistent archive? (see Appendix C)
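Some of the checklist items above (is there a codebook, are the variables explained, are level values documented) can be partly automated. The Python sketch below, assuming the data are a CSV file and the codebook is a dict of the hypothetical shape described in its docstring, lists columns missing from the codebook, entries without a description, codebook entries with no matching column, and stored values of categorical variables that the codebook does not define. Whether descriptions are understandable and units are clear still needs a human reader.

```python
"""Sketch: compare a CSV data file against a codebook."""
import csv
import io


def check_codebook(csv_text, codebook):
    """Return a list of problems found when checking the data against the codebook.

    Assumed (hypothetical) codebook shape: {variable: {"description": str,
    "unit": str (optional), "levels": {stored_value: meaning} (categorical only)}}.
    Values read from CSV are strings, so level keys must be strings too.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    rows = list(reader)
    problems = []
    for name in fieldnames:
        if name not in codebook:
            problems.append(f"undocumented variable: {name}")
        elif not codebook[name].get("description"):
            problems.append(f"no description for: {name}")
    for name, entry in codebook.items():
        if name not in fieldnames:
            problems.append(f"codebook entry has no column: {name}")
            continue
        levels = entry.get("levels")
        if levels is not None:
            seen = {row[name] for row in rows}
            for value in sorted(seen - set(levels)):
                problems.append(f"undocumented level of {name}: {value!r}")
    return problems


if __name__ == "__main__":
    data = "id,group,rt_ms\n1,1,350\n2,2,410\n3,3,390\n"
    codebook = {
        "id": {"description": "Participant identifier"},
        "group": {"description": "Condition", "levels": {"1": "control", "2": "treatment"}},
        "rt_ms": {"description": "Reaction time", "unit": "ms"},
    }
    for problem in check_codebook(data, codebook):
        print(problem)
```

With the toy data, the only problem reported is that level "3" of `group` has no documented meaning.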