Plans for sharing and reusing data are an integral part of the research data management planning process.
Many funders and journal publishers have policies that encourage, expect, or require researchers to prepare and provide their data for sharing. This is particularly true of data produced through public funding.
The OECD Declaration on Access to Research Data from Public Funding, to which Canada is a signatory, sets out reasons for sharing research data.
Rationale for sharing data:
Preparation of data for sharing begins with the creation of a data management plan during the initial stages of the research project. Researchers should familiarize themselves with the policies of their funders as part of the planning process.
Factors to consider include:
Researchers should consult the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (TCPS2) and Saint Paul University’s Office of Research and Ethics for information on contractual and ethical obligations.
The Tri-Council Policy Statement (TCPS2) stipulates that informed consent from project participants is necessary for the sharing and reuse of data containing identifiable information (TCPS2 Article 3.2 and Article 5.2). To ensure that appropriate consent is obtained, pay careful attention to the wording of the consent form: it should address the preservation, reuse, and/or sharing of data containing identifiable information, how this information will be protected, and the conditions under which the data will be shared or reused. Consent from participants is not required for the secondary use or reuse of anonymous or aggregated data; however, informing participants about the preservation, reuse, and sharing of this data is considered good ethical practice.
Conditions for Sharing
Canadian copyright legislation does not cover raw research data, although it does cover original compilations and presentations of data such as tables, graphs, and databases. The sharing of data files can be controlled and protected with licences. In many cases, researchers can decide on the level of access and the conditions of use for data they share or deposit in a repository. Some repositories also offer a built-in choice of licences on their platform.
For help with author rights questions, visit our Copyright section.
Several online licensing options can be adopted for personal use:
Conditions of use should reflect the nature of the data and level of confidentiality involved.
Conditions of use can include:
Personally identifiable information should never be disclosed through research findings unless participants have provided explicit informed consent in writing.
Researchers must ensure that a person’s identity cannot be disclosed through:
Direct identifiers (such as names or contact details) collected during the research process are usually not essential for data analysis and can easily be removed from the data. Consider how long these identifiers will be kept, separately and securely, and how they will eventually be destroyed. In many cases, direct identifiers need not be collected in the first place.
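The separation of direct identifiers described above can be sketched in Python, assuming tabular records held as dictionaries. Direct identifiers are moved into a separate key table, linked to the research data only by a random study ID. The column names, the `split_identifiers` helper, and the example records are all hypothetical; this is an illustration, not a prescribed procedure.

```python
import secrets

# Hypothetical column names for direct identifiers; adapt to your own data.
DIRECT_IDENTIFIERS = ("name", "email", "phone")


def split_identifiers(records, id_fields=DIRECT_IDENTIFIERS):
    """Split records into research data and a separate identifier key.

    Returns (research_data, key_table):
    - research_data: copies of the records with direct identifiers removed
      and a random study ID added.
    - key_table: maps each study ID to the removed identifiers. Store it
      separately and securely, and destroy it when it is no longer needed.
    """
    research_data = []
    key_table = {}
    for record in records:
        study_id = secrets.token_hex(4)
        while study_id in key_table:  # regenerate on the rare collision
            study_id = secrets.token_hex(4)
        key_table[study_id] = {f: record[f] for f in id_fields if f in record}
        cleaned = {k: v for k, v in record.items() if k not in id_fields}
        cleaned["study_id"] = study_id
        research_data.append(cleaned)
    return research_data, key_table


# Hypothetical example records.
participants = [
    {"name": "A. Tremblay", "email": "a.t@example.org", "age_group": "30-39", "response": 4},
    {"name": "B. Singh", "email": "b.s@example.org", "age_group": "40-49", "response": 2},
]
data, key = split_identifiers(participants)
print(sorted(data[0]))  # ['age_group', 'response', 'study_id']
```

Because the study ID is random rather than derived from the identifiers, the research file alone cannot be linked back to participants; only the key table can, which is why it is kept apart and destroyed on schedule. Removing direct identifiers is only a first step: remaining variables such as age group may still need the techniques discussed below.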
Anonymizing quantitative data may involve removing or aggregating variables. Techniques such as cell suppression, rounding, inference control, and perturbation can also be used. Recoding information to a higher, less detailed level of a standard classification than the one used during data collection is an example of a low-risk anonymization technique.
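Three of the techniques mentioned above (recoding to a higher classification level, cell suppression, and rounding) can be sketched on a simple frequency table. This minimal Python example assumes a hierarchical classification in which the leading digits of a code identify a broader group; the codes, the threshold of 5, and the rounding base of 5 are hypothetical choices, and real thresholds should come from your repository or ethics board.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # hypothetical threshold for suppressing small cells
ROUND_BASE = 5     # hypothetical rounding base


def recode_to_higher_level(code, digits=1):
    """Generalize a hierarchical classification code by keeping only its leading digits."""
    return code[:digits]


def suppress_small_cells(table, min_size=MIN_CELL_SIZE):
    """Replace counts below min_size with None, marking the cell as suppressed."""
    return {cell: (n if n >= min_size else None) for cell, n in table.items()}


def round_to_base(table, base=ROUND_BASE):
    """Round each unsuppressed count to the nearest multiple of base (ties round up)."""
    return {cell: (None if n is None else base * ((n + base // 2) // base))
            for cell, n in table.items()}


# Hypothetical detailed codes recorded during data collection.
codes = ["21231"] * 6 + ["21211"] * 3 + ["41200"] * 7 + ["63100"] * 2
broad = Counter(recode_to_higher_level(c) for c in codes)  # {'2': 9, '4': 7, '6': 2}
released = round_to_base(suppress_small_cells(dict(broad)))
print(released)  # {'2': 10, '4': 5, '6': None}
```

Suppressing a cell is not always enough on its own: if row or column totals are also published, a suppressed value may be recoverable by subtraction, which is the kind of risk inference control (for example, suppressing additional complementary cells) is meant to address.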
Relational data requires particular attention, because connections between variables may inadvertently reveal identities. Interview transcripts may require different techniques, such as consistent pseudonyms or more generalized terms, to reduce the risk of identification without rendering the data unusable. Retain unedited versions of your data for use within the team or in case errors occur during anonymization. Remember to log all techniques used and every instance in which a variable was replaced or aggregated.
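Consistent pseudonyms and a replacement log can be sketched in Python, assuming the researcher has already drawn up a table mapping each identifying term in a transcript to its pseudonym or generalized term. The `pseudonymize` helper, the transcript excerpt, and the replacement table are hypothetical. Automated replacement only catches terms listed in the table, so a manual review of the transcript is still needed.

```python
import re


def pseudonymize(transcript, replacements):
    """Replace identifying terms with consistent pseudonyms and log each change.

    replacements: dict mapping an identifying term to its substitute. The same
    substitute is used every time, so pseudonyms stay consistent across the
    transcript. Returns (edited_text, log), where each log entry is
    (line_number, original_term, substitute).
    """
    if not replacements:
        return transcript, []
    # Try longer terms first so "Marie Dubois" is replaced before "Marie".
    terms = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    log = []
    edited_lines = []
    for line_no, line in enumerate(transcript.splitlines(), start=1):
        for match in pattern.finditer(line):
            log.append((line_no, match.group(0), replacements[match.group(0)]))
        edited_lines.append(pattern.sub(lambda m: replacements[m.group(0)], line))
    return "\n".join(edited_lines), log


# Hypothetical transcript excerpt and replacement table.
transcript = ("Interviewer: How did you meet Marie Dubois?\n"
              "P1: Marie and I both worked in Gatineau.")
replacements = {
    "Marie Dubois": "[Colleague A]",
    "Marie": "[Colleague A]",
    "Gatineau": "[a city in Quebec]",
}
edited, log = pseudonymize(transcript, replacements)
print(edited)
```

The log records the original identifying terms, so it should be stored as securely as the unedited transcript and kept with it rather than with the shared version.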
Please refer to the UK Anonymisation Network (UKAN) resources for additional information and documentation on data anonymization, including comprehensive guides to performing anonymization.
Acknowledgements
We would like to thank the UK Data Service for use of their training materials in the creation of these modules.
We would also like to thank EDINA and the Data Library at the University of Edinburgh for use of materials from the Research Data MANTRA [online course] in the creation of these modules.