Plans for sharing and reusing data are an integral part of the research data management planning process.
Many funders and journal publishers have policies which encourage, expect, or require researchers to prepare and provide their data for sharing. This is particularly true of data produced through public funding.
The OECD Declaration on Access to Research Data from Public Funding, to which Canada is a signatory, sets out reasons for sharing research data.
Rationale for sharing data:
- Encourages scientific enquiry
- Promotes innovation
- Reduces duplication of research projects
- Leads to new collaborations
- Increases impact of research results
- Reduces costs of research in developing nations
- Encourages scrutiny, transparency, and accountability
- Can be used in teaching
Preparing Data for Sharing
Preparation of data for sharing begins with the creation of a data management plan during the initial stages of the research project. Researchers should familiarize themselves with the policies of their funders as part of the planning process.
Factors to consider include:
- Legal and ethical implications
  - Will confidentiality of participants be compromised?
  - Will sensitive information be compromised?
  - Will sharing violate contractual agreements?
  - Will sharing violate licensing agreements?
  - Was sharing included in the informed consent agreement?
  - Will the data need to be anonymized prior to release?
  - Do you have consent from project partners?
  - Do you have the right to share secondary data?
- Intellectual property rights
  - Will you be commercializing or seeking patents?
Researchers should consult the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (TCPS2) and Saint Paul University’s Office of Research and Ethics for information on contractual and ethical obligations.
Obtaining Consent
The Tri-Council Policy Statement (TCPS2) stipulates that informed consent from project participants is necessary for the sharing and reuse of data containing identifiable information (TCPS2 Article 3.2 and Article 5.2). To ensure that consent has been received, the consent form should state clearly whether data containing identifiable information will be preserved, reused, or shared, how this information will be protected, and under what conditions the data will be shared or reused. Consent from participants is not required for secondary use or reuse of anonymous or aggregated data; however, informing participants about the preservation, reuse, and sharing of such data is considered good ethical practice.
Conditions for Sharing
Canadian copyright legislation does not cover raw research data, although it does cover descriptions of data such as tables, graphs, and databases. The sharing of data files can be controlled and protected with licences. In many cases, researchers can decide on the level of access and the conditions of use for data they are sharing or depositing in a repository. Individual repositories may offer licence choices built into the repository platform.
For help with author rights questions, visit our Copyright section.
Several online licensing options can be adopted for personal use:
- Creative Commons (CC) allows users to combine elements of licences to create a licence for the research data in question
- Open Data Commons (ODC) provides three licence options
Conditions of use should reflect the nature of the data and level of confidentiality involved.
Conditions of use can include:
- Requiring researcher authorization for access
- Setting access permissions for specific researcher groups
- Placing data under timed embargoes
- Providing secure access to data
- Requiring acknowledgment and attribution of authorship to the original researcher
Anonymizing Data
Personally identifiable information should never be disclosed through research findings unless participants have provided explicit informed consent in writing.
Researchers must ensure that a person’s identity cannot be disclosed through:
- Direct identifiers
  - Includes names, addresses, dates of birth, postal codes, telephone numbers, social insurance numbers, images, etc.
- Indirect identifiers
  - When combined with multiple identifiers or publicly available information, they have the potential to reveal a participant’s identity
  - Includes workplace information, occupation, age, salary, etc.
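One way to see how indirect identifiers combine is to count how many records share each combination of values. The sketch below is only an illustration and is not a method prescribed by this guide: the function name `risky_combinations`, the field names, the sample rows, and the threshold `k` are all assumptions made for the example.

```python
from collections import Counter

def risky_combinations(rows, fields, k=2):
    """Return combinations of field values shared by fewer than k records."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}

# Made-up sample records containing only indirect identifiers.
rows = [
    {"occupation": "nurse", "age": 34, "workplace": "Clinic A"},
    {"occupation": "nurse", "age": 34, "workplace": "Clinic A"},
    {"occupation": "surgeon", "age": 61, "workplace": "Clinic A"},
]

# Combinations that occur only once point to a single person.
flagged = risky_combinations(rows, ["occupation", "age", "workplace"])
print(flagged)
```

Any combination that is flagged is a candidate for aggregation or removal before the data are shared.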
Direct identifiers collected during the research process are usually not essential for data analysis and can be easily removed from the data. Consideration should be given to how long these identifiers are kept separately and securely, and to how they will be destroyed. In many cases, the collection of direct identifiers can be avoided during the initial collection process.
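As a minimal sketch of keeping direct identifiers separately, the example below splits each record into an analysis row and a key row linked only by a random study ID. The column names in `DIRECT_IDENTIFIERS`, the function `split_identifiers`, and the sample records are hypothetical; a real project would adapt them to the identifiers it actually collects.

```python
import secrets

# Hypothetical column names; adjust to the identifiers your study collects.
DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth", "postal_code", "phone"}

def split_identifiers(records):
    """Return (analysis_rows, key_rows) linked only by a random study ID."""
    analysis_rows, key_rows = [], []
    for record in records:
        study_id = secrets.token_hex(8)  # random, not derived from the person
        key_row = {"study_id": study_id}
        analysis_row = {"study_id": study_id}
        for field, value in record.items():
            if field in DIRECT_IDENTIFIERS:
                key_row[field] = value
            else:
                analysis_row[field] = value
        key_rows.append(key_row)
        analysis_rows.append(analysis_row)
    return analysis_rows, key_rows

# Made-up sample records for illustration only.
records = [
    {"name": "A. Example", "postal_code": "A1A 1A1", "age": 34, "score": 7},
    {"name": "B. Sample", "postal_code": "B2B 2B2", "age": 51, "score": 4},
]
analysis_rows, key_rows = split_identifiers(records)
```

The key rows would be stored securely and apart from the analysis rows, then destroyed on the schedule set out in the data management plan.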
Anonymizing quantitative data may involve removing or aggregating variables. Techniques such as cell suppression, rounding, inference control, and perturbation can be employed to anonymize data. Coding information using standard classifications at higher levels than the one used during data collection is an example of a low-risk technique that can be employed in the anonymizing process.
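Three of the techniques mentioned above (coding at a higher level, rounding, and cell suppression) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a complete disclosure-control procedure: the function names, the ten-year band width, the rounding base, and the suppression threshold are all choices made for the example, and appropriate values depend on the dataset and the repository's requirements.

```python
from collections import Counter

def age_band(age, width=10):
    """Recode an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def round_to(value, base=1000):
    """Round a value to the nearest multiple of base."""
    return base * round(value / base)

def suppress_small_cells(counts, threshold=5):
    """Replace counts below the threshold with None so small groups are not published."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}

# Made-up ages for illustration only.
ages = [23, 27, 31, 34, 35, 36, 38, 39, 62]

# Aggregate exact ages into bands, then suppress bands with few participants.
counts = Counter(age_band(a) for a in ages)
published = suppress_small_cells(counts, threshold=3)
print(published)
```

Recoding into bands lowers the level of detail for every record, while suppression hides only the cells small enough to single someone out.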
Relational data requires particular attention because connections between variables may inadvertently reveal identities. Interview transcripts may call for different techniques, such as consistent pseudonyms or more generalized terms, to reduce the risk of identification without rendering the data unusable. Retain unedited versions of your data for use within the team or in the event of errors during anonymization. Remember to log all techniques used and every instance of replacement or aggregation of variables.
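The use of consistent pseudonyms together with a replacement log can be sketched as follows. The function `pseudonymize`, the `Participant_N` naming scheme, and the sample transcript are assumptions made for illustration; real transcripts usually also need manual review for nicknames, misspellings, and contextual clues.

```python
import re

def pseudonymize(transcript, names, log):
    """Replace each name with a stable pseudonym and record every replacement."""
    mapping = {name: f"Participant_{i}" for i, name in enumerate(names, start=1)}
    for name, alias in mapping.items():
        # Match whole words only, so that short names are not replaced inside other words.
        pattern = re.compile(rf"\b{re.escape(name)}\b")
        transcript, n = pattern.subn(alias, transcript)
        log.append({"original": name, "replacement": alias, "occurrences": n})
    return transcript

# Made-up transcript excerpt for illustration only.
log = []
text = "Maria said she met Jon at work. Jon agreed with Maria."
clean = pseudonymize(text, ["Maria", "Jon"], log)
print(clean)
```

Because the log records the original names, it needs the same secure storage as the unedited data.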
Please refer to the UK Anonymization Network’s UKAN Resources for additional information and documentation on data anonymization, including comprehensive guides to performing anonymization.
Acknowledgements
We would like to thank the UK Data Service for use of their training materials in the creation of these modules.
We would also like to thank EDINA and the Data Library at the University of Edinburgh for use of materials from the Research Data MANTRA [online course] in the creation of these modules.