Sharing and Reusing Your Data - Saint Paul University

Plans for sharing and reusing data are an integral part of the research data management planning process.

Many funders and journal publishers have policies which encourage, expect, or require researchers to prepare and provide their data for sharing. This is particularly true of data produced through public funding.

The OECD Declaration on Access to Research Data from Public Funding, to which Canada is a signatory, sets out reasons for sharing research data.

Rationale for sharing data:

Encourages scientific enquiry
Promotes innovation
Reduces duplication of research projects
Leads to new collaborations
Increases impact of research results
Reduces costs of research in developing nations
Encourages scrutiny, transparency, and accountability
Can be used in teaching

Preparing Data for Sharing

Preparation of data for sharing begins with the creation of a data management plan during the initial stages of the research project. Researchers should familiarize themselves with the policies of their funders as part of the planning process.

Factors to consider include:

Legal and ethical implications
- Will confidentiality of participants be compromised?
- Will sensitive information be compromised?
- Will sharing violate contractual agreements?
- Will sharing violate licensing agreements?
- Was sharing included in the informed consent agreement?
- Will the data need to be anonymized prior to release?
- Do you have consent from project partners?
- Do you have the right to share secondary data?

Intellectual property rights
- Will you be commercializing or seeking patents?

Researchers should consult the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (TCPS2) and Saint Paul University’s Office of Research and Ethics for information on contractual and ethical obligations.

Obtaining Consent

The Tri-Council Policy Statement (TCPS2) stipulates that informed consent from project participants is necessary for the sharing and reuse of data containing identifiable information (TCPS2 Articles 3.2 and 5.2). To ensure that consent has been received, pay close attention to how the consent form describes the preservation, reuse, and sharing of data containing identifiable information, how this information will be protected, and the conditions under which the data will be shared or reused. Consent from participants is not required for secondary use or reuse of anonymous or aggregated data; however, informing participants about the preservation, reuse, and sharing of such data is considered good ethical practice.

Conditions for Sharing

Canadian copyright legislation does not cover raw research data, although it does cover descriptions of data such as tables, graphs, and databases. The sharing of data files can be controlled and protected with licences. In many cases, researchers can decide on the level of access and the conditions of use for data they are sharing or depositing in a repository. Individual repositories may have licence choices embedded within the repository platform.

For help with author rights questions, visit our Copyright section.

Several online licensing options can be adopted for personal use:

Creative Commons (CC) allows users to combine elements of licences to create a licence for the research data in question
Open Data Commons (ODC) provides three licence options

Conditions of use should reflect the nature of the data and level of confidentiality involved.

Conditions of use can include:

Requiring researcher authorization for access
Setting access permissions for specific researcher groups
Placing data under timed embargoes
Providing secure access to data
Requiring acknowledgment and attribution of authorship to original researcher

Anonymizing Data

Personally identifiable information should never be disclosed through research findings unless explicit informed consent from participants has been provided in writing.

Researchers must ensure that a person’s identity cannot be disclosed through:

Direct identifiers
- Includes names, addresses, dates of birth, postal codes, telephone numbers, social insurance numbers, images, etc.
Indirect identifiers
- When combined with other identifiers or publicly available information, they have the potential to reveal a participant’s identity
- Includes workplace information, occupation, age, salary, etc.
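
As an illustration of how indirect identifiers can single out a participant when combined, the following minimal Python sketch counts how many records share each combination of values. The records, field names, and threshold are hypothetical and chosen only for demonstration; a suitable threshold depends on the data and the applicable ethics requirements.

```python
from collections import Counter

# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"occupation": "nurse", "age": 34, "workplace": "Clinic A"},
    {"occupation": "nurse", "age": 34, "workplace": "Clinic A"},
    {"occupation": "teacher", "age": 51, "workplace": "School B"},
]


def risky_combinations(rows, fields, threshold=2):
    """Return combinations of indirect identifiers shared by fewer
    than `threshold` rows, with the number of rows for each."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Only one record describes a 51-year-old teacher at School B, so that
# combination could point to a single person.
unique = risky_combinations(records, ["occupation", "age", "workplace"])
print(unique)
```

Combinations flagged in this way are candidates for the removal or aggregation techniques described in this section.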

Direct identifiers collected during the research process are usually not essential for data analysis and can be easily removed from the data. Consideration should be given to how long these identifiers are kept, separately and securely, and to how they will be destroyed. In many cases, the collection of direct identifiers can be avoided altogether during the initial collection process.
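
One possible way to keep direct identifiers separate from the data used for analysis is sketched below in Python. It is a minimal example under stated assumptions: the field names, the sample participant, and the `split_identifiers` helper are hypothetical, and the sketch does not cover secure storage or destruction of the resulting key.

```python
import secrets

# Assumed names of the direct-identifier fields in this example.
DIRECT_IDENTIFIERS = {"name", "telephone", "postal_code"}


def split_identifiers(rows):
    """Split each record into a research record without direct
    identifiers and a linkage-key record that holds them, joined
    only by a random study ID."""
    research_rows, key_rows = [], []
    for row in rows:
        study_id = secrets.token_hex(8)  # random ID, not derived from the data
        identifiers = {k: v for k, v in row.items() if k in DIRECT_IDENTIFIERS}
        remainder = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        key_rows.append({"study_id": study_id, **identifiers})
        research_rows.append({"study_id": study_id, **remainder})
    return research_rows, key_rows


# Hypothetical participant record.
participants = [
    {"name": "A. Example", "telephone": "555-0100",
     "postal_code": "K1S 1C4", "age": 30, "occupation": "nurse"},
]
research_data, linkage_key = split_identifiers(participants)
print(research_data[0].keys())
```

The linkage key would be stored separately from the research data and destroyed according to the schedule set out in the data management plan.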

Anonymizing quantitative data may involve removing or aggregating variables. Techniques such as cell suppression, rounding, inference control, and perturbation can be employed to anonymize data. Coding information using standard classifications at higher levels than the one used during data collection is an example of a low-risk technique that can be employed in the anonymizing process.
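
Three of the techniques named above (coding at a higher level, rounding, and cell suppression) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, band width, rounding base, and minimum cell size are assumptions, not recommended values, and it does not address inference control (for example, a suppressed cell that can be recovered from a published total).

```python
from collections import Counter


def age_band(age, width=10):
    """Recode an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def round_to_base(count, base=5):
    """Round a non-negative integer count to the nearest multiple of `base`."""
    return base * ((count + base // 2) // base)


def suppress_small_cells(table, minimum=5):
    """Replace counts below `minimum` with None so that small groups
    are not published."""
    return {cell: (n if n >= minimum else None) for cell, n in table.items()}


# Hypothetical ages collected during a study.
ages = [34, 37, 52, 58, 51, 55, 59, 41]

banded = Counter(age_band(a) for a in ages)
published = suppress_small_cells(dict(banded))
print(published)
print(round_to_base(13))
```

Whichever techniques are used, the parameters chosen (band width, rounding base, suppression threshold) should be recorded alongside the data.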

Relational data requires particular attention, because connections between variables may inadvertently reveal identities. Interview transcripts may call for different techniques, such as the use of consistent pseudonyms or more generalized terms, to reduce the risk of identification without rendering the data unusable. Retain unedited versions of your data for use within the team or in the event of errors during anonymization. Remember to log all techniques used and all instances of replacement or aggregation of variables.
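
The use of consistent pseudonyms together with a log of replacements could look roughly like the Python sketch below. The transcript text, the names, and the `pseudonymize` helper are hypothetical. The sketch only replaces exact, whole-word matches, so nicknames, misspellings, and contextual clues would still need manual review.

```python
import re


def pseudonymize(text, replacements):
    """Replace each original term with its pseudonym everywhere in
    `text` and return the edited text plus a log of what was done."""
    log = []
    for original, pseudonym in replacements.items():
        pattern = re.compile(rf"\b{re.escape(original)}\b")
        # A function is used as the replacement so the pseudonym is
        # inserted literally, whatever characters it contains.
        text, count = pattern.subn(lambda _match: pseudonym, text)
        log.append({"pseudonym": pseudonym, "occurrences": count})
    return text, log


# Hypothetical transcript excerpt and replacement table.
transcript = "Marie said that Marie's manager, Paul, works at Riverside Clinic."
replacements = {
    "Marie": "Participant 1",
    "Paul": "Manager A",
    "Riverside Clinic": "a health clinic",
}

edited, replacement_log = pseudonymize(transcript, replacements)
print(edited)
print(replacement_log)
```

The log here records only the pseudonyms and the number of replacements. The table that maps real names to pseudonyms identifies participants, so it belongs with the unedited data rather than with the shared files.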

Please refer to the UK Anonymisation Network’s UKAN Resources for additional information and documentation on data anonymization, including comprehensive guides to performing anonymization.

Acknowledgements

We would like to thank the UK Data Service for use of their training materials in the creation of these modules.

We would also like to thank EDINA and the Data Library at the University of Edinburgh for use of materials from the Research Data MANTRA [online course] in the creation of these modules.