Keeping records of your work throughout a project is an essential step in the research process. At the very least, you will need the data to describe your research findings in upcoming presentations or publications. To preserve your data in a repository or share it with others, you will also be required to provide supplementary information, such as citation information, an explanation of survey methodology, sampling information, question context and coding, and how and why derived variables were created.
Metadata
Many long-term preservation platforms and services require varying amounts of structured metadata to accompany deposited data files.
Metadata is the information that supports the discovery, understanding, and management of your research data. Good-quality metadata is essential for accurate and informed usage, especially if the data is to be reused or shared in the future. Starting your metadata during the planning stages of a research project reduces the risk of data loss during and after the project. It is critical, therefore, to start documenting your data from the very beginning of your project.
Use metadata:
- To enable others to reuse your data
- To facilitate preservation
- To allow replication at a later date
- To make the data understandable to others
The level of structure used to document your data will depend on the complexity of the project or data collected and the number of people involved in the project. Consider documenting (describing, outlining, identifying, etc.) the following information:
Study Level
- Creators, collaborators, funders, rights
- Research question and rationale
- Date the data was gathered or analysed
- Survey methodology
- Sampling frame
- Instruments, instrument setting, or measures used
File or Database Level
- Relationship between files
- Information contained within the files
- Format in which the files are stored
- Tests or analysis performed on the file(s)
- Information at the file or folder level (using a readme.txt file), including file naming, abbreviations or acronyms used, and the contents of the file(s)
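A file-level readme can be generated from a short description of the folder so that it stays consistent across projects. The following is a minimal sketch only: the function name `build_readme`, the layout of the text, and all file names and abbreviations are hypothetical and are not prescribed by any repository.

```python
def build_readme(title, files, abbreviations):
    """Return plain-text readme content describing a data folder.

    files: dict mapping file name -> short description of its contents
    abbreviations: dict mapping abbreviation -> meaning
    """
    lines = [title, "=" * len(title), "", "Files", "-----"]
    for name, description in sorted(files.items()):
        lines.append(f"{name}: {description}")
    lines += ["", "Abbreviations", "-------------"]
    for abbreviation, meaning in sorted(abbreviations.items()):
        lines.append(f"{abbreviation}: {meaning}")
    return "\n".join(lines) + "\n"


# Hypothetical example values, for illustration only.
readme_text = build_readme(
    "Household survey, wave 1",
    {
        "hh_w1_raw.csv": "Raw responses, one row per household",
        "hh_w1_clean.csv": "Cleaned responses derived from hh_w1_raw.csv",
    },
    {"hh": "household", "w1": "wave 1"},
)
```

The resulting text can then be saved as readme.txt alongside the data files it describes. Noting which files are derived from which others also records the relationship between files.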
Variable Level
- Variable name and variable label explaining the variable's meaning, unit of measure, sample weighting, etc. (this information could be contained in a codebook)
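One simple way to keep a codebook is as a table with one row per variable. The sketch below assumes a four-column layout (name, label, unit, codes) and uses made-up variables; it is an illustration of the idea, not a required format. It also includes a small check for data columns that have not yet been documented.

```python
import csv
import io

# Hypothetical variables, for illustration only.
CODEBOOK = [
    {"name": "age", "label": "Age of respondent at interview",
     "unit": "years", "codes": ""},
    {"name": "emp_status", "label": "Employment status",
     "unit": "", "codes": "1=Employed; 2=Unemployed; 9=Not stated"},
    {"name": "wt", "label": "Sample weight", "unit": "", "codes": ""},
]


def codebook_to_csv(entries):
    """Serialize codebook entries to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["name", "label", "unit", "codes"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented(columns, entries):
    """Return the data columns that have no codebook entry."""
    documented = {entry["name"] for entry in entries}
    return [column for column in columns if column not in documented]
```

For example, calling `undocumented(["age", "income", "wt"], CODEBOOK)` reports that `income` still needs an entry, which is a cheap check to run before depositing or sharing a file.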
Sources of Metadata Information
- Standard information submitted in a Research Ethics Board (REB) request
- Laboratory notebooks and experimental protocols
- Questionnaires, codebooks, data dictionaries
- Software syntax and output files
- Information about equipment settings and instrument calibration
- Database schema
- Methodology reports
- Provenance of derived data
Using Standards, Taxonomies, Classification Systems
When preserving or sharing data, using standards, taxonomies, or classification systems allows you to categorize or document data or other information in a widely understood way. Data repositories usually request that you use an international metadata standard.
Standards
A wide variety of standards and schemas are available for use in documenting research data. Most are discipline-specific but some can be adapted for use in other fields; all have a core set of tags collecting vital information related to your project including title, author, funding sources, abstract, keywords, terms of use, and copyright information. Examples include:
- Dublin Core (DC): general purpose standard for basic element description
- Data Documentation Initiative (DDI): XML-based standard for description of social and behavioural science data sets
- Federal Geographic Data Committee (FGDC): standard for description of geospatial data
Classifications
Classification is a method of standardizing information into relational schemas so that concepts and descriptions are widely understood.
Classification Systems
Classification systems are used extensively by governments to depict hierarchical relationships and standard descriptions of specific classes such as goods, crops, geographical units, industries, and occupations. Examples include:
- National Occupational Classification
- North American Industry Classification System
- Canadian System of Soil Classification
Example of a Metadata Standard: Dublin Core Metadata Element Set
This fifteen-term vocabulary is considered the core set of elements that should be used to describe a digital resource. It is part of a larger set of vocabularies known as the DCMI Metadata Terms, and the element set itself has been endorsed as an ISO Standard [ISO15836] and an ANSI/NISO Standard [NISOZ3985].
See the table in the Dublin Core Metadata Element Set, Version 1.1 document, which is used under a Creative Commons Attribution 3.0 Unported Licence.
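To show what a record using these elements might look like, the sketch below builds a small XML description with Python's standard library. The fifteen element names and the namespace `http://purl.org/dc/elements/1.1/` come from the Dublin Core Metadata Element Set, Version 1.1. The `metadata` wrapper element, the function name `dublin_core_record`, and all of the values are assumptions made for illustration; repositories generally define their own wrapper and required fields.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the Dublin Core Metadata Element Set, Version 1.1.
DC_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)


def dublin_core_record(fields):
    """Build an XML element with one child per supplied Dublin Core value.

    fields maps an element name to a string or a list of strings
    (every Dublin Core element is optional and repeatable).
    """
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("metadata")  # wrapper name is an assumption
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = value
    return record


# Hypothetical dataset description, for illustration only.
record = dublin_core_record({
    "title": "Household survey, wave 1",
    "creator": ["A. Researcher", "B. Collaborator"],
    "date": "2020-01-31",
    "format": "text/csv",
    "rights": "CC BY 4.0",
})
xml_text = ET.tostring(record, encoding="unicode")
```

Rejecting names outside the fifteen-element list catches typos early, which matters because a misspelled element would otherwise be silently ignored by most systems that read the record.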
Acknowledgements
We would like to thank the UK Data Service for use of their training materials in the creation of these modules.
We would also like to thank EDINA and the Data Library at the University of Edinburgh for use of materials from the Research Data MANTRA [online course] in the creation of these modules.