Creating a strategy for how you will manage your project files throughout the research process is a fundamental element of your overall data management plan. A research project may include multiple files in a variety of formats, multiple versions of files, spreadsheets, images, lab notes, interview tapes, etc., that are essential to the project. Establishing good file management practices at the outset is much easier than trying to organize the work midway through the project.
Managing your project files will render benefits later on by:
Elements central to managing project files include:
When organizing your files, consider including elements such as the project title, a unique identifier, and the date (year) in the top-level folder name. The substructure should follow a clear, documented naming convention, with subfolders for, for instance, each component or run of an experiment, each version of a dataset, and/or each person in the group. The structure should follow a consistent pattern that is clearly recognizable to the entire research group.
Elements of file naming conventions (below) apply to directory folder names as well.
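As an illustration of the folder naming advice above, the following is a minimal Python sketch that joins a project title, a unique identifier, and a year into one folder name. The function name, the underscore separator, and the sample values are assumptions made for this example, not a prescribed convention.

```python
import re


def project_dir_name(title: str, identifier: str, year: int) -> str:
    """Build a folder name from a project title, an identifier, and a year.

    Hypothetical helper: the underscore separator and the hyphenated title
    are illustrative choices, not a required standard.
    """
    # Replace every run of characters other than letters and digits with one hyphen,
    # then drop any hyphen left at the start or end.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-")
    return f"{slug}_{identifier}_{year}"


# Sample values are invented for the example.
print(project_dir_name("Soil Moisture Survey", "P0042", 2024))
# prints: Soil-Moisture-Survey_P0042_2024
```

Avoiding spaces and punctuation in the name keeps the folder easy to handle across operating systems and command-line tools.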
Project files should be named and organized in a consistent and descriptive manner, in a way that is logical and predictable to yourself and others. Clear distinctions between files will facilitate effective and efficient file browsing and retrieval.
There are three things to keep in mind when labelling data:
Consider using several of the following elements in file names:
Other considerations:
Consider using version control systems for bulk renaming of files where necessary.
If, partway through a project, there is a need to rename a large number of files to conform to a systematic file naming convention you have adopted, there are a number of tools available to make this process easier.
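Where a dedicated tool is not at hand, a short script can do a similar job. The following is a minimal Python sketch, assuming the only change needed is swapping one filename prefix for another within a single folder. The function names plan_renames and apply_renames are hypothetical, and the sketch separates planning from renaming so that the proposed changes can be reviewed first.

```python
from pathlib import Path


def plan_renames(folder: Path, old_prefix: str, new_prefix: str) -> list:
    """Return (old_path, new_path) pairs without touching any file."""
    plan = []
    for path in sorted(folder.iterdir()):
        # Only regular files whose name starts with the old prefix are included.
        if path.is_file() and path.name.startswith(old_prefix):
            new_name = new_prefix + path.name[len(old_prefix):]
            plan.append((path, path.with_name(new_name)))
    return plan


def apply_renames(plan: list) -> None:
    """Carry out a reviewed plan, refusing to overwrite existing files."""
    # Check every target before renaming anything, so a collision
    # does not leave the folder half renamed.
    collisions = [new for _, new in plan if new.exists()]
    if collisions:
        raise FileExistsError(f"targets already exist: {collisions}")
    for old, new in plan:
        old.rename(new)
```

A cautious workflow is to print the result of plan_renames, check it against the naming convention, and only then pass it to apply_renames, ideally on a backed-up copy of the files.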
Examples of file renaming tools:
Versioning, or version control, refers to the management of file revisions. Versioning assists researchers in managing data during a project where experimentation, revisions, and re-examinations are undertaken. Text files as well as data files may undergo numerous changes before the final version is set.
Versioning mechanisms, such as directory structure and file naming conventions, help users distinguish between versions of a dataset and its accompanying files.
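One way to version through file names is a numbered suffix. The following is a minimal Python sketch under that assumption: the `_vNN` pattern, the two-digit padding, and the function name next_version_name are illustrative choices, not a convention taken from this guide.

```python
import re


def next_version_name(existing: list, stem: str, suffix: str) -> str:
    """Return the next name in a hypothetical '<stem>_vNN<suffix>' scheme."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+){re.escape(suffix)}$")
    versions = []
    for name in existing:
        match = pattern.match(name)
        if match:
            versions.append(int(match.group(1)))
    # Start at version 1 when no earlier version exists.
    next_number = max(versions, default=0) + 1
    return f"{stem}_v{next_number:02d}{suffix}"


# Sample file names are invented for the example.
names = ["survey_v01.csv", "survey_v02.csv", "notes.txt"]
print(next_version_name(names, "survey", ".csv"))
# prints: survey_v03.csv
```

Zero-padding the number keeps versions in order when files are sorted by name.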
Researchers should also consider discarding obsolete versions of files, but only after carefully assessing whether those files may be needed in future. In some instances, keeping backup copies of versions may be advisable.
A number of tools are available for file versioning, including:
Apache Subversion
Git
Backing up files refers to the creation of file copies. These copies should reside in a separate physical location from the working or stored files. Arranging a regular backup schedule mitigates the possibility of data loss, and backup copies can be used to restore damaged or lost original files.
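The copying step can be sketched in a few lines of Python. This is a minimal illustration, assuming the backup destination is a folder outside the source folder, for example a mounted external or network drive. The function names back_up and sha256_of are hypothetical, and the checksum comparison is an added safeguard rather than something this guide requires.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_root: Path) -> list:
    """Copy every file under source into backup_root and verify each copy.

    Assumes backup_root lies outside source, ideally on separate storage.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        # Recreate the same relative folder structure in the backup.
        target = backup_root / path.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also tries to preserve timestamps
        if sha256_of(path) != sha256_of(target):
            raise OSError(f"checksum mismatch for {target}")
        copied.append(target)
    return copied
```

Running such a script on a schedule, for instance through the operating system's task scheduler, is one way to make backups regular rather than occasional.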
We would like to thank the UK Data Service for use of their training materials in the creation of these modules.
We would also like to thank EDINA and the Data Library at the University of Edinburgh for use of materials from the Research Data MANTRA [online course] in the creation of these modules.