Creating a strategy for how you will manage your project files throughout the research process is a fundamental element of your overall data management plan. A research project may include multiple files in a variety of formats, multiple versions of files, spreadsheets, images, lab notes, interview tapes, etc., that are essential to the project. Establishing good file management practices at the outset is much easier than trying to organize the work midway through the project.
Managing your project files well will pay off later by:
- Increasing efficiency
- Reducing risk of loss or file redundancy
- Increasing research impact by making it easier to share files
- Complying with legal/ethical requirements or policies
- Providing a clear record of the research process
- Facilitating preservation at conclusion of project
Elements central to managing project files include:
- Adopting and documenting folder and file naming conventions
- Creating a clear hierarchy of folders
- Documenting file contents
- Tracking file versions
- Understanding file formats used in long-term preservation
Directory Structure
When organizing your files, consider including elements such as the project title, a unique identifier, and the date (year) in the top-level folder name. The subfolders should follow a clear, documented naming convention. For instance, you might create a separate folder for each component or run of an experiment, each version of a dataset, and/or each person in the group. The structure should follow a consistent pattern that the entire research group can readily recognize.
Elements of file naming conventions (below) apply to directory folder names as well.
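Some groups script the setup of their folder structure. The following Python sketch shows one way to do it. It creates a top-level folder named from a project title, a unique identifier, and a year. It then adds subfolders and saves the naming convention in a README file, so the convention is documented alongside the data. The function name `make_project_tree` and the example folder names are hypothetical, so adapt them to your own convention.

```python
from pathlib import Path


def make_project_tree(root, title, project_id, year, subfolders, convention=""):
    """Create <root>/<title>_<project_id>_<year>/ plus the given subfolders (hypothetical helper)."""
    # Top-level name combines project title, unique identifier, and year.
    top = Path(root) / f"{title}_{project_id}_{year}"
    top.mkdir(parents=True, exist_ok=True)
    for sub in subfolders:
        # parents=True creates nested paths such as "Run01/raw";
        # exist_ok=True leaves folders that already exist untouched.
        (top / sub).mkdir(parents=True, exist_ok=True)
    if convention:
        # Record the naming convention next to the data it describes.
        (top / "README.txt").write_text(convention + "\n", encoding="utf-8")
    return top
```

For example, `make_project_tree(".", "SoilMicrobes", "P042", 2024, ["Run01/raw", "Run01/processed", "Docs"], "Runs: RunNN; dates: YYYYMMDD")` would create `SoilMicrobes_P042_2024` with those subfolders. Rerunning it leaves existing folders and their contents in place. Only `README.txt` is rewritten.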
File Naming Conventions
Project files should be named and organized in a consistent and descriptive manner that is logical and predictable to yourself and others. Clear distinctions between files make browsing and retrieving them faster and more reliable.
There are three things to keep in mind when labelling data:
- Organization: Important for future access and retrieval.
- Context: Names can include content-specific or descriptive information, so that a file remains identifiable independent of where it is stored.
- Consistency: Choose a naming convention and apply its rules systematically. Always include the same information (such as date and time) in the same order and in the same format (e.g., dates as YYYYMMDD).
Consider using several of the following elements in file names:
- Project name, number, or acronym
- Creator surname and initials
- Name of research team/department associated with the data
- File version number
- Date of creation
- Date experiment undertaken
- Description of content
- Publication date
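One way to keep the same elements in the same order and format is to assemble names with a small helper rather than typing them by hand. The Python sketch below assumes one possible order: project, creator, description, date, and version. The helper `build_file_name` and the sample values are hypothetical.

```python
from datetime import date


def build_file_name(project, creator, description, created, version, ext):
    """Join naming elements in a fixed (hypothetical) order: project_creator_description_YYYYMMDD_vNN.ext"""
    parts = [
        project,
        creator,
        description,
        created.strftime("%Y%m%d"),  # YYYYMMDD sorts chronologically as plain text
        f"v{version:02d}",           # zero-padded so v10 sorts after v09
    ]
    return "_".join(parts) + "." + ext


print(build_file_name("SMB", "SmithJ", "pH", date(2024, 3, 7), 2, "csv"))
# -> SMB_SmithJ_pH_20240307_v02.csv
```

Because the order lives in one place, a change to the convention only needs to be made once. The resulting names will then stay consistent across the project.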
Other considerations:
- Keep file names to a manageable length, preferably 25 characters or fewer
- Do not give files the same name as the folder in which they reside
- Avoid unusual characters, such as ! – @ # $ % ^ & * ( ) [ ] { } + ? > <
- Avoid spaces. In place of spaces between words, use one of the following methods:
  - Capitalize the first letter of each word:
    - ProjectAcronymLastNameFirstNameTopic.txt
    - ProjectAcronymTopicOfDocumentDate.pdf
  - Use an underscore between words:
    - ProjectAcronym_last_name_first_name_topic.txt
    - ProjectAcronym_topic_of_document_date.pdf
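The considerations above can also be checked automatically. This Python sketch flags three kinds of problem: names that are too long, names that contain spaces or the characters listed above, and names that repeat the name of their folder. It assumes the 25-character limit applies to the name without its extension. `check_file_name` is a hypothetical helper, not part of any standard tool.

```python
from pathlib import PurePath

# Characters the guideline lists as unusual (including the en dash), plus the space character.
FORBIDDEN = set("!–@#$%^&*()[]{}+?<> ")
MAX_LEN = 25  # assumption: counted on the name without its extension


def check_file_name(path):
    """Return a list of problems with the file name at `path` (empty if none)."""
    p = PurePath(path)
    problems = []
    if len(p.stem) > MAX_LEN:
        problems.append(f"name longer than {MAX_LEN} characters")
    bad = sorted(set(p.name) & FORBIDDEN)
    if bad:
        problems.append("avoid characters: " + " ".join(repr(c) for c in bad))
    # Compare against the parent folder only when the path includes one.
    if p.parent.name and p.stem == p.parent.name:
        problems.append("same name as its folder")
    return problems
```

For instance, `check_file_name("Results/Results.txt")` reports that the file shares its folder's name. `check_file_name("my data (final).csv")` reports the space and the parentheses.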
If, partway through a project, you need to rename a large number of files to conform to the naming convention you have adopted, consider using a bulk renaming tool rather than renaming the files by hand. A number of such tools are available to make this process easier.
Examples of file renaming tools:
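The Python script below is a minimal sketch of what a bulk renaming tool does. It replaces spaces with underscores and drops any character other than ASCII letters, digits, underscores, hyphens, and periods. That cleaning rule is an assumption chosen for illustration, and it also removes accented letters. By default the script only prints its plan. Files are renamed only when `dry_run=False` is passed. The function names are hypothetical.

```python
import re
from pathlib import Path


def conform(name):
    """Hypothetical rule: spaces become underscores; other unusual characters are dropped."""
    name = name.replace(" ", "_")
    # Keep only ASCII letters, digits, underscore, hyphen, and period.
    return re.sub(r"[^A-Za-z0-9_.-]", "", name)


def bulk_rename(folder, dry_run=True):
    """Rename files in `folder` to conforming names; with dry_run=True, only report the plan."""
    planned, taken = [], set()
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        new_name = conform(path.name)
        if new_name == path.name:
            continue  # already follows the convention
        if not new_name.strip("."):
            print(f"skip {path.name!r}: nothing usable left after cleaning")
            continue
        target = path.with_name(new_name)
        # Never overwrite an existing file or let two files claim the same new name.
        if target.exists() or new_name in taken:
            print(f"skip {path.name!r}: {new_name!r} is already taken")
            continue
        taken.add(new_name)
        planned.append((path.name, new_name))
        if dry_run:
            print(f"would rename {path.name!r} -> {new_name!r}")
        else:
            path.rename(target)
    return planned
```

Running `bulk_rename("data")` first and reading the printed plan lets you catch surprises before any file changes name. Run `bulk_rename("data", dry_run=False)` only after that review.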
Versioning
Versioning, or version control, refers to the management of file revisions. Versioning assists researchers in managing data during a project where experimentation, revisions, and re-examinations are undertaken. Text files as well as data files may undergo numerous changes before the final version is set.
Versioning mechanisms, such as directory structure and file naming conventions, help users distinguish between versions of a dataset and its accompanying files.
Researchers should also consider discarding obsolete versions of files, but should think carefully about possible future uses of a file before discarding it. In some instances, keeping backup copies of earlier versions may be advisable.
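When versions are tracked through file names, a short script can find the highest existing version number and suggest the next name. The Python sketch below assumes a hypothetical `<stem>_vNN.<ext>` pattern, for example `survey_v03.csv`. It only lists older versions as candidates for discarding, and it leaves the decision (and any deletion) to you.

```python
import re
from pathlib import Path


def versions(folder, stem, ext):
    """Map version number -> path for files named '<stem>_v<NN>.<ext>' (hypothetical pattern)."""
    pattern = re.compile(rf"{re.escape(stem)}_v(\d+)\.{re.escape(ext)}")
    found = {}
    for path in Path(folder).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            found[int(match.group(1))] = path
    return found


def next_version_name(folder, stem, ext):
    """Name for the next version: one higher than the highest existing number."""
    n = max(versions(folder, stem, ext), default=0) + 1
    return f"{stem}_v{n:02d}.{ext}"  # zero-padded so names sort in version order up to v99


def obsolete_versions(folder, stem, ext, keep=2):
    """List (without deleting) all but the `keep` most recent versions."""
    found = versions(folder, stem, ext)
    numbers = sorted(found)
    return [found[n] for n in numbers[: max(len(numbers) - keep, 0)]]
```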
A number of tools are available for file versioning, including:
Backing Up Files
Backing up files means creating copies of them. These copies should reside in a separate physical location from the working or stored files. A regular backup schedule reduces the risk of data loss, and the backup copies can be used to restore damaged or lost original files.
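A scripted backup can also confirm that each copy matches its original. The Python sketch below copies a project folder into a new timestamped folder and then compares the SHA-256 checksum of every copied file with its source. The `backup_root` argument is assumed to point to a separate physical location, such as an external or network drive, and the script cannot check that for you. The function names are hypothetical.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha256(path):
    """Checksum a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def back_up(source, backup_root):
    """Copy `source` into <backup_root>/<name>_<YYYYMMDD_HHMMSS>/ and verify every file."""
    source = Path(source)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = Path(backup_root) / f"{source.name}_{stamp}"
    # copytree copies contents and metadata (shutil.copy2 by default)
    # and raises an error if `dest` already exists.
    shutil.copytree(source, dest)
    for src_file in source.rglob("*"):
        if src_file.is_file():
            copy = dest / src_file.relative_to(source)
            if sha256(src_file) != sha256(copy):
                raise RuntimeError(f"checksum mismatch: {copy}")
    return dest
```

Running the script on a schedule, for example with the operating system's task scheduler, produces a series of dated copies. Two runs within the same second would collide, because `shutil.copytree` will not copy into an existing folder.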
Acknowledgements
We would like to thank the UK Data Service for use of their training materials in the creation of these modules.
We would also like to thank EDINA and the Data Library at the University of Edinburgh for use of materials from the Research Data MANTRA [online course] in the creation of these modules.