File management

File names

File names help you identify and find your data, so it is important to give each file a meaningful name. File names should be unique (within your own file-naming system), and the naming conventions you choose should be applied consistently to all files. It is also useful to include the date (in YYYYMMDD format) in the name of every backup file so that:

  1. you can easily distinguish it from the working document
  2. you can quickly determine when the backup file was created. (This will also enable you to identify which backup is the latest, if more than one exists.)
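As an illustration of the date convention above, here is a minimal Python sketch that builds a backup file name with a YYYYMMDD stamp. The function name `backup_name`, the underscore separator and the placement of the date just before the extension are assumptions made for this example, not part of the guidance itself.

```python
from datetime import date
from pathlib import Path


def backup_name(filename, on=None):
    """Return a backup name with a YYYYMMDD date inserted before the extension.

    `on` is the backup date; it defaults to today's date.
    The "_" separator is an illustrative choice.
    """
    p = Path(filename)
    stamp = (on or date.today()).strftime("%Y%m%d")
    return f"{p.stem}_{stamp}{p.suffix}"


# Example: a backup of "survey-results.csv" made on 5 March 2024
print(backup_name("survey-results.csv", date(2024, 3, 5)))
```

Because the YYYYMMDD format puts the year first, sorting backups of the same file alphabetically also sorts them chronologically, so the latest backup appears last.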

Characters in Filenames

Observe the following conventions when naming files:

  • Filenames should start and end only with an alphanumeric character in the ranges [A-Z], [a-z] and [0-9].
  • Avoid starting or ending your filename with a non-alphanumeric character such as a space, period, hyphen or underscore character.
  • Avoid using spaces in filenames. Instead, use hyphens (-) or underscores (_) to separate words in your filename.
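The conventions above can be checked automatically. The following is a rough Python sketch, not an official tool: the function name `filename_problems` and the wording of the messages are made up for this example, and it assumes the start and end rules apply to the whole name, including any extension.

```python
import re


def filename_problems(name):
    """Return a list of the conventions that `name` breaks (empty if none)."""
    problems = []
    # First character must be in [A-Z], [a-z] or [0-9].
    if not re.match(r"[A-Za-z0-9]", name):
        problems.append("does not start with an alphanumeric character")
    # Last character must be in [A-Z], [a-z] or [0-9].
    if not re.search(r"[A-Za-z0-9]\Z", name):
        problems.append("does not end with an alphanumeric character")
    # Spaces should be replaced with hyphens or underscores.
    if " " in name:
        problems.append("contains a space")
    return problems


for candidate in ["field-notes_2024.txt", "_draft report.txt", "data."]:
    print(candidate, "->", filename_problems(candidate) or "OK")
```

A name such as "field-notes_2024.txt" passes all three checks, while "_draft report.txt" is flagged for its leading underscore and its space.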

Data versioning

In the process of managing and analyzing research data, it is often necessary to change the original dataset in some way. For example, an error in the data may need correcting; the data may need to be re-processed to include new calculations; or additional data may be generated and appended to the original dataset.

Researchers may be tempted to make these changes by simply overwriting the original file. This is dangerous: if an error is made at any point, some of the original data could be lost, and it may be necessary to collect the data again from scratch or, if that isn't possible, to abandon the research project altogether. For this reason, it's important to retain older versions of the dataset when changes are made. Furthermore, in order to justify or replicate the findings of your research project, it's vital to know which version of the dataset was used to reach the conclusions in the final paper.

To manage and keep track of your different dataset versions, it's recommended that you add a version number to the file name of each version. There is no set way of doing this, but a common practice is to add the version number to the end of the file name, prefixed by an underscore and the letter "v" (e.g. "_v1", "_v2", "_v3", etc.). This lets you and other researchers easily recognize the different dataset versions, and in particular identify the latest version, well into the future.
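The "_v" convention is easy to apply in a script. The Python sketch below is one possible approach under stated assumptions: the helper names `next_version_name` and `latest_version` are hypothetical, the version suffix is assumed to sit immediately before the file extension, and a file with no suffix is treated as unversioned, so its first saved version becomes "_v1".

```python
import re
from pathlib import Path

# Matches a stem that ends in "_v" followed by digits, e.g. "survey_v2".
_VERSION = re.compile(r"^(?P<base>.+)_v(?P<num>\d+)$")


def next_version_name(filename):
    """Return the file name for the next version, leaving the original untouched."""
    p = Path(filename)
    m = _VERSION.match(p.stem)
    if m:
        base, num = m.group("base"), int(m.group("num")) + 1
    else:
        base, num = p.stem, 1  # assumption: unversioned files start at _v1
    return f"{base}_v{num}{p.suffix}"


def latest_version(filenames):
    """Return the name with the highest _vN suffix, or None if none is versioned."""
    versioned = []
    for name in filenames:
        m = _VERSION.match(Path(name).stem)
        if m:
            versioned.append((int(m.group("num")), name))
    return max(versioned)[1] if versioned else None


print(next_version_name("survey_v2.csv"))
print(latest_version(["survey_v2.csv", "survey_v10.csv", "survey_v9.csv"]))
```

The version numbers are compared as integers rather than as text, because a plain alphabetical sort would place "_v10" before "_v2". Saving each change under the name returned by `next_version_name`, instead of overwriting the existing file, keeps every older version available.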

The Australian National Data Service (ANDS) has more information about this on their Data Versioning page.