Durable file formats
To ensure the digital preservation of your data, store it in files that have a durable file format. A file format is considered 'durable' if files saved in it can be opened by others, or by your future self, using readily available programs well beyond the duration of the research project.
Qualities of durable file formats
Durable file formats have the following qualities:
- File format is endorsed by recognised standards agencies (e.g. Standards Australia; ISO).
- Software required to read the file format is open and non-proprietary.
- File format specifications are well-documented and readily available.
- File format is widely used within your research discipline.
- File format is self-documenting. That is, useful metadata about a file is saved within the file itself. This means that when the file is moved, any metadata stored within the file properties travels with it.
Where possible, data should be stored in a well-documented, widely-used, plain-text, open-standard format. Data stored in such a format will generally be human-readable (able to be opened and ‘naturally read’ in a text editor) and it will be more likely to be machine-readable by a variety of software. Conversely, data stored in a binary, closed, proprietary format is neither human-readable in a text editor nor readily or fully machine-readable other than through the use of the proprietary software in which the data was generated.
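As a rough illustration of the human-readable distinction, the following Python sketch checks whether a file looks like plain text. The function name `looks_like_plain_text` and the 8192-byte sample size are assumptions made for this example. The check is only a heuristic: it treats "decodes as UTF-8 and contains no NUL bytes" as a stand-in for "plain text", and it says nothing about whether the format is open or well documented.

```python
import codecs


def looks_like_plain_text(path: str, sample_size: int = 8192) -> bool:
    """Heuristic: True if the start of the file decodes as UTF-8 with no NUL bytes."""
    with open(path, "rb") as fh:
        sample = fh.read(sample_size)
    # NUL bytes are common in binary formats and rare in text files.
    if b"\x00" in sample:
        return False
    # An incremental decoder tolerates a multi-byte character that is cut off
    # at the end of the sample, but still rejects invalid byte sequences.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

A .CSV file would normally pass this check, whereas an image or a pre-2007 Office document would normally fail it.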
Examples of durable file formats
Common durable formats include:
- Text documents - .TXT; .DOCX
- Spreadsheets/tabular data - .CSV; .XLSX
- Web pages - .HTML; .XML/.XSLT
- Images - .PNG; .JPG; .TIFF
- Audio files - .FLAC; .MP3; .WAV
- Video files - .MP4; .AVI; .MPG
Examples of closed proprietary file types are those used in the pre-2007 Microsoft Office suite, which have been replaced with open (archive) XML equivalents:
- .doc ⇒ .docx (Word)
- .xls ⇒ .xlsx (Excel)
- .ppt ⇒ .pptx (PowerPoint)
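One way to see the difference is that the newer Office formats are ZIP archives containing XML files, so their contents can be inspected with general-purpose tools. The following is a minimal Python sketch of that idea; the function name `list_xml_parts` is hypothetical and chosen only for this example.

```python
import zipfile


def list_xml_parts(path: str) -> list[str]:
    """Return the names of the XML parts stored inside a .docx, .xlsx or .pptx file."""
    # The newer Office formats are ordinary ZIP archives, so the standard
    # zipfile module can open them without any Office software installed.
    with zipfile.ZipFile(path) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".xml"))
```

For a Word document, the result would typically include an entry such as `word/document.xml`, which holds the main text. The older .doc, .xls and .ppt files are not ZIP archives, so `zipfile.ZipFile` would raise an error when asked to open them.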
Converting files to a durable format
You may need to use software that saves data in a less durable format because of discipline-specific or other requirements. For example, you may be constrained to using a specific program or operating system to capture or generate your data. Where this is so, see if you can export your data to a more durable format without loss of data integrity. Otherwise, document these limitations and record the hardware and software requirements needed to use or reuse your data.
For databases, export ('dump') the contents to plain text.
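As a hedged illustration of a database dump, the sketch below assumes the data is held in an SQLite database and uses the `iterdump()` method of Python's standard `sqlite3` module to write the SQL statements needed to rebuild it. The function name `dump_sqlite_to_sql` and the file names in the usage note are hypothetical; other database systems provide their own export tools, which are not covered here.

```python
import sqlite3
from pathlib import Path


def dump_sqlite_to_sql(db_path: str, out_path: str) -> int:
    """Write the database as plain-text SQL and return the number of statements."""
    conn = sqlite3.connect(db_path)
    try:
        # iterdump() yields the CREATE and INSERT statements, wrapped in a
        # transaction, that are needed to recreate the database contents.
        statements = list(conn.iterdump())
    finally:
        conn.close()
    Path(out_path).write_text("\n".join(statements) + "\n", encoding="utf-8")
    return len(statements)
```

Calling `dump_sqlite_to_sql("survey.sqlite", "survey.sql")` would produce a text file that can be read in any text editor and reloaded into SQLite later. It is worth keeping the original database alongside the dump until you have confirmed that the dump restores correctly.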