File Formats for Long-Term Access

The file formats you use affect how well your data can be preserved. This includes which types of files you archive and how authorized users will be able to access them in the future.

    Guidelines for selecting file formats

    While it is hard to give a specific recommendation for every type of data you may have, good long-term file formats typically have the following characteristics:

    • Non-proprietary
    • Open, documented standard
    • Common usage by the research community
    • Standard representation (e.g. ASCII, Unicode)
    • Unencrypted
    • Uncompressed
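    One rough way to check the "standard representation" criterion for a text-based file is to test whether its bytes decode cleanly as ASCII or UTF-8. The sketch below is a hypothetical helper (`detect_text_encoding` is an illustrative name, not part of any preservation tool). A clean decode does not prove a file is meaningful text, and a failed decode may simply mean the file uses another encoding, so treat the result as a hint rather than a verdict.

```python
def detect_text_encoding(data: bytes):
    """Hypothetical helper: report whether raw bytes are plain ASCII or UTF-8.

    Returns "ascii", "utf-8", or None if neither decoding succeeds.
    ASCII is tried first because every ASCII file is also valid UTF-8.
    """
    for encoding in ("ascii", "utf-8"):
        try:
            data.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            # Not valid in this encoding; try the next one.
            continue
    return None
```

    To check a file on disk, pass its bytes, e.g. `detect_text_encoding(pathlib.Path("data.txt").read_bytes())`.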

    Regardless of which standards or formats you use, you should also document in a README file which software is needed to access and use the data. That software may eventually become a legacy system; however, if the file format is an open standard such as .TXT, the data will likely still be recoverable.
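    The README note described above can be as simple as a plain-text list pairing each file format with the software needed to open it. The following is a minimal sketch assuming you keep that information in a dictionary; `build_readme` and the example software entries are hypothetical and should be replaced with what your project actually uses.

```python
def build_readme(software_by_format: dict) -> str:
    """Hypothetical helper: build a plain-text README section listing the
    software needed to open each file format in a dataset."""
    lines = ["Software needed to access the files in this dataset", ""]
    # Sort by format so the README is stable across runs.
    for fmt, software in sorted(software_by_format.items()):
        lines.append(f"{fmt}: {software}")
    return "\n".join(lines) + "\n"


# Illustrative entries only; list the software your data actually requires.
readme_text = build_readme({
    ".sav": "SPSS Statistics or GNU PSPP",
    ".csv": "any text editor or spreadsheet program",
})
```

    Writing the result as a plain UTF-8 text file (for example with `pathlib.Path("README.txt").write_text(readme_text, encoding="utf-8")`) keeps the README itself in a durable format.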

    Examples of preferred file formats

    Note that this is not an exhaustive listing:

    • Audio: WAV, AIFF, MP3
    • Image: TIFF
    • Spreadsheet: CSV
    • Statistics: ASCII, DTA, POR, SAS, SAV
    • Text: ODF (OpenDocument Format), PDF/A, ASCII
    • Video: MOV, MPEG, AVI
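    Before depositing a dataset, it can help to flag files whose extensions do not correspond to any format on the list above. The sketch below assumes a hand-built mapping from those formats to file extensions (for example, treating ASCII as .txt and ODF as .odt/.ods/.odp); both the mapping and the `flag_nonpreferred` name are illustrative assumptions. An extension check cannot confirm that a .pdf actually conforms to PDF/A, so flagged and unflagged files alike still deserve a manual look.

```python
from pathlib import Path

# Assumed extensions for the preferred formats listed above; adjust as needed.
PREFERRED_EXTENSIONS = {
    ".wav", ".aif", ".aiff", ".mp3",         # audio
    ".tif", ".tiff",                         # image
    ".csv",                                  # spreadsheet
    ".txt", ".dta", ".por", ".sas", ".sav",  # statistics / plain ASCII
    ".odt", ".ods", ".odp", ".pdf",          # text (PDF/A not verified here)
    ".mov", ".mpeg", ".mpg", ".avi",         # video
}


def flag_nonpreferred(filenames):
    """Hypothetical helper: return, sorted, the filenames whose extension
    (compared case-insensitively) is not in PREFERRED_EXTENSIONS."""
    return sorted(
        name for name in filenames
        if Path(name).suffix.lower() not in PREFERRED_EXTENSIONS
    )
```

    For example, `flag_nonpreferred(["survey.CSV", "notes.docx", "scan.tif"])` flags only `notes.docx`, suggesting it may be worth converting to ODF or PDF/A before archiving.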

    More information:

    Some resources for identifying good long-term preservation formats include: