Data Documentation

Data documentation provides important information about your data, including how it was created or collected, what it contains, and how it was processed. Your documentation should contain any information needed to understand, interpret, and use your data. This may include:

  • Administrative information about the data 
  • How data was collected or generated
  • Sampling techniques and experimental protocols 
  • Methods used for data processing and analysis
  • Information about software or instruments used to create, process, or analyze the data, including software version numbers
  • Any definitions needed to understand the data (e.g., variable names, terms, and acronyms)
  • Any contextual information required to understand the data 
  • Any known issues with the data
  • Any other information that allows you, your research group, or others to work with the data

Creating and updating data documentation throughout a project will help you keep track of which actions have been performed on your data. This can be useful for tracking progress, confirming results, and identifying and correcting errors. Good documentation also supports collaboration by standardizing how research group members are expected to work with the data.

Data documentation is also an important component of data sharing and reuse. For more information, see Preparing for deposit.

Common examples of data documentation

  • README file: a plain text file that provides information about the content and organization of the files it accompanies. A README may include information about data processes, including actions performed on specific files. Your project may include multiple README files, such as one for every data file or group of data files.
  • Codebook: documentation that defines and provides detailed information about dataset variables, such as definitions, values, ranges, and units of measurement.
  • Data dictionary: documentation that provides information about the components in a dataset or database. It includes information about the structure of tables, fields (columns), relationships, constraints, and usage instructions.
  • Lab notebooks or observational notes: documents used to record observations or experiments. These may include information about methods, context, annotations, calculations, and results.
  • Metadata: structured information about a dataset. Metadata is particularly important for data sharing and reuse.
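
As an illustration of the codebook item above, the following is a minimal Python sketch of how a draft codebook could be generated from a tabular data file. It is one possible approach, not a required method. The sample data, the column names (site_id, temp_c, species), and the function names build_codebook and _as_numbers are hypothetical and exist only for this example. The script can fill in value ranges and missing-value counts, but definitions and units still have to be written by someone who knows the data.

```python
import csv
import io

# Hypothetical sample data; replace with the contents of your own file.
SAMPLE_CSV = """site_id,temp_c,species
A1,12.5,oak
A2,,pine
B1,9.0,oak
"""


def _as_numbers(values):
    """Return the values as floats, or None if any value is not numeric."""
    try:
        return [float(v) for v in values]
    except ValueError:
        return None


def build_codebook(csv_text):
    """Return one draft codebook entry (a dict) per column in csv_text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    codebook = []
    for name in reader.fieldnames or []:
        # Treat empty cells (and cells absent from short rows) as missing.
        present = [r[name] for r in rows if r[name] not in ("", None)]
        entry = {
            "variable": name,
            "definition": "TODO",  # to be written by someone who knows the data
            "units": "TODO",
            "missing": len(rows) - len(present),
        }
        numbers = _as_numbers(present)
        if numbers:
            # Every non-missing value parsed as a number: record the range.
            entry.update(type="numeric", min=min(numbers), max=max(numbers))
        else:
            # Otherwise list the distinct values that occur.
            entry.update(type="text", values=sorted(set(present)))
        codebook.append(entry)
    return codebook


if __name__ == "__main__":
    for entry in build_codebook(SAMPLE_CSV):
        print(entry)
```

A draft produced this way is only a starting point: the TODO fields mark where a person needs to supply the definition and units for each variable, and listing every distinct value is practical only for columns with a small number of categories.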

Resources

Library services
The library provides support for:

  • Understanding what information to include in data documentation
  • Developing good data documentation

External resources