Structuring your data
You can deposit your research data directly into U of T Dataverse, or you can create a sub-dataverse collection for your datasets (go to adding data for more information). If you only want to publish a single dataset, it may be easier to deposit it directly into U of T Dataverse. If you're depositing many related datasets and want to keep them together in one place, you may want to create your own dataverse collection within U of T Dataverse.
- Think about how you want to use your dataverse collection over time. If you decide to create a dataverse collection for your data, think about who it is for and how you want to use it. For example, is this dataverse collection for your own personal research data, a project team or lab, or for a specific research project?
- Think about how your research is structured. Depending on your research, you may choose to organize your datasets by publication, project, topic, date, data type, or some other organizing principle that makes sense for your data.
- Consider how others will search for and navigate your data. Organize and label your dataverse collections, datasets, and files so that someone outside of your research team could easily understand and navigate your data. This may include using file tags to clearly indicate what each file is and, if relevant, limiting the number of nested dataverse collections you have.
- Preserve or create hierarchies. If you want to maintain an existing file hierarchy, you can zip your folder before adding it to your dataverse collection. U of T Dataverse will automatically unzip your files while maintaining your file structure in the metadata. You can also add a folder hierarchy to your files after they’ve been uploaded by editing the File Path in the file-level metadata. Go to updating metadata for more information.
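The zipping step in the last point can be scripted. The following is a minimal sketch using only the Python standard library; the function name `zip_folder` and the example folder names are illustrative assumptions, not part of U of T Dataverse. It stores each file under its path relative to the top-level folder, so the subfolder structure is kept inside the archive.

```python
import zipfile
from pathlib import Path


def zip_folder(folder, archive):
    """Zip every file under `folder` into `archive`, keeping relative paths.

    Returns the list of paths as they are stored inside the archive.
    """
    folder = Path(folder)
    stored = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store the path relative to the top folder so the
                # subfolder hierarchy is preserved inside the zip file.
                arcname = path.relative_to(folder).as_posix()
                zf.write(path, arcname)
                stored.append(arcname)
    return stored
```

For example, `zip_folder("project", "project.zip")` would store a file at `project/data/raw/survey.csv` as `data/raw/survey.csv`. Write the archive to a location outside the folder being zipped so that it does not end up including itself.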
Preparing your dataset for deposit
It’s important that your files are organized and understandable before you deposit them. This will make it easier for others to find, access, and reuse your data. Even if you plan to restrict access to your dataset, taking the time to organize and describe your data will ensure you receive relevant requests for access and that you can understand and interpret your data years from now. Some things to consider when preparing your data for deposit include:
- Make sure your files are clean and usable. This may include ensuring variable names and labels are consistent, removing duplicate or erroneous data, and removing any temporary or administrative notes not meant for external users.
- Use consistent and meaningful file names. Applying a consistent file naming structure will help you and future users stay organized and identify a file’s contents without opening it. Think about what information would be useful to an external user when browsing your files and, if necessary, consider renaming your files to support user access and data reusability. Avoid using long file names (i.e., over 32 characters), special characters, and spaces to ensure files can be easily used with different software.
- Include data documentation. Data documentation provides information about your data, including how it was created/collected, processed, and analyzed. This information allows future users to understand, interpret, and use your data. Some examples of data documentation include README files, codebooks, and data dictionaries.
- Consider including additional non-proprietary copies. If your dataset contains proprietary file types (e.g., DOC, XLS), consider adding an additional copy in a non-proprietary format (e.g., TXT, CSV). This will make it easier for researchers to access your files now and in the future without requiring access to specific proprietary programs.
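The file naming advice above can be checked automatically before a deposit. This is a minimal sketch, not an official validation rule: the 32-character limit comes from the guidance above, while the set of "safe" characters (letters, digits, periods, underscores, and hyphens) and the function name `file_name_problems` are assumptions made for illustration.

```python
import re

MAX_LENGTH = 32  # length limit suggested in the guidance above

# Assumption: "safe" characters are letters, digits, periods,
# underscores, and hyphens. Adjust this to suit your own conventions.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]*")


def file_name_problems(name):
    """Return a list of naming problems found in a single file name."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("longer than 32 characters")
    if " " in name:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them in this check.
    if not SAFE_NAME.fullmatch(name.replace(" ", "")):
        problems.append("contains special characters")
    return problems
```

A name such as `survey_2023_clean.csv` returns an empty list, while `final data (v2).xlsx` is flagged for both spaces and special characters.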
Creating deposit workflows
A deposit workflow is a set of instructions outlining what needs to happen for data to be deposited. A deposit workflow can be manual or automated. If you or your team will be regularly adding or updating data files, establishing a deposit workflow will help ensure your data files are described and deposited in a consistent manner.
You can use the Dataverse APIs to automate parts of your workflow, including creating and publishing a dataverse collection or dataset, uploading files, managing permissions, and downloading datasets and metadata. Go to the advanced guide for more information on using the Dataverse APIs.
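As an illustration of an automated step, the sketch below prepares (but does not send) a request to create a dataset in a dataverse collection. It assumes the Dataverse native API endpoint `/api/dataverses/{alias}/datasets` and the `X-Dataverse-key` header for the API token; confirm both against the API guide for your installation. The base URL, collection alias, token, and function name are hypothetical placeholders, and the metadata must follow the dataset JSON format described in the API documentation.

```python
import json
import urllib.request


def build_create_dataset_request(base_url, collection_alias, api_token, dataset_metadata):
    """Build a POST request that would create a dataset in a collection.

    Assumption: the installation exposes the Dataverse native API at
    {base_url}/api and accepts the API token in the X-Dataverse-key header.
    """
    url = f"{base_url.rstrip('/')}/api/dataverses/{collection_alias}/datasets"
    body = json.dumps(dataset_metadata).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Dataverse-key": api_token,
            "Content-Type": "application/json",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. Building the request separately makes it easy to review the URL and payload before anything is deposited.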
Licensing
Applying a license to your dataset lets users know how they can use your data and under what conditions. U of T Dataverse allows you to apply Creative Commons licenses to your datasets (e.g., CC BY, CC BY-SA). You can use the Creative Commons License Chooser to help determine what license to apply to your data. U of T Dataverse also allows you to apply a custom license that may be more appropriate for your data, such as an Open Data Commons license.
Making your data discoverable
Improving the discoverability of your data can increase the reach and impact of your research, promote research integrity, and contribute to knowledge production in your field. It is also an important part of making your data FAIR (Findable, Accessible, Interoperable, and Reusable). Visit the FAIR Principles website to learn more.
U of T Dataverse offers a number of features that help make your data discoverable, including assigning a DOI to every published dataset and allowing data to be indexed by platforms such as Google, Lunaris, DataCite, and other dataset indexes.
Some additional ways to improve data discoverability:
- Describe your data with rich, standardized metadata. This may include inputting a detailed description of your dataset, adding keywords, and using controlled vocabularies. U of T Dataverse also allows you to input discipline-specific metadata, including geospatial metadata, social science and humanities metadata, astronomy and astrophysics metadata, and life sciences metadata. Go to the Metadata Best Practices Guide for more information and best practices.
- Choose your keywords carefully. Think about what search terms researchers in your field would use to find your data. Consider including both broader and more granular terms (e.g., “Sustainable transportation” and “Cycling”) as well as synonymous terms (e.g., “Public transportation” and “Mass transit”). Note that the keyword field is case-sensitive, so use sentence case when entering terms (i.e., capitalize only the first word and any proper nouns). Go to the Metadata Best Practices Guide for more information.
- Link your data to related outputs. U of T Dataverse allows you to link your dataset to related outputs, publications, and datasets through the Related Publication, Related Material, and Related Dataset fields in the dataset metadata. This can be done through a formatted citation, a URL, or a variety of identifiers (e.g., DOI, ISSN, arXiv, Handle).