U of T Dataverse
Click here to go to U of T Dataverse
Borealis, the Canadian Dataverse Repository, is a bilingual, multi-disciplinary, secure, Canadian research data repository, supported by academic libraries and research institutions across Canada. Borealis supports discovery, management, sharing, and preservation of Canadian research data. U of T operates as the national service provider for Borealis.
U of T Dataverse is the University of Toronto’s institutional data repository in Borealis. It is a free, secure space where U of T researchers can deposit and share their research data. The repository accepts research data from any discipline, provided the research was conducted at or under the auspices of the University of Toronto.
Is U of T Dataverse the best repository for my data?
Choosing the right repository for your data will help ensure it is preserved, discoverable and accessible. Below are some considerations to help you decide if U of T Dataverse is the right repository for you.
Considerations
- Disciplinary data. Depending on your field of study, there may be a disciplinary data repository that is a better fit for your data. If there is a repository that is commonly used in your discipline, you may want to deposit your data there so it will be discovered by other researchers in your field.
- Confidential and sensitive data. At this time, U of T Dataverse is not an appropriate repository for sensitive or confidential data. Currently, all files must be anonymized or de-identified before being deposited.
- File size. The maximum file size for individual file uploads is 3 GB. A dataset containing multiple files can exceed 3 GB as long as each individual file is 3 GB or less. If you are working with larger files go to depositing large files.
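Before depositing, it can help to check a folder for files that exceed the per-file limit. The following is a minimal sketch rather than an official tool: the function name is hypothetical, and it assumes that 3 GB means 3 × 1024³ bytes, which the guide does not specify.

```python
from pathlib import Path

# Per-file limit described above. Assumption: 3 GB is read as 3 * 1024**3 bytes.
MAX_FILE_BYTES = 3 * 1024**3


def find_oversized_files(folder, limit_bytes=MAX_FILE_BYTES):
    """Return the files under `folder` (searched recursively) larger than the limit."""
    oversized = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.stat().st_size > limit_bytes:
            oversized.append(path)
    return oversized
```

Any file returned by `find_oversized_files("my_dataset")` would need the large-file handling described under depositing large files.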
Features
- Data types. U of T Dataverse accepts all types of files, including documents, images, video, audio, tabular files, compressed files, and more.
- Persistent identifiers. All published datasets receive a DOI (Digital Object Identifier), making it easy to share and cite your data (e.g., https://doi.org/10.5683/SP3/SCB4JJ).
- Automatically generated citations. U of T Dataverse automatically generates a recommended citation for all datasets. This makes it easy for future users to properly cite your data and ensures you are properly credited for your research.
- Preservation. All data is stored on secure servers located at U of T and undergoes monthly integrity checks to protect against data loss and corruption.
- Discoverability. Published datasets are automatically indexed for discovery, meaning they can be discovered through platforms like Google, the Canadian Federated Research Data Repository, DataCite, and other data indexes. To learn more about making your data discoverable go to making your data discoverable.
- Controlled access. U of T Dataverse allows you to share your data openly or restrict access at the dataset and file level. To learn more about restricting access go to access restrictions.
Getting started with U of T Dataverse
The demo repository
Borealis offers a demo repository where you can try out the platform and its functionality. The demo repository has all the same features as U of T Dataverse and is a great way to get familiar with the platform without worrying about making mistakes. It allows you to practice things like creating a dataverse, adding data, organizing your data, restricting files, and seeing how things will look when published. Please note that we recommend using a dummy dataset in the demo repository.
You will need to create a separate account to log in to the demo repository.
Creating an account
- Go to U of T Dataverse
- Click Log in in the top right corner
- Select University of Toronto from the drop-down menu
- Enter your UTORid and password
Adding data
Collections and datasets
U of T Dataverse is organized into collections and datasets.
- A collection (also known as a “dataverse collection”) is a container for datasets and other collections. The first step in depositing your data will be to create your own collection.
- A dataset is a container for a set of research data, documentation, and code. Datasets contain files and metadata that describe those files. A dataset will always reside inside a collection.
Creating a collection
You will need to create your own collection in U of T Dataverse where you can deposit your data. Do not add your data directly into U of T Dataverse.
You can create a collection for your personal use, a project team or lab, a specific research project, or any other organizing principle that makes sense for you. Go to organizing your collection for guidance on how to structure and organize your collection.
To create a collection:
- Go to U of T Dataverse
- On the right side, click Add Data
- Select New Dataverse
- Fill in the appropriate information
- Click Create Dataverse
You have now created your collection. At this point your collection is unpublished and will only be visible to you. Go to publishing your data for instructions on making your collection available to others.
Adding a dataset
- Go to your collection
- On the right side, click Add Data
- Select New Dataset
- Fill in the appropriate information (note: go to licensing for guidance on choosing a license)
- Add your data files and documentation by clicking Select Files to Add or by dragging and dropping your files
- Click Save Dataset
Note that you can upload up to 1000 files at a time. If your dataset has over 1000 files you can upload them in separate sessions, or you can use the DVUploader, a command-line bulk uploader.
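If you plan to upload in separate sessions, you can split your file list into groups ahead of time. This is a small sketch with a hypothetical helper name; it only groups paths and does not upload anything.

```python
def batch_files(file_paths, batch_size=1000):
    """Split a list of file paths into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        file_paths[start:start + batch_size]
        for start in range(0, len(file_paths), batch_size)
    ]
```

For example, a dataset with 2,500 files would be split into three upload sessions of 1,000, 1,000, and 500 files.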
Depositing large files
U of T Dataverse can currently accept individual files that are up to 3 GB. If you need to deposit files that are larger than 3 GB, you can compress them into a ZIP or TAR file. Note that U of T Dataverse will automatically unzip the first level of compression when you add your data files. For this reason, if you are depositing files larger than 3 GB you will need to double zip your file(s), that is, place the ZIP file inside a second ZIP file so that the inner archive stays compressed after upload.
We are currently working to increase the file size limitation. If you are working with a particularly large dataset we recommend contacting rdm@utoronto.ca before uploading your files.
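Double zipping can be scripted. The sketch below uses Python's standard `zipfile` module; the function name and the `_inner.zip` suffix are illustrative choices, not part of the platform. It assumes, as described above, that only the outer layer is unpacked on upload.

```python
import zipfile
from pathlib import Path


def double_zip(source_file, output_zip):
    """Zip `source_file`, then wrap that ZIP in a second ZIP written to `output_zip`."""
    source_file = Path(source_file)
    output_zip = Path(output_zip)
    inner_zip = output_zip.with_name(source_file.stem + "_inner.zip")

    # Inner archive: holds the compressed data file.
    with zipfile.ZipFile(inner_zip, "w", zipfile.ZIP_DEFLATED) as inner:
        inner.write(source_file, arcname=source_file.name)

    # Outer archive: the layer expected to be unpacked on upload.
    # ZIP_STORED avoids recompressing data that is already compressed.
    with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_STORED) as outer:
        outer.write(inner_zip, arcname=inner_zip.name)

    inner_zip.unlink()  # remove the intermediate file
    return output_zip
```

After upload, the dataset would be expected to contain the inner ZIP file rather than the uncompressed original.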
Adding Collaborators
Adding collaborators affiliated with U of T
- Go to your collection
- On the right side, click Edit
- Select Permissions
- Click the Users/Groups heading
- Click Assign Roles to Users/Groups
- In the User/Group field, start typing the name of the user you wish to add (if they have created a U of T Dataverse account, their name will automatically appear in the drop-down menu)
- Select the user you wish to add
- Select the role you want to assign to the user
- Click Save Changes
Adding collaborators external to U of T
Only users who are affiliated with U of T and have a UTORid can create a collection in U of T Dataverse. However, once you’ve created a collection, you can add external collaborators. The person you would like to add must first sign up for an account with Borealis.
To register as an external collaborator:
- Go to https://borealisdata.ca/dataverse/dv
- Click Log in in the top right corner
- Underneath Institution not listed? click Sign up
- Register with your preferred email
After the external collaborator has created an account with Borealis, the collection administrator can add them as a collaborator by following the instructions above.
Updating metadata
You can add additional metadata to your collection and your dataset once they’ve been created. Adding metadata will make it easier for future users to search for, access, and reuse your data. For more information on metadata, go to the Metadata Best Practices Guide.
Collection-level metadata
- Go to your collection
- On the right side, click Edit
- Select General Information
- Update the relevant metadata
- Click Save Changes
Dataset-level metadata
- Go to your dataset
- On the right side, click Edit Dataset
- Select Metadata
- Update the relevant metadata
- Click Save Changes
File-level metadata
- Go to your dataset
- Select the file you wish to update from the file list
- Click the three vertical dots to the right of the file name
- Select Metadata
- Update the relevant metadata
- Click Save Changes
Adding file tags
Adding file tags allows you to clearly indicate what a file is (e.g., a data file, documentation, or code).
To add a file tag:
- Go to your dataset
- Select the file you wish to update from the file list
- Click the three vertical dots to the right of the file name
- Select Tags
- Select the appropriate tag from the File Tags drop-down menu
- Optional: insert a custom data tag in the Custom File Tag field and click Apply
- Click Save Changes
Access restrictions
U of T Dataverse allows you to restrict who can view and access your data. This means that while the metadata record of your dataset will be public, users will need to request access in order to view and download restricted files. You can choose to restrict some or all of your files, including documentation.
To apply access restrictions:
- Go to your dataset
- Select the files you wish to restrict from the file list (note: you can select all files by clicking the checkbox at the top of the file list)
- On the right side, click Edit Files
- Select Restrict
- In the Terms of Access field in the pop-up box, enter any information about whether and how users can gain access to the restricted files (go to terms of access for more information)
- If you would like users to be able to request access to restricted files through the U of T Dataverse platform, click the Request Access checkbox (note: in order to restrict access to a file, you will need to either add Terms of Access or enable the Request Access function).
- Click Save Changes
You can also update the terms of access after you’ve restricted the files.
Terms of access
U of T Dataverse allows you to add terms of access to your dataset to explain who can access restricted files, how, and under what conditions (e.g., “only U of T staff and students can access restricted files on completion of a data use agreement”).
To add or edit terms of access:
- Go to your dataset
- On the right side, click Edit Dataset
- Select Terms
- Go to the Restricted Files + Terms of Access tab
- Fill in the appropriate fields (tip: hover your mouse over the blue question mark next to a field’s name to learn more about that field).
- Click Save Changes
Embargos
Note that once you apply an embargo, you will not be able to change the embargo end date. If you need to update this information, you must contact rdm@utoronto.ca.
Adding an embargo to your dataset or to specific data files means those files will not be accessible when a dataset is published until the embargo date has passed. The metadata records of your published dataset, including a DOI, will be available, but users will not be able to preview, access, or request access to the embargoed files.
Users may choose to apply an embargo to indicate that a dataset exists and will be available at a specified future date. This can allow researchers to satisfy journal or funder requirements to publish research data while also protecting a researcher’s data and intellectual property rights.
To add an embargo:
- Go to your dataset
- Select the files you wish to embargo from the file list
- On the right side, click Edit Files
- Select Embargo
- Enter the date you want the embargo to end in the Select the embargo end date field
- Enter a brief reason for the embargo in the Add a reason field (optional)
- Click Save Changes
The embargoed files will automatically be made public once the embargo end date has passed (unless they have also been restricted).
If you want to restrict access to your files and do not need to provide a definitive date that your dataset/files will be made available, we recommend using the access restrictions function.
Publishing your data
Publishing a collection or dataset makes it viewable and searchable by others. Note that once you publish a dataset, it cannot be unpublished.
Publishing a collection
- Go to your collection
- On the right side, click Publish
Publishing a dataset
- Go to your dataset
- On the right side, click Publish
- Select Publish
- In the pop-up window, click Continue
Version management
U of T Dataverse automatically tracks all changes to your published dataset, including any changes to your files and metadata. Any changes to a published dataset are recorded in the Versions tab as a minor or major release. A minor release includes small metadata changes that do not impact the file/dataset citation. A major release includes file updates, large metadata changes, or citation changes. Note that replacing a file in a published dataset will automatically be recorded as a major version update.
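The release rules above can be summarized as a small decision function. This is only an illustration of the rules as stated in this guide: the function and its parameter names are hypothetical, and deciding whether a metadata change counts as "large" remains a judgment call.

```python
def release_type(files_changed, citation_changed, large_metadata_change):
    """Return "major" or "minor" following the release rules described above."""
    if files_changed or citation_changed or large_metadata_change:
        return "major"
    # Only small metadata edits that leave the citation untouched remain.
    return "minor"
```

For example, replacing a file gives `release_type(True, False, False)`, which returns `"major"`.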
Replacing a file in a published dataset
- Go to your dataset
- Select the file you wish to replace from the file list
- Click the three vertical dots to the right of the file name
- Select Replace
- Add the new file using the Select Files to Add button or by dragging and dropping the file
- If needed, update the File Name, File Path, and Description
- Click Save Changes
- On the right side of the dataset homepage, click Publish Dataset
- Select Publish
- A pop-up will appear indicating this will be a Major Release - click Continue
Updating metadata in a published dataset
- Go to your dataset
- Make necessary changes to your metadata (go to updating metadata for details)
- On the right side of the dataset homepage, click Publish Dataset
- Select Publish
- In the pop-up window, select Minor Release or Major Release to indicate whether this is a minor or major version update
- Minor Release: select this option for small metadata changes that do not impact the file/dataset citation
- Major Release: select this option for large metadata changes or citation changes
- Click Continue
Making U of T Dataverse work for your project
In this section we will discuss how to organize and structure your Dataverse collection in a way that makes sense for you and your research data.
Organizing your collection
Much like a folder structure, collections can be organized into hierarchies. For example, you can add sub-collections under your primary collection, or you can have multiple primary collections that reflect different datasets, projects, or research groups.
Here are some tips and considerations to help you think about how you want to structure and organize your collection:
- Think about how you want to use your collection. For example, is this collection for your own personal research data, or are you creating a dataverse collection for a project team or lab?
- Think about how your research is structured. Depending on your research, you may choose to organize your datasets by date, data type, or some other organizing principle.
- Consider how others will search for and navigate your data. Organize and label your datasets and files so that someone outside of your research team could easily understand and navigate your data. This may include limiting the number of nested dataverse collections you have, and using file tags to clearly indicate what each file is.
- Preserve or create hierarchies. If you want to maintain an existing file hierarchy you can zip your folder before adding it to your dataverse collection. U of T Dataverse will automatically unzip your files while maintaining your file structure in the metadata. You can also add a folder hierarchy to your files after they’ve been uploaded by editing the File Path in the file-level metadata. Go to updating metadata for more information.
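To preserve an existing folder hierarchy as described in the last tip, the folder can be zipped with paths stored relative to its root. The following is a minimal sketch using Python's standard library; the function name is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_folder(folder, output_zip):
    """Zip every file under `folder`, keeping sub-folder paths relative to `folder`."""
    folder = Path(folder)
    output_zip = Path(output_zip)
    with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and skip the archive itself if it lives inside `folder`.
            if path.is_file() and path.resolve() != output_zip.resolve():
                archive.write(path, arcname=path.relative_to(folder).as_posix())
    return output_zip
```

A file stored at `data/raw/a.csv` inside the folder keeps that relative path inside the archive, which is the structure the platform would then record as the File Path.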
Preparing your dataset for deposit
Before depositing your dataset, it’s important that your files are organized and understandable. This will make it easier for others to find, access, and reuse your data. Even if you plan to restrict access to your dataset, taking the time to organize and describe your data will ensure you can understand and interpret your data five, ten, or fifteen years from now.
Here are some tips to help you prepare your dataset for deposit:
- Make sure your files are clean and usable. This may include ensuring variable names and labels are consistent, removing duplicate or erroneous data, and removing any temporary or administrative notes not meant for external users.
- Use consistent and meaningful file names. Applying a consistent file naming structure will help you and future users stay organized and identify a file’s contents without opening it. Think about what information would be useful to an external user when browsing your files and, if necessary, consider renaming your files to support user access and data reusability. Avoid using long file names (i.e., over 32 characters), special characters, and spaces to ensure files can be easily used with different software.
- Include data documentation. Data documentation provides information about your data, including how it was created/collected and processed. This information allows future users to understand, interpret, and use your data. Some examples of data documentation include README files, codebooks, and data dictionaries.
- Include additional non-proprietary copies. If your dataset contains proprietary file types (e.g., DOC, XLS), consider adding an additional copy in a non-proprietary format (e.g., TXT, CSV). This will make it easier for researchers to access your files now and in the future.
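The file naming advice above can be checked automatically before deposit. The sketch below is one possible interpretation: it counts the full file name including the extension toward the 32-character guideline, and it treats anything other than letters, digits, periods, underscores, and hyphens as a special character. Both choices are assumptions, and the function name is hypothetical.

```python
import re

MAX_NAME_LENGTH = 32  # guideline from the tip above; assumed to include the extension


def check_file_name(name):
    """Return a list of naming problems found in `name` (empty if none)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("longer than 32 characters")
    if " " in name:
        problems.append("contains spaces")
    # Spaces are reported separately, so they are excluded from this check.
    if re.search(r"[^A-Za-z0-9._\- ]", name):
        problems.append("contains special characters")
    return problems
```

For example, `check_file_name("my data (final).csv")` reports both spaces and special characters, while `check_file_name("survey_2021_clean.csv")` returns an empty list.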
Creating deposit workflows
A deposit workflow is a set of instructions outlining what needs to happen for data to be deposited, in what order, and by what mechanism. A deposit workflow can be manual or automated. If you or your team will be regularly adding or updating data files, establishing a deposit workflow will help ensure your data files are described and deposited in a consistent manner.
You can use the Dataverse APIs (Application Programming Interfaces) to automate parts of your workflow, including creating and publishing a dataverse collection or dataset, uploading files, managing permissions, and downloading datasets and metadata. Go to the advanced guide for more information on using the Dataverse APIs.
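As an illustration of what an automated step might look like, the sketch below assembles (but does not send) a request to publish a dataset. The endpoint path, the `type` parameter, and the `X-Dataverse-key` header follow the Dataverse native API as commonly documented; treat them as assumptions and confirm them against the advanced guide before use. The function name and the token value are hypothetical.

```python
from urllib.parse import urlencode


def build_publish_request(base_url, doi, api_token, release="major"):
    """Describe a publish request as a dict; nothing is sent over the network."""
    if release not in ("minor", "major"):
        raise ValueError("release must be 'minor' or 'major'")
    query = urlencode({"persistentId": f"doi:{doi}", "type": release})
    return {
        "method": "POST",
        "url": f"{base_url.rstrip('/')}/api/datasets/:persistentId/actions/:publish?{query}",
        # API token from your account page; keep it out of shared scripts.
        "headers": {"X-Dataverse-key": api_token},
    }
```

The returned dictionary can be handed to an HTTP client of your choice. Because publishing cannot be undone, it is safer to test any automated workflow in the demo repository first.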
Licensing
Applying a license to your dataset lets users know how they can use your data and under what conditions. U of T Dataverse allows you to apply Creative Commons licenses to your datasets (e.g., CC BY, CC BY-SA, etc.). You can use the Creative Commons Licence Chooser to help determine what license to apply to your data.
U of T Dataverse also allows users to apply a custom license. However, it can be difficult for users to navigate these non-standard licenses. When possible, we strongly recommend using a standard license.
Making your data discoverable
Improving the discoverability of your data can increase the reach and impact of your research, promote research integrity, and contribute to knowledge production in your field. It is also an important part of making your data FAIR (Findable, Accessible, Interoperable, and Reusable). Visit the FAIR Principles website to learn more.
U of T Dataverse offers a number of features that help make your data discoverable, including assigning a DOI to all published datasets and allowing data to be indexed through platforms like Google, the Federated Research Data Repository, DataCite, and other data citation indexes.
Below are some additional ways to improve data discoverability:
- Describe your data with rich, standardized metadata. This may include inputting a detailed description of your dataset, adding keywords, and using controlled vocabularies. U of T Dataverse also allows you to input discipline-specific metadata, including geospatial metadata, social science and humanities metadata, astronomy and astrophysics metadata, and life sciences metadata. Go to the Metadata Best Practices Guide for more information and best practices.
- Choose your keywords carefully. Think about what search terms researchers in your field would use to find your data. Consider including both broader and more granular terms (e.g., “Sustainable transportation” and “Cycling”) as well as synonymous terms (e.g., “Public transportation” and “Mass transit”). Note that the keyword field is case-sensitive and you should use sentence case when entering terms (i.e., capitalize the first word and proper nouns). Go to the Metadata Best Practices Guide for more information.
- Link your data to related outputs. U of T Dataverse allows you to link your dataset to related outputs, publications and datasets through the Related Publication, Related Material, and Related Dataset fields in the Dataset metadata. This can be done through a formatted citation, a URL, or a variety of identifiers (e.g., DOI, ISSN, arXiv, Handle, etc.).
Frequently Asked Questions
Is there somewhere I can practice using U of T Dataverse?
Yes! If you want to practice using U of T Dataverse you can use the demo repository (go to getting started with U of T Dataverse for more information). The demo repository allows you to practice things like creating a dataverse, adding data, organizing your data, and seeing how things will look when published. Please note that we recommend using a dummy dataset in the demo repository.
What file types can I add?
U of T Dataverse supports the uploading of any file type, including documents, images, video, audio, tabular files, compressed files, and more. We also encourage you to include a readme file and documentation to help users navigate and reuse your data.
Why can’t I find the “Add Data” button?
The Add Data button is located on the U of T Dataverse home page to the right of the search bar. If you cannot find it, make sure you are logged in and that you are in U of T Dataverse and not the main Borealis repository. If you are in the main Borealis repository, you can get to U of T Dataverse by scrolling through the institutional dataverse collections at the top of the page until you find U of T. If you still cannot find the Add Data button once you're in the U of T Dataverse collection, please contact rdm@utoronto.ca.
When is a DOI assigned?
A DOI is assigned once you publish your dataset. If you need a DOI before you're ready to make your data publicly available you can use access restrictions. Go to access restrictions for more information.
Why is my collection/dataset not showing up in U of T Dataverse?
The most likely reason your collection or dataset is not showing up is because it is unpublished. Your unpublished collections and datasets are visible to you when you are logged in, but will not be visible to anyone else. For more information go to publishing your data.
Why do my files convert to .tab files when I upload them?
When you upload tabular file types (e.g., XLS, SPSS, CSV, etc.), U of T Dataverse creates a .tab version of the files so they can be read and used in Data Explorer. This process does not damage the original file, and users can download the file in both the original and .tab format.
Why do I get a “Tabular Ingest Error” when I try to upload my dataset?
The most likely reason you’ve received a "Tabular Ingest Error" is a formatting issue with your original file, such as commas within cells or inconsistent column headers.
If you receive a "Tabular Ingest Error" but do not want to reformat your cells, you can simply ignore this message. Users will not get an error message after your dataset is published and will still be able to access and download your data. However, you will not be able to use the Data Explorer or Data Curation tools if your files are not converted to the .tab format.
What is the Data Curation Tool and how do I use it?
The Data Curation Tool allows a dataset administrator, curator, or contributor to add and edit variable-level metadata. For example, you can add information about weighted variables or how a variable was collected. Adding this metadata helps make your data easier to understand, interpret, and reuse. For more information on the Data Curation Tool go to the Borealis User Guide.
What is Data Explorer and how do I use it?
The Data Explorer allows users to visualize and analyze data contained in tabular data files (.tab). Users can use this tool to cross-tabulate data and view summary statistics and charts. For more information on Data Explorer go to the Borealis User Guide.
I have a large number of files to add, is there a way to batch upload them?
Yes! You can use the DVUploader to batch upload files and automate parts of your deposit workflow. DVUploader is a command-line bulk uploader that uses the existing Dataverse application programming interface (API) to upload files from a specified directory into a specified Dataset. For more information on DVUploader visit the advanced guide.
Should I put all files from a project in one dataset, or should I break it down into multiple datasets?
There are a number of factors that may influence how you decide to organize your files, including:
- What makes the most sense for the data, or for someone using the data? If the datasets are related and would be used together, then it may make most sense to keep them together. If the datasets may be useful independently and you want to include more detailed descriptions of each, then you may want to separate them.
- Is linking to the data from a publication (or elsewhere) a priority? If it is, then you may want to think about how you want to reference or refer to it. For example, if you were writing a data availability statement, deciding if you wanted to provide one DOI or multiple DOIs may be a determining factor. If you want to include multiple DOIs, your data would need to be organized into multiple Datasets (note that you can provide individual citations for data files within a Dataset, but they will all have the same DOI).
- Do your data files have different authors? If yes, you may want to create datasets based on authorship to ensure the author is properly acknowledged.
For more information on how to structure a dataset, go to organizing your collection.
How should I reference my data in a publication?
Different publications may have different guidelines around how to reference data. This information can usually be found on the publication’s website under “submission guidelines” or “guide for authors”. Typically, you would include the following information:
- Author(s)
- Year
- Dataset title
- Version (if applicable)
- Repository name
- Persistent identifier
U of T Dataverse uses DOIs as persistent identifiers, which are assigned to all published datasets.
If a publication does not specify how your data should be referenced, you can use the recommended citation automatically generated by U of T Dataverse. This citation block can be found in the blue box at the top of the dataset homepage. In addition to the citation block provided, you can also download the XML, RIS, or BIB file for ingestion into your citation manager.
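If you need to assemble a reference by hand, the components listed above can be combined programmatically. The ordering and punctuation in this sketch are illustrative assumptions and may differ from both the citation block generated by U of T Dataverse and your publication's style; the function name and the sample values are hypothetical.

```python
def format_citation(authors, year, title, repository, doi, version=None):
    """Join the citation components listed above into a single reference string."""
    parts = [
        "; ".join(authors),        # Author(s)
        str(year),                 # Year
        f'"{title}"',              # Dataset title
        f"https://doi.org/{doi}",  # Persistent identifier
        repository,                # Repository name
    ]
    if version:
        parts.append(version)      # Version (if applicable)
    return ", ".join(parts)
```

A call such as `format_citation(["Lee, Ana"], 2023, "Sample survey data", "Borealis", "10.5683/SP3/EXAMPLE", "V1")` uses placeholder values only; substitute the details shown on your dataset homepage.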
How do I link datasets to publications and vice versa?
You can link publications and other related outputs through the dataset-level metadata. Specifically, you can use the Related Publication, Related Material, and Related Dataset fields to input this information. This can be done through a formatted citation, a URL, or a variety of identifiers (e.g., DOI, ISSN, arXiv, Handle, etc.).
For instructions on how to update a dataset’s metadata go to updating metadata.
Can I reuse, add to, and make derivatives of other data in U of T Dataverse?
This will depend on the license and/or terms of access applied to the data you are using, which can be found in the Terms tab of a specific dataset. If you plan on using a dataset created by someone else you will need to ensure the license allows you to create and share derivative products.
Note that some licenses require that anything derived from the data must be shared under the same open license, which will determine what license is most appropriate to apply to your dataset.