APIs

arXiv

Function: programmatic access to arXiv, Cornell’s open-access preprint repository of primarily physics, math, and computer science research, including its data, searching, and linking facilities

Access: API calls are made by sending an HTTP GET or POST request to an appropriate URL from any web-enabled client (e.g. a web browser).  API users can use the programming language of their choice

Result Format: ATOM

Registration: no registration or API key required

Limitations: none stated, but high volume users should contact arXiv through https://arxiv.org/help/contact

Development Contact: arXiv Google Group: https://groups.google.com/forum/#!forum/arxiv-api

More Information: https://arxiv.org/help/api/index
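As a minimal sketch of the GET access and ATOM result format described above, the following builds a search URL and pulls entry titles out of an Atom feed. The endpoint host (export.arxiv.org) and the search_query, start, and max_results parameter names are assumptions taken from arXiv's API documentation rather than from this list, and the function names are illustrative only.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint; confirm against the arXiv API documentation before use.
ARXIV_ENDPOINT = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_arxiv_url(search_query, start=0, max_results=10):
    """Return a GET URL for an arXiv search (parameter names are assumptions)."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return ARXIV_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_atom_titles(atom_xml):
    """Extract entry titles from an Atom feed, collapsing internal whitespace."""
    root = ET.fromstring(atom_xml)
    titles = []
    for entry in root.findall("atom:entry", ATOM_NS):
        raw = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        titles.append(" ".join(raw.split()))
    return titles


def search_arxiv(search_query, max_results=10):
    """Fetch and parse results. This performs a network request."""
    url = build_arxiv_url(search_query, max_results=max_results)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_atom_titles(resp.read())
```

Calling search_arxiv("all:electron", max_results=5) would return up to five titles; high-volume use should follow the contact guidance under Limitations.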

BioMed Central

Function: retrieves BMC's latest articles, BMC editors' picks, data on article subscription and access, and bibliographic search data

Access: RESTful interface, queries are made as HTTP GET requests

Result Format: JSON and Prism Aggregate (PAM)

Registration: no registration required

Limitations: none stated

Development Contact: info@biomedcentral.com

More information: https://www.biomedcentral.com/getpublished/indexing-archiving-and-acces…

SAO/NASA

Function: provides access to ADS database of bibliographic data on astronomy and physics publications

Access: HTTP GET requests, or via an unofficial Python client

Result Format: varies

Registration: free; API key required

Limitations: rate limits apply, but are not specified

Development Contact: adshelp@cfa.harvard.edu

More information: general information ; terms of use

CORE

Function: gives programmatic access to metadata and full-text of millions of OA research papers. Major data sources include the PubMed OA subset, archive.org, and DOAJ.

Access: RESTful interface, queries are made as HTTP GET requests

Result Format: JSON

Registration: personal account is free to use, API key required

Limitations: Queries of up to 50,000 records. For queries between 50,000 and 100,000 records, a result set will be created and you will be assigned a token that allows you to scroll through these sets. For queries larger than 100,000, or to inquire about a researcher account, please contact theteam@core.ac.uk

Development Contact: theteam@core.ac.uk

More information: https://core.ac.uk/services/api/

CrossRef REST

Function: allows access to metadata records for over 75 million scholarly works that have CrossRef DOIs, covering around 5000 publishers.  Can be used for text- and data-mining, checking against funder mandates, and to obtain metadata in a variety of representations.

Access: RESTful interface

Result Format: JSON

Registration: no registration required

Limitations: no stated limitations

Development Contact: support@crossref.org
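A hedged sketch of querying the REST interface and reading its JSON result follows. The api.crossref.org/works endpoint, the query and rows parameters, and the message/items response layout are assumptions based on Crossref's public documentation; the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; not stated in the entry above.
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_crossref_url(query, rows=5):
    """Return a GET URL for a free-text works search."""
    return CROSSREF_WORKS + "?" + urllib.parse.urlencode({"query": query, "rows": rows})


def summarize_works(payload):
    """Return (DOI, first title) pairs from a decoded works response."""
    items = payload.get("message", {}).get("items", [])
    # Titles arrive as a list; some records have none, so fall back to None.
    return [(item.get("DOI"), (item.get("title") or [None])[0]) for item in items]


def search_crossref(query, rows=5):
    """Fetch and summarize results. This performs a network request."""
    with urllib.request.urlopen(build_crossref_url(query, rows), timeout=30) as resp:
        return summarize_works(json.load(resp))
```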

Dataverse Network

Function: multiple APIs available to allow programmatic access to data and metadata in the Dataverse Network, which includes the Scholar’s Portal Ontario University Dataverse Network, Harvard Dataverse Network, MIT Libraries-purchased data, and data deposited in other Dataverse Network repositories

Access: HTTPS.  A Dataverse community-written software program can also be used to access the APIs via an RCurl package

Result Format: XML; Byte Stream for Data Access Requests

Registration: metadata access does not require registration.  Data set downloads require a user account and agreement to terms of use; users interested in data sets should contact DVN support (dvn_support@help.hmdc.harvard.edu).  Access to restricted data sets requires approval by data owners.

Limitations: no limitations on public data set downloads after agreeing to terms of use.  No limitations on restricted data set downloads after access is granted by data owners.

Development Contact: dvn_support@help.hmdc.harvard.edu; Questions can also be posted in https://groups.google.com/forum/#!forum/dataverse-community

More information: https://guides.dataverse.org/en/latest/api/index.html

Europe PubMed Central

Function: a RESTful Web Service giving you access to all of the publications and related information in the Europe PubMed Central database.

Access: RESTful interface

Result Format: XML, JSON, or Dublin Core

Registration: no registration required

Limitations: none stated

More information: https://groups.google.com/a/ebi.ac.uk/forum/#!forum/epmc-webservices

HathiTrust Bibliographic

Function: retrieves bibliographic and rights information for items in the HathiTrust Digital Library

Access: RESTful Interface

Result Format: JSON

Registration: no registration required

Limitations: none stated, but not designed for large scale data retrieval

Development Contact: feedback@issues.hathitrust.org

More information: https://www.hathitrust.org/bib_api
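The sketch below shows one way to form a Bibliographic API request URL and read rights codes from the JSON result. The URL pattern (catalog.hathitrust.org/api/volumes/<brief|full>/<id type>/<id>.json), the list of identifier types, and the items/htid/rightsCode field names are assumptions drawn from the HathiTrust documentation linked above; verify them there before relying on this.

```python
import urllib.parse

# Assumed base URL and identifier types; check the bib_api documentation.
HATHI_BIB_BASE = "https://catalog.hathitrust.org/api/volumes"
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}


def build_bib_url(id_type, id_value, detail="brief"):
    """Return the JSON lookup URL for a single identifier."""
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type!r}")
    if detail not in ("brief", "full"):
        raise ValueError("detail must be 'brief' or 'full'")
    # Percent-encode the identifier so characters such as '/' stay in one path segment.
    encoded = urllib.parse.quote(str(id_value), safe="")
    return f"{HATHI_BIB_BASE}/{detail}/{id_type}/{encoded}.json"


def item_rights(payload):
    """Return (volume id, rights code) pairs from a decoded response."""
    return [(item.get("htid"), item.get("rightsCode")) for item in payload.get("items", [])]
```

Because the service is not designed for large-scale retrieval, a loop over many identifiers should be kept small or replaced by HathiTrust's bulk options.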

HathiTrust Data

Function: retrieves content (page images, OCR, and in some cases whole volume packages), and metadata for HathiTrust Digital Library volumes

Access: RESTful Interface

Result Format: XML, JSON, or Binary depending on the resource queried

Registration: two methods of access: via a Web client, requiring authentication (users who are not members of a HathiTrust partner institution must sign up for a University of Michigan “Friend” Account), or programmatically using an access key that can be obtained at https://babel.hathitrust.org/cgi/kgs/request

Limitations: no stated limitations, but it is not designed for large scale data retrieval

Development Contact: feedback@issues.hathitrust.org

More information: https://www.hathitrust.org/data_api

IEEE Xplore

Function: provides flexible query and retrieval of metadata records for more than 4 million documents comprising IEEE journals, conference proceedings, and technical standards

Access: HTTP requests using structured URL queries

Result Format: JSON, XML

Registration: required - https://developer.ieee.org/getting_started

Limitations: a maximum of 200 results may be retrieved in a single query.  A query term can only contain a maximum of 10 words

Development Contact: onlinesupport@ieee.org

More information: https://developer.ieee.org/
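The two limits listed above (200 results per query, 10 words per query term) can be checked on the client before a request is sent. This is only an illustrative sketch: the function and constant names are hypothetical, and it counts words by splitting on whitespace, which may differ from how IEEE counts them.

```python
# Limits as stated in the entry above.
MAX_RECORDS_PER_QUERY = 200
MAX_WORDS_PER_TERM = 10


def check_query(query_text, wanted_records):
    """Validate a query term and return the record count to request.

    Raises ValueError if the term has more than 10 whitespace-separated
    words; clamps the requested record count to 200.
    """
    words = query_text.split()
    if not words:
        raise ValueError("query term is empty")
    if len(words) > MAX_WORDS_PER_TERM:
        raise ValueError(
            f"query term has {len(words)} words; the limit is {MAX_WORDS_PER_TERM}"
        )
    return min(wanted_records, MAX_RECORDS_PER_QUERY)
```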

JSTOR Data for Research

Function: this is not a true API, but allows computational analysis and selection of JSTOR’s scholarly journal and primary resource collections.  Includes tools for faceted searching and filtering, text analysis, topic modeling, data extraction, and visualization

Access: web interface

Result Format: CSV

Registration: free, but registration is required to obtain results. An institutional affiliation is not required

Limitations: datasets are capped by default at 1,000 articles; users seeking larger results are asked to contact JSTOR Data for Research

Development Contact: https://www.jstor.org/contact-us/

National Library of Medicine

Multiple APIs and other data tools for accessing various NLM databases.

Includes: Entrez Programming Utilities, Digital Collection Web Service, Open-i-Open Access Image Search, PMC Open Access Web Service

More Information: https://eresources.nlm.nih.gov/nlm_eresources/

OECD Data

Function: allows programmatic access to a selection of OECD datasets

Access: two RESTful APIs available for queries in SDMX-JSON or SDMX-ML formats

Result Format: JSON and XML

Registration: no registration required

Limitations: one million data points; not all OECD datasets are covered

Development Contact: OECDdotStat@oecd.org

OpenAlex

Function: gives programmatic access to metadata for over 200 million scientific publication records. The database is built on data from Microsoft Academic Graph (MAG), which is now retired. Data has been standardized and enhanced using sources such as Crossref, ORCID and ROR.

Access: RESTful interface, queries are made as HTTP GET requests

Result Format: JSON

Registration: free to use; no registration required, but an e-mail may be provided to enter the “polite pool”, which provides faster response time. To do so, add the mailto=you@example.com parameter in your API request

Limitations: 100,000 requests per day. Please contact team@ourresearch.org for larger requests

Development Contact: team@ourresearch.org

More information: https://docs.openalex.org/api
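A small sketch of adding the mailto parameter mentioned under Registration to a request URL. The api.openalex.org/works endpoint and the search and per-page parameter names are assumptions from the OpenAlex documentation; only mailto comes from the entry above.

```python
import urllib.parse

# Assumed endpoint; see the OpenAlex documentation linked above.
OPENALEX_WORKS = "https://api.openalex.org/works"


def build_openalex_url(search, mailto=None, per_page=25):
    """Return a works search URL, optionally joining the polite pool."""
    params = {"search": search, "per-page": per_page}
    if mailto:
        # Supplying an e-mail address opts the request into the polite pool.
        params["mailto"] = mailto
    return OPENALEX_WORKS + "?" + urllib.parse.urlencode(params)
```

For example, build_openalex_url("open access", mailto="you@example.com") yields a URL whose query string ends in mailto=you%40example.com (the @ sign is percent-encoded).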

ORCID

Function: queries and searches the ORCID researcher identifier system and obtains researcher profile data

Access: RESTful interface

Result Format: HTML, XML, or JSON

Registration: two options: users can access the Public API, which only returns data marked as “public”; or become an ORCID member to receive API credentials

Limitations: data retrieved through Public API is limited

Development Contact: https://support.orcid.org/hc/en-us/requests/new
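As a hedged illustration of a Public API request for JSON output, the sketch below validates the shape of an ORCID iD and prepares a request with an Accept header. The pub.orcid.org/v3.0 base URL and the /record path are assumptions from ORCID's public documentation, and the public service may additionally expect an access token; the checksum of the iD is not verified here.

```python
import re
import urllib.request

# Assumed Public API base URL and version; confirm with ORCID's documentation.
ORCID_PUBLIC_BASE = "https://pub.orcid.org/v3.0"
# Four hyphen-separated groups of four characters; the last may be 'X'.
ORCID_ID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def build_record_request(orcid_id):
    """Return a prepared request for a researcher's public record as JSON."""
    if not ORCID_ID_PATTERN.match(orcid_id):
        raise ValueError(f"not a well-formed ORCID iD: {orcid_id!r}")
    return urllib.request.Request(
        f"{ORCID_PUBLIC_BASE}/{orcid_id}/record",
        headers={"Accept": "application/json"},  # ask for JSON rather than XML
    )
```

The returned object can be passed to urllib.request.urlopen to perform the actual request.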

PLoS Article-Level Metrics

Function: retrieves article-level metrics (including usage statistics, citation counts, and social networking activity) for articles published in PLOS journals and articles added to PLOS Hubs: Biodiversity

Access: queries made as HTTP GET requests through a RESTful interface

Result Format: XML, JSON, CSV

Registration: free to register; API key needed; go to https://api.plos.org/registration/

Limitations: Results limited to batches of 50 at a time

Development Contact: alm@plos.org; questions can also be posted in PLoS API Google Group

PLOS Search

Function: allows PLoS content to be queried for integration into web, desktop, or mobile applications

Access: RESTful interface, queries are made as HTTP GET requests

Result Format: XML

Registration: free to register; API key needed; go to https://api.plos.org/registration/.

Limitations: maximum of 7200 requests a day, 300 per hour, 10 per minute; users should wait 5 seconds for each query to return results; requests should not return more than 100 rows; high-volume users should contact api@plos.org; API users are limited to no more than five concurrent connections from a single IP address

Development Contact: alm@plos.org; Questions can also be posted in PLoS API Google Group
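The three request limits above (10 per minute, 300 per hour, 7200 per day) imply a sustainable pace of one request every 12 seconds, since 3600/300 = 86400/7200 = 12 while 60/10 is only 6. The sketch below computes that spacing and enforces it with a simple throttle. It is an illustration under the assumption of evenly spaced requests, not PLOS's own client; the class and function names are hypothetical.

```python
import time

# Limits from the entry above: window length in seconds -> maximum requests.
PLOS_LIMITS = {60: 10, 3600: 300, 86400: 7200}


def min_interval(limits=PLOS_LIMITS):
    """Smallest even spacing, in seconds, that satisfies every window."""
    return max(window / count for window, count in limits.items())


class Throttle:
    """Block so that consecutive calls to wait() are at least `interval` apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        if self._last is not None:
            remaining = self.interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

A caller would create Throttle(min_interval()) once and call wait() before each request; the clock and sleep arguments exist only so the behaviour can be checked without real delays.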

Scholars Portal (SP) Journals API (Beta)

Function: gives programmatic access to the metadata and full-text of over 65 million journal articles in the Scholars Portal Journals Collection. Articles are licensed from a variety of vendors, with major sources including Springer Nature, DeGruyter, Taylor & Francis, and Wiley.

Access: RESTful interface, queries are made as HTTP GET requests. Sample Python scripts for harvesting or generating a corpus are available.

Result Format: JSONL (JSON lines file)

Registration: registration is not required, but access is restricted to Canadian University IP addresses. You must be on campus or using a University VPN.

Limitations: there are no limitations. However, only articles licensed by your University are accessible (not all institutions license all collections). Attempts to retrieve articles that are not licensed will result in an error.

Development Contact: journals@scholarsportal.info

More information: https://github.com/scholarsportal/text-mining

Science Direct

Function: multiple APIs available for different use cases, including text mining of full-text content, search widgets, displaying journal or book level data, federated searching, and indexing

Access: varies

Result Format: varies

Registration: free to register

Limitations: varies

Development Contact: integrationsupport@elsevier.com

SCOPUS

Function: multiple APIs available for different use cases, including displaying publications on a website, showing cited-by counts on a website, federated searching, populating repositories with metadata, populating VIVO profiles, and others

Access: varies

Result Format: varies

Registration: free to register

Limitations: varies

Development Contact: integrationsupport@elsevier.com

Springer

Function: multiple APIs providing access to Springer Nature metadata and open access content

Access: RESTful interface, using structured URL requests

Result Format: XML, JSON, PRISM, A++ depending on query specifications

Registration: free to register, API key required

Limitations: maximum results for a single query is 100 results for metadata queries, or 20 results for open access queries

Development Contact: support.api@springer.com

More information: https://dev.springer.com/; https://dev.springer.com/docs;  https://dev.springer.com/restfuloperations
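Because a single query returns at most 100 metadata results (or 20 open access results), larger result sets have to be retrieved in pages. The sketch below only computes the (start, size) pairs to request; that the API uses a 1-based start index is an assumption to verify in the Springer documentation, and the function name is hypothetical.

```python
def page_windows(total_wanted, page_size=100, first_index=1):
    """Return (start, size) pairs covering `total_wanted` results.

    page_size should be 100 for metadata queries or 20 for open access
    queries, per the limits stated above. first_index=1 is an assumption.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    windows = []
    start = first_index
    remaining = total_wanted
    while remaining > 0:
        size = min(page_size, remaining)
        windows.append((start, size))
        start += size
        remaining -= size
    return windows
```

For 250 metadata results this gives three requests: 100 starting at 1, 100 starting at 101, and 50 starting at 201.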

Wiley Text and Data Mining

Function: allows text- and data-mining access to content in the Wiley Online Library

Access: accessible via CrossRef’s TDM service; RESTful interface

Result Format: JSON

Registration: must be part of a subscribing institution to have full text access. Users will encounter a click-through agreement and will receive a Client API Token, which is needed when requesting full text of articles

Limitations: rate-limits implemented through CrossRef rate-limiting headers, exact limitations not specified

Development Contact: TDM@wiley.com

More information: https://olabout.wiley.com/WileyCDA/Section/id-826542.html

Chronicling America

Function: provides access to information about historic newspapers and select digitized newspapers

Access: RESTful interface

Result Format: HTML, ATOM, JSON

Registration: none required

Limitations: none stated

More information: https://github.com/LibraryofCongress/chronam

Digital Public Library of America

Function: programmatic access to metadata in DPLA collections, including partner data from Harvard, New York Public Library, The Smithsonian, ARTstor, and others

Access: RESTful Interface

Result Format: structured JSON-LD objects

Registration: free to use; API key must be requested with information here: https://pro.dp.la/developers/api-codex#get-a-key

Limitations: none stated

Development Contact: https://pro.dp.la/developers/api-codex#get-a-key

Europeana

Function: a suite of four APIs available to allow access to metadata, annotation, and download of Europeana data

Access: RESTful Interface

Result Format: varies by API

Registration: Services and Tools | Europeana Pro

Limitations: none stated

Development Contact: https://groups.google.com/forum/?pli=1#!forum/europeanaapi

Library of Congress

Function: APIs available to download bibliographic data and search Library of Congress digital collections, including images, public radio and television, and historic newspapers

Access: varies by API

Result Format: varies by API used

Registration: free to use; most APIs do not require an API key

Limitations: not specified

Development Contact: https://labs.loc.gov/lc-for-robots/

Metropolitan Museum of Art Collection

Function: provides datasets of information on more than 470,000 artworks in its collection for unrestricted commercial and noncommercial use.

Access: RESTful interface

Result Format: JSON

Registration: none required

Limitations: none stated

Development Contact: openaccess@metmuseum.org

More information: https://github.com/metmuseum/openaccess

OCLC WorldCat Search

Function: searches WorldCat and retrieves bibliographic records for cataloged items such as books, videos, music, and more

Access: RESTful interface

Result Format: XML, Dublin Core; item level information available in standard bibliographic formats

Registration: users must be affiliated with a catalogue-subscribing library. API key required.

Limitations: 50,000 queries/24 hours; limit of 100 records per batch

More information: WorldCat Search API | OCLC Developer Network

STAT!Ref OpenSearch

Function: bibliographic search service for displaying STAT!Ref results on a website.

Access: OpenSearch Specifications

Result Format: RSS, ATOM, HTML

Registration: free to register for users from a subscribing institution

Limitations: not specified

Development Contact: support@statref.com

More information: https://online.statref.com/Resources/StatRefOpenSearch.aspx

Caselaw Access Project

Function: provides queryable access to all published US court decisions

Access: in-browser API viewer or RESTful interface, also available as bulk download

Result Format: structured XML, presentation HTML, or plain text

Registration: most queries do not require registration. Jurisdictions with access restrictions require a free API key

Limitations: full text of cases limited to 500 cases per person per day, unless otherwise authorized

Development Contact: https://case.law/api/#problems

More information: https://case.law/

Data.gov

Function: access to metadata of US government open datasets

Access: RESTful interface

Result Format: JSON

Registration: API key required

Limitations: 1000 requests/hour

More information: https://api.data.gov/about/#how-it-works

Open Canada

Function: access to Canadian provincial and federal open datasets

Access: RESTful interface

Result Format: JSON

Registration: registration not required

Limitations: only supports GET requests

More information: https://ckan.org/portfolio/api/
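Since the portal is built on CKAN, a dataset search can be sketched with CKAN's Action API. The /api/3/action/package_search path and the success/result/results response layout come from CKAN's documentation; the portal base URL below is an assumption and is left as a parameter so it can be corrected.

```python
import urllib.parse

# Assumed base URL for the portal's CKAN instance; verify before use.
OPEN_CANADA_BASE = "https://open.canada.ca/data"


def build_package_search_url(query, rows=10, base=OPEN_CANADA_BASE):
    """Return a GET URL for CKAN's package_search action."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base}/api/3/action/package_search?{params}"


def dataset_titles(payload):
    """Return dataset titles from a decoded package_search response."""
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return [dataset.get("title") for dataset in payload["result"]["results"]]
```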

Open Parliament

Function: access to data related to Canadian federal MPs, bills, debates, and committees. This is not a government site, but rather a repository of information gathered by external stakeholders

Access: RESTful interface

Result Format: JSON

Registration: no registration required

Limitations: limits exist, but are unspecified

Development Contact: https://github.com/michaelmulley/openparliament

World Bank

Function: provides access to World Bank statistical databases, indicators, projects, and loans, credits, financial statements and other data related to financial operations

Access: three RESTful APIs available to provide access to different datasets: Indicators (time series data), Projects (data on the World Bank’s operations), Finances (World Bank financial data)

Result Format: XML, JSON, RDF, and Atom, depending on specific API used

Registration: no registration required

Limitations: request volumes are unspecified

Development Contact: data@worldbank.org or “Contact support” link here

More information: https://datahelpdesk.worldbank.org/knowledgebase/topics/125589
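For the Indicators API, a time series request can be sketched as follows. The api.worldbank.org/v2 URL pattern, the format and date parameters, and the two-element JSON layout (paging metadata followed by a list of observations) are assumptions based on the World Bank's help desk documentation; the function names are illustrative.

```python
import urllib.parse

# Assumed base URL for version 2 of the Indicators API.
WORLD_BANK_BASE = "https://api.worldbank.org/v2"


def build_indicator_url(country, indicator, date_range=None):
    """Return a JSON request URL for one country and one indicator."""
    params = {"format": "json"}
    if date_range:
        params["date"] = date_range  # e.g. "2015:2020"
    # Keep ':' unescaped so the date range stays readable in the URL.
    query = urllib.parse.urlencode(params, safe=":")
    return f"{WORLD_BANK_BASE}/country/{country}/indicator/{indicator}?{query}"


def series_from_payload(payload):
    """Return {year: value} from a decoded response, skipping missing values."""
    if not isinstance(payload, list) or len(payload) < 2 or not payload[1]:
        return {}
    return {row["date"]: row["value"] for row in payload[1] if row.get("value") is not None}
```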

UN Comtrade

Function: allows access to data on International Merchandise Trade Statistics (IMTS) and the work of the International Merchandise Trade Statistics Section (IMTSS) of the United Nations Statistics Division

Access: some services in REST, some in SOAP

Result Format: XML, CSV

Registration: Comtrade Web Services requires IP authentication, and users must have a site license account; however, access to metadata and data availability is not restricted

Limitations: depending on access rights, the following data can be obtained: Comtrade Data, Tariff Line Data, Total Trade, Annual Totals, Processed Data or Original Data. The last three are restricted for data exchange between UN and OECD.

Development Contact: comtrade@un.org

More information: https://comtrade.un.org/ws/

Text and data mining are associated methods for identifying patterns within large bodies of text, in the case of text mining, or data, in the case of data mining. There are a number of different techniques associated with this method.

Voyant Tools is a web-based platform for generating statistical information about text corpora that may offer preliminary information about your text(s). 
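As a rough idea of the kind of preliminary statistic such a tool reports, the sketch below counts word frequencies in a text. It is a simplified illustration, not how Voyant Tools itself works: the tokenizer just lowercases the text and keeps runs of letters and apostrophes.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word tokens in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def top_words(text, n=10):
    """Return the n most frequent (word, count) pairs."""
    return word_frequencies(text).most_common(n)
```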

Some vendors, publishers, journals, and other organizations have made text and data available via application programming interfaces (APIs). Please see our list of APIs available to researchers. University of Toronto Libraries has some locally loaded materials available for text mining as well. Some openly accessible collections may also be useful; the University of Illinois at Urbana Champaign has compiled a list of open resources for text mining. For text-wrangling and text mining skills, consult the University of Southern California's excellent list of training resources.

For help with using APIs or to inquire about available materials for text mining, contact us.