This project uses data from various sources that are openly licensed or in the public domain. Below are the sources and their respective information:
Description: arXiv is a free distribution service and an open-access archive for scholarly articles in physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. All arXiv articles are available under various open licenses or are in the public domain.
API documentation link:
- arXiv API User Manual
- arXiv API Reference
- Base URL
- arXiv Subject Classifications
- Terms of Use for arXiv APIs
API information:
- No API key required
- Query limit: No official limit, but requests should be made responsibly
- Data available through Atom XML format
- Supports search by fields: title (ti), author (au), abstract (abs), comment (co), journal reference (jr), subject category (cat), report number (rn), id, all (searches all fields), and submittedDate (date filter)
- Metadata includes licensing information for each paper
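As a rough illustration of the field prefixes listed above, the following sketch builds a query URL. The base URL value and the `search_query`, `start`, and `max_results` parameter names are assumptions to confirm against the API User Manual and Base URL links above. The helper name `build_arxiv_query` is hypothetical and not part of this project.

```python
from urllib.parse import urlencode

# Assumed base URL; confirm against the "Base URL" link above.
ARXIV_BASE_URL = "http://export.arxiv.org/api/query"


def build_arxiv_query(terms, start=0, max_results=10):
    """Build a query URL from a {field_prefix: value} mapping (hypothetical helper)."""
    # Join field-prefixed terms with AND, e.g. "cat:cs.LG AND ti:license"
    search_query = " AND ".join(f"{field}:{value}" for field, value in terms.items())
    params = {
        "search_query": search_query,  # assumed parameter name
        "start": start,                # assumed offset parameter
        "max_results": max_results,    # assumed page-size parameter
    }
    return f"{ARXIV_BASE_URL}?{urlencode(params)}"


# Example: papers in the cs.LG category with "license" in the title
url = build_arxiv_query({"cat": "cs.LG", "ti": "license"})
print(url)
```

The response would be Atom XML, so a caller would still need an XML or feed parser to read the per-paper licensing metadata.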
Description: A .txt file provided by Timid Robot containing all legal
tool paths.
API documentation link:
- google_custom_search/legal-tool-paths.txt: a list of all current Creative Commons (CC) legal tool paths
- data/prioritized-tool-urls.txt: a prioritized list of all current CC legal tool URLs
API information:
- No API key required
- No query limits
Description: The Europeana Search API provides access to digital cultural heritage metadata records aggregated from museums, libraries, and archives across Europe. This project uses the API to fetch aggregated counts of cultural heritage records by data provider, rights statement, and theme.
Official API Documentation:
- Search API Documentation
- Themes are listed in the Search API Request Parameter accordion
API information:
- API key required
- Minimum 0.003 seconds between queries
- Query parameters allow:
  - Full-text searching (query)
  - Retrieving metadata facets (profile=facets)
  - Filtering by data provider, rights statement, and theme
- Data available through JSON format
- Offset-based pagination
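The request settings above can be sketched as follows. This is a minimal illustration, not the project's fetch script: the `wskey`, `rows`, and `start` parameter names are assumptions to verify against the Search API documentation, and `build_search_params` and `Throttle` are hypothetical helpers. Only `query`, `profile=facets`, and the 0.003 second minimum interval come from the list above.

```python
import time

# Minimum delay between queries, as stated above.
MIN_INTERVAL_SECONDS = 0.003


def build_search_params(api_key, query="*", rows=0, start=1):
    """Build request parameters for a facet query (hypothetical helper)."""
    return {
        "wskey": api_key,     # assumed name of the API key parameter
        "query": query,       # full-text search
        "profile": "facets",  # retrieve metadata facets
        "rows": rows,         # assumed page-size parameter
        "start": start,       # assumed offset parameter for pagination
    }


class Throttle:
    """Sleep as needed so successive calls are at least `interval` seconds apart."""

    def __init__(self, interval=MIN_INTERVAL_SECONDS):
        self.interval = interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle()
params = build_search_params("YOUR_API_KEY")
throttle.wait()  # call before each request
```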
Description: The Custom Search JSON API lets users run detailed, user-defined queries and retrieve the matching results through a Programmable Search Engine.
Admin links:
API documentation links:
- Custom Search JSON API Reference | Programmable Search Engine | Google Developers
- Google API Python Client Library
  - Google API Client Library for Python Docs | google-api-python-client
    - Reference documentation for the core library googleapiclient
      - See: googleapiclient.discovery > build
    - Library reference documentation by API
      - See Custom Search v1 cse()
- Method: cse.list | Custom Search JSON API | Google Developers
- XML API reference appendices
API information:
- API key required
- Query limit: 100 queries per day
- Data available through JSON format
Notes:
- Because of the data collection quota, the data from Google Custom Search covers only the 50+ most significant general categories of CC licenses. In addition, the first column of the collected data is sorted by license order of precedence, a result of intermediate data analysis.
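To show how the 100 queries per day limit constrains collection, here is a small sketch that splits a list of items into per-day batches. It assumes one query per item, which may not match the project's actual script. The `plan_batches` helper and the placeholder item names are hypothetical.

```python
DAILY_QUERY_LIMIT = 100  # queries per day, as stated above


def plan_batches(items, daily_limit=DAILY_QUERY_LIMIT):
    """Split items into per-day batches that fit within the daily quota."""
    return [items[i:i + daily_limit] for i in range(0, len(items), daily_limit)]


# Placeholder names only; real entries would come from legal-tool-paths.txt
tool_paths = [f"tool-{n}" for n in range(250)]
batches = plan_batches(tool_paths)
print(len(batches), [len(batch) for batch in batches])  # 3 [100, 100, 50]
```

Under this assumption, 250 items would take three days of quota, which is one reason to limit collection to a prioritized subset.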
Description: A development platform for hosting and managing code.
API documentation link:
API information:
- API key not required but recommended by GitHub
- Query limit: 60 requests per hour if unauthenticated, 5000 requests per hour if authenticated
- Data available through JSON format
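The effect of authentication on the rate limits above can be sketched as follows. The `github_request_settings` helper is hypothetical. The header values follow common GitHub REST API usage and should be checked against the GitHub documentation before relying on them.

```python
def github_request_settings(token=None):
    """Return (headers, min_seconds_between_requests) for the given token (hypothetical helper)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
        hourly_limit = 5000  # authenticated limit, as stated above
    else:
        hourly_limit = 60    # unauthenticated limit, as stated above
    # Spread requests evenly over the hour to stay under the limit
    min_interval = 3600 / hourly_limit
    return headers, min_interval


_, anonymous_interval = github_request_settings()
_, authenticated_interval = github_request_settings("example-token")
print(anonymous_interval, authenticated_interval)  # 60.0 0.72
```

Evenly spaced requests are one simple way to stay within the limit; an unauthenticated client would need to wait a full minute between requests, which is why a token is recommended.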
Description: Openverse is a search engine for openly licensed media,
including images and audio. It provides access to over 700 million works from
more than 20 sources, all of which are under Creative Commons licenses or in the
public domain. The API allows querying for media by source, license type, and
other parameters. Because anonymous Openverse API access returns at most
~240 results per source-license combination, the openverse_fetch.py
script currently provides approximate counts. It does not include pagination or
a license_version breakdown.
API documentation link:
API information:
- No API key required for basic access
- Query limit: Rate-limited to prevent abuse (anonymous access provides ~240 results per source-license combination)
- Data available through JSON format
- Supports filtering by source, license, media type (images, audio)
- Media types: images, audio
- Supported licenses: by, by-nc, by-nc-nd, by-nc-sa, by-nd, by-sa, cc0, nc-sampling+, pdm, sampling+
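The source-license combinations described above can be enumerated as in this sketch. It is an illustration only and does not reproduce openverse_fetch.py. The `source` and `license` filter names, the example source names, and the helpers `build_queries` and `is_capped` are assumptions. The cap value of 240 is approximate, as noted above.

```python
from itertools import product

MEDIA_TYPES = ["images", "audio"]
LICENSES = [
    "by", "by-nc", "by-nc-nd", "by-nc-sa", "by-nd",
    "by-sa", "cc0", "nc-sampling+", "pdm", "sampling+",
]
ANONYMOUS_RESULT_CAP = 240  # approximate cap for anonymous access


def build_queries(sources, licenses=LICENSES):
    """Return one filter dict per source-license combination (hypothetical helper)."""
    return [
        {"source": source, "license": license_code}
        for source, license_code in product(sources, licenses)
    ]


def is_capped(result_count, cap=ANONYMOUS_RESULT_CAP):
    """Flag counts that reached the cap and so are only a lower bound."""
    return result_count >= cap


# Example source names are illustrative only
queries = build_queries(["flickr", "wikimedia"])
print(len(queries))  # 20
```

Flagging capped counts makes it explicit which totals are lower bounds rather than exact figures.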
Description: The Wikipedia API allows users to query statistics on pages,
categories, and revisions from a public API endpoint. This project uses two
URLs: WIKIPEDIA_BASE_URL and WIKIPEDIA_MATRIX_URL.
WIKIPEDIA_BASE_URL provides access to articles, categories, and metadata from
the English version of Wikipedia. It runs on the MediaWiki Action API, but this
instance only provides English Wikipedia data. WIKIPEDIA_MATRIX_URL
provides access to information about all Wikimedia projects, including the
different language editions of Wikipedia. It runs on the Meta-Wiki API.
API documentation link:
- WIKIPEDIA_BASE_URL documentation
- WIKIPEDIA_BASE_URL reference page
- WIKIPEDIA_MATRIX_URL documentation
- WIKIPEDIA_MATRIX_URL reference page
API information:
- No API key required
- Query limit: Rate-limited only to prevent abuse
- Data available through XML or JSON format
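A minimal sketch of how the two endpoints might be queried in JSON format follows. The URL values assigned to WIKIPEDIA_BASE_URL and WIKIPEDIA_MATRIX_URL here are assumptions, since their actual values are not given above. The helper names are hypothetical. The `siteinfo` statistics query and the `sitematrix` action are standard MediaWiki API modules, but the project's scripts may use different parameters.

```python
from urllib.parse import urlencode

# Assumed values; the project's constants may differ.
WIKIPEDIA_BASE_URL = "https://en.wikipedia.org/w/api.php"
WIKIPEDIA_MATRIX_URL = "https://meta.wikimedia.org/w/api.php"


def statistics_url(base_url=WIKIPEDIA_BASE_URL):
    """Build a URL requesting site statistics in JSON (hypothetical helper)."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"{base_url}?{urlencode(params)}"


def site_matrix_url(base_url=WIKIPEDIA_MATRIX_URL):
    """Build a URL listing all Wikimedia projects and language editions (hypothetical helper)."""
    params = {"action": "sitematrix", "format": "json"}
    return f"{base_url}?{urlencode(params)}"


print(statistics_url())
print(site_matrix_url())
```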