A simple web crawler that crawls several research-paper publishing websites and extracts relevant data from manuscripts available online in PDF form. A query consists of keywords, which are matched against the titles of the research articles. Each search result contains the author's name, affiliation, and email address.
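The matching and extraction steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `title_matches` and `extract_emails` are hypothetical, the rule that a title must contain every keyword is an assumption, and the PDF text is assumed to have been extracted already by some other tool.

```python
import re

# Simplified email pattern; real-world addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def title_matches(title, keywords):
    """Return True if every keyword occurs in the title (case-insensitive).

    Requiring all keywords is an assumption; matching any keyword
    would be an equally plausible reading of the description.
    """
    lowered = title.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def extract_emails(text):
    """Return the unique email addresses found in text, in order of appearance."""
    found = []
    for address in EMAIL_RE.findall(text):
        if address not in found:
            found.append(address)
    return found


if __name__ == "__main__":
    # Hypothetical sample data standing in for a paper title and PDF text.
    sample_title = "Deep Learning for Web Crawling"
    sample_text = "Contact: jane.doe@example.edu, j.smith@uni.example.org."
    if title_matches(sample_title, ["web", "crawling"]):
        print(extract_emails(sample_text))
```

Extracting author names and affiliations is harder than extracting emails because their layout varies between publishers, so it is left out of this sketch.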
Scrapy is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler.

Selenium is an open-source browser automation tool. Through its Python bindings, the same commands can drive different browsers, despite the differences in how each browser is designed.

ChromeDriver matching the version of your Chrome browser: https://chromedriver.chromium.org/downloads