husticfilip/WebsitePhoneNumAndLogoExtractor

This application extracts phone numbers and the path to the logo from the webpage at a given URL.

Before running the script, install the required modules listed in requirements.txt by running:
$ pip install -r requirements.txt 


extract.py is the entry point of the application. The script expects the URL of the website as its first command-line argument.
Example invocation:
$ python3 extract.py https://contact.pepsico.com/pepsi

The program prints two lines of output. The first line contains the phone numbers found on the webpage, and the second
line contains the URL of the logo used on the webpage. If no phone numbers are found, the first line is None.
If no logo URL is found, the second line is None.
Example output:
(800) 433 2652, 1 800 433 2652, 1 833 548 0119
https://contact.pepsico.com/pepsi/files/pepsi/brands/1692797414/Pepsi_New_Logo@2x.png
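
The output contract above can be sketched as a small helper. This is only an illustration: the function name format_output
and its parameters are hypothetical and are not taken from extract.py.

```python
def format_output(phone_numbers, logo_url):
    """Build the two output lines: phone numbers first, logo URL second.

    Hypothetical helper for illustration; the real script may differ.
    """
    # An empty list of numbers is reported as the literal text "None".
    first_line = ", ".join(phone_numbers) if phone_numbers else "None"
    # A missing logo URL is reported as the literal text "None" as well.
    second_line = logo_url if logo_url else "None"
    return first_line, second_line


if __name__ == "__main__":
    for line in format_output(["(800) 433 2652", "1 800 433 2652"], None):
        print(line)
```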


Phone number extraction starts by collecting every candidate phone number from the page. These candidates
are then processed in two stages:
	1) While collecting numbers we also collect the text that precedes each number (its prefix). If the prefix
	   contains a keyword such as "Phone", "Tel" or "Fax", the number is accepted as a phone number.
	2) All numbers that are not accepted in the first stage continue to the second stage.
	   Here we apply a series of filters to separate phone numbers from other kinds of numbers.
	   For example, we filter out dates, decimal numbers, numbers too short to be a phone number, and so on.
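
The two stages can be sketched roughly as follows. This is a minimal illustration, not the code from the repository:
the regular expressions, the keyword tuple, the 20-character prefix window and the 7 to 15 digit length limits are all
assumptions chosen for the example.

```python
import re

# All patterns and limits below are assumptions made for this sketch.
KEYWORDS = ("phone", "tel", "fax")
CANDIDATE = re.compile(r"\+?\(?\d[\d\s().\-/]{5,}\d")
DATE = re.compile(r"^\d{1,4}[./-]\d{1,2}[./-]\d{1,4}$")
DECIMAL = re.compile(r"^\d+[.,]\d+$")
PREFIX_CHARS = 20  # how much text before a number counts as its prefix


def collect_candidates(text):
    """Return (prefix, number) pairs for every digit run that might be a phone number."""
    pairs = []
    for match in CANDIDATE.finditer(text):
        prefix = text[max(0, match.start() - PREFIX_CHARS):match.start()]
        pairs.append((prefix, match.group().strip()))
    return pairs


def passes_filters(number):
    """Stage 2: reject dates, decimal numbers and numbers of implausible length."""
    if DATE.match(number) or DECIMAL.match(number):
        return False
    digit_count = len(re.sub(r"\D", "", number))
    return 7 <= digit_count <= 15


def extract_phone_numbers(text):
    accepted = []
    for prefix, number in collect_candidates(text):
        # Stage 1: a keyword in the prefix accepts the number immediately.
        if any(keyword in prefix.lower() for keyword in KEYWORDS):
            accepted.append(number)
        # Stage 2: everything else has to survive the filters.
        elif passes_filters(number):
            accepted.append(number)
    return accepted


if __name__ == "__main__":
    print(extract_phone_numbers("Fax: 12-34-56, call 1 800 433 2652, updated 12.05.2023"))
```

In this sketch a keyword prefix overrides the filters, so a short number after "Fax:" is kept even though the same
digits without a prefix would be rejected as a date.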

Logo path extraction works by collecting all <img> tags, together with their parent tags, from the HTML document. In
these tags we search for keywords such as "logo" or "icon". If multiple tags contain keywords, we pick the one that
is most likely to be the logo.
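
One way to picture this search, using only the Python standard library, is sketched below. The keyword weights, the
scoring rule (a match on the <img> tag itself counts double a match on its parent) and the names LogoFinder and
find_logo are assumptions for illustration. The repository may use a different parser and a different ranking.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical weights: "logo" is treated as a stronger hint than "icon".
KEYWORD_WEIGHTS = {"logo": 3, "icon": 1}
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link", "source", "wbr"}


def score_text(text):
    """Sum the weights of all keywords that occur in the text."""
    return sum(weight for word, weight in KEYWORD_WEIGHTS.items() if word in text)


class LogoFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []   # (tag, attribute text) of currently open tags
        self.candidates = []  # (score, src) for every <img> that mentions a keyword

    def handle_starttag(self, tag, attrs):
        text = " ".join(f"{k}={v}" for k, v in attrs if v is not None).lower()
        if tag == "img":
            src = dict(attrs).get("src")
            parent_text = self.open_tags[-1][1] if self.open_tags else ""
            score = 2 * score_text(text) + score_text(parent_text)
            if src and score > 0:
                self.candidates.append((score, src))
        elif tag not in VOID_TAGS:
            self.open_tags.append((tag, text))

    def handle_endtag(self, tag):
        # Close the nearest matching open tag and anything nested inside it.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i:]
                break


def find_logo(html, base_url=""):
    """Return the URL of the best-scoring image, or None if no image qualifies."""
    finder = LogoFinder()
    finder.feed(html)
    finder.close()
    if not finder.candidates:
        return None
    _, src = max(finder.candidates, key=lambda item: item[0])
    return urljoin(base_url, src)


if __name__ == "__main__":
    page = '<a class="site-logo" href="/"><img src="/img/brand.png" alt="Pepsi"></a>'
    print(find_logo(page, "https://example.com/shop/"))
```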
