How to find all emails on a web page?

Published on: 15.03.2018

Conclusion

Use alg.extract_emails() for a single page, or Download.get_emails() for a whole site, from the webscraping Python package.

Python strength

The best thing about Python is the huge number of third-party packages.

With many of them, you can solve a problem in just a few lines of code.

Let’s say you want to find all emails in an HTML document, whether it is an offline file or an online web page.

This can be done with the webscraping package.

First, install it with:

pip install webscraping

The code for finding all emails on a single page is:

from webscraping import download, alg

D = download.Download()
html = D.get('http://buklijas.info/')

emails = alg.extract_emails(html)

print(emails)

Line 1 imports download and alg from the webscraping package that you have just installed.

Line 3 creates a download.Download() object and names it D.

Line 4 downloads the web page you want to search and saves its HTML in the html variable.

Line 6 finds all emails in the html variable and saves them in the emails Python list.

Line 8 prints all the emails that were found.

This will work for a single web page.
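
To give a rough idea of what an email extraction step does, here is a minimal sketch that uses only the Python standard library. It is not the implementation used by the webscraping package; the find_emails function, the EMAIL_RE pattern, and the sample HTML are all made up for illustration, and the pattern is deliberately simple, so it will miss some valid addresses and accept some invalid ones.

```python
import re

# Deliberately simple pattern, assumed for illustration only.
# It does not cover every address allowed by the email standards.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(html):
    """Return unique email-like strings from html, in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()  # treat Info@... and info@... as the same address
        if email not in seen:
            seen.append(email)
    return seen


# Hypothetical HTML; an offline file could be read with open(path).read() instead.
sample_html = """
<p>Contact: <a href="mailto:Info@example.com">info@example.com</a></p>
<p>Sales: sales@example.org</p>
"""

print(find_emails(sample_html))
```

The same function works for an offline page too, because it only needs the HTML as a string.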

How to find emails on the whole site

If you want to search the whole website for emails, not just one page, you can use the following code.

from webscraping import download

D = download.Download()

emails = D.get_emails("http://buklijas.info/", max_depth=2, max_urls=None, max_emails=None)

print(emails)

With the max_depth, max_urls, and max_emails parameters, you can limit how far the search goes: how deep links are followed, how many URLs are visited, and how many emails are collected.
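
To show how limits like these can bound a crawl, here is a small sketch of a breadth-first search with the same three parameter names. It is only an illustration under assumptions, not how get_emails() is implemented: crawl_for_emails, fetch_fake, and FAKE_SITE are hypothetical names, the "site" is an in-memory dictionary so the code runs without network access, and treating None as "no limit" is a choice made for this sketch.

```python
import re
from collections import deque

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LINK_RE = re.compile(r'href="([^"]+)"')


def crawl_for_emails(start_url, fetch, max_depth=2, max_urls=None, max_emails=None):
    """Breadth-first crawl from start_url that stops when a limit is reached.

    fetch is any callable that takes a URL and returns its HTML as a string.
    In this sketch, None for max_urls or max_emails means "no limit".
    """
    emails = []
    visited = set()
    queue = deque([(start_url, 0)])  # (url, depth) pairs
    while queue:
        url, depth = queue.popleft()
        if url in visited:
            continue
        if max_urls is not None and len(visited) >= max_urls:
            break  # page budget used up
        visited.add(url)
        html = fetch(url)
        for email in EMAIL_RE.findall(html):
            if email not in emails:
                emails.append(email)
                if max_emails is not None and len(emails) >= max_emails:
                    return emails  # enough emails collected
        if depth < max_depth:  # only follow links while within the depth limit
            for link in LINK_RE.findall(html):
                queue.append((link, depth + 1))
    return emails


# A tiny in-memory "site" (hypothetical) standing in for real HTTP requests.
FAKE_SITE = {
    "/": '<a href="/about">About</a> <a href="/contact">Contact</a>',
    "/about": 'Written by author@example.com <a href="/team">Team</a>',
    "/contact": "Write to hello@example.com",
    "/team": "Deep page: hidden@example.com",
}


def fetch_fake(url):
    """Return the stored HTML for url, or an empty page if it is unknown."""
    return FAKE_SITE.get(url, "")


print(crawl_for_emails("/", fetch_fake, max_depth=1))
print(crawl_for_emails("/", fetch_fake, max_depth=2, max_emails=1))
```

With max_depth=1 the crawl never reaches /team, so the address on that page is not found; raising max_depth to 2 includes it, unless max_urls or max_emails stops the search first.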

Happy spamming.

P.S. just joking 🙂

4 comments

tahn says:

March 16, 2018 at 8:20 pm

Thank you so much hun!xx

1. Sasa Buklijas says:
  
  March 17, 2018 at 6:41 pm
  
  I am glad you found it useful.
  
Dominic says:

December 31, 2018 at 3:58 pm

Hi. First of all Thank you?

My question is, how can I use/rotate proxies while processing? And how can I save the harvested emails in a txt or Excel file?

1. Sasa Buklijas says:
  
  January 13, 2019 at 7:32 pm
  
  Dear Dominic,
  
  You need to write additional code for it.
  For excel example take a look at:
  https://github.com/sasa-buklijas/hrvatski_dinar/blob/master/xlsx_to_csv.py
