Published on: 15.03.2018
Conclusion
Use get_emails() from the webscraping Python package.
Python strength
The best thing about Python is the huge number of third-party packages.
With many of them, you can solve your problem in just a few lines of code.
Let’s say that you want to find all emails in some HTML document, either for an offline or online web page.
This can be done with the webscraping package.
First, install it with:
    pip install webscraping
The code for finding all emails on a single page is:
    from webscraping import download, alg

    D = download.Download()
    html = D.get('http://buklijas.info/')

    emails = alg.extract_emails(html)

    print(emails)
Line 1 imports download and alg from the webscraping package that you have just installed.
Line 3 creates a download.Download() object and calls it D.
Line 4 saves the web page you want to search for emails in the html variable.
Line 6 finds all emails in the html variable and saves them in the emails Python list.
Line 8 prints all the emails that were found to the screen.
This will work for a single web page.
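To get a feeling for what extracting emails from HTML involves, here is a minimal sketch that uses only the standard library. It is not how the webscraping package does it. The find_emails name, the deliberately simple regular expression, and the sample HTML are all assumptions made for illustration.

```python
import re

# Deliberately simple pattern: local part, "@", domain with at least one dot.
# Real-world addresses can be more exotic; this is an illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(html):
    """Return unique email-like strings from html, in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()  # treat Info@Example.com and info@example.com as one
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


# Made-up sample page, not taken from any real site.
sample_html = (
    '<p>Contact: <a href="mailto:Info@Example.com">info@example.com</a> '
    'or sales@example.org.</p>'
)
found = find_emails(sample_html)
print(found)
```

A regular expression like this will miss obfuscated addresses (for example "info at example dot com"), so treat it as a starting point only.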
How to find emails on the whole site
If you want to search the whole website for emails, not just one page, you can use the following code.
    from webscraping import download

    D = download.Download()
    emails = D.get_emails("http://buklijas.info/", max_depth=2, max_urls=None, max_emails=None)

    print(emails)
With the max_depth, max_urls, and max_emails parameters you can limit how far the search goes: how deep links are followed, how many pages are visited, and how many emails are collected.
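To make these three limits more concrete, here is a rough Python 3 sketch of a breadth-first crawl that respects them. It is not the webscraping package's implementation, and get_emails() may behave differently in its details. The crawl_for_emails function, the fetch callable, and the in-memory example.com pages are hypothetical names and data invented for this illustration, so it runs without any network access.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
LINK_RE = re.compile(r'href="([^"#]+)"')


def crawl_for_emails(start_url, fetch, max_depth=2, max_urls=None, max_emails=None):
    """Breadth-first crawl of one site, collecting email-like strings.

    fetch is a hypothetical callable: fetch(url) -> HTML text ('' on failure).
    In this sketch, None for max_urls or max_emails means "no limit".
    """
    site = urlparse(start_url).netloc
    queue = deque([(start_url, 0)])
    seen = {start_url}
    emails = []
    pages = 0
    while queue:
        if max_urls is not None and pages >= max_urls:
            break  # page budget used up
        url, depth = queue.popleft()
        html = fetch(url)
        pages += 1
        for email in EMAIL_RE.findall(html):
            if email not in emails:
                emails.append(email)
                if max_emails is not None and len(emails) >= max_emails:
                    return emails  # enough emails collected
        if depth >= max_depth:
            continue  # do not follow links any deeper
        for href in LINK_RE.findall(html):
            link = urljoin(url, href)
            # Stay on the same site and never queue a page twice.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return emails


# Made-up pages standing in for a real website.
pages = {
    "http://example.com/": '<a href="/about">About</a> root@example.com',
    "http://example.com/about": '<a href="/team">Team</a> about@example.com',
    "http://example.com/team": "team@example.com",
}


def fetch_page(url):
    return pages.get(url, "")


shallow = crawl_for_emails("http://example.com/", fetch_page, max_depth=1)
deep = crawl_for_emails("http://example.com/", fetch_page, max_depth=2)
print(shallow)
print(deep)
```

With max_depth=1 the crawl stops after the pages linked directly from the start page, so the address on the team page only shows up once max_depth is raised to 2.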
Happy spamming.
P.S. just joking 🙂
Thank you so much hun!xx
I am glad you found it useful.
Hi. First of all, thank you!
My question is: how can I use/rotate proxies while processing, and how can I save the harvested emails to a txt or Excel file?
Dear Dominic,
You need to write additional code for it.
For an Excel example, take a look at:
https://github.com/sasa-buklijas/hrvatski_dinar/blob/master/xlsx_to_csv.py
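For a plain text or CSV file, a minimal Python 3 sketch using only the standard library could look like the following. The function names, the file names, and the placeholder email list are assumptions for illustration. A CSV file can be opened in Excel, but this does not produce a real .xlsx file.

```python
import csv
import os
import tempfile


def save_emails_txt(emails, path):
    """Write one email per line to a plain-text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for email in emails:
            handle.write(email + "\n")


def save_emails_csv(emails, path):
    """Write emails to a one-column CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["email"])  # header row
        for email in emails:
            writer.writerow([email])


# Placeholder data; in practice this would be the list of harvested emails.
emails = ["info@example.com", "sales@example.org"]

# Write into a temporary directory so the sketch does not clutter the working directory.
out_dir = tempfile.mkdtemp()
txt_path = os.path.join(out_dir, "emails.txt")
csv_path = os.path.join(out_dir, "emails.csv")
save_emails_txt(emails, txt_path)
save_emails_csv(emails, csv_path)
print(txt_path)
print(csv_path)
```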