Crawler Version 0.04
A crawler is a program that starts with a URL on the web (e.g. http://python.org),
fetches the web page at that URL, and parses all the links on that page into a
repository of links. Next, it fetches the contents of a URL from the repository
just created, parses the links in this new content into the repository, and
continues this process for all links in the repository until it is stopped or a
given number of links have been fetched.
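The loop described above can be sketched in Python as follows. This is a minimal illustration, not the crawler's actual code: crawl and the get_links callback are hypothetical names, and the repository is assumed to be a first-in, first-out queue, which gives a breadth-first crawl (the description above allows any order). Fetching is delegated to the callback so the loop can be tried without network access.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seed: str, get_links: Callable[[str], Iterable[str]], max_pages: int) -> List[str]:
    """Fetch pages starting from `seed` until the repository is empty or
    `max_pages` pages have been fetched; return the fetched URLs in order.

    `get_links(url)` is a hypothetical callback that fetches `url` and
    returns the links found on that page.
    """
    repository = deque([seed])  # links waiting to be fetched
    seen = {seed}               # every link ever queued, so none is fetched twice
    fetched: List[str] = []

    while repository and len(fetched) < max_pages:
        url = repository.popleft()  # FIFO order: a breadth-first crawl
        fetched.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                repository.append(link)
    return fetched


# A tiny in-memory "web" standing in for real HTTP fetches.
fake_web = {
    "http://a": ["http://b", "http://c"],
    "http://b": ["http://a", "http://d"],
    "http://c": [],
    "http://d": [],
}
print(crawl("http://a", lambda url: fake_web.get(url, []), max_pages=3))
# ['http://a', 'http://b', 'http://c']
```

The seen set is what keeps the crawl from looping forever when pages link back to each other, as http://b does to http://a here.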
Requirements
1. Libraries
   1. requests
   2. optparse (standard library)
   3. urlparse (standard library in Python 2; urllib.parse in Python 3)
   4. BeautifulSoup
Version 0.01
Fetches links only from pages with Content-Type: text/html
Fetches links only from anchor tags (<A>)
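The two Version 0.01 rules (parse only text/html responses, follow only anchor tags) might look like the sketch below, assuming Python 3. It swaps the standard library's html.parser in for BeautifulSoup and uses urllib.parse.urljoin (the Python 3 home of urlparse.urljoin) so that it needs no third-party packages; is_html, AnchorLinkParser and extract_anchor_links are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


def is_html(content_type: str) -> bool:
    """True for Content-Type values such as 'text/html; charset=utf-8'."""
    return content_type.split(";")[0].strip().lower() == "text/html"


class AnchorLinkParser(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page's URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip <a> tags without an href (e.g. named anchors)
                self.links.append(urljoin(self.base_url, href))


def extract_anchor_links(page_url: str, content_type: str, body: str) -> List[str]:
    if not is_html(content_type):
        return []  # only text/html pages are parsed for links
    parser = AnchorLinkParser(page_url)
    parser.feed(body)
    parser.close()
    return parser.links


page = '<A HREF="/downloads/">Downloads</A> <a href="news.html">News</a> <img src="logo.png">'
print(extract_anchor_links("http://python.org/about/", "text/html; charset=utf-8", page))
# ['http://python.org/downloads/', 'http://python.org/about/news.html']
print(extract_anchor_links("http://python.org/logo.png", "image/png", "..."))
# []
```

Resolving each href with urljoin turns relative links such as news.html into absolute URLs that can be fetched later.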
Version 0.02
Added support for fetching links from <IFRAME> and <FRAME> tags
Improved results display
Bug Fixes
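Following <IFRAME> and <FRAME> tags mostly means reading a different attribute: these tags point at their content through src rather than href. The sketch below is an assumption about how this could be done, again using html.parser in place of BeautifulSoup and with hypothetical names (LINK_ATTRS, LinkParser). It keeps a tag-to-attribute table so that following another tag is a one-line change.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

# Which attribute holds the link for each tag that is followed.
LINK_ATTRS = {"a": "href", "iframe": "src", "frame": "src"}


class LinkParser(HTMLParser):
    """Collects links from <a>, <iframe> and <frame> tags, resolved against base_url."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_name = LINK_ATTRS.get(tag)  # tag names arrive lowercased
        if attr_name is None:
            return  # not a tag we follow
        value = dict(attrs).get(attr_name)
        if value:
            self.links.append(urljoin(self.base_url, value))


page = ('<frameset><frame src="nav.html"><frame src="main.html"></frameset>'
        '<iframe src="https://example.com/embed"></iframe><a href="/about">About</a>')
parser = LinkParser("http://site.test/index.html")
parser.feed(page)
parser.close()
print(parser.links)
# ['http://site.test/nav.html', 'http://site.test/main.html', 'https://example.com/embed', 'http://site.test/about']
```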
Version 0.03
Bug Fixes & Enhancements
Added Logging
Version 0.04
Bug Fixes
Code Improvement
Exception Handling & Logging For Certain Cases
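The logging (Version 0.03) and exception handling (Version 0.04) around a single fetch could look like the following sketch. It is an assumption, not the crawler's code: it uses the standard library's urllib.request instead of requests so that it runs without extra packages, and safe_fetch is a hypothetical name. Each failure is logged and turned into None, so one broken link does not stop the whole crawl.

```python
import http.client
import logging
import urllib.error
import urllib.request
from typing import Optional

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("crawler")


def safe_fetch(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the body of `url` as text, or None if it cannot be fetched.

    Every failure is logged and swallowed so a single bad link does not
    abort the whole crawl.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
            charset = response.headers.get_content_charset() or "utf-8"
    except ValueError as exc:  # no URL scheme at all, e.g. "not a url"
        log.warning("skipping malformed url %r: %s", url, exc)
        return None
    except urllib.error.HTTPError as exc:  # the server replied with a 4xx/5xx status
        log.warning("HTTP %s for %s", exc.code, url)
        return None
    except (OSError, http.client.HTTPException) as exc:
        # URLError (unsupported scheme, DNS failure, refused connection),
        # timeouts, and malformed HTTP responses all end up here.
        log.warning("could not fetch %s: %s", url, exc)
        return None

    try:
        body = raw.decode(charset, errors="replace")
    except LookupError:  # the server named a charset Python does not know
        body = raw.decode("utf-8", errors="replace")
    log.info("fetched %s (%d characters)", url, len(body))
    return body


print(safe_fetch("not a url"))  # logs a WARNING line, then prints None
```

HTTPError is caught before the broader OSError clause because it is a subclass of URLError, which is itself a subclass of OSError.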