A Python package that simplifies web scraping for data scientists who are new to it.
$ pip install dsci524_group29_webscraping
- fetch_html(url): Retrieves the raw HTML content from the specified URL, handling HTTP requests and potential errors.
- parse_content(html, selector, selector_type): Parses the provided HTML content using CSS selectors or XPath to extract specified data.
- save_data(data, format, destination): Saves the extracted data into the desired format (e.g., TXT, CSV, JSON) at the specified destination path.
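The fetch, parse, save workflow described above can be sketched with the standard library alone. This is not the package's implementation: the function names and parameters mirror the list above, but the bodies, the `timeout` argument, the return types, and the sample HTML are assumptions made for illustration. In particular, this stand-in only accepts well-formed markup, treats a CSS selector as a bare tag name, and supports only the limited XPath subset of `xml.etree.ElementTree`.

```python
import csv
import json
import tempfile
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path


def fetch_html(url, timeout=10):
    """Return the page body as text (hypothetical stand-in).

    Network and HTTP failures surface as urllib.error.URLError.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_content(html, selector, selector_type="css"):
    """Return the text of every element matching the selector.

    Simplifications: "css" handles a bare tag name only, and "xpath"
    must be a relative path such as ".//h2".
    """
    if selector_type == "css":
        path = f".//{selector}"
    elif selector_type == "xpath":
        path = selector
    else:
        raise ValueError(f"unsupported selector_type: {selector_type!r}")
    root = ET.fromstring(html)
    return ["".join(el.itertext()).strip() for el in root.findall(path)]


def save_data(data, format, destination):
    """Write a list of strings as TXT, CSV, or JSON and return the path.

    The parameter name `format` follows the listed signature, even though
    it shadows the built-in of the same name.
    """
    path = Path(destination)
    if format == "json":
        path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    elif format == "csv":
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows([item] for item in data)
    elif format == "txt":
        path.write_text("\n".join(data), encoding="utf-8")
    else:
        raise ValueError(f"unsupported format: {format!r}")
    return path


# An inline document stands in for fetch_html(url) so the example runs offline.
sample_html = "<html><body><h2>First</h2><p>intro</p><h2>Second</h2></body></html>"
titles = parse_content(sample_html, "h2", selector_type="css")
output_path = save_data(titles, "json", Path(tempfile.mkdtemp()) / "titles.json")
print(titles)  # ['First', 'Second']
```

For a live page, the inline string would be replaced by the result of `fetch_html(url)`. Real-world HTML is rarely well-formed XML, which is one reason to prefer a dedicated parser over this simplified stand-in.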
While libraries like BeautifulSoup and Scrapy offer comprehensive web scraping capabilities, dsci524_group29_webscraping aims to provide a more streamlined and beginner-friendly approach. By focusing on three core functions, it abstracts the complexities involved in web scraping, making it accessible for quick tasks and educational purposes.
Similar Packages:
- webscraping: Provides web scraping functions, but its rich feature set goes beyond what a beginner needs.
- webscraping_tools: Offers similar functionality and much more, which in our opinion places it at the intermediate level.
dsci524_group29_webscraping differentiates itself by offering a small set of functions that cover simple, beginner-level needs.
- Lixuan Lin
- Hui Tang
- Sienko Ikhabi
Interested in contributing? Check out the contributing guidelines.
Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by the specified terms.
Package dsci524_group29_webscraping was created by Lixuan Lin, Hui Tang and Sienko Ikhabi for the Master of Data Science, University of British Columbia. It is licensed under the terms of the MIT license.
This project was created with cookiecutter from the py-pkgs-cookiecutter template.