dsci524_group29_webscraping

A Python package that provides simplified web scraping functions for data scientists who are new to web scraping.

Installation

$ pip install dsci524_group29_webscraping

Functions

fetch_html(url): Retrieves the raw HTML content from the specified URL, handling HTTP requests and potential errors.
parse_content(html, selector, selector_type): Parses the provided HTML content using CSS selectors or XPath to extract specified data.
save_data(data, format, destination): Saves the extracted data into the desired format (e.g., TXT, CSV, JSON) at the specified destination path.
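
To make the fetch, parse, save workflow concrete, here is a minimal stand-alone sketch built only on the Python standard library. It is not the package's implementation: the `_sketch` function names, the `timeout` parameter, and the return types are assumptions made for illustration, and the parsing step matches on a plain tag name rather than the CSS selectors or XPath that parse_content supports.

```python
import csv
import json
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html_sketch(url, timeout=10):
    """Hypothetical stand-in for fetch_html: return the page body as text.

    Network and HTTP failures surface as urllib.error.URLError (or a subclass).
    """
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class _TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # how many matching tags are currently open
        self.results = []   # one string per outermost matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.results[-1] += data


def parse_content_sketch(html, tag):
    """Hypothetical stand-in for parse_content: text of each <tag> element."""
    collector = _TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return [text.strip() for text in collector.results]


def save_data_sketch(data, fmt, destination):
    """Hypothetical stand-in for save_data: write a list of strings to disk."""
    fmt = fmt.lower()
    if fmt == "json":
        with open(destination, "w", encoding="utf-8") as f:
            json.dump(data, f)
    elif fmt == "csv":
        with open(destination, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows([item] for item in data)  # one item per row
    elif fmt == "txt":
        with open(destination, "w", encoding="utf-8") as f:
            f.write("\n".join(data))
    else:
        raise ValueError(f"Unsupported format: {fmt!r}")


# Offline usage: parse a literal HTML string instead of fetching a live page.
sample_html = "<ul><li>Alpha</li><li> Beta </li></ul>"
items = parse_content_sketch(sample_html, "li")
```

With a live page, the same steps would chain as `save_data_sketch(parse_content_sketch(fetch_html_sketch(url), "li"), "csv", "items.csv")`, where `url` and the output path are placeholders.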

Python Ecosystem

While libraries like BeautifulSoup and Scrapy offer comprehensive web scraping capabilities, dsci524_group29_webscraping aims to provide a more streamlined and beginner-friendly approach. By focusing on three core functions, it abstracts the complexities involved in web scraping, making it accessible for quick tasks and educational purposes.

Similar Packages:

webscraping: Provides web scraping functions, but its rich feature set goes beyond the beginner level.
webscraping_tools: Offers similar functionality plus many additional features, which in our opinion places it at the intermediate level.

dsci524_group29_webscraping differentiates itself by offering a small set of functions that cover simple, beginner-level scraping needs.

Contributors

Lixuan Lin
Hui Tang
Sienko Ikhabi

Contributing

Interested in contributing? Check out the contributing guidelines.

Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by the specified terms.

License

The dsci524_group29_webscraping package was created by Lixuan Lin, Hui Tang, and Sienko Ikhabi for the Master of Data Science program at the University of British Columbia. It is licensed under the terms of the MIT license.

Credits

This project was created with cookiecutter from the py-pkgs-cookiecutter template.
