Commit
Merge pull request #5 from lucaromagnoli/docs
Docs
Showing 40 changed files with 1,473 additions and 775 deletions.
```diff
@@ -139,3 +139,4 @@ dmypy.json
 cython_debug/
 
 .idea/
+/temp/
```
```diff
@@ -1,16 +1,16 @@
 repos:
-  - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v2.3.0
-    hooks:
-      - id: check-yaml
-      - id: end-of-file-fixer
-      - id: trailing-whitespace
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v2.3.0
+    hooks:
+      - id: check-yaml
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
   - repo: https://github.com/astral-sh/ruff-pre-commit
     # Ruff version.
     rev: v0.5.6
     hooks:
-      # Run the linter.
+      # Run the linter and sort imports.
       - id: ruff
-        args: [ --fix ]
+        args: [--fix]
       # Run the formatter.
       - id: ruff-format
```
```diff
@@ -0,0 +1,22 @@
+version: 2
+
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.12"
+
+
+sphinx:
+  configuration: source/conf.py
+
+# Optionally build your docs in additional formats such as PDF and ePub
+# formats:
+#   - epub
+
+# Optional but recommended, declare the Python requirements required
+# to build your documentation
+# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+# python:
+#   install:
+#     - requirements: docs/requirements.txt
```
```diff
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = source
+BUILDDIR = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
```
```diff
@@ -0,0 +1,49 @@
+DataService
+===========
+
+Lightweight - async - data gathering for Python.
+____________________________________________________________________________________
+DataService is a lightweight data gathering library for Python.
+
+Designed for simplicity, it uses common web scraping and data gathering patterns.
+
+No complex API to learn, just standard Python idioms.
+
+Asynchronous implementation, synchronous interface.
+
+How to use DataService
+----------------------
+
+To start, create a ``DataService`` instance with an ``Iterable`` of ``Request`` objects. This gives you an ``Iterator`` of data objects that you can iterate over or convert to a ``list``, a ``tuple``, a ``pd.DataFrame``, or any other data structure of your choice.
+
+.. code-block:: python
+
+    start_requests = [Request(url="https://books.toscrape.com/index.html", callback=parse_books_page, client=HttpXClient())]
+    data_service = DataService(start_requests)
+    data = tuple(data_service)
+
+A ``Request`` is a ``Pydantic`` model that holds the URL to fetch, a reference to the ``client`` callable, and a ``callback`` function that parses the ``Response`` object.
+
+The client can be any Python callable that accepts a ``Request`` object and returns a ``Response`` object. ``DataService`` provides an ``HttpXClient`` class, based on the ``httpx`` library, but you are free to use your own custom async client.
+
+The callback function processes a ``Response`` object and returns either ``data`` or additional ``Request`` objects.
+
+In this trivial example, we request the `Books to Scrape <https://books.toscrape.com/index.html>`_ homepage and parse its title and the number of books listed on it.
+
+Example ``parse_books_page`` function:
+
+.. code-block:: python
+
+    def parse_books_page(response: Response):
+        articles = response.soup.find_all("article", {"class": "product_pod"})
+        return {
+            "url": response.request.url,
+            "title": response.soup.title.get_text(strip=True),
+            "articles": len(articles),
+        }
+
+This function takes a ``Response`` object, which has a ``soup`` attribute (a ``BeautifulSoup`` object of the HTML content). The function parses the HTML and returns a ``dict``.
+
+The callback function can ``return`` or ``yield`` either ``data`` (dict or dataclass) or more ``Request`` objects.
+
+If you have used Scrapy before, you will find this pattern familiar.
```
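The documentation added in this commit states that DataService has an "asynchronous implementation, synchronous interface" and that a callback may return or yield either data or further ``Request`` objects, Scrapy-style. The following is a minimal, standard-library-only sketch of how those two ideas can fit together. It is not DataService's actual implementation: `SketchRequest`, `SketchResponse`, `iter_data`, `fake_client`, `parse_page`, and the in-memory `PAGES` "site" are hypothetical names introduced purely for illustration.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator


# Hypothetical stand-ins for DataService's Request/Response models (illustration only).
@dataclass
class SketchRequest:
    url: str
    callback: Callable[["SketchResponse"], Any]  # parses a response
    client: Callable[["SketchRequest"], Any]  # async callable: request -> response


@dataclass
class SketchResponse:
    request: SketchRequest
    text: str


def _outputs(result: Any) -> Iterable[Any]:
    """Treat a generator as many outputs and any other return value as one."""
    return result if inspect.isgenerator(result) else [result]


async def _crawl(start: Iterable[SketchRequest]) -> list[Any]:
    pending = list(start)
    items: list[Any] = []
    while pending:
        # Fetch the current batch of requests concurrently.
        batch, pending = pending, []
        responses = await asyncio.gather(*(req.client(req) for req in batch))
        for response in responses:
            for out in _outputs(response.request.callback(response)):
                if isinstance(out, SketchRequest):
                    pending.append(out)  # follow-up request: fetched in the next batch
                else:
                    items.append(out)  # anything else is treated as a data item
    return items


def iter_data(start: Iterable[SketchRequest]) -> Iterator[Any]:
    """Synchronous facade: drives the async crawl to completion, then yields items."""
    yield from asyncio.run(_crawl(start))


# Usage with a hypothetical in-memory "site" instead of real HTTP.
PAGES = {"/page1": "3 books; next=/page2", "/page2": "2 books"}


async def fake_client(request: SketchRequest) -> SketchResponse:
    await asyncio.sleep(0)  # stands in for network I/O
    return SketchResponse(request=request, text=PAGES[request.url])


def parse_page(response: SketchResponse):
    # Yield a data item, then (if there is a next page) a follow-up request.
    yield {"url": response.request.url, "books": int(response.text.split()[0])}
    if "next=" in response.text:
        next_url = response.text.split("next=")[1]
        yield SketchRequest(url=next_url, callback=parse_page, client=fake_client)


data = list(iter_data([SketchRequest("/page1", parse_page, fake_client)]))
print(data)  # [{'url': '/page1', 'books': 3}, {'url': '/page2', 'books': 2}]
```

Fetching one batch at a time with `asyncio.gather` keeps the sketch short, and `iter_data` buffers every item before yielding the first one. A streaming variant would more likely run the event loop in a background thread and hand items to the synchronous iterator through a queue.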
@@ -1,17 +1,22 @@

```python
from dataservice.clients import HttpXClient
from dataservice.config import ServiceConfig
from dataservice.data import BaseDataItem, DataWrapper
from dataservice.exceptions import RequestException, RetryableRequestException
from dataservice.logs import setup_logging
from dataservice.models import Request, Response
from dataservice.pipeline import Pipeline
from dataservice.service import DataService

__all__ = [
    "BaseDataItem",
    "DataService",
    "DataWrapper",
    "HttpXClient",
    "Pipeline",
    "Request",
    "Response",
    "RequestException",
    "RetryableRequestException",
    "ServiceConfig",
    "setup_logging",
]

__version__ = "0.0.1"
```