diff --git a/README.md b/README.md
index 46ee86dca..f17b58289 100644
--- a/README.md
+++ b/README.md
@@ -1,330 +1,267 @@
 B() and C() refer to function A in their parameters
-
-
-
-
- Optional UI to browse transforms, monitor datasets, and track executions
-
+# Installation
-## Problems Hamilton Solves
-✅ Model a dataflow
-- If you can model your problem as a DAG in python, Hamilton is the cleanest way to build it.
+
+
+
+
+ DAG catalog, automatic dataset profiling, and execution tracking
+
-### Tracking in the UI
-To get started with tracking in the UI, you'll first have to install the `sf-hamilton[ui]` package:
+## Get started with the Hamilton UI
-```bash
-pip install "sf-hamilton[ui,sdk]"
-```
+1. To use the Hamilton UI, install the dependencies (see `Installation` section) and start the server with
-Then, you can run the following code to start the UI:
+
+   ```bash
+   hamilton ui
+   ```
-```bash
-hamilton ui
-# python -m hamilton.cli.__main__ ui # on windows
-```
+
+2. On the first connection, create a `username` and a new project (the `project_id` should be `1`).
-This will start the UI at [localhost:8241](https://localhost:8241). You can then navigate to the UI to see your dataflows.
-You will next want to create a project (you'll have an empty project page), and remember the project ID (E.G. 2 in the following case).
-You will also be prompted to enter a username -- recall that as well!
-
-To track, we'll modify the driver you wrote above:
-
-```python
-import pandas as pd
-import my_functions
-from hamilton import driver
-from hamilton_sdk import adapters
-dr = (
-    driver
-    .Builder()
-    .with_modules(my_functions)
-    .with_adapters(adapters.HamiltonTracker(
-        username="elijah",  # replace with your username
-        project_id=2,
-        dag_name="hello_world",
-    ))
-    .build()
-)
-
-# This is input data -- you can get it from anywhere
-initial_columns = {
-    'signups': pd.Series([1, 10, 50, 100, 200, 400]),
-    'spend': pd.Series([10, 10, 20, 40, 40, 50]),
-}
-output_columns = [
-    'spend',
-    'signups',
-    'avg_3wk_spend',
-    'spend_per_signup',
-]
-df = dr.execute(output_columns, inputs=initial_columns)
-print(df)
-```
-Run this script, navigate back to the UI/select your project, and click on the `runs`
-link on the left hand side. You'll see your run!
+Layer | Purpose | Example Tools
+---|---|---
+Orchestration | Operational system for the creation of assets | Airflow, Metaflow, Prefect, Dagster
+Asset | Organize expressions into meaningful units (e.g., dataset, ML model, table) | Hamilton, dbt, dlt, SQLMesh, Burr
+Expression | Language to write data transformations | pandas, SQL, polars, Ibis, LangChain
+Execution | Perform data transformations | Spark, Snowflake, DuckDB, RAPIDS
+Data | Physical representation of data, inputs and outputs | S3, Postgres, file system, Snowflake
+
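
The context line near the top of the hunk ("B() and C() refer to function A in their parameters") describes Hamilton's core convention: a downstream function declares an upstream function's output by naming it as a parameter. A minimal stdlib-only sketch of that idea, assuming a hypothetical `resolve` helper (this is not Hamilton's actual API, which uses `driver.Builder` as shown in the diff):

```python
import inspect

def A() -> int:
    return 2

def B(A: int) -> int:
    # parameter name "A" wires B to A's output
    return A + 3

def C(A: int, B: int) -> int:
    # C depends on both A and B by parameter name
    return A * B

def resolve(funcs, name, cache=None):
    """Hypothetical resolver: compute `name` by recursively
    computing each of its parameters from the same function table."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn = funcs[name]
        kwargs = {p: resolve(funcs, p, cache)
                  for p in inspect.signature(fn).parameters}
        cache[name] = fn(**kwargs)
    return cache[name]

funcs = {f.__name__: f for f in (A, B, C)}
print(resolve(funcs, "C"))  # A=2, B=5, so C prints 10
```

The cache makes this a DAG evaluation rather than a tree: `A` is computed once even though both `B` and `C` depend on it.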