Fix titles and navigation
FlorianJacta committed Jan 25, 2024
1 parent 27a7eb0 commit ff15287
Showing 2 changed files with 18 additions and 16 deletions.
30 changes: 16 additions & 14 deletions docs/knowledge_base/tips/big_data_models/index.md
@@ -16,7 +16,7 @@ enables effortless management of large datasets
This article focuses on the seamless integration of Dask (for handling out-of-core data)
with Taipy, a Python library used for pipeline orchestration and scenario management.

-# 1. Sample Application
+# Sample Application
The integration of Dask and Taipy is best demonstrated with an example. In this article, we'll
consider a Taipy data workflow with four tasks:

@@ -49,7 +49,7 @@ config.toml # (Optional) Taipy configuration in TOML made using Taipy Studio
```
<hr/>

-# 2. Introducing Taipy — A Comprehensive Solution
+# Taipy Solution

Taipy is more than just another orchestration tool. Designed specifically for ML engineers,
data scientists, and Python developers, Taipy brings several essential and straightforward features.
@@ -86,7 +86,7 @@ data sources that can be used interchangeably, resulting in a cleaner, more main

<hr/>

-# 3. Introducing Dask
+# Introducing Dask

Dask is a popular Python package for distributed computing. The Dask API implements the
familiar Pandas, NumPy, and scikit-learn APIs, which makes learning and using Dask much
@@ -98,7 +98,7 @@ by the Dask team.
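
Because the Dask API mirrors Pandas so closely, out-of-core code often looks just like the
Pandas code it replaces. Here is a minimal sketch, assuming an illustrative file path and
column name that are not from this article's dataset:

```python
import dask.dataframe as dd

# Lazily point at a larger-than-memory CSV; nothing is read yet
df = dd.read_csv("data/customers.csv")

# Operations build a task graph instead of executing eagerly
high_spenders = df[df["TotalSpent"] > 1_000]

# .compute() triggers execution and returns an in-memory Pandas DataFrame
result = high_spenders.compute()
```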

<hr/>

-# 4. Application: Customer Analysis (algos/algo.py)
+# Application: Customer Analysis

![Taipy Graph](images/graph.png){width=80% style="margin:auto;display:block"}

@@ -241,7 +241,7 @@ def high_value_cust_summary_statistics(df: pd.DataFrame, segment_analysis: pd.Da
return result_df
```

-## Task 1 — Data Preprocessing and Customer Scoring (*preprocess_and_score()*)
+<h3>Task 1 — Data Preprocessing and Customer Scoring (*preprocess_and_score()*)</h3>

This is the first step in your pipeline and perhaps the most crucial. It reads a large
dataset using **Dask**, designed for larger-than-memory computation. It then calculates a
@@ -251,22 +251,24 @@ dataset using **Dask**, designed for larger-than-memory computation. It then cal
After reading and processing the dataset with Dask, this task will output a Pandas
DataFrame for further use in the remaining three tasks.
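
As a rough sketch of what this first task does, assuming hypothetical column names and an
invented scoring rule (the real implementation lives in *algos/algo.py*):

```python
import dask.dataframe as dd
import pandas as pd

def preprocess_and_score(path_to_original_data: str) -> pd.DataFrame:
    # Dask reads the dataset lazily, so it never has to fit in memory at once
    df = dd.read_csv(path_to_original_data)

    # Hypothetical score blending spend and purchase frequency
    df["CustomerScore"] = 0.7 * df["TotalSpent"] + 0.3 * df["PurchaseFrequency"]

    # Materialize a Pandas DataFrame for the three downstream tasks
    return df.compute()
```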

-## Task 2 — Feature Engineering and Segmentation (*featurization_and_segmentation()*)
+<h3>
+Task 2 — Feature Engineering and Segmentation (*featurization_and_segmentation()*)
+</h3>

This task takes the scored DataFrame and adds new features, such as high spending indicators.
It also segments the customers based on their scores.
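
In spirit, the task could look like the sketch below. The *payment_threshold* input matches
the data node changed in the experiments later in this article, while the column names and
quartile segmentation are assumptions:

```python
import pandas as pd

def featurization_and_segmentation(scored_df: pd.DataFrame,
                                   payment_threshold: float) -> pd.DataFrame:
    df = scored_df.copy()

    # Flag customers whose total spend exceeds the configurable threshold
    df["HighSpender"] = df["TotalSpent"] > payment_threshold

    # Bucket customers into four segments by score quartile
    df["Segment"] = pd.qcut(df["CustomerScore"], q=4,
                            labels=["Low", "Mid", "High", "Top"])
    return df
```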

-## Task 3 — Segment Analysis (*segment_analysis()*)
+<h3>Task 3 — Segment Analysis (*segment_analysis()*)</h3>

This task takes the segmented DataFrame and performs a group-wise analysis based on the
customer segments to calculate various metrics.
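
This boils down to a pandas group-by whose aggregation is driven by the *metric* data node
used later in this article ("mean" or "median"). A sketch with assumed column names:

```python
import pandas as pd

def segment_analysis(segmented_df: pd.DataFrame, metric: str) -> pd.DataFrame:
    # Aggregate each segment with the chosen statistic, e.g. "mean" or "median"
    return (segmented_df
            .groupby("Segment", observed=True)[["TotalSpent", "CustomerScore"]]
            .agg(metric)
            .reset_index())
```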

-## Task 4 — Summary Statistics for Customers (*high_value_cust_summary_statistics()*)
+<h3>Task 4 — Summary Statistics for Customers (*high_value_cust_summary_statistics()*)</h3>

This task performs an in-depth analysis of the high-value customer segment and returns
summary statistics.
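
The diff above truncates the real signature, so the following is only a hedged sketch: it
filters to an assumed high-value segment label and applies the *summary_statistic_type* data
node changed in the experiments below:

```python
import pandas as pd

def high_value_cust_summary_statistics(df: pd.DataFrame,
                                       segment_analysis: pd.DataFrame,
                                       summary_statistic_type: str) -> pd.DataFrame:
    # Keep only the high-value segment (the segment label is an assumption)
    high_value = df[df["Segment"] == "Top"]

    # Apply the requested statistic, e.g. "mean", "median", or "max"
    stats = high_value[["TotalSpent", "CustomerScore"]].agg(summary_statistic_type)

    result_df = stats.to_frame(name=summary_statistic_type).reset_index()
    return result_df
```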

-# 5. Modeling the Workflow in Taipy (config.py)
+# Modeling the Workflow

![Taipy DAG](images/dag.png){width=80% style="margin:auto;display:block;border: 4px solid rgb(210,210,210);border-radius:7px" }

@@ -363,7 +365,7 @@ You can read more about configuring Scenarios, Tasks, and Data Nodes in the
[documentation here](../../../manuals/core/config/index.md).
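
For orientation, here is a trimmed sketch of what *config.py* might contain for the first
task only. The IDs and default path are assumptions, but `configure_data_node`,
`configure_task`, and `configure_scenario` are Taipy's standard configuration entry points,
and `skippable=True` is what enables the caching behavior demonstrated below:

```python
from taipy import Config

from algos.algo import preprocess_and_score

# Data nodes: the inputs and outputs that tasks exchange
path_cfg = Config.configure_data_node(id="path_to_data",
                                      default_data="data/customers.csv")
scored_df_cfg = Config.configure_data_node(id="scored_df")

# Task: binds the Python function to its input/output data nodes
task_1_cfg = Config.configure_task(id="preprocess_and_score",
                                   function=preprocess_and_score,
                                   input=[path_cfg],
                                   output=[scored_df_cfg],
                                   skippable=True)

# Scenario: the orchestrated set of tasks
scenario_cfg = Config.configure_scenario(id="customer_analysis",
                                         task_configs=[task_1_cfg])
```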


-# 6. Scenario Creation and Execution
+# Scenario Creation and Execution

Executing a Taipy scenario involves:

@@ -393,7 +395,7 @@ if __name__ == "__main__":
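
A sketch of what that main block typically boils down to, assuming the `scenario_cfg`
defined in the configuration step above:

```python
import taipy as tp

from config import scenario_cfg  # scenario configuration from config.py

if __name__ == "__main__":
    # Start the Taipy Core service, which orchestrates task execution
    tp.Core().run()

    # Instantiate a scenario from its configuration and execute it
    scenario_1 = tp.create_scenario(scenario_cfg)
    scenario_1.submit()
```
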
One of Taipy's most practical features is its ability to skip a task execution if its output is
already computed. Let's explore this with some scenarios:

-### Changing Payment Threshold
+**Changing Payment Threshold**

```python
# Changing Payment Threshold to 1600
@@ -407,7 +409,7 @@ scenario_1.submit()
affects Task 2. In this case, running the pipeline with Taipy cuts execution time by more
than 50%.

-### Changing Metric for Segment Analysis
+**Changing Metric for Segment Analysis**

```python
# Changing metric to median
@@ -421,7 +423,7 @@ scenario_1.submit()
and Task 2.


-### Changing Summary Statistic Type
+**Changing Summary Statistic Type**
```python
# Changing summary_statistic_type to max
scenario_1.summary_statistic_type.write("max")
@@ -439,7 +441,7 @@ incredibly useful when dealing with large datasets.
<hr/>


-# 7. Taipy Studio
+# Taipy Studio

If you have VS Code, you may use [Taipy Studio](../../../manuals/studio/config/index.md)
to build the Taipy *config.toml* configuration file in place of defining the
4 changes: 2 additions & 2 deletions docs/knowledge_base/tips/databricks/index.md
@@ -14,7 +14,7 @@ versions of a business problem. In this article, we'll look into the integration
Databricks jobs with Taipy scenarios, showcasing how this can elevate your data
processing capabilities.

-![Databricks](databricks.png){width=100%}
+![Databricks](databricks.png){width=50% style="margin:auto;display:block;"}

# Scenarios and Databricks Integration

@@ -38,7 +38,7 @@ create the notebook.
- **Define Notebook Details:** Enter a name for your notebook, choose the language
(e.g., Python, Scala, or SQL), and select the cluster you want to use.

-** 2 - Define Databricks Job Logic**
+**2 - Define Databricks Job Logic**

- **Create the Cluster**: Go to the Compute section to create a cluster with the
packages your code requires. You will also need to install `dbutils` to be able to
