Apply suggestions
FlorianJacta committed Jan 25, 2024
1 parent b56f2d1 commit 5268bd0
Showing 2 changed files with 10 additions and 10 deletions.
docs/knowledge_base/tips/databricks/index.md (14 changes: 7 additions & 7 deletions)
@@ -16,7 +16,7 @@ processing capabilities.

![Databricks](databricks.png){width=100%}

-## Scenarios and Databricks Integration
+# Scenarios and Databricks Integration

Creating and executing jobs on Databricks involves several steps, from setting up your
Databricks workspace to defining and running jobs. Here's a step-by-step guide on how
@@ -27,7 +27,7 @@ scenarios:

- A Databricks Workspace.

-### 1. Create a Databricks Notebook
+**1 - Create a Databricks Notebook**

- **Navigate to Workspace:** In Databricks, navigate to the workspace where you want to
create the notebook.
@@ -38,7 +38,7 @@ create the notebook.
- **Define Notebook Details:** Enter a name for your notebook, choose the language
(e.g., Python, Scala, or SQL), and select the cluster you want to use.

-### 2. Define Databricks Job Logic
+** 2 - Define Databricks Job Logic**

- **Create the Cluster**: Go to the Compute section to create a cluster with the
packages required by your code. You would also need to install `dbutils` to be able to
@@ -76,7 +76,7 @@ through this interface.
- **Test in Notebook:** Test your code within the notebook to ensure it runs
successfully.
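
For readers who have not used notebook parameters before, here is a minimal, hypothetical sketch of the kind of logic this step describes, assuming the job's parameters are exchanged through Databricks widgets (the widget name and the returned payload are illustrative):

```python
import json

# Inside a Databricks notebook, `dbutils` is available without any import.
# Read a parameter passed to the job run (the name "my_parameter" is made up).
param_value = dbutils.widgets.get("my_parameter")

result = {"echoed_parameter": param_value}

# Hand the result back to whoever triggered the notebook (e.g. a job run).
dbutils.notebook.exit(json.dumps(result))
```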

-### 3. Create a Databricks Job
+**3 - Create a Databricks Job**

- **Convert Notebook to Job:** Once your code is ready, convert the notebook into a
job. Click on the "File" menu in the notebook and select "Jobs" > "Create Job."
@@ -89,15 +89,15 @@ job. Click on the "File" menu in the notebook and select "Jobs" > "Create Job."

- **Advanced Options:** Configure any advanced options based on your requirements.

-### 4. Run and Monitor the Databricks Job
+** 4 - Run and Monitor the Databricks Job**

- **Run the Job:** After configuring the job settings, click "Run Now" to execute the job immediately.

- **Monitor Job Execution:** Monitor the job execution in real-time. Databricks
provides logs and detailed information about the job's progress.
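
The same run-and-monitor flow can also be driven programmatically through the Databricks Jobs REST API, which is what the class introduced below relies on. A rough sketch, with the workspace URL, token, job ID, and parameter names all standing in as placeholders:

```python
import time

import requests

# Placeholders: substitute your workspace URL, personal access token, and job ID.
HOST = "https://<your-workspace>.cloud.databricks.com"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

# Trigger the job ("Run Now") with notebook parameters.
run_id = requests.post(
    f"{HOST}/api/2.1/jobs/run-now",
    headers=HEADERS,
    json={"job_id": 123, "notebook_params": {"my_parameter": "value"}},
).json()["run_id"]

# Poll the run until it reaches a terminal state.
state = {}
while state.get("life_cycle_state") not in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
    time.sleep(10)
    state = requests.get(
        f"{HOST}/api/2.1/jobs/runs/get",
        headers=HEADERS,
        params={"run_id": run_id},
    ).json()["state"]

print(state)
```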


-## Databricks Class: Bridging the Gap
+# Databricks Class: Bridging the Gap

To seamlessly integrate Databricks jobs with scenarios, we introduce the `Databricks`
class. This class is to be used within your own Taipy project. It facilitates communication with Databricks clusters, enabling users to
@@ -207,7 +207,7 @@ if __name__ == "__main__":

[Download the code](./example.py){: .tp-btn target='blank' }
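
The article's full class lives in the downloadable *example.py*; as a hedged sketch of the general shape such a wrapper can take (names, defaults, and error handling here are illustrative, not the file's exact contents), the REST calls above can be bundled so that a Taipy task only has to call one method:

```python
import json
import time

import requests


class Databricks:
    """Minimal helper that triggers a Databricks job and returns its notebook output."""

    def __init__(self, host: str, token: str, job_id: int):
        self.host = host.rstrip("/")
        self.headers = {"Authorization": f"Bearer {token}"}
        self.job_id = job_id

    def run(self, params: dict, poll_every: int = 10) -> dict:
        # Trigger the job with notebook parameters.
        run_id = requests.post(
            f"{self.host}/api/2.1/jobs/run-now",
            headers=self.headers,
            json={"job_id": self.job_id, "notebook_params": params},
        ).json()["run_id"]

        # Wait for the run to reach a terminal state.
        while True:
            state = requests.get(
                f"{self.host}/api/2.1/jobs/runs/get",
                headers=self.headers,
                params={"run_id": run_id},
            ).json()["state"]
            if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
                break
            time.sleep(poll_every)

        # Retrieve whatever the notebook passed to dbutils.notebook.exit()
        # (assumed here to be a JSON string, as in the notebook sketch above).
        output = requests.get(
            f"{self.host}/api/2.1/jobs/runs/get-output",
            headers=self.headers,
            params={"run_id": run_id},
        ).json()
        return json.loads(output["notebook_output"]["result"])
```

A Taipy task function can then instantiate this class and return `Databricks(...).run(params)` straight into an output data node.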

-## Databricks + Taipy
+# Databricks + Taipy

In conclusion, the integration of Databricks jobs with Taipy scenarios is unlocked by
creating a class for handling Databricks jobs. This class can then be used inside Taipy as a
docs/knowledge_base/tips/pyspark/index.md (6 changes: 3 additions & 3 deletions)
@@ -80,7 +80,7 @@ app/
You can find the contents of each file (other than *penguins.csv*, which you can
get from [palmerpenguins repository](https://github.com/allisonhorst/palmerpenguins/blob/main/inst/extdata/penguins.csv)) in code blocks within this article.

-# 1. The Spark Application (*penguin_spark_app.py*)
+# The Spark Application

Usually, we run PySpark tasks with the `spark-submit` command line utility. You can read
more about the what and the why of submitting Spark jobs in their documentation
@@ -161,7 +161,7 @@ parameters**:
- *input-csv-path*: Path to the input penguin CSV file; and
- *output-csv-path*: Path to save the output CSV file after processing by the Spark app.
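
For orientation, the skeleton of such an application might look like the sketch below; the two argument names match the parameters above, while the actual penguin transformation is the part shown in the article's own code block and is left as a placeholder here:

```python
import argparse

from pyspark.sql import SparkSession


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-csv-path", required=True)
    parser.add_argument("--output-csv-path", required=True)
    args = parser.parse_args()

    spark = SparkSession.builder.appName("penguin_spark_app").getOrCreate()

    penguin_df = spark.read.csv(args.input_csv_path, header=True)

    # ... the article's actual aggregation logic goes here ...
    processed_df = penguin_df  # placeholder transformation

    # Write a single CSV file (via pandas) so downstream code can read it directly.
    processed_df.toPandas().to_csv(args.output_csv_path, index=False)
    spark.stop()


if __name__ == "__main__":
    main()
```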

-# 2. The Taipy configuration (*config.py*)
+# The Taipy configuration

At this point, we have our *penguin_spark_app.py* PySpark application and need to create
**a Taipy task to run this PySpark application**.
@@ -395,7 +395,7 @@ However, we also defined a *validity_period* of 1 day for the *processed_penguin
node, so Taipy will still re-run the task if the DataFrame was last cached more than a
day ago.
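
As a hedged sketch of what this configuration can look like (the data node and task names are inferred from the prose above, and the `spark-submit` wrapper is one possible approach rather than the article's exact code):

```python
import datetime as dt
import subprocess

import pandas as pd
from taipy import Config


def spark_process(input_csv_path: str, output_csv_path: str) -> pd.DataFrame:
    # Run the PySpark application as an external spark-submit process.
    subprocess.run(
        [
            "spark-submit",
            "app/penguin_spark_app.py",  # hypothetical location of the script
            "--input-csv-path", input_csv_path,
            "--output-csv-path", output_csv_path,
        ],
        check=True,
    )
    return pd.read_csv(output_csv_path)


input_csv_path_cfg = Config.configure_data_node("input_csv_path", default_data="app/penguins.csv")
output_csv_path_cfg = Config.configure_data_node("output_csv_path", default_data="app/output.csv")

# Cache the processed DataFrame, but treat it as stale after one day.
processed_penguin_df_cfg = Config.configure_data_node(
    "processed_penguin_df", validity_period=dt.timedelta(days=1)
)

spark_task_cfg = Config.configure_task(
    "spark_process",
    function=spark_process,
    input=[input_csv_path_cfg, output_csv_path_cfg],
    output=processed_penguin_df_cfg,
    skippable=True,
)

scenario_cfg = Config.configure_scenario("penguin_scenario", task_configs=[spark_task_cfg])
```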

-# 3. Building a GUI (*main.py*)
+# Building a GUI

We'll complete our application by **building the GUI** which we saw at the beginning of
this article:
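
The complete *main.py* appears in the article itself; a minimal, hypothetical version that submits the scenario and displays the processed DataFrame could look like this:

```python
import pandas as pd
import taipy as tp
from taipy.gui import Gui

from config import scenario_cfg  # hypothetical import of the configuration sketched above


def run_scenario(state):
    # Submit the scenario, then read the cached result for display.
    state.scenario.submit()
    state.df = state.scenario.processed_penguin_df.read()


page = """
# Penguin processing

<|Run scenario|button|on_action=run_scenario|>

<|{df}|table|>
"""

if __name__ == "__main__":
    tp.Core().run()
    scenario = tp.create_scenario(scenario_cfg)
    df = pd.DataFrame()
    Gui(page).run()
```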
