Documentation for advanced 'large datasets' example (#1153)
RefMan: Fixed links to packages
Fixed charts type list generation
FabienLelaquais authored Oct 8, 2024
1 parent 7fcdc40 commit ce4b663
Showing 17 changed files with 241 additions and 49 deletions.
56 changes: 31 additions & 25 deletions docs/refmans/gui/viselements/generic/chart.md_template
@@ -176,39 +176,44 @@ for more details.

## Large datasets

Displaying large datasets poses several challenges, both from a technical and user experience
standpoint. These challenges can impact performance, usability, and the overall effectiveness of
data presentation.
Displaying large datasets presents several challenges from both technical and user experience
perspectives. These challenges can impact performance, usability, and the overall effectiveness of
data visualization.

The most prominent issues are:

- Performance Issues<br/>
- Loading Time: Large datasets can significantly increase the display time of your application,
leading to a poor user experience.
- Memory Usage: Keeping large amounts of data in memory can strain client devices, potentially
leading to browser crashes or sluggish performance.
- Network Performance: Transmitting large datasets over the network can be slow and expensive in
terms of bandwidth, especially for users on limited data plans or slow connections.
- Loading Time: Large datasets can significantly increase application load times, leading to a
poor user experience;
- Memory Usage: Storing large datasets in memory can strain client devices, potentially causing
browser crashes or slow performance;
- Network Performance: Transmitting large datasets over the network can be slow and
resource-intensive, especially for users with limited bandwidth or slow connections.
- User Experience Challenges<br/>
- Designing visualizations that scale well with the size of the data and remain informative and
actionable can be challenging.
Designing visualizations that effectively scale with large datasets while remaining informative
and actionable can be difficult.
- Technical Limitations<br/>
Browsers have limitations on how much data they can efficiently process and display, which can
restrict the amount of data that can be shown at one time.
Browsers have inherent limits on how much data they can efficiently process and display,
restricting the amount of data that can be shown at once.

The chart control provides different classes that can help deliver relevant representations of large
data sets by applying a technical method called *data decimation*.<br/>
Data decimation involves reducing the volume of data to be sent to the browser and displayed without
significantly losing the usefulness of the original data. The goal is to make visualization more
efficient while retaining the essential characteristics of the data.
The chart control in Taipy GUI provides several classes that can help manage and visualize large
datasets by using a technique called *data decimation*.<br/>
Data decimation reduces the amount of data sent to the browser and displayed without significantly
losing its value, making visualization more efficient while preserving essential characteristics.

The [`taipy.gui.data`](../../../../refmans/reference/pkg_taipy/pkg_gui/pkg_data/index.md) package
defines implementations of different decimation algorithms in classes that inherit the `Decimator^`
class.
The `taipy.gui.data^` package offers various implementations of decimation algorithms through
classes that inherit from the `Decimator^` class.

To use these algorithms and manage the challenges posed by representing large datasets, you must
instantiate the decimator class that best matches the dataset you are dealing with and set the
instance to the [`decimator`](#p-decimator) property of the chart control that represents the data.
To leverage these algorithms and handle large datasets efficiently:

- Instantiate the decimator class that best suits your dataset.
- Assign the instance to the [`decimator`](#p-decimator) property of the chart control representing
the data.

An [advanced example](charts/advanced.md#large-datasets) demonstrates when and how decimation can
be used.<br/>
This documentation also contains an [article](../../../../tutorials/articles/decimator/index.md)
that provides details on this feature.

## The *rebuild* property {data-source="gui:doc/examples/charts/example_rebuild.py"}

@@ -299,7 +304,8 @@ name to select the charts on your page and apply style.

## [Stylekit](../../../../userman/gui/styling/stylekit.md) support

The [Stylekit](../../../../userman/gui/styling/stylekit.md) provides a specific class that you can use to style charts:
The [Stylekit](../../../../userman/gui/styling/stylekit.md) provides a specific class that you can
use to style charts:

* *has-background*<br/>
When the chart control uses the *has-background* class, the rendering of the chart
180 changes: 179 additions & 1 deletion docs/refmans/gui/viselements/generic/charts/advanced.md_template
@@ -1,4 +1,3 @@

# Advanced topics

Taipy exposes advanced features that the Plotly library provides. This section
@@ -90,6 +89,185 @@ Here is what the resulting plot looks like:
<figcaption>Unbalanced data sets</figcaption>
</figure>

## Large datasets {data-source="gui:doc/examples/charts/advanced_large_datasets.py"}

When binding a chart component to a large dataset, performance becomes a critical consideration.
Large datasets can consume significant system resources, resulting in slower rendering times,
reduced responsiveness, and overall degraded performance, especially in interactive
applications.<br/>
These issues can negatively impact the user experience, particularly when users need to interact
with or manipulate the chart.

Consider a scenario where an application needs to visualize large datasets, allowing users to
trigger calculations and make decisions based on the data.<br/>
Below is a basic example of a large dataset represented in Python:
```python
import pandas as pd

x_values = ...
y_values = ...
data = pd.DataFrame({"X": x_values, "Y": y_values})
```

In this example, *x_values* could be a sequence of 50,000 integers, while *y_values* is generated
from a noisy log-sine function, also containing 50,000 samples.
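
The generation code is elided in the snippet above. As a purely illustrative assumption, arrays with this general shape could be produced as sketched below. The function name `noisy_log_sine`, the formula `sin(log(x))`, the noise level, and the seed are hypothetical choices, not values taken from the Taipy example.

```python
import numpy as np

N_POINTS = 50_000  # sample count mentioned in the text


def noisy_log_sine(n: int, noise: float = 0.3, seed: int = 0) -> np.ndarray:
    """Return n samples of sin(log(x)) plus Gaussian noise (hypothetical formula)."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1)  # start at 1 so that log() is defined
    return np.sin(np.log(x)) + rng.normal(0.0, noise, size=n)


x_values = np.arange(N_POINTS)       # 50,000 consecutive integers
y_values = noisy_log_sine(N_POINTS)  # 50,000 noisy samples
```

These two arrays can then be passed to the `pd.DataFrame` call shown above.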

To represent this data using Taipy GUI, you can define a `chart` control as follows:
!!! taipy-element
default={data}
mode=markers
x=X
y=Y

The initial chart display with 50,000 data points will look like this:
<figure>
<img src="../advanced_large_datasets_none-d.png" class="visible-dark" />
<img src="../advanced_large_datasets_none-l.png" class="visible-light"/>
<figcaption>Initial dataset</figcaption>
</figure>

As seen in the chart, with 50,000 data points, it becomes difficult to interpret the information due
to over-plotting. Furthermore, rendering such a large dataset takes significant time, and the entire
dataset must be transmitted to the frontend, further affecting performance.

To improve both application performance and chart readability, it is necessary to reduce the number
of data points rendered. This can be achieved through techniques such as downsampling, aggregation,
or filtering, which can limit the volume of data without losing critical insights.

<h4>Solution 1: Linear interpolation</h4>

One approach to handling large datasets is to downsample them by local averaging, which this section
refers to as linear interpolation. This technique reduces the number of data points by replacing
each group of consecutive points with a single representative value, here the mean of the group.

Using Python and the [NumPy](https://numpy.org/) package, this solution can be implemented
easily and efficiently:
```python linenums="1"
x_values = x_values[::100]
y_values = y_values.reshape(-1, 100)
y_values = np.mean(y_values, axis=1)
```

- line 1: The original *x_values* array is reduced by selecting one value for every 100 points.
- line 2: The *y_values* array is reshaped, grouping every 100 consecutive data points.
- line 3: The mean of each group of 100 points is calculated, resulting in a smaller dataset.

This results in a dataset with 100 times fewer points, while still preserving the overall shape of
the original data.
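
Note that `reshape(-1, 100)` only works when the number of samples is a multiple of 100. A minimal sketch of a more general helper follows. The name `block_average`, the demo arrays, and the choice to drop the incomplete trailing group are assumptions made for illustration, not part of the Taipy example.

```python
import numpy as np


def block_average(x, y, group_size=100):
    """Downsample by averaging y over consecutive groups of group_size points."""
    x = np.asarray(x)
    y = np.asarray(y)
    n_groups = len(y) // group_size
    usable = n_groups * group_size  # drop the incomplete trailing group, if any
    x_small = x[:usable:group_size]  # first x of each group, as in the snippet above
    y_small = y[:usable].reshape(n_groups, group_size).mean(axis=1)
    return x_small, y_small


# Hypothetical demo data whose length is NOT a multiple of 100.
x_demo = np.arange(50_050)
y_demo = np.sin(x_demo / 5_000.0)
x_small, y_small = block_average(x_demo, y_demo)
print(len(x_small), len(y_small))  # 500 500
```

Each averaged value is paired with the first X value of its group, which mirrors the snippet above. Pairing it with the group's central X value would be an equally reasonable design choice.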

Here is the updated chart using the downsampled (interpolated) dataset:
<figure>
<img src="../advanced_large_datasets_average-d.png" class="visible-dark" />
<img src="../advanced_large_datasets_average-l.png" class="visible-light"/>
<figcaption>Downsampled dataset</figcaption>
</figure>

While linear interpolation significantly reduces the dataset size and improves chart performance,
it has some limitations. Specifically, it smooths out the data, which can obscure important details,
such as sharp changes or high-frequency fluctuations.<br/>
With this technique, the noise present in the original sine function has been smoothed away, which
might not be desirable for data analysts or scientists who need to observe finer data
characteristics.

<h4>Solution 2: Sub-sampling</h4>

Sub-sampling is a simple technique where a representative subset of the original data is selected,
for example, by picking every 100th data point. This directly reduces the number of points,
enhancing performance.

With [NumPy](https://numpy.org/) and Python's slicing syntax, sub-sampling is straightforward:
```python
x_values = x_values[::100]
y_values = y_values[::100]
```
Both *x_values* and *y_values* are reduced by selecting every 100th element from the original
arrays.

As a result, the dataset *data* is reduced to 1/100th of its original size, while still retaining
the overall shape of the data.

Here is the chart with the sub-sampled dataset:
<figure>
<img src="../advanced_large_datasets_subsampling-d.png" class="visible-dark" />
<img src="../advanced_large_datasets_subsampling-l.png" class="visible-light"/>
<figcaption>Sub-sampled dataset</figcaption>
</figure>

However, sub-sampling has its limitations. It may skip over significant trends or abrupt changes in
the data, especially if key points are not selected. While it's a quick and efficient solution, it
may result in the loss of critical details in datasets with high-frequency variations or sudden
transitions.<br/>
As you can see, only very few of the noisy data points remain after sampling, which is
expected.
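
The risk of skipping an abrupt change can be illustrated with a tiny synthetic signal. This is a sketch on made-up data (the array `signal` and the spike position are hypothetical), not the dataset used in this section.

```python
import numpy as np

signal = np.zeros(1_000)
signal[150] = 10.0  # a single abrupt spike between two sampled positions

subsampled = signal[::100]  # keeps indices 0, 100, 200, ...

print(signal.max())      # 10.0
print(subsampled.max())  # 0.0
```

Because index 150 is not a multiple of 100, the spike disappears entirely from the sub-sampled array.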

<h4>Solution 3: Decimation</h4>

Decimation is a more refined approach to reducing dataset size while preserving essential
information and trends. By selectively removing data points based on specific criteria, such as
frequency content or statistical significance, decimation balances performance with data integrity.


The `chart` control has a [*decimator*](../chart.md#p-decimator) property that accepts an instance
of a subclass of `Decimator^`. This class transforms the dataset declared in the
[*data*](../chart.md#p-data) property, reducing the number of points to be displayed
while ensuring that key data features are retained.


To use decimation, you must instantiate a decimator:
```python
decimator = MinMaxDecimator(200)
```

In this case, the decimator limits the displayed points to 200, which is an extreme reduction, but
it highlights the effect. For practical usage, typical values range from 1000 to 2000, depending on
the horizontal resolution of the screen: it's often unnecessary to render more points than a monitor
can display.

Several decimator types are available, including `MinMaxDecimator^`, `LTTB^`, `RDP^`, and
`ScatterDecimator^`. Each of these implements different algorithms, better suited for specific
shapes of data.
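
To build intuition for what a min/max-style reduction does, here is a NumPy-only sketch. It is an illustration under assumptions, not the algorithm that `MinMaxDecimator^` actually runs: the function name `min_max_reduce`, the bucketing scheme, and the demo data are all hypothetical.

```python
import numpy as np


def min_max_reduce(x, y, n_out):
    """Keep the minimum and maximum of y in each of n_out // 2 buckets.

    Illustrative only: this is not Taipy's MinMaxDecimator implementation.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if len(y) <= n_out:
        return x, y  # nothing to reduce
    n_buckets = max(n_out // 2, 1)
    # Split the index range into contiguous, nearly equal buckets.
    edges = np.linspace(0, len(y), n_buckets + 1, dtype=int)
    keep = []
    for start, stop in zip(edges[:-1], edges[1:]):
        if stop <= start:
            continue  # skip empty buckets
        bucket = y[start:stop]
        keep.append(start + int(np.argmin(bucket)))
        keep.append(start + int(np.argmax(bucket)))
    keep = np.unique(keep)  # sorted indices, duplicates removed
    return x[keep], y[keep]


# Hypothetical demo data: a noisy sine with one outlier.
rng = np.random.default_rng(0)
x_demo = np.arange(50_000)
y_demo = np.sin(x_demo / 5_000.0) + rng.normal(0.0, 0.3, size=x_demo.size)
y_demo[12_345] = 25.0  # an outlier that plain sub-sampling would miss

x_small, y_small = min_max_reduce(x_demo, y_demo, 200)
print(len(x_small) <= 200)            # True
print(y_small.max() == y_demo.max())  # True
```

Keeping both extremes of every bucket is what lets the outlier and the noise envelope survive the reduction, unlike averaging or fixed-stride sampling.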

To apply the decimator to the chart, set the [*decimator*](../chart.md#p-decimator) property to the
*decimator* variable:
!!! taipy-element
default={data}
mode=markers
x=X
y=Y
decimator={decimator}

Here is the resulting chart after applying decimation:
<figure>
<img src="../advanced_large_datasets_decimation-d.png" class="visible-dark" />
<img src="../advanced_large_datasets_decimation-l.png" class="visible-light"/>
<figcaption>Decimation applied</figcaption>
</figure>

The chart retains more of the dataset's characteristics compared to simpler methods like
sub-sampling or linear interpolation. The global sine-like shape is preserved, and the noise remains
visible.<br/>
At the same time, performance is greatly improved: only 200 points are rendered instead of 50,000,
a reduction by a factor of 250.

Note that decimation neither alters nor duplicates the original dataset.<br/>
You can zoom in and out using the chart's built-in zoom tool to reveal or hide more details
dynamically.

For example, the following image shows the chart zoomed in on a specific area:
<figure>
<img src="../advanced_large_datasets_decimation-zoom1-d.png" class="visible-dark" />
<img src="../advanced_large_datasets_decimation-zoom1-l.png" class="visible-light"/>
<figcaption>Selecting an area to zoom in</figcaption>
</figure>

And here is the result after zooming in:
<figure>
<img src="../advanced_large_datasets_decimation-zoom2-d.png" class="visible-dark" />
<img src="../advanced_large_datasets_decimation-zoom2-l.png" class="visible-light"/>
<figcaption>Chart after zoom</figcaption>
</figure>

As you zoom in, more details are revealed, while still adhering to the 200-point limit, ensuring
smooth performance and responsive interactions.

## Adding annotations {data-source="gui:doc/examples/charts/advanced_annotations.py"}

You can add text annotations on top of a chart using the *annotations* property of
9 changes: 3 additions & 6 deletions docs/userman/gui/pages/builder.md
@@ -2,16 +2,13 @@
title: The Page Builder API
---

The Page Builder API is a set of classes located in the
[`taipy.gui.builder`](../../../refmans/reference/pkg_taipy/pkg_gui/pkg_builder/index.md) package
that lets users create Taipy GUI pages entirely from Python code.
The Page Builder API is a set of classes located in the `taipy.gui.builder^` package that lets users
create Taipy GUI pages entirely from Python code.

This package contains a class for every visual element available in Taipy, including those
defined in [extension libraries](../extension/index.md).

To access the Page Builder classes, you must import the
[`taipy.gui.builder`](../../../refmans/reference/pkg_taipy/pkg_gui/pkg_builder/index.md) package in
your script.
To access the Page Builder classes, you must import the `taipy.gui.builder^` package in your script.

# Generating a new page

37 changes: 23 additions & 14 deletions tools/_setup_generation/step_viselements.py
@@ -128,7 +128,7 @@ def __generate_toc_file(self, tocs: Dict[str, VEToc]):
md_file.write(md_template)

@staticmethod
def __get_navigation_section(category: str, prefix:str) -> str:
def __get_navigation_section(category: str, prefix: str) -> str:
if category == "blocks":
return "Blocks"
if prefix == "core_":
@@ -238,8 +238,8 @@ def __generate_element_doc(self, element_type: str, category: str):
raise ValueError(
f"Couldn't locate first header in documentation for element '{element_type}'"
)
before_properties = match.group(1)
after_properties = match.group(2) + element_documentation[match.end() :]
before_properties = match[1]
after_properties = match[2] + element_documentation[match.end() :]

# Chart hook
if element_type == "chart":
@@ -288,7 +288,6 @@ def __generate_builder_api(self) -> None:
py_content = py_content[: m.start(0) + 1]

def generate(self, category, base_class: str) -> str:

element_types = self.categories[category]

def build_doc(property: str, desc, indent: int):
@@ -382,16 +381,17 @@ def __init__(self, [arguments]) -> None:
element_md_location = (
"corelements" if desc["prefix"] == "core_" else "generic"
)
if m := (re.search
(r"(\[`(\w+)`\]\()\2\.md\)", short_doc)):
if m := (re.search(r"(\[`(\w+)`\]\()\2\.md\)", short_doc)):
short_doc = (
short_doc[: m.start()]
+ f"{m[1]}../../../../../refmans/gui/viselements/{element_md_location}/{m[2]}.md)"
+ short_doc[m.end() :]
)

element_md_page = (f"[`{element_type}`](../../../../../../refmans/gui/viselements/{element_md_location}"
f"/{element_type}.md)")
element_md_page = (
f"[`{element_type}`](../../../../../../refmans/gui/viselements/{element_md_location}"
f"/{element_type}.md)"
)
buffer.write(
template.replace("[element_type]", element_type)
.replace("[element_md_page]", element_md_page)
@@ -414,7 +414,9 @@ def __init__(self, [arguments]) -> None:

# Special case for charts: we want to insert the chart gallery that
# is stored in the file whose path is in self.charts_home_html_path
# This should be inserted before the first level 1 header
# This should be inserted before the first header.
# Simultaneously, we build a list of chart types to point to type pages as text.
# This should be inserted before the "Styling" header.
def __chart_page_hook(
self, element_documentation: str, before: str, after: str, charts_md_dir: str
) -> tuple[str, str]:
@@ -432,12 +434,12 @@ def __chart_page_hook(
chart_gallery = "\n" + chart_gallery[match.end() :]
SECTION_RE = re.compile(r"^([\w-]+):(.*)$")
chart_sections = ""
for line in match.group(1).splitlines():
for line in match[1].splitlines():
if match := SECTION_RE.match(line):
type = match.group(1)
chart_sections += f"- [{match.group(2)}](charts/{type}.md)\n"
chart_sections += f"\n- [{match.group(2)}](charts/{type}.md)"
# Generate chart type documentation page from template, if possible
template_doc_path = f"{charts_md_dir}/{type}.md_template"
# Generate chart type documentation page if possible
if os.access(template_doc_path, os.R_OK):
with open(template_doc_path, "r") as template_doc_file:
documentation = template_doc_file.read()
@@ -454,9 +456,16 @@ def __chart_page_hook(
raise ValueError(
"Couldn't locate first header1 in documentation for element 'chart'"
)
styling_match = re.search(
r"\n# Styling\n", after, re.MULTILINE | re.DOTALL
)
if not styling_match:
raise ValueError(
"Couldn't locate \"Styling\" header1 in documentation for element 'chart'"
)
return (
match.group(1) + chart_gallery + before[match.end() :],
after + chart_sections,
match[1] + chart_gallery + before[match.end() :],
after[: styling_match.start()] + chart_sections + "\n\n" + after[styling_match.start() :]
)

def __process_element_md_file(self, type: str, documentation: str) -> str: