Documentation for advanced 'large datasets' example (#1153)

RefMan: Fixed links to packages
Fixed charts type list generation
Avaiga · Oct 8, 2024 · ce4b663
1 parent 7fcdc40
Showing 17 changed files with 241 additions and 49 deletions.
diff --git a/docs/refmans/gui/viselements/generic/chart.md_template b/docs/refmans/gui/viselements/generic/chart.md_template
@@ -176,39 +176,44 @@ for more details.
 
 ## Large datasets
 
-Displaying large datasets poses several challenges, both from a technical and user experience
-standpoint. These challenges can impact performance, usability, and the overall effectiveness of
-data presentation.
+Displaying large datasets presents several challenges from both technical and user experience
+perspectives. These challenges can impact performance, usability, and the overall effectiveness of
+data visualization.
 
 The most prominent issues are:
 
 - Performance Issues<br/>
-    - Loading Time: Large datasets can significantly increase the display time of your application,
-      leading to a poor user experience.
-    - Memory Usage: Keeping large amounts of data in memory can strain client devices, potentially
-      leading to browser crashes or sluggish performance.
-    - Network Performance: Transmitting large datasets over the network can be slow and expensive in
-      terms of bandwidth, especially for users on limited data plans or slow connections.
+    - Loading Time: Large datasets can significantly increase application load times, leading to a
+      poor user experience;
+    - Memory Usage: Storing large datasets in memory can strain client devices, potentially causing
+      browser crashes or slow performance;
+    - Network Performance: Transmitting large datasets over the network can be slow and
+      resource-intensive, especially for users with limited bandwidth or slow connections.
 - User Experience Challenges<br/>
-    - Designing visualizations that scale well with the size of the data and remain informative and
-      actionable can be challenging.
+  Designing visualizations that effectively scale with large datasets while remaining informative
+  and actionable can be difficult.
 - Technical Limitations<br/>
-  Browsers have limitations on how much data they can efficiently process and display, which can
-  restrict the amount of data that can be shown at one time.
+  Browsers have inherent limits on how much data they can efficiently process and display,
+  restricting the amount of data that can be shown at once.
 
-The chart control provides different classes that can help deliver relevant representations of large
-data sets by applying a technical method called *data decimation*.<br/>
-Data decimation involves reducing the volume of data to be sent to the browser and displayed without
-significantly losing the usefulness of the original data. The goal is to make visualization more
-efficient while retaining the essential characteristics of the data.
+The chart control in Taipy GUI provides several classes that can help manage and visualize large
+datasets by using a technique called *data decimation*.<br/>
+Data decimation reduces the amount of data sent to the browser and displayed without significantly
+losing its value, making visualization more efficient while preserving essential characteristics.
 
-The [`taipy.gui.data`](../../../../refmans/reference/pkg_taipy/pkg_gui/pkg_data/index.md) package
-defines implementations of different decimation algorithms in classes that inherit the `Decimator^`
-class.
+The `taipy.gui.data^` package offers various implementations of decimation algorithms through
+classes that inherit from the `Decimator^` class.
 
-To use these algorithms and manage the challenges posed by representing large datasets, you must
-instantiate the decimator class that best matches the dataset you are dealing with and set the
-instance to the [`decimator`](#p-decimator) property of the chart control that represents the data.
+To leverage these algorithms and handle large datasets efficiently:
+
+- Instantiate the decimator class that best suits your dataset.
+- Assign the instance to the [`decimator`](#p-decimator) property of the chart control representing
+  the data.
+
+An [advanced example](charts/advanced.md#large-datasets) demonstrates when and how decimation can
+be used.<br/>
+This documentation also contains an [article](../../../../tutorials/articles/decimator/index.md)
+that provides details on this feature.
 
 ## The *rebuild* property {data-source="gui:doc/examples/charts/example_rebuild.py"}
 
@@ -299,7 +304,8 @@ name to select the charts on your page and apply style.
 
 ## [Stylekit](../../../../userman/gui/styling/stylekit.md) support
 
-The [Stylekit](../../../../userman/gui/styling/stylekit.md) provides a specific class that you can use to style charts:
+The [Stylekit](../../../../userman/gui/styling/stylekit.md) provides a specific class that you can
+use to style charts:
 
 * *has-background*<br/>
     When the chart control uses the *has-background* class, the rendering of the chart

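The chart.md changes above describe data decimation only in prose. To illustrate the idea, here is a minimal NumPy sketch of a min-max strategy: the samples are split into equal buckets and only the lowest and highest point of each bucket are kept, so peaks and troughs survive the reduction. The helper name `minmax_decimate` and its bucketing are assumptions made for illustration; this is not the code behind Taipy's `MinMaxDecimator^`.

```python
import numpy as np


def minmax_decimate(y: np.ndarray, n_out: int) -> np.ndarray:
    """Return sorted indices of the samples to keep (at most n_out, for n_out >= 2).

    Hypothetical helper: the samples are split into n_out // 2 buckets of
    (almost) equal size, and the index of the minimum and of the maximum of
    each bucket is kept.
    """
    if len(y) <= n_out:
        return np.arange(len(y))  # nothing to reduce
    n_buckets = max(1, n_out // 2)
    kept = []
    # np.array_split accepts lengths that are not a multiple of n_buckets
    for bucket in np.array_split(np.arange(len(y)), n_buckets):
        segment = y[bucket]
        kept.append(bucket[np.argmin(segment)])
        kept.append(bucket[np.argmax(segment)])
    # A bucket's min and max may share an index; np.unique also sorts the result
    return np.unique(kept)


# Usage: 50,000 noisy samples reduced to at most 200 points
rng = np.random.default_rng(0)
x = np.arange(50_000)
y = np.sin(x / 2_000) + rng.normal(0, 0.2, x.size)
idx = minmax_decimate(y, 200)
x_small, y_small = x[idx], y[idx]  # the reduced series a chart would receive
```

Because every bucket contributes its extremes, the global minimum and maximum of the data are always part of the reduced series, which is what keeps spikes visible after decimation.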
diff --git a/docs/refmans/gui/viselements/generic/charts/advanced.md_template b/docs/refmans/gui/viselements/generic/charts/advanced.md_template
@@ -1,4 +1,3 @@
-
 # Advanced topics
 
 Taipy exposes advanced features that the Plotly library provides. This section
@@ -90,6 +89,185 @@ Here is what the resulting plot looks like:
     <figcaption>Unbalanced data sets</figcaption>
 </figure>
 
+## Large datasets {data-source="gui:doc/examples/charts/advanced_large_datasets.py"}
+
+When binding a chart component to a large dataset, performance becomes a critical consideration.
+Large datasets can consume significant system resources, resulting in slower rendering times,
+reduced responsiveness, and overall degraded performance, especially in interactive
+applications.<br/>
+These issues can negatively impact the user experience, particularly when users need to interact
+with or manipulate the chart.
+
+Consider a scenario where an application needs to visualize large datasets, allowing users to
+trigger calculations and make decisions based on the data.<br/>
+Below is a basic example of a large dataset represented in Python:
+```python
+import pandas as pd
+
+x_values = ...
+y_values = ...
+data = pd.DataFrame({"X": x_values, "Y": y_values})
+```
+
+In this example, *x_values* could be a sequence of 50,000 integers, while *y_values* is generated
+from a noisy log-sine function, also containing 50,000 samples.
+
+To represent this data using Taipy GUI, you can define a `chart` control as follows:
+!!! taipy-element
+    default={data}
+    type=markers
+    x=X
+    y=Y
+
+The initial chart display with 50,000 data points will look like this:
+<figure>
+    <img src="../advanced_large_datasets_none-d.png" class="visible-dark" />
+    <img src="../advanced_large_datasets_none-l.png" class="visible-light"/>
+    <figcaption>Initial dataset</figcaption>
+</figure>
+
+As seen in the chart, with 50,000 data points, it becomes difficult to interpret the information due
+to over-plotting. Furthermore, rendering such a large dataset takes significant time, and the entire
+dataset must be transmitted to the frontend, further affecting performance.
+
+To improve both application performance and chart readability, it is necessary to reduce the number
+of data points rendered. This can be achieved through techniques such as downsampling, aggregation,
+or filtering, which can limit the volume of data without losing critical insights.
+
+<h4>Solution 1: Linear interpolation</h4>
+
+One approach to handle large datasets is to replace each group of consecutive points with a single
+value that approximates it. The implementation below uses the mean of each group (a block average),
+effectively downsampling the dataset.
+
+Using Python and the [NumPy](https://numpy.org/) package, this solution can be implemented
+easily and efficiently:
+```python linenums="1"
+import numpy as np
+
+x_values = x_values[::100]
+y_values = y_values.reshape(-1, 100)
+y_values = np.mean(y_values, axis=1)
+```
+
+- line 3: The original *x_values* array is reduced by keeping one value out of every 100 points.
+- line 4: The *y_values* NumPy array is reshaped so that each row groups 100 consecutive data
+  points (this requires its length to be a multiple of 100).
+- line 5: The mean of each row is calculated, resulting in a smaller dataset.
+
+This results in a dataset with 100 times fewer points, while still preserving the overall shape of
+the original data.
+
+Here is the updated chart using the downsampled (interpolated) dataset:
+<figure>
+    <img src="../advanced_large_datasets_average-d.png" class="visible-dark" />
+    <img src="../advanced_large_datasets_average-l.png" class="visible-light"/>
+    <figcaption>Downsampled dataset</figcaption>
+</figure>
+
+While linear interpolation significantly reduces the dataset size and improves chart performance,
+it has some limitations. Specifically, it smooths out the data, which can obscure important details,
+such as sharp changes or high-frequency fluctuations.<br/>
+With this technique, the noise present in the original sine function has been smoothed away, which
+might not be desirable for data analysts or scientists who need to observe finer data
+characteristics.
+
+<h4>Solution 2: Sub-sampling</h4>
+
+Sub-sampling is a simple technique where a representative subset of the original data is selected,
+for example, by picking every 100th data point. This directly reduces the number of points,
+enhancing performance.
+
+With [NumPy](https://numpy.org/) and Python's slicing syntax, sub-sampling is straightforward:
+```python
+x_values = x_values[::100]
+y_values = y_values[::100]
+```
+Both *x_values* and *y_values* are reduced by selecting every 100th element from the original
+arrays.
+
+As a result, the dataset *data* is reduced to 1/100th of its original size, while still retaining
+the overall shape of the data.
+
+Here is the chart with the sub-sampled dataset:
+<figure>
+    <img src="../advanced_large_datasets_subsampling-d.png" class="visible-dark" />
+    <img src="../advanced_large_datasets_subsampling-l.png" class="visible-light"/>
+    <figcaption>Sub-sampled dataset</figcaption>
+</figure>
+
+However, sub-sampling has its limitations. It may skip over significant trends or abrupt changes in
+the data, especially if key points are not selected. While it's a quick and efficient solution, it
+may result in the loss of critical details in datasets with high-frequency variations or sudden
+transitions.<br/>
+As you can see, very few of the noisy data points remain after sub-sampling, which is
+expected.
+
+<h4>Solution 3: Decimation</h4>
+
+Decimation is a more refined approach to reducing dataset size while preserving essential
+information and trends. By selectively removing data points based on specific criteria, such as
+frequency content or statistical significance, decimation balances performance with data integrity.
+
+
+The `chart` control has a [*decimator*](../chart.md#p-decimator) property that accepts an instance
+of a subclass of `Decimator^`. This instance transforms the dataset declared in the
+[*data*](../chart.md#p-data) property, reducing the number of points to be displayed
+while ensuring that key data features are retained.
+
+
+To use decimation, you must instantiate a decimator:
+```python
+from taipy.gui.data.decimator import MinMaxDecimator
+
+decimator = MinMaxDecimator(200)
+```
+
+In this case, the decimator limits the displayed points to 200, which is an extreme reduction, but
+it highlights the effect. For practical usage, typical values range from 1000 to 2000, depending on
+the horizontal resolution of the screen: it's often unnecessary to render more points than a monitor
+can display.
+
+Several decimator types are available, including `MinMaxDecimator^`, `LTTB^`, `RDP^`, and
+`ScatterDecimator^`. Each of these implements different algorithms, better suited for specific
+shapes of data.
+
+To apply the decimator to the chart, bind the *decimator* variable to the
+[*decimator*](../chart.md#p-decimator) property:
+!!! taipy-element
+    default={data}
+    type=markers
+    x=X
+    y=Y
+    decimator={decimator}
+
+Here is the resulting chart after applying decimation:
+<figure>
+    <img src="../advanced_large_datasets_decimation-d.png" class="visible-dark" />
+    <img src="../advanced_large_datasets_decimation-l.png" class="visible-light"/>
+    <figcaption>Decimation applied</figcaption>
+</figure>
+
+The chart retains more of the dataset's characteristics compared to simpler methods like
+sub-sampling or linear interpolation. The global sine-like shape is preserved, and the noise remains
+visible.<br/>
+At the same time, performance is greatly improved: only 200 points are rendered instead of 50,000,
+a reduction by a factor of 250.
+
+Note that decimation neither alters nor duplicates the original dataset.<br/>
+You can zoom in and out using the chart's built-in zoom tool to reveal or hide more details
+dynamically.
+
+For example, the following image shows the chart zoomed in on a specific area:
+<figure>
+    <img src="../advanced_large_datasets_decimation-zoom1-d.png" class="visible-dark" />
+    <img src="../advanced_large_datasets_decimation-zoom1-l.png" class="visible-light"/>
+    <figcaption>Selecting an area to zoom in</figcaption>
+</figure>
+
+And here is the result after zooming in:
+<figure>
+    <img src="../advanced_large_datasets_decimation-zoom2-d.png" class="visible-dark" />
+    <img src="../advanced_large_datasets_decimation-zoom2-l.png" class="visible-light"/>
+    <figcaption>Chart after zoom</figcaption>
+</figure>
+
+As you zoom in, more details are revealed, while still adhering to the 200-point limit, ensuring
+smooth performance and responsive interactions.
+
 ## Adding annotations {data-source="gui:doc/examples/charts/advanced_annotations.py"}
 
 You can add text annotations on top of a chart using the *annotations* property of

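The advanced example compares two manual reductions (block averaging and sub-sampling) before switching to a decimator. The sketch below reproduces both on synthetic data so their effect on the noise can be measured. The log-sine curve, its parameters, and the noise level are assumptions, since the example's actual data generator is not part of this commit. One deliberate difference from the documented snippet: the averaged x values are the centres of each block rather than its first sample, so each mean is plotted where it belongs on the x axis.

```python
import numpy as np

FACTOR = 100  # keep one point out of every 100


def trend(x: np.ndarray) -> np.ndarray:
    """Hypothetical noise-free log-sine curve used to generate the data."""
    return np.log1p(x) * np.sin(x / 5_000)


# Hypothetical stand-in for the example's 50,000-sample noisy dataset
rng = np.random.default_rng(42)
x_values = np.arange(50_000, dtype=float)
y_values = trend(x_values) + rng.normal(0, 0.5, x_values.size)

# Solution 1: block averaging (reshape needs a length that is a multiple of FACTOR)
n = (len(y_values) // FACTOR) * FACTOR
x_avg = x_values[:n].reshape(-1, FACTOR).mean(axis=1)  # block centres
y_avg = y_values[:n].reshape(-1, FACTOR).mean(axis=1)

# Solution 2: sub-sampling, every FACTOR-th sample is kept unchanged
x_sub = x_values[::FACTOR]
y_sub = y_values[::FACTOR]

# How much of the noise survives each reduction
noise_left_avg = np.std(y_avg - trend(x_avg))
noise_left_sub = np.std(y_sub - trend(x_sub))
print(f"averaging: {noise_left_avg:.3f}, sub-sampling: {noise_left_sub:.3f}")
```

Averaging 100 independent noise samples divides their standard deviation by about 10, so the averaged series keeps roughly a tenth of the noise amplitude, while sub-sampling keeps it intact but on far fewer points. That is the trade-off the two charts in the example illustrate.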
diff --git a/docs/refmans/gui/viselements/generic/charts/advanced_large_datasets_average-d.png b/docs/refmans/gui/viselements/generic/charts/advanced_large_datasets_average-d.png
diff --git a/docs/refmans/gui/viselements/generic/charts/advanced_large_datasets_average-l.png b/docs/refmans/gui/viselements/generic/charts/advanced_large_datasets_average-l.png
diff --git a/...refmans/gui/viselements/generic/charts/advanced_large_datasets_decimation-d.png b/...refmans/gui/viselements/generic/charts/advanced_large_datasets_decimation-d.png
diff --git a/...refmans/gui/viselements/generic/charts/advanced_large_datasets_decimation-l.png b/...refmans/gui/viselements/generic/charts/advanced_large_datasets_decimation-l.png
diff --git a/...s/gui/viselements/generic/charts/advanced_large_datasets_decimation-zoom1-d.png b/...s/gui/viselements/generic/charts/advanced_large_datasets_decimation-zoom1-d.png
diff --git a/...s/gui/viselements/generic/charts/advanced_large_datasets_decimation-zoom1-l.png b/...s/gui/viselements/generic/charts/advanced_large_datasets_decimation-zoom1-l.png
diff --git a/...s/gui/viselements/generic/charts/advanced_large_datasets_decimation-zoom2-d.png b/...s/gui/viselements/generic/charts/advanced_large_datasets_decimation-zoom2-d.png
diff --git a/...s/gui/viselements/generic/charts/advanced_large_datasets_decimation-zoom2-l.png b/...s/gui/viselements/generic/charts/advanced_large_datasets_decimation-zoom2-l.png
diff --git a/docs/refmans/gui/viselements/generic/charts/advanced_large_datasets_none-d.png b/docs/refmans/gui/viselements/generic/charts/advanced_large_datasets_none-d.png
diff --git a/docs/refmans/gui/viselements/generic/charts/advanced_large_datasets_none-l.png b/docs/refmans/gui/viselements/generic/charts/advanced_large_datasets_none-l.png
diff --git a/...efmans/gui/viselements/generic/charts/advanced_large_datasets_subsampling-d.png b/...efmans/gui/viselements/generic/charts/advanced_large_datasets_subsampling-d.png
diff --git a/...efmans/gui/viselements/generic/charts/advanced_large_datasets_subsampling-l.png b/...efmans/gui/viselements/generic/charts/advanced_large_datasets_subsampling-l.png
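`LTTB^` is one of the decimator classes listed in the advanced example. Its name refers to the Largest-Triangle-Three-Buckets algorithm; the sketch below is a plain NumPy rendition of that algorithm written for illustration, not Taipy's implementation. Interior points are split into `n_out - 2` buckets, and from each bucket the algorithm keeps the point that forms the largest triangle with the previously kept point and the average of the next bucket; the first and last points are always kept.

```python
import numpy as np


def lttb(x: np.ndarray, y: np.ndarray, n_out: int) -> np.ndarray:
    """Return the sorted indices of the n_out points kept by LTTB."""
    n = len(x)
    if n_out >= n:
        return np.arange(n)  # nothing to reduce
    if n_out < 3:
        raise ValueError("LTTB needs at least 3 output points")
    # n_out - 2 buckets covering the interior indices 1 .. n - 2
    edges = np.linspace(1, n - 1, n_out - 1).astype(int)
    kept = [0]  # the first point is always kept
    a = 0  # index of the previously kept point
    for i in range(n_out - 2):
        start, end = edges[i], edges[i + 1]
        # Third triangle vertex: average of the next bucket, or the last point
        if i + 2 < len(edges):
            nxt = slice(edges[i + 1], edges[i + 2])
            cx, cy = x[nxt].mean(), y[nxt].mean()
        else:
            cx, cy = x[-1], y[-1]
        bx, by = x[start:end], y[start:end]
        # Twice the triangle area for every candidate point in the bucket
        area = np.abs((x[a] - cx) * (by - y[a]) - (x[a] - bx) * (cy - y[a]))
        a = start + int(np.argmax(area))
        kept.append(a)
    kept.append(n - 1)  # the last point is always kept
    return np.array(kept)


# Usage on a hypothetical noisy log-sine series of 50,000 samples
rng = np.random.default_rng(1)
x = np.arange(50_000, dtype=float)
y = np.log1p(x) * np.sin(x / 5_000) + rng.normal(0, 0.5, x.size)
idx = lttb(x, y, 200)
```

Unlike min-max bucketing, LTTB keeps exactly one point per bucket and favours points that change the visual shape of the line the most, which tends to suit line charts of time series.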
diff --git a/docs/userman/gui/pages/builder.md b/docs/userman/gui/pages/builder.md
@@ -2,16 +2,13 @@
 title: The Page Builder API
 ---
 
-The Page Builder API is a set of classes located in the
-[`taipy.gui.builder`](../../../refmans/reference/pkg_taipy/pkg_gui/pkg_builder/index.md) package
-that lets users create Taipy GUI pages entirely from Python code.
+The Page Builder API is a set of classes located in the `taipy.gui.builder^` package that lets users
+create Taipy GUI pages entirely from Python code.
 
 This package contains a class for every visual element available in Taipy, including those
 defined in [extension libraries](../extension/index.md).
 
-To access the Page Builder classes, you must import the
-[`taipy.gui.builder`](../../../refmans/reference/pkg_taipy/pkg_gui/pkg_builder/index.md) package in
-your script.
+To access the Page Builder classes, you must import the `taipy.gui.builder^` package in your script.
 
 # Generating a new page
 

diff --git a/tools/_setup_generation/step_viselements.py b/tools/_setup_generation/step_viselements.py
@@ -128,7 +128,7 @@ def __generate_toc_file(self, tocs: Dict[str, VEToc]):
                 md_file.write(md_template)
 
     @staticmethod
-    def __get_navigation_section(category: str, prefix:str) -> str:
+    def __get_navigation_section(category: str, prefix: str) -> str:
         if category == "blocks":
             return "Blocks"
         if prefix == "core_":
@@ -238,8 +238,8 @@ def __generate_element_doc(self, element_type: str, category: str):
             raise ValueError(
                 f"Couldn't locate first header in documentation for element '{element_type}'"
             )
-        before_properties = match.group(1)
-        after_properties = match.group(2) + element_documentation[match.end() :]
+        before_properties = match[1]
+        after_properties = match[2] + element_documentation[match.end() :]
 
         # Chart hook
         if element_type == "chart":
@@ -288,7 +288,6 @@ def __generate_builder_api(self) -> None:
             py_content = py_content[: m.start(0) + 1]
 
         def generate(self, category, base_class: str) -> str:
-
             element_types = self.categories[category]
 
             def build_doc(property: str, desc, indent: int):
@@ -382,16 +381,17 @@ def __init__(self, [arguments]) -> None:
                 element_md_location = (
                     "corelements" if desc["prefix"] == "core_" else "generic"
                 )
-                if m := (re.search
-                    (r"(\[`(\w+)`\]\()\2\.md\)", short_doc)):
+                if m := (re.search(r"(\[`(\w+)`\]\()\2\.md\)", short_doc)):
                     short_doc = (
                         short_doc[: m.start()]
                         + f"{m[1]}../../../../../refmans/gui/viselements/{element_md_location}/{m[2]}.md)"
                         + short_doc[m.end() :]
                     )
 
-                element_md_page = (f"[`{element_type}`](../../../../../../refmans/gui/viselements/{element_md_location}"
-                                   f"/{element_type}.md)")
+                element_md_page = (
+                    f"[`{element_type}`](../../../../../../refmans/gui/viselements/{element_md_location}"
+                    f"/{element_type}.md)"
+                )
                 buffer.write(
                     template.replace("[element_type]", element_type)
                     .replace("[element_md_page]", element_md_page)
@@ -414,7 +414,9 @@ def __init__(self, [arguments]) -> None:
 
     # Special case for charts: we want to insert the chart gallery that
     # is stored in the file whose path is in self.charts_home_html_path
-    # This should be inserted before the first level 1 header
+    # This should be inserted before the first header.
+    # Simultaneously, we build a list of chart types to point to type pages as text.
+    # This should be inserted before the "Styling" header.
     def __chart_page_hook(
         self, element_documentation: str, before: str, after: str, charts_md_dir: str
     ) -> tuple[str, str]:
@@ -432,12 +434,12 @@ def __chart_page_hook(
         chart_gallery = "\n" + chart_gallery[match.end() :]
         SECTION_RE = re.compile(r"^([\w-]+):(.*)$")
         chart_sections = ""
-        for line in match.group(1).splitlines():
+        for line in match[1].splitlines():
             if match := SECTION_RE.match(line):
                 type = match.group(1)
-                chart_sections += f"- [{match.group(2)}](charts/{type}.md)\n"
+                chart_sections += f"\n- [{match.group(2)}](charts/{type}.md)"
+                # Generate chart type documentation page from template, if possible
                 template_doc_path = f"{charts_md_dir}/{type}.md_template"
-                # Generate chart type documentation page if possible
                 if os.access(template_doc_path, os.R_OK):
                     with open(template_doc_path, "r") as template_doc_file:
                         documentation = template_doc_file.read()
@@ -454,9 +456,16 @@ def __chart_page_hook(
             raise ValueError(
                 "Couldn't locate first header1 in documentation for element 'chart'"
             )
+        styling_match = re.search(
+            r"\n# Styling\n", after, re.MULTILINE | re.DOTALL
+        )
+        if not styling_match:
+            raise ValueError(
+                "Couldn't locate \"Styling\" header1 in documentation for element 'chart'"
+            )
         return (
-            match.group(1) + chart_gallery + before[match.end() :],
-            after + chart_sections,
+            match[1] + chart_gallery + before[match.end() :],
+            after[: styling_match.start()] + chart_sections + "\n\n" + after[styling_match.start() :]
         )
 
     def __process_element_md_file(self, type: str, documentation: str) -> str:
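With this change, `__chart_page_hook` places the list of chart types just before the "Styling" header instead of appending it at the end of the page. The standalone sketch below mimics that splicing on a toy page so the behaviour can be checked in isolation. The helper names and the `type:Title` gallery lines are assumptions inferred from `SECTION_RE`; the real hook operates on the generated chart documentation.

```python
import re

# Same pattern as SECTION_RE in the hook: "<chart-type>:<title>"
SECTION_RE = re.compile(r"^([\w-]+):(.*)$")


def build_chart_sections(gallery_lines: str) -> str:
    """Turn hypothetical 'type:Title' lines into Markdown links to type pages."""
    sections = ""
    for line in gallery_lines.splitlines():
        if m := SECTION_RE.match(line):
            sections += f"\n- [{m[2]}](charts/{m[1]}.md)"
    return sections


def insert_before_styling(after: str, chart_sections: str) -> str:
    """Insert chart_sections right before the first '# Styling' header."""
    # No regex flags are needed: the pattern uses neither anchors nor '.'
    styling_match = re.search(r"\n# Styling\n", after)
    if not styling_match:
        raise ValueError("Couldn't locate \"Styling\" header1")
    pos = styling_match.start()
    return after[:pos] + chart_sections + "\n\n" + after[pos:]


page = "# Properties\n\nSome text.\n\n# Styling\n\nCSS classes.\n"
result = insert_before_styling(page, build_chart_sections("line:Line charts\nbar:Bar charts"))
print(result)
```

Raising an error when the header is missing mirrors the hook's behaviour: a template that lost its "Styling" section fails loudly at generation time instead of silently producing a page without the chart type list.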