diff --git a/docs/usage_guide/optimization_notes/index.rst b/docs/usage_guide/optimization_notes/index.rst
index 0dcbe5a25d7..dd9ed22352e 100644
--- a/docs/usage_guide/optimization_notes/index.rst
+++ b/docs/usage_guide/optimization_notes/index.rst
@@ -314,6 +314,36 @@ Copy-pastable example, showing how mixing pandas and Modin DataFrames in a singl
 
     # Possible output: TypeError
 
+Execute dataframe operations using pandas NativeQueryCompiler
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+By default, Modin distributes every dataframe across partitions and performs operations
+using the PandasQueryCompiler. However, in certain scenarios, such as handling small or empty dataframes,
+distributing the data may introduce unnecessary overhead. In such cases, it is more efficient to default
+to pandas at the query compiler level. This can be achieved by setting the cfg.NativeDataframeMode
+configuration variable to "Pandas". When enabled, all operations in Modin default to pandas, and the dataframes are not distributed,
+avoiding the additional overhead. This configuration can be toggled on or off depending on whether
+dataframe distribution is required.
+
+Dataframes created while NativeDataframeMode is set to "Pandas" will continue to use the NativeQueryCompiler
+even after the setting is reverted to "Default". Modin supports interoperability between distributed Modin dataframes and
+those using the NativeQueryCompiler.
+
+.. code-block:: python
+
+    import modin.pandas as pd
+    import modin.config as cfg
+
+    # This dataframe will be distributed and use `PandasQueryCompiler` by default
+    df_distributed = pd.DataFrame(...)
+
+    # Set mode to "Pandas" to avoid distribution and use `NativeQueryCompiler`
+    cfg.NativeDataframeMode.put("Pandas")
+    df_native_qc = pd.DataFrame(...)
+
+    # Revert to default settings for distributed dataframes
+    cfg.NativeDataframeMode.put("Default")
+    df_distributed = pd.DataFrame(...)
+
 Operation-specific optimizations
 """"""""""""""""""""""""""""""""
 