torch compile config standardization update (#3166)
* torch.compile config update

* torch.compile config update

* yaml test files

* yaml test files

* Fixed regression failure

* Fixed regression failure

* Fixed regression failure

* Workaround for regression failure

* Workaround for regression failure

* Workaround for regression failure

* skipping torchtext test

* Update test_example_torch_compile.py

* Update test_torch_compile.py

* Rename toy_model.py to model.py

* Update test_torch_compile.py

* Update test_torch_compile.py

* Addressed review comments

* Addressed review comments
agunapal authored Jun 11, 2024
1 parent 3d17a94 commit d29059f
Showing 14 changed files with 198 additions and 44 deletions.
6 changes: 5 additions & 1 deletion examples/image_classifier/resnet_18/README.md
@@ -23,7 +23,11 @@ Ex: `cd examples/image_classifier/resnet_18`
In this example, we use the following config:

```
echo "pt2 : {backend: inductor, mode: reduce-overhead}" > model-config.yaml
echo "pt2:
compile:
enable: True
backend: inductor
mode: reduce-overhead" > model-config.yaml
```

##### Sample commands to create a Resnet18 torch.compile model archive, register it on TorchServe and run image prediction
14 changes: 10 additions & 4 deletions examples/pt2/README.md
@@ -16,16 +16,22 @@ pip install torchserve-nightly torch-model-archiver-nightly

## torch.compile

PyTorch 2.x supports several compiler backends and you pick which one you want by passing in an optional file `model_config.yaml` during your model packaging
PyTorch 2.x supports several compiler backends and you pick which one you want by passing in an optional file `model_config.yaml` during your model packaging. The default backend with the minimum config below is `inductor`.

```yaml
pt2: "inductor"
pt2:
  compile:
    enable: True
```
You can also pass a dictionary with compile options if you need more control over torch.compile:
You can also pass various compile options if you need more control over torch.compile:
```yaml
pt2 : {backend: inductor, mode: reduce-overhead}
pt2:
  compile:
    enable: True
    backend: inductor
    mode: reduce-overhead
```
An example of using `torch.compile` can be found [here](./torch_compile/README.md)
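For orientation, here is a minimal sketch of how a handler can act on this block, assuming the parsed yaml is exposed to the handler context as `ctx.model_yaml_config` (the helper name `maybe_compile` is hypothetical, for illustration only):

```python
import torch

def maybe_compile(model, ctx):
    # Read the standardized pt2.compile block from the model's yaml config
    compile_cfg = ctx.model_yaml_config.get("pt2", {}).get("compile", {})
    if not compile_cfg.get("enable", False):
        # Compilation is opt-in: without enable: True the model stays eager
        return model
    # Forward the remaining keys (backend, mode, ...) to torch.compile
    kwargs = {k: v for k, v in compile_cfg.items() if k != "enable"}
    return torch.compile(model, **kwargs)
```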
14 changes: 10 additions & 4 deletions examples/pt2/torch_compile/README.md
@@ -19,7 +19,9 @@ Ex: `cd examples/pt2/torch_compile`
In this example, we use the following config:

```
echo "pt2 : {backend: inductor, mode: reduce-overhead}" > model-config.yaml
echo "pt2:
compile:
enable: True" > model-config.yaml
```

### Create model archive
@@ -76,9 +78,13 @@ After a few iterations of warmup, we see the following
#### Measure inference time with `torch.compile`

```
echo "pt2: {backend: inductor, mode: reduce-overhead}" > model-config.yaml && \
echo "handler:" >> model-config.yaml && \
echo " profile: true" >> model-config.yaml
echo "pt2:
compile:
enable: True
backend: inductor
mode: reduce-overhead" > model-config.yaml && \
echo "handler:
profile: true" >> model-config.yaml
```

Once the `yaml` file is updated, create the model-archive, start TorchServe and run inference using the steps shown above.
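For reference, a typical archive/serve/predict sequence for this example might look like the following; the file and model names are assumed from the ResNet-18 based example and may differ from the collapsed steps above:

```bash
# Package the model together with the updated model-config.yaml (names assumed)
torch-model-archiver --model-name resnet-18 --version 1.0 \
  --model-file model.py --serialized-file resnet18-f37072fd.pth \
  --handler image_classifier --config-file model-config.yaml
mkdir -p model_store && mv resnet-18.mar model_store/

# Start TorchServe and run a sample prediction
torchserve --start --ncs --model-store model_store --models resnet-18.mar
curl http://127.0.0.1:8080/predictions/resnet-18 -T kitten.jpg
```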
6 changes: 5 additions & 1 deletion examples/pt2/torch_compile/model-config.yaml
@@ -1 +1,5 @@
pt2 : {backend: inductor, mode: reduce-overhead}
pt2:
  compile:
    enable: True
    backend: inductor
    mode: reduce-overhead
21 changes: 17 additions & 4 deletions examples/pt2/torch_compile_openvino/README.md
@@ -36,15 +36,21 @@ In this example, we use the following config:
```bash
echo "minWorkers: 1
maxWorkers: 2
pt2: {backend: openvino}" > model-config.yaml
pt2:
  compile:
    enable: True
    backend: openvino" > model-config.yaml
```

If you want to measure the handler `preprocess`, `inference`, `postprocess` times, use the following config:

```bash
echo "minWorkers: 1
maxWorkers: 2
pt2: {backend: openvino}
pt2:
  compile:
    enable: True
    backend: openvino
handler:
  profile: true" > model-config.yaml
```
@@ -132,7 +138,11 @@ Update the model-config.yaml file to specify the Inductor backend:
```bash
echo "minWorkers: 1
maxWorkers: 2
pt2: {backend: inductor, mode: reduce-overhead}
pt2:
  compile:
    enable: True
    backend: inductor
    mode: reduce-overhead
handler:
  profile: true" > model-config.yaml
```
@@ -153,7 +163,10 @@ Update the model-config.yaml file to specify the OpenVINO backend:
```bash
echo "minWorkers: 1
maxWorkers: 2
pt2: {backend: openvino}
pt2:
  compile:
    enable: True
    backend: openvino
handler:
  profile: true" > model-config.yaml
```
12 changes: 10 additions & 2 deletions examples/pt2/torch_inductor_caching/README.md
@@ -41,7 +41,11 @@ Ex: `cd examples/pt2/torch_inductor_caching`
In this example, we use the following config:

```yaml
pt2 : {backend: inductor, mode: max-autotune}
pt2:
  compile:
    enable: True
    backend: inductor
    mode: max-autotune
```
### Create model archive
@@ -126,7 +130,11 @@ Ex: `cd examples/pt2/torch_inductor_caching`
In this example, we use the following config:

```yaml
pt2 : {backend: inductor, mode: max-autotune}
pt2:
  compile:
    enable: True
    backend: inductor
    mode: max-autotune
```
### Create model archive
@@ -1,7 +1,11 @@
minWorkers: 4
maxWorkers: 4
responseTimeout: 600
pt2 : {backend: inductor, mode: max-autotune}
pt2:
  compile:
    enable: True
    backend: inductor
    mode: max-autotune
handler:
  torch_inductor_caching:
    torch_inductor_cache_dir: "/home/ubuntu/serve/examples/pt2/torch_inductor_caching/cache"
@@ -1,7 +1,11 @@
minWorkers: 4
maxWorkers: 4
responseTimeout: 600
pt2 : {backend: inductor, mode: max-autotune}
pt2:
  compile:
    enable: True
    backend: inductor
    mode: max-autotune
handler:
  torch_inductor_caching:
    torch_inductor_fx_graph_cache: true
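As a rough sketch of what these handler keys map to, assuming the handler reads them from `ctx.model_yaml_config` (the helper below is hypothetical; `TORCHINDUCTOR_CACHE_DIR` and `torch._inductor.config.fx_graph_cache` are the underlying inductor knobs):

```python
import os
import torch

def apply_inductor_caching(ctx):
    # Hypothetical helper: map this example's yaml keys onto inductor's caches
    cfg = ctx.model_yaml_config.get("handler", {}).get("torch_inductor_caching", {})
    if "torch_inductor_cache_dir" in cfg:
        # Point inductor's on-disk cache at a persistent, shared directory
        os.environ["TORCHINDUCTOR_CACHE_DIR"] = cfg["torch_inductor_cache_dir"]
    if cfg.get("torch_inductor_fx_graph_cache", False):
        # Reuse previously compiled FX graphs across workers and restarts
        torch._inductor.config.fx_graph_cache = True
```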
3 changes: 3 additions & 0 deletions test/pytest/test_data/torch_compile/pt2_enable_default.yaml
@@ -0,0 +1,3 @@
pt2:
  compile:
    enable: True
5 changes: 5 additions & 0 deletions test/pytest/test_data/torch_compile/pt2_enable_false.yaml
@@ -0,0 +1,5 @@
pt2:
  compile:
    enable: False
    backend: inductor
    mode: reduce-overhead
5 changes: 5 additions & 0 deletions test/pytest/test_data/torch_compile/pt2_enable_true.yaml
@@ -0,0 +1,5 @@
pt2:
  compile:
    enable: True
    backend: inductor
    mode: reduce-overhead
41 changes: 22 additions & 19 deletions test/pytest/test_example_torch_compile.py
@@ -1,4 +1,5 @@
import os
import sys
from pathlib import Path

import pytest
@@ -31,34 +32,36 @@
EXPECTED_RESULTS = ["tabby", "tiger_cat", "Egyptian_cat", "lynx", "plastic_bag"]


@pytest.fixture
def custom_working_directory(tmp_path):
    # Set the custom working directory
    custom_dir = tmp_path / "model_dir"
    custom_dir.mkdir()
    os.chdir(custom_dir)
    yield custom_dir
    # Clean up and return to the original working directory
    os.chdir(tmp_path)
@pytest.fixture(scope="function")
def chdir_example(monkeypatch):
    # Change directory to example directory
    monkeypatch.chdir(EXAMPLE_ROOT_DIR)
    monkeypatch.syspath_prepend(EXAMPLE_ROOT_DIR)
    yield

    # Teardown
    monkeypatch.undo()

@pytest.mark.skipif(PT2_AVAILABLE == False, reason="torch version is < 2.0")
@pytest.mark.skip(reason="Skipping as its causing other testcases to fail")
def test_torch_compile_inference(monkeypatch, custom_working_directory):
    monkeypatch.syspath_prepend(EXAMPLE_ROOT_DIR)
    # Get the path to the custom working directory
    model_dir = custom_working_directory
    # Delete imported model
    model = MODEL_FILE.split(".")[0]
    if model in sys.modules:
        del sys.modules[model]

    try_and_handle(
        f"wget https://download.pytorch.org/models/{MODEL_PTH_FILE} -P {model_dir}"
    )

@pytest.mark.skipif(PT2_AVAILABLE == False, reason="torch version is < 2.0")
def test_torch_compile_inference(chdir_example):
    # Download weights
    if not os.path.isfile(EXAMPLE_ROOT_DIR.joinpath(MODEL_PTH_FILE)):
        try_and_handle(
            f"wget https://download.pytorch.org/models/{MODEL_PTH_FILE} -P {EXAMPLE_ROOT_DIR}"
        )

    # Handler for Image classification
    handler = ImageClassifier()

    # Context definition
    ctx = MockContext(
        model_pt_file=model_dir.joinpath(MODEL_PTH_FILE),
        model_pt_file=MODEL_PTH_FILE,
        model_dir=EXAMPLE_ROOT_DIR.as_posix(),
        model_file=MODEL_FILE,
        model_yaml_config_file=MODEL_YAML_CFG_FILE,
76 changes: 74 additions & 2 deletions test/pytest/test_torch_compile.py
@@ -3,12 +3,16 @@
import os
import platform
import subprocess
import sys
import time
from pathlib import Path

import pytest
import torch
from pkg_resources import packaging
from test_data.torch_compile.compile_handler import CompileHandler

from ts.torch_handler.unit_tests.test_utils.mock_context import MockContext

PT_2_AVAILABLE = (
    True
@@ -20,15 +24,42 @@
CURR_FILE_PATH = Path(__file__).parent
TEST_DATA_DIR = os.path.join(CURR_FILE_PATH, "test_data", "torch_compile")

MODEL_FILE = os.path.join(TEST_DATA_DIR, "model.py")
MODEL = "model.py"
MODEL_FILE = os.path.join(TEST_DATA_DIR, MODEL)
HANDLER_FILE = os.path.join(TEST_DATA_DIR, "compile_handler.py")
YAML_CONFIG_STR = os.path.join(TEST_DATA_DIR, "pt2.yaml")  # backend as string
YAML_CONFIG_DICT = os.path.join(TEST_DATA_DIR, "pt2_dict.yaml")  # arbitrary kwargs dict
YAML_CONFIG_ENABLE = os.path.join(
    TEST_DATA_DIR, "pt2_enable_true.yaml"
)  # compile enabled, with backend and mode
YAML_CONFIG_ENABLE_FALSE = os.path.join(
    TEST_DATA_DIR, "pt2_enable_false.yaml"
)  # compile explicitly disabled
YAML_CONFIG_ENABLE_DEFAULT = os.path.join(
    TEST_DATA_DIR, "pt2_enable_default.yaml"
)  # compile enabled with default options


SERIALIZED_FILE = os.path.join(TEST_DATA_DIR, "model.pt")
MODEL_STORE_DIR = os.path.join(TEST_DATA_DIR, "model_store")
MODEL_NAME = "half_plus_two"
EXPECTED_RESULT = 3.5


@pytest.fixture(scope="function")
def chdir_example(monkeypatch):
    # Change directory to example directory
    monkeypatch.chdir(TEST_DATA_DIR)
    monkeypatch.syspath_prepend(TEST_DATA_DIR)
    yield

    # Teardown
    monkeypatch.undo()

    # Delete imported model
    model = MODEL.split(".")[0]
    if model in sys.modules:
        del sys.modules[model]


@pytest.mark.skipif(
@@ -119,7 +150,6 @@ def _response_to_tuples(response_str):
        os.environ.get("TS_RUN_IN_DOCKER", False),
        reason="Test to be run outside docker",
    )
    @pytest.mark.skip(reason="Test failing on regression runner")
    def test_serve_inference(self):
        request_data = {"instances": [[1.0], [2.0], [3.0]]}
        request_json = json.dumps(request_data)
@@ -146,3 +176,45 @@ def test_serve_inference(self):
"Compiled model with backend inductor, mode reduce-overhead"
in model_log
)

    @pytest.mark.parametrize(
        ("compile"), ("disabled", "enabled", "enabled_reduce_overhead")
    )
    def test_compile_inference_enable_options(self, chdir_example, compile):
        # Reset dynamo
        torch._dynamo.reset()

        # Handler
        handler = CompileHandler()

        if compile == "enabled":
            model_yaml_config_file = YAML_CONFIG_ENABLE_DEFAULT
        elif compile == "disabled":
            model_yaml_config_file = YAML_CONFIG_ENABLE_FALSE
        elif compile == "enabled_reduce_overhead":
            model_yaml_config_file = YAML_CONFIG_ENABLE

        # Context definition
        ctx = MockContext(
            model_pt_file=SERIALIZED_FILE,
            model_dir=TEST_DATA_DIR,
            model_file=MODEL,
            model_yaml_config_file=model_yaml_config_file,
        )

        torch.manual_seed(42 * 42)
        handler.initialize(ctx)
        handler.context = ctx

        # Check that model is compiled using dynamo
        if compile == "enabled" or compile == "enabled_reduce_overhead":
            assert isinstance(handler.model, torch._dynamo.OptimizedModule)
        else:
            assert not isinstance(handler.model, torch._dynamo.OptimizedModule)

        # Data for testing
        data = {"body": {"instances": [[1.0], [2.0], [3.0]]}}

        result = handler.handle([data], ctx)

        assert result[0] == EXPECTED_RESULT
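To exercise just the new enable/disable matrix locally, a pytest keyword filter on the test name above should work:

```bash
pytest test/pytest/test_torch_compile.py -k test_compile_inference_enable_options
```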