This is a cherry-pick of #728.
Resolves an issue caused by the release of triton v3.2.0 (January 23rd, 2025). This is a workaround; a proper fix that supports triton v3.2.0 may still be required.
The following error is raised when triton v3.2.0 is installed:
Traceback (most recent call last):
  File "/workspace/vllm/test_evaluation.py", line 15, in <module>
    from vllm import LLM, SamplingParams
  File "/workspace/vllm/vllm/__init__.py", line 7, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
  File "/workspace/vllm/vllm/engine/arg_utils.py", line 11, in <module>
    from vllm.config import (CacheConfig, ConfigFormat, DecodingConfig,
  File "/workspace/vllm/vllm/config.py", line 16, in <module>
    from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS
  File "/workspace/vllm/vllm/model_executor/layers/quantization/__init__.py", line 6, in <module>
    from vllm.model_executor.layers.quantization.awq_marlin import AWQMarlinConfig
  File "/workspace/vllm/vllm/model_executor/layers/quantization/awq_marlin.py", line 6, in <module>
    import vllm.model_executor.layers.fused_moe  # noqa
  File "/workspace/vllm/vllm/model_executor/layers/fused_moe/__init__.py", line 34, in <module>
    import vllm.model_executor.layers.fused_moe.fused_marlin_moe  # noqa
  File "/workspace/vllm/vllm/model_executor/layers/fused_moe/fused_marlin_moe.py", line 8, in <module>
    from vllm.model_executor.layers.fused_moe.fused_moe import (
  File "/workspace/vllm/vllm/model_executor/layers/fused_moe/fused_moe.py", line 18, in <module>
    from vllm_hpu_extension.ops import scaled_fp8_quant
  File "/usr/local/lib/python3.10/dist-packages/vllm_hpu_extension/ops.py", line 9, in <module>
    import habana_frameworks.torch as htorch
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/__init__.py", line 54, in <module>
    import habana_frameworks.torch.core
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/__init__.py", line 114, in <module>
    import_compilers()
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/dynamo/compile_backend/backends.py", line 39, in import_compilers
    from .compilers import hpu_inference_compiler, hpu_training_compiler_bw, hpu_training_compiler_fw
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/dynamo/compile_backend/compilers.py", line 27, in <module>
    from .freezing_passes import freeze
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/dynamo/compile_backend/freezing_passes.py", line 28, in <module>
    from torch._inductor.freezing import discard_traced_gm_params, invalidate_eager_modules, replace_params_with_constants
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/freezing.py", line 15, in <module>
    from torch._inductor.fx_passes.freezing_patterns import freezing_passes
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/fx_passes/freezing_patterns.py", line 5, in <module>
    from torch._inductor.compile_fx import fake_tensor_prop
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 49, in <module>
    from torch._inductor.debug import save_args_for_compile_fx_inner
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/debug.py", line 26, in <module>
    from . import config, ir  # noqa: F811, this is needed
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/ir.py", line 77, in <module>
    from .runtime.hints import ReductionHint
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/runtime/hints.py", line 36, in <module>
    attr_desc_fields = {f.name for f in fields(AttrsDescriptor)}
  File "/usr/lib/python3.10/dataclasses.py", line 1198, in fields
    raise TypeError('must be called with a dataclass type or instance') from None
TypeError: must be called with a dataclass type or instance
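The last two frames show the mechanism: `torch/_inductor/runtime/hints.py` calls `dataclasses.fields(AttrsDescriptor)` at import time, and `dataclasses.fields()` raises `TypeError` for anything that is not a dataclass type or instance. Since the failure appears only with triton v3.2.0, it is presumably the triton-provided `AttrsDescriptor` that is no longer a dataclass in that release (an inference from the traceback, not confirmed here). The minimal sketch below reproduces the same failure without torch or triton; `DataclassDescriptor`, `PlainDescriptor`, and `descriptor_field_names` are hypothetical stand-ins, not real torch or triton names.

```python
from dataclasses import dataclass, fields


@dataclass
class DataclassDescriptor:
    # Hypothetical stand-in for a descriptor class that IS a dataclass.
    divisibility: tuple = ()
    constants: tuple = ()


class PlainDescriptor:
    # Hypothetical stand-in for a descriptor class that is a plain class.
    def __init__(self):
        self.divisibility = ()
        self.constants = ()


def descriptor_field_names(cls):
    # Mirrors the failing line in hints.py: collect field names via dataclasses.fields().
    return {f.name for f in fields(cls)}


# Succeeds: the class is a dataclass, so fields() can enumerate its fields.
print(sorted(descriptor_field_names(DataclassDescriptor)))

# Fails: fields() rejects a non-dataclass with the same TypeError as in the traceback.
try:
    descriptor_field_names(PlainDescriptor)
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Because the call runs at module import time, any import chain that reaches `hints.py` (here via `habana_frameworks.torch`) fails before vLLM can start, which is why the error surfaces on `from vllm import LLM, SamplingParams`.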