[Inference] Add debug mode #257
base: main
llm_on_ray/inference/predictors/hpu_predictor.py

@@ -69,6 +69,9 @@
    MllmPromptInput,
)
from llm_on_ray.inference.utils import decide_torch_dtype
from llm_on_ray.inference.logger import get_logger

logger = get_logger(__name__)


class HPUPredictor(Predictor):
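For reviewers who want to try the new debug output, here is a minimal sketch of what a get_logger helper along these lines could look like. The actual llm_on_ray.inference.logger implementation may differ, and the LLM_ON_RAY_LOG_LEVEL variable name is only an assumption for illustration:

```python
# Hypothetical sketch; the real llm_on_ray.inference.logger may be implemented differently.
import logging
import os


def get_logger(name: str) -> logging.Logger:
    """Return a named logger whose level comes from an environment variable (assumed name)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    # LLM_ON_RAY_LOG_LEVEL is a hypothetical switch for the debug mode this PR adds.
    logger.setLevel(os.environ.get("LLM_ON_RAY_LOG_LEVEL", "INFO").upper())
    return logger
```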
@@ -79,9 +82,16 @@ def __init__(self, infer_conf: InferenceConfig):
        # decide correct torch dtype for loading HF model
        decide_torch_dtype(infer_conf)

        logger.debug(f"Print inference config: {infer_conf}")

        self.use_lazy_mode = not infer_conf.hpu_model_config.torch_compile
        self.use_hpu_graphs = infer_conf.hpu_model_config.use_hpu_graphs

        # optimize transformers for gaudi
        from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

        adapt_transformers_to_gaudi()

        if infer_conf.deepspeed:
            # DeepSpeed is enabled, start worker group
            # Prepare placement group
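As a usage note, the new logger.debug call above will only produce output when the module's logger is set to DEBUG. One way to do that from a driver script, assuming the logger is named after the module path shown here, could be:

```python
# Assumes the logger name follows the module path; adjust if the project uses a different scheme.
import logging

logging.getLogger("llm_on_ray.inference.predictors.hpu_predictor").setLevel(logging.DEBUG)
```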
@@ -105,13 +115,6 @@ def __init__(self, infer_conf: InferenceConfig):

            htcore.hpu_set_env()

            # Tweak transformer to optimize performance on Gaudi
            from optimum.habana.transformers.modeling_utils import (
                adapt_transformers_to_gaudi,
            )

            adapt_transformers_to_gaudi()

            self.device = torch.device("hpu")
            model = AutoModelForCausalLM.from_pretrained(
                model_desc.model_id_or_path, **model_desc.config.dict()
@@ -219,6 +222,7 @@ def generate(self, input: GenerateInput, **config) -> GenerateOutput:

    def streaming_generate(self, prompt, streamer, **config):
        self._process_config(config)
        # Q1: Why it is handled here when using both deepspeed and hpu?
        if self.infer_conf.deepspeed:

Comment: Here in hpu_predictor.py, this is a little confusing, since we have another predictor called deepspeed_predictor.

Reply: There is a TODO comment to consolidate these two predictors.

            self.deepspeed_workers[0].streaming_generate.remote(prompt, streamer, **config)
            for worker in self.deepspeed_workers[1:]:
@@ -284,10 +288,6 @@ def load_model_and_tokenizer(self):
        self.world_size = int(os.environ["WORLD_SIZE"])
        self.local_rank = int(os.environ["LOCAL_RANK"])
        self.device = torch.device("hpu")
        # optimize transformers for gaudi
        from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

        adapt_transformers_to_gaudi()

Comment on lines -287 to -290: If this function is not executed in every worker, will it work as expected?

Reply: Same as above; this function will be executed earlier.

        self.load_model()
        model_desc = self.infer_conf.model_description
        self.tokenizer = load_tokenizer(self.model, model_desc.tokenizer_name_or_path)
Comment: Why move this function here?

Reply: Move this function out of this if (llm-on-ray/llm_on_ray/inference/predictors/hpu_predictor.py, line 85 in 0b44ac4): both with and without DeepSpeed, this function will be executed.
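To illustrate the restructuring being discussed, here is a simplified before/after sketch of __init__; the method bodies are paraphrased and the helper names are hypothetical, so this is not the literal code in the PR:

```python
# Simplified sketch of the change under discussion; paraphrased, not the literal PR diff.
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi


class HPUPredictor:
    def __init__(self, infer_conf):
        self.infer_conf = infer_conf

        # After this PR: patch transformers for Gaudi once, before branching,
        # so both the DeepSpeed and the single-process paths see the patch.
        adapt_transformers_to_gaudi()

        if infer_conf.deepspeed:
            # Previously adapt_transformers_to_gaudi() was also called inside the
            # worker path (load_model_and_tokenizer); the reviewer asks whether each
            # DeepSpeed worker process still gets the patch now that the call only
            # happens here in the predictor's __init__.
            self._start_deepspeed_workers(infer_conf)  # hypothetical helper name
        else:
            self._load_model_locally(infer_conf)  # hypothetical helper name

    def _start_deepspeed_workers(self, infer_conf):
        # placeholder for the DeepSpeed worker-group setup
        ...

    def _load_model_locally(self, infer_conf):
        # placeholder for the single-process HPU load path
        ...
```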
Comment: As I understand from the PR title, this PR is to add a debug mode, so why touch other code? Could you submit a separate PR to address the other issues?