When running the llama-2-7b-chat-hf model through the OpenAI API for GSM8K (a mathematical ability test), temperature needs to be set to 0.0. But I get an unexpected error:
lm_eval --model local-chat-completions --model_args model=llama-2-7b-chat-hf,base_url=http://localhost:8000/v1 --task gsm8k
2024-03-12:16:09:56,344 INFO [main.py:225] Verbosity set to INFO
2024-03-12:16:09:56,344 INFO [init.py:373] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-03-12:16:10:01,070 INFO [main.py:311] Selected Tasks: ['gsm8k']
2024-03-12:16:10:01,070 INFO [main.py:312] Loading selected tasks...
2024-03-12:16:10:01,075 INFO [evaluator.py:129] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-03-12:16:10:01,419 INFO [evaluator.py:190] get_task_dict has been updated to accept an optional argument, task_managerRead more here:https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md#external-library-usage
2024-03-12:16:10:17,655 INFO [task.py:395] Building contexts for gsm8k on rank 0...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [00:06<00:00, 192.73it/s]
2024-03-12:16:10:24,524 INFO [evaluator.py:357] Running generate_until requests
0%| 2024-03-12:16:11:08,170 INFO [_client.py:1026] HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
2024-03-12:16:11:08,171 INFO [_base_client.py:952] Retrying request to /chat/completions in 0.788895 seconds
2024-03-12:16:11:09,010 INFO [_client.py:1026] HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
2024-03-12:16:11:09,011 INFO [_base_client.py:952] Retrying request to /chat/completions in 1.621023 seconds
2024-03-12:16:11:10,683 INFO [_client.py:1026] HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 500 Internal Server Error"
Traceback (most recent call last):
File "/home/yutianchen/Project/lm-evaluation-harness/lm_eval/models/utils.py", line 333, in wrapper
return func(*args, **kwargs)
File "/home/yutianchen/Project/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 75, in completion
return client.chat.completions.create(**kwargs)
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_utils/_utils.py", line 303, in wrapper
return func(*args, **kwargs)
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/resources/chat/completions.py", line 598, in create
return self._post(
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 1088, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 853, in request
return self._request(
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 916, in _request
return self._retry_request(
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 958, in _retry_request
return self._request(
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 916, in _request
return self._retry_request(
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 958, in _retry_request
return self._request(
File "/home/yutianchen/anaconda3/envs/llm-eval/lib/python3.9/site-packages/openai/_base_client.py", line 930, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'generated_text': None, 'num_input_tokens': None, 'num_input_tokens_batch': None, 'num_generated_tokens': None, 'num_generated_tokens_batch': None, 'preprocessing_time': None, 'generation_time': None, 'timestamp': 1710259870.67887, 'finish_reason': None, 'error': {'object': 'error', 'message': 'Internal Server Error', 'internal_message': 'Internal Server Error', 'type': 'InternalServerError', 'param': {}, 'code': 500}}
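The failure is reproducible without lm-evaluation-harness. Below is a minimal sketch using the OpenAI Python SDK against the same local endpoint; the base_url and model name are taken from the command above, while the prompt, API key placeholder, and max_tokens are arbitrary:

```python
# Minimal reproduction sketch: call the local OpenAI-compatible endpoint with temperature=0.0.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # local server, no real key

response = client.chat.completions.create(
    model="llama-2-7b-chat-hf",
    messages=[{"role": "user", "content": "What is 15 + 27?"}],  # placeholder prompt
    temperature=0.0,  # this is what triggers the 500 Internal Server Error on the serving side
    max_tokens=256,
)
print(response.choices[0].message.content)
```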
The error is similar when testing temperature=0.0 directly with llm-on-ray's query_openai_sdk.py:
python examples/inference/api_server_openai/query_openai_sdk.py --model_name llama-2-7b-chat-hf --temperature 0.0
Traceback (most recent call last):
File "/home/yutianchen/Project/latest_lib/llm-on-ray/examples/inference/api_server_openai/query_openai_sdk.py", line 98, in
for i in chunk_chat():
File "/home/yutianchen/Project/latest_lib/llm-on-ray/examples/inference/api_server_openai/query_openai_sdk.py", line 75, in chunk_chat
output = client.chat.completions.create(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_utils/_utils.py", line 275, in wrapper
return func(*args, **kwargs)
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/resources/chat/completions.py", line 663, in create
return self._post(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 1200, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 889, in request
return self._request(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 965, in _request
return self._retry_request(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 1013, in _retry_request
return self._request(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 965, in _request
return self._retry_request(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 1013, in _retry_request
return self._request(
File "/home/yutianchen/anaconda3/envs/llm-on-ray/lib/python3.9/site-packages/openai/_base_client.py", line 980, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'generated_text': None, 'num_input_tokens': None, 'num_input_tokens_batch': None, 'num_generated_tokens': None, 'num_generated_tokens_batch': None, 'preprocessing_time': None, 'generation_time': None, 'timestamp': 1710260245.4014304, 'finish_reason': None, 'error': {'object': 'error', 'message': 'Internal Server Error', 'internal_message': 'Internal Server Error', 'type': 'InternalServerError', 'param': {}, 'code': 500}}
Both LLaMA (https://github.com/facebookresearch/llama/issues/687) and Transformers (https://github.com/huggingface/transformers/pull/25722) maintainers suggest setting do_sample=False when temperature=0. But the OpenAI API's client.chat.completions.create(**kwargs) does not support a do_sample parameter, and there is no suitable argument to work around temperature=0.0 on the client side.
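Since the client cannot pass do_sample, the mapping has to happen on the serving side. A minimal sketch of that idea, assuming the server translates OpenAI-style request parameters into Hugging Face generate() kwargs (the helper name and defaults below are hypothetical, not llm-on-ray's actual code):

```python
# Hypothetical server-side helper (not llm-on-ray's actual code): map an OpenAI-style
# request onto Hugging Face generate() kwargs, switching to greedy decoding when
# temperature is 0 instead of forwarding temperature=0.0, as the LLaMA and
# Transformers issues linked above recommend.
def build_generate_kwargs(temperature, top_p=1.0, max_tokens=256):
    kwargs = {"max_new_tokens": max_tokens}
    if temperature is None or temperature <= 0.0:
        # temperature=0 means deterministic output: use greedy decoding and
        # drop the sampling parameters entirely.
        kwargs["do_sample"] = False
    else:
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
        kwargs["top_p"] = top_p
    return kwargs
```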