Environment, CPU architecture, OS, and Version:
OS: 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
ENV: Docker version 26.0.1, build d260a54
HW: i9-10900F, RTX3080, 128GB RAM
Describe the bug
When calling the 'v1/chat/completions' endpoint with the max_tokens parameter set to a specific value, the completion may be cut off, but the finish_reason remains 'stop' instead of changing to 'length', making it impossible to tell whether the answer is complete.
Additionally, even when the max_tokens parameter is not set, the response may still be cut off while the finish_reason remains 'stop'.
To Reproduce
Send a request to the 'v1/chat/completions' endpoint with the max_tokens parameter set to a small value (e.g., 20).
Observe that the response is truncated but the finish_reason is 'stop' rather than 'length'.
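The steps above can be reproduced with curl; the host, port, and model name below are assumptions and should be adjusted to match your deployment:

```shell
# Hypothetical reproduction against a local LocalAI instance.
# Adjust host/port and "model" to whatever is configured on your setup.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Write a long story about a dragon."}],
    "max_tokens": 20
  }'
```

With max_tokens set to 20, the returned message is visibly cut off mid-sentence, yet "finish_reason" in the response is "stop".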
Expected behavior
When the max_tokens property is set, the response should clearly indicate whether the completion is complete. If the completion is cut off, the finish_reason should be 'length' instead of 'stop'.
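For reference, this is the check a client would rely on; the sample response below is hand-written to illustrate the shape of the bug, not captured from LocalAI:

```python
import json

def was_truncated(response: dict) -> bool:
    """Return True when the completion was cut off by the token limit.

    Per the OpenAI API convention, finish_reason is "length" when
    max_tokens stopped generation, and "stop" for a natural end.
    """
    return response["choices"][0]["finish_reason"] == "length"

# Hypothetical truncated response. LocalAI currently reports "stop" here,
# which is the bug: the client cannot tell the answer is incomplete.
sample = json.loads('''{
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Once upon a time, a dragon"},
     "finish_reason": "stop"}
  ]
}''')

print(was_truncated(sample))  # prints False, even though the text was cut off
```

A server following the convention would return "length" for this response, making was_truncated() return True.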
LocalAI version:
v2.20.1
a9c521eb41dc2dd63769e5362f05d9ab5d8bec50