
Confusing finish_reason when using max_tokens property in 'v1/chat/completions' endpoint #3533

Open
daJuels opened this issue Sep 10, 2024 · 0 comments
Labels: bug (Something isn't working), confirmed

Comments


daJuels commented Sep 10, 2024

LocalAI version:

v2.20.1 a9c521eb41dc2dd63769e5362f05d9ab5d8bec50

Environment, CPU architecture, OS, and Version:
OS: 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
ENV: Docker version 26.0.1, build d260a54
HW: i9-10900F, RTX3080, 128GB RAM

Describe the bug
When using the v1/chat/completions endpoint with the max_tokens parameter set to a specific value, the completion may be cut off, but the finish_reason remains stop instead of changing to length, making it difficult to determine whether the answer is complete.

Additionally, even when the max_tokens parameter is not set, the response may still be cut off while the finish_reason remains stop.

To Reproduce

  1. Send a request to the v1/chat/completions endpoint with the max_tokens parameter set to a specific value (e.g., 20).
  2. Observe the response: the completion is truncated, yet finish_reason is reported as stop (a minimal request sketch is shown below).
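
For reference, a minimal sketch of such a request in Python. The base URL, model name, and prompt are assumptions for illustration only; adjust them to your deployment:

```python
# Minimal sketch, assuming LocalAI is reachable at http://localhost:8080 and a
# locally configured model named "gpt-4" exists (both are assumptions).
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Explain how TCP handshakes work."}],
        "max_tokens": 20,  # deliberately small so the completion gets truncated
    },
    timeout=60,
)
choice = resp.json()["choices"][0]
print(choice["message"]["content"])
# Reported behavior: finish_reason is "stop" even though the text is cut off.
# Expected (OpenAI-compatible) behavior: finish_reason should be "length".
print(choice["finish_reason"])
```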

Expected behavior
When the max_tokens parameter is set, the response should clearly indicate whether the completion is complete. If the completion is cut off, the finish_reason should be length instead of stop.
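
A truncated completion would then look roughly like the following (illustrative excerpt only; the content shown is made up):

```json
{
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "A TCP handshake begins when the client" },
      "finish_reason": "length"
    }
  ]
}
```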

daJuels added the bug (Something isn't working) and unconfirmed labels on Sep 10, 2024