Streaming messages not compliant with openAI spec #340

Closed
Lederstrumpf opened this issue May 21, 2023 · 0 comments · Fixed by #341
Labels
bug Something isn't working

Comments

@Lederstrumpf
Contributor

LocalAI version:
ed5df1e

Describe the bug
Some mismatches between localAI's and openAI's streaming messages:

  1. When streaming, openAI does not send the role key with every data message. It sends the role only once, in an initial delta message that has no content; the content then follows in lean data messages that omit the role. localAI streams currently send the role key with every delta message. Besides being incompatible, this is also inefficient over the wire.
  2. When streaming, openAI terminates the stream not only with a ..., choices: [... finish_reason: stop] message, but also with a separate message containing only "[DONE]". localAI streams currently lack this.

These mismatches break integration with tools, such as org-ai, that trigger explicitly on these expected aspects of the openAI spec. For 2., see https://platform.openai.com/docs/api-reference/chat/create#chat/create-stream, and for 1., see https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb (I couldn't find this in the spec itself, only in the examples).
There are some other differences (localAI lacks the id & created keys, and superfluously sends every data message as a named event, event: data), but these haven't led to any practical issues in my testing.
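
To make the expected message order concrete, here is a minimal Python sketch of an emitter that follows the pattern described above: one role-only delta, then content-only deltas, then a finish_reason: stop chunk, then the [DONE] sentinel. The function name stream_chunks and the frame helper are hypothetical and are not taken from localAI's code; the id and created keys are left out for brevity.

```python
import json
from typing import Iterable, Iterator


def stream_chunks(tokens: Iterable[str], model: str) -> Iterator[str]:
    """Yield SSE data messages in the order shown in the openAI sample."""

    def frame(delta: dict, finish_reason=None) -> str:
        # Hypothetical helper: wraps one delta in a chat.completion.chunk payload.
        payload = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [
                {"delta": delta, "index": 0, "finish_reason": finish_reason}
            ],
        }
        return "data: " + json.dumps(payload, separators=(",", ":")) + "\n\n"

    # 1. The role is sent once, in an initial delta without content.
    yield frame({"role": "assistant"})
    # 2. Each token follows in a lean delta that carries only content.
    for token in tokens:
        yield frame({"content": token})
    # 3. An empty delta with finish_reason "stop" closes the choice.
    yield frame({}, finish_reason="stop")
    # 4. A separate sentinel message terminates the stream.
    yield "data: [DONE]\n\n"
```

Calling list(stream_chunks(["1", " 2", " 3"], "ggml-gpt4all-j")) yields six messages: the role delta, three content deltas, the stop chunk, and the sentinel.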

To Reproduce
An example response from localAI's streaming API:

HTTP/1.1 200 OK
Date: Sun, 21 May 2023 11:01:33 GMT
Content-Type: text/event-stream
Vary: Origin
Access-Control-Allow-Origin: *
Cache-Control: no-cache
Connection: keep-alive
Transfer-Encoding: chunked

event: data

data: {"object":"chat.completion.chunk","model":"ggml-gpt4all-j","choices":[{"delta":{"role":"assistant","content":"1"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}


event: data

data: {"object":"chat.completion.chunk","model":"ggml-gpt4all-j","choices":[{"delta":{"role":"assistant","content":" 2"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}


event: data

data: {"object":"chat.completion.chunk","model":"ggml-gpt4all-j","choices":[{"delta":{"role":"assistant","content":" 3"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}


event: data

data: {"model":"ggml-gpt4all-j","choices":[{"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
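
As an illustration of why the missing sentinel matters, the following Python sketch reads a stream like the one above and reports whether [DONE] ever arrived. It is a hypothetical client (read_stream is an invented name), not the logic of org-ai or any other real tool, and it assumes the stream has already been split into lines.

```python
import json
from typing import Iterable, Tuple


def read_stream(lines: Iterable[str]) -> Tuple[str, bool]:
    """Collect streamed content and report whether the [DONE] sentinel arrived."""
    text = []
    for line in lines:
        line = line.strip()
        # Skip blank separators and non-data fields such as "event: data".
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return "".join(text), True
        choice = json.loads(payload)["choices"][0]
        # The final chunk may have no delta, and the role-only delta has no content.
        text.append(choice.get("delta", {}).get("content") or "")
    # The connection ended without the sentinel.
    return "".join(text), False
```

Fed the localAI response above, this returns the text "1 2 3" with the flag set to False, so a client that waits for the sentinel before finalizing its output would not get its trigger.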

Expected behavior
An example response from openAI's v1 streaming API:

HTTP/1.1 200 OK
Date: Sun, 21 May 2023 10:44:49 GMT
Content-Type: text/event-stream
Transfer-Encoding: chunked
Connection: keep-alive
access-control-allow-origin: *
Cache-Control: no-cache, must-revalidate
openai-model: gpt-4-0314
openai-organization: user-blablabla
openai-processing-ms: 1210
openai-version: 2020-10-01
strict-transport-security: max-age=15724800; includeSubDomains
x-ratelimit-limit-requests: 200
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 199
x-ratelimit-remaining-tokens: 39963
x-ratelimit-reset-requests: 300ms
x-ratelimit-reset-tokens: 55ms
x-request-id: blablabla
CF-Cache-Status: DYNAMIC
Server: cloudflare
CF-RAY: blablabla
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"1"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":" "},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"2"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":" "},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"3"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]
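
The two mismatches can also be checked mechanically. The Python sketch below takes the payloads of the data messages (the text after "data: ") in order and lists which of the two deviations it finds. The function name find_mismatches and the wording of the messages are hypothetical, and the check covers only the two points raised in this issue, not full spec compliance.

```python
import json
from typing import List


def find_mismatches(data_payloads: List[str]) -> List[str]:
    """Return the deviations from this issue found in a list of data payloads."""
    problems = []
    chunks = [json.loads(p) for p in data_payloads if p != "[DONE]"]
    deltas = [c["choices"][0].get("delta", {}) for c in chunks]
    # Mismatch 1: the role should appear only in the first delta.
    if any("role" in d for d in deltas[1:]):
        problems.append("role repeated after the first delta")
    # Mismatch 2: the stream should end with a separate [DONE] message.
    if not data_payloads or data_payloads[-1] != "[DONE]":
        problems.append("missing [DONE] terminator")
    return problems
```

Run against the payloads of the localAI response it reports both problems; run against the payloads of the openAI response above it returns an empty list.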