Streaming messages not compliant with openAI spec #340

Lederstrumpf · 2023-05-21T12:23:34Z

LocalAI version:
ed5df1e

Describe the bug
Some mismatches between localAI's and openAI's streaming messages:

1. When streaming, openAI does not send the role key with every data message. It sends the role only once, in an initial delta message that lacks any content; the content then follows in lean data messages that carry no role. localAI streams currently send the role key with every delta message. Aside from being incompatible, this is also inefficient over the wire.
2. When streaming, openAI terminates the stream not only with a ..., choices: [... finish_reason: stop] message, but also with a separate message containing only "[DONE]". localAI streams currently lack the "[DONE]" message.
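
To make the two mismatches concrete, here is a minimal Python sketch of an emitter that produces messages in the shape openAI uses. It is only an illustration: the function name `sse_chat_chunks` and its parameters are hypothetical, it is not localAI's actual code, and the `id` and `created` keys are left out.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_chat_chunks(tokens: Iterable[str], model: str) -> Iterator[str]:
    """Yield SSE `data:` messages shaped like openAI's streaming chunks.

    Hypothetical helper for illustration; not taken from localAI.
    """

    def message(delta: dict, finish_reason: Optional[str] = None) -> str:
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [
                {"delta": delta, "index": 0, "finish_reason": finish_reason}
            ],
        }
        # Each SSE message is one `data:` line followed by a blank line.
        return "data: " + json.dumps(chunk, separators=(",", ":")) + "\n\n"

    # First mismatch: the role is sent once, in an initial delta without content.
    yield message({"role": "assistant"})
    # All following deltas carry only content.
    for token in tokens:
        yield message({"content": token})
    # Second mismatch: a finish_reason message, then a separate "[DONE]" message.
    yield message({}, finish_reason="stop")
    yield "data: [DONE]\n\n"
```

Calling `list(sse_chat_chunks(["1", " 2", " 3"], "ggml-gpt4all-j"))` yields six messages: one role delta, three content deltas, one stop message, and the final `[DONE]`.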

These break integration with tools which trigger explicitly on these expected aspects of the openAI spec, such as org-ai. For 2., see https://platform.openai.com/docs/api-reference/chat/create#chat/create-stream, and for 1., see https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb (I couldn't find this in the spec itself, only the examples).
There are some other differences (lack of id & created keys, and localAI superfluously sends consecutive data events as named events) that haven't led to any practical issues in my testing.

To Reproduce
An example response from localAI's streaming API:

HTTP/1.1 200 OK
Date: Sun, 21 May 2023 11:01:33 GMT
Content-Type: text/event-stream
Vary: Origin
Access-Control-Allow-Origin: *
Cache-Control: no-cache
Connection: keep-alive
Transfer-Encoding: chunked

event: data

data: {"object":"chat.completion.chunk","model":"ggml-gpt4all-j","choices":[{"delta":{"role":"assistant","content":"1"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}


event: data

data: {"object":"chat.completion.chunk","model":"ggml-gpt4all-j","choices":[{"delta":{"role":"assistant","content":" 2"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}


event: data

data: {"object":"chat.completion.chunk","model":"ggml-gpt4all-j","choices":[{"delta":{"role":"assistant","content":" 3"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}


event: data

data: {"model":"ggml-gpt4all-j","choices":[{"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
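
The two mismatches can be checked mechanically against a captured response body like the one above. The following is a rough Python sketch; `find_mismatches` is a hypothetical helper, and it only looks at `data:` lines, ignoring the `event:` lines and the response headers.

```python
import json
from typing import List


def find_mismatches(body: str) -> List[str]:
    """Report the two stream mismatches described in this issue.

    `body` is the raw text of the event stream (without HTTP headers).
    Hypothetical helper for illustration only.
    """
    # Collect the payload of every `data:` line, in order.
    payloads = [
        line[len("data:"):].strip()
        for line in body.splitlines()
        if line.startswith("data:")
    ]

    problems = []
    if not payloads or payloads[-1] != "[DONE]":
        problems.append("stream does not end with a [DONE] message")

    # Count how many delta objects carry the role key.
    role_count = 0
    for payload in payloads:
        if payload == "[DONE]":
            continue
        for choice in json.loads(payload).get("choices", []):
            if "role" in choice.get("delta", {}):
                role_count += 1
    if role_count > 1:
        problems.append("role is sent in more than one delta message")

    return problems
```

Run against the localAI response above, this sketch reports both problems; run against a stream in openAI's shape, it returns an empty list.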

Expected behavior
An example response from openAI's v1 streaming API:

HTTP/1.1 200 OK
Date: Sun, 21 May 2023 10:44:49 GMT
Content-Type: text/event-stream
Transfer-Encoding: chunked
Connection: keep-alive
access-control-allow-origin: *
Cache-Control: no-cache, must-revalidate
openai-model: gpt-4-0314
openai-organization: user-blablabla
openai-processing-ms: 1210
openai-version: 2020-10-01
strict-transport-security: max-age=15724800; includeSubDomains
x-ratelimit-limit-requests: 200
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 199
x-ratelimit-remaining-tokens: 39963
x-ratelimit-reset-requests: 300ms
x-ratelimit-reset-tokens: 55ms
x-request-id: blablabla
CF-Cache-Status: DYNAMIC
Server: cloudflare
CF-RAY: blablabla
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"1"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":" "},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"2"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":" "},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"3"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]
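
For context on why clients depend on this shape, here is a minimal Python sketch of how a consumer might read such a stream: it concatenates the content deltas and stops at the "[DONE]" message. The function name `collect_content` is hypothetical, and this is not how org-ai or any particular client is implemented.

```python
import json
from typing import Iterable


def collect_content(lines: Iterable[str]) -> str:
    """Join the content deltas of a chat stream, stopping at "[DONE]".

    `lines` are the lines of the event stream body. Hypothetical helper
    for illustration only.
    """
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # explicit end-of-stream marker
        for choice in json.loads(payload)["choices"]:
            # The role-only and final deltas have no content to add.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

Fed the openAI response above line by line, this returns "1 2 3". A client that loops until it sees "[DONE]" never gets its end signal from a stream that omits that message, and has to rely on the connection closing instead.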

mudler added the bug Something isn't working label May 21, 2023

Lederstrumpf mentioned this issue May 21, 2023

fix: spec compliant instantiation and termination of streams #341

Merged

1 task

mudler closed this as completed in #341 May 21, 2023
