• CharlieDigital a day ago

    You can exploit this and build your own stream of streams for interesting use cases: https://chrlschn.dev/blog/2024/05/need-for-speed-llms-beyond... (seen in action: https://chrlschn.dev/img/need-for-speed/generation-example.g...)

    Most interesting is combining this with web components and having GPT directly output streams containing small web components.

    • jackmpcollins a day ago

      That gif is really cool! I built a Python package magentic [0] which similarly parses the LLM streamed output and allows it to be used before it is finished being generated. There are plenty of use cases / prompts that can be refactored into a "generate list, then generate for each item" pattern to take advantage of this speedup from concurrent generation.

      [0] https://magentic.dev/streaming/#object-streaming
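
      For a flavor of the pattern, here's a sketch adapted from the linked docs (the Superhero model and prompt are illustrative):

        from collections.abc import Iterable

        from magentic import prompt
        from pydantic import BaseModel

        class Superhero(BaseModel):
            name: str
            power: str

        @prompt("Create a team of superheroes named {name}.")
        def create_superhero_team(name: str) -> Iterable[Superhero]: ...

        # Each Superhero is usable as soon as its own JSON is complete,
        # before the rest of the stream has been generated.
        for hero in create_superhero_team("The Food Dudes"):
            print(hero)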

    • Yanael a day ago

      When you ask for JSON data over a streaming response, the partial output is incomplete and unparseable by standard JSON libraries, so you get malformed-JSON errors. Without special handling you have to wait for the entire stream to complete.

      To solve this problem I tried to define a spec and built a lib for it:

      - [lib] https://github.com/st3w4r/openai-partial-stream/tree/main

      - [spec] https://github.com/st3w4r/openai-partial-stream/blob/main/sp...

      • vintagedave a day ago

        Very interesting. I tried to solve this problem too, and my code parses incomplete JSON, allowing both partial values and fully complete values to be accessed.

        Why do you wait for the entire stream to be complete? Some objects in the JSON structure can be shown to be complete before the stream ends.

        • Yanael a day ago

          Yeah, it's an interesting problem to solve. The library is designed to parse incomplete JSON without waiting for the stream to finish.

        • simonw a day ago

          I’ve been using the ijson Python library for that - I have notes on that here: https://til.simonwillison.net/json/ijson-stream
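
          A minimal sketch of ijson's event API on deliberately truncated input (the document here is made up):

            import io

            import ijson

            # ijson yields (prefix, event, value) tuples as bytes are
            # consumed, so values are usable before the document ends.
            partial = io.BytesIO(b'{"items": [{"name": "a"}, {"name": "b"}')
            try:
                for prefix, event, value in ijson.parse(partial):
                    print(prefix, event, value)
            except ijson.IncompleteJSONError:
                pass  # the stream was cut off mid-document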

          • jackmpcollins a day ago

            Pydantic also has support for parsing partial JSON. https://docs.pydantic.dev/latest/concepts/json/#partial-json...

              from pydantic_core import from_json

              partial_json_data = '["aa", "bb", "c'

              # allow_partial tolerates the truncated input: the
              # incomplete trailing string "c is dropped instead of
              # raising an error.
              result = from_json(partial_json_data, allow_partial=True)
              print(result)
              #> ['aa', 'bb']

            You can also use their `jiter` package directly if you don't otherwise use pydantic. https://github.com/pydantic/jiter/tree/main/crates/jiter-pyt...
          • Yanael a day ago

            Nice, it looks like a good library to build on top of. I like the available events: start_map, end_map, etc. I tried a JS library with similar events, but it lacked the granularity to handle individual fields rather than an entire item. I'll keep a note of this one if I do JSON streaming in Python.

          • kordlessagain a day ago

            These are great. I've been trying to get markup rendering to work with streaming, and it's a deceptively hard problem. This should help with figuring it out!

            • bschmidt1 a day ago

              Awesome, works great! Love the modes "Real-time", "progressive", etc.

              • Yanael a day ago

                Thanks! Yeah, creating an abstraction over the raw JSON and how you want to use it in your code makes it more practical.

            • vintagedave a day ago

              Here’s a bit more info on generating streams like this: https://parnassus.co/building-a-copilot-1-server-fundamental...

              I'm slowly building a copilot stack, and I end up wrapping multiple layers of streaming: SSE as in this article; parsed on the fly as it streams from JSON (i.e. parsing incomplete, invalid JSON); parsed on the fly as it streams to extract Markdown; and parsed on the fly as it streams to format that Markdown and render it. You can read about this here: https://parnassus.co/building-a-copilot-2-parsing-converting...
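
              Roughly, each layer is a lazy transform over the one before it. A toy sketch (the delta field and fake chunks are illustrative, not the real code):

                import json

                def sse_data(lines):
                    # Layer 1: pull the payload out of each SSE "data:" line.
                    for line in lines:
                        if line.startswith("data: ") and line != "data: [DONE]":
                            yield line[len("data: "):]

                def text_deltas(payloads):
                    # Layer 2: decode each event and extract its text fragment.
                    for payload in payloads:
                        yield json.loads(payload)["delta"]

                # Fake wire data standing in for a real SSE response.
                stream = ['data: {"delta": "Hel"}',
                          'data: {"delta": "lo"}',
                          "data: [DONE]"]

                # Each layer is a generator consuming the previous one, so
                # text flows through the pipeline as soon as it arrives.
                for fragment in text_deltas(sse_data(stream)):
                    print(fragment, end="")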

              • threecheese a day ago

                Interesting subject, but came here to comment that you are “doing the lord’s work” by writing an LLM tool for Delphi developers. All six of them! (I kid) Best of luck with Owl.

                • vintagedave 8 hours ago

                  Thank you! I plan to expand it to other under-served languages. Delphi is a fun starting one :)

              • Yanael a day ago

                I have been working with streaming LLMs and Server-Sent Events. SSE provides a very simple interface to work with, but you can feel it was never designed for this use case. As mentioned in the blog post:

                > Annoyingly these can't be directly consumed using the browser EventSource API because that only works for GET requests, and these APIs all use POST.

                SSE was not designed around sending a request body to open the connection, so you end up struggling with frameworks and libraries that are built on the EventSource API.
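
                You end up reading the stream by hand instead. A minimal sketch with httpx (the endpoint, key, and payload are illustrative):

                  import httpx

                  # EventSource can't send a POST body, so open the
                  # stream manually and parse the "data:" lines yourself.
                  with httpx.stream(
                      "POST",
                      "https://api.openai.com/v1/chat/completions",
                      headers={"Authorization": "Bearer YOUR_KEY"},
                      json={
                          "model": "gpt-4o-mini",
                          "stream": True,
                          "messages": [{"role": "user", "content": "Hi"}],
                      },
                      timeout=None,
                  ) as response:
                      for line in response.iter_lines():
                          if line.startswith("data: ") and line != "data: [DONE]":
                              print(line[len("data: "):])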

                • ekojs a day ago

                  EventSource is really really limited. However, you can instead use Fetch via something like https://github.com/Azure/fetch-event-source to consume SSEs.

                  • Yanael a day ago

                    This looks very good. The Fetch API is a nice one, so leveraging it sounds perfect. Thanks for the link.

                  • mmoustafa a day ago

                    OpenAI streaming has many peculiarities at production scale.

                    e.g. you will get “half-chunks” occasionally which are not parseable on their own and must be concatenated with the previous or subsequent chunk for parsing.
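
                    One fix is to buffer until a newline shows a complete line has arrived. A sketch (the chunk values are made up):

                      import json

                      def complete_events(chunks):
                          # Buffer raw chunks; only parse once a newline
                          # marks the end of a complete "data:" line.
                          buffer = ""
                          for chunk in chunks:
                              buffer += chunk
                              while "\n" in buffer:
                                  line, buffer = buffer.split("\n", 1)
                                  if line.startswith("data: ") and line != "data: [DONE]":
                                      yield json.loads(line[len("data: "):])

                      # Two half-chunks that only parse after reassembly.
                      chunks = ['data: {"delta": "Hel', 'lo"}\n']
                      print(list(complete_events(chunks)))
                      #> [{'delta': 'Hello'}]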

                    • ekojs a day ago

                      I do actually wonder if it's more efficient to use something like MessagePack instead of JSON. It's a lot of strings, so it may not matter too much, I guess.
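
                      A quick way to compare encodings (the event shape is a made-up stand-in for a real chunk):

                        import json

                        import msgpack  # pip install msgpack

                        event = {
                            "id": "chatcmpl-123",
                            "choices": [{"delta": {"content": "Hello"}}],
                        }
                        print(len(json.dumps(event, separators=(",", ":")).encode()))  # JSON bytes
                        print(len(msgpack.packb(event)))  # MessagePack bytes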

                      • brrrrrm a day ago

                        should really be titled streaming output, as full-duplex streaming isn't mentioned at all. that'd be necessary for low-latency things like speech etc.

                        • simonw a day ago

                          Do you know of any public APIs from LLM vendors that do that?

                          As far as I know the ChatGPT voice chat API isn’t public.

                        • bionhoward a day ago

                          Looks like a ton of wasted data on extraneous fields
