• cbhl 20 hours ago

    If you use the Cursor IDE: the folks who wrote it talked about their use of speculative decoding to make "Apply" faster on the Lex Fridman podcast last month.

    Here it is on YouTube, although you can also find it on Spotify and other podcast platforms:

    https://youtu.be/oFfVt3S51T4?t=1206

    • afro88 18 hours ago

      For those who prefer text: it seems they use a weaker but faster model to generate the "predicted output" / speculation. Pretty smart.
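
      Roughly, the trick (a toy sketch, not Cursor's actual implementation; both "models" below are fake stand-ins): let the cheap model draft a few tokens ahead, then have the strong model verify the whole draft at once and keep the longest agreeing prefix.

          TARGET_TEXT = "speculative decoding hides latency"

          def target_next(prefix):
              # Stand-in for the strong model: the "correct" next character.
              return TARGET_TEXT[len(prefix)]

          def draft_next(prefix):
              # Stand-in for the weak-but-fast model: usually right,
              # deliberately wrong now and then.
              ch = TARGET_TEXT[len(prefix)]
              return "#" if len(prefix) % 9 == 5 else ch

          def speculative_decode(k=4):
              out = ""
              while len(out) < len(TARGET_TEXT):
                  # 1. Draft up to k characters with the cheap model.
                  draft = ""
                  while len(draft) < k and len(out) + len(draft) < len(TARGET_TEXT):
                      draft += draft_next(out + draft)
                  # 2. Verify the draft with the strong model. In a real
                  #    system this is ONE batched forward pass, which is
                  #    the speedup: one expensive call checks k guesses.
                  for ch in draft:
                      if ch == target_next(out):
                          out += ch                # guess accepted for free
                      else:
                          out += target_next(out)  # mismatch: take the strong
                          break                    # model's token, drop the rest
              return out

          assert speculative_decode() == TARGET_TEXT

      The output is identical to what the strong model would produce on its own; the draft model only changes how fast you get it. That's what makes it such a good fit for "Apply", where most of the file is unchanged.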

      • pandada8 17 hours ago

        Fireworks AI has a blog post about it: https://fireworks.ai/blog/cursor

      • creativenolo 14 hours ago

        I found the OpenAI page more interesting: https://platform.openai.com/docs/guides/latency-optimization...

        • msp26 11 hours ago

          It's incredibly well written. I can see this being very helpful for newcomers.

          As for the Predicted Outputs feature, it looks genuinely useful in a few of my pipelines. Can't wait to test it out.
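
          For reference, it's just an extra `prediction` parameter on chat completions. A rough sketch with the OpenAI Python SDK (the file name and prompt are invented; check the docs for which models support it):

              from openai import OpenAI

              client = OpenAI()  # reads OPENAI_API_KEY from the environment

              # The file we want edited; most of it should come back
              # unchanged, which is what Predicted Outputs exploits.
              code = open("app.py").read()

              completion = client.chat.completions.create(
                  model="gpt-4o-mini",
                  messages=[
                      {"role": "user",
                       "content": "Rename foo to bar and return the full, updated file."},
                      {"role": "user", "content": code},
                  ],
                  # Pass the current file as the prediction: matching
                  # spans are accepted cheaply, only edits are generated.
                  prediction={"type": "content", "content": code},
              )

              print(completion.choices[0].message.content)

          One caveat worth checking in the docs: prediction tokens that don't make it into the final answer are still billed at completion rates, so it pays off mainly when most of the output really is unchanged.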

        • nunez 9 hours ago

          This is like the likely() and unlikely() macros in the Linux kernel! Huge speedup if you're right; small penalty if you're not.

          • digdugdirk 3 hours ago

            Any recommendations for a high-level overview or learning resources about this? It seems interesting, but like most Linux internals, things get real technical, real quick.
