• d4rkp4ttern 3 days ago

    I’ve tried several, including this one, and I’ve settled on VoiceInk (local, one-time payment), and with Parakeet V3 it’s stunningly fast (near-instant) and accurate enough to talk to LLMs/code-agents, in the sense that the slight drop in accuracy relative to Whisper Turbo3 is immaterial since they can “read between the lines” anyway.

    My regular cycle is to talk informally to the CLI agent and ask it to “say back to me what you understood”, and it almost always produces a nice clean and clear version. This simultaneously works as confirmation of its understanding and also as a sort of spec which likely helps keep the agent on track.

    UPDATE - just tried handy with Parakeet v3, and it works really well too, so I'll use this instead of VoiceInk for a few days. I just also discovered that turning on the "debug" UI with Cmd-shift-D shows additional options like post processing and appending trailing space.

    • thethimble 3 days ago

      I wish one of these models was fine tuned for programming.

      I want to be able to say things like "cd ~/projects" or "git push --force".

      • netghost 3 days ago

        I'll bet you could take a relatively tiny model and get it to translate the transcribed "git force push" or "git push dash dash force" into "git push --force".

        Likewise "cd home slash projects" into "cd ~/projects".

        Maybe with some fine tuning, maybe without.

        • vismit2000 2 days ago

          You can try VSCode Speech to Text extension that works decently well in Github Copilot chat as part of Microsoft accessibility suite.

      • blutoot 3 days ago

        I have dystonia which often stiffens my arms in a way that makes it impossible for me to type on a keyboard. TTS apps like SuperWhisper have proven to be very helpful for me in such situations. I am hoping to get a similar experience out of "Handy" (very apt maming from my perspective).

        I do, however, wonder if there is a way all these TTS tools can get to the next level. The generated text should not be just a verbatim copy of what I just said, but depending on the context, it should elaborate. For example, if my cursor is actively inside an editor/IDE with some code, my coding-related verbal prompts should actually generate the right/desired code in that IDE.

        Perhaps this is a bit of combining TTS with computer-use.

        • mritchie712 3 days ago

          I made something called `ultraplan`. It's is a CLI tool that records multi-modal context (audio transcription via local Whisper, screenshots, clipboard content, etc.) into a timeline that AI agents like Claude Code can consume.

          I have a claude skill `/record` that runs the CLI which starts a new recording. I debug, research, etc., then say "finito" (or choose your own stopword). It outputs a markdown file with your transcribed speech interleaved with screenshots and text that you copied. You can say other keywords like "marco" and it will take a screenshot hands-free.

          When the session ends, claude reads the timeline (e.g. looks at screenshots) and gets to work.

          I can clean it up and push to github if anyone would get use out of it.

        • sipjca 3 days ago

          I totally agree with you and largely what you’re describing is one of the reasons I made Handy open source. I really want to see something like this and see someone go experiment with making it happen. I did hear some people playing with using some small local models (moondream, qwen) to get some more context of the computer itself

          I initially had a ton of keyboard shortcuts in handy for myself when I had a broken finger and was in a cast. It let me play with the simplest form of this contextual thing, as shortcuts could effectively be mapped to certain apps with very clear uses cases

          • eddyg 3 days ago

            There’s lots of existing work on “coding by voice” long before LLMs were a thing. For example (from 2013): http://xahlee.info/emacs/emacs/using_voice_to_code.html and the associated HN discussion (“Using Voice to Code Faster than Keyboard”): https://news.ycombinator.com/item?id=6203805

            There’s also more recent-ish research, like https://dl.acm.org/doi/fullHtml/10.1145/3571884.3597130

            • hasperdi 3 days ago

              What you said is possible by feeding the output of speech-to-text tools into an LLM. You can prompt the LLM to make sense of what you're trying to achieve and create sets of actions. With a CLI it’s trivial, you can have your verbal command translated into working shell commands. With a GUI it’s slightly more complicated because the LLM agent needs to know what you see on the screen, etc.

              That CLI bit I mentioned earlier is already possible. For instance, on macOS there’s an app called MacWhisper that can send dictation output to an OpenAI‑compatible endpoint.

              • sipjca 3 days ago

                Handy can post process with LLMs too! It’s just currently hidden behind a debug menu as an alpha feature (ctrl/cmd+shift+d)

                • sanex 3 days ago

                  I was just thinking about building something like this, looks like you beat me to the punch, I will have to try it out. I'm curious if you're able to give commands just as well as some wording you want cleaned up. I could see a model being confused between editting the command input into text to be inserted and responding to the command. Sorry if that's unclear, might be better if I just try it.

                  • sipjca 2 days ago

                    I’d just try it and fork handy if it doesn’t work how you want :)

            • kuatroka 3 days ago

              Love it. I had been searching for STT app for weeks. Every single app was either paid as a one off or had a monthly subscription. It felt a bit ridiculous having to pay when it’s all powered by such small models on the back end. So I decided to build my own. But then I found “Handy” and it’s been a really amazing partner for me. Super fast, super simple, doesn’t get in my way and it’s constantly updated. I just love it. Thanks a lot for making it! Thanks a lot

              P.S. The post processing that you are talking about, wouldn’t it be awesome.

              • frankdilo 3 days ago

                This looks great! What’s missing for me to switch from something like Wispr Flow is the ability to provide a dictionary for commonly mistaken words (name of your company, people, code libraries).

                • tin7in 3 days ago

                  It has something called "Custom Words" which might be what you are describing. Haven't tested this feature yet properly.

                  • frankdilo 2 days ago

                    So is this already in Handy or you are referring to a feature of the underlying models you are still not actively using?

                    • tin7in 2 days ago

                      This is already in Handy in Advanced > Custom Words.

                      There is also Post Processing where you can rerun the output through an LLM and refine it, which is the closest to what Wispr Flow is doing.

                      This can be found in the debug menu in the GUI (Cmd + Shift + D).

                  • jauntywundrkind 3 days ago

                    I dig that some models have an ability to say how sure they are of words. Manually entering a bunch of special words is ok, but I want to be able to review the output and see what words the model was less sure of, so I can go find out what I might need to add.

                    • sipjca 3 days ago

                      There’s a PR for this which will be pulled in soon enough, I can kick off a build of the PR if you want to download a pre release version

                  • Barbing 3 days ago

                    Quick thoughts re: mentioned transcribers

                    Superwhisper — Been using it a long time. It's paid with a lifetime subscription available. Tons of features. Language models are built right in without additional charge. Solo dev is epic; may defer upgrades to avoid occasional bugs/regressions (hey, it's complex software).

                    Trying each for a few minutes:

                    Hex — Feels the leanest (& cleanest) free options mentioned for Mac in this thread.

                    Fluid Voice — Offers a unique feature, a real-time view of your speech as you talk! Superwhisper has this, but only with an online model. (You can't see your entire transcript in Fluid, though. The recording window view is limited to about one sentence at a time--of course you do see everything when you complete your dictation.)

                    Handy — Pink and cute. I like the history window. As far as clipboard handling goes, I might note that the "don't modify clipboard" setting is more of a "restore clipboard" setting. Though it doesn't need as many permissions as Hex because it's willing to move clipboard items around a bit, if I'm not mistaken.

                    Note Hex seems to be upset about me installing all the others... lots of restarting in between installs all around. Each has something to offer.

                    ---

                    Big shout out to Nvidia open-sourcing Parakeet--all of these apps are lightning fast.

                    Also I'm partial to being able to stream transcriptions to the cursor into any field, or at least view live like Fluid (or superwhisper online). I know it's complex b/c models transcribe the whole file for accuracy. (I'm OK with seeing a lower quality transcript realtime and waiting a second for the higher-quality version to paste at the end.)

                    • mncharity 3 days ago

                      A cautionary user experience report. The default hotkey upon download is ctrl+space. Press to begin recording, release to transcribe and insert. Key-up on the space key constitutes hotkey release. If the ctrl key is still down when the insertion lands, the transcribed text is treated as ctrl characters. The test app was emacs. (x64 linux x11, with and without xdotool)

                      • PhilippGille 3 days ago

                        Has anyone compared this with https://github.com/HeroTools/open-whispr already? From the description they seem very similar.

                        Handy first release was June 2025, OpenWhispr a month later. Handy has ~11k GitHub stars, OpenWhispr has ~730.

                        • kuatroka 3 days ago

                          I did have tried, but the ease of installing handy as just a macOS app is so much simpler than needing to constantly run in npm commands. I think at the time when I was checking it, which was a couple of months ago they did not have the parakeet model, which is a non-whisper model, so I had decided against it. If I remember correctly, the UI was also not the smoothest.

                          Handy’s ui is so clean and minimalistic that you always know what to do or where to go. Yes, it lacks in some advanced features, but honestly, I’ve been using it for two months now and I’ve never looked back or searched for any other STT app.

                          • ranguna 3 days ago

                            The OP asked if someone compared both, which usually means actually trying both and not just installing one and skimming through the other's README file. So, in summary, you didn't try both and didn't answer the OP.

                        • aucisson_masque 3 days ago

                          It’s incredibly fast on my MacBook m1 air and more accurate that the native speech to text.

                          The ui is well thought out, just the right amount of setting for my usage.

                          Incredible !

                          Btw, do you know what « discharging the model » does ? It’s set to never by default, tried to check if it has an impact on ram or cpu but it doesn’t seem to do anything.

                          • mixtureoftakes 3 days ago

                            the model is permanently loaded into ram for access speed. discharging it would unload it from ram and lead to longer start times

                            • sipjca 3 days ago

                              It does unload it, and actually might be a good default for most people as the model loading does happen in the background as soon as you hit the key

                          • peterldowns 3 days ago

                            Huge fan! Parakeet v3 works great with it. I have used Monologue, Superwhisper, and Aqua, at various times in the past. But Handy is at least as good, and it's not an expensive subscription. I love that it runs locally, too. Strongly recommend!

                            • Jack5500 3 days ago

                              The Parakeet V3 model is really great!

                              • Jayakumark 3 days ago

                                Its great, i have been using it . Two requests though 1. iOS app 2. API option to use against meeting transcription or route audio from Mic .

                                • blensor 3 days ago

                                  +1 on the meeting tranecription

                                • holtwick 3 days ago

                                  FluidVoice for macOS is pretty handy as well. Open source under Apache License. https://altic.dev/fluid https://github.com/altic-dev/FluidVoice

                                  • jimmydoe 3 days ago

                                    Its vibe coded UI feels too complicated.

                                  • llarsson 3 days ago

                                    A question because I'm not using speech-to-text, but find it intriguing (especially since it's now possible to do locally and for free).

                                    How have your computing habits changed as a result of having this? When do you typically use this instead of typing on the keyboard?

                                    • tin7in 3 days ago

                                      I use it all the time with coding agents, especially if I'm running multiple terminals. It's way faster to talk than type. The only problem is that it looks awkward if there are others around.

                                      • johnisgood 3 days ago

                                        Interesting. I can think and type faster, but not talk. I am not much of a talker.

                                        • stavros 3 days ago

                                          Same, whenever I try to dictate something I always umm and ahhh and go back a bunch of times, and it's faster to just type. I guess it's just a matter of practice, and I'm fine when I'm talking to other people, it's only dictation I'm having trouble with.

                                      • noneofyour 3 days ago

                                        Part of my job is to give feedback to people using Word Comments. Using STT, it's been a breeze. The time saving really is great. Thing is, I only do this when working at home with no one around. So really only when WFH.

                                      • dumbmrblah 3 days ago

                                        I just set this up today. I had Whispering app set up on my Windows computer, but it really wasn't working well on my Ubuntu computer that I just set up. I found Handy randomly. It was the last app I needed to go Linux full-time. Thank you!

                                        • unutranyholas 3 days ago
                                          • wi5eif6E 3 days ago

                                            This looks and works great! A settings option to keep no recording history at all would be terrific.

                                            • sipjca 2 days ago

                                              It’s in the debug menu right now (ctrl/cmd+shift+d)

                                            • vladstudio 3 days ago

                                              Use it daily. Looks and works great.

                                              • erelong 3 days ago

                                                WhisperTux on linux worked ok, curious how Handy compares: https://github.com/cjams/whispertux

                                                • mrroryflint 3 days ago

                                                  On a M4 Macbook Air, there was enough lag to make it unusable for me. I hit the shortcut and start speaking but there was always a 1-2sec delay before it would actually start transcribing even if the icon was displayed.

                                                  • jborichevskiy 3 days ago

                                                    Curious if you were using AirPods or other Bluetooth headphones for this?

                                                    If so, there should be "keep microphone on" or similar setting in the config that may help with this, alternatively, I set my microphone to my MacBook mic so that my headphones aren't involved at all and there is much less latency on activation

                                                    • mrroryflint 3 days ago

                                                      Airpods Max (is that the name?) - the big ones.

                                                      • jborichevskiy 21 hours ago

                                                        Makes sense. If you enable the Debug menu (Shift+CMD+D), there is an option for "Always-On Microphone". Might be worth a try to remove that latency.

                                                    • kuatroka 3 days ago

                                                      Yes, I’ve got the same situation too. I kind of learned to wait for one or two seconds before talking. I am using it with the AirPods, so maybe it’s indeed the Bluetooth thing.

                                                      • sipjca 3 days ago

                                                        What microphone are you using?

                                                        • mrroryflint 3 days ago

                                                          Airpods Max (is that the name?) - the big ones.

                                                          • sipjca 2 days ago

                                                            Yeah like the other commenters mentioned, using Bluetooth devices does not work super well at the moment. Hopefully I’ll have a fix at some point. There’s just some time over bluetooth to negotiate the connection and everything, and the app doesn’t do a good job showing this at all right now

                                                            On a Mac I definitely recommend using the internal mic even if wearing airpods

                                                      • miniwark 3 days ago

                                                        Did this thing (or open-whispr) work well with other languages than english ?

                                                        • dawkins 3 days ago

                                                          In Spanish works very well

                                                          • wi5eif6E 3 days ago

                                                            German also works great.

                                                          • walthamstow 3 days ago

                                                            Nice. I spent most of Christmas vibe coding with Google Antigravity with one hand while holding a sleeping baby in the other. MacOS built in dictation is OK, but struggles with technical language.

                                                            • qprofyeh 3 days ago

                                                              As a Mac user, am I missing something? macOS has Dictation built-in, when you short press F5 it should start transcribing your spoken words into text in real time. It even does non-English languages.

                                                              • d4rkp4ttern 3 days ago

                                                                Besides being trash as others said, there’s a trade off with real time transcription word by word - there’s no opportunity for an AI to holistically correct/clean up the transcription

                                                                • SkyPuncher 3 days ago

                                                                  But, OSX does come back and fix things.

                                                                  • d4rkp4ttern 3 days ago

                                                                    You mean, after displaying each word as it is spoken, then OSX goes back and fixes what’s been displayed? I think I’ve seen it fix one or two recent words, but I guess you’re saying it could fix the entire sentence as well. I didn’t know that

                                                                    • SkyPuncher 2 days ago

                                                                      Yea, I use it daily for getting my thoughts into Claude. I often see it rewriting sentences it’s confused on.

                                                                • luigi23 3 days ago

                                                                  it's trash if:

                                                                  - you're not a native speaker or have accent

                                                                  - using airpods mic

                                                                  - surroundings is noisy

                                                                  - use novel words like 'claude code'

                                                                  - mumble a bit

                                                                • mnmalst 3 days ago

                                                                  This is really cool. Works out of the box and I'm typing this using handy.

                                                                  Is there any way to execute commands directly on Linux?

                                                                  Also a feature to edit or correct already typed text would be really great.

                                                                  • oybng 3 days ago

                                                                    On Windows this depends on webview2, which the installer attempts to download. No mention of this requirement in the readme. It's a shame this software isn't portable

                                                                    • chainmail2029 3 days ago

                                                                      There's a slightly awkward naming overlap with an existing product.

                                                                      • unwind 3 days ago

                                                                        Which one? I did a quick search but that didn't turn up anything so perhaps it's a partial word overlap or something.

                                                                        I did find the projects "user-facing" home page [1] which was nice. I found it rather hard to find a link from that to the code on GitHub, which was surprising.

                                                                        [1]: https://handy.computer/

                                                                        • DomB 3 days ago

                                                                          It's the German word for smart phone / mobile phone

                                                                          • zavec 3 days ago

                                                                            There's also a sex toy

                                                                            • sReinwald 3 days ago

                                                                              [dead]

                                                                            • ensocode 3 days ago

                                                                              This is a slightly German-centric comment.

                                                                              • xfeeefeee 3 days ago

                                                                                [dead]

                                                                              • bn-usd-mistake 3 days ago

                                                                                Does anyone have a similar mobile application that works locally and is not too expensive? Mostly looking to transcribe voice messages sent over Signal which does not offer this OOTB

                                                                                • 4mitkumar 3 days ago

                                                                                  I have been using this one from Futo for quite some time and love it: https://keyboard.futo.org/

                                                                                  They also have a voice input only version if you still would like to keep your typing keyboard: https://voiceinput.futo.org/

                                                                                  • bogtap82 3 days ago

                                                                                    There is one single app I've been able to find that offers Parakeet-v3 for free locally and it's called Spokenly. They have paid cloud models available as well, but the local Parakeet-v3 implementation is totally free and is the best STT has to offer these days regardless. Super fast and accurate. I consider single-user STT basically a solved problem at this point.

                                                                                    • kuatroka 3 days ago

                                                                                      Spokenly is great too, but Handy's minimalistic and focused UI won me over.

                                                                                      • dumbmrblah 3 days ago

                                                                                        Spokenly is my go-to app on iOS for transcription as well.

                                                                                        • Esus-ai a day ago

                                                                                          [dead]

                                                                                        • nerdfax 3 days ago

                                                                                          [dead]

                                                                                        • jborichevskiy 3 days ago

                                                                                          Big Handy fan!

                                                                                          • swordsith 2 days ago

                                                                                            from the read-me, 'Handy isn't trying to be the best speech-to-text app—it's trying to be the most forkable one.' Why cant we write a readme without using generative AI, seriously, it's not that hard. :<

                                                                                            • skor 3 days ago

                                                                                              This is so handy, thank you very much. Good work!!

                                                                                              • dotancohen 3 days ago

                                                                                                Looks interesting. Why does it need a GUI at all?

                                                                                                • tin7in 3 days ago

                                                                                                  As an alternative to Wisprflow, Superwhisper and so on. It works really well compared to the commercial competitors but with a local model.

                                                                                                  • Barbing 3 days ago

                                                                                                    I hear a CLI request? Tons of CLI speech-to-text tools by the way, really glad to see this. Excellent competitors (Superwhisper, MacWhisper, etc.) are closed/paid.

                                                                                                    • sipjca 3 days ago

                                                                                                      It doesn’t! Just makes it more accessible to more people I feel. There’s a cli version for Mac which I wrote first handy-cli

                                                                                                      • unwind 3 days ago

                                                                                                        Ah, that was a typo: you meant "GPU" (Graphics Processing Unit, not "GUI" which of course is Graphical User Interface) since that is listed in the system requirements. Explained implicitly by an existing comment, thanks!

                                                                                                        • kristianp 3 days ago

                                                                                                          So more people can use it?

                                                                                                          • satvikpendem 3 days ago

                                                                                                            Because local AI models run well on a GPU, better than on a CPU

                                                                                                          • ekjhgkejhgk 3 days ago

                                                                                                            Explain to me why a speech-to-text app has 50% of its code in typescript...?

                                                                                                            • beklein 3 days ago

                                                                                                              Not the author/contributor, but the app is built using Tauri for easy multi-platform support, so the backend logic is implemented in Rust and the frontend UI is implemented in TypeScript. I think it’s a valid choice. GitHub does not include any model _code_ in the stats; the models will be downloaded separately the first time you use them. Hope this helps.

                                                                                                              I know many people hate sites like this, but I actually like them for these use cases. You can get a quick, LLM-generated overview of the architecture, e.g. here: https://codewiki.google/github.com/cjpais/handy

                                                                                                              • sipjca 2 days ago

                                                                                                                Tauri

                                                                                                              • fittingopposite 3 days ago

                                                                                                                Is there any good android app featuring parakeet v3?

                                                                                                                • fittingopposite 2 days ago

                                                                                                                  Went into a rabbit hole and found this: https://github.com/notune/android_transcribe_app Solid app that uses Parakeet V3. With these random apps on the internet I am always a bit sceptical. Checked it with adb and it is really fully local. I now have a voice keyboard that is a lot better than Google's and has local multilanguage support. I am stoked :)

                                                                                                              • Dnguyen 3 days ago

                                                                                                                Would be nice if the output can be piped directly into Claude Code.

                                                                                                                • laylower 3 days ago

                                                                                                                  Is it deployed locally or does it send data to your servers?

                                                                                                                  • sipjca 3 days ago

                                                                                                                    It’s all local

                                                                                                                    • mixtureoftakes 3 days ago

                                                                                                                      Which model would be the best to use for mandarin? Are there any models on par with Parakeet that are just as fast but also understand Chinese?

                                                                                                                      • sipjca 2 days ago

                                                                                                                        I believe sensevoice, I’ll hopefully be implementing it soon enough

                                                                                                                        • mixtureoftakes 3 days ago

                                                                                                                          also is there a way to make parakeet type more naturally? less capitallization, less punctuation? can this be a setting?

                                                                                                                          this can already be done via local llm processing the text but surely there is an easier way to do this, right

                                                                                                                    • blutoot 3 days ago

                                                                                                                      Crashes on Tahoe 26.3 Betq 1 :(

                                                                                                                      • sipjca 3 days ago

                                                                                                                        Please send me a crash log!

                                                                                                                      • sirjaz 3 days ago

                                                                                                                        This is great, and I love that this is not another webapp

                                                                                                                        • atay123 2 days ago

                                                                                                                          [dead]

                                                                                                                          • olya_pllkh 2 days ago

                                                                                                                            [dead]