Comments Page - Handy – Free open source speech-to-text app

« Back Handy – Free open source speech-to-text appgithub.comSubmitted by tin7in 3 days ago

d4rkp4ttern 3 days ago
I’ve tried several, including this one, and I’ve settled on VoiceInk (local, one-time payment), and with Parakeet V3 it’s stunningly fast (near-instant) and accurate enough to talk to LLMs/code-agents, in the sense that the slight drop in accuracy relative to Whisper Turbo3 is immaterial since they can “read between the lines” anyway.
My regular cycle is to talk informally to the CLI agent and ask it to “say back to me what you understood”, and it almost always produces a nice clean and clear version. This simultaneously works as confirmation of its understanding and also as a sort of spec which likely helps keep the agent on track.
UPDATE - just tried handy with Parakeet v3, and it works really well too, so I'll use this instead of VoiceInk for a few days. I just also discovered that turning on the "debug" UI with Cmd-shift-D shows additional options like post processing and appending trailing space.
- thethimble 3 days ago
  I wish one of these models was fine tuned for programming.
  I want to be able to say things like "cd ~/projects" or "git push --force".
  netghost 3 days ago
  I'll bet you could take a relatively tiny model and get it to translate the transcribed "git force push" or "git push dash dash force" into "git push --force".
  Likewise "cd home slash projects" into "cd ~/projects".
  Maybe with some fine tuning, maybe without.
  vismit2000 2 days ago
  You can try VSCode Speech to Text extension that works decently well in Github Copilot chat as part of Microsoft accessibility suite.
blutoot 3 days ago
I have dystonia which often stiffens my arms in a way that makes it impossible for me to type on a keyboard. TTS apps like SuperWhisper have proven to be very helpful for me in such situations. I am hoping to get a similar experience out of "Handy" (very apt maming from my perspective).
I do, however, wonder if there is a way all these TTS tools can get to the next level. The generated text should not be just a verbatim copy of what I just said, but depending on the context, it should elaborate. For example, if my cursor is actively inside an editor/IDE with some code, my coding-related verbal prompts should actually generate the right/desired code in that IDE.
Perhaps this is a bit of combining TTS with computer-use.
- mritchie712 3 days ago
  I made something called `ultraplan`. It's is a CLI tool that records multi-modal context (audio transcription via local Whisper, screenshots, clipboard content, etc.) into a timeline that AI agents like Claude Code can consume.
  I have a claude skill `/record` that runs the CLI which starts a new recording. I debug, research, etc., then say "finito" (or choose your own stopword). It outputs a markdown file with your transcribed speech interleaved with screenshots and text that you copied. You can say other keywords like "marco" and it will take a screenshot hands-free.
  When the session ends, claude reads the timeline (e.g. looks at screenshots) and gets to work.
  I can clean it up and push to github if anyone would get use out of it.
  mritchie712 3 days ago
  https://github.com/definite-app/ultraplan
  heliostatic 3 days ago
  Definitely interested in that!
  mritchie712 3 days ago
  Added link above!
  wanderingmind 3 days ago
  Sounds interesting I would love to use it if you get a chance to push to github
  mritchie712 3 days ago
  https://github.com/definite-app/ultraplan
- sipjca 3 days ago
  I totally agree with you and largely what you’re describing is one of the reasons I made Handy open source. I really want to see something like this and see someone go experiment with making it happen. I did hear some people playing with using some small local models (moondream, qwen) to get some more context of the computer itself
  I initially had a ton of keyboard shortcuts in handy for myself when I had a broken finger and was in a cast. It let me play with the simplest form of this contextual thing, as shortcuts could effectively be mapped to certain apps with very clear uses cases
- eddyg 3 days ago
  There’s lots of existing work on “coding by voice” long before LLMs were a thing. For example (from 2013): http://xahlee.info/emacs/emacs/using_voice_to_code.html and the associated HN discussion (“Using Voice to Code Faster than Keyboard”): https://news.ycombinator.com/item?id=6203805
  There’s also more recent-ish research, like https://dl.acm.org/doi/fullHtml/10.1145/3571884.3597130
- hasperdi 3 days ago
  What you said is possible by feeding the output of speech-to-text tools into an LLM. You can prompt the LLM to make sense of what you're trying to achieve and create sets of actions. With a CLI it’s trivial, you can have your verbal command translated into working shell commands. With a GUI it’s slightly more complicated because the LLM agent needs to know what you see on the screen, etc.
  That CLI bit I mentioned earlier is already possible. For instance, on macOS there’s an app called MacWhisper that can send dictation output to an OpenAI‑compatible endpoint.
  sipjca 3 days ago
  Handy can post process with LLMs too! It’s just currently hidden behind a debug menu as an alpha feature (ctrl/cmd+shift+d)
  sanex 3 days ago
  I was just thinking about building something like this, looks like you beat me to the punch, I will have to try it out. I'm curious if you're able to give commands just as well as some wording you want cleaned up. I could see a model being confused between editting the command input into text to be inserted and responding to the command. Sorry if that's unclear, might be better if I just try it.
  sipjca 2 days ago
  I’d just try it and fork handy if it doesn’t work how you want :)
kuatroka 3 days ago
Love it. I had been searching for STT app for weeks. Every single app was either paid as a one off or had a monthly subscription. It felt a bit ridiculous having to pay when it’s all powered by such small models on the back end. So I decided to build my own. But then I found “Handy” and it’s been a really amazing partner for me. Super fast, super simple, doesn’t get in my way and it’s constantly updated. I just love it. Thanks a lot for making it! Thanks a lot
P.S. The post processing that you are talking about, wouldn’t it be awesome.
frankdilo 3 days ago
This looks great! What’s missing for me to switch from something like Wispr Flow is the ability to provide a dictionary for commonly mistaken words (name of your company, people, code libraries).
- tin7in 3 days ago
  It has something called "Custom Words" which might be what you are describing. Haven't tested this feature yet properly.
  frankdilo 2 days ago
  So is this already in Handy or you are referring to a feature of the underlying models you are still not actively using?
  tin7in 2 days ago
  This is already in Handy in Advanced > Custom Words.
  There is also Post Processing where you can rerun the output through an LLM and refine it, which is the closest to what Wispr Flow is doing.
  This can be found in the debug menu in the GUI (Cmd + Shift + D).
- jauntywundrkind 3 days ago
  I dig that some models have an ability to say how sure they are of words. Manually entering a bunch of special words is ok, but I want to be able to review the output and see what words the model was less sure of, so I can go find out what I might need to add.
- sipjca 3 days ago
  There’s a PR for this which will be pulled in soon enough, I can kick off a build of the PR if you want to download a pre release version
  sipjca 3 days ago
  Okay so it's more directly text replacements
  https://github.com/cjpais/Handy/actions/runs/21025848728
  There is also LLM post processing which can do this, and the built in dictionary feature
Barbing 3 days ago
Quick thoughts re: mentioned transcribers
Superwhisper — Been using it a long time. It's paid with a lifetime subscription available. Tons of features. Language models are built right in without additional charge. Solo dev is epic; may defer upgrades to avoid occasional bugs/regressions (hey, it's complex software).
Trying each for a few minutes:
Hex — Feels the leanest (& cleanest) free options mentioned for Mac in this thread.
Fluid Voice — Offers a unique feature, a real-time view of your speech as you talk! Superwhisper has this, but only with an online model. (You can't see your entire transcript in Fluid, though. The recording window view is limited to about one sentence at a time--of course you do see everything when you complete your dictation.)
Handy — Pink and cute. I like the history window. As far as clipboard handling goes, I might note that the "don't modify clipboard" setting is more of a "restore clipboard" setting. Though it doesn't need as many permissions as Hex because it's willing to move clipboard items around a bit, if I'm not mistaken.
Note Hex seems to be upset about me installing all the others... lots of restarting in between installs all around. Each has something to offer.
---
Big shout out to Nvidia open-sourcing Parakeet--all of these apps are lightning fast.
Also I'm partial to being able to stream transcriptions to the cursor into any field, or at least view live like Fluid (or superwhisper online). I know it's complex b/c models transcribe the whole file for accuracy. (I'm OK with seeing a lower quality transcript realtime and waiting a second for the higher-quality version to paste at the end.)
mncharity 3 days ago
A cautionary user experience report. The default hotkey upon download is ctrl+space. Press to begin recording, release to transcribe and insert. Key-up on the space key constitutes hotkey release. If the ctrl key is still down when the insertion lands, the transcribed text is treated as ctrl characters. The test app was emacs. (x64 linux x11, with and without xdotool)
PhilippGille 3 days ago
Has anyone compared this with https://github.com/HeroTools/open-whispr already? From the description they seem very similar.
Handy first release was June 2025, OpenWhispr a month later. Handy has ~11k GitHub stars, OpenWhispr has ~730.
- kuatroka 3 days ago
  I did have tried, but the ease of installing handy as just a macOS app is so much simpler than needing to constantly run in npm commands. I think at the time when I was checking it, which was a couple of months ago they did not have the parakeet model, which is a non-whisper model, so I had decided against it. If I remember correctly, the UI was also not the smoothest.
  Handy’s ui is so clean and minimalistic that you always know what to do or where to go. Yes, it lacks in some advanced features, but honestly, I’ve been using it for two months now and I’ve never looked back or searched for any other STT app.
  ranguna 3 days ago
  The OP asked if someone compared both, which usually means actually trying both and not just installing one and skimming through the other's README file. So, in summary, you didn't try both and didn't answer the OP.
aucisson_masque 3 days ago
It’s incredibly fast on my MacBook m1 air and more accurate that the native speech to text.
The ui is well thought out, just the right amount of setting for my usage.
Incredible !
Btw, do you know what « discharging the model » does ? It’s set to never by default, tried to check if it has an impact on ram or cpu but it doesn’t seem to do anything.
- mixtureoftakes 3 days ago
  the model is permanently loaded into ram for access speed. discharging it would unload it from ram and lead to longer start times
  sipjca 3 days ago
  It does unload it, and actually might be a good default for most people as the model loading does happen in the background as soon as you hit the key
peterldowns 3 days ago
Huge fan! Parakeet v3 works great with it. I have used Monologue, Superwhisper, and Aqua, at various times in the past. But Handy is at least as good, and it's not an expensive subscription. I love that it runs locally, too. Strongly recommend!
Jack5500 3 days ago
The Parakeet V3 model is really great!
Jayakumark 3 days ago
Its great, i have been using it . Two requests though 1. iOS app 2. API option to use against meeting transcription or route audio from Mic .
- blensor 3 days ago
  +1 on the meeting tranecription
holtwick 3 days ago
FluidVoice for macOS is pretty handy as well. Open source under Apache License. https://altic.dev/fluid https://github.com/altic-dev/FluidVoice
- jimmydoe 3 days ago
  Its vibe coded UI feels too complicated.
llarsson 3 days ago
A question because I'm not using speech-to-text, but find it intriguing (especially since it's now possible to do locally and for free).
How have your computing habits changed as a result of having this? When do you typically use this instead of typing on the keyboard?
- tin7in 3 days ago
  I use it all the time with coding agents, especially if I'm running multiple terminals. It's way faster to talk than type. The only problem is that it looks awkward if there are others around.
  johnisgood 3 days ago
  Interesting. I can think and type faster, but not talk. I am not much of a talker.
  stavros 3 days ago
  Same, whenever I try to dictate something I always umm and ahhh and go back a bunch of times, and it's faster to just type. I guess it's just a matter of practice, and I'm fine when I'm talking to other people, it's only dictation I'm having trouble with.
- noneofyour 3 days ago
  Part of my job is to give feedback to people using Word Comments. Using STT, it's been a breeze. The time saving really is great. Thing is, I only do this when working at home with no one around. So really only when WFH.
dumbmrblah 3 days ago
I just set this up today. I had Whispering app set up on my Windows computer, but it really wasn't working well on my Ubuntu computer that I just set up. I found Handy randomly. It was the last app I needed to go Linux full-time. Thank you!
unutranyholas 3 days ago
https://hex.kitlangton.com/ is good
wi5eif6E 3 days ago
This looks and works great! A settings option to keep no recording history at all would be terrific.
- sipjca 2 days ago
  It’s in the debug menu right now (ctrl/cmd+shift+d)
vladstudio 3 days ago
Use it daily. Looks and works great.
erelong 3 days ago
WhisperTux on linux worked ok, curious how Handy compares: https://github.com/cjams/whispertux
mrroryflint 3 days ago
On a M4 Macbook Air, there was enough lag to make it unusable for me. I hit the shortcut and start speaking but there was always a 1-2sec delay before it would actually start transcribing even if the icon was displayed.
- jborichevskiy 3 days ago
  Curious if you were using AirPods or other Bluetooth headphones for this?
  If so, there should be "keep microphone on" or similar setting in the config that may help with this, alternatively, I set my microphone to my MacBook mic so that my headphones aren't involved at all and there is much less latency on activation
  mrroryflint 3 days ago
  Airpods Max (is that the name?) - the big ones.
  jborichevskiy 21 hours ago
  Makes sense. If you enable the Debug menu (Shift+CMD+D), there is an option for "Always-On Microphone". Might be worth a try to remove that latency.
- kuatroka 3 days ago
  Yes, I’ve got the same situation too. I kind of learned to wait for one or two seconds before talking. I am using it with the AirPods, so maybe it’s indeed the Bluetooth thing.
- sipjca 3 days ago
  What microphone are you using?
  mrroryflint 3 days ago
  Airpods Max (is that the name?) - the big ones.
  sipjca 2 days ago
  Yeah like the other commenters mentioned, using Bluetooth devices does not work super well at the moment. Hopefully I’ll have a fix at some point. There’s just some time over bluetooth to negotiate the connection and everything, and the app doesn’t do a good job showing this at all right now
  On a Mac I definitely recommend using the internal mic even if wearing airpods
miniwark 3 days ago
Did this thing (or open-whispr) work well with other languages than english ?
- dawkins 3 days ago
  In Spanish works very well
- wi5eif6E 3 days ago
  German also works great.
walthamstow 3 days ago
Nice. I spent most of Christmas vibe coding with Google Antigravity with one hand while holding a sleeping baby in the other. MacOS built in dictation is OK, but struggles with technical language.
qprofyeh 3 days ago
As a Mac user, am I missing something? macOS has Dictation built-in, when you short press F5 it should start transcribing your spoken words into text in real time. It even does non-English languages.
- d4rkp4ttern 3 days ago
  Besides being trash as others said, there’s a trade off with real time transcription word by word - there’s no opportunity for an AI to holistically correct/clean up the transcription
  SkyPuncher 3 days ago
  But, OSX does come back and fix things.
  d4rkp4ttern 3 days ago
  You mean, after displaying each word as it is spoken, then OSX goes back and fixes what’s been displayed? I think I’ve seen it fix one or two recent words, but I guess you’re saying it could fix the entire sentence as well. I didn’t know that
  SkyPuncher 2 days ago
  Yea, I use it daily for getting my thoughts into Claude. I often see it rewriting sentences it’s confused on.
- luigi23 3 days ago
  it's trash if:
  - you're not a native speaker or have accent
  - using airpods mic
  - surroundings is noisy
  - use novel words like 'claude code'
  - mumble a bit
mnmalst 3 days ago
This is really cool. Works out of the box and I'm typing this using handy.
Is there any way to execute commands directly on Linux?
Also a feature to edit or correct already typed text would be really great.
oybng 3 days ago
On Windows this depends on webview2, which the installer attempts to download. No mention of this requirement in the readme. It's a shame this software isn't portable
chainmail2029 3 days ago
There's a slightly awkward naming overlap with an existing product.
- unwind 3 days ago
  Which one? I did a quick search but that didn't turn up anything so perhaps it's a partial word overlap or something.
  I did find the projects "user-facing" home page [1] which was nice. I found it rather hard to find a link from that to the code on GitHub, which was surprising.
  [1]: https://handy.computer/
  DomB 3 days ago
  It's the German word for smart phone / mobile phone
  zavec 3 days ago
  There's also a sex toy
  sReinwald 3 days ago
  [dead]
- ensocode 3 days ago
  This is a slightly German-centric comment.
- xfeeefeee 3 days ago
  [dead]
bn-usd-mistake 3 days ago
Does anyone have a similar mobile application that works locally and is not too expensive? Mostly looking to transcribe voice messages sent over Signal which does not offer this OOTB
- 4mitkumar 3 days ago
  I have been using this one from Futo for quite some time and love it: https://keyboard.futo.org/
  They also have a voice input only version if you still would like to keep your typing keyboard: https://voiceinput.futo.org/
- bogtap82 3 days ago
  There is one single app I've been able to find that offers Parakeet-v3 for free locally and it's called Spokenly. They have paid cloud models available as well, but the local Parakeet-v3 implementation is totally free and is the best STT has to offer these days regardless. Super fast and accurate. I consider single-user STT basically a solved problem at this point.
  kuatroka 3 days ago
  Spokenly is great too, but Handy's minimalistic and focused UI won me over.
  dumbmrblah 3 days ago
  Spokenly is my go-to app on iOS for transcription as well.
  Esus-ai a day ago
  [dead]
- nerdfax 3 days ago
  [dead]
jborichevskiy 3 days ago
Big Handy fan!
swordsith 2 days ago
from the read-me, 'Handy isn't trying to be the best speech-to-text app—it's trying to be the most forkable one.' Why cant we write a readme without using generative AI, seriously, it's not that hard. :<
skor 3 days ago
This is so handy, thank you very much. Good work!!
dotancohen 3 days ago
Looks interesting. Why does it need a GUI at all?
- tin7in 3 days ago
  As an alternative to Wisprflow, Superwhisper and so on. It works really well compared to the commercial competitors but with a local model.
- Barbing 3 days ago
  I hear a CLI request? Tons of CLI speech-to-text tools by the way, really glad to see this. Excellent competitors (Superwhisper, MacWhisper, etc.) are closed/paid.
- sipjca 3 days ago
  It doesn’t! Just makes it more accessible to more people I feel. There’s a cli version for Mac which I wrote first handy-cli
- unwind 3 days ago
  Ah, that was a typo: you meant "GPU" (Graphics Processing Unit, not "GUI" which of course is Graphical User Interface) since that is listed in the system requirements. Explained implicitly by an existing comment, thanks!
- kristianp 3 days ago
  So more people can use it?
- satvikpendem 3 days ago
  Because local AI models run well on a GPU, better than on a CPU
ekjhgkejhgk 3 days ago
Explain to me why a speech-to-text app has 50% of its code in typescript...?
- beklein 3 days ago
  Not the author/contributor, but the app is built using Tauri for easy multi-platform support, so the backend logic is implemented in Rust and the frontend UI is implemented in TypeScript. I think it’s a valid choice. GitHub does not include any model _code_ in the stats; the models will be downloaded separately the first time you use them. Hope this helps.
  I know many people hate sites like this, but I actually like them for these use cases. You can get a quick, LLM-generated overview of the architecture, e.g. here: https://codewiki.google/github.com/cjpais/handy
- sipjca 2 days ago
  Tauri
fittingopposite 3 days ago
Is there any good android app featuring parakeet v3?
- fittingopposite 2 days ago
  Went into a rabbit hole and found this: https://github.com/notune/android_transcribe_app Solid app that uses Parakeet V3. With these random apps on the internet I am always a bit sceptical. Checked it with adb and it is really fully local. I now have a voice keyboard that is a lot better than Google's and has local multilanguage support. I am stoked :)
  fittingopposite 2 days ago
  Now I can continue coding via tmux/Claude Code with the https://github.com/rberg27/doom-coding setup while going for a walk in nature.
Dnguyen 3 days ago
Would be nice if the output can be piped directly into Claude Code.
laylower 3 days ago
Is it deployed locally or does it send data to your servers?
- sipjca 3 days ago
  It’s all local
  mixtureoftakes 3 days ago
  Which model would be the best to use for mandarin? Are there any models on par with Parakeet that are just as fast but also understand Chinese?
  sipjca 2 days ago
  I believe sensevoice, I’ll hopefully be implementing it soon enough
  mixtureoftakes 3 days ago
  also is there a way to make parakeet type more naturally? less capitallization, less punctuation? can this be a setting?
  this can already be done via local llm processing the text but surely there is an easier way to do this, right
blutoot 3 days ago
Crashes on Tahoe 26.3 Betq 1 :(
- sipjca 3 days ago
  Please send me a crash log!
sirjaz 3 days ago
This is great, and I love that this is not another webapp
atay123 2 days ago
[dead]
olya_pllkh 2 days ago
[dead]