The faster this train runs, the bigger the bang once it hits the wall that isn't going anywhere.
We could have accomplished a lot of good with those resources.
Tbf LLMs are pretty incredible accessibility tools.
Speech to text using whisper is almost perfect.
I once worked with someone who wasn’t fluent in English, but could read it really well. He had Whisper running during meetings because he could read at the speed it transcribed, but couldn’t keep up with our casual speech.
Honestly, we didn’t even notice he was using speech to text to keep up with us for a few weeks. We only noticed because of screen sharing.
I imagine that would be a big social boon for deaf people as well.
All that to say, it’s not like there was _no_ good done with all that money.
> Speech to text using whisper is almost perfect.
This isn't true. On benchmarks Whisper is not SOTA. It is said to be noise resistant, but it doesn't compare well with Conformer-based architectures even on LibriSpeech mixed with noise. Definitely not perfect, and it doesn't work for medical transcription.
If anything other than a small minority of people needed medical transcripts, sure! But for the remaining 99% of use cases, a fast and easy-to-deploy model is what's most useful.
It's not fast without pre-segmentation, as they do in WhisperX. It actually has terrible transcription speed. For a speedup we have to use CTranslate2 kernels. The decoding code is also a mess where it's hard to plug in your own custom language model. Not to mention streaming ASR requires even more tweaks. Whisper Small is very fast and quite inaccurate. If you deploy Whisper on a GPU, which costs around a dollar per hour, you really need to ensure that the cost savings are worth it.
Although all of this is from a production lens. For personal use, honestly nothing is as easy to use as Whisper (even works on a laptop).
Not a shadow of what good could have been done.
Any day now you'll barely notice he isn't even there anymore, which seems to be the end goal.
It's not like this mad rush into la-la-land doesn't have negative consequences for society.
It seems the answer in this article is that reasoning models are expensive and are becoming the norm.
Reasoning/chain of thought seems like diminishing returns to me, and I worry it's a bit of a dead end/local optimum. Reasoning models call the underlying language model tens of times, so they're tens of times less efficient, but the quality is not tens of times better. It also feels finicky to me. The bump from GPT-3 to GPT-4 was a reasonable positive shift across the board. The reasoning models produce answers with a different vibe: maybe better overall, but worse at some tasks and better at others. I can use o1 at no additional cost, so I do use it fairly often, but I often consciously opt for 4o, either because I prefer its results or because the quality boost from o1 isn't worth the wait.
I would estimate that the people spending the money have not run out yet.
Some of them can't really run out. Monopoly profit margins guarantee them cash flow that they can reinvest into AI.
Because everyone is too afraid to be the first to throw in the towel
The "throwing in the towel" that you see out of the market is some of the bigger early players agreeing to get essentially acquihired back into Big Tech. The biggest one that comes to mind: Inflection AI.
My guess is that the people with enough money to invest into AI are too busy stroking their... egos to realize that it's a bubble. They're betting on this because they want to replace all of their human workers while keeping all the profits. Just talking about it is enough to give a stiffy to these people. And as we all know horniness clouds our judgement.
Paywalled. I'm assuming some variation on the Sunk Cost Fallacy cognitive bias.