This is a demo of Taalas inference ASIC hardware. Prior discussion @ https://news.ycombinator.com/item?id=47086181
What model and hardware powers this?
Is this a Google T5-based model?
3-bit, hard-wired Llama 3.1 8B ( https://taalas.com/the-path-to-ubiquitous-ai/ )
3-bit is a bit ridiculous. From that page it's unclear to me whether the current model is 3-bit or 4-bit. If it's 4-bit… well, NVIDIA showed that a well-organized model can perform almost as well as at 8-bit.
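For intuition, here's a minimal sketch of plain symmetric round-to-nearest quantization (almost certainly not Taalas's actual scheme) showing why the bit width matters so much: 3 bits give only 8 codes per weight, 4 bits give 16, 8 bits give 256.

    import numpy as np

    def quantize(w, bits):
        # Symmetric, per-tensor round-to-nearest: one float scale,
        # integer codes in [-(2^(bits-1)-1), +(2^(bits-1)-1)].
        levels = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / levels
        q = np.clip(np.round(w / scale), -levels, levels)
        return q * scale  # dequantize back to float

    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.02, size=100_000)  # toy Gaussian "weights"
    for bits in (3, 4, 8):
        err = np.abs(w - quantize(w, bits)).mean()
        print(f"{bits}-bit mean abs error: {err:.6f}")

Real low-bit schemes (group-wise scales, outlier handling, NVFP4-style formats) do much better than this naive version, but the gap between 8 codes and 256 codes is the core of the concern.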
I love seeing optimised SLM inference. Is there a current use-case for this? Edge CNNs make sense to me but not edge SLMs (yet).
If this is possible, why don't all online AI engines work like this?
This is a specific model (Llama 3.1 8B) baked directly into the hardware. You can only use that one model, but you get "low" power consumption and crazy speed.
If you want to run a different model, you need new hardware built for that new model.
It really is crazy speed: 15k tokens/second.
I tried it again. This is the future of chat UI, imho.
Generated in 0.074s • 15,754 tok/s
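A quick back-of-envelope from those quoted numbers, taking the demo's figures at face value:

    tok_per_s = 15_754   # throughput quoted by the demo
    elapsed_s = 0.074    # generation time quoted by the demo
    print(f"~{tok_per_s * elapsed_s:.0f} tokens in that response")  # ~1166
    print(f"~{1000 / tok_per_s:.3f} ms per token")                  # ~0.063

So a response of over a thousand tokens appears effectively instantly.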
Impressive, but this particular underlying LLM is objectively weak. I'd like to see this done with a larger, newer model.
Imagine a model like Opus 4.6 at that speed; that would be insane.