Looks like LLM inference will follow the same path as Bitcoin: CPU -> GPU -> FPGA -> ASIC.
LLM inference is usually a small task built into some other program you're running, right? An office suite with a sentence-suggestion feature (probably a good use for an LLM) would be… mostly office suite, with a little LLM inference sprinkled in.
So, the “ASIC” here is probably the CPU with, like, slightly better vector extensions. AVX1024-FP16 or something, haha.
I really doubt it. Bitcoin mining is a fixed workload: just massive amounts of SHA-256. On the other hand, ASICs for accelerating matrix/tensor math already exist. LLM architecture is far from fixed and is still being figured out. I don't see an ASIC any time soon unless someone REALLY wants to put a specific model on a phone or something.
Google's TPU is an ASIC and performs competitively. Also, Tesla and Meta are building something, AFAIK.
Although I doubt you could get a lot better, as GPUs already have half the die area reserved for matrix multiplication.
It depends on your precise definition of ASIC. The FPGA thing here would be analogous to an MSIC, where M = model.
Building a chip for one specific model is clearly a different thing from what a TPU is.
Maybe we'll start seeing MSICs soon.
Is there any particular reason you'd want to use an FPGA for this? Unless your problem space is highly dynamic (e.g. prototyping) or you're making products in vanishingly low quantities for a price-insensitive market (e.g. military), an ASIC is always going to be better.
There doesn't seem to be much flux in the low-level architectures used for inference at this point, so you may as well commit to an ASIC, as is already happening with Apple, Qualcomm, etc. building NPUs into their SoCs.
You can open-source your FPGA designs for wider collaboration with the community. Also, an FPGA is the starting step for making any modern digital chip.
(1) Academics can make an FPGA design but not an ASIC, and (2) an FPGA is a first step toward making an ASIC.
This specific project looks like a case of "we have this platform for automotive and industrial use, running Llama on the dual-core ARM CPU is slow but there's an FPGA right next to it". That's all the justification you really need for a university project.
Not sure how useful this is for anyone who isn't already locked into this specific architecture. But it might be a useful benchmark or jumping-off point for more useful FPGA-based accelerators, like ones optimized for 1-bit or 1.58-bit LLMs.
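To make the 1.58-bit point concrete, here's a minimal sketch (mine, not from the project) of why ternary-weight LLMs map so well onto simple FPGA/ASIC datapaths: each weight is -1, 0, or +1 (log2(3) ≈ 1.58 bits), so a matrix-vector product needs only adds and subtracts, no multipliers, and multipliers are the expensive part of the datapath.

```python
def ternary_matvec(weights, x):
    """weights: rows of ternary values in {-1, 0, +1}; x: input vector.
    Hypothetical illustration -- real accelerators would do this in
    parallel fixed-point hardware, but the key property is the same:
    no multiplications anywhere in the inner loop."""
    out = []
    for row in weights:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi   # add instead of multiply
            elif w == -1:
                acc -= xi   # subtract instead of multiply
            # w == 0: skip the element entirely
        out.append(acc)
    return out

print(ternary_matvec([[1, 0, -1], [1, 1, 0]], [2.0, 3.0, 4.0]))  # [-2.0, 5.0]
```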
Model architecture changes fast. Maybe it will slow down.
You gotta prototype the thing somewhere. If the LLM algorithms turn out to be pretty mature, I suspect accelerators of all kinds will be baked into silicon, especially for inference.
That's the thing though, we're already there. Every new consumer ARM and x86 ASIC is shipping with some kind of NPU, the time for tentatively testing the waters with FPGAs was a few years ago before this stuff came to market.
But the NPU might be poorly suited to your model or workload, or just poorly designed.
like this? https://www.d-matrix.ai/product/
4 times as efficient as on the SoC's low-end ARM cores, so many times less efficient than on modern GPUs, I guess?
Not that I was expecting GPU-like efficiency from a fairly small-scale FPGA project. Nvidia engineers spent thousands of man-years making sure that stuff works well on GPUs.