• ismailmaj 2 days ago

    What would be the benefit of using ZML instead of relying on StableHLO/PJRT? Because the cost of porting models is for sure high.

    • gwenzek a day ago

      ZML the Zig library is mostly a wrapper around StableHLO/PJRT. But it's a high-quality wrapper, and the tagged tensor syntax is really helpful for writing complex ops like dot or gather.

      And ZML the framework also resolves issues with the complex dependency graph of StableHLO/PJRT.
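
      For readers unfamiliar with tagged tensors: a rough analogy (in numpy, not ZML syntax) is einsum, where letters act as ad-hoc axis names instead of bare positions, which is what makes ops like dot easier to get right:

      ```python
      import numpy as np

      # Positional shapes force the reader to remember which axis is which.
      q = np.ones((2, 5, 8))   # (batch, query_len, dim)
      k = np.ones((2, 7, 8))   # (batch, key_len, dim)

      # einsum letters behave like axis tags: contract over the shared "d"
      # axis, keep "b", "q", and "k" in the output.
      scores = np.einsum("bqd,bkd->bqk", q, k)
      print(scores.shape)  # prints (2, 5, 7)
      ```

      Tagged-tensor APIs take this idea further by attaching the names to the tensors themselves rather than to a one-off format string.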

    • hsjdhdvsk 3 days ago

      Hi ya! Want to say this looks awesome :) really interested in the sharded inference demo!!! You said it was experimental; is it in the examples folder at all?? (On phone atm, so apologies for not investigating further)

      • onurcel 3 days ago

        First of all, great job! I think the inference will become more and more important.

        That being said, I have a question regarding ease of use. How difficult is it for someone with a Python/C++ background to get used to Zig and (re)write a model to use with ZML?

        • gwenzek 3 days ago

          Hi, co-author here. Zig is way simpler than C++. Simple enough that in an afternoon I was able to onboard into the language, rewrite the core of a C++ algorithm, and see speed gains (fastBPE, for reference).

          Coming from Python, the hardest part is learning memory management. What helps with ZML is that the model code is mostly metaprogramming, so we can be a bit flexible there.

          We have a high-level API that should feel familiar to PyTorch users (like myself), but improves on it in a few ways.

          • steeve 3 days ago

            pretty easy, usually the hardest part is figuring out what the Python code is doing

          • Palmik 2 days ago

            Given that the focus is performance, do you have any benchmarks to compare against the likes of TensorRT-LLM?

            • gwenzek 2 days ago

              It's a bit early to compare directly to TensorRT-LLM because we don't have a full-blown equivalent.

              Note that our focus is being platform agnostic, easy to deploy/integrate, good performance all-around, and ease of tweaking. We are using the same compiler as JAX, so our performance is on par. But generally we believe we can gain on overall "tok/s/$" by having shorter startup time, choosing the most efficient hardware available, and easily implementing new tricks like multi-token prediction.

              • koe123 2 days ago

                I second this, it would help to justify the time investment into a framework if it's clear how it stacks up!

              • montyanderson 3 days ago

                my dreams have come true. hardware-agnostic ml primitives in a typed, compiled language.

                my only question is: is zig stable enough to base such a project on?

                • gwenzek 2 days ago

                  The core Zig language has been relatively stable for the past few years. What has changed the most is the `build.zig` build system (which we aren't using).

                  We are also looking ahead at the Zig roadmap, trying to anticipate upcoming breaking changes, and isolating our users from them.

                  • dartos 3 days ago

                    Stable as in unchanging, no.

                    Stable as in reliable enough, I’d say so.