• pocketarc 2 days ago

    > More troublingly, performance has not improved in CI at the same pace as on developer machines—it’s usually a lot slower to build our app in CI than it is to do it locally on my M1 laptop.

    While some of the other comments around optimizing CI pipelines are solid, this whole thing seems to be due to having CI running on servers that are -worse- than a laptop. Isn't that wild? Servers weaker than laptops. Not even desktops or workstations. LAPTOPS.

    And they are, because they're just cloud instances. And most cloud instances... are not fast.

    Consider the idea that you could run your CI runner on an M1 laptop if you so choose. Setting up a self-hosted GH Actions runner (for example) is quite straightforward. It doesn't even need to be an internet-facing machine; a spare box sitting at home or the office works fine. $600 will get you a Mac mini with an M2 CPU and a super-fast SSD; everything will build faster than it ever could on any generic CI build server.
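
    For reference, once the runner is registered, pointing a workflow at a box like that is just a label change; something like this (the labels are the defaults GitHub applies to an Apple Silicon Mac runner, the build command is whatever yours is):

        # .github/workflows/ci.yml (sketch)
        name: CI
        on: [push]
        jobs:
          build:
            runs-on: [self-hosted, macOS, ARM64]
            steps:
              - uses: actions/checkout@v4
              - run: make build   # your build entrypoint here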

    • 10000truths 2 days ago

      My educated speculation is that builds are particularly sensitive to I/O latency (read this file, and based on its contents, read this other file). For that kind of workload, the directly attached NVMe you get in a laptop will beat the pants off the network-attached disk you get in a budget-friendly cloud instance.
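
      Easy to verify with fio if you're curious about your own runners: 4k random reads at queue depth 1 are a decent stand-in for that "chase one file at a time" pattern (the flags are standard fio; filename, size, and runtime are just placeholder values):

          fio --name=latency-probe --filename=/tmp/fio.test --size=1G \
              --rw=randread --bs=4k --iodepth=1 --direct=1 \
              --runtime=30 --time_based

      Compare the completion latencies it reports on a laptop NVMe vs. a cheap cloud instance and the gap tells the story pretty quickly.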

      • suryao 2 days ago

        This is absolutely true. We run fairly heavy duty CI infra and I can say with confidence that this is one of the biggest factors impacting CI runner performance, almost equal in weight to raw CPU perf.

        • ToucanLoucan 2 days ago

          I think it's also fair to rope in how fucking bloated so much software is now. The size of many applications that do garden-variety work like real-time chat is fucking ridiculous. Slack, last time I looked, was just under 200 MB. TWO. HUNDRED. MEGS. To send and receive fucking instant messages and do voice/video calling.

        • closeparen 2 days ago

          Most cloud instances that people actually use on a day-to-day basis underperform their laptops by a lot!

          • ToucanLoucan 2 days ago

            Apple Silicon truly changed the game. Like, I still have a PC for the applications a Mac just isn't suited for, but it still floors me how my little MacBook Air will utterly lap coworkers' Lenovos with i9s and double the RAM, with no active cooling at all no less, while theirs are practically hovering from the fans.

            The M processors truly left PCs in the fucking dust.

            • high_na_euv 2 days ago

              Lunar Lake closed the gap with M*

          • dan_manges 2 days ago

            We're solving a lot of these problems with Mint: https://rwx.com/mint

            Key differentiators:

            * Content-based caching eliminates duplicate execution – only run what's relevant based on the changes being made

            * Filters are applied before execution, ensuring that cache keys are reliable

            * Steps are defined as a DAG, with machines abstracted away, for better performance, efficiency, and composition/reuse
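
            The caching idea boils down to roughly this (a Python sketch of the concept, not our actual implementation):

                import hashlib
                from pathlib import Path

                def cache_key(input_files: list[str], command: str) -> str:
                    # Key is derived purely from the declared inputs plus the command;
                    # if none of them changed, the key is identical and the step's
                    # previous output can be reused instead of re-running it.
                    h = hashlib.sha256(command.encode())
                    for path in sorted(input_files):
                        h.update(Path(path).read_bytes())
                    return h.hexdigest()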

            • skeptrune 2 days ago

              It's incredibly frustrating that LLMs still aren't useful for automating CI and IaC configs, despite all the hype.

              • firesteelrain 2 days ago

                Not sure what you mean. ChatGPT does a very good job at generating GitLab YAML and Terraform HCL

              • jononor 2 days ago

                At our company, our machine learning train+eval pipelines run in standard GitLab CI (in addition to all the standard backend/frontend software builds, and some IoT builds). We have 4 small PCs at the office set up as runners for the compute-intensive jobs, so each job gets multi-core CPUs with NVMe, not just vCPUs and virtualized storage. Each job execution is around 8x faster than on the standard GitLab CI runners, and much cheaper than dedicated compute at the standard cloud vendors. Hetzner would be similarly cheap, but I did not want to bother with remote management, another vendor, networking, etc.
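
                Routing the heavy jobs onto those boxes is just runner tags in .gitlab-ci.yml; something like this (job name, tag, and script are just examples from our setup):

                    stages: [train]

                    train-model:
                      stage: train
                      tags: [office-nvme]   # only runners registered with this tag pick it up
                      script:
                        - python train.py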

                • mike_hearn 2 days ago

                  There are some quick wins for improving CI times and reliability. I use some of these and they do ease the pain. My company develops a tool that is itself a build system, and it does complex, intensive builds as part of its testing process, so CI times are something I keep an eye on. These tips are mostly useful for JVM/.NET projects, I think. We use self-managed TeamCity, which makes this stuff easy.

                  1. Preserve checkout/build directories between builds. In other words, don't do clean builds. Let your build system do incremental builds and use its dependency caches as it would when running locally. This means not running builds in Docker containers, for instance (unless you take steps to keep them running).

                  2. Make sure your build servers sit behind caching HTTP proxies, so that when you do need to trigger a clean build, dependency downloads are properly cached and optimized (a minimal example follows the list).

                  3. Run builds on Macs! Yes, they are now much faster than other machines, so if you can afford them and your codebase is portable enough, throw them into the mix and let high-priority changes run on them instead of on slower Linux VMs. Apple silicon machines are a bit too new to be reaching obsolescence, but if you do have employees who give up "old" ARM machines, turn those into CI workers.

                  4. Ensure all build machines have fast SSDs.

                  5. Use dedicated machines for build workers, i.e. not cloud VMs, which are often over-subscribed. Or use a cloud that's good value for money and doesn't over-subscribe VMs, like Oracle's [1]. Dedicated machines in the big clouds can be expensive, but you can get cheaper, smaller machines elsewhere. Or just buy hardware and wire it up yourself in an office; it's not important for build machines to be HA. You always have the option of mixing machines and adding cloud VMs if your load suddenly increases.

                  6. Use a build system that understands build graphs properly (i.e. not Maven) and modularize the codebase well. Most build systems can't eliminate redundant unit testing within a module, but they can do so between modules, so finer-grained modules + incremental builds can reduce the number of tests that run for a given change.

                  7. Be judicious about which tests are run on every change. Do you really need to run a full-blown end-to-end suite on every commit? Probably not (a cheap way to gate it is sketched after the list).
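
                  To make a couple of those concrete: for point 2, on JVM projects the simplest version is a repository manager in front of the public repos, set up once per worker (the URL here is just a placeholder for whatever you run internally):

                      <!-- ~/.m2/settings.xml on the build worker -->
                      <settings>
                        <mirrors>
                          <mirror>
                            <id>internal-cache</id>
                            <mirrorOf>*</mirrorOf>
                            <url>https://nexus.internal.example/repository/maven-public/</url>
                          </mirror>
                        </mirrors>
                      </settings>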
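
                  And for point 7, the crude version that gets you most of the way is a path check in front of the expensive suites (the paths and the Gradle task are just examples):

                      # Only run the end-to-end suite when something it actually covers changed
                      if git diff --name-only origin/main...HEAD | grep -qE '^(server/|protocol/)'; then
                        ./gradlew :e2e:test
                      else
                        echo "No server/protocol changes; skipping e2e suite"
                      fi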

                  Test times are definitely an area where we need some more fundamental R&D, though. Integration testing is the highest-value testing, but it's also the type of test that build systems struggle the most to optimize away, as figuring out what might have been broken by a given change is too hard.

                  [1] Disclosure: I do some work for Oracle Labs, but I think this statement is true regardless.

                  • jdlshore 2 days ago

                    Not really about AI, but instead a complaint about the difficulty of optimizing build pipelines.

                    • fire_lake 2 days ago

                      Weird article. Bazel does exactly what the author wants. And it seems unrelated to AI.