> Let me rephrase this: 17% of the most popular Rust packages contain code that virtually nobody knows the purpose of (and I can't imagine the long tail, which receives less attention).
I think this post has some good information in it, but this claim is substantially overstated: I look at crate discrepancies pretty often as part of reviewing dependency updates, and >90% of the time it's a single-line difference (like a timestamp, hash, or some other skew between the state of the tree at tag-time and the state at release-time). These are non-ideal from a consistency perspective, but they aren't cause for this degree of alarm -- we do know what the code does, because the discrepancies are often trivial.
Isn't the point that unless actually audited each time, the code could still be effectively anything?
Yes, but that's already the case. My point was that in practice the current discrepancies observed don't represent a complete disconnect between the ground truth (the source repo) and the package index, they tend to be minor. So describing the situation as "nobody knows what 17% of the top crates.io packages do" is an overstatement.
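To make "minor discrepancy" concrete: the check amounts to a recursive diff between the published crate's source tree and the repo at the release tag. A minimal sketch (the two directory paths are whatever you extracted the crate tarball and checked out the tag into):

```python
import filecmp
import os

def diff_trees(repo_dir: str, crate_dir: str) -> list[str]:
    """List paths that differ between a repo checkout and a crate tree.

    Reports files present on only one side as well as files whose
    contents differ, relative to the tree root. dircmp compares stat
    metadata first and falls back to contents, which is fine for a
    first-pass discrepancy check.
    """
    discrepancies: list[str] = []

    def walk(cmp: filecmp.dircmp, prefix: str = "") -> None:
        for name in cmp.left_only + cmp.right_only + cmp.diff_files:
            discrepancies.append(os.path.join(prefix, name))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(repo_dir, crate_dir))
    return sorted(discrepancies)
```

A one-line timestamp skew and a file of generated code both show up in this list; the point above is that in practice the list is usually that short.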
I think it just depends on whether or not you interpret the phrase "no one knows" neutrally or pessimistically.
Saying "no one knows" whether something is there doesn't mean that something is there. But the statement itself is still true.
If that's the case, it would be a lot simpler (and equally accurate) to say that "no one knows" what the source repo is doing, either! The median consumer of packages in any packaging ecosystem is absolutely not reading the entire source code of their dependencies, in either the ground truth or index form.
That's certainly true - and would also be true (maybe even moreso) if vendoring dependencies was widespread. Seems just as easy to hide things in a "vendored" directory that's 20x the size of the library.
Serious consideration: Claude Mythos is going to change the risk envelope of this problem.
We're still thinking in the old mindset, whereas new tools are going to change how all of this is done.
In some years dependencies will undergo various types of automated vetting - bugs (various categories), memory, performance, correctness, etc. We need to think about how to scale this problem instead. We're not ready for it.
I'm not really convinced that having a few more libraries in the standard library, or decentralizing the library repository, is going to change the risks much.
I really like the idea of implementing the std lib separate from the language. I think that would be a huge blessing for Java, Go and others, ideally allowing faster iteration on most things given that we usually don't need a reinvention of the compiler/runtime just to make a better library.
> In a recent analysis, Adam Harvey found that among the 999 most popular crates on crates.io, around 17% contained code that do not match their code repository.
Huh, how is this possible? Is the code not pulled from the repository? Why not?
Publishing doesn't go through GitHub or another forge, it's done from the local machine. Crates can contain generated code as well.
Random question, does cargo have a way to identify if a package uses unsafe Rust code?
No, but you can use cargo-geiger[1] or siderophile[2] for that.
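For a crude first pass without installing anything, you can textually scan for the `unsafe` keyword. Note this is only an approximation -- it also matches occurrences inside comments and strings, and it misses unsafe code expanded from macros, which is exactly what a syntax-aware tool like cargo-geiger handles properly:

```python
import os
import re

# Matches the `unsafe` keyword as a whole word. Crude: counts hits in
# comments/strings too, and cannot see unsafe code hidden behind
# macros -- use cargo-geiger or siderophile for a real answer.
UNSAFE_RE = re.compile(r"\bunsafe\b")

def count_unsafe(crate_dir: str) -> int:
    """Count textual `unsafe` occurrences across all .rs files."""
    total = 0
    for root, _dirs, files in os.walk(crate_dir):
        for name in files:
            if name.endswith(".rs"):
                path = os.path.join(root, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    total += len(UNSAFE_RE.findall(f.read()))
    return total
```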
Rust should add a way to sandbox every dependency.
It's basically what we're already doing in our OSes (mobile at least), but now it should happen on the level of submodules.
How would that work? Rust "crates" are just compilation units that get linked into the resulting binary.
Eh, the only way to secure your Rust programs is the technique not described in the article.
Vendor your dependencies. Download the source and serve it via your own repository (ex. [1]). For dependencies that you feel should be part of the "Standard Library" (i.e. crates developed by the Rust team but not included into std) don't bother to audit them. For the other sources, read the code and decide if it's safe.
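For the vendoring step itself, `cargo vendor` downloads every dependency into a local directory and prints the source-replacement stanza to add to `.cargo/config.toml`, roughly:

```toml
# Output of `cargo vendor`: redirect all crates.io downloads to the
# local ./vendor directory, so builds never touch the network.
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"
```

From there you can commit the vendor directory (or serve it from your own repository) and audit diffs on each update.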
I'm honestly starting to regret not starting a company like 7 years ago where all I do is read OSS code and host libraries I've audited (for a fee to the end-user, of course). This was more relevant for USG-type work, where using code sourced from an American is materially different from code sourced from a non-American.
The only thing this leads to is that you'll have hundreds of vendored dependencies, with a combined size impossible to audit yourself.
But if you somehow do manage that, then you'll soon have hundreds of outdated vendored dependencies, full of unpatched security issues.
> full of unpatched security issues
If you host your own internal crates.io mirror, I see two ways to stay on top of security issues that have been fixed upstream. Both involve the use of cargo audit, which uses the RustSec advisory DB (https://rustsec.org/).

Alternative A) would be to redirect the DNS for crates.io in your company-internal DNS server to point at your own mirror, and to have your company servers and laptops/workstations all use your company-internal DNS server only. And have the servers and laptops/workstations trust a company-controlled CA certificate that issues TLS certificates for “crates.io”. Then cargo and cargo audit would work transparently, assuming they use the host CA trust store when validating the TLS certificates when they connect to crates.io. The RustSec DB you use directly from upstream, without even mirroring it and hosting an internal copy. The drawback is if you accidentally leave some servers or laptops/workstations using external DNS, and connections are made to the real crates.io instead. Because then developers end up pulling in versions of deps that have not been audited by the company itself and added to the internal mirror.
Alternative B) that I see is to set up the crates host under a DNS name you control, e.g. crates dot your company-internal network DNS name. Then set up cargo audit to use an internally hosted copy of the advisory DB that is kept up to date automatically, but with the referenced registry rewritten to be your own cargo crates mirror registry. I think that should work. It is already very easy to set up your own crates mirror registry; cargo has excellent support built right in for using registries other than, or in addition to, crates.io. Then you have a company policy that crates.io is never to be used, and you enforce it with automatic scanning of all company repos that checks that no entries in Cargo.toml and Cargo.lock files use crates.io.
It would probably be a good idea even to have separate internal crate registries for crates that are from crates.io and crates that are internal to the company itself. To avoid any name collisions and the likes.
Regardless if going with A) or B), you’d then be able to run cargo audit and see security advisories for all your dependencies, while the dependencies themselves are downloaded from your internal mirror of crates.io crates, and where you audit every package source code before adding it in your internal mirror registry.
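For Alternative B), the client-side part is just cargo's source replacement; a sketch of the `.cargo/config.toml` entry (the mirror hostname is hypothetical):

```toml
# Route every crates.io dependency through an internal mirror.
# Hostname is illustrative; the sparse+ prefix selects the sparse
# index protocol supported by modern cargo.
[source.crates-io]
replace-with = "internal-mirror"

[source.internal-mirror]
registry = "sparse+https://crates.corp.example/index/"
```

With this in place, Cargo.lock entries still record crates.io as the logical source, so projects don't need per-project changes when the mirror moves.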
A large number of security issues in the supply chain are found in the weeks or months after library version bumps. Simply waiting six months to update dependency versions can skip these. It allows time to pass and for the dependency changes to receive more eyeballs.
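That "wait before you update" policy is easy to mechanize as a quarantine filter over release dates. A sketch (in practice the publish timestamps would come from the crates.io API; the function name and default window here are hypothetical):

```python
from datetime import datetime, timedelta

def safe_to_adopt(releases: dict[str, datetime],
                  now: datetime,
                  quarantine: timedelta = timedelta(days=180)) -> list[str]:
    """Return the versions that have been public long enough.

    `releases` maps version strings to publish timestamps; versions
    younger than the quarantine window are held back so they have time
    to receive more eyeballs.
    """
    return [version for version, published in releases.items()
            if now - published >= quarantine]
```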
Vendoring buys an additional layer of security.
When everyone has Claude Mythos, we can self-audit our supply chain in an automated fashion.
But it's impossible to have a buffet overflow in rust
> But it's impossible to have a buffet overflow in rust
I dunno, I can only listen to Margaritaville so many times in a row.
Coding agents should help us reduce dependencies overall. I agree Go is already best positioned as a language for this. Using random dependencies for some small feature seems archaic now.
Why not pin your packages? And why not have M of N auditors sign off on releases?
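Pinning is already expressible in Cargo: an `=` requirement plus a committed Cargo.lock. For the M-of-N sign-off idea, cargo-vet is probably the closest existing tool -- it records per-crate audits that can be shared and required across an organization. A pinning sketch (version number is illustrative):

```toml
# Cargo.toml -- an `=` requirement pins a dependency to one exact
# version; combined with a committed Cargo.lock, updates only happen
# when someone deliberately bumps the pin.
[dependencies]
serde = "=1.0.188"
```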