Very annoying - the apparent author of the backdoor was in communication with me over several weeks, trying to get xz 5.6.x added to Fedora 40 & 41 because of its "great new features". We even worked with him to fix the valgrind issue (which it turns out now was caused by the backdoor he had added). We had to race last night to fix the problem after an inadvertent break of the embargo.
He has been part of the xz project for 2 years, adding all sorts of binary test files, and to be honest, with this level of sophistication I would be suspicious of even older versions of xz until proven otherwise.
GitHub has suspended @JiaT75's account.
EDIT: Lasse Collin's account @Larhzu has also been suspended.
EDIT: GitHub has disabled all Tukaani repositories, including downloads from the releases page.
--
EDIT: Just did a bit of poking. xz-embedded was touched by Jia as well, and it appears to be used in the Linux kernel. I took a quick look and it doesn't appear Jia touched anything of interest in there. I also checked the previous mirror at the Tukaani project website, and nothing was out of place other than lagging a few commits behind:
https://gist.github.com/Qix-/f1a1b9a933e8847f56103bc14783ab7...
--
Here's a mailing list message from them ca. 2022.
https://listor.tp-sv.se/pipermail/tp-sv_listor.tp-sv.se/2022...
--
MinGW w64 on AUR was last published by Jia on Feb 29: https://aur.archlinux.org/cgit/aur.git/log/?h=mingw-w64-xz (found by searching their public key: 22D465F2B4C173803B20C6DE59FCF207FEA7F445)
--
pacman-static on AUR still lists their public key as a contributor, xz was last updated to 5.4.5 on 17-11-2023: https://aur.archlinux.org/cgit/aur.git/?h=pacman-static
EDIT: I've emailed the maintainer to have the key removed.
--
Alpine was patched as of 6 hours ago.
https://git.alpinelinux.org/aports/commit/?id=982d2c6bcbbb57...
--
OpenSUSE is still listing Jia's public key: https://sources.suse.com/SUSE:SLE-15-SP6:GA/xz/576e550c49a36... (cross-ref with https://web.archive.org/web/20240329235153/https://tukaani.o...)
EDIT: Spoke with some folks in the package channel on libera, seems to be a non-issue. It is not used as attestation nor an ACL.
--
Arch appears to still list Jia as an approved publisher, if I'm understanding this page correctly.
https://gitlab.archlinux.org/archlinux/packaging/packages/xz...
EDIT: Just sent an email to the last committer to bring it to their attention.
EDIT: It's been removed.
--
jiatan's Libera info indicates they registered on Dec 12 13:43:12 2022 with no timezone information.
-NickServ- Information on jiatan (account jiatan):
-NickServ- Registered : Dec 12 13:43:12 2022 +0000 (1y 15w 3d ago)
-NickServ- Last seen : (less than two weeks ago)
-NickServ- User seen : (less than two weeks ago)
-NickServ- Flags : HideMail, Private
-NickServ- jiatan has enabled nick protection
-NickServ- *** End of Info ***
/whowas expired not too long ago, unfortunately. If anyone has it I'd love to know. They are not registered on freenode.
EDIT: Libera has stated they have not received any requests for information from any agencies as of yet (Saturday, 30th March 2024, 00:39:31 UTC).
EDIT: Jia Tan was using a VPN to connect; that's all I'll be sharing here.
Just for posterity since I can no longer edit: Libera staff has been firm and unrelenting in their position not to disclose anything whatsoever about the account. I obtained the last point on my own. Libera has made it clear they will not budge on this topic, which I applaud and respect. They were not involved whatsoever in ascertaining a VPN was used, and since that fact makes anything else about the connection information moot, there's nothing else to say about it.
[flagged]
I am not LE nor a government official. I did not present a warrant of any kind. I asked in a channel about it. Libera refused to provide information. Libera respecting the privacy of users is of course something I applaud and respect. Why wouldn't I?
Respect not giving out identifying information on individuals whenever someone asks, no matter what company they work for and what job they do? Yes. I respect this.
It's called keeping your integrity by not disclosing private info about any users on your network, regardless of whether they are bad actors.
I respect them for that.
Violating that code would be just as bad as the bad actor slipping in backdoors.
I hope you aren’t in control of any customer data.
> EDIT: GitHub has disabled all Tukaani repositories, including downloads from the releases page.
Why? Isn't it better to freeze them and let as many people as possible analyze the code?
Good question, though I can imagine they took this action for two reasons:
1. They don't have the ability to freeze repos (i.e. would require some engineering effort to implement it), as I've never seen them do that before.
2. Many distros (and I assume many enterprises) were still linking to the GitHub releases to source the infected tarballs for building. Disabling the repo prevents that.
The infected tarballs and repo are still available elsewhere for researchers to find, too.
They could always archive it. Theoretically (and I mean theoretically only), there's another reason for Microsoft to prevent access to the repo: if a nation state was involved, there may have been backchannel conversations to obfuscate the trail.
Archiving the repo doesn't stop the downloads. They would need to rename it in order to prevent distro CI/CD from continuing to download untrustworthy stuff.
Distros downloading directly from GitHub deserve what they get.
Maybe one can get the code from here. New commits are being added, it seems.
The latest commit is interesting (f9cf4c05edd14, "Fix sabotaged Landlock sandbox check").
It looks like one of Jia Tan's commits (328c52da8a2) added a stray "." character to a piece of C code that was part of a check for sandboxing support, which I guess would cause the code to fail to compile, causing the check to fail, causing the sandboxing to be disabled.
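For anyone wondering how one stray character can disable a security feature: build systems usually test for Landlock by compiling a tiny probe program and treating any compile failure as "feature unsupported". A minimal sketch of such a probe (hypothetical, not the actual xz check):

    /* The build system compiles this probe and enables the Landlock
     * sandbox only if compilation succeeds. A stray "." injected
     * anywhere in the probe is a syntax error, which silently reads as
     * "no Landlock" instead of being reported as a failure. */
    #define _GNU_SOURCE
    #include <linux/landlock.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        /* probe that the syscall number and the flag constant exist */
        return (int)syscall(SYS_landlock_create_ruleset, (void *)0, 0,
                            LANDLOCK_CREATE_RULESET_VERSION);
    }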
Lasse has also started his own documentation on the incident.
Shouldn't they have tests running to ensure that the check works on at least some systems?
What do you mean "tests"?
Have a system where you expect the sandboxing to work and have an automated check that it compiles there?
Part of the backdoor was in the tests. The attacker in this case could easily have sabotaged the test as well if a test was required.
If your project becomes complex enough, eventually you need tests for the configure step. Even without malicious actors, it's easy to miss that a compiler or system change broke some check.
You can still find the source everywhere, if you look for it. Having a fine-looking page distribute vulnerable source code is a much bigger threat.
You can find it on archive. Someone archived it last night
[flagged]
Don't agree here. I've only ever seen GitHub do this in extreme circumstances where they were absolutely warranted.
[flagged]
The Alpine patch includes gettext-dev, which is likely also exploited, as the same authors have been pushing gettext to projects where their changes have been questioned.
What do you mean?
Look at the newest commits; do you see anything suspicious?
https://git.alpinelinux.org/aports/log/main/gettext
libunistring could also be affected as that has also been pushed there
Seeing so many commits that are "skip failing test" is a very strong code smell.
Yes, but it is often a sad reality of trying to run projects mainly written for glibc on musl. Not many people write portable C these days.
It's still the wrong way to go about things. Tests are there for a reason: if they fail, you should try to understand them to the point where you can fix the problem (broken test or actual bug) instead of wantonly disabling tests until you get a green light.
> do you see anything suspicious
No.
> libunistring could also be affected as that has also been pushed there
What do you mean by "that"?
FWIW, that's mingw-w64-xz (cross-compiled xz utils) in AUR, not mingw-w64 (which would normally refer to the compiler toolchain itself).
Good catch, thanks :)
It appears to be an RCE, not a public key bypass: https://news.ycombinator.com/item?id=39877312
I've posted an earlier WHOWAS of jiatan here: https://news.ycombinator.com/item?id=39868773
Asking this here too: why isn't there an automated diff of the tarball contents against the repo, with an auto-flagged warning on mismatch? Am I missing something here?
The tarballs mismatching from the git tree is a feature, not a bug. Projects that use submodules may want to include these and projects using autoconf may want to generate and include the configure script.
> The tarballs mismatching from the git tree is a feature, not a bug.
A feature which allowed the exploit to take place, let's put it that way.
Over here: https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78b...
> The release tarballs upstream publishes don't have the same code that GitHub has. This is common in C projects so that downstream consumers don't need to remember how to run autotools and autoconf. The version of build-to-host.m4 in the release tarballs differs wildly from the upstream on GitHub.
There are multiple suggestions in that thread that this is a legacy practice that might be outdated, especially in the current climate of cyber threats.
Someone even posted a more thorough gist on what could be done to increase transparency and reduce discrepancies between tarballs and repos: https://gist.github.com/smintrh78/97b5cb4d8332ea4808f25b47c8...
Here is a longer explainer: https://www.redhat.com/en/blog/what-open-source-upstream
"lol"
> Those days are pretty much behind us. Sure, you can compile code and tweak software configurations if you want to--but most of the time, users don't want to. Organizations generally don't want to, they want to rely on certified products that they can vet for their environment and get support for. This is why enterprise open source exists. Users and organizations count on vendors to turn upstreams into coherent downstream products that meet their needs.
> In turn, vendors like Red Hat learn from customer requests and feedback about what features they need and want. That, then, benefits the upstream project in the form of new features and bugfixes, etc., and ultimately finds its way into products and the cycle continues.
"and when the upstream is tainted, everyone drinks poisoned water downstream, simple as that!"
account is back online https://github.com/JiaT75
Hopefully still locked, just visible, so people can find and analyze his contributions.
I think this has been in the making for almost a year. The whole ifunc infrastructure was added in June 2023 by Hans Jansen and Jia Tan. The initial patch is "authored by" Lasse Collin in the git metadata, but the code actually came from Hans Jansen: https://github.com/tukaani-project/xz/commit/ee44863ae88e377...
> Thanks to Hans Jansen for the original patch.
https://github.com/tukaani-project/xz/pull/53
There were a ton of patches by these two subsequently because the ifunc code was breaking with all sorts of build options and obviously caused many problems with various sanitizers. Subsequently the configure script was modified multiple times to detect the use of sanitizers and abort the build unless either the sanitizer was disabled or the use of ifuncs was disabled. That would've masked the payload in many testing and debugging environments.
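For illustration, a configure-time probe along these lines is all it takes (a hypothetical sketch, not the actual xz check): if this fails to compile, configure knows a sanitizer is active and can abort the build or quietly disable the ifunc path.

    /* configure compiles this; a compile failure means a sanitizer is on */
    #if defined(__SANITIZE_ADDRESS__)      /* GCC defines this under ASan */
    # error "address sanitizer detected"
    #endif
    #if defined(__has_feature)             /* clang's feature-test macro */
    # if __has_feature(address_sanitizer)
    #  error "address sanitizer detected"
    # endif
    #endif
    int main(void) { return 0; }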
The hansjans162 GitHub account was created in 2023 and the only thing it did was add this code to liblzma. The same name later turned up to do an NMU at Debian for the vulnerable version. Another "<name><number>" account (which only appears here, once) then pops up and asks for the vulnerable version to be imported: https://www.mail-archive.com/search?l=debian-bugs-dist@lists...
One week ago a "Hans Jansen" user "hjansen" was created in Debian and opened 8 PRs, including the upgrade of xz-utils to 5.6.1.
From https://salsa.debian.org/users/hjansen/activity
Author: Hans Jansen <hansjansen162@outlook.com>
- [Debian Games / empire](https://salsa.debian.org/games-team/empire): opened merge request "!2 New upstream version 1.17" - March 17, 2024
- [Debian Games / empire](https://salsa.debian.org/games-team/empire): opened merge request "!1 Update to upstream 1.17" - March 17, 2024
- [Debian Games / libretro / libretro-core-info](https://salsa.debian.org/games-team/libretro/libretro-core-i...): opened merge request "!2 New upstream version 1.17.0" - March 17, 2024
- [Debian Games / libretro / libretro-core-info](https://salsa.debian.org/games-team/libretro/libretro-core-i...): opened merge request "!1 Update to upstream 1.17.0" - March 17, 2024
- [Debian Games / endless-sky](https://salsa.debian.org/games-team/endless-sky): opened merge request "!6 Update upstream branch to 0.10.6" - March 17, 2024
- [Debian Games / endless-sky](https://salsa.debian.org/games-team/endless-sky): opened merge request "!5 Update to upstream 0.10.6" - March 17, 2024
- [Debian / Xz Utils](https://salsa.debian.org/debian/xz-utils): opened merge request "!1 Update to upstream 5.6.1" - March 17, 2024
That looks exactly like what you'd want to see to disguise the actual request you want, a number of pointless upstream updates in things that are mostly ignored, and then the one you want.
[flagged]
glad I didn't merge it ...
Make it two years.
Jia Tan getting maintainer access looks almost certain to have been part of the operation. Lasse Collin mentioned multiple times how Jia has helped off-list, and to me it seems like Jia befriended Lasse as well (see how Lasse talks about them in 2023).
Also the pattern of astroturfing dates back to 2022. See for example this thread where Jia, who has helped at this point for a few weeks, posts a patch, and a <name><number>@protonmail (jigarkumar17) user pops up and then bumps the thread three times(!) lamenting the slowness of the project and pushing for Jia to get commit access: https://www.mail-archive.com/xz-devel@tukaani.org/msg00553.h...
Naturally, like in the other instances of this happening, this user only appears once on the internet.
Also I saw this hans jansen user pushing for merging the 5.6.1 update in debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1067708
From: krygorin4545 <krygorin4545@proton.me>
To: "1067708@bugs.debian.org" <1067708@bugs.debian.org>
Cc: "sebastian@breakpoint.cc" <sebastian@breakpoint.cc>, "bage@debian.org" <bage@debian.org>
Subject: Re: RFS: xz-utils/5.6.1-0.1 [NMU] -- XZ-format compression utilities
Date: Tue, 26 Mar 2024 19:27:47 +0000
Also seeing this bug. Extra valgrind output causes some failed tests for me. Looks like the new version will resolve it. Would like this new version so I can continue work.
--
Wow.
(Edited for clarity.)
Wow, what a big pile of infrastructure for a non-optimization.
An internal call via ifunc is not magic: it's just a call via the GOT or PLT, which boils down to function pointers. An internal call through a hidden-visibility function pointer (the right way to do this) is also a function pointer.
The even better solution is a plain old if statement, which implements the very, very fancy "devirtualization" optimization; the result will be effectively predicted on most CPUs and is not subject to the whole pile of issues that retpolines are needed to work around.
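A minimal sketch of the contrast (illustrative names, not xz's actual code; assumes GCC/clang on x86 ELF, where the ifunc attribute and the cpu builtins exist):

    #include <stddef.h>
    #include <stdint.h>

    static int have_clmul(void)
    {
        __builtin_cpu_init();                     /* safe in resolvers too */
        return __builtin_cpu_supports("pclmul");  /* GCC/clang, x86 only */
    }

    static uint64_t crc64_generic(const uint8_t *buf, size_t n)
    {
        uint64_t h = 0;
        for (size_t i = 0; i < n; i++)
            h = (h << 5) + h + buf[i];   /* stand-in for the table code */
        return h;
    }

    static uint64_t crc64_clmul(const uint8_t *buf, size_t n)
    {
        return crc64_generic(buf, n);    /* stand-in for the SIMD version */
    }

    /* IFUNC: the dynamic loader runs the resolver once and writes the
       winner into the GOT, so every later call is an indirect call. */
    static void *crc64_resolve(void)
    {
        return have_clmul() ? (void *)crc64_clmul : (void *)crc64_generic;
    }

    uint64_t crc64(const uint8_t *buf, size_t n)
        __attribute__((ifunc("crc64_resolve")));

    /* The plain old if: a direct call behind a well-predicted branch,
       with no resolved function pointer left around for anyone to patch. */
    uint64_t crc64_branch(const uint8_t *buf, size_t n)
    {
        if (have_clmul())
            return crc64_clmul(buf, n);
        return crc64_generic(buf, n);
    }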
Right, IFUNCs make sense for library functions where you have the function-pointer indirection anyway. They make much less sense for internal functions: the only argument for them over a regular function pointer is that the pointer is marked RO after it is resolved (if the library was linked with -z relro -z now), but an if avoids even that issue.
> because the ifunc code was breaking with all sorts of build options and obviously caused many problems with various sanitizers
for example, https://github.com/google/oss-fuzz/pull/10667
>Hans Jansen and Jia Tan
Are they really two people conspiring?
Unless proven otherwise, it is safe to assume one is just a pseudonym of the other.
or possibly just one person acting as two, or a group of people?
Or a group managing many identities, backdooring many different projects
Also I see this PR: https://github.com/tukaani-project/xz/pull/64
Does anybody know anything about Jia Tan? Is it likely just a made up persona? Or is this a well-known person.
It's certainly a pseudonym, just like all the other personas we've seen popping up on the mailing list supporting this "Jia Tan" over these couple of years. For all intents and purposes they could be of any nationality until we know more.
It seems like Hans Jansen also has an account on proton.me (hansjansen162@proton.me) with the Outlook address configured as the recovery email.
Yesterday sure was fun wasn't it :p Thanks for all your help/working with me on getting this cleaned up in Fedora.
PSA: I just noticed homebrew installed the compromised version on my Mac as a dependency of some other package. You may want to check this to see what version you get:
xz --version
Homebrew has already taken action; a `brew upgrade` will downgrade back to the last known good version. I also had a Homebrew-installed affected version.
I understand it's unlikely, but is there anything I can do to check if the backdoor was used? Also any other steps I should take after "brew upgrade"?
Quoting[1] from Homebrew on Github:
>> Looks like that Homebrew users (both macOS and Linux, both Intel and ARM) are unlikely affected?
> Correct. Though we do not appear to be affected, this revert was done out of an abundance of caution.
Thanks for this. I just ran brew upgrade and the result was as you described:
xz 5.6.1 -> 5.4.6
sorry, what exact version(s) is the one(s) affected again?
(or SHAs, etc.)
(EDIT: 5.6.0 and 5.6.1 ?)
(EDIT 2: Ooof, looks like the nix unstable channel uses xz 5.6.1 at this time)
I use Nix to manage this stuff on Mac, not Homebrew...
GitHub disabled the xz repo, making it a bit more difficult for nix to revert to an older version. They've made a fix, but it will take several more days for the build systems to finish rebuilding the ~220,000 packages that depend on the bootstrap utils.
Here is the discussion https://github.com/NixOS/nixpkgs/issues/300055
Lol they shouldn't be relying on GitHub in the first place.
What should they be relying on instead? Maybe rsync everything to an FTP server? Or Torrents? From your other comments, you seem to think no one should ever use GitHub for anything.
Is it actually compromised on Homebrew though? I guess we can't be sure, but it seemed to be checking whether it was being packaged as a .deb or .rpm?
Is 5.2.2 safe? Just 5.6.0 and 5.6.1 are bad?
Is it normal that when I try to uninstall xz it is trying to install lzma?
It means that `xz` was depended upon by something that depends on eg "xz OR lzma"
because of its "great new features"
"great" for whom? I've seen enough of the industry to immediately feel suspicious when someone uses that sort of phrasing in an attempt to persuade me. It's no different from claiming a "better experience" or similar.
I made a library where version 2 is really really much faster than version 1. I'd want everyone to just move to version 2.
But then you'd be pointing to a specific great new feature, performance, and not just the claim and concept of performance, but numbers.
I'm sure they actually had new features…
What are they specifically?
I don't know how you can be missing the essence of the problem here, or that comment's point.
Vague claims are meaningless and valueless and are now even worse than that, they are a red flag.
Please don't tell me that you would accept a PR that didn't explain what it did, why it did it, and how it did it, with code that actually matched up with the claim, and that was all actually something you wanted or agreed was a good change to your project.
Updating to the next version of a library is completely unrelated. When you update a library, you don't know what all the changes to the library were, _but the library's maintainers do_, and you essentially trust those maintainers to be doing their job and not accepting random patches that might do anything.
Updating a dependency and trusting a project to be sane is entirely a different prospect from accepting a pr and just trusting that the submitter only did things that are both well intentioned and well executed.
If you don't get this then I for sure will not be using or trusting your library.
Yeah... a RISC-V routine was put in, then some binary test files were added later that are probably now suspect.
don't miss out on the quality code, like the line that has: i += 4 - 2;
https://git.tukaani.org/?p=xz.git;a=commitdiff;h=50255feeaab...
FWIW, "4 - 2" is explained earlier in the file:
// The "-2" is included because the for-loop will
// always increment by 2. In this case, we want to
// skip an extra 2 bytes since we used 4 bytes
// of input.
i += 4 - 2;
> some binary test files were added later that are probably now suspect
That's confirmed
From https://www.openwall.com/lists/oss-security/2024/03/29/4:
> The files containing the bulk of the exploit are in an obfuscated form in
> tests/files/bad-3-corrupt_lzma2.xz
> tests/files/good-large_compressed.lzma
> committed upstream. They were initially added in
> https://github.com/tukaani-project/xz/commit/cf44e4b7f5dfdbf...
It probably makes sense to start isolating build processes from test case resources.
Sure but then you can smuggle it into basically any other part of the build process…?
You can find more examples of that kind of puffery if you go to a website's cookie consent pop-up and find the clause after "we use cookies to...".
I’ve long thought that those “this new version fixes bugs and improves user experience” patch notes that Meta et al copy and paste on every release shouldn’t be permitted.
Tell me about it. I look at all these random updates that get pushed to my mobile phone and they all pretty much have that kind of fluff in the description. Apple/Android should take some steps to improve this or outright ban this practice. In terms of importance to them though I imagine this is pretty low on the list.
I have dreamed about an automated LLM system that can "diff" the changes out of the binary and provide some insight. You know give back a tiny bit of power to the user. I'll keep dreaming.
It's worse: as someone who does try to provide release notes, I'm often cut off by the max length of the field. And even then, Play only shows you the notes for the latest version of the app.
Slack's Mac app release notes [1] rotate a few copy-pastes; here's the one that shits me the most.
> We tuned up the engine and gave the interiors a thorough clean. Everything is now running smoothly again.
Yeah nah mate, if every release is the first release where everything is running smoothly, I'm not going to believe it this time either.
Makes me wonder if the team has some release quota to fill and will push a build even if nothing meaningful has actually changed.
Ugh. That's especially annoying because they're trying to be hip with slang and use a metaphor that requires cultural knowledge that you can't really assume everyone has.
Interestingly, one of the commits updating the test file claimed it was regenerated from a fixed random seed for better reproducibility (although how goes unmentioned). For the future, random test data had better be generated as part of the build, rather than being committed as opaque blobs...
I agree on principle, but sometimes programmatically generating test data is not so easy.
E.g.: I have a specific JPEG committed into a repository because it triggers a specific issue when reading its metadata. It's not just _random_ data, but specific bogus data.
But yeah, if the test blob is purely random, then you can just commit a seed and generate it during tests.
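A tiny sketch of that approach (assuming a simple xorshift generator; any deterministic PRNG works), so the repository carries a seed instead of an opaque binary:

    #include <stdint.h>
    #include <stdio.h>

    /* xorshift64: the output stream is fully determined by the seed */
    static uint64_t xorshift64(uint64_t *s)
    {
        *s ^= *s << 13;
        *s ^= *s >> 7;
        *s ^= *s << 17;
        return *s;
    }

    int main(void)
    {
        uint64_t seed = 0x5EEDED5EEDED5EEDULL; /* the only thing committed */
        for (int i = 0; i < 65536; i++) {
            uint64_t v = xorshift64(&seed);
            fwrite(&v, sizeof v, 1, stdout);   /* reproducible test blob */
        }
        return 0;
    }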
Debian have reverted xz-utils (in unstable) to 5.4.5 – actual version string is “5.6.1+really5.4.5-1”. So presumably that version's safe; we shall see…
Is that version truly vetted? "Jia Tan" has been the official maintainer since 5.4.3, could have pushed code under any other pseudonym, and controls the signing keys. I would have felt better about reverting farther back, xz hasn't had any breaking changes for a long time.
It looks like this is being discussed, with a complication of additional symbols that were introduced https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1068024
Thanks for this! I found this URL in the thread very interesting!
It is an excellent technical write-up and yet again another testimonial to the importance of keeping things simple.
The other comments here showing that the backdoor was a long-term effort now make me wonder just how long of an effort it was...
It's not only that account; another maintainer has been pushing the same promotion all over the place.
TIL that +really is a canonical string. [0]
[0]: https://www.debian.org/doc/debian-policy/ch-controlfields.ht...
There are suggestions to roll back further
After reading the original post by Andres Freund, https://www.openwall.com/lists/oss-security/2024/03/29/4, his analysis indicates that the RSA_public_decrypt function is being redirected to the malware code. Since RSA_public_decrypt is only used in the context of RSA public key - private key authentication, can we reasonably conclude that the backdoor does not affect username-password authentication?
Isn't it rather that the attacker can log in to the compromised server by exploiting the RSA code path?
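Yes: the hook sits on the RSA signature-verification path, so public-key login attempts pass through it, while password authentication never calls into RSA at all. An illustrative sketch of interposing that one symbol in the LD_PRELOAD style (not the actual mechanism; the backdoor reportedly patched the symbol from its ifunc resolver at load time):

    /* build as a shared object and LD_PRELOAD it; illustrative only */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <openssl/rsa.h>

    int RSA_public_decrypt(int flen, const unsigned char *from,
                           unsigned char *to, RSA *rsa, int padding)
    {
        static int (*real)(int, const unsigned char *, unsigned char *,
                           RSA *, int);
        if (!real)
            real = (int (*)(int, const unsigned char *, unsigned char *,
                            RSA *, int))dlsym(RTLD_NEXT, "RSA_public_decrypt");

        /* anything placed here observes every RSA signature check, i.e.
           public-key auth attempts; password auth never reaches it */
        return real(flen, from, to, rsa, padding);
    }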
I’m surprised there isn’t way more of this stuff. The supply chain is so huge and therefore represents so much surface area.
There probably is. Way more than anyone knows. I bet every major project on github is riddled with state actors.
Imagine if sshd was distributed by PyPI or cargo or npm instead of by a distro.
[dead]
Github accounts of both xz maintainers have been suspended.
Not true, the original author wasn't suspended: https://github.com/Larhzu
https://github.com/JiaT75 was suspended for a moment, but isn't anymore?
GitHub's UI has been getting notoriously bad at showing consistent and timely information lately; this could be an issue stemming from that.
Yeah. Had a weird problem last week where GitHub was serving old source code from the raw url when using curl, but showing the latest source when coming from a browser.
Super frustrating when trying to develop automation. :(
Both are suspended for me. Check followers on both accounts, both have a suspended pill right next to their names.
Ah, thanks for correcting me there - really weird that this isn't visible from the profile itself. Not even from the organization.
The "following" page for each of them indeed shows both accounts as suspended.
Lasse's account was restored
github should add a badge for "inject backdoor into core open source infrastructure"
Hey maybe it would get bad actors to come clean trying to get that badge.
These shouldn't be suspended, and neither should their repositories. People might want to dig through the source code. It's okay if they add a warning on the repository, but suspending _everything_ is a stupid thing to do.
Tools don't read warnings. Of course the information should not be hidden completely but intentionally breaking the download URLs makes sense.
This can also be handled relatively easily. They can disable the old links and add a new one specifically for the disabled repository. Or at least let the repository be browsable through the interface.
Simply showing one giant page saying "This repository is disabled" is not helpful in any way.
Do you know if it was actually the commit author, or if their commit access was compromised?
If it was a compromise, it also included the signing keys, as the release tarball was modified vs. the source available on GitHub.
Nice. I worked on a Linux distro when I was a wee lad and all we did was compute a new md5 and ship it.
Name and shame this author. They should never be allowed anywhere near any open projects ever again.
Please don't?
1. You don't actually know what has been done by whom or why. You don't know if the author intended all of this, or if their account was compromised. You don't know if someone is pretending to be someone else. You don't know if this person was being blackmailed, forced against their will, etc. You don't really know much of anything, except a backdoor was introduced by somebody.
2. Assuming the author did do something maliciously, relying on personal reputation is bad security practice. The majority of successful security attacks come from insiders. You have to trust insiders, because someone has to get work done, and you don't know who's an insider attacker until they are found out. It's therefore a best security practice to limit access, provide audit logs, sign artifacts, etc, so you can trace back where an incursion happened, identify poisoned artifacts, remove them, etc. Just saying "let's ostracize Phil and hope this never happens again" doesn't work.
3. A lot of today's famous and important security researchers were, at one time or another, absolute dirtbags who did bad things. Human beings are fallible. But human beings can also grow and change. Nobody wants to listen to reason or compassion when their blood is up, so nobody wants to hear this right now. But that's why it needs to be said now. If someone is found guilty beyond a reasonable doubt (that's really the important part...), then name and shame, sure, shame can work wonders. But at some point people need to be given another chance.
100% fair -- we don't know if their account was compromised or if they meant to do this intentionally.
If it were me I'd be doing damage control to clear my name if my account was hacked and abused in this manner.
Otherwise, if I was doing this knowing full well what would happen, then full, complete defederation of me and my ability to contribute to anything ever again should commence -- the open source world is too open to such attacks, where things are developed by people who assume good-faith actors.
upon further reflection all 3 of your points are cogent and fair and valid. my original point was a knee-jerk reaction to this. :/
Your being able to reflect upon it and analyze your own reaction is rare, valuable and appreciated
I think I went through all the stages of grief. Now, at the stage of acceptance, here's what I hope: I hope justice is done. Whoever is doing this, be they a misguided current black hat (hopefully a future white hat) hacker, or someone or someones who just want to see the world burn, or something in between, I hope we see justice. And then forgiveness and acceptance and all that can happen later.
Mitnick reformed after he was convicted (whether you think that was warranted or not). Here, whether these folks are Mitnicks or bad actors etc., let's get all the facts on the table and figure this out.
What’s clear is that we all need to be ever vigilant: that seemingly innocent patch could be part of a more nefarious thing.
We've seen it before, with that university sending patches to the kernel to "test" how good the core team was at security, and how well that went over.
Anyways. Yeah. Glad you all allowed me to grow. And I learned that I have an emotional connection to open source for better or worse: so much of my life professional and otherwise is enabled by it and so threats to it I guess I take personally.
It is reasonable to consider all commits introduced by the backdoor author untrustworthy. This doesn't mean all of it is backdoored, but if they were capable of introducing this backdoor, their code needs scrutiny. I don't care why they did it, whether it's a state-sponsored attack, a long game that was supposed to end with selling a backdoor for all Linux machines out there for bazillions of dollars, or blackmail — this is a serious incident that should eliminate them from open-source contributions and the xz project.
There is no requirement to use your real name when contributing to open source projects. The name of the backdoor author ("Jia Tan") might be fake. If it isn't, and if somehow they are found to be innocent (which I doubt, looking at the evidence throughout the thread), they can create a new account with a new fake identity.
They might have burnt the reputation built for this particular pseudonym but what is stopping them from doing it again? They were clearly in it for the long run.
You're assuming that it's even a single person, it's just a gmail address and an avatar with a j icon from a clip art thing.
I literally said "they". I know, I know, in English that can also be interpreted as a gender-unspecific singular.
Anyways, yes, it is an interesting question whether he/she is alone or they are a group. Conway's law probably applies here as well. And my hunch in general is that these criminal masterminds operate individually, alone. Maybe they are hired by an agency, but I don't count that as a group effort.
Can legal action be taken against the author if it's found he maliciously added the backdoor?
Good luck with that. We don't even know what country he's from. Probably China, but even if so, good luck finding him among 1.5 billion people.
It is not good to accept something containing unreadable data in place of the program's open text. It should be excluded.
I wonder who the target was!
Every Linux box inside AWS, Azure, and GCP and other cloud providers that retains the default admin sudo-able user (e.g., “ec2”) and is running ssh on port 22.
I bet they intended for their back door to eventually be merged into the base Amazon Linux image.
You don't need a "ec2" user. A backdoor can just allow root login even when that is disabled for people not using the backdoor.
It just requires the SSH port to be reachable, unless there is also a callout function (which is risky, as people might see the traffic). And with Debian and Fedora covered, and the change eventually making its way into Ubuntu and RHEL, pretty much everything would have this backdoor.
my understanding is that any Debian/RPM-based Linux running sshd would become vulnerable in a year or two. The best equivalent of this exploit is the One Ring.
So the really strange thing is why they put so little effort into making this undetectable. All they needed was to make it use less time to check each login attempt.
On the other hand, it was very hard to detect. The slow login time was the only thing that gave it away. It seems more like they were very close to being highly successful. In retrospect, improving the performance would have been the smart play. But that is one part that went wrong compared to the very many that went right.
Distro build hosts and distro package maintainers might not be a bad guess. Depends on whether getting this shipped was the final goal. It might have been just the beginning, part of some bootstrapping.
Probably less of an individual and more of an exploit to sell.
his account is active again on github https://github.com/JiaT75
Sleeper.
[dead]
[flagged]
Not sure why people are downvoting you... it's pretty unlikely that various Chinese IoT companies would just decide it's cool to add a backdoor, which clearly implies that no matter how good their intentions are, they simply might have no other choice.
There are roughly speaking two possibilities here:
1. His machine was compromised; he wasn't at fault past having less-than-ideal security (a sin we are all guilty of). His country of origin/residence is of no importance, and doxing him isn't fair to him.
2. This account was malicious. There's no reason we should believe that the identity behind wasn't fabricated. The country of origin/residence is likely falsified.
In neither case is trying to investigate who he is on a public forum likely to be productive. In both cases there's risk of aiming an internet mob at some innocent person who was 'set up'.
The back door is in the upstream GitHub tarball. The most obvious way to get stuff there is by compromising an old-style GitHub token. The new-style GitHub tokens are much better, but it's somewhat opaque which options you need. Most people also don't use expiring tokens. The author seems to have a lot of OSS contributions, so probably an easy target to choose.
Why do you exclude the possibility that this person was forced to add this at gunpoint?
Yes exactly this. How do people think state actors have all those 0 day exploits. Excellent research? No! They are adding them themselves!
I think the letters+numbers naming scheme for both the main account and the sockpuppets used to get him access to xz and the versions into distros is a strong hint at (2). Taking over xz maintainership without any history of open source contributions is also suspicious.
Because it’s naive to think that the owner of the account used his real identity.
But my point is that people living in China might be "forced" to do such things, so we unfortunately can't ignore the country. Of course, practically this is problematic, since the country can be faked.
[dead]
[flagged]
Don't blame the guy. Could have happened to anyone. Even you.
[flagged]
It's uncharitable and comes across as a personal attack, which is not allowed in HN comments.
[flagged]
the account was either sold or stolen
That's pure speculation and there are plenty of hints to the contrary.
Fascinating. Just yesterday the author added a `SECURITY.md` file to the `xz-java` project.
> If you discover a security vulnerability in this project please report it privately. *Do not disclose it as a public issue.* This gives us time to work with you to fix the issue before public exposure, reducing the chance that the exploit will be used before a patch is released.
Reading that in a different light, it says: give me time to adjust my exploits and capitalize on any targets. Makes me wonder what other vulns might exist in the author's other projects.
Security Researchers: Is this request-for-private-disclosure + "90-days before public" reasonable?
It's a SEVERE issue, to my mind, and 90 days seems too long to me.
In this particular case, there is a strong reason to expect exploitation in the wild to already be occurring (because it's an intentional backdoor) and this would change the risk calculus around disclosure timelines.
But in the general case, it's normal for 90 days to be given for the coordinated patching of even very severe vulnerabilities -- you are giving time not just to the project maintainers, but to the users of the software to finish updating their systems to a new fixed release, before enough detail to easily weaponize the vulnerability is shared. Google Project Zero is an example of a team with many critical impact findings using a 90-day timeline.
As someone in security who doesn't work at a major place that get invited to the nice pre-notification notifications, I hate this practice.
My customers and business are not any less important or valuable than anyone else's, and I should not be left potentially exploited, and my customers harmed, for 90 more days while the big guys get to patch their systems (thinking of e.g. Log4j, where Amazon, Meta, Google, and others were privately told how to fix their systems before others were, even though the fix was simple).
Likewise, as a customer I should get to know as soon as someone's software is found vulnerable, so I can then make the choice whether to continue to subject myself to the risk of continuing to use it until it gets patched.
> My ... business are not any less ... valuable than anyone else's,
Plainly untrue. The reason they keep distribution minimal is to maximise the chance of keeping the vuln secret. Your business is plainly less valuable than google, than walmart, than godaddy, than BoA. Maybe you're some big cheese with a big reputation to keep, but seeing as you're feeling excluded, I guess these orgs have no more reason to trust you than they have to trust me, or hundreds of thousands of others who want to know. If they let you in, they'd let all the others in, and odds are greatly increased that now your customers are at risk from something one of these others has worked out, and either blabbed about or has themselves a reason to exploit it.
Similarly plainly, by disclosing to 100 major companies, they protect a vast breadth of consumers/customer-businesses of these major companies at a risk of 10,000,000/100 (or even less, given they may have more valuable reputation to keep). Changing that risk to 12,000,000/10,000 is, well, a risk they don't feel is worth taking.
> Your business is plainly less valuable than google, than walmart, than godaddy, than BoA.
The company I work for has a market cap roughly 5x that of goDaddy and we're responsible for network connected security systems that potentially control whether a person can physically access your home, school, or business. We were never notified of this until this HN thread.
If your BofA account gets hacked you lose money. If your GoDaddy account gets hacked you lose your domain. If Walmart gets hacked they lose... What money and have logistics issues for a while?
Thankfully my company's products have additional safeguards and this isn't a breach for us. But what if it was? Our customers can literally lose their lives if someone cracks the security and finds a way to remotely open all the locks in their home or business.
Don't tell me that some search engine profits or someone's emails history is "more valuable" than 2000 schoolchildren's lives.
How about you give copies of the keys to your apartment and a card containing your address to 50 random people on the streets and see if you still feel that having your Gmail account hacked is more valuable.
I think from an exposure point of view, I'm less likely to worry about the software side of my physical security being exploited than the actual hardware side.
None of the points you make are relevant, since I have yet to see any software-based entry product whose software security can be considered more than lackluster at best; maybe your company is better, and since you didn't mention a name I can't say otherwise.
What I'm saying is your customers are more likely to have their doors physically broken than remotely opened by software, and you are here going on about life and death because of a vuln in xz?
If your company's market cap is as high as you say and they are as security-aware as you say, why aren't they employing security researchers and actively at the forefront of finding vulns and reporting them? That would get them an invite to the party.
Sorry, but that's not a serious risk analysis. The average person would be hurt a lot more by a godaddy breach by a state actor than by a breach of your service by a state actor.
Man if it was ever appropriate to tell someone to touch grass this would be it.
The think of the children part is a nice touch as well. 10/10 copypasta would repost.
Being in a similar boat, I heartily agree.
But I don't want anyone else to get notified immediately, because the odds that somebody will start exploiting people before a patch is available are pretty high. Since I can't have both, I will choose the 90 days for the project to get patches done and all the packagers to include them and make them available, so that by the time it's public knowledge I'm already patched.
I think this is a Tragedy of the Commons type of problem.
Caveat: This assume the vuln is found by a white hat. If it's being exploited already or is known to others, then I fully agree the disclosure time should be eliminated and it's BS for the big companies to get more time than us.
OpenSSL's "notification of an upcoming critical release" is public, not private.
You do get to know that the vulnerability exists quickly, and you could choose to stop using OpenSSL altogether (among other mitigations) once that email goes out.
Yeah I worked in FAANG when we got the advance notice of a number of CVEs. Personally I think it's shady, I don't care how big Amazon or Google is, they shouldn't get special privileges because they are a large corporation.
I empathize with this as I've been in the same boat, but all entities are not equal when performing triage.
> My customers and business are not any less important or valuable than anyone else's
Hate to break it to you but yes they are.
> My customers and business are not any less important or valuable than anyone else's
Of course they are. If Red Hat has a million times more customers than you do then they are collectively more valuable almost by definition.
> but to the users of the software to finish updating their systems to a new fixed release,
Is there "a new fixed release" ?
Whether it's reasonable is debatable, but that type of time frame is pretty normal for things that aren't being actively exploited.
This situation is perhaps a little different, as it's not an accidental bug waiting to be discovered but an intentionally placed exploit. We know that a malicious person already knows about it.
Detecting a security issue is one thing. Detecting a malicious payload is something completely different. The latter has intent to exploit and must be addressed immediately. The former has at least some chance of no one knowing about it.
If you were following Google Project Zero's policy (which many researchers do), any in-the-wild exploits would trigger an immediate reveal.
I think you have to take the credibility of the maintainer into account.
If it's a large company, made of people with names and faces, with a lot to lose by hacking its users, they're unlikely to abuse private disclosure. If it's some tiny library, the maintainers might be in on it.
Also, if there's evidence of exploitation in the wild, the embargo is a gift to the attacker. The existence of a vulnerability in that case should be announced, even if the specifics have to be kept under embargo.
In this case the maintainer is the one who deliberately introduced the backdoor. As Andres Freund puts it deadpan, "Given the apparent upstream involvement I have not reported an upstream bug."
imho it depends on the vuln. I've given a vendor over a year, because it was a very low risk vuln. This isn't a vuln though - this is an attack.
> imho it depends on the vuln. I've given a vendor over a year, because it was a very low risk vuln.
But why? A year is a ridiculous time for fixing a vulnerability, even a minor one. If a vendor is taking that long, it's because they don't prioritize security at all and are just dragging their feet.
The fraudulent author must have enjoyed the 'in joke' -- he's the one creating vulnerabilities...
I've always laughed my ass off at the idea of a disclosure window. It takes less than a day to find RCE that grants root privileges on devices that I've bothered to look at. Why on earth would I bother spending months of my time trying to convince someone to fix something?
A 90-day dark window for maintainers is SOP though. Then after 90 days, it's fair game for public disclosure.
How many of people like this one exist?
If this question had a reliable (and public) answer then the world would be a very different place!
That said, this is an important question. We, particularly those of us who work on critical infrastructure or software, should be asking ourselves this regularly to help prevent this type of thing.
Note that it's also easy (and similarly catastrophic) to swing too far the other way and approach all unknowns with automatic paranoia. We live in a world where we have to trust strangers every day, and if we lose that option completely then our civilization grinds to a halt.
But-- vigilance is warranted. I applaud these engineers who followed their instincts and dug into this. They all did us a huge service!
EDIT: wording, spelling
Yeah thanks for saying this; I agree. And as cliche as it is to look for a technical solution to a social problem, I also think better tools could help a lot here.
The current situation is ridiculous - if I pull in a compression library from npm, cargo or Python, why can that package interact with my network, make syscalls (as me) and read and write files on my computer? Leftpad shouldn’t be able to install crypto ransomware on my computer.
To solve that, package managers should include capability based security. I want to say “use this package from cargo, but refuse to compile or link into my binary any function which makes any syscall except for read and write. No open - if I want to compress or decompress a file, I’ll open the file myself and pass it in.” No messing with my filesystem. No network access. No raw asm, no trusted build scripts and no exec. What I allow is all you get.
The capability should be transitive. All dependencies of the package should be brought in under the same restriction.
In dynamic languages like (server side) JavaScript, I think this would have to be handled at runtime. We could add a capability parameter to all functions which issue syscalls (or do anything else that’s security sensitive). When the program starts, it gets an “everything” capability. That capability can be cloned and reduced to just the capabilities needed. (Think, pledge). If I want to talk to redis using a 3rd party library, I pass the redis package a capability which only allows it to open network connections. And only to this specific host on this specific port.
It wouldn’t stop all security problems. It might not even stop this one. But it would dramatically reduce the attack surface of badly behaving libraries.
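As a tiny sketch of the "I'll open the file myself and pass it in" discipline (a hypothetical API, not any real library's):

    #include <unistd.h>

    /* A capability-style entry point: it can only read from in_fd and
     * write to out_fd, both opened by the caller. It has no authority
     * to open paths, connect sockets, or exec anything. */
    int compress_fd(int in_fd, int out_fd)
    {
        unsigned char buf[4096];
        ssize_t n;
        while ((n = read(in_fd, buf, sizeof buf)) > 0) {
            /* a real implementation would transform buf here */
            if (write(out_fd, buf, (size_t)n) != n)
                return -1;
        }
        return n == 0 ? 0 : -1;
    }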
Isn't this exact exploit left unfixed by your capability theory?
It is hijacking a process that has network access at runtime, not build time.
The build hack grabs files from the repo and inspects build parameters (in a benign way; everyone checks whether you are running on X platform, etc.)
if I got it right, the attack uses the glibc IFUNC mechanism to patch sshd (and only sshd) to directly run some code in liblzma when sshd verifies logins.
so the problem is the IFUNC mechanism, which has its valid uses but can EASILY be misused for any sort of attack
> We, particularly those us who work on critical infrastructure or software
We should also be asking ourselves if we are working on critical infrastructure. Lasse Collin probably did not consider liblzma being loaded by sshd when vetting the new maintainer. Did the xz project ever agree to this responsibility?
We should also be asking ourselves if each dependency of critical infrastructure is worth the risk. sshd linking libsystemd just to write a few bytes into an open fd is absurd. libsystemd pulling in liblzma because hey, it also does compressed logging, is absurd. Yet this kind of absurd dependency bloat is everywhere.
Assume 3% of the population is malicious.
Enough to be cautious, enough to think about how to catch bad actors, not so much as to close yourself off and become a paranoid hermit.
We live in a time of populous, wealthy dictatorships that have computer-science expertise and are openly hostile to the US and Canada.
North America is only about 5% of the world's population. [1] (We can assume that malicious actors are in North America, too, but this helps to adjust our perspective.)
The percentage of maliciousness on the Internet is much higher.
[1] _ See continental subregions. https://en.wikipedia.org/wiki/List_of_continents_and_contine...
Huh. I never really thought of it as a percentage.
I've been evil, been wonderful, and indifferent at different stages in life.
I have known those who have done similar for money, fame, and boredom.
I think, given a backstory, incentive, opportunity, and resources, it would be possible for most people to flip from "wouldn't" to "enlisted".
Leverage has been shown to be the biggest lever when it comes to compliance.
Threat actors create personas. We will need strong social trust to protect our important projects and dependencies.
> How many of people like this one exist?
I guess every three-letter agency has at least one. You can do the math. They haven't learned anything after SolarWinds.
Honestly it seems like a state-based actor hoping to get whatever high value target compromised before it's made public. Reporting privately buys them more time, and allows them to let handlers know when the jig is up.
Looks like one of the backdoor authors even went and disabled the feature the exploit relied on directly on oss-fuzz to prevent accidental discovery: https://social.treehouse.systems/@Aissen/112180302735030319 https://github.com/google/oss-fuzz/pull/10667
But luckily there was some serendipity: "I accidentally found a security issue while benchmarking postgres changes." https://mastodon.social/@AndresFreundTec/112180083704606941
This is getting addressed here: https://github.com/google/oss-fuzz/issues/11760
This in and of itself could be legitimate. ifunc has real uses, and it indeed does not work when a sanitizer is enabled. Similar change in llvm: https://github.com/llvm/llvm-project/commit/1ef3de6b09f6b21a...
Because of the exploit; so why should we use configurations in production that were not covered by these tests?
Could that commit also be made by a bad actor?
And that was in mid-2023. Very funny that Wikipedia on this issue says:
> It is unknown whether this backdoor was intentionally placed by a maintainer or whether a maintainer was compromised
Yeah, if you've been compromised for a year, your attacker is now your identity. You can't just wave hands; practice infosec hygiene.
I've long since said that if you want to hide something nefarious you'd do that in the GNU autoconf soup (and not in "curl | sh" scripts).
Would be interesting to see what's going on here; the person who did the releases has done previous releases too (are they affected?) and has commits going back to 2022 – relatively recent, but not that recent. Many are real commits with real changes, and they have commits on some related projects like libarchive. Seems like a lot of effort just to insert a backdoor.
Edit: anyone with access can add files to existing releases and it won't show that someone else added it (I just tested). However, the timestamp of the file will be to when you uploaded it, not that of the release. On xz all the timestamps of the files match with the timestamp of the release (usually the .tar.gz is a few minutes earlier, which makes sense). So looks like they were done by the same person who did the release. I suspected someone else might have added/altered the files briefly after the release before anyone noticed, but that doesn't seem to be the case.
> I've long since said that if you want to hide something nefarious you'd do that in the GNU autoconf soup (and not in "curl | sh" scripts).
Yeah, I've been banging on that same drum for ages too... for example on this very site a decade ago: https://news.ycombinator.com/item?id=7213563
I'm honestly surprised that this autoconf vector hasn't happened more often... or more often that we know of.
Given that this was discovered by sheer luck, I'd expect way more such exploits in the wild.
Every single commit this person ever did should immediately be rolled back in all projects.
It's weird and disturbing that this isn't the default perspective.
Well, it is much easier said than done. Philosophically I agree, but in the real world, where you have later commits that might break, and downstream projects, etc., it isn't very practical. It strikes me as in a similar vein to high school students and beauty pageant contestants calling for world peace. Really great goal, not super easy to implement.
I would definitely be looking at every single commit though and if it isn't obviously safe I'd be drilling in.
Some of those commits might fix genuine vulnerabilities. So you might trade a new backdoor for an old vulnerability that thousands of criminal orgs have bots for exploiting.
Damage-wise, most orgs aren't going to be hurt much by the NSA or the Chinese equivalent getting access, but a Nigerian criminal gang? They're far more likely to encrypt all your files and demand a ransom.
Still.. At this point the default assumption should be every commit is a vulnerability or facilitating a potential vulnerability.
For example, a change from safe_fprintf to fprintf. Every commit should be reviewed and either tweaked or rewritten to ensure the task is being done in the safest way and doesn't have anything that is "off" or introduces a deviation from the way that codebase standardly goes about tasks within functions.
it's not weird at all?
randomly reverting two years of things across dozens of repositories will break them, almost definitely make them unbuildable, and also make them unreleasable in case any other change needs to happen soon.
all of their code needs to be audited to prove it shouldn't be deleted, of course, but that can't happen in the next ten minutes.
I swear that HN has the least-thought-through hot takes of any media in the world.
*I swear that HN has the least-thought-through hot takes of any media in the world.*
The irony is too good.
Yeah, if you've ever tried to revert stuff that was done weeks ago on a relatively small team, you know how much painstaking work it can be.
You can't just go and rip out old code, it'll break everything else, you have to review each commit and decide what to do with each.
"immediately" could mean have humans swarm on the task and make a choice, as opposed to
for commit in author_commits
git revert $commit
Imagine someone tried to revert all the commits you ever did. Doesn't sound easy.
Too much fallout.
Rolling back two years' worth of commits made by a major contributor is going to be hell. I'm looking forward to seeing how they'll do this.
Not really. xz worked fine 2 years ago. Roll back to 5.3.1 and apply a fix for the 1 security hole that was fixed since that old version. (ZDI-CAN-16587)
Slight oversimplification, see https://bugs.debian.org/1068024 discussion.
This seems true with so many of these core libraries. Change for the sake of change introduces attack vectors. If it ain't broke, don't fix it!
How will you do that practically though? That's probably thousands of commits, upon which tens or hundreds of thousands of commits from others were built. You can't just roll back everything two years and expect it not to break, or not to bring back older vulnerabilities that were patched in those commits.
Likely part of what the attacker(s) are counting on. Anyone want to place odds this isn't the only thing that's going to be found?
I’d bet you at even odds that nothing else malicious by this person is found in 1 month, and at 1:2.5 odds that nothing is found in a year.
I don't think that's necessary: there are enough eyes on this person's work now.
No one will do it seriously
> they have commits on some related projects like libarchive
Windows started using libarchive to support .rar, .7z, ...
https://arstechnica.com/gadgets/2023/05/cancel-your-winrar-t...
Couldn't the autoconf soup be generated from simpler inputs by the CI/CD system to avoid this kind of problem? Incomprehensible soup as a build artifact (e.g. executables) is perfectly normal, but it seems to me that such things don't belong in the source code.
(This means you too, gradle-wrapper! And your generated wrapper for your generated wrapper. That junk is not source code and doesn't belong in the repo.)
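Concretely, I'm imagining CI doing something like this, as a sketch (configure.shipped is a hypothetical copy of the tarball's pre-built script, saved for comparison):

    # Regenerate the autoconf "soup" from configure.ac and the m4 sources
    # instead of trusting the pre-generated configure the tarball shipped.
    autoreconf -fiv
    # Flag any divergence between what we generated and what was shipped.
    diff -u configure.shipped configure || echo "shipped configure differs!"

Notably, the malicious m4 file in this incident reportedly existed only in the release tarballs, not in git, so regenerating from the repo would have dropped it.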
Yes, it's usually regenerated already. However even the source is often pretty gnarly.
And in general, the build system of a large project is doing a lot of work and is considered pretty uninteresting and obscure. Random CMake macros or shell scripts would be just as likely to host bad code.
This is also why I like meson, because it's much more constrained than the others and the build system tends to be more modular and the complex parts split across multiple smaller, mostly independent scripts (written in Python or bash, 20-30 lines max). It's still complex, but I find it easier to organize.
> And in general, the build system of a large project is doing a lot of work and is considered pretty uninteresting and obscure. Random CMake macros or shell scripts would be just as likely to host bad code.
Build systems can even have undefined behaviour in the C++ sense. For example Conan 2 has a whole page on that.
The other thing besides the autoconf soup is that the XZ project contains incomprehensible binaries as "test data", including the "bad-3-corrupt_lzma2.xz" part of the backdoor, which they even put in the repo.
It's entirely possible they could have got that injection through review even if they had that framework and instead put it in the source files used to generate the autoconf soup.
gradle-wrapper is just a convenience, you can always just build the project with an installed version of gradle. Although I get your point, it’s a great place to hide nefarious code.
Pure speculation but my guess is a specific state actor ahem is looking for developers innocently working with open source to then strongarm them into doing stuff like this.
Or hiring them to do it for years without telling them why until they need a favor.
Many people are patriots of their countries. If a state agency approached them proposing paid OSS work that also helps their country fight terrorism/dictatorships/capitalists/whatever-they-believe, they would feel like they were killing two birds with one job.
While this seems plausible, it is notable that this person seems to be anonymous from the get go. Most open source maintainers are proud of their work and maintain publicly available personas.
While I don't doubt there are people who would gladly do this work for money/patriotism/whatever, adding a backdoor to your own project isn't really reconcilable with the motivations behind wanting to do OSS work.
I would be curious if their commits could be analyzed for patterns that could then be used to detect commits from their other account
One thing that is annoying is that many open source projects have been getting "garbage commits" apparently from people looking to "build cred" for resumes or such.
Easier and easier to hide this junk in amongst them.
annoying ... and convenient for some!
There was a DARPA program on this topic called Social Cyber. [1]
1. https://www.darpa.mil/program/hybrid-ai-to-protect-integrity...
I mean, a backdoor at this scale (particularly if it wasn't noticed for a while and got into stable distros) could be worth millions. Maybe hundreds of millions (think of the insider trading possibilities alone, not to mention espionage). 2 years doesn't seem like that much work relative to the potential pay off.
This is the sort of case where America's over-the-top hacking laws make sense.
And what law would you use to target someone who wrote some code and posted it for free on the internet that was willingly consumed?
The Computer Fraud and Abuse Act? Seems like a pretty easy question to answer.
Maybe I'm misunderstanding things, but it seems like anyone can publish an exploit on the internet without it being a crime. In the same way encryption is free speech.
It seems unlikely this guy would also be logging into people's boxes after this.
It seems a much tougher job to link something like this to an intentional unauthorized access.
At this point, we have no confirmed access via compromise.
Do you know of a specific case where the existence of a backdoor has been prosecuted without a compromise?
Who would have standing to bring this case? Anyone with a vulnerable machine? Someone with a known unauthorized access. Other maintainers of the repo?
IANAL but it is unclear that a provable crime has been committed here
Similar laws to the ones we'd use to prosecute someone who intentionally brought a poisoned cake to the potluck.
Are you suggesting intent is impossible to determine?
> I've long said that if you want to hide something nefarious you'd do it in the GNU autoconf soup
If I recall correctly, xz can be built with both autoconf and cmake, are cmake configs similarly affected?
Yes, there is evidence of sabotage on the CMake configs too.
https://git.tukaani.org/?p=xz.git;a=commit;h=f9cf4c05edd14de...
How about wheels in the Python ecosystem?
Yeah, this was my first thought too. Though the case against autoconf is already so overwhelming that anyone still using it is just irredeemable; this isn't going to persuade them.
For those panicking, here are some key things to look for, based on the writeup:
- A very recent version of liblzma5 - 5.6.0 or 5.6.1. This was added in the last month or so. If you're not on a rolling release distro, your version is probably older.
- A Debian- or RPM-based distro of Linux on x86_64. In an apparent attempt to make reverse engineering harder, the backdoor does not seem to apply when built outside of deb or rpm packaging. It is also specific to Linux.
- Running OpenSSH sshd from systemd. OpenSSH as patched by some distros only pulls in libsystemd for logging functionality, which pulls in the compromised liblzma5.
Debian testing already has a version called '5.6.1+really5.4.5-1' that is really an older version 5.4, repackaged with a newer version to convince apt that it is in fact an upgrade.
It is possible there are other flaws or backdoors in liblzma5, though.
Focusing on sshd is the wrong approach. The backdoor was in liblzma5. It was discovered attacking sshd, but it very likely had other targets as well. The payload hasn't been analyzed yet, but _almost everything_ links to liblzma5. Firefox and Chromium do. KeePassXC does. And it might have made arbitrary changes to your system, so installing the security update might not remove the backdoor.
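To get a rough sense of that blast radius on your own machine, a quick-and-dirty sketch (slow, and note that ldd can end up executing code from untrusted binaries):

    # List installed executables whose dynamic dependencies include liblzma.
    for b in /usr/bin/* /usr/sbin/*; do
        ldd "$b" 2>/dev/null | grep -q liblzma && echo "$b"
    done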
From what I'm understanding it's trying to patch itself into the symbol resolution step of ld.so specifically for libcrypto under systemd on x86_64. Am I misreading the report?
That's a strong indication it's targeting sshd specifically.
Lots of software links both liblzma and libcrypto. As I read Andres Freund's report, there is still a lot of uncertainty:
"There's lots of stuff I have not analyzed and most of what I observed is purely from observation rather than exhaustively analyzing the backdoor code."
"There are other checks I have not fully traced."
As mentioned many times in other places now, this account had control over xz code for 2 years. The discovered CVE might be just the tip of the iceberg.
It checks for argv[0] == "sshd"
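Assuming that's the check, bash's `exec -a` makes it easy to exercise both paths, since it sets argv[0] (paths and flags here are illustrative only; the subshells keep your session alive):

    (exec -a sshd /usr/sbin/sshd -ddd)      # argv[0] check would match
    (exec -a notsshd /usr/sbin/sshd -ddd)   # argv[0] check would not match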
Ubuntu still ships 5.4.5 on 24.04 (atm).
I did a quick diff of the source (.orig file from packages.ubuntu.com) and the content mostly matched the 5.4.5 github tag except for Changelog and some translation files. It does match the tarball content, though.
So for 5.4.5 the tagged release and download on github differ.
It does change format strings, e.g.

    +#: src/xz/args.c:735
    +#, fuzzy
    +#| msgid "%s: With --format=raw, --suffix=.SUF is required unless writing to stdout"
    +msgid "With --format=raw, --suffix=.SUF is required unless writing to stdout"
    +msgstr "%s: amb --format=raw, --suffix=.SUF és necessari si no s'escriu a la sortida estàndard"

There is no second argument to that printf, for example. I think there is at least a format string injection in the older tarballs. [Edit] formatting
FYI, your formatting is broken. Hacker News doesn't support backtick code blocks, you have to indent code.
Anyway, so... the xz project has been compromised for a long time, at least since 5.4.5. I see that this JiaT75 guy has been the primary guy in charge of at least the GitHub releases for years. Should we view all releases after he got involved as probably compromised?
Thank you, formatting fixed.
My TLDR is that I would regard all commits by JiaT75 as potentially compromised.
Given the ability to manipulate git history, I am not sure a simple time-based revert is enough.
It would be great to compare old copies of the repo with the current state. There is no guarantee that the history wasn't tampered with.
Overall the only safe action would IMHO be to establish a new upstream from an assumed-good state, then fully audit it. At that point we should probably just abandon it and use zstd instead.
Zstd belongs to the class of speed-optimized compressors providing “tolerable” compression ratios. Their intended use case is wrapping some easily compressible data with negligible (in the grand scale) performance impact. So when you have a server which sends gigabits of text per second, or caches gigabytes of text, or processes a queue with millions of text protocol messages, you can add compression on one side and decompression on the other to shrink them without worrying too much about CPU usage.
Xz is an implant of 7zip's LZMA(2) compression into a traditional Unix archiver skeleton. It trades long compression times and giant dictionaries (that need lots of memory) for better (“much-better-than-deflate”) compression ratios. Therefore zstd, no matter how fashionable that name might be in some circles, is not a replacement for xz.
It should also be noted that those LZMA-based archive formats might not be considered state-of-the-art today. If you worry about data density, there are options for both faster compression at the same size and better compression in the same amount of time (provided that the data is generally compressible). 7zip and xz are widespread and well tested, though, and allow decompression to be fast, which might be important in some cases. Alternatives often decompress much more slowly. This is also a trade-off between total time spent on X nodes compressing data and Y nodes decompressing data. When X is 1 and Y is in the millions (say, software distribution), you can spend A LOT of time compressing, even for relatively minuscule gains, without affecting the scales.
It should also be noted that many (or most) decoders of top compressing archivers are implemented as virtual machines executing chains of transform and unpack operations defined in the archive file over pieces of data also saved there. Or, looking from a different angle, as complex state machines initializing their state using complex data in the archive. The compressor tries to find the most suitable combination of basic steps based on the input data, and stores the result in the archive. (This is logically completed in neural network compression tools, which learn what to do with data from the data itself.) As some people may know, implementing all that byte juggling safely and effectively is a herculean task, and compression tools have had exploits in the past because of it. Switching to a different solution might introduce a lot more potentially exploitable bugs.
Not just Jia. There are some other accounts of concern with associated activity or short-term/bot-ish names.
Note that zstd (the utility) currently links to liblzma since it can compress and decompress other formats.
> Given the ability to manipulate git history, I am not sure a simple time-based revert is enough.
Rewritten history is not a real concern because it would have been immediately noticed by anyone updating an existing checkout.
> Overall the only safe action would IMHO be to establish a new upstream from an assumed-good state, then fully audit it. At that point we should probably just abandon it and use zstd instead.
This is absurd and also impossible without breaking backwards compatibility all over the place.
"#, fuzzy" means the translation is out-of-date and it will be discarded at compile time.
I tried to get the translation to trigger by switching to French and it doesn't show. You are right.
So it's just odd that the tags and release tarballs diverge.
RHEL9 is shipping 5.2.5; RHEL8 is on 5.2.4.
Thanks for the heads up.
> Debian testing already has a version called '5.6.1+really5.4.5-1' that is really an older version 5.4, repackaged with a newer version to convince apt that it is in fact an upgrade.
I'm surprised .deb doesn't have a better approach. RPM has epoch for this purpose http://novosial.org/rpm/epoch/index.html
Debian has epochs, but it's a bad idea to use them for this purpose.
Two reasons:

1. Once you bump the epoch, you have to use it forever.

2. The deb filename often doesn't contain the epoch (we use a colon, which isn't valid on many filesystems), so an epoch-revert will give the same file name as pre-epoch, which breaks your repository.
So, the current best practice is the +really thing.
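You can see how dpkg orders the kludge version against its neighbours (exit status 0 means the relation holds):

    # The +really version sorts above the backdoored 5.6.1...
    dpkg --compare-versions '5.6.1+really5.4.5-1' gt '5.6.1-1' && echo "upgrade"
    # ...but below a future clean 5.6.2, so the kludge can be dropped later.
    dpkg --compare-versions '5.6.1+really5.4.5-1' lt '5.6.2-1' && echo "superseded"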
Thanks for the info. The filename thing sounds like a problem; that aspect of the epoch system doesn't work for this purpose then.
Honestly, the Gentoo-style global blacklist (package.mask) to force a downgrade is probably a better approach for cases like this. Epochs only make sense if your upstream is insane and does not follow a consistent numbering system.
Gentoo also considers the repository (+overlays) to be the entire set of possible versions so simply removing the bad version will cause a downgrade, unlike debian and RPM systems where installing packages outside a repository is supported.
Stop the cap, your honor. There is not a single filesystem that prevents you from using colons in filenames except exFAT. I went ahead and checked: ext4, XFS, Btrfs, ZFS, and even ReiserFS let you use any characters you want except \0 and /.
And I fail to see why bumping the epoch would ever be a problem. Having to keep using the epoch is not by itself a reason why it's bad.
Got this on OpenSUSE: `5.6.1.revertto5.4-3.2`
.deb has epochs too, but I think Debian developers avoid them where possible because 1:5.4.5 is interpreted as newer than anything without a colon, so it would break e.g. packages that depend on liblzma >= 5.0, < 6. There may be more common cases that aren't coming to mind now.
Seems like Debian is mixing too many things into the package version. The version used for deciding on upgrades and the ABI version used for dependencies should be decoupled, like they are in modern RPM distros.
If a binary library ABI is backwards-incompatible, they change the package name. I was just guessing at the reason epoch is avoided and that <6 is probably an awful example.
So now I actually bothered to look it up, and it turns out the actual reason is that the epoch changes which version is considered "greater", but it's not part of the .deb filename, so you still can't reuse version numbers used in the past. If you release 5.0, then 5.1, and then you want to roll back and release 1:5.0, it's going to break things in the Debian archives. https://www.debian.org/doc/debian-policy/ch-binary.html#uniq...
Additionally, once you add an epoch you're stuck with it forever, while if you use 5.1+really5.0, you can get rid of the kludge when 5.2 is out. https://www.debian.org/doc/debian-policy/ch-controlfields.ht...
I really like the XBPS way of the reverts keyword in the package template that forces a downgrade from said software version. It's simple but works without any of the troubles RPM epochs have with resolving dependencies as it's just literally a way to tell xbps-install that "yeah, this is a lower version number in the repository but you should update anyway".
Debian packages can have epochs too. I’m not sure why the maintainers haven’t just bumped the epoch here.
Maybe they’re expecting a 5.6.x release shortly that fixes all these issues & don’t want to add an epoch for a very short term packaging issue?
> If you're not on a rolling release distro, your version is probably older.
Ironic considering security is often advertised as a feature of rolling release distros. I suppose in most instances it does provide better security, but there are some advantages to Debian's approach (stable Debian, that is).
>Ironic considering security is often advertised as a feature of rolling release distros.
Security is a feature of rolling release. But supply-chain attacks like this are the exception to the rule.
Isn't that what security-updates-only is for?
This particular backdoor is not shipped inside of a security patch, right?
i mean, rolling implies rolling 0-days, too.
The article gives a link to a simple shell script that detects the signature of the compromised function.
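The gist of it, as a sketch (SIGNATURE stands in for the hex byte pattern published in the writeup; don't trust my memory for the exact string):

    # Find the liblzma that sshd loads and scan it for the published
    # byte pattern of the backdoored function.
    SIGNATURE='paste-hex-signature-from-the-writeup-here'
    path="$(ldd "$(which sshd)" | grep liblzma | grep -o '/[^ ]*')"
    if [ -n "$path" ] && hexdump -ve '1/1 "%.2x"' "$path" | grep -q "$SIGNATURE"
    then
        echo "probably vulnerable"
    else
        echo "probably not vulnerable"
    fi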
> Running OpenSSH sshd from systemd
I think this is irrelevant.
From the article: "Initially starting sshd outside of systemd did not show the slowdown, despite the backdoor briefly getting invoked." If I understand correctly the whole section, the behavior of OpenSSH may have differed when launched from systemd, but the backdoor was there in both cases.
Maybe some distributions that don't use systemd strip the libxz code from the upstream OpenSSH release, but I wouldn't bet on it if a fix is available.
> From the article: "Initially starting sshd outside of systemd did not show the slowdown, despite the backdoor briefly getting invoked." If I understand correctly the whole section, the behavior of OpenSSH may have differed when launched from systemd, but the backdoor was there in both cases.
It looks like the backdoor "deactivates" itself when it detects being started interactively, as a security researcher might do. I was eventually able to circumvent that, but unless you do so, it won't be active when started interactively.
However, the backdoor would also be active if you started sshd with a shell script (as the traditional sys-v rc scripts did) outside the context of an interactive shell, as TERM wouldn't be set in that context either.
> Maybe some distributions that don't use systemd strip the libxz code from the upstream OpenSSH release, but I wouldn't bet on it if a fix is available.
There's no xz code in openssh.
> Maybe some distributions that don't use systemd strip the libxz code from the upstream OpenSSH release, but I wouldn't bet on it if a fix is available.
OpenSSH is developed by the OpenBSD project, and systemd is not compatible with OpenBSD. The upstream project has no systemd or liblzma code to strip. If your sshd binary links to liblzma, it's because the package maintainers for your distro have gone out of their way to add systemd's patch to your sshd binary.
> From the article: "Initially starting sshd outside of systemd did not show the slowdown, despite the backdoor briefly getting invoked." If I understand correctly the whole section, the behavior of OpenSSH may have differed when launched from systemd, but the backdoor was there in both cases.
From what I understand, the backdoor detects if it's in any of a handful of different debug environments. If it's in a debug environment or not launched by systemd, it won't hook itself up. ("nothing to see here folks...") But if sshd isn't linked to liblzma to begin with, none of the backdoor's code even exists in the process's page maps.
I'm still downgrading to an unaffected version, of course, but it's nice to know I was never vulnerable just by typing 'ldd `which sshd`' and not seeing liblzma.so.
I think the distributions that do use systemd are the ones that add the libsystemd code, which in turn brings in the liblzma5 code. So it may not be entirely relevant how it is run, but it does need to be a patched version of OpenSSH.
I did notice that my Debian-based system got noticeably slower and unresponsive at times over the last two weeks, without obvious reasons. Could it be related?
I read through the report, but what wasn't directly clear to me was: what does the exploit actually do?
My normal internet connection has such an appalling upload that I don't think anything relevant could be uploaded. But I will change my ssh keys asap.
> I did notice that my Debian-based system got noticeably slower and unresponsive at times over the last two weeks, without obvious reasons. Could it be related?
Possible but unlikely.
> I read through the report, but what wasn't directly clear to me was: what does the exploit actually do?
It injects code that runs early during sshd connection establishment. Likely allowing remote code execution if you know the right magic to send to the server.
Thank you for the explanation.
Are you on stable/testing/unstable?
With our current knowledge, stable shouldn’t be affected by this.
Stable, luckily. Thank you for the information.
    $ dpkg-query -W liblzma5
    liblzma5:amd64 5.4.1-0.2
Tumbleweed has a package: liblzma5-5.6.1.revertto5.4-3.2.x86_64 FYI
revertto probably just means "revert to", but it does sound quite Italian lol.
I hope Lasse Collin is doing OK! Here is an older message from him [1]:
"I haven't lost interest but my ability to care has been fairly limited mostly due to longterm mental health issues but also due to some other things. Recently I've worked off-list a bit with Jia Tan on XZ Utils and perhaps he will have a bigger role in the future, we'll see.
It's also good to keep in mind that this is an unpaid hobby project. "
GitHub (Microsoft) is in a unique position to figure out whether his account was hacked or not, and to find a way to reach him. I hope they reach out and offer him some proper support! Economic support (if that's needed), or just help clearing his name.

This is another tale of how we are building multi-trillion-dollar industries on the backs of unpaid volunteers. It's not GitHub's 'job', and many other organisations have benefited even more from Lasse's work, but they are in a unique position, and it would be literally pocket change for them.
1: https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.h...
In a movie his mental health issues would likely have been caused intentionally by the attacker, setting the stage for the mole to offer to step in just at the right time. Seems a bit far fetched in this case though for what looks like a tangential attack.
In a movie, he was killed by foreign state actors, and his identity assumed by the foreign state hacker. Actually, someone should check on him.
Or:

> Recently I've worked off-list a bit with Jia Tan on XZ Utils and perhaps he will have a bigger role in the future, we'll see.

is actually Jia Tan, who has him tied up in a basement and is posing as him. State actors can do that kind of thing.
In that case why bother with the Jia Tan persona at all instead of just pushing the malware as Lasse Collin.
> what looks like a tangential attack
Does it? I expect that finding someone vulnerable was the more likely approach rather than messing with the life of a stable maintainer, but it does seem very much like the attacker was acting with malicious intent from the start of his interaction with the xz project.
Lasse appears to be active and working on undoing the sabotage. https://git.tukaani.org/?p=xz.git;a=blobdiff;f=CMakeLists.tx...
I would start restoring trust by reverting all this guy's commits. It's the best way to be sure.
He came on IRC, he seemed ok. He did some cleanup of access and signed off for easter.
I mean, he was right at least. Jia Tan did have a bigger role.
which IRC channel ?
The official channel for the project.
I would like to see more attention given to this. I'm capable of compartmentalization and not over-guilting myself, but holy hell, I really hope he's doing alright. This would kind of destroy me.
I was actually telling my dad about this. I have a project, 500+ users, not quite root access, but enough to cause serious damage. I can think of at least one covert way to backdoor the binary artifacts from it.
About two years ago, someone showed up, started making good commits. In this case, they have some other community rep that goes back a bit further but... man it's an unsettling feeling.
> I'm capable of compartmentalization
teach me how. help me learn how, please. any resources with practical utility you can share? or any class of therapists that are good at teaching this with right frameworks offered? thank you
Relevant xkcd:
A couple of years ago I wrote a Go library that wraps the xz C code and allows you to do xz compression in Go: https://github.com/jamespfennell/xz
About a week ago I received the first PR on that repo, to upgrade to 5.6.1. I thought it was odd to get such a random PR...it's not the same GitHub account as upstream though.
As a bit of an aside, I would never accept a PR like this, and would always update $large_vendored_dependency myself. This is unreviewable, and trivial to insert any backdoor (unless you go through the motions of updating it yourself and diffing, at which point the PR becomes superfluous). I'd be wary even from a well-known author unless I knew them personally on some level (real-life or via internet). Not that I wouldn't trust them, but people's machines or accounts can get compromised, people can have psychotic episodes, things like that. At the very least I'd like to have some out-of-band "is this really you?" signal.
This is how I once inserted a joke in one of our (private) repos that would randomly send cryptic messages to our chat channel. This was pretty harmless and just a joke (there's some context that made it funny), but it took them years to find it – and that was only because I told them after I quit.
That said, looking at the GitHub account I'd be surprised if there's anything nefarious going on here. Probably just someone using your repo, seeing it's outdated, and updating it.
The (most?) popular SQLite driver for Go often gets PRs to update the SQLite C amalgamation, which the owner politely declines (and I appreciate him for that stance, and for taking on the maintenance burden it brings).
e.g., https://github.com/mattn/go-sqlite3/pull/1042#issuecomment-1...
Meanwhile SQLite itself doesn't accept any patches for anything; if you show the author one he will at best rewrite it.
In this case, the project is using Git submodules for its vendored dependencies, so you can trivially cryptographically verify that they have vendored the correct dependency just by checking the commit hash. It looks really crazy on Github but in most git clients it will just display the commit hash change.
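For example, assuming upstream is the tukaani-project/xz repository (disabled by GitHub at the time of writing, so the URL is illustrative):

    git submodule status   # shows the commit the vendored submodule is pinned to
    # That hash should match what upstream's release tag points at:
    git ls-remote --tags https://github.com/tukaani-project/xz 'v5.6.1*'

Though note that in this incident the injection lived in the release tarballs rather than in git, so a matching hash only proves you vendored what upstream's git had.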
Hey all, I’m the author of that PR. Just posted to Github with additional context: https://github.com/jamespfennell/xz/pull/2#issuecomment-2027...
The dopamine hits from updating stuff should come to an end; updating should be thought of as adding potentially new bugs or exploits, unless the update fixes a CVE. Also, GitHub needs to remove the green colors and checkmarks in PRs to prevent these dopamine traps from overriding any critical thinking.
Counterpoint: if you wait to keep things up to date until there's a CVE, there's a higher likelihood that things will break doing such a massive upgrade, and this may slow down a very time-sensitive CVE response. Allowing people to feel rewarded for keeping things up to date is not inherently a bad thing. As with all things, the balance point will vary from project to project!
Exactly. You don’t want to be bleeding edge (churn, bugs) but in general you usually don’t want to be on the oldest supported version either (let alone unsupported).
Risk/reward depends on the usecase of course. For a startup I’d be on the .1 version of the newest major version (never .0) if there are new features I want. For enterprise, probably the oldest LTS I can get away with.
I strongly disagree. If you don’t update your dependencies then it’s easy to lose the institutional knowledge of how to update them, and who actually owns that obscure area of your code base that depends on them. Then you get a real CVE and have to work out everything in a hurry.
If you have a large code base and organisation then keep doing those upgrades so it won’t be a problem when it really matters. If it’s painful, or touches too many areas of the code you’ll be forced to refactor things so that ceases to be a problem, and you might even manage to contain things so well that you can swap implementations relatively easily when needed.
That sucks to have people write mails to your employer...
To be honest, I probably wouldn't have noticed the comments on the PR if it wasn't for that since my Github notifications are an absolute mess. Thankfully, my employer has been super supportive throughout this :D
I appreciated your detailed update!
I don't want to read too much into it, but the person (supposedly) submitting the PR seems to have worked at 1Password since December last year, per his LinkedIn. (And his LinkedIn page has a link to the GitHub profile that made the PR.)
They're definitely a real person. I know cause that "1Password employee since December" is a person I know IRL and worked with for years at their prior employer. They're not a no-name person or a fake identity just FYI. Please don't be witch hunting; this genuinely looks like an unfortunate case where Jared was merely proactively doing their job by trying to get an externally maintained golang bindings of XZ to the latest version of XZ. Jared's pretty fantastic to work with and is definitely the type of person to be filing PRs on external tools to get them to update dependencies. I think the timing is comically bad, but I can vouch for Jared.
[flagged]
Here's a PR on an employer-owned public Github repository where I made a change and Jared approved it. Please, let's not witch hunt.
If I were trying to compromise supply chains, getting into someplace like 1Password would be high up on the list.
Poor guy, he's probably going to get the third degree now.
As a 1Password user, I just got rather nervous.
Yubikeys starting to look kinda yummy.
Hardware gets backdoored too, remember Crypto AG?
Yeah the GitHub account looks really really legitimate. Maybe it was compromised though?
What looks legit about a gmail address and some stock art for a profile?
[Deleted per below]
Plus the README.md that is just a rickroll
The two Gmail accounts are mainly (~85%) associated with XZ work, going back to 2021, per searching for them explicitly via Google.
The PR's two commits are signed by a key that was also used to sign previous commits belonging to that author.
Hold up, are you saying that https://github.com/jaredallard and the accounts affiliated with this XZ backdoor share a PGP key? Or something else?
> it's not the same GitHub account as upstream
This is valuable information, and a sign that this may be the tip of an iceberg.
There was also a bug report in Debian which requested updating xz-utils to 5.6.1: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1067708
That's the same Hans Jansen mentioned here: https://boehs.org/node/everything-i-know-about-the-xz-backdo...
IMO your prior on this should be that it's most likely just someone innocently updating a dependency.
The backdoor (test binary blob and autoconf) is not part of the pull request.
Suddenly anything like that becomes super suspicious.
I wonder how this will affect the OSS community in general.
More caution might not be a bad thing.
Internet detectives at work in this thread!
I am *not* a security researcher, nor a reverse engineer. There's lots of stuff I have not analyzed and most of what I observed is purely from observation rather than exhaustively analyzing the backdoor code.
I love this sort of technical writing from contributors outside the mainstream debugging world who might be averse to sharing. What an excellently summarized report of his findings; it should be seen as a template.

FWIW, it felt intimidating as hell. And I'm fairly established professionally. Not sure what I'd have done earlier in my career (although I'd probably not have found it in the first place).
> Not sure what I'd have done earlier in my career
To anybody in this sorta situation, you should absolutely share whatever you have. It doesn’t need to be perfect, good, or 100% accurate, but if there’s a risk you could help a lot of people
This story is an incredible testament to how open-source software can self-regulate against threats, and more broadly, it reminds us that we all stand on the shoulders of contributors like you. Thank you!
This is one threat that was discovered, and only because the implementer was sloppy.
Think about what various corps and state-level actors have been putting in there.
I hope you've hired a PR person for all the interviews :)
For what it's worth the author is a PostgreSQL committer, he's not a security researcher but he's a pretty damn good engineer!
Honestly, you only get this kind of humility when you're working with absolute wizards on a consistent basis. That's how I read that whole analysis. Absolutely fascinating.
Related ongoing threads:
Xz: Disable ifunc to fix Issue 60259 - https://news.ycombinator.com/item?id=39869718
FAQ on the xz-utils backdoor - https://news.ycombinator.com/item?id=39869068
Everything I Know About the XZ Backdoor - https://news.ycombinator.com/item?id=39868673
Out of curiosity I looked at the list of followers of the account who committed the backdoor.
Randomly picked https://github.com/Neustradamus and looked at all their contributions.
Interestingly enough, they got Microsoft to upgrade ([0],[1]) `vcpkg` to liblzma 5.6.0 3 weeks ago.
OMG: look at the other contributions. He is trying to take over projects and pushing some change to sha256 in a hundred projects.
This guy's interactions seem weird, but it might just be the non-native English or a strange attitude, or he's very good at covering his tracks. E.g. I found a cpython issue where he got reprimanded for serially opening issues: https://github.com/python/cpython/issues/115195#issuecomment-...
But clicking around he seems to mostly be interacting with interest around these bits e.g. https://github.com/python/cpython/issues/95341#issuecomment-... or pinging the entire python team to link to the PR... of a core python developer: https://github.com/python/cpython/issues/95341#issuecomment-...
If I saw that on a $dayjob project I'd pit him as an innocuous pain in the ass (overly excited, noisy, dickriding).
Here's a PR from 2020 where he recommends / requests the addition of SCRAM to an SMTP client: https://github.com/marlam/msmtp/issues/36 which is basically the same thing as the PR you found. The linked documents seem genuine, and SCRAM is an actual challenge/response authentication method for a variety of protocols (in this case mostly SMTP, IMAP, and XMPP): https://en.wikipedia.org/wiki/Salted_Challenge_Response_Auth...
Although, and that's a bit creepy, he shows up in the edit history for the SCRAM page; the edits mostly seem innocent, though he does plug his "state of play" github repository.
> dickriding
https://www.urbandictionary.com/define.php?term=Dickriding
I guess I'm not in the right demographic to know the term.
"fawning" or "ingratiating" seem to be the standard English words for this.
True, it does seem innocent enough upon more reflection.
What? They're just asking for some features there?
Y'all need to calm down; this is getting silly. Half the GitHub accounts look "suspicious" if you start scrutinizing everything down to the microscopic detail.
I appreciate the way that duesee handled that whole issue.
reported the account to github, just in case.
Hey, I remember this guy! Buddy of someone who tried to get a bunch of low quality stuff into ifupdown-ng, including copying code with an incompatible license and removing the notice. He's in every PR, complaining the "project is dead". He even pushes for the account to be made "team member".
https://github.com/ifupdown-ng/ifupdown-ng/pulls/easynetdev
He follows 54k accounts though, so it may indeed just be coincidence.
The PR + angry user pushing for the PR author to gain commit access spiel is definitely suspiciously similar to what happened with xz-utils. Possible coincidence but worth investigating further.
I wouldn't be surprised if that is just a bot.
He even follows me, though I have never published any open-source project on my own.
Dear @0xthr0w4, are you attacking me because I requested the XZ update?

Do not mix things up; I am not linked to the XZ project.
The parent comment doesn't read like an attack to me. Just an observation. Would be curious why you wanted the update though.
Imagine a more competent backdoor attempt on xz(1)—one that wouldn't have been noticed this quickly. xz is everywhere. They could pull off a "reflections on trusting trust": an xz which selectively modifies a tiny subset of the files it sees, like .tar.xz software tarballs underlying certain build processes. Not source code tarballs (someone might notice)—tarballs distributing pre-compiled binaries.
edit to add: Arch Linux's entire package system used to run on .tar.xz binaries (they switched to Zstd a few years ago [0]).
[0] https://news.ycombinator.com/item?id=19478171 ("Arch Linux propose changing compression method from xz to zstd (archlinux.org)")
A backdoored xz could also run payloads hidden inside other xz files, allowing targeted attacks.
The same authors have also contributed to Zstd
details please? I do not see any such contributions to https://github.com/facebook/zstd
They are probably getting confused.
Jia had a zstd fork on github, but when things kicked off, it appears they may have sanitized the fork.
deb packages are xz compressed...
my freaking kernels/initrd are xz or zstd compressed!
... and Debian is very serious about it: https://fulda.social/@Ganneff/112184975950858403
Unfortunately, this is how good bad actors work: with a very long-term point of view. There is no “harmless” project any more.
And, Joey Hess has counted at least 750 commits to xz from that handle.
https://hachyderm.io/@joeyh/112180715824680521
This does not look trust-inspiring. If the code is complex, there could be many more exploits hiding.
ClickHouse has a pretty good github_events dataset on its playground that folks can use to do some research. Some info on the dataset: https://ghe.clickhouse.tech/
Example of what this user JiaT75 did so far:
https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...
pull requests mentioning xz, 5.6 without downgrade, cve being mentioned in the last 60 days:
https://play.clickhouse.com/play?user=play#U0VMRUNUIGNyZWF0Z...
Yeah. It would be interesting to see who adopted the compromised versions and how quickly, compared to how quickly they normally adopt new versions (not bots pulling upgrades, but how quickly maintainers approve and merge them).
If there were a bunch of people who adopted it abnormally fast compared to usual, might point to there being more "bad actors" in this operation (said at the risk of sounding paranoid if this turns out to be a state run thing)
> If the code is complex, there could be many more exploits hiding.
Then the code should not be complex. Low-level hacks and tricks (like pointer juggling) should not be allowed, and simplicity and readability should be preferred.
For tools like compression programs, you’d generally prefer performance over everything (except data corruption, of course).
Probably you would prefer no backdoors also? Performance without correctness or trustworthiness is useless.
If this is a conspiracy or a state-sponsored attack, they might have gone specifically for embedded devices and the Linux kernel. Here archived from tukaani.org:
https://web.archive.org/web/20110831134700/http://tukaani.or...
> XZ Embedded is a relatively small decompressor for the XZ format. It was developed with the Linux kernel in mind, but is easily usable in other projects too.
> *Features*
> * Compiled code 8-20 KiB
> [...]
> * All the required memory is allocated at initialization time.
This is targeted at embedded and real-time stuff. It could even be part of boot loaders in things like buildroot or RTEMS. And this means potentially millions of devices, from smart toasters or toothbrushes to satellites and missiles, most of which can't be updated with security fixes.
One scenario for malicious code in embedded devices would be a kind of killswitch which listens to a specific byte sequence and crashes when encountering it. For a state actor, having such an exploit would be gold.
That's an "interesting" thought.
One of my complaints about so many sci-fi stories is the use of seemingly conventional weapons. I always thought that with so much advanced technology, weapons would be much more sophisticated. However, if the next "great war" is won not by the side with the most destructive weapons but by the side with the best kill switch, subsequent conflicts might be fought with weapons that don't rely on any kind of computer assistance.
This is eerily similar to Einstein's (purported) statement that if World War III was fought with nuclear weapons, World War IV would be fought with sticks and stones. Similar, but for entirely different reasons.
I'm trying to understand why the characters in Dune fought with swords, pikes and knives.
All this circus makes me happy for never moving from sysvinit on embedded.
It is not just systemd which uses xz. For example, Debian's dpkg links xz-utils.
[flagged]
That is just technical disagreements and sour grapes by someone involved in a competing format (Lzip).
There’s no evidence Lasse did anything “wrong” beyond looking for / accepting co-maintainers, something package authors are taken to task for not doing every time they have life catching up or get fed up and can’t / won’t spend as much time on the thing.
This link is opinion piece about the file format and has nothing to do with today's news.
Also, Lasse has not been accused of any wrongdoing.
I have some questions.
1) Are there no legit code reviews from contributors like this? How did this get accepted into main repos while flying under the radar? When I do a code review, I try to understand the actual code I'm reviewing. Call me crazy I guess!
2) Is there no legal recourse to this? We're talking about someone who managed to root any linux server that stays up-to-date.
> 2) Is there no legal recourse to this? We're talking about someone who managed to root any linux server that stays up-to-date.
Any government which uses GNU/Linux in their infrastructure can pitch this as an attempt to backdoor their servers.
The real question is: will we ever even know who was behind this? If it was some mercenary hacker intending to resell the backdoor, maybe. But if it was someone working with an intelligence agency in US/China/Israel/Russia/etc, I doubt they'll ever be exposed.
Reflecting on the idea of introducing a validation structure for software contributions, akin to what RPKI does for BGP routing, I see significant potential to enhance security and accountability in software development.
Such a system could theoretically bring greater transparency and responsibility, particularly in an ecosystem where contributions come from all corners.
Implementing verifiable identity proofs for contributors might be challenging, but it also presents an opportunity to bolster security without compromising privacy and the freedom to contribute under pseudonyms.
The accountability of those accepting pull requests would also become clearer, potentially reducing the risk of malicious code being incorporated.
Of course, establishing a robust validation chain for software would require the commitment of everyone in the development ecosystem, including platforms like GitHub. However, I view this not as a barrier but as an essential step towards evolving our approach to security and collaboration in software development.
The actual injection code was never in the repo. The blobs were hidden as lzma test files.

So your review would need to guess, from two new test files, that they are, decompressed, a backdoor that could be injected by the build machinery; none of that was ever visible in the git history.

This was explicitly built to evade such reviews.
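Public analyses describe the first stage roughly like this (reconstructed from those writeups, so treat the details as approximate): the build machinery "repairs" the deliberately corrupt test file with tr, and the result decompresses into the next-stage script.

    # Byte-swap the characters that make the test file "corrupt", then
    # decompress it into the stage-2 shell script.
    tr "\t \-_" " \t_\-" < tests/files/bad-3-corrupt_lzma2.xz | xz -d > stage2.sh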
> The blobs were hidden as lzma test files.
OK, that is absolutely devious.
I suppose you think the maintainers shouldn’t have scrutinized those files? Please tell me it’s a joke.
"Jia Tan" was not a contributor, but a maintainer of the project. The key point here is that this is a multi-year project of infiltrating the xz project and gaining commit access.
In a large tech company (including ones I have worked at), sometimes you have policy where every change has to be code reviewed by another person. This kind of stuff isn't possible when the whole project only has 1-2 maintainers. Who's going to review your change other than yourself? This is the whole problem of OSS right now that a lot of people are bringing up.
I maintain a widely used open source project myself. I would love it if I could get high quality code review for my commits similar to my last workplace lol, but very very few people are willing to roll up their sleeves like that and work for free. Most people would just go to the Releases page and download your software instead.
>How did this get accepted into main repos while flying under the radar? When I do a code review, I try to understand the actual code I'm reviewing. Call me crazy I guess!
And? You never make any mistakes? Google "underhanded C contest".
750 commits... is xz able to send e-mails yet?
No. But if you have any centrifuges they will probably exhibit inconsistent behavior.
Maybe it’s the centrifuges which will send the mail, making the world’s first uranium-enriching spam botnet.
It's hardly surprising given that parsing is generally considered to be a tricky problem. Plus, it's a 15-year-old project that's widely used. 750 commits is nothing to sneeze at. No wonder the original maintainer got burned out.
750 is not that many commits if you follow a commit style where each independent change gets its own commit.
Yes, it sends an email containing your private key on installation.
Anyone have any level of confidence that, for example, EL7/8 would not be at risk even if more potential exploits are at play?
Red Hat's blog says no versions of RHEL are affected.
https://www.redhat.com/en/blog/urgent-security-alert-fedora-...
[flagged]
I don't think EL7 gets minor version updates anymore though
I wouldn't count on it. Red Hat packages contain lots of backported patches.
These changes were not backported to RHEL.
Right, that notion was what was making me nervous
I imagine it might be easier to just compromise a weakly protected account than to actually put in a two-year-long effort with real contributions. If we mandated MFA for all contributors to these really important projects, then we could know with greater certainty whether it was really a long con vs. a recently compromised account.
For some random server, sure. For a state sponsored attack? Having an embedded exploit you can use when convenient, or better yet an unknown exploit affecting every linux-based system connected to the internet that you can use when war breaks out - that's invaluable.
Yes, but even states have only finite resources, so even for them compromising an account would be cheaper.
(But you are right that a sleeper would be affordable for them.)
Having one or two people on payroll to occasionally add commits to a project isn't exactly that expensive if it pays off. There are ~29,000,000 US government employees (federal, state and local). Other countries like China and India have tens of millions of government employees.
It’s a very cheap investment given the blast radius
I find it funny how MFA is treated as if it would make account takeover suddenly impossible. It's just a bit more work, isn't it? And a big loss in convenience.
I'd much rather see passwords entirely replaced by key-based authentication. That would improve security. Adding 2FA to my password is just patching a fundamentally broken system.
Customer service at one of my banks has an official policy of sending me a verification code via email that I then read to them over the phone, and that's not even close to the most "wrong" 2FA implementation I've ever seen. Somehow that institution knows what a YubiKey is, but several major banks don't.
I'm security consultant in the financial industry. I've literally been involved in the decision making on this at a bank. Banks are very conservative, and behave like insecure teenagers. They won't do anything bold, they all just copy each other.
I pushed YubiKey as a solution and explained in detail why SMS was an awful choice, but they went with SMS anyway.
It mostly came down to cost. SMS was the cheapest option. YubiKey would involve buying and sending the keys to customers, and they having the pain/cost of supporting them. There was also the feeling that YubiKeys were too confusing for customers. The nail in the coffin was "SMS is the standard solution in the industry" plus "If it's good enough for VISA it's good enough for us".
Financial institutions are very slow to adopt new tech. Especially tech that will inevitably cost $$$ in support hours when users start locking themselves out of their accounts. There is little to no advantage to being the first bank to implement YubiKey 2FA. To a risk-averse org, the non-zero chance of a botched rollout or displeased customers outweighs any potential benefit.
Banks are in a tough spot. Remember, banks have you as a customer, they also have a 100 year old person who still wants to come to the branch in person as a customer. Not everyone can grapple with the idea of a Yubikey, or why their bank shouldn't be protecting their money like it did in the past.
Just say BofA.
yeah someone replied to one of my comments about adding MFA that an attacker can get around all that simply by buying the account from the author. I was way too narrowly focused on the technical aspects and was completely blind to other avenues like social engineering, etc.
All very fair points.
[flagged]
>I'd much rather see passwords entirely replaced by key-based authentication
I've never understood how key-based systems are considered better. I understand the encryption angle; nobody is compromising that. But now I have a key I need to personally shepherd? Where do I keep it, and my backups, and what is the protection on those places? How many local copies, how many offsite? And I still need a password to access/use it, but with no recourse should I lose or forget it. How am I supposed to remember that? It's all just kicking the same cans down the same roads.
Passkeys are being introduced right now in browsers and popular sites as an MFA option, but I think the intention is that they will grow and become the main factor in the future.
From what I've seen they're all controlled by huge tech companies. Hard pass.
This PR from July 8 2023 is suspicious, so it was very likely a long con: https://github.com/google/oss-fuzz/pull/10667
This is a state-sponsored event. Pretty poorly executed though, as they were tweaking and modifying things in their own and other tools after the fact.
Assuming it's a state-sponsored project, what makes you think this is their only project and that this is a big setback? I am paranoid enough myself to think yesterday's meeting went like: "team #25 has failed/been found out. Reallocate resources to the other 49 teams."
As I said recently in a talk I gave, 2FA as implemented by PyPI or GitHub is meaningless when in fact all actions are performed via tokens that never expire, saved inside a .txt file on disk.
Passwords have full scope of permission while session tokens can be limited.
On PyPI, to obtain a token that is limited in scope, you must first generate an unlimited token.
True story.
On GitHub you can generate a limited one, but it's not really clear what the permissions actually mean, so it's trial and error… which means most people will get tired and grant random stuff just to have it working.
Doesn't GH explicitly warn against using non-expiring tokens?
I wonder what the point is; I don't remember GitHub ever warning me that I've used the same SSH key for years...
And?
They might not have been playing the long con; maybe they were approached by actors willing to pay a lot of money to slip in a back door. I'm sure a deep dive into their code contributions would clear that up for anyone familiar with the code base and with some free time.
They did fuck up quite a bit though. They injected their payload before they checked if oss-fuzz or valgrind or ... would notice something wrong. That is sloppy and should have been anticipated and addressed BEFORE activating the code.
Anyway. This team got caught. What are the odds that this was the only project/team/library this state actor decided to attack?
Using sockpuppets to pressure the original maintainer into granting commit access in the first place backs the long con theory.
github already mandates MFA for members of important projects
Doesn't it mandate it for everyone? I don't use it anymore and haven't logged in since forever, but I think I got a series of e-mails that it was being made mandatory.
It will soon. I think I have to sort it out before April 4. My passwords are already >20 random characters, so I wasn't going to do it until they told me to.
It mandates it for everyone. I'm locked out of Github because fuck that.
Which helps with some kinds of threats, but not all. It keeps someone from pretending to be the maintainer -- but if an actual maintainer is compromised, coerced, or just bad from the start and biding their time, they can still do whatever they want with full access rights.
You probably should have replied that to the GP, not me. I only clarified that what they were suggesting already is the case.
Not MFA, but git commit signing. I don't get why such core low-level projects don't mandate it. MFA doesn't help if a GitHub access token is stolen, and I bet most of us use such a token for pushing from an IDE.
Even if an access token to GitHub is stolen, the sudden lack of signed commits should raise red flags. GitHub should allow projects to force commit signing (if not already possible).

Then the access token plus the signing key would need to be stolen.

But of course all that doesn't help in the scenario more likely here: a long con by a state-sponsored hacker, or duress (which in certain countries seems pretty likely to happen).
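For reference, the local half of commit signing is a one-time setup; enforcement on the hosting side is a branch-protection setting:

    git config --global user.signingkey ABCD1234EFGH5678   # placeholder key ID
    git config --global commit.gpgsign true
    git log --show-signature -1   # verify the signature on the latest commit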
Git signing would not have helped here. In fact, the tags and release archives were signed.
This seems like a great way to invest in supporting open source projects in the meantime, if these projects are being used by these actors. They'd just have to maintain an internal fork without the backdoors.
Maybe someone can disrupt the open source funding problem by brokering exploit bounties /s
Probably a state actor. You can look far into the future when you’re working for the party.
Which like, also wouldn't be totally weird if I found out that the xz or whatever library maintainer worked for the DoE as a researcher? I kind of expect governments to be funding this stuff.
From what I read on Mastodon, the original maintainer had a personal-life breakdown, etc. Their interest in staying on as primary maintainer is gone.
This is a very strong argument for FOSS to pick up the good habit of ditching/un-mainlining projects that are just sitting around for state actors to volunteer commits to, and dep-stripping this cruft out of active projects.
Who wants to maintain a shitty compression format? Someone who is dep-hunting, it turns out.
Okay, so your pirate-torrent person needs liblzma.so. Offer it in the scary/oldware section of the package library that you have to hunt down instructions to turn on. Let the users see that it's marked as obsolete; enterprises will see that it should go on the banlist.
Collin worked on XZ and its predecessor for ~15 years. It seems that he did it for free, at least in recent times. Anyone would lose motivation working for free over that period of time.
At the same time, XZ became a cornerstone of major Linux distributions, being a systemd dependency and loaded, in particular, as part of sshd. What could go wrong?
In hindsight, Red Hat's commercial idea of utilizing the free work of thousands of developers working "just for fun" turned out to be not so brilliant.
Um, what? This incident is turning into such a big deal because xz is deeply ingrained as a core dependency in the software ecosystem. It's not an obscure tool for "pirates."
[dead]
And that long term perspective could be used constructively instead!
(I detached this subthread from https://news.ycombinator.com/item?id=39866275, for the sake of pruning the top heavy thread.)
More likely that the account of that dev was breached, don't you think?
Warning, drunk brain talking. But an LLM-driven, email-based "collaborator" could play a very long game, adding basic features to a codebase whilst earning trust, backed by a generated online presence. My money is on a resurgence of the Web of Trust.
The web of trust is a really nice idea, but it works badly against this kind of attack. Just consider that in the real world, most living people (all eight billion) are linked by only six degrees of separation. It really works, for code and for trusted social relations (like "I lend you 100 bucks and you pay me back when you get your salary"), mostly when you know the code author in person.
This is also not a new insight. At the beginning of the noughties, there was a web site named kuro5hin.org, which experimented with user ratings and trust networks. It turned out to be impossible to prevent take-overs.
IIRC, kuro5hin and others all left out a crucial step in the web-of-trust approach: There were absolutely no repercussions when you extended trust to somebody who later turned out to be a bad actor.
It considers trust to be an individual metric instead of leaning more into the graph.
(There are other issues, e.g. the fact that "trust" isn't a universal metric either, but context dependent. There are folks whom you'd absolutely trust to e.g. do great & reliable work in a security context, but you'd still not hand them the keys to your car)
At least kuro5hin modeled a degradation of trust over time, which most models still skip.
It'd be a useful thing, but we have a long way to go before there's a working version.
Once you add punishment for handing out trust to bad actors, even in good faith (which you can't prove/disprove anyway), then you also need to somehow provide significant rewards for handing out trust to good actors; otherwise everyone is going to play it safe and not vouch for anyone, and your system becomes useless.
There were experiments back in the day. Slashdot had one system based on randomly assigned moderation duty which worked pretty great actually, except that for the longest time you couldn't sort by it.
Kuro5hin had a system which didn't work at all, as you mentioned.
But the best was probably Raph Levien's Advogato. That had a web of trust system which actually worked. But had a pretty limited scope (open source devs).
Now everyone just slaps an upvote/downvote button on and calls it a day.
Clearly a human is even better at it.
State actors have equally long horizons to compromise.
State level actor? China?
You're likely being downvoted because the Github profile looking like east Asian isn't evidence of where the attacker/attackers are from.
Nation states will go to great lengths to disguise their identity: using broken Russian-flavored English when they are not Russian, putting code comments in another language, and all sorts of other misdirection.
That's certainly true; at the very least it "seems" Asian, but it could very well be from any nation. If they were patient enough to work up to this point, they would likely not be dumb enough to leak such information.
Otoh, with these kinds of kneejerk "antiracist" reactions, the best way to disguise a Chinese operative becomes to pretend you're Chinese.
Looks like Lasse Collin has commented on LKML: https://lkml.org/lkml/2024/3/30/188
Also, some info here: https://tukaani.org/xz-backdoor/
Or if you can't stand the lkml.org UI:
https://lore.kernel.org/lkml/20240330144848.102a1e8c@kaneli/
The terrifying part is that this was primarily found because the backdoor was poorly made and causing performance problems.
Makes you wonder what more competent actors can do.
I've analysed the backdoor myself and it's very sophisticated, not poorly made at all. The performance problem is surprising in this context, but I think next time they won't make that mistake.
I guess it seems like the operational parts are a bit poorly done: Valgrind issues, adding a new version with symbols removed, the aforementioned performance issues. Like, I would assume the type of person who would do this sort of thing, over a 2-year period no less, would test extensively and be sure all their i's are dotted. It's all kind of surprising given how audacious the attack is.
There are so many variations of Linux/FreeBSD and weird setups and environments that it's almost guaranteed that you'll hit a snag somewhere if you do any major modification like inserting a backdoor.
It's hard enough to get code to work correctly; getting it to also covertly do something else is even harder.
The way they went around it, however, was brilliant. Completely reduce the variables to directly target whatever it is you're attacking. Reminds me of stuxnet somewhat.
Note that in this case the backdoor was only inserted in some tarballs and enabled itself only when building deb/rpm packages for x86-64 linux and with gcc and the gnu linker. This should already filter out the most exotic setups and makes it harder to reproduce.
But they almost got away with it. We could have found ourselves 5 years later with this code in all stable distribution versions, IoT devices etc.
Also, we only catch the ones that we ... catch. The ones that do everything perfectly, unless they come out and confess eventually, we don't get to "praise" them for their impeccable work.
Do you have a writeup or any details as to what it does? The logical thing based on this post is that it hooks the SSH key verification mechanism to silently allow some attacker-controlled keys but I wonder if there's more to it?
I was starting one, but the openwall message linked here is far more detailed and gets much further than I did. It's fiendishly difficult to follow the exploit.
sshd starts with root privileges and then proceeds to, in summary:[1]
1. Parse command line arguments
2. Setup logging
3. Load configuration files
4. Load keys/certificates into memory (notably including private keys)
5. Listen on a socket/port for incoming connections
6. Spawn a child process with reduced permissions (on Linux, using seccomp filters [2]) to respond to each incoming connection request
This backdoor executes at order 0 before sshd's main function is invoked, overwriting internal sshd functions with compromised ones. As some ideas of what the backdoor could achieve:
1. Leak server private keys during handshakes with users (including unauthenticated users) allowing the keys to be passively stolen
2. Accept backdoor keys as legitimate credentials
3. Compromise random number generation to disable perfect forward secrecy
4. Execute code on the host (supplied remotely by a malicious user) with the 'root' permissions available to sshd upon launch. On most Linux distributions, systemd-analyze security sshd.service will give a woeful score of 9.6/10 (10 being the worst).[3] There is essentially NO sandboxing used, because an assumption is made that you'd want to log in as root with sshd (or sudo/su to root) and thus would not want to be restricted in which filesystem paths and system calls your remote shell can then invoke.
The same attacker has also added code to Linux kernel build scripts which causes xz to be executed (xz at this point has a backdoor compiled into it) during the build of the Linux kernel where xz compression is used for the resulting image. Using this approach, the attacker can selectively choose to modify certain (or all) Linux kernel builds to do some very nasty things:
1. Leak Wireguard keys allowing them to be passively intercepted.
2. Compromise random number generation, meaning keys may be generated with minimal entropy (see Debian certificate problem from a few years ago).
3. Write LUKS master keys (keys used by dm-crypt for actually decrypting disks) to disks in retrievable format.
4. Introduce remote root code execution vulnerabilities into basic networking features such as TCP/IP code paths.
[1] 'main' function: https://anongit.mindrot.org/openssh.git/tree/sshd.c
[2] https://anongit.mindrot.org/openssh.git/tree/sandbox-seccomp...
[3] https://github.com/gentoo/gentoo/blob/HEAD/net-misc/openssh/...
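As an aside, a quick (and deliberately incomplete) way to check whether your sshd pulls liblzma into its process, hedged sketch:

```
# A library pulled in later via dlopen() will NOT show up here.
ldd /usr/sbin/sshd | grep -i liblzma || echo "no direct liblzma link"
```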
So many malicious actors have been caught because they accidentally created a mild annoyance for someone that went on to bird-dog the problem.
Which is why a really good backdoor is a one line logic bug somewhere which is fiendishly difficult to trigger.
http://underhanded-c.org if people want examples of what could (and probably, somewhere, IS) being done.
Like the 2003 Linux kernel attempt https://lwn.net/Articles/57135/
Sure, however the problem that software is really hard also impacts bad actors. So it's probably at least as hard to write that one line logic bug and have it do exactly what you intended as to write equivalent real code that works precisely as intended.
Unrelated: as a dog/pointer lover I really like the term "to bird-dog the problem". Never heard it before (I am from Germany though).
I’m from the U.S. and have never heard it either, and don’t understand what it means.
Pointing dogs (bird dogs) are bred to point in the direction where they have perceived game. Good dogs are then not distracted by anything and stand there motionless, sometimes to the point that they have to be carried away because they cannot turn away themselves.
It's somewhat regional, and it means to hunt down the target at the expense of everything else, as a dedicated hunting dog might.
Case in point: https://news.ycombinator.com/item?id=39843930
You must mean, "Makes you wonder what more competent actors are doing"
s/can do/have done/
Funny how Lasse Collin started CCing himself and Jia Tan from 2024-03-20 (that was a day with tons of xz kernel patches); he never did that before. :)
https://lore.kernel.org/lkml/20240320183846.19475-2-lasse.co...
This is extremely suspicious.
It looks like someone may have noticed an unmaintained or lightly maintained project related to various things, and moved to take control of it.
Elsewhere in the discussion someone mentions the domain details changed; if you have control of the domain, you have control of all email addresses associated with it.
Also interesting, to me, how the GMail account for the backdoor contributor ONLY appears in the context of "XZ" discussions. Google their email address. Suggests a kind of focus, to me, and a lack of reality / genuineness.
This also means that Google might know who they are, unless they were careful to hide behind VPN or other such means.
those pipe usages are quite suspicious
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-n...
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-n...
piping into this shell script which now uses "eval"
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-n...
I guess this will be revisited and removed soon
> piping into this shell script which now uses "eval"
I don’t actually see an issue with that `eval`. Why would one consider running `xz` followed by `eval`-ing its output more insecure than just running `xz`? If `xz` wants to do shenanigans with the privileges it already has, then it wouldn’t need `eval`’s help for that.
just take a closer look at the analysis https://www.openwall.com/lists/oss-security/2024/03/29/4
Then try to understand the pattern: they planted the backdoor by modifying the build process of packages. Now consider that $XZ is also from a backdoored build, and that the invocation is recognizable in the same way, via the --robot --version parameters and a shell environment hinting at "xz_wrap.sh" from the piping process. That's a lot for the $XZ process to recognize that it is running as part of a kernel build.
Maybe they put advanced stuff into a backdoored $XZ binary to modify the kernel in a similar way to how they modified lzma-based packages in the build process.
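For context, the pattern under discussion looks roughly like this. `xz --robot --version` genuinely prints shell variable assignments intended to be eval'd; the kernel script's exact contents may differ (hedged sketch):

```
$ xz --robot --version
XZ_VERSION=50060012
LIBLZMA_VERSION=50060012

# which a build script can consume as:
eval "$(xz --robot --version)"
test "$XZ_VERSION" -ge 50020000 && echo "xz is new enough"
```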
Because in order to put a backdoor into the xz executable, you need to infect its sources. And in order to infect the sources, you need to use a similar technique to hide the modification.
"started to cc himself" seems to be simply "contributing to a new project and not having git-send-email fully set up". By default git-send-email Cc the sender, though in practice it's one of the first options one changes.
My favorite part was the analysis of "I'm not really a security researcher or reverse engineer but here's a complete breakdown of exactly how the behavior changes."
You only get this kind of humility when you're working with absolute wizards on a consistent basis.
That's completely crazy: the backdoor is introduced through a very cryptic addition to the configure script. Just looking at the diff, it doesn't look malicious at all; it looks like build-script gibberish.
Thanks to autoconf, we're now used to build scripts looking like gibberish. A perfect place to hide a backdoor.
This is my main take-away from this. We must stop using upstream configure and other "binary" scripts. Delete them all and run "autoreconf -fi" to recreate them. (Debian already does something like this, I think.)
> We must stop using upstream configure and other "binary" scripts. Delete them all and run "autoreconf -fi" to recreate them.
I would go further than that: all files which are in a distributed tarball, but not on the corresponding git repository, should be treated as suspect.
Distributing these generated autotools files is a relic of times when it could not be expected that the target machine would have all the necessary development environment pieces. Nowadays, we should be able to assume that whoever wants to compile the code can also run autoconf/automake/etc to generate the build scripts from their sources.
And other than the autotools output, and perhaps a couple of other tarball build artifacts (like cargo simplifying the Cargo.toml file), there should be no difference between what is distributed and what is on the repository. I recall reading about some project to find the corresponding commit for all Rust crates and compare it with the published crate, though I can't find it right now; I don't know whether there's something similar being done for other ecosystems.
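A hedged sketch of that comparison for a single project (names, tag, and tarball path are illustrative):

```
# Compare a release tarball against the git tag it claims to come from.
mkdir -p /tmp/from-git /tmp/from-tar
git clone https://example.org/proj.git && cd proj
git archive --prefix=proj-1.2.3/ v1.2.3 | tar -x -C /tmp/from-git
tar -xzf ../proj-1.2.3.tar.gz -C /tmp/from-tar
diff -r /tmp/from-git/proj-1.2.3 /tmp/from-tar/proj-1.2.3
```

Anything the diff reports beyond expected build artifacts (configure, Makefile.in, and the like) deserves scrutiny.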
One small problem with this is that autoconf is not backwards-compatible. There are projects out there that need older autoconf than distributions ship with.
Why do we distribute tarballs at all? A git hash should be all that's needed...
> I would go further than that: all files which are in a distributed tarball, but not on the corresponding git repository, should be treated as suspect.
This, plus an automated A/B diff to check the tarball against the repo and flag mismatches.
The backdoor is in an .m4 file that gets parsed by autoconf to generate the configure script. Running autoconf yourself won't save you.
That's not entirely true. autoreconf will regenerate m4/build-to-host.m4 but only if you delete it first.
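So a fuller regeneration would look something like this hedged sketch (the shipped file reportedly carried a bumped serial number, which is why a plain autoreconf -f left it alone):

```
rm -f m4/build-to-host.m4   # remove the shipped copy so it gets regenerated
autoreconf -fiv
./configure && make
```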
It seems like this was the solution for archlinux, pull directly from the github tag and run autogen: https://gitlab.archlinux.org/archlinux/packaging/packages/xz...
it's shocking how many packages on distros are just one random tarball from the internet with lipstick
Oh come on, please, let's put autotools out to pasture. I've lost so much of my life fighting autotools crap compared to "just use meson".
As long as we also exterminate its even more evil brother: libtool.
I don't think it would help much. I work on machine learning frameworks. A lot of them (and of the math libraries) rely on just-in-time compilation. None of us has the time or expertise to inspect the JIT-ed assembly code. Not to mention that much of the code deliberately reads/writes out of bounds, which is not an issue if you always add some extra bytes at the end of each buffer, but it can render most memory-sanitizer tools useless. When you run their unit tests, you run the JIT-ed code, and then a lot of things could happen. Maybe we should ask all packaging systems to split their builds into separate compile and test stages, to ensure that test code cannot affect the binaries that are going to be published. I would rather read and analyze the generated code than the code that generates it.
I always run autoreconf -ifv first.
In this case it wouldn't be sufficient. You had to also delete m4/build-to-host.m4 for autoreconf to recreate it.
Maybe it's time to dramatically simplify autoconf?
How long do we need to (pretend to) keep compatibility with pre-ANSI C compilers, broken shells on exotic retro-Unixes, and scripts that check how many bits are in a byte?
Not just autoconf. Build systems in general are a bad abstraction, which leads to lots and lots of code to try to make them do what you want. It's a sad reality of the mismatch between a procedural task (compile files X, Y, and Z into binary A) and what we want (compile some random subset of files X, Y, and Z, doing an arbitrary number of other tasks first, into binary B).
For fun, you can read the responses to my musing that maybe build systems aren't needed: https://news.ycombinator.com/item?id=35474996 (People can't imagine programming without a build system - it's sad)
Autoconf is m4 macros and Bourne shell. Most mainstream programming languages have a packaging system that lets you invoke a shell script. This attack is a reminder to keep your shell scripts clean. Don't treat them as an afterthought.
I'm wondering: is there no way to add an automated flagging system that diff-checks the tarball contents against the repo's files and warns if there's a mismatch? This would be on GitHub's end, say, so that there'd be this sort of automated integrity test and subsequent warning. Just a thought, since tainted tarballs like these might altogether be (and become) a threat vector, regardless of the repo.
Maybe the US Government needs to put its line in the sand and mandate the end of autotools. :D
It looks like they are trying to get rid of C, so maybe we're in luck!
It looks like an earlier commit with a binary blob "test data" contained the bulk of the backdoor, then the configure script enabled it, and then later commits patched up valgrind errors caused by the backdoor. See the commit links in the "Compromised Repository" section.
Also, it seems the same user who made these changes was still submitting changes to various repositories as of a few days ago. Maybe those projects need to temporarily stop accepting commits until further review is done?
The use of "eval" stands out, or at least it should stand out – but there are two more instances of it in the same script, which presumably are not used maliciously.
A while back there was a discussion[0] of an arbitrary code execution vulnerability in exiftool which was also the result of "eval".
Avoiding casual use of this overpowered footgun might make it easier to spot malicious backdoors. Usually there is a better way to do it in almost all cases where people feel the need to reach for "eval", unless the feature you're implementing really is "take a piece of arbitrary code from the user and execute it".
Unfortunately, in a shell script eval affects the semantics but isn't needed just to get some parsing of a variable's contents, unlike in Python or Perl or JavaScript. A
$goo
line (without quotes) will already do word splitting, though it won't do another layer of variable expansion and unquoting; for that you'll need
eval "$goo"
(this time with quotes). eval in autoconf macros is nothing unusual.
In (pre-backdoor) xz 5.4.5:
$ grep -wl eval m4/*
m4/gettext.m4
m4/lib-link.m4
m4/lib-prefix.m4
m4/libtool.m4
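A toy demo of the word-splitting vs. eval distinction described above:

```
goo='ls -l "$HOME"'
$goo          # word splitting only: ls gets -l and the literal string "$HOME"
eval "$goo"   # full re-parse: $HOME is expanded and the quotes are removed
```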
> Usually there is a better way to do it in almost all cases where people feel the need to reach for "eval"
unfortunately that's just standard in configure scripts, for example from Python:

```
$ grep eval Python-3.12.2/configure | wc -l
165
```

and it's 32,958 lines of code, with plenty of binary fixtures in the tarball to hide stuff as well.
who knows, but I have a feeling us finding the backdoor in this case was more of a happy accident.
Yeah, now imagine they succeeded and it didn't cause any performance issues...
Can we even be sure no such successful attempt has already been made?
You can be certain it has happened, many times. Now think of all the software we mindlessly consume via docker, language package managers, and the like.
Remember, there is no such thing as computer security. Make your decisions accordingly :)
No, we can't.
A big part of the problem is all the tooling around git (like the default GitHub UI) which hides diffs for binary files like these pseudo-"test" files. That makes them an ideal place to hide exploit data, since comparatively few people would bother opening a hex editor manually.
How many people read autoconf scripts, though? I think those filters are symptom of the larger problem that many popular C/C++ codebases have these gigantic build files which even experts try to avoid dealing with. I know why we have them but it does seem like something which might be worth reconsidering now that the tool chain is considerably more stable than it was in the 80s and 90s.
How many people read build.rs files of all the transitive dependencies of a moderately large Rust project?
Autoconf is bad in this respect but it's not like the alternatives are better (maybe Bazel).
The alternatives are _better_ but still not great. build.rs is much easier to read and audit, for example, but it’s definitely still the case that people probably skim past it. I know that the Rust community has been working on things like build sandboxing and I’d expect efforts to be a lot easier there than in a mess of m4/sh where everyone is afraid to break 4 decades of prior usage.
Bazel has its problems, but the readability is definitely better. And Bazel BUILD files are quite constrained in what they can do.
I mean, autoconf is basically a set of template programs for sniffing out whether a system has X symbol available to the linker. Any replacement for it would end up morphing into it over time.
Some things are just that complex.
We have much better tools now and much simpler support matrices, though. When this stuff was created, you had more processor architectures, compilers, operating systems, etc. and they were all much worse in terms of features and compatibility. Any C codebase in the 90s was half #ifdef blocks with comments like “DGUX lies about supporting X” or “SCO implemented Y but without option Z so we use Q instead”.
I don't see how showing the binary diffs would help. 99.99999% of people would just scroll right past them anyways.
Even in binary you can see patterns. I'm not saying showing binary diffs is perfect (though it is better than showing nothing), but even my slow mammalian brain can spot obvious human-readable characters in various binary encoding formats. If I see a few in a row that don't make sense, why wouldn't I poke at it?
This particular file was described as an archive file with corrupted data somewhere in the middle. Assuming you wanted to scroll that far through a hexdump of it, there could be pretty much any data in there without being suspicious.
What should I look for? The evil bit set?
Sure, the same person who's gonna be looking is the same person who'd click "show diff"
00011900: 0000 4883 f804 7416 b85f 5f5f 5f33 010f  ..H...t..____3..
00011910: b651 0483 f25a 09c2 0f84 5903 0000 488d  .Q...Z....Y...H.
00011920: 7c24 40e8 5875 0000 488b 4c24 4848 3b4c  |$@.Xu..H.L$HH;L
00011930: 2440 7516 4885 c074 114d 85ff 0f84 3202  $@u.H..t.M....2.
00011940: 0000 498b 0ee9 2c02 0000 b9fe ffff ff45  ..I...,........E
00011950: 31f6 4885 db74 0289 0b48 8bbc 2470 1300  1.H..t...H..$p..
00011960: 0048 85ff 0f85 c200 0000 0f57 c00f 2945  .H.........W..)E
00011970: 0048 89ac 2470 1300 0048 8bbc 2410 0300  .H..$p...H..$...
00011980: 0048 8d84 2428 0300 0048 39c7 7405 e8ad  .H..$(...H9.t...
00011990: e6ff ff48 8bbc 24d8 0200 0048 8d84 24f0  ...H..$....H..$.
000119a0: 0200 0048 39c7 7405 e893 e6ff ff48 8bbc  ...H9.t......H..
000119b0: 2480 0200 0048 8d84 2498 0200 0048 39c7  $....H..$....H9.
000119c0: 7405 e879 e6ff ff48 8bbc 2468 0100 004c  t..y...H..$h...L
Please tell me what this code does, Sheldon.
You're right - the two exploit files are lzma-compressed and then deliberately corrupted using `tr`, so a hex dump wouldn't show anything immediately suspicious to a reviewer.
Mea culpa!
Is this lzma compressed? Hard to tell because of the lack of formatting, but this looks like amd64 shellcode to me.
But that's not really important to the point - I'm not looking at a diff of every committed favicon.ico or ttf font or a binary test file to make sure it doesn't contain a shellcode.
Test data should not be on the same machine where the build is done. Test data (and tests generally) aren't as well audited, and therefore shouldn't be allowed to leak into the finished product.
Sure, you want to test stuff, but that can be done with a special "test build" in its own VM.
In the Bazel build system, you would mark the test data blob as testonly=1. Then the build system guarantees that the blob can only be used in tests.
This incident shows that killing the autoconf goop is long overdue.
That could easily double build cost. Most open-source package repositories are not exactly in a position to splurge on their build infra.
In this case the backdoor was hidden in a nesting doll of compressed data manipulated with head/tail and tr, even replacing byte ranges in between. It would've been impossible to find if you were just looking at the test fixtures.
> "Given the activity over several weeks, the committer is either directly involved or there was some quite severe compromise of their system. Unfortunately the latter looks like the less likely explanation, given they communicated on various lists about the "fixes" mentioned above."
Crazy indeed.
So when are we going to stop pretending that OSS maintainers/projects are reaping what they sow when they "work for free" and give their source code away under OSS licenses while large companies profit off of them? If they were paid more (or, in some cases, actually paid at all), they could afford to quit their day jobs, reducing burnout; they could hire a team of trusted, vetted devs instead of relying on the goodwill of strangers who step up "just to help them out"; and they could pay security researchers to vet their code.
Turns out burned-out maintainers are a great attack vector, and if you are willing to play the long game you can ingratiate yourself with the community through seemingly innocuous contributions.
Paid people get burnt out as well and they are just as likely to accept free help as an unpaid person.
That's true, but many of these maintainers work a day job on top of doing the open source work precisely because the open source work doesn't pay the bills. If they could get back 40 hours of their time I think many would appreciate it
> So when are we going to stop pretending ...
I'm not sure that we are. Doesn't everybody know that developing/maintaining free software is largely thankless work, with little to no direct recompense?
I don't think moving towards unfree software is a good way to make free software more secure. It shouldn't be a surprise that proprietary software is less likely to be exploited in this way simply because they don't accept any patches from outside of the team. What you want is more people that understand and care about free software and low barriers to getting involved.
> Doesn't everybody know that developing/maintaining free software is largely thankless work, with little to no direct recompense?
No, I don't think that is a universally acknowledged feeling. Numerous maintainers have detailed receiving entitled demands from users, as if they were paying customers of the open source projects. Georges Stavracas' interview on the Tech over Tea podcast^1 describes many such experiences. Similarly, when Aseprite transitioned its license^2 to secure financial stability, it faced backlash from users accusing the developer of betraying and oppressing the community.
On the flipside, if everyone truly does know this is the case, then it's a shame that so many people know, and yet are unwilling to financially support developers to change that. See all of the developers for large open source projects who have day jobs, or take huge pay cuts to work on open source projects. I get that not everyone can support a project financially, but I've personally tried to break that habit of expecting everything I use to be free, and go out of my way to look for donation buttons for project maintainers, and raise awareness during fundraisers. Now if only I could donate directly to Emacs development... I'd encourage other people to do the same.
> What you want is more people that understand and care about free software and low barriers to getting involved.
This is tough. For example, initiatives like DigitalOcean's Hacktoberfest are designed to do just this. It is a good idea in theory (submit 4 pull requests, win a t-shirt) but not in practice: the event has been criticized for inadvertently encouraging superficial contributions, such as minor text edits or trivial commits, which burden maintainers^3, causing many maintainers to just archive their repos for the month of October.
So, while there's a recognition of the need for more people who understand and value free software, along with lower barriers to entry, the current state of affairs often falls short. The path forward should involve not just increasing awareness and participation but also providing meaningful support and compensation to maintainers. By doing so, we can foster a more sustainable, secure, and vibrant open source community. Or at least that is how I feel...
1. https://www.youtube.com/watch?v=kO0V7BE1bEo 2. https://github.com/aseprite/aseprite/issues/1242 3. https://twitter.com/shitoberfest?lang=en
To be clear, I'm not at all against compensating developers for their work. I am not trying to argue that people do not need to be supported financially, or that you shouldn't donate, or that no one should be able to make a living working on free software, and so on.
What I'm saying is that paying people (or having a trusted security team) to work on software necessarily makes it less free. Note that "less free" doesn't mean worthless and absolutely free isn't the ideal.
Sorry, I used "everybody" to mean a subset of everybody -- the "we" that you referred to, or people generally involved in open source software development.
> it's a shame that so many people know, and yet are unwilling to financially support developers to change that.
Regardless of which set of everyone, this is undoubtedly the case. However, I'm not sure that paying (some of) the developers a wage is the best way to improve the software, particularly as free software.
> initiatives like DigitalOcean's Hacktoberfest are designed to do just this.
You've got this backwards. Hacktoberfest is a scheme to pay (more) people to contribute (more) to open source projects. This is an example of why paying people to work on open source doesn't necessarily improve the software. It also doesn't lower any barriers, it just increases the incentive to overcome them.
So while this might increase the number of people contributing[0] to open source projects, it doesn't directly increase the number of people who understand and care about the specific project they're contributing to, let alone the broader free software movement.
In short, you can't pay people to care.
[0] according to their pretty weak metric for what contributing is
OSS maintainers aren't reaping anything. Most OSS licenses say the software is provided without warranty.
The discussion to upload it to Debian is interesting on its own https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1067708
Wow, that's a lot of anonymous accounts adding comments there urging for a fast merge!
And this "Hans Jansen" guy is apparently running around salsa.debian.org pushing for more updates in other projects as well: https://salsa.debian.org/users/hjansen/activity
>that's a lot of anonymous accounts
Just FYI, krygorin4545@proton.me (the latest message before the upload) was created Tue Mar 26 18:30:02 UTC 2024, about an hour before the message was posted.
Proton generates a PGP key upon account creation, and the key carries the real creation datetime (though no timezone information).
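For anyone who wants to reproduce that check, a hedged sketch (Proton exposes public keys over an HKP-style endpoint; the exact URL form may differ):

```
curl -s 'https://api.protonmail.ch/pks/lookup?op=get&search=krygorin4545@proton.me' > key.asc
gpg --list-packets key.asc | grep created   # key creation timestamp
```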
> running around salsa.debian.org pushing for more updates in other projects as well
This is quite common in most (all?) distributions. People are going through lists of outdated packages, updating them, testing them, and pushing them.
That account seems to be a contributor for xz though, you can see him interact a lot with the author of the backdoor on the GitHub repo. Some pull requests seem to be just the two of them discussing and merging stuff (which is normal but looks weird in this context)
And now we see why I don't trust anons, aliases, or anime characters to make contributions.
My GitHub says exactly who I am!
Even if they have a "real" picture or a credible description that is not good enough. Instead of using an anime character a malicious actor could use an image generator [0], they could generate a few images, obtain something credible to most folks, and use that to get a few fake identities going. Sadly, trusting people to be the real thing and not a fake identity on the Internet is difficult now and it will get worse.
You can quite easily generate a realistic photo, bio, even entire personal blogs and GitHub projects, using generative AI, to make it look like it's a real person.
With close to zero OSS participation rate you can just pick a real living person and just keep in sync with their LinkedIn.
It has been on the agenda for years to identify FOSS contributors with an id… Wet dream for authoritarians like you.
What would it solve when identity theft happens on a mass scale on a day to day basis?
It'd just ruin the life of some random person whose identity got stolen to create the account…
That name jumped out at me: Hans Jansen is the name Dominic Monaghan used when posing as a German interviewer with Elijah Wood. Not that it can't be a real person.
Hans Gruber would have been a much more stylish choice…
See comments about "Hans Jansen" upthread; he appeared to collaborate on the exploit in other ways as well.
For anyone else feeling some deja vu about ifunc / Valgrind errors, this Red Hat issue [1] was previously linked from HN 12 days ago [2].
I get the feeling that a number of the comments are all the same person / group.
I'd love to be at Microsoft right now and have the power to review this user's connection history to GitHub. Even though VPNs exist, many things can be learned from connection habits: links to ISPs, maybe even a guess at whether VPNs were used; round-trip times on connections can give hints.
I really don't think some random guy wants to weaken ssh just to extract some petty ransomware cash from a couple targets.
> I really don't think some random guy wants to weaken ssh just to extract some petty ransomware cash from a couple targets.
Which is why there's probably nothing remotely interesting in them logs.
Intelligence agencies get caught red handed all the time so I wouldn't be too sure.
If it was an organised group I'm sure they were careful, of course, but it only takes one fuckup.
Nah. I'm sure Microsoft got a call from the alphabet boys and nobody, not even internal employees are allowed to look at the logs right now.
Oh my, another reason not to use GitHub. :D So many reasons popping up just in this comment section alone.
I'm guessing Microsoft just got a call from the Government telling them not to look too deeply into it.
That’d be illegal for an employee to do.
Why?
You can't access personally identifying information out of curiosity. That's a fireable offense in nearly any company that cares about privacy.
https://github.com/tukaani-project/tukaani-project.github.io...
> Note: GitHub automatically includes two archives Source code (zip) and Source code (tar.gz) in the releases. These archives cannot be disabled and should be ignored.
The author was thinking ahead! Latest commit hash for this repo: 8a3b5f28d00ebc2c1619c87a8c8975718f12e271
Btw, this is not the only project providing a source tarball different from the git repo, for example libusb also does this (and probably others):
- https://github.com/libusb/libusb/issues/1468#issuecomment-19...
It's very common in autoconf codebases because the idea is that you untar and then run `./configure ...` rather than `autoreconf -fi && ./configure ...`. But to do that either you have to commit `./configure` or you have to make a separate tarball (typically with `make dist`). I know because two projects I co-maintain do this.
It's common but it's plain wrong. A "release" should allow building the project without installing dependencies that are only there for compilation.
Autotools are not guaranteed to be installed on any given system. For example, they aren't on the macOS runners of GitHub Actions.
It's also a UX issue: autoreconf failures are pretty common, and if you don't make it easy for users to actually use your project, you lose some of them.
> A "release" should allow to build the project without installing dependencies that are only there for compilation.
Like a compiler or some -devel packages?
> [...] A "release" should allow to build the project without installing dependencies that are only there for compilation.
Built artifacts shouldn't require build-time dependencies to be installed, yes, but we're talking about source distributions. Including `./configure` is just a way of reducing the configuration-/build-time dependencies for the user.
> Autotools are not guaranteed to be installed on any system. [...]
Which is why this is common practice.
> It's common but it's plain wrong.
Strong word. I'm not sure it's "plain wrong". We could just require that users have autoconf installed in order to build from sources, or we could commit `./configure` whenever we make a release, or we could continue this approach. (For some royal we.)
But stopping this practice won't prevent backdoors. I think a lot of people in this thread are focusing on this as if it was the source of all evils, but it's really not.
What's the problem with running "autoreconf -fi", though?
Very strange argument. It's like shipping a prebuilt binary in your source release because otherwise the user would have to run "make".
If that’s such a big hassle for your downstream consumers, maybe one should use something better than autoconf in the first place.
Autotools are not backwards-compatible. Often only a specific version of autotools works. Only the generated configure is supposed to be portable.
It's also not the distribution model for an Autotools project. Project distributions would include a handwritten configure script that users would run: the usual `./configure && make && make install`. Since those configure scripts became more and more complex to support diverse combinations of compiler and OS, the idea of autotools was for maintainers to generate them. Autotools themselves were not meant to be run by the end user: https://en.wikipedia.org/wiki/GNU_Autotools#Usage
For running autoreconf you need to have autotools installed and even then it can fail.
I have autotools installed and despite that autoreconf fails for me on the xz git repository.
The idea of having configure as a convoluted shell script is that it runs everywhere without any additional tooling. If it isn't shipped, you're burdening your consumers with compilation dependencies that are not needed for running your software.
For a long time, there was one legitimately annoying disadvantage to the git-generated tarballs: they lost tagging information. However, since git 2.32 (released June 2021; presumably available on GitHub by August 2021 when they blogged about it) you can use `$Format:%(describe)$`, limited to once per archive for performance reasons.
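Concretely, something like this hedged sketch:

```
# Mark a file for substitution when archives (incl. GitHub's) are generated.
echo 'version.txt export-subst' >> .gitattributes
printf '$Format:%%(describe)$\n' > version.txt
git add .gitattributes version.txt && git commit -m 'embed version in archives'
# Later: git archive v1.2.3 | tar -xOf - version.txt   -> prints e.g. v1.2.3
```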
Except this change was made in 2023; it is just scary how good this threat actor was.
I believe they also do not include submodules, which is a big disadvantage for some projects.
Yes. Also, GitHub recently made some upgrades that forced checksum changes on the autogenerated archives: https://github.blog/changelog/2023-01-30-git-archive-checksu...
Jia Tan "cleaned up" in all their ZSTD branches some hours ago, probably hiding something https://github.com/JiaT75/zstd/branches/all
GitHub/Microsoft likely has a backup. I’d be getting those out about now.
Bad move. Destroying evidence is a felony.
If only you could prosecute people in adversarial countries for a felony, lol.
You can if you can get them extradited (from any country, not just their home country).
If you are this deep into it, it doesn't matter.
Not everywhere, and only if you can prove that there was evidence :)
Comment from Andres Freund on how and why he found it [0], plus more information in the LWN story about the backdoor. I recommend people read this to see how close we came (and think about what this is going to mean for the future).
That man deserves a Nobel Prize
A mirror of the offending repository created by someone else is available at [1]. GitHub should be keeping the evidence in the open (even if just renamed or archived in a safer format) instead of deleting it/hiding it away.
The offending tarball for v5.6.1 is easier to find, one example being [2].
m4/.gitignore was updated 2 weeks ago to hide build-to-host.m4 that is only present in the release tarball and is used to inject the backdoor at build time.[3]
[1] https://git.phial.org/d6/xz-analysis-mirror
[2] https://mirrors.xtom.ee/gentoo/distfiles/9f/xz-5.6.1.tar.gz
[3] https://git.phial.org/d6/xz-analysis-mirror/commit/4323bc3e0...
This gist summarizes the current situation very well: https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78b...
It definitely looks like some sort of state actor. This is very well done, and all in plain sight. It's reassuring that it was discovered, but given that a simple audit of the release build artifacts would have raised alarms, how prevalent is this behavior in other projects? Terrifying stuff.
A lot of eyes will be dissecting this specific exploit, and investigating this specific account, but how can we find the same kind of attack in a general way if it’s being used in other projects and using other contributor names?
1. Everything must be visible. A diff between the release tarball and tag should be unacceptable. It was hidden from the eyes to begin with.
2. Build systems should be simple and obvious. Potentially not even code. The inclusion was well hidden.
3. This was caught through runtime inspection. It should be possible to halt any Linux system at runtime, load debug symbols and map _everything_ back to the source code. If something can't map back then regard it as a potentially malicious blackbox.
There has been a strong focus and joint effort to make distributions reproducible. What we haven't managed, though, is to prove that a running system comprises only freshly compiled content. Sort of a build-time/runtime "libre" proof.
This should exist for good debugging anyway.
It wouldn't hinder source code based backdoors or malicious vulnerable code. But it would detect a backdoor like this one.
Just an initial thought though, and probably hard to do, but not impossibly hard, especially for a default server environment.
Build-related fixes are only treating the symptoms, not the disease. The real fix would be better sandboxing and capability-based security[1] built into major OSes which make backdoors a lot less useful. Why does a compression library have the ability to "install an audit hook into the dynamic linker" or anything else that isn't compressing data? No amount of SBOMs, reproducible builds, code signing, or banning binaries will change the fact that one mistake anywhere in the stack has a huge blast radius.
[1]: https://en.wikipedia.org/wiki/Capability-based_security
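On the sandboxing point, systemd can already express a fair amount of confinement; a hedged sketch of checking and tightening (illustrative knobs only, not a recommended config, since naive hardening of sshd will break root logins and privilege separation):

```
systemd-analyze security sshd.service   # prints the exposure score
mkdir -p /etc/systemd/system/sshd.service.d
cat > /etc/systemd/system/sshd.service.d/harden.conf <<'EOF'
[Service]
PrivateTmp=yes
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
EOF
systemctl daemon-reload && systemctl restart sshd   # unit name varies by distro
```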
That's why I always raise concerns about JEP 411 - removal of SecurityManager from Java without any replacement.
Just ban autotools
Note that the malicious binary is fairly long and complex.
This attack can be stopped by disallowing any binary testdata or other non-source code to be on the build machines during a build.
You could imagine a simple process which checks out the code, then runs some kind of entropy checker over the code to check it is all unminified and uncompressed source code, before finally kicking off the build process.
autogenerated files would also not be allowed to be in the source repo - they're too long and could easily hide bad stuff. Instead the build process should generate the file during the build.
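A toy version of that entropy/binary check, just to make the idea concrete (hedged: grep -I's binary heuristic is crude, and empty files get flagged too):

```
# Flag tracked files that look like binaries or compressed blobs.
git ls-files -z | while IFS= read -r -d '' f; do
    grep -Iq . "$f" || echo "binary-looking: $f"
done
```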
This requires a more comprehensive redesign of the build process. Most Linux distributions also run the tests of the project they're building as part of the build process.
The code that runs during testing should not be allowed to affect the package, though. If it can, the build is misdesigned.
Profile-guided optimization is, unfortunately, wildly powerful. And it has a hard requirement that a causal link exists from test data (or production data!) to the build process.
We should be able to produce a tar and a proof that the tar was produced from specific source code.
Quote from the article:
That line is not in the upstream source of build-to-host, nor is build-to-host used by xz in git.
Zero-knowledge virtual machines, like cartesi.io, might help with this. The idea is to take the source, run a bunch of computational steps (compilation & archiving), and at the same time produce some kind of signature that certain steps were executed. Verifiers can then easily check the signature and be convinced that the code was executed as claimed and the source code wasn't tampered with.
The advantage of zero-knowledge technology in this case is that one doesn't need to repeat the computational steps themselves nor rely on a trusted party to do it for them (like an automated build, which can also be compromised by state actors). Just having the proof solves this trust problem mathematically: if you have the proof & the tar, you can quickly check that the source code that produced the tar wasn't modified.
I don't think zero-knowledge systems are practical at the moment. It would take around 8 orders of magnitude more compute and memory to produce a ZK proof of a generic computation like compilation. Even 2 orders of magnitude would be barely acceptable.
I've been told verifiable builds are possible already, I don't know how practical though:
twitter.com/stskeeps/status/1774019709739872599
The Guix full source bootstrap is looking less paranoid as time goes on
I haven't looked at Guix, but in the discussions around this exploit for NixOS they mentioned that regenerating the autoshit for xz-utils would not be something they can/want to do, because that would add a lot more dependencies to the bootstrap before other packages can be built. Kind of funny how a requirement for bootstrapped builds can add a requirement for trusting not-quite-binaries-but-also-not-really-source blobs.
More reproducible builds, maybe even across distributions? Builds based on specific commits (not tarballs, like in this case), possibly signed (just for attribution, not for security per se)? Allowing fewer unsafe/runtime modifications? The way the oss-fuzz ASAN build was disabled should've been a warning on its own, if these issues weren't so common.
I'm not aware of any efforts towards it, but libraries should also probably be more confined to only provide intended functionality without being able to hook elsewhere?
NixOS/nixpkgs 23.11 is unaffected; unstable contains the backdoored versions (5.6.0, 5.6.1), but their OpenSSH sshd does not seem to link against systemd/liblzma, and the backdoor doesn't get configured in (that only happens on .deb/.rpm systems).
It may not have really mattered much for NixOS:
> b) argv[0] needs to be /usr/sbin/sshd
For once, the lack of FHS interoperability is a benefit, if only by accident.
Right, but in this case it's not even compiled in, which is arguably better than compiled in but assumed dormant :) (at least until someone actually does a full analysis of the payload).
Note that NixOS has a unique advantage in that `dlopen` is easier to analyze, but you do have to check for it. A lot of people are looking only at `ldd` and missing that they can be vulnerable at runtime.
Not affected by the latest CVE, but the author had unrestricted access to xz for 2 years, so I would say it is affected until the other contributions are proven safe (never gonna happen) or it reverts to pre-adversarial actor version.
That's one of the advantages of NixOS: viruses and mass hacks have a lesser chance of functioning because of how different this OS is. Until it gets more popular, of course.
It's actually not an advantage here. The exploit wasn't included because the attacker specifically decided to inject only into x86-64 Debian and RHEL builds, to reduce the chances of getting detected.
Then it's an actual advantage.
That's just security by obscurity, not something I'd consider a good security measure.
I looked at the differences between the GitHub repository and the released packages. About 60 files are in a release package that are not in the repo (most are generated files for building), but some of the .po files have changes as well.
That's devastating.
If you don't build your release packages from feeding "git ls-files" into tar, you are doing it wrong.
I think this is unfortunately very common practice
Why not `git archive`?
Because I didn't know about it.
Although if I look at its documentation, it's already a somewhat complicated invocation with unclear effects (lots of command-line options). Git seems unable to do KISS.
git ls-files and tar is a simple thing everybody understands and can do without much trouble.
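For comparison, the two approaches side by side (hedged sketch; names are illustrative):

```
# Everything tracked by git, nothing else:
git ls-files -z | tar -czf proj-1.0.tar.gz --null -T -
# Or the built-in equivalent, with a version-prefixed directory:
git archive --format=tar.gz --prefix=proj-1.0/ -o proj-1.0.tar.gz v1.0
```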
The latest commit from the user who committed those patches is weirdly a simplification of the security reporting process, to not request as much detail:
https://github.com/tukaani-project/xz/commit/af071ef7702debe...
Not sure what to make of this.
I think the reason is pretty obvious. They want you to waste more time after you've submitted the security report and maximize the amount of back and forth. Basically the hope is that they'd be able to pester you with requests for more info/details in order to "resolve the issue" which would give them more time to exploit their targets.
That repository is now disabled. But here's a similar change to the .github repository of tukaani-project from @JiaT75 to the bug report template:
+ or create a private Security Advisory instead.
Under a commit titled "Wrap text on Issue template .yaml files."[1] https://github.com/tukaani-project/.github/commit/44b766adc4...
Potentially the purpose is that if someone goes to the effort of getting those details together, they are more likely to send the same report to other trusted individuals. Maybe it was originally there to add legitimacy; then they got a report sent in and removed it to slow the spread of awareness.
> Affected versions of XZ Utils
Most people, to find the affected versions, would either have to bisect or delve deep enough to find the offending commit. Either of which would reveal the attacker.
By not asking for the version, there is a good chance you just report "It's acting oddly, plz investigate".
> "Docs: Simplify SECURITY.md."
https://github.com/tukaani-project/xz/commit/af071ef7702debe...
Removes instructions about details relevant to security reports. Heh, nice one.
It looks like the person who added the backdoor is in fact the current co-maintainer of the project (and the more active of the two): https://tukaani.org/about.html
In various places they say Lasse Collin is not online right now, but he did make commits a week ago https://git.tukaani.org/?p=xz.git;a=summary
Makes me wonder if he's an owner of the github organization, and what happens with it now?
Why has Github disabled the (apparently official) xz repository, but left the implicated account open to the world? It makes getting caught up on the issue pretty difficult, when GitHub has revoked everyone's access to see the affected source code.
https://github.com/tukaani-project/xz vs https://github.com/JiaT75
The account has been suspended for a while, but for whatever reason that's not displayed on the profile itself (it can be seen at https://github.com/Larhzu?tab=following). The repo being disabled is newer, and while annoying and realistically likely pointless, it's not particularly unreasonable to take down a repository containing a real backdoor.
Taking down the repo prevents more people from inadvertently pulling and building the backdoor, so that makes sense. But they should have immediately rehosted and archived the state at a different URL that makes it clear not to use it.
The author (Jia Tan) also changed the xz.tukaani.org release description (actually on github.io, where the main contributor is, surprise, also them) to state that all new releases are signed by their OpenPGP key. I'd guess that was one of the first steps toward a complete project takeover.
I hope Lasse Collin still has control of his accounts, though the CC on the kernel mailing list looks kind of suspicious to me.
The backdoor is not in the C source directly; rather, a build script uses data from files in the test dir to create the backdoor, and only in the release tars. Did I summarize that correctly?
That's how I understand it. A build script that's in the release tarballs but not in the git repo checks whether it's being run as part of a Debian or RPM build process, and then injects content from one of the "test" files.
I could imagine a similar attack against an image processing library: include some "test data" of corrupted images that the library should "clean up" (and have the cleanup actually work!), where the corruption data itself is code to be run elsewhere.
"Amazon Linux customers are not affected by this issue, and no action is required. AWS infrastructure and services do not utilize the affected software and are not impacted. Users of Bottlerocket are not affected."
https://aws.amazon.com/security/security-bulletins/AWS-2024-...
The best part is everyone disabling security tests that started failing
I read through the entire report and it gradually got more interesting. Then, I got to the very end, saw Andres Freund's name, and it put a smile on my face. :)
Who else would run a PostgreSQL performance benchmark and discover a major security issue in the process?
This is another proof that systemd is an anti-pattern for security: with its crawling and ever-growing web of dependencies, it extends the vulnerability surface by orders of magnitude, and once it's embraced, not even large distro communities can defend you from that.
A malicious code injection in upstream XZ Utils became a vector for remote exploitation of the SSH daemon, because sshd's systemd-notification support pulls the liblzma library into the process (CVE-2024-3094). The resulting build interferes with authentication in sshd via systemd.
Please take the systemd trolling to Reddit. They likely targeted xz specifically because it’s so widely used but there are dozens of other libraries which are potential candidates for an attack on sshd, much less everything else which has a direct dependency unrelated to systemd (e.g. dpkg).
Rather than distracting, think about how the open source projects you use would handle an attack like this where someone volunteers to help a beleaguered maintainer and spends time helpfully taking on more responsibilities before trying to weaken something.
Those other libraries depended on by sshd are hopefully more closely monitored. The upstream sshd developers probably did not even consider that liblzma could end up being loaded in the process.
Make excuses for systemd all you want, but loading multiple additional libraries into critical system daemons just to write a few bytes into a socket is inexcusable, and it directly enabled this attack vector.
You are distracting from facts with speculations and trolling FUD. I refer to what is known and has happened, you are speculating on what is not known.
Your claim is an appeal to emotion trying to build support for a position the Linux community has largely rejected. Starting with the goal rather than looking unemotionally at the facts means that you’re confusing your goal with the attackers’ – they don’t care about a quixotic attempt to remove systemd, they care about compromising systems.
Given control of a package which is on most Linux systems and a direct dependency of many things which are not systemd - run apt-cache rdepends liblzma5! – they can choose whatever they want to accomplish that goal. That could be things like a malformed archive which many things directly open or using something similar to this same hooking strategy to compromise a different system component. For example, that includes things like kmod and dpkg so they could target sshd through either of those or, if their attack vector wasn’t critically dependent on SSH, any other process running on the target. Attacking systemd for this is like saying Toyotas get stolen a lot without recognizing that you’re just describing a popularity contest.
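If you want to see that fan-out yourself on a Debian-based system, the command mentioned above can be run as-is:
  # list everything that depends on the compromised library
  apt-cache rdepends liblzma5
  # rough count of the blast radius (skipping the two header lines of output)
  apt-cache rdepends liblzma5 | tail -n +3 | sort -u | wc -l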
Actually you have a point. A collection of shell scripts (like the classical init systems) obviously has a smaller attack surface. In this case the attacker used some integration code with systemd to attack the ssh daemon. So sshd without systemd integration is safe against this specific attack.
In general, I'm not convinced that systemd makes things less secure. I have the suspicion that the attacker would just have used a different vector if there had been no systemd integration. After all, it looks like the attacker was also trying to integrate exploits into other libraries, like zstd.
Still I would appreciate it, if systemd developers would find a better protection against supply chain attacks.
It’s also tricky to reason about risk: for example, ShellShock caused a bunch of vulnerabilities in things which used shell scripts, and the classic SysV init system was a factor in a ton of vulnerabilities over the years. Not having a standard way to drop privileges, namespace things, manage processes, or deal with chroot meant you had a bunch of people implementing code that ran with elevated privileges (because it needed to do things like bind to a low network port), and they either had vulnerabilities in the privileged part or messed up some detail. I think in general it’s been much better in the systemd era where so much of that is built in, but I have been happy to see them starting to trim things like the compression format bindings, and I expect this will spur more of that.
I really appreciate your tone and dialectic reasoning, thanks for your reply. And yes, as simple as it sounds, I believe that shell scripts help a lot to maintain mission critical tools. One hands-on example is https://dyne.org/software/tomb where I took this approach to replace whole disk encryption which is nowadays also dependent on systemd-cryptsetup.
Is this an example of a simple and clean solution via shell script? I have some stylistic doubts:
1. What "exitcode" is set for:
exitcode=1
exit 1
2. I see a lot of "return $?". Why is "$?" returned when by default the shell returns the exit status of the last command? Just to name a few:
lklfuse -o type=ext4 "${loop}" "$mnt"
return $?
...
veracrypt --text --non-interactive -d "$file"
return $?
...
mount "$loop" "$mnt"
return $?
3. Aren't =, != etc. used to compare strings and -eq, -ne, -gt etc. used to compare numbers? I see a lot of numbers compared as strings, e.g.:
[ $? = 0 ]
[ $? != 0 ]
[ $exitcode = 0 ]
4. There are a lot of "cat <<EOF" blocks without indentation. I understand that this is done because the shell expects "EOF" at the line start, but there is a special syntax designed on purpose for this use case: simply put a dash between << and the token, e.g. "cat <<-EOF". In this case:
tomb_init() {
system="`uname -s`"
case "$system" in
FreeBSD)
cat <<-EOF
create=posix_create
format=posix_format
map=posix_map
mount=freebsd_mount
close=freebsd_close
EOF
;;
Linux)
5. Aren't backticks deprecated in favor of $()?
Thanks for your review! Though you are referring to the tomb-portable unfinished experiment, which is about to be dismissed since cross-platform experiments with veracrypt show very bad performance.
You are welcome to share a review of the tomb script, but be warned that we use a lot of zsh-specific features. It is a script that has worked for 15+ years, so it carries a fair amount of patchwork to avoid regressions.
This isn't Twitter you don't have to use hashtags
This isn't Xitter, you don't have to tell people how to write.
> systemd's call to dlopen() liblzma library (CVE-2024-3094)
That's technically wrong, but no surprise. Anti-systemd trolls usually don't understand technical details after all.
I have been experiencing such ad-hominem attacks for 10 years and more.
You are so quickly labeling an identifiable professional as troll, while hiding behind your throwaway identity, that I am confident readers will be able to discern.
Meanwhile let us be precise and add more facts https://github.com/systemd/systemd/pull/31550
Our community is swamped by people like you, so I will refrain from answering further provocations, believing I have provided enough details to back my assertion.
Can you explain in simple words that even a non programmer can understand, what the linked PR does?
That MR is not part of any released version of systemd. That is simple to verify: there has been no new systemd release.
So much for the "facts".
As for trolling: just look at the usual contributions from your community like https://twitter.com/DevuanOrg/status/1619013961629995008 Excellent work with the ad-hominem attacks there.
It's already accepted and merged into master so it will be released in a future systemd release. What's your point?
LP in da house?
For bad-3-corrupt_lzma2.xz, the claim was that "the original files were generated with random local to my machine. To better reproduce these files in the future, a constant seed was used to recreate these files." with no indication of what the seed was.
I got curious and decided to run 'ent' https://www.fourmilab.ch/random/ to see how likely the data in the bad stream was to be random. I used some python to split the data into 3 streams, since it's supposed to be the middle one that's "bad":
I used this regex to split in python, and wrote to "tmp":
re.split(b'\xfd7zXZ', x)
I manually used dd and truncate to strip out the remaining header and footer according to the specification, which left 48 bytes:
$ ent tmp2 # bad file payload
Entropy = 4.157806 bits per byte.
Optimum compression would reduce the size
of this 48 byte file by 48 percent.
Chi square distribution for 48 samples is 1114.67, and randomly
would exceed this value less than 0.01 percent of the times.
Arithmetic mean value of data bytes is 51.4167 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is 0.258711 (totally uncorrelated = 0.0).
$ ent tmp3 # urandom
Entropy = 5.376629 bits per byte.
Optimum compression would reduce the size
of this 48 byte file by 32 percent.
Chi square distribution for 48 samples is 261.33, and randomly
would exceed this value 37.92 percent of the times.
Arithmetic mean value of data bytes is 127.8125 (127.5 = random).
Monte Carlo value for Pi is 3.500000000 (error 11.41 percent).
Serial correlation coefficient is -0.067038 (totally uncorrelated = 0.0).
The data does not look random. From https://www.fourmilab.ch/random/ for the Chi-square Test, "We interpret the percentage as the degree to which the sequence tested is suspected of being non-random. If the percentage is greater than 99% or less than 1%, the sequence is almost certainly not random. If the percentage is between 99% and 95% or between 1% and 5%, the sequence is suspect. Percentages between 90% and 95% and 5% and 10% indicate the sequence is “almost suspect”."
Now to be fair, such an archive could have been created with a “store” level of compression that doesn’t actually perform any compression.
My reading of the commit message is they're claiming the "data" should look random.
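For anyone wanting to reproduce the measurement described above, here is a rough sketch of the carving steps; OFFSET and LENGTH are placeholders you would read off the grep output, not values from the actual file:
  # find the byte offsets of each xz stream magic in the test file
  grep -obUaP '\xfd7zXZ' bad-3-corrupt_lzma2.xz
  # carve out the middle stream, minus header/footer, into its own file
  dd if=bad-3-corrupt_lzma2.xz of=tmp2 bs=1 skip=OFFSET count=LENGTH
  # run the randomness battery on the carved payload
  ent tmp2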
All these older (4.x, 5.0.x etc) releases that were suddenly uploaded a few months ago should probably also be considered suspect: https://github.com/tukaani-project/tukaani-project.github.io...
Here's a handy bash script I threw together to audit any docker containers you might be running on your machine. It's hacky, but it will quickly let you know what version of xz, if any, is present in your docker containers.
```
#!/bin/bash

# Get list of all running Docker containers
containers=$(docker ps --format "{{.Names}}")

# Loop through each container
for container in $containers; do
    # Get container image
    image=$(docker inspect --format='{{.Config.Image}}' "$container")

    # Execute xz --version inside the container
    version=$(docker exec "$container" xz --version)

    # Write container name, image, and command output to a text file
    echo "Container: $container" >> docker_container_versions.txt
    echo "Image: $image" >> docker_container_versions.txt
    echo "xz Version:" >> docker_container_versions.txt
    echo "$version" >> docker_container_versions.txt
    echo "" >> docker_container_versions.txt
done

echo "Output written to docker_container_versions.txt"
```
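One caveat worth adding: `docker exec "$container" xz --version` runs the container's own xz binary, which is exactly the artifact under suspicion. As a cross-check that doesn't execute it, you can query the package database instead (assuming a Debian-based image):
  docker exec "$container" dpkg -s liblzma5 2>/dev/null | grep '^Version'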
Sadly this is exactly one of the cases where open source is much more vulnerable to a state-sponsored attack than proprietary software. (It is also easier to find such backdoors in OS software, but that's an aside.)
Why? Well, consider this: to "contribute" to a proprietary project you need to get hired by a company and go through their hiring process. Also, they have to be hiring in the right team, etc. Your operative has to be in a different country, needs a CV that checks out, passports/IDs are checked, etc.
But to contribute to an OS project? You just need an email address. Your operative sends good contributions until they build trust, then they start introducing backdoors in the part of the code "no one but them understands".
The cost of such an attack is a lot lower for a state actor, so we have to assume every single OS project with the potential to be backdoored has seen many such attempts. (Proprietary software too, but as mentioned, that is much more expensive.)
So what is the solution? IDK, but enforcing certain "understandability" requirements can be a part of it.
Is that true? Large companies producing software usually have bespoke infra, which barely anyone monitors. See: the SolarWinds hack. Similarly to the xz compromise, they added a Trojan to the binary artifacts by hijacking the build infrastructure. According to Wikipedia, "around 18,000 government and private users downloaded compromised versions", and it took almost a year for somebody to detect the Trojan.
Thanks to the tiered updates of Linux distros, the backdoor was caught in testing releases, and not in stable versions. So only a very low percentage of people were impacted. Also the whole situation happened because distros used the tarball with a "closed source" generated script, instead of generating it themselves from the git repo. Again proving that it's easier to hide stuff in closed source software that nobody inspects.
Same with getting hired. Don't companies hire cheap contractors from Asia? There it would be easy to sneak in some crooked or even fake person to do some dirty work. Personally, I was even emailed by a guy from China who asked me if I was willing to "lend" him my identity so he could work in western companies, and he would share the money with me. Of course I didn't agree, but I'm not sure about everybody else whose email he found on GitHub.
https://en.wikipedia.org/wiki/2020_United_States_federal_gov...
> Well, consider this: to "contribute" to a proprietary project you need to get hired by a company and go through their hiring process.
Or work for a third-party company that gets access to critical systems without any checks. See for example the incident from 2022 here: https://en.wikipedia.org/wiki/Okta,_Inc.
Or a third-party that rents critical infrastructure to the company (Cloud, SaaS solutions).
Or exactly this kind of backdoor in open source but target proprietary software. I don't know of any survey but I'd be surprised if less than half of proprietary software used open source software one way or another and not surprised if it was quite a bit more than that.
It's wild that this could have lain dormant for far longer if the exploit had been better written: if it hadn't slowed down logins or disturbed valgrind.
So many security companies publishing daily generic blog posts about "serious supply chain compromises" in various distros on packages with 0 downloads, and yet it takes a developer debugging performance issues to find an actual compromise.
I worked in the software supply chain field and cannot resist feeling the entire point of that industry is to make companies pay for a security certificate so you can shift the blame onto someone else when things go wrong.
> cannot resist feeling the entire point of that industry is to make companies pay for a security certificate so you can shift the blame onto someone else when things go wrong.
That's the entire point. You did everything you could by getting someone else look at it and saying it's fine.
This needs a Rust joke. You know, the problem with the whole certification charade is that it slows down jobs and prevents actual problems getting evaluated. But is it safe?
That's basically the whole point, actually... A company pays for insurance for the business. The insurance company says: sure, we will insure you, but you need to go through audits A, B, and C, and you need certifications X and Y to be insured by us. Those audits are often industry dependent, mostly for topics like HIPAA, PCI, SOC, etc.
Insurance company hears about supply chain attacks. Declares that insured must have supply chain validation. Company goes and gets a shiny cert.
Now when things go wrong, the company can point to the cert and go "it wasn't us, see, we have the cert you told us to get and it's up to date". And the company gets to wash their hands of liability (most of the time).
What you describe is a normal process in order to minimise damage from attacks. The damage of hacking is ultimately property damage. The procedures you've described allow you to minimise it.
And that's a good thing.
> And the company gets to wash their hands of liability (most of the time).
Certification theater.
It's completely performative.
If you installed xz on macOS using brew, then you have
xz (XZ Utils) 5.6.1
liblzma 5.6.1
which are within the affected release range for the vuln. As noted elsewhere in these comments, the effect on macOS is uncertain. If concerned, you can revert to 5.4.6 with brew upgrade xz
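A quick way to verify what Homebrew actually has installed (standard brew commands; the revert works because Homebrew rolled the formula back to 5.4.6):
  xz --version            # prints both the xz and liblzma versions
  brew list --versions xz # shows the installed formula version(s)
  brew upgrade xz         # picks up the rolled-back 5.4.6 bottle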
Similarly if you're using MacPorts, make sure to sync and upgrade xz if you have it installed.
5.6.1 was available for a few days and just rolled back ~20 minutes ago: https://github.com/macports/macports-ports/commit/a1388aee09...
Thank you for this tip. `brew upgrade xz` worked.
I was going to uninstall but it's used by so many things…
brew uninstall xz
Error: Refusing to uninstall /opt/homebrew/Cellar/xz/5.6.1
because it is required by aom, composer, curl, ffmpeg, gcc, gd, ghostscript, glib, google-cloud-sdk, grc, harfbuzz, httpie, img2pdf, jbig2enc, jpeg-xl, leptonica, libarchive, libavif, libheif, libraw, libtiff, libzip, little-cms2, numpy, ocrmypdf, openblas, openjpeg, openvino, php, pillow, pipx, pngquant, poppler, python@3.11, python@3.12, rsync, tesseract, tesseract-lang, unpaper, webp, wp-cli, yt-dlp and zstd, which are currently installed.
You're welcome!
It's been reverted now: https://github.com/Homebrew/homebrew-core/blob/9a0603b474804...
Yeah it was when I posted the comment too. That's why you could type brew upgrade xz and it went back to 5.4.6 I guess? But it might have been around that time, cutting it fine, not out for everybody. I don't know. Comment race condition haha! :)
> the entire point of that industry is to make companies pay for a security certificate so you can shift the blame onto someone else when things go wrong.
That is actually a major point of a lot of corporate security measures (shifting risk)
And that's a good thing.
That's the entire point of certification, and of any certification at all. Certification does not guarantee performance. Actually, I would always cast a suspicious glance at anyone who is FOCUSED on getting certification after certification without any side project.
List of pull requests requesting the update to liblzma 5.6.0 [0]
I wonder what amount of scrutiny all the accounts that proposed the upgrade should be put under.
[0] https://github.com/search?q=liblzma+5.6.0&type=pullrequests
When I search for "digital masquerade" on Google, the first result is a book with this title from the author Jia Tan. I assume that is how the attackers got their fake name. Or they think using this author's name is a joke.
A lot of software (including https://gitlab.com/openconnect/openconnect of which I'm a maintainer) uses libxml2, which in turn transitively links to liblzma, using it to load and store compressed XML.
I'm not *too* worried about OpenConnect given that we use `libxml2` only to read and parse uncompressed XML…
But I am wondering if there has been any statement from libxml2 devs (they're under the GNOME umbrella) about potential risks to libxml2 and its users.
This doesn't matter: if libxml2 loads the .so and the library is malicious, you are already potentially compromised, as it is possible to run code on library load.
> only to read and parse uncompressed XML…
how does libxml2 know to decompress something?
does it require you, as the caller, to explicitly tell it to?
or does it look at the magic bytes or filename or mimetype or something?
> how does libxml2 know to decompress something?
>
> does it require you, as the caller, to explicitly tell it to?
In the entry point/function that we use, `xmlReadMemory` (https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2...), it doesn't handle compressed XML at all.
But there are indeed others where it attempts to auto-detect compression, although as I understand it from the docs only zlib compression is autodetected… though I suspect the docs may be out of date and it may autodetect any or all compiled-in compression algorithms.
Regardless, the fact that it links with liblzma is cause for concern, given the mechanism of operation of the liblzma/xz backdoor.
Potentially malicious commit by same author on libarchive: https://github.com/libarchive/libarchive/pull/1609
Summary: "The upstream xz repository and the xz tarballs have been backdoored."
It is known to be in version 5.6.0 and 5.6.1, and the obfuscated code is found in the test directory.
Since GitHub disabled the repos, I uploaded all GitHub Events from the two suspected users and from their shared project repo as easy-to-consume CSV files:
https://github.com/emirkmo/xz-backdoor-github
For those who want to see the GitHub events (commits, comments, pull_requests, diffs, etc.)
Better make a torrent out of them.
Very strange behavior from the upstream developers. Possible government involvement? I have a feeling LANG is checked to target servers from particular countries
One thing to note is that the person that added the commits only started contributing around late 2022 and appears to have a Chinese name. Might be required by law to plant the backdoor.
That would be quite scary considering they have contributed to a wide variety of projects including C++ https://learn.microsoft.com/en-us/cpp/overview/whats-new-cpp...
I don't think you need to worry about the C++ contribution: https://github.com/MicrosoftDocs/cpp-docs/commit/9a96311122a...
This does make me wonder how much they made a deliberate effort to build an open source portfolio so they’d look more legitimate when time came to mount an attack. It seems expensive but it’s probably not really much at the scale of an intelligence agency.
What's the salary for a software engineer in urban China? 60-80k/yr USD? Two years of that salary is cheaper than a good single shoulder fired missile. Seems like a pretty cheap attack vector to me. A Javelin is a quarter million per pop and they can only hit one target.
If I were doing it, I'd have a number of these "burner committers" ready to go when needed.
If I were doing it AND amoral, I'd also be willing to find and compromise committers in various ways.
Until you figure there are very subtle unicode changes in the URL that don’t diff on GitHub. :)
> appears to have a Chinese name
Given the complexity of the attack, I'd assume the name is fake.
The contribution to C++ is just a simple markdown change: https://github.com/MicrosoftDocs/cpp-docs/pull/4716 C++ is fine.
Also it's not a contribution to C++ but only one of Microsoft's projects around C++.
I would think a Chinese state hacker would not assume a Chinese name, just in case he was discovered like now.
But your reaction being a common one would make a Chinese name the best pick for a Chinese agent wanting to hide their country affiliation.
[flagged]
No one is being “required by law” to add vulnerabilities, it’s more likely they are foreign agents to begin with.
Depends on the law and where you are. Publicly we have https://www.eff.org/issues/national-security-letters/faq and it's likely that other requests have occurred from time to time, even in the USA.
> No one is being “required by law” to add vulnerabilities
This is absolutely not the case in many parts of the world.
LANG only needs to have some value, the concrete value does not seem to matter.
If you have a recently updated NixOS unstable it has the affected version:
$ xz --version
xz (XZ Utils) 5.6.1
liblzma 5.6.1
EDIT: I've been informed on the NixOS matrix that they are 99% sure NixOS isn't affected, based on conversations in #security:nixos.org
Personally, I use lzip ever since I read https://www.nongnu.org/lzip/xz_inadequate.html Seems like the complexity of XZ has backfired severely, as expected.
> Seems like the complexity of XZ has backfired severely, as expected.
this is a very bad reading of the current situation.
This kind of shallow dismissal is really unhelpful to those of us trying to follow the argument. You take a tone of authoritative expert, without giving any illuminating information to help outsiders judge the merit of your assertion. Why is it a very bad reading of the current situation? What is a better reading?
I am not sure I agree that every low quality post needs a detailed rebuttal? HN couldn't function under such rules.
as to the specific comment:
> Seems like the complexity of XZ has backfired severely, as expected.
to summarise: someone found a project with a vulnerable maintenance situation, spent years getting involved in it, then got commit rights, and then committed a backdoor in some binaries and the build system, then got sock puppets to agitate for OSes to adopt the backdoored code.
the comment I replied to made a "shallow" claim of complexity without any details, so let's look at some possible interpretations:
- code complexity - doesn't seem super relevant - the attacker hid a highly obfuscated backdoor in a binary test file and committed it - approximately no one is ever going to catch such things without a process step of requiring binaries be generatable in a reasonable-looking and hopefully-hard-to-backdoor kind of way. cryptographers are good at this: https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number
- build complexity - sure, but it's auto*, that's very common.
- organisational complexity - the opposite is the case. it had one guy maintaining it, who asked for help.
- data/file format complexity - doesn't seem relevant unless it turns out the obfuscation method used was particularly easy for this format, but even in that case, you'd think others would be vulnerable to something equivalent
perhaps OP had some other thing in mind, but then they could have said that, instead of making a crappy comment.
To summarize the article, the back door is introduced through build scripts and binaries distributed as “test” data. Very little to do with the complexity or simplicity of xz; more that it was a dependency of critical system binaries (systemd) and ripe for hostile takeover of the maintainer role.
Totally agree. This aggressive stance about xz suddenly is not even helpful to anyone. xz has been and will always be my preferred compression algorithm for times to come, despite this pitfall of really insane levels of social engineering. I feel for the author having burn out and such, but in all fairness, xz is one of the best compression formats of today's time and still going.
Introducing a back door is not the same thing as a badly designed file format.
This could potentially be a fully automated rootkit-type breach, right? Great - is any system with 5.6.1 possibly vulnerable?
Also super weird a contributor thought they could slip this in and not have it be noticed at some point. It may point to burning that person (aka, they go to jail) for whatever they achieved with this. (And whoever they are…)
This was only a matter of time. Open source projects are under-staffed, maintainers are overworked and burned out, and everyone relies on the goodwill of all actors.
Obviously a bad actor will make use of these conditions and the assumption of good will.
We need automated tooling to vet for stuff like this. And maybe migrate away from C/C++ while we are at it because they don't make such scanning easy at all.
Wouldn’t be surprised if the ssh auth being made slower was deliberate; that makes it fairly easy to index all open ssh servers on the internet, then see which ones get slower to fail preauth as they install the backdoor.
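If that speculation were true, a crude probe along these lines could surface it; this is entirely hypothetical, and a single timing sample is far too noisy to prove anything:
  # time how long an unauthenticated connection attempt takes to fail preauth
  time ssh -o BatchMode=yes -o PreferredAuthentications=none -o ConnectTimeout=5 nobody@target.example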
People are misreading the Debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1067708
it wasn't the apparently newly-created identity "Hans Jansen" just asking for a new version to be uploaded, it was "Hans Jansen" providing a new version to be uploaded as a non-maintainer-upload - Debian-speak for "the maintainer is AWOL, someone else is uploading their package". if "Hans Jansen" is another attacker then they did this cleverly, providing the new - compromised - upstream tarballs in an innocent-looking way and avoiding anyone examining the upstream diff.
Looking at how many requests to update to the backdoored version have been made, I wonder if the fact that many people (including developers) have been conditioned to essentially accept updates as "always-good" is a huge contributing factor in how easy it is to spread something like this.
The known unknowns can be better than the unknown unknowns.
Totally agree. With things like Dependabot encouraged by GitHub, people now get automated pull requests for dependency updates, increasing the speed of propagation of such vulnerabilities.
Looks like GitHub has suspended access to the repository, which, while it protects against people accidentally compiling and using the code, certainly complicates forensic analysis for anyone who doesn't have a clone or access to history (which is what I think a lot of people will be doing now to understand their exposure).
It looks like git clone https://git.tukaani.org/xz.git still works for now (note: you will obviously be cloning malware if you do this) - that is, however, trusting the project infrastructure that compromised maintainers could have had access to, so I'm not sure if it is unmodified.
HEAD (git rev-parse HEAD) on my result of doing that is currently 0b99783d63f27606936bb79a16c52d0d70c0b56f, and it does have commits people have referenced as being part of the backdoor in it.
Apparently there's a wayback machine for git repos and it "just coincidentally" archived this repo the day before the news broke:
https://archive.softwareheritage.org/browse/origin/visits/?o...
That was me. I'm part of ArchiveTeam and Software Heritage and I'm one of the Debian sysadmins; the latter got some advance notice. I figured archives of xz-related stuff would be important once the news broke, so I saved the xz website and the GitHub repos. I regret that I didn't think to join the upstream IRC channel and archive the rest of the tukaani.org domain, nor archive the git.tukaani.org repos. Been archiving links from these threads ever since the news broke.
As someone who was looking for that git repo, thank you :)
> https://git.tukaani.org/xz.git
it's throwing 403 now.
Works cloning though.
Well that's inconvenient, I was (probably, time permitting) going to propose to some of my friends that we attempt to reverse this for fun tomorrow.
Anyone have a link to the git history? I guess we can use the ubuntu tarball for the evil version.
It seems, based on the (very well written) analysis, that this is a way to bypass ssh auth, not something that phones home, which would've been even scarier.
My server runs Arch with an LTS kernel (which sounds dumb on the surface, but was by far the easiest way to do ZFS on Linux that wasn't Ubuntu). Since I don't have SSH exposed to the outside internet, for good reason, and my understanding is that Arch never patched sshd to begin with, it seems that I, and most people in similar situations, are unaffected.
Still insane that this happened to begin with, and I feel bad for the Archlinux maintainers who are now going to feel more pressure to try to catch things like this.
Being included via libsystemd isn't the only way ssh can load liblzma; it can come as an indirect dependency of SELinux (and its PAM stack) IIUC. Which makes it even a bit more funny (?) since Arch also doesn't officially support any SELinux stuff.
There might be other ways sshd might pull in lzma, but those are the 2 ways I saw commonly mentioned.
On a different note, pacman/makepkg got the ability to checksum source repository checkouts in 6.1.
Interesting commit in January where the actual OpenPGP key was changed: https://github.com/tukaani-project/tukaani-project.github.io...
They just signed each other's keys around that time, and one needs to redistribute the public keys for that; nothing suspicious about it I think. The key fingerprint 22D465F2B4C173803B20C6DE59FCF207FEA7F445 remained the same.
before:
pub rsa4096/0x59FCF207FEA7F445 2022-12-28 [SC] [expires: 2027-12-27]
22D465F2B4C173803B20C6DE59FCF207FEA7F445
uid Jia Tan <jiat0218@gmail.com>
sig 0x59FCF207FEA7F445 2022-12-28 [selfsig]
sub rsa4096/0x63CCE556C94DDA4F 2022-12-28 [E] [expires: 2027-12-27]
sig 0x59FCF207FEA7F445 2022-12-28 [keybind]
after:
pub rsa4096/0x59FCF207FEA7F445 2022-12-28 [SC] [expires: 2027-12-27]
22D465F2B4C173803B20C6DE59FCF207FEA7F445
uid Jia Tan <jiat0218@gmail.com>
sig 0x59FCF207FEA7F445 2022-12-28 [selfsig]
sig 0x38EE757D69184620 2024-01-12 Lasse Collin <lasse.collin@tukaani.org>
sub rsa4096/0x63CCE556C94DDA4F 2022-12-28 [E] [expires: 2027-12-27]
sig 0x59FCF207FEA7F445 2022-12-28 [keybind]
Lasse's key for reference:
pub rsa4096/0x38EE757D69184620 2010-10-24 [SC] [expires: 2025-02-07]
3690C240CE51B4670D30AD1C38EE757D69184620
uid Lasse Collin <lasse.collin@tukaani.org>
sig 0x38EE757D69184620 2024-01-08 [selfsig]
sig 0x59FCF207FEA7F445 2024-01-12 Jia Tan <jiat0218@gmail.com>
sub rsa4096/0x5923A9D358ADF744 2010-10-24 [E] [expires: 2025-02-07]
sig 0x38EE757D69184620 2024-01-08 [keybind]
GitHub suspended this project
> I am *not* a security researcher, nor a reverse engineer.
Could have fooled me - impressive write-up!
GitHub making the suspect repository private and hiding recent account activity is the wrong move and interferes with citizen investigation efforts.
Going forward this will require more than a citizen investigation. Law enforcement will surely be granted access. Also, tarballs are still available in package managers if you really want to dig into the code.
It's a crime scene. It effectively has the "police" yellow tape around it.
I think the lesson here for packagers is that binary testdata should not be present while doing the build.
It is too easy to hide things in testdata.
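One partial mitigation, sketched below: generate fixtures from a documented seed at build time instead of committing opaque binaries, so a reviewer can verify the bytes are exactly what the recipe says. The seed, size, and path are illustrative:
  # derive reproducible pseudo-random test data from a stated seed
  seed="xz-test-corpus-v1"
  openssl enc -aes-256-ctr -nosalt -pass "pass:$seed" < /dev/zero 2>/dev/null \
    | head -c 4096 > tests/files/random-4k.bin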
Nice idea, but then you just hide the attack in a logo.png that gets embedded in the binary. Less useful for libraries; works plenty well for web/desktop/mobile.
This entire thread is above my pay grade, but isn’t minimizing the attack surface always a good thing?
The problem with the parent's suggestion is you end up banning lots of useful techniques while not actually stopping hackers from installing back doors or adding security exploits. The basic problem is once an attacker can submit changes to a project, the attacker can do a lot of damage. The only real solution is to do very careful code reviews. Basically, having a malicious person get code into a project is always going to be a disaster. If they can get control of a project, it is going to be even worse.
> The only real solution is to do very careful code reviews.
Are there any projects that are well resourced enough to do this consistently, including all dependencies?
It's all irrelevant. The attacker social engineered their way to being the lead maintainer for the project.
Mirror of the report, since the Openwall servers appear to be down.
https://web.archive.org/web/20240329182300/https://www.openw...
Debian is considering that their infrastructure may be compromised[1].
Looks like Arch Linux shipped both compromised versions - and 5.6.1-2 is out to hopefully resolve it.
5.6.1-2 is not an attempted fix, it's just some tweaks to Arch's own build script to improve reproducibility. Arch's build script ultimately delegates to the compromised build script unfortunately, but it also appears the payload itself is specifically targeting deb/RPM based distros, so a narrow miss for Arch here.
(EDIT: as others have pointed out, part of the exploit is in the generated artifacts in the xz release tarball, which Arch is now avoiding by switching to building from a git checkout)
Are you sure about that? The diff moves away from using the compromised tarballs to the not-compromised (by this) git source. The comment message says it's about reproducibility, but especially combined with the timing it looks to me like that was just to avoid breaking an embargo.
So, you suggest that Frederik Schwan had prior knowledge of the security issues but hid the real purpose of the commit under "improve reproducibility"?
Yes.
I've never had to do it myself but I believe that's common practice with embargos on security vulnerabilities.
xz was masked in the Gentoo repositories earlier today with the stated reason of "Investigating serious bug". No mention of security. It's pretty likely.
This is very likely the case. Arch maintainers do get early information on CVEs just like any other major distro.
But with pacman/makepkg 6.1 (which was recently released), git sources can now also be checksummed IIRC, which is a funny coincidence.
I upgraded Arch Linux on my server a few hours ago. Arch Linux does not fetch one of the compromised tarballs but builds from source and sshd does not link against liblzma on Arch.
[root@archlinux ~]# pacman -Qi xz | head -n2
Name : xz
Version : 5.6.1-2
[root@archlinux ~]# pacman -Qi openssh | head -n2
Name : openssh
Version : 9.7p1-1
[root@archlinux ~]# ldd $(which sshd) | grep liblzma
[root@archlinux ~]#
It seems that Arch Linux is not affected.
5.6.1-1 was built from what I understand to be one of the affected tarballs. This was patched in 5.6.1-2: https://gitlab.archlinux.org/archlinux/packaging/packages/xz...
I agree on the sshd linking part.
Interesting, they just switched from tarballs to source 19 hours ago. It seems to me that Frederik Schwan had prior knowledge of the security issue, or it is just a rare coincidence.
Distributions were notified under embargo.
The project has made an official post on the subject
https://archlinux.org/news/the-xz-package-has-been-backdoore...
The writeup indicates that the backdoor only gets applied when building for rpm or deb, so Arch probably would have been okay either way? Same with Nix, Homebrew, etc.
On arch, `ldd $(which sshd)` doesn't list lzma or xz, so I think it's unaffected? Obviously still not great to be shipping malicious code that just happens to not trigger.
Deleted per below
This is what the `detect_sh.bin` attached to the email does. I can only assume that the person who reported the vulnerability checked that this succeeds in detecting it.
Note that I'm not looking for the vulnerable symbols, I'm looking for the library that does the patching in the first place.
Deleted, thanks.
My Arch setup is the same, they must not patch openssh.
Incredible. It's like discovering your colleague for 2 years at the secret nuclear weapon facility is a spy for another country, covering his tracks until the very last minute. Feels like a Hollywood movie is coming up.
Should we start doing background checks on all committers to such critical IT infrastructure?
But how? Let's say you're one of 10 maintainers of an open source project. A new user wants to contribute. What do you do? Do you ask them to send you some form of ID? Assuming this is legal and assuming you could ensure the new user is the actual owner of an actual, non counterfeit ID, what do you do? Do you vet people based on their nationality? If so, what nationality should be blackballed? Maybe 3 maintainers are American, 5 are European and 2 are Chinese. Who gets to decide? Or do you decide based on the company they work for?
Open source is, by definition, open. The PR/merge request process is generally meant to accept or refuse commits based on the content (which is why you have a diff), not on the owner.
Building consensus on which commits are actually valid, even in the face of malicious actors, is a notoriously difficult problem. Byzantine fault tolerance can be achieved with a 2/3 + 1 majority, but if anyone can create new identities and have them join the system (Sybil attack) you're going to have to do things differently.
Not even background check but a foreground check would already help. Like literally, who dis? any identity at all?
Too often maintainers who have no time just blanket approve PRs and see if stuff breaks.
@people who write github scanners for updates and security issues (dependabot and the like)
Can we start including a blacklist of emails and names of contributors (with reasons/links to discussions)?
I can't track them and I don't want them in my projects.
Might not be very helpful as it is easy to create new identities, but I see no reason to make it easier for them. Also, I might approach differently someone with lots of contributions to known projects than a new account, so it still helps.
It takes a minute to create a new email address. And you can change or fake an email address on a git commit trivially. You, too, can write code as anyone you want by just doing "git commit --author='Joe Biden <icecream@whitehouse.gov>'". On the internet nobody knows you're Joe Biden.
You can write a rather simple GitHub action that would do that: look at a PR and reject / close it if you don't like it for some reason. AFAIK open-source projects have a free quota of actions.
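A minimal sketch of such a check, runnable from any CI step; the blocklist path and the commit variable are assumptions, not a built-in GitHub feature:
  # reject a PR whose head commit was authored by a blocklisted identity
  author_email=$(git log -1 --format='%ae' "$PR_HEAD_SHA")
  if grep -qixF "$author_email" .ci/contributor-blocklist.txt; then
    echo "Author $author_email is blocklisted" >&2
    exit 1
  fi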
OTOH sticking to the same email for more than one exploit might be not as wise for a malicious agent.
github already suspended the account
GitHub should probably remove the dopamine hits of green checkmarks etc., like serious stock broker apps do.
They should also remove the emojis; there is no need to have people feel good about upvotes. I've long felt uncomfortable with emojis on Slack as well. Responding to a coding or infrastructure issue should not be a social activity: I respond because it's my job and the issue is worth it, not because a human being should feel appreciated (either them or me).
The emojis reduce (but not eliminate) the number of "me too!"s PRs will get, which IMO is a good thing.
That only requires a vote button, not a whole range of cringe cartoon images.
Many people write code for fun and slack is a social communications platform.
If you can't imagine people using these tools for other reasons than pure unemotional business value then you don't understand their market.
Your suggestions would lose those platforms users and revenue.
There's good discussion of the timeline here: https://boehs.org/node/everything-i-know-about-the-xz-backdo...
> openssh does not directly use liblzma. However debian and several other distributions patch openssh to support systemd notification, and libsystemd does depend on lzma.
It looks to be limited to Linux systems that are running certain patches. macOS and BSD seem unaffected?
FreeBSD is not affected as the payloads in question were stripped out, however we are looking into improvements to our workflow to further improve the import process.
Is the solution against such attacks in the future only to scrutinize more, or are there other reasonable options in terms of hardening?
The lesson here seems to be: don't depend on tools written in languages that have complex, obscure build systems that no one is able or interested to read. Using tools rewritten in Rust, Go, or any other language which resolves dependencies within the project seems the only way to do hardening here.
I agree there's safer languages than C, but nobody reads the 50,000 lines changed when you update the vendoring in a random golang project. It would be easy to introduce something there that nobody notices too.
It is generally harder to introduce vulnerabilities in a readable language, even more so when it is memory safe. Sure, life is not perfect, and bad actors would have found ways to inject vulnerabilities into a Rust or Go codebase too. The benefit of modern languages is that there is one way to build things and the source code is the only thing that needs to be audited.
this backdoor had nothing at all to do with memory safety.
You don't need a complex, obscure build system for most C code. There's a lot of historical baggage here, but many projects (including xz, I suspect) can get away with a fairly straightforward Makefile. Doubly so when using some GNU make extensions.
Thanks for that post. I wish people would stop pushing ever more complicated build systems, opaque and non-backward-compatible between their own versions, when a two-page Makefile would work just fine and still work in 20 years' time.
Rust is the worst in terms of build system transparency. Ever heard of build.rs? You can hide backdoors in any crate, or in any crate's build.rs, or the same recursively.
Most build systems are turing-complete. Rust, at least, drastically reduces the need for custom build scripts (most of my projects have empty build.rs files or lack one entirely), and build.rs being in the same language as the rest of the codebase aids transparency immensely.
That doesn't make build.rs any less of a juicy target for a supply chain attack.
Arbitrary code downloaded from the internet and run at build time? That's a nightmare scenario for auditing, much worse than anything Autotools or CMake can offer.
Wouldn't a supply chain attack like this be much worse with Rust and Cargo, given that it's not just a single dynamic library that needs to be reinstalled system-wide, but instead every binary would require a new release?
It would mean rebuilding more packages. I don't think that's meaningfully "much worse", package mangers are perfectly capable of rebuilding the world and the end-user fix is the same "pacman -Syu"/"apt-get update && apt-get upgrade"/...
On the flip side the elegant/readable build system means that the place this exploit was hidden wouldn't exist. Though I wouldn't confidently say that 'no hiding places exist' (especially with the parts of the ecosystem that wrap dependencies in other languages).
It's much worse because it requires repackaging every affected system package instead of a single library. Knowing which packages are affected is difficult because that information isn't exposed to the larger system package manager. After all, it's all managed by the build system.
I am not completely sure about this exploit, but it seems a binary needed to be modified for the exploit to work[1], which was later picked up by the build system.
https://github.com/tukaani-project/xz/commit/6e636819e8f0703...
The binary was an xz test file that contained a script that patched the C code.
This seems to be an orthogonal issue. Rust could build the same dynamic library with cargo, which could then be distributed. The difference is that there would be a single way to build things.
Most Rust libraries are not dynamically linked; instead, versions are pinned and included statically during the build process. This is touted as a feature.
Only a few projects are built as system-wide libraries that expose a C-compatible abi interface; rsvg comes to mind.
People are going to be upset with this perspective but I completely agree. The whole autoconf set of tools is a complete disaster.
Once somebody actually does this people are gonna complain the same as always: "The sole purpose of your project is to rewrite perfectly fine stuff in Rust for the sake of it" or something along these lines.
Is this really the lesson here? We are talking about a maintainer here, who had access to signing keys and a full access to the repository. Deb packages which were distributed are also different than the source code. Do you honestly believe that the (arguably awful) autotools syntax is the single root cause of this mess, Rust will save us from everything, and this is what we should take away from this situation?
I call bullshit.
The fundamental problem here was a violation of chain of trust. Open source is only about the source being open. But if users are just downloading blobs with prebuilt binaries or even _pre-generated scripts_ that aren't in the original source, there is nothing a less-obscure build system will save you from as you are putting your entire security on the chain of trust being maintained.
Am I crazy thinking libraries shouldn't be able to provide _other libraries'_ symbols without the other libraries' "permission"? What am I missing?
> One portion of the backdoor is solely in the distributed tarballs. For easier reference, here's a link to debian's import of the tarball, but it is also present in the tarballs for 5.6.0 and 5.6.1:
Ubuntu 22.04 version:
$ dpkg -l | grep liblzma
ii  liblzma5:amd64  5.2.5-2ubuntu1  amd64  XZ-format compression library
Whew!
Is this a crime? Has anyone been prosecuted for adding a backdoor like this?
> Has anyone been prosecuted for adding a backdoor
Google up Randal Schwartz. Caution: clickhole.
As far as I remember, he added no backdoors.
He was a consultant/sysadmin for Intel, and he did 3 things which he thought his employer would support, and was astonished to find that his employer not only didn't support them but actively had him prosecuted. Ouch.
1. He ran a reverse-proxy on two machines so he could check in on them from home.
2. He used the crack program to find weak passwords.
3. He found a weak password, and used it to log into a system, which he copied the /etc/shadow file from to look for additional weak passwords.
https://www.giac.org/paper/gsec/4039/intel-v-randal-l-schwar...
https://web.archive.org/web/20160216204357/http://www.lightl...
He didn't try and hide his activities, and didn't do anything else untoward, it was literally just these things which most people wouldn't bat an eyelid at. These days, it is completely normal for a company to provide VPNs for their employees, and completely normal to continually scan for unexpected user accounts or weak passwords. But... because he didn't explain this to higher-ups and get their buy-in, they prosecuted him instead of thanking him.
To be fair, it is perfectly normal for a surgeon to cut people with a sharp knife with their permission while in the hospital.
It is kinda sus when they do it at home without consent.
I find it useful to compare the reactions of O'Reilly and Intel. Schwartz worked for both (he wrote Learning Perl and co-authored Programming Perl for O'Reilly and made them plenty of money). He cracked the passwords of both companies without first getting permission.
O'Reilly's sysadmin told him off for not getting permission, and told him not to do it again, but used his results to let people with weak passwords know to change them.
Intel's sysadmin started collecting a dossier on Schwartz and ultimately Intel pushed for state criminal charges against him.
O'Reilly's sysadmin testified in Schwartz's defense that he was an overly eager guy with no nefarious intent. So - kinda-sus or not - Intel could have resolved this with a dressing down, or even termination if they were really unhappy. Intel _chose_ to go nuclear, and invoke the Oregon computer crime laws, and demand the state prosecute him.
apparently he did that after leaving the company, which is pretty sus.
Seems a little different. Based on a quick read, he gained unauthorized access to systems.
In this case, backdoor code was offered to and accepted by xz maintainers.
Lots of things are crimes even though they're just offering something to a victim who willingly accepts it, e.g. phishing attacks, fraudulent investment schemes, contaminated food products.
Sure. I'm wondering if there is a specific law that was broken here. It seems to me that it might be beneficial if there were some legal protection against this sort of act.
> Seems a little different. Based on a quick read
It is a little different but a thing that you might have missed in the quick read is that one of the things he was accused of was installing and using a backdoor.
One involves making unauthorized access, the other does not.
Kinda relevant, as I saw a few comments about how safer languages are the solution.
Here[0] is a very simple example that shows how easy such supply chain attacks are in Rust; and let's not forget that there was a very large Python attack just a few days ago[1].
[0] - https://github.com/c-skills/rust1
[1] - https://checkmarx.com/blog/over-170k-users-affected-by-attac...
I am very concerned about Rust.
Rust’s “decision” to have a very slim standard library has advantages, but it severely amplifies some other issues. In Go, I have to pull in zero dependencies to make an HTTP request. In Rust, pulling reqwest pulls in at least 30 distinct packages (https://lib.rs/crates/reqwest). Date/time, “basic” base64, common hashing or checksums, etc, they all become supply chain vectors.
The Rust ecosystem’s collective refusal to land stable major versions is one of the amplifying issues. “Upgrade fatigue” hits me, at least. “Sure, upgrade ring to 0.17” (which is effectively the 16th major version). And because v0.X versions are usually incompatible, it’s not really possible to opt not to upgrade, because it only takes a short while before some other transitive dependency breaks because you are slow to upgrade. I recently spent a while writing my code to support running multiple versions of the `http` library, for example (which, to be fair, did just land version 1.0). My NATS library (https://lib.rs/crates/async-nats) is at version 34. My transitive base64 dependency is at version 22 (https://lib.rs/crates/base64).
This makes it nearly impossible for me to review these libraries and pin them, because if I pin foo@0.41.7 and bar needs foo@0.42.1, I just get both. bar can't do >=0.41, because the point of the 0.X series is that it is not backwards compatible. It makes this process so time consuming that I expect people will either just stop reviewing their dependencies (as if they ever did), or accept that they might have to reinvent everything from URL parsing to constructing HTTP headers or doing CRC checks.
Combine this with a build- and compile-time system that allows completely arbitrary code execution, which routinely ends up being just a wrapper for stuff like in the xz attack (look at a lot of the low-level libs you inevitably pull in). Sure, the build scripts and the macro system enable stuff like the amazing sqlx library, but said build and macro code is already so hard to read that it takes proper wizardry to properly understand.
You have perfectly put into words, all my thoughts.
I have been thinking about ways to secure myself, as it is exhausting to think about it every time there is an update or some new dependency.
After this attack, I think the only sure way is to unplug the computer and go buy goats.
The next best thing? Probably ephemeral VMs or some Codespaces/"Cloud Dev Env thingy". (except neither would save me in the xz case)
> In Rust, pulling reqwest pulls in at least 30 distinct packages
This would be less of a problem if each dependency (and in turn, their dependencies) were individually sandboxed, and only allowed to access specific inputs/files at runtime in the capability security (https://en.wikipedia.org/wiki/Capability-based_security) fashion.
This way the attack surface would be hollowed out as much as possible, and exploits limited to the (sub)program output or specific accessible (writable) files.
Or you vendor everything.
You don't automatically download anything at build or install time, you just update your local source copies when you want to. Which to be clear I know means rarely.
It's 1970 all over again!
Vendoring is nice, and I usually prefer it, but you don't always have the time or people for it.
Vendoring + a custom build system (Bazel?) for everything is basically Google's approach, if what I have read is correct. Definitely better than everything else we have, but the resources for it are not something most can afford.
P.S also what mrcus said, if we trust the upstream build process, we may as well trust their binaries.
That was what the 1970 crack was about.
Yes, but this doesn’t prevent issues like the xz issue, where the code looks fine, but the build scripts alter it.
Makes one wonder how many similar backdoors are out there in the wild. What is the best way to execute such a move? This is sophisticated, but not good enough to stay unnoticed for a long while. If I were a state actor I'd plan for at least 6-12 months.
Both https://github.com/tukaani-project member accounts have been suspended. (To see that, you can list the followers of each account.)
Jia Tan's commit history on his github profile suggests he took off for Christmas, New Year's, and spring break. I smell an American.
Sometimes you smell an American because someone wanted you to smell an American.
Operating on a target region schedule doesn't seem particularly sophisticated, at least compared to the all the efforts put into this exploit.
Interesting. Is there also a pattern in the times of day? (I don't so much mean the times in commits done by the developer because they can be fake. I'd be more interested in authentic times recorded by GitHub, if any such times are publicly accessible.)
Another thing would be to examine everything ever written by the user for linguistic clues. This might point towards particular native languages or a particular variant of English or towards there being several different authors.
Someone said commits lined up with Beijing time, but I've not verified that.
But that wouldn't count for much, someone employed by anyone could work any hours.
Also git actually stores the timezone information. You could see it is consistently China time (GMT+8).
P.S. could be Taiwanese as China and Taiwan share the same timezone.
Below are links to the git mailbox files where you could see the timezone.
From 2022:
- https://github.com/tukaani-project/xz/commit/c6977e740008817...
- https://github.com/tukaani-project/xz/commit/7c16e312cb2f40b...
From 2024:
- https://github.com/tukaani-project/xz/commit/af071ef7702debe...
- https://github.com/tukaani-project/xz/commit/a4f2e20d8466369...
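For anyone checking against a clone rather than the mailbox files: git records the UTC offset with every timestamp, so you can scan it directly (the +0800 offset is what the parent comments refer to):
  # author and committer dates, including their UTC offsets
  git log --format='%h %ai %ci %an' --author='Jia Tan' | head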
That timezone information is provided by whoever created the commits so cannot be trusted to be correct. Considering the chosen alias it's not unexpected that the timezone information was also made to look Chinese.
So could be the holiday inactivity. I'd expect multiple layers of country obfuscation as well as conflicting information to confuse you. Or none. Impossible to know for sure.
Maybe it's a Western Australian, or Indonesian, or Thai...
GMT+8 covers a lot of places
Quite ironic: The most recent commit in the git repo is "Simplify SECURITY.md", committed by the same Github account which added the backdoor.
https://github.com/tukaani-project/xz/commit/af071ef7702debe...
It's not ironic, this change is really sinister IMO. They want you to waste more time after you've submitted the security report and maximize the amount of back and forth. Basically the hope is that they'd be able to pester you with requests for more info/details in order to "resolve the issue" which would give them more time to exploit their targets.
This is exactly why I fight the windmills so hard when it comes to automatic updates in Linux software.
So much damage is caused just by adding a single maintainer to a project - imagine how much power you could wield through the remote-execution systems put in place by naive developers for "automatic updates".
All it takes is a single malicious maintainer given access to the new version update of some popular user software, and they have a new botnet of thousands of devices at their disposal. Better yet, after the backdoor installation, they can just release the real update and cover their tracks forever.
Automatic updates are like running web applications, but without any sandboxing or protection usually implemented by the browser.
I hope mainstream news covers this so the general population can understand the issue with our software ecosystem's reliance on unpaid open-source maintainers.
I worry the mainstream news take would just be "open source bad, microsoft closed source and google cloud good"
> Red Hat assigned this issue CVE-2024-3094.
Does that mean this affects RHEL and Fedora?
RHEL no, Fedora 41 and Rawhide yes.
https://www.redhat.com/en/blog/urgent-security-alert-fedora-...
https://lists.debian.org/debian-security-announce/2024/msg00...
Note that Fedora 40 isn't even released yet (it's in beta); Fedora 41 / Rawhide is basically a development branch used only by a small number of people.
A small number of people with likely professional involvement in the Fedora project and possibly RHEL.
One supply chain attack serving as the basis for another supply chain attack.
RHEL won't get this bug for 2 years =)
i knew there was an advantage to being 8-10 years out of date at all times...
and when they finally do backport this bug in 2026, they will probably implement the systemd integration with openssl (pbthththt...) via 600 patch files in some nonstandard divergent manner that thwarts the payload anyhow. see? i knew they were super duper secure.
Red Hat helps to do the job of making sure OSS has CVEs so there's common vernacular for the problem.
Given the recent (and not so recent) attacks/"bugs", I feel there is a need to do more than the already hard task of investigating and detecting attacks, and to also bring IRL consequences to these people.
My understanding is that right now it's pretty much a name and shame of people who most of the time aren't even real "people" but hostile agents either working for governments or criminal groups ( or both )
Getting punched in the face is actually a necessary human condition for a healthy civilization.
In the article it says CISA was notified - that sounds like it's going to be a federal investigation if nothing else. If I were this person, I'd be out of the USA (or any US-friendly nation) ASAP.
One of Jia Tan's recent contributions is "Speed up CRC32 calculation on LoongArch". I would guess the odds are that this is not someone in the US.
It's also very possible that the account was compromised and taken over. A two years long con with real useful work is a lot of patience and effort vs. just stealing a weakly protected account. I wonder if MFA shouldn't be a requirement for accounts that contribute to important OSS projects.
>A two years long con with real useful work is a lot of patience and effort vs. just stealing a weakly protected account.
The long-con theory seems a bit more plausible at the moment
2 years of one engineer's time is very cheap, compared to e.g. the NSA's CryptoAG scam. I'd say most likely a Chinese intelligence plant, kindly offering to relieve the burden of the original author of xz.
This is most likely not his first backdoor, but the first which was detected.
So most likely he didn't wait two years to benefit.
> It's also very possible that the account was compromised and taken over
Or they WERE legit and simply went rogue, perhaps due to external factors.
I am thinking more of so-called rubber-hose cryptanalysis.
That was a review of someone else's work? https://github.com/tukaani-project/xz/pull/86
Since that repo is disabled: here is a mirror of the discussion [1]
Yeah I saw that - I wouldn't bet on them being in the US but who knows. Maybe they just really love CRC32 ;) And introducing backdoors (if it was them and not an account takeover).
Those tarballs are PGP signed, too..
The full name "Jia Cheong Tan" doesn't sound like Mainland China. The name and actions could be intentionally misleading though.
We're way too global now for this to be more than a tiny extra signal. People move around, families preserve names.
Also nobody checked that person's id, so "Jia" is only slightly more meaningful than "ghrssbitrvii".
Names can be faked, and even real names are not a great indicator.
Unless you have some very specific cultural knowledge you could not make even vaguely useful deductions about my location, nationality, culture, ethnicity etc. from my name. I get a lot of wrong guesses though!
Since his only appearance outside of GitHub and git repos is on some Taiwanese blogs, can we please change all occurrences of China to Taiwan?
And some other hints point at Eastern Europe, comparing the timezones. Taiwan is still the strongest hint though.
From their Git commits, they're in China's time zone.
Remember that agencies like NSA, GCHQ etc will always use false flags in their code, even when it doesn’t have as high risk of exposure as a backdoor in public has.
Looking at the times of commits shouldn’t be given much value at all. A pretty pointless endeavour.
State actors are actually known for not doing that; after all, there's no need to hide when what you're doing is legal. They also tend to work 9-5 in their own timezones.
But the actual interactions with Github are done between 12.00 UTC and 18.00 UTC
https://news.ycombinator.com/item?id=39870925
https://play.clickhouse.com/play?user=play#U0VMRUNUIHRvSG91c...
My git commits are sometimes in UTC, depending on which computer I make them from. Sometimes my laptop just switches timezones depending on whether I'm using wifi or LTE. I wouldn't put much weight on the timezone.
The timestamp of a git commit depends on the system clock of the computer the commit was created on. This cannot be verified by GitHub & co (except that they could reject commits whose timestamps are in the future).
I assume you mean UTC+8... that covers about 20% of the earth's population, besides China it includes parts of Russia, a bunch of SEA and Western Australia.
China is 20% of the world's population...
We shouldn't rule out the probability that this account is from a U.S. agency as well.
We shouldn't rule it out, but it seems unlikely to me.
This is more reckless than any backdoor I can think of by a US agency. NSA backdoored Dual EC DRBG, which was extremely reckless, but this makes that look careful, and that was the zenith of NSA recklessness. The attackers here straight up just cowboy'd the joint. I can't think of any instance in which US intelligence used sock puppets on public forums and mailing lists to encourage deployment of the backdoored software, and I maintain a list of NSA backdoors: https://www.ethanheilman.com/x/12/index.html
It just doesn't seem like their style.
The CIA had plans to commit terrorist acts against American civilians to start a war against Cuba in the 60s. This is quite literally their style. For example, perhaps they were planning to blame the hack of a power plant or critical infrastructure on this exploit, then use the "evidence" that was leaked to prove it was China, and from there carry out an offensive operation against Chinese infrastructure. There are lots of subversive reasons they would want to do this.
Just so I understand, you're alleging that a U.S. agency was, among other things, submitting patches for a mainland Chinese home-grown CPU architecture (Loongson)?
No, they're not. They are saying that due to the extraordinary circumstances with this case US agencies cannot be excluded from suspicion. At this time no actor seems to be a more likely perpetrator than the next. (Keep in mind that false-flag operations are a very common occurrence in cyber warfare and this cannot be ruled out yet.)
Aren't you confusing JiaT75 and xry111?
And if someone wanted to attack a target running on Loongson, they would certainly have to make sure the code can actually run there in the first place.
It doesn't seem out of the question that the U.S. or allied nations might want to be involved in the development effort around these CPUs. Even if initially it's just to build some credibility for this account so future adversarial patches are accepted with less suspicion? If you think that's implausible, I'm interested why?
CISA Advisory: https://www.cisa.gov/news-events/alerts/2024/03/29/reported-...
Note that it says "Fedora 41" in the CISA page's link to Red Hat, but Red Hat changed the blog title to "Fedora 40" and left the HTML page title as "Fedora 41".
A federal investigation into what, itself? The primary actors doing this type of thing are the US Government.
And I bet if it ended up on a NATO system, things would escalate quickly for the person / nation states being scrutinized (https://www.nato.int/cps/en/natohq/topics_78170.htm)
What law do you think is being broken here?
Maybe https://www.law.cornell.edu/uscode/text/18/1030#a_5 ?
> knowingly causes the transmission of a program, information, code, or command, and as a result of such conduct, intentionally causes damage without authorization, to a protected computer;
How does posting an exploit POC differ here?
> Getting punched in the face is actually a necessary human condition for a healthy civilization.
Aside from signed commits, we need to bring back GPG key parties and web of trust. When using a project you would know how many punches away from the committers you are.
PGP is more famous for "web of trust" topologies, not chains of trust.
For all of their nerd cred, key parties didn't accomplish very much (as evidenced by the fact that nothing on the Internet really broke when the WoT imploded a few years ago[1]). The "real" solution here is mostly cultural: treating third-party software like the risky thing it actually is, rather than a free source of pre-screened labor.
Yes, but there was also little pressure to really build the WOT. People, like myself, did it because it was fun, but no one really relied on it. This could change, but it is still far from certain if it'd work given enough pressure.
Chain/web was typo, corrected, thanks.
I know of the key party issues. But there is some value to knowing how far removed from me and people I trust the project authors are.
> But there is some value to knowing how far removed from me and people I trust the project authors are
That's true!
Nowadays I achieve this with linkedin[1] connections. Less nerd cred, but achieves roughly the same purpose (most of the people I care about in my niche are at most a 3rd degree connection - a friend of a friend of a friend).
[1] formerly also twitter, at least partially.
The web of punches?
> Getting punched in the face is actually a necessary human condition for a healthy civilization.
This is factually false - in fact, it's literally the direct opposite of the truth. "Getting punched in the face" is base violence that is incompatible with a healthy civilization. A good government with a robust justice system is what is actually needed for a healthy civilization.
> openssh does not directly use liblzma. However debian and several other distributions patch openssh to support systemd notification, and libsystemd does depend on lzma.
The systemd notification protocol could have been as simple as just writing a newline to a pipe, but instead you have to link to the libsystemd C library, so now security-critical daemons like openssh have additional dependencies like liblzma loaded into their address space (even if you don't use systemd as PID 1), increasing the risks of supply chain attacks. Thanks, systemd.
That is all the protocol is. From https://www.freedesktop.org/software/systemd/man/latest/sd_n...:
> These functions send a single datagram with the state string as payload to the socket referenced in the $NOTIFY_SOCKET environment variable.
The simplest implementation (no error handling) is something like:

    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    const char *addrstr = getenv("NOTIFY_SOCKET");
    if (addrstr) {
        /* NB: a leading '@' means an abstract socket; not handled here. */
        int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        strncpy(addr.sun_path, addrstr, sizeof(addr.sun_path) - 1);
        connect(fd, (struct sockaddr *) &addr, sizeof(addr));
        write(fd, "READY=1", strlen("READY=1"));
        close(fd);
    }
This is what I did for a daemon I'm maintaining. Type=notify support was requested but I'm really allergic to adding new libs to a project until they really do some heavy lifting and add enough value. I was pleasantly surprised the protocol was that simple and implemented it myself. I think systemd should just provide a simple standalone reference implementation and encourage people to copy it into their project directly. (But maybe they already do, I did that almost a decade ago IIRC when the feature was relatively new.)
goddamnit leftpad got us too :)
Whoops, you forgot `vsock:`, `@`, `SO_PASSCRED` (I think)... oh and where is that example provided? But yep that's all the protocol is for sure (and forever)!
> The systemd notification protocol could have been as simple as just writing a newline to a pipe
It basically is. libsystemd links to liblzma for other features not related to notifications.
(The protocol is that systemd passes the path to a unix socket in the `NOTIFY_SOCKET` env variable, and the daemon writes "READY=1" into it.)
Is that protocol documented/stable? For whatever reason, daemons are choosing to link to libsystemd instead of implementing it themselves.
It doesn't matter that libsystemd links to liblzma for other reasons. It's still in the address space of any daemon that is using libsystemd for the notification protocol.
I know Golang has their own implementation of sd_notify().
For Slurm, I looked at what a PITA pulling libsystemd into our autoconf tooling would be, stumbled on the Golang implementation, and realized it's trivial to implement directly.
indeed; it should be trivial in any language. Here's python: https://github.com/9001/copyparty/blob/a080759a03ef5c0a6b06c...
Caveat is that golang is not a good enough actor to be a reliable indicator of whether this interface is supported, though. They’ll go to the metal because they can, not because it’s stable.
Can someone point me to the Golang implementation? Is it a standard package?
> libsystemd links to liblzma for other features not related to notifications
Which is pretty emblematic of systemd's primary architectural fault!
systemd getting its tentacles everywhere they can squeeze is a feature, not a bug
The funny thing is that libsystemd _used_ to be split into several different libraries. I certainly remember libsystemd-journal (which is presumably the part of libsystemd that pulls in liblzma) being separate to libsystemd-daemon (which is the part that implements sd_notify, as used by OpenSSH [after patching by distros]).
If that split had never happened, then liblzma wouldn't have ended up being linked into sshd...
Strange protocol. Why not pass a path to a file that should be `touch`d and/or written to, I wonder? Would avoid the complexity of sockets.
Services may be in a different mount namespace from systemd for sandboxing or other reasons (also means you have to worry about filesystem permissions I suppose). Passing an fd from the parent (systemd) is a nice direct channel between the processes
But systemd precisely doesn't pass an FD. If it did, you would just need to write() and close().
FWIW, I did a quick check on a Devuan system. The sshd in Devuan does link to a libsystemd stub - this is to cut down on their maintenance of upstream packages. However that stub does not link to lzma.
On an MX Linux (non-systemd Debian-derived distro) box I ran ldd on /sbin/sshd and also ran:
[EDIT: this string gives cleaner results:]
lsof -w -P -T -p $(pgrep sshd)|grep mem
and saw liblzma in the results of both, so there is some sort of similar trickery going on.
Huh. That's rather surprising. Do you know how MX Linux handles systemd? Devuan does that shimming of upstream. Do they perhaps just try to leave out certain packages?
Anyway. I did not see lzma in the results on Devuan running a process check (just in case). I did see it on a Debian.
It turns out MX uses a package called systemd-shim that seems to be the Debian one:
$ aptitude show systemd-shim
Package: systemd-shim
Version: 10-6
State: installed
Automatically installed: no
Priority: extra
Section: admin
Maintainer: Debian QA Group <packages@qa.debian.org>
Architecture: amd64
Uncompressed Size: 82.9 k
Depends: libc6 (>= 2.34), libglib2.0-0 (>= 2.39.4), cgmanager (>= 0.32)
Suggests: pm-utils
Conflicts: systemd-shim:i386
Breaks: systemd (< 209), systemd:i386 (< 209)
Description: shim for systemd
This package emulates the systemd function that are required to run the systemd helpers without using the init service
> so now security-critical daemons like openssh have additional dependencies like liblzma
Systemd itself seems security-critical to me. Would removing other dependencies on libsystemd really make a secure system where systemd was compromised through its library?
1. systemd (at least the PID 1 part) does not talk to the network, so a remotely-accessible backdoor would need to be more complex (and thus more likely to be detected) than a backdoor that can be loaded into a listening daemon like openssh.
2. You can run Debian systems without systemd as PID 1, but you're still stuck with libsystemd because so many daemons now link with it.
> systemd... does not talk to the network...
Socket activation and the NFS automounter appear to.
If I run "netstat -ap" I see pid 1 listening on enabled units.
Edit: tinysshd is specifically launched this way.
Edit2: there is also substantial criticism of xz on technical grounds.
.. well, you can use a shim package as devuan did.
One of the objections that many people do not understand, is that systemd adds complexity. Unnecessary complexity. Boats full, loads full, mountains full of complexity.
Yes, there are things delivered with that complexity. However, as an example, sysvinit is maybe, oh, 20k lines of code including binaries, heck including all core init scripts.
What's systemd? 2M lines? It was >1M lines 4+ years ago.
For an init system, a thing that is supposed to be the core of stability, security, and most importantly glacial, stable change -- that is absurdly complex. It's exceedingly overengineered.
And so you get cases like this. And cases like that, and that over there, and that case over there too. All of which could not exist if systemd didn't try to overengineer and overcomplicate everything.
Ah well. I'm still waiting for someone to basically fork systemd, remove all the fluff (udev, ntp, dns, timers, restart code, specialized logging, on and on and on), and just end up with systemd compatible service files.
But not yet. So... well, oh well.
This is a bit like complaining that the Linux kernel has 30 million lines of code, while ignoring that 3/4 of that is in hardware support (drivers) or filesystems that nobody is actually required to use at any given time.
systemd is a collection of tools, one of which is an init system. Nobody accused GNU yes of being bloated just because it's in a repository alongside 50 other tools.
> that nobody is actually required to use at any given time
But that's the very problem with systemd! As time goes on you're required, whether by systemd itself or by the ecosystem around it, to use more and more of it, until it's doing not only service management but also timezones, RTC, DNS resolution, providing getpwent/getgrent, inetd, VMs and containers, bootloader, udev (without adding literally any benefit over the existing implementations), ... oh and you also have to add significant complexity in other things (like the kernel!) to use it, like namespaces (which have been a frequent source of vulnerabilities)...
> timezones, RTC, DNS resolution, providing getpwent/getgrent, inetd, VMs and containers, bootloader
How many of those are you actually required to use systemd for? At least for DNS, inetd, containers and bootloader I'm pretty sure I run a few different alternatives across my systems. I think major distros (running systemd) still ship with different dns and inetd; for containers it's a lot more common to use a docker-like (probably docker or podman) than it is to use systemd-nspawn.
> oh and you also have to add significant complexity in other things (like the kernel!) to use it, like namespaces (which have been a frequent source of vulnerabilities)
Namespaces were implemented before systemd, have been used before systemd in widely used systems (for example LXC and many others). Namespaces and similar kernel features are not tied to systemd.
>namespaces (which have been a frequent source of vulnerabilities)...
Unprivileged user namespaces sure, but I don't think that applies to namespaces in general (which without unprivileged user namespaces can only be created by root, and LPE is the concern with unprivileged userns due to increased attack surface). systemd doesn't need unprivileged userns to run.
GNU yes is actually pretty bloated. It's 130 lines of code for something so trivial [1]! ;)
[1] https://github.com/coreutils/coreutils/blob/master/src/yes.c
yes(1) is the standard unix way of generating repeated data. It's good to do this as quickly as possible. I really don't understand why so many get annoyed with this code. 130 lines isn't that complicated in the scheme of things.
> Ah well. I'm still waiting for someone to basically fork systemd, remove all the fluff (udev, ntp, dns, timers, restart code, specialized logging, on and on and on)
Most of the things you named there are modular and can be easily disabled.
Furthermore, udev precedes systemd and systemd has in fact its own replacement for it (though the name escapes me).
Kind of a classic: people love harping on systemd without properly understanding it.
systemd subsumed udev. Eudev is what folks who don't have systemd use.
> are modular and can be easily disabled.
That's a common defense for any bloatware. If they're modular and easily disabled then why are they all enabled by default?
Systemd is actually pretty damn good and it's GPL licensed free software.
I understand that people don't like the way it seems to work itself into the rest of Linux user space as a dependency but that's actually our own fault for not investing the man power that Red Hat invests. We have better things to do than make our own Linux user space and so they have occupied that niche. It's free software though, we always have the freedom to do whatever we want.
By the way, all the stuff you mentioned is not really part of the actual init system, namely PID 1. There's an actual service manager for example and it's entirely separate from init. It manages services really well too, it's measurably better than all that "portable" nonsense just by virtue of using cgroups to manage processes which means it can actually supervise poorly written double forking daemons.
> By the way, all the stuff you mentioned is not really part of the actual init system, namely PID 1
Except it literally is. I once had a systemd system suddenly refuse to boot (kernel panic because PID1 crashed or so) after a Debian upgrade, which I was able to resolve by... wait for it... making /etc/localtime not be a symlink.
Why does a failure doing something with the timezone make you unable to boot your system? What is it even doing with the timezone? What is failing about it? Who knows, good luck strace'ing PID1!
Turns out you're right and my knowledge was outdated. I seriously believed the systemd service manager was separate from its PID 1 but at some point they even changed the manuals to say that's not supported.
I was also corrected further down in the thread, with citations from the maintainers even:
https://news.ycombinator.com/item?id=39871735
As it stands I really have no idea why the service manager has not been split off from PID 1. Maintainer said that PID 1 was "different" but didn't really elaborate. Can't find much reliable information about said differences either. Do you know?
People are complaining that it's too big, labyrinthine, and arcane to audit, not that it doesn't work. They would prefer other things that work, but don't share those characteristics.
Also, the more extensive the remit (of this init), the more complexly interconnected the interactions between the components; the fewer people understand the architecture, the fewer people understand the code, the fewer people read the code. This creates a situation where the codebase is getting larger and larger at a rate faster than the growth of the number of man-hours being put into reading it.
This has to make it easier for people who are systemd specialists to put in (intentionally or unintentionally) backdoors and exploitable bugs that will last for years.
People keep defending systemd by talking about its UI and its features, but that completely misses the point. If systemd were replaced by something comprehensible and less internally codependent, even if the systemd UI and features were preserved, most systemd complainers would be over the moon with happiness. Red Hat invests too much into completely replacing linux subsystems, they should take a break. Maybe fix the bugs in MATE.
>the more complexly interconnected the interactions between the components
This is a bit of a rich criticism of systemd, given the init scripts it replaced.
> Red Hat invests too much into completely replacing linux subsystems, they should take a break. Maybe fix the bugs in MATE.
MATE isn't a Red Hat project. And nobody complains about Pipewire.
A shell script with a few defined arguments is not a complexly interconnected set of components. It's literally the simplest, most core, least-strongly-dependent interconnection that exists in a *nix system.
Tell us you never bothered to understand how init worked before drawing a conclusion on it without telling us.
Have you ever seen the init scripts of a reasonably-complex service that required other services to be online?
Yep.
depend(){
need net localmount
after bootmisc
}
> Red Hat invests too much into completely replacing linux subsystems, they should take a break.
They should do whatever they feel is best for them, as should we. They're releasing free as in freedom GPL Linux software, high quality software at that. Thus I have no moral objections to their activities.
You have to realize that this is really a symptom of others not putting in the required time and effort to produce a better alternative. I know because I reinvent things regularly just because I enjoy it. People underestimate by many orders of magnitude the effort required to make something like this.
So I'm really thankful that I got systemd, despite many valid criticisms. It's a pretty good system, and it's not proprietary nonsense. I've learned to appreciate it.
Let’s not get started on how large the kernel is. Large code bases increase attack surface, period. The only sensible solution is to microservice out the pieces and only install the bare essentials. Why does an x86 server come with Bluetooth drivers baked in?
The kernel devs are wasting time writing one offs for every vendor known to man, and it ships to desktops too.
How is the service manager different from PID1/init?
They are completely different things.
Init is just a more or less normal program that Linux starts by default and by convention. You can make it boot straight into bash if you want. I created a little programming language with the ultimate goal of booting Linux directly into it and bringing up the entire system from inside it.
It's just a normal process really. Two special cases that I can think of: no default signal handling, and it can't ever exit. Init will not get interrupted by signals unless it explicitly configures the signal dispositions, even SIGKILL will not kill it. Linux will panic if PID 1 ever exits so it can't do that.
Traditionally, it's also the orphaned child process reaper. Process descriptors and their IDs hang around in memory until something calls wait on them. Parent processes are supposed to do that but if they don't it's up to init to do it. Well, that's the way it works traditionally on Unix. On Linux though that's customizable with prctl and PR_SET_CHILD_SUBREAPER so you actually can factor that out to a separate process. As far as I know, systemd does just that, making it more modular and straight up better than traditional Unix, simply because this separate process won't make Linux panic if it ever crashes.
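A minimal sketch of that subreaper mechanism, for the curious (Linux >= 3.4; this shows the kernel feature itself, not systemd's actual code):

    #include <stdio.h>
    #include <sys/prctl.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(void) {
        /* Orphaned descendants now get reparented to us instead of PID 1. */
        prctl(PR_SET_CHILD_SUBREAPER, 1);

        /* ...spawn and supervise services here... */

        /* Reap whatever lands on us, as init traditionally would. */
        pid_t pid;
        while ((pid = waitpid(-1, NULL, 0)) > 0)
            printf("reaped %d\n", (int) pid);
        return 0;
    }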
As for the service manager, this page explains process and service management extremely well:
https://mywiki.wooledge.org/ProcessManagement
Systemd does it right. It does everything that's described in there, does it correctly, uses powerful Linux features like cgroups for even better process management and also solves the double forking problem described in there. It's essentially a solved problem with systemd. Even the people who hate it love the unit files it uses and for good reason.
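To illustrate why cgroups solve the double-forking problem: membership is inherited across fork and can't be shed by orphaning yourself, so the manager can always enumerate a service's processes. A rough sketch against the cgroup2 filesystem (assumes it is mounted at /sys/fs/cgroup, root privileges, and a hypothetical PID; error handling omitted):

    #include <stdio.h>
    #include <sys/stat.h>

    int main(void) {
        mkdir("/sys/fs/cgroup/mydaemon", 0755);

        /* Move a service into the group by writing its PID... */
        FILE *f = fopen("/sys/fs/cgroup/mydaemon/cgroup.procs", "w");
        fprintf(f, "%d\n", 12345);  /* hypothetical service PID */
        fclose(f);

        /* ...then enumerate every process it spawned, double forks and all. */
        char line[32];
        f = fopen("/sys/fs/cgroup/mydaemon/cgroup.procs", "r");
        while (fgets(line, sizeof(line), f))
            printf("member pid: %s", line);
        fclose(f);
        return 0;
    }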
I know the differences between them conceptionally.
The thing that people usually complain about is systemd forcibly setting its process manager at pid=1. I.e. the thing "discussed" in https://github.com/systemd/systemd/issues/12843
There is a secondary feature to run per-user managers, though I'm unsure whether it runs without systemd as PID 1. It might rely only on logind.
Wow, I remember reading that PID != 1 line years ago. Had no idea they changed it. I stand corrected then. Given the existence of user service managers as well as flags like --system and --user, I inferred that they were all entirely separate processes.
Makes no sense to me why the service manager part would require running as PID 1. The maintainer just says this:
> PID 1 is very different from other processes, and we rely on that.
He doesn't really elaborate on the matter though.
Every time this topic comes up I end up searching for those so called PID 1 differences. I come up short every time aside from the two things I mentioned above. Is this information buried deep somewhere?
Just asked ChatGPT about PID 1 differences. It gave me the aforementioned two differences, completely dismissed Linux's prctl child subreaper feature "because PID 1 often assumes this role in practice" as well as some total bullshit about process group leaders and regular processes not being special enough to interact with the kernel which is just absolute nonsense.
So I really have no idea what it is about PID 1 that systemd is supposedly relying on that makes it impossible to split off the service manager from it. Everything I have read up until now suggests that it is not required, especially on Linux where you have even more control and it's not like systemd is shy about using Linux exclusive features.
> One of the objections that many people do not understand, is that systemd adds complexity. Unnecessary complexity. Boats full, loads full, mountains full of complexity.
Complexity that would otherwise be distributed to a sea of ad-hoc shell scripts? systemd is a win
The init-scripts that predated systemd were actually pretty damn simple. So was init itself.
We removed tens of thousands of lines of code for fixes for those "simple" init scripts when migrating to systemd.
They are never "simple"; there is always some fucking edge case. For example, we had Java apps that wrote their own PID file a few seconds after starting, so anything that ran start and immediately followed with status (like Pacemaker) threw errors.
Or how once in a blue moon MySQL didn't start after reboot, because it so happened that:
* the PID file from the previous boot wasn't cleared
* some other app was running with the same PID as in the file
* the script didn't care; it saw the PID file existing and didn't start MySQL
Both examples from pre-systemd CentOS
> One of the objections that many people do not understand, is that systemd adds complexity. Unnecessary complexity. Boats full, loads full, mountains full of complexity.
this is and always has been such a dumb take.
if you'd like to implement an init (and friends) system that doesn't have "unnecessary complexity" and still provides all the functionality that people currently want, then go and do so and show us? otherwise it's just whinging about things not being like the terrible old days of init being a mass of buggy and racey shell scripts.
There were plenty of those that existed even before systemd. Systemd's adoption was not a result of providing the functionality that people want but rather was a result of providing functionality that a few important people wanted and promptly took hard dependencies on.
> about things not being like the terrible old days of init being a mass of buggy and racey shell scripts.
Zero of the major distros used System V init by default. Probably only distros like Slackware or Linux From Scratch even suggested it.
It's unfortunate that so many folks uncritically swallowed the Systemd Cabal's claims about how they were the first to do this, that, or the other.
(It's also darkly amusing to note that every service that has nontrivial pre-start or post-start configuration and/or verification requirements ends up using systemd to run at least one shell script... which is what would have often been inlined into their init script in other init systems.)
> Zero of the major distros used System V init by default. Probably only distros like Slackware or Linux From Scratch even suggested it.
I have absolutely no idea what you're trying to claim.
Are you suggesting that Debian's "sysvinit" package wasn't a System V init system? That the years I spent editing shell scripts in /etc/init.d/ wasn't System V init?
or are you making some pointless distinction about it not actually being pre-lawsuit AT&T files so it doesn't count or something?
or did you not use Linux before 2010?
if you have some important point to make, please make it more clearly.
> It's unfortunate that so many folks uncritically swallowed the Systemd Cabal's claims about how they were the first to do this, that, or the other.
I feel like you have very strong emotions about init systems that have nothing to do with the comment you're replying to.
> or did you not use Linux before 2010?
I've been using Linux regularly since 2002. I've never regularly used a Linux that used sysvinit.
In other words, over the past ~22 years (goddamn, where did the time go?) every Linux I've regularly used has had an init system that allows you to specify service dependencies to determine their start order.
> ...Debian...
Ah. That explains it. Debian's fine to build on top of but a bad distro to actually use. (Unless you really like using five-to-ten (and in some cases 25->35) year old software that's been superseded by much-improved versions.)
You should also consider that packages named "sysvinit" sometimes aren't actually what people think of when they hear "sysvinit": <https://wiki.gentoo.org/wiki/Sysvinit>
As long as Gnome requires bug-compatibility with systemd, nobody will rewrite it.
I have a design in the works to do just this.
The problem? It's on the backburner because I don't think I could find a business model to make money from it.
I don't think offering support for a price would work, for example.
What's the point of your implementation? systemd is totally modular, you can use just the init system without networkd, timesyncd, resolved, nspawn, whatever else I forgot about.
If you want you can just use systemd as PID1 for service management and enjoy a sane way to define and manage services – and do everything in archaic ways like 20 years ago.
There are two points to the implementation:
* Choice. If I have a separate implementation, my users do not have to be subject to systemd's choices. And I do not either.
* The same implementation will have the same bugs, so in the same way that redundant software has multiple independent implementations, having an independent implementation will avoid the same bugs. It may have different bugs, sure, but my goal would be to test like SQLite and achieve DO-178C certification. Or as close as I could, anyway.
I'd assume chances of monetizing this are incredibly low. There already is an init system that understands systemd unit files, the name escapes my mind unfortunately. DO-178C might be a selling point literally, but whether there's enough potential customers for ROI is questionable.
I unfortunately agree with you. Hence why it's on the backburner.
No, you can't. Systemd might be somewhat modular; the things distros ship which depend on it are not.
Well some distros might force more components upon you, but that's hardly systemd's fault. Same if some software decides to make use of another component of systemd - then that's their choice, but also there are alternatives. The only thing that comes to mind right now would be something like GNOME, which requires logind, but all other "typical" software only wants systemd-the-init-system if anything. You can run Debian just fine with just systemd as an init system and nothing else.
What about sponsors? Actually, now I have the idea of a platform similar to Kickstarter but for software development, and with just sponsors. It wouldn't work, sure... Except in some cases. Like when things like this happen...
Sponsors are fickle, unfortunately, and they tend to remove "donations" when money gets tight.
If I am considered a full vendor, though, and a vendor for a critical piece of software, they might keep me around.
Also thanks to Debian for modifying openssh.
You're not wrong. Had Debian not patched it in this way, OP might have never found it, leaving all other distros who do the same vulnerable.
Note that OP found this in Debian sid as well, which means it's highly unlikely this issue will find its way into any Debian stable systems.
Right, the systemd notification framework is very simple and I've used it in my projects. I didn't even know that libsystemd provided an implementation.
My Arch system was not vulnerable because openssh was not linked to xz.
IMO every single commit from JiaT75 should be reviewed and maybe even rolled back, as they have obliterated their trust.
edit:
https://github.com/google/oss-fuzz/pull/10667
Even this might be nefarious.
> the systemd notification framework is very simple and I've used it in my projects
Have you come across an outline or graph of systemd that you really like, or maybe a good example of a minimal setup?
If they hadn't been modifying SSH their users would never have been hit by this backdoor. Of course if it is actually intended to target SSH on Debian systems, the attacker would likely have picked a different dependency. But adding dependencies like Debian did here means that those dependencies aren't getting reviewed by the original authors. For security-critical software like OpenSSH such unaudited dependencies are prime targets for attacks like this.
My point was, this is not "Debian did a thing". Lots of other distros do the same thing. In this particular case, it was in fact fortunate for users of all these other distros that Debian did it; otherwise this vulnerability might never have been found!
Also, only users on sid (unstable) and maybe testing seem to have been affected. I doubt there are many Debian servers out there running sid.
Debian stable (bookworm) has xz-utils version 5.4.1: https://packages.debian.org/bookworm/xz-utils
I would phrase it as "It's good we have a heterogenous open-source community".
Monocrops are more vulnerable to disease because the same (biological) exploit works on the entire population. In our Linux biosphere where there are dozens of major, varied configurations sharing parts but not all of their code (and hundreds or thousands of minor variations), a given exploit is likely to fail somewhere, and that failure is likely to create a bug that someone can notice.
It's not foolproof, but it helps keep the ecosystem healthy.
> Debian stable (bookworm) has xz-utils version 5.4.1: https://packages.debian.org/bookworm/xz-utils
Guess who released 5.4.1? JiaT75!
5.4.1 doesn't even have the `m4/build-to-host.m4` script that kicks off the backdoor's build-time extraction.
Neither does https://salsa.debian.org/debian/xz-utils/-/tree/v5.6.0/m4
The script was not present in the git tree, only in the released archives.
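That git-vs-tarball gap is mechanically checkable, by the way. A rough sketch, assuming you still have a mirror of the repo and the release tarball (expect legitimate noise from generated autotools files, so in practice you'd want an allowlist):

    git archive --prefix=xz-5.6.0/ v5.6.0 | tar -tf - | sort > from-git.txt
    tar -tzf xz-5.6.0.tar.gz | sort > from-tarball.txt
    diff from-git.txt from-tarball.txt

Here m4/build-to-host.m4 would show up as tarball-only, at which point it could be compared against the pristine gnulib copy it claims to be.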
I'm also suggesting that there could be more than one exploit present. All of their commits should be rolled back, none of it can be trusted.
> The script was not present in the git tree, only in the released archives.
I confess I couldn't quite figure out the branching and tagging strategy on that repo. Very weird stuff. That script seems to have been added by Sebastian Andrzej Siewior just ahead of the 5.6.0 release. It's definitely present in the Debian git tree, and probably in many other distros since others seem to be affected.
The commit where the script was added to Debian is tagged `upstream/v5.6.0` despite the script itself not being present on that tag upstream: https://github.com/tukaani-project/xz/tree/v5.6.0/m4
> I'm also suggesting that there could be more than one exploit present. All of their commits should be rolled back, none of it can be trusted.
I agree.
> I confess I couldn't quite figure out the branching and tagging strategy on that repo.
It's just a regular Debian packaging repository, which includes imports of upstream tarballs - nothing out of ordinary there. Debian packaging is based on tarballs, not on git repos (although in absence of upstream tarballs, Debian maintainer may create a tarball out of VCS repo themselves).
The linked repo just happens to include some tags from upstream repo, but those tags are irrelevant to the packaging. Only "debian/*" and "upstream/*" tags are relevant. Upstream VCS history is only imported for the convenience of the packager, it doesn't have to be there.
Debian's git repositories don't have any forced layout (they don't even have to exist or be up-to-date, the Debian Archive is the only source of truth - note how this repo doesn't contain the latest version of the package), but in practice most of them follow the conventions of DEP-14 implemented by gbp (in this particular case, it looks like `gbp import-orig --upstream-vcs-tag`: https://wiki.debian.org/PackagingWithGit#Upstream_import_met...).
Thanks for the explanation, very helpful!
Not just commits, but all tarballs released with his key.
It takes a village.
Uh. systemd documents the protocol in various places, and the protocol is trivial: a single text datagram sent to an AF_UNIX socket whose path you get via the NOTIFY_SOCKET environment variable. That's trivial to implement for anyone with some basic unix programming knowledge. And I tell pretty much anyone who wants to listen that they should just implement the protocol on their own if that's the only reason for a libsystemd dep otherwise. In particular non-C environments really should do their own native impl and not bother wrapping libsystemd just for this.
But let me stress two other things:
Libselinux pulls in liblzma too and gets linked into tons more programs than libsystemd. And will end up in sshd too (at the very least via libpam/pam_selinux). And most of the really big distros tend to support selinux at least to some level. Hence, systemd or not, sshd remains vulnerable to this specific attack.
With that in mind, libsystemd git has actually dropped the hard dep on liblzma: all compressors are now dlopen deps and thus only pulled in when needed.
> And i tell pretty much anyone who wants to listen that they should just implement the proto on their own if thats rhe only reason for a libsystemd dep otherwise
Could you point out where the man page (https://www.freedesktop.org/software/systemd/man/latest/sd_n...) says this?
The notes section has a brief description of the protocol and the different kinds of sockets involved.
If you are talking about the stability of that interface: https://systemd.io/PORTABILITY_AND_STABILITY/
That site states "using libsystemd is a good choice"
:-D
Deferring the load of the library often just makes things harder to analyze, not necessarily more secure. I imagine many of the comments quoting `ldd` are wrongly forgetting about `dlopen`.
(I really wish there were a way to link such that the library isn't actually loaded but it still shows in the metadata, so you can get the performance benefits of doing less work but can still analyze the dependency DAG easily)
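(For the analyzable half of that wish: the static dependency list ldd walks is just the DT_NEEDED entries, e.g. `readelf -d /usr/sbin/sshd | grep NEEDED`; anything pulled in via dlopen is invisible there, which is exactly the analysis gap described above.)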
It would make things more secure in this specific backdooring case, since sshd only calls a single function of libsystemd (sd_notify) and that one would not trigger the dlopen of liblzma; hence the specific path chosen by the backdoor would not work (unless libselinux fucks it up after all, see other comments).
Dlopen has drawbacks but also major benefits. We decided the benefits relatively clearly outweigh the drawbacks, but of course people may disagree.
I have proposed a mechanism before that would expose the list of libs we potentially load via dlopen in an ELF section or ELF note. This could be consumed by things such as package managers (for auto-dep generation) and ldd. However there was no interest from anyone else in getting this landed, so I dropped it.
Note that there are various cases where people use dlopen not on hardcoded lib names, but dynamically configured ones, where this would not help. I.e. things like glibc nss or pam or anything else plugin based. But in particular pam kinda matters, since that tends to be loaded into almost any kind of security-relevant software, including sshd.
The plugin-based case can be covered by the notion of multiple "entry points": every library that is intended to be `dlopen`ed is tagged with the name of the interface it provides, and every library that does such `dlopen`ing mentions the names of such interfaces rather than the names of libraries directly. Of course your `ldd` tool has to scan all the libraries on the system to know what might be loaded, but `ldconfig` already does that for libraries not in a private directory.
This might sound like a lot of work for a package-manager-less-language ecosystem at first, but if you consider "tag" as "exports symbol with name", it is in fact already how most C plugin systems work (a few use an incompatible per-library computed name though, or rely entirely on global constructors). So really only the loading programs need to be modified, just like the fixed-name `dlopen`.
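A tiny sketch of that fixed-symbol convention, with a hypothetical interface name my_plugin_v1 (the loader keys off the exported symbol, not the library's file name):

    #include <dlfcn.h>
    #include <stdio.h>

    typedef int (*plugin_fn)(void);

    int main(void) {
        void *h = dlopen("./plugin.so", RTLD_NOW);
        if (!h) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        /* The "tag" is just a well-known entry-point symbol. */
        plugin_fn entry = (plugin_fn) dlsym(h, "my_plugin_v1");
        if (entry)
            printf("plugin says: %d\n", entry());

        dlclose(h);
        return 0;
    }

(Link with -ldl on older glibc.)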
I note that Solaris had runtime-optional but compile-time linked shared libraries, I always wondered why Linux/glibc never adopted them.
> And i tell pretty much anyone who wants to listen that they should just implement the proto on their own if thats rhe only reason for a libsystemd dep otherwise.
That's what I think too. Do the relevant docs point this out too? Ages ago they didn't. I think we should try to avoid that people just google "implement systemd notify daemon" and end up on a page that says "link to libsystemd and call sd_notify()".
The correct thing to do would be to put different unrelated APIs into their own library, instead of everything into libsystemd0. This has always been one of my biggest issues with it. It makes it hard to replace just one API from that library, because on a binary distribution, only one package can provide it. And as a nice side effect, surprises like this one could then be avoided.
systemd developers have already rejected that approach, so I guess we will end up with lots of reimplementations, both in individual projects and third-party libsystemd-notify style libraries.
I see that different clients implemented in different languages will need different client libraries, and maintaining all that is not something a core project is going to do. But if using the raw protocol instead of the convenience of libsystemd is a (commonly ignored) recommendation that makes a lot of sense in terms of segmentation, providing at least one reference implementation would point all systemd users in the right direction. Recommending that each client just implement the (trivial) protocol itself does not make as much sense to me.
IIRC sshd loads libpam only if specifically configured for it. So while it's not wrong, it's also a more edge case for the backdoor to work.
> And will end up in sshd too (at the very least via libpam/pam_selinux).
Inaccurate.
It's not pulled in on any sysvinit Debian system I run. It is on stable, oldstable, and oldoldstable systems via systemd.
Not systemd:
# ldd $(which sshd)
linux-vdso.so.1 (0x00007ffcb57f5000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007fbad13c9000)
libwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007fbad13bd000)
libaudit.so.1 => /lib/x86_64-linux-gnu/libaudit.so.1 (0x00007fbad138c000)
libpam.so.0 => /lib/x86_64-linux-gnu/libpam.so.0 (0x00007fbad137a000)
libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x00007fbad12d5000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007fbad12a5000)
libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007fbad1253000)
libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007fbad1179000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007fbad1173000)
libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007fbad0c00000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fbad1154000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbad0a1f000)
libnsl.so.2 => /lib/x86_64-linux-gnu/libnsl.so.2 (0x00007fbad1137000)
libcap-ng.so.0 => /lib/x86_64-linux-gnu/libcap-ng.so.0 (0x00007fbad112f000)
libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x00007fbad1123000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbad156a000)
libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007fbad1089000)
libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007fbad09f2000)
libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007fbad09e4000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007fbad09dd000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007fbad09cc000)
libtirpc.so.3 => /lib/x86_64-linux-gnu/libtirpc.so.3 (0x00007fbad099e000)
systemd:
# ldd $(which sshd)
linux-vdso.so.1 (0x00007ffc4d3eb000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007feb8aa35000)
libwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007feb8aa29000)
libaudit.so.1 => /lib/x86_64-linux-gnu/libaudit.so.1 (0x00007feb8a9f8000)
libpam.so.0 => /lib/x86_64-linux-gnu/libpam.so.0 (0x00007feb8a9e6000)
libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x00007feb8a916000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007feb8a8e6000)
libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007feb8a894000)
libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007feb8a7ba000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007feb8a7b4000)
libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007feb8a200000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007feb8a795000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007feb8a01f000)
libnsl.so.2 => /lib/x86_64-linux-gnu/libnsl.so.2 (0x00007feb8a778000)
libcap-ng.so.0 => /lib/x86_64-linux-gnu/libcap-ng.so.0 (0x00007feb8a770000)
libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x00007feb8a764000)
libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007feb89ed8000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007feb8a735000)
libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x00007feb89e1c000)
liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x00007feb8a70d000)
/lib64/ld-linux-x86-64.so.2 (0x00007feb8abb5000)
libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007feb89d82000)
libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007feb8a6e0000)
libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007feb8a6d2000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007feb8a6c9000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007feb8a6b8000)
libtirpc.so.3 => /lib/x86_64-linux-gnu/libtirpc.so.3 (0x00007feb8a68a000)
libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007feb89d5a000)
E.g.:
# ldd $(which sshd) | grep liblz
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007fd1e647a000)
liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x00007fd1e6398000)
You also need to have sshd enabled to use PAM and your sshd pam stack should include pam_selinux. Then it will be dynamically loaded only when sshd starts a PAM session.
The notify protocol isn't much more complicated than that. From memory you send a string to a unix socket. I have written both systemd notify and listenfd in a few languages for little experiments and it is hard to imagine how the protocols could be simpler.
Looking at most popular projects these days they are a mass of dependencies and I think very few of them can be properly audited and verified by the projects that use them. Rust and Go might be more memory safe than C but look at the number of cargo or go modules in most projects. I have mostly stopped using node/npm on my systems.
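For reference, the listenfd side mentioned above is in the same spirit. A minimal sketch of the fd-passing convention (LISTEN_PID/LISTEN_FDS, fds handed over starting at 3; no error handling, and real code should also unset the variables before exec'ing children):

    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define SD_LISTEN_FDS_START 3  /* first passed fd, per the protocol */

    int listen_fds(void) {
        const char *pid = getenv("LISTEN_PID");
        const char *fds = getenv("LISTEN_FDS");
        /* Only honor the fds if they were addressed to this process. */
        if (!pid || !fds || (pid_t) atoi(pid) != getpid())
            return 0;
        return atoi(fds);  /* fds 3 .. 3+n-1 are the listening sockets */
    }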
Not a programmer, but couldn't the distribution's sshd patches for systemd (and all other distro patches for privileged daemons) use static includes? Wouldn't that have only pulled in the simple client-side communication API? Would that have defeated this vector? Would it be doable?
It's unfortunate that the anti-systemd party lost the war... years ago. But I don't blame systemd, Lennart Poettering or the fanboys (though it would have been so much better if the guy had never worked in open source or weren't such a prolific programmer). I blame Debian and its community for succumbing to this assault on the Unix philosophy (again, years ago).
Sometimes things evolve in ways that make us feel a little obsolete.
I've been learning NixOS for a few years now, and it would have been impossible without systemd. It's one heck of a learning curve, but when you get to the other side, you know something of great power and value. Certain kinds of complexity add 'land' (e.g. systemd) that can become 'real estate' (e.g. NixOS), which in turn hopes to become 'land' for the next innovation, and so forth.
Whether this happens or not (whether it's the right kind of complexity) is really hard to assess up-front, and probably impossible without knowing the complex new technology in question very well. (And by then you have the bias of depending, in part, yourself on the success of the new tech, as you've committed significant resources to mastering it, so good luck on convincing skeptical newcomers!)
It's almost like a sort of event horizon -- once you know a complex new technology well enough to see whether or not it's useful, the conflict-of-interest makes your opinion unreliable to outsiders!
Nevertheless, the assessment process itself, while difficult to get right, is worth getting better at.
It's easy for impatience and the sensation of what I've taken to calling 'daunt' -- that intrinsic recoil the mind has from absorbing a large amount of information whose use case is not immediately relevant -- to dissuade one from exploring. But then, one never discovers new 'land', and one never builds new real estate!
[ Aside: This is why I'm a little skeptical of the current rebellion against frontend frameworks. Certainly some of them, like tailwind, are clearly adding fetters to an otherwise powerful browser stack. But others, like Svelte, and to some extent, even React, bring significant benefits.
The rebellion has this vibe like, well, users _should_ prefer more simply-built interfaces, and if they don't, well, they just have bad taste. What would be more humble would be to let the marketplace (e.g. consumers) decide what is preferable, and then build that. ]
What? I don't get it? Isn't it on Debian if they modified the package to do something like this? Why would you blame systemd for maintainers doing something that upstream has never required or recommended?
xz is so pervasive, I just discovered on my Mac that the (affected?) version 5.6.1 made it into homebrew. The post in the linked article says that only Linux x86-64 systems are affected, but now I'm left scratching my head whether my Mac is also in trouble, just that we don't know it yet.
The two active maintainers seem to be: Lasse Collin <lasse.collin@tukaani.org> and Jia Tan <jiat0218@gmail.com>
Searching DDG for "jiat0218" I came across a blog post which I found weird. Seems to be dated: 2006-05-03
Blog post: "Kuso拍賣.有靈氣的筷子 - 闕小豪" <https://char.tw/blog/post/24397301>
Internet Archive link: <https://web.archive.org/web/20240329182713/https://char.tw/b...>
The contents of the page, when translated, seem to be about jiat0218 auctioning a pair of spiritual chopsticks as a prank.
The blog entry is basically a QA between jiat0218 and various other people about these chopsticks.
If Jia Tan does turn out to be a compromised maintainer working for a state actor then some of the content on the blog page can be viewed in a more sinister way (i.e. spycraft / hacks for sale etc.).
Example question 38:
Question 38
accounta066 (3): Are these chopsticks really that good? I kind of want to buy
them! But I recently sent money for online shopping but didn’t receive anything.
It’s very risky; currently jiat0218 you don’t have any reviews, you can
interview me. Do you want to hand it over?! … A sincere buyer will keep it.
jiat0218 (4), in reply: First of all, I would like to express my condolences to you for
your unfortunate experience! What can I say about this kind of thing...My little
sister has always been trustworthy. What’s more, this is a pair of spiritual
chopsticks, so I hope to have a good one. It’s the beginning! As you can see,
my little sister is very careful and takes her time when answering your
questions. Except for the two messages that were accidentally deleted by her,
she always answers your questions. If this still doesn’t reassure you, then I
can only say that I still have room to work hard. You are still welcome
to bid... ^_^
Note however, it could all just be what it purports to be, which is a prank auction of spiritual chopsticks.
This is likely just a coincidence. 0218 looks like a birthday, and "jiat" is probably the name + initial. 18 years is also too long of a time horizon for this.
Chopsticks could also be a codeword for something. Maybe some sort of backdoor into a system somewhere.
Crazy to think that the time horizon for these kinds of attacks span decades. This absolutely does not read like a coincidence. Chopsticks, little sister, "room to work hard", all sound like codewords.
Do you say that about every word commonly used in Asia?
Sounds to me like google translate gibberish
Something about this I found surprising is that Linux distros are pulling and packaging pre-built binaries from upstream projects. I'd have expected them to build from source.
They were pulling a tarball from upstream and building it - the tarball was compromised.
The answer is not complete. There were 2 ways to pull sources:
bad - https://github.com/tukaani-project/xz/releases/download/...
or:
good - https://github.com/tukaani-project/xz/archive/refs/tags/...
Specifically in Gentoo, there is a note in https://github.com/gentoo/gentoo/blob/master/app-arch/xz-uti...
# Remember: we cannot leverage autotools in this ebuild in order
# to avoid circular deps with autotools
Namely, to unpack autoconf-2.72e.tar.xz from gnu.org you need xz-tools. And this is just the shortest circle. It is not very common, but xz-utils was one of the few rare cases where regeneration of the autohell files was considered an unnecessary complication (it backfired).
Unfortunately, those GitHub links are no longer valid, so we randos can't use them to learn what went wrong here. Hopefully GH will reverse this decision once the dust settles.
GitHub should not just reverse course and make the repo public and archived "as is", because there are many rolling distributions (from Gentoo to LFS), submodule pullers, CI systems, and unaware users that might pull and install the latest backdoored commit of the archived project.
However, if you want to access exact copies of the backdoored tarballs, they are still available on every mirror, e.g. in http://gentoo.mirror.root.lu/distfiles/9f/ . For a project of this level, artifacts are checksummed and mirrored across the world by many people, and there is nothing wrong with that.
The gist of it is: the "good" one is the auto-generated "Source code" archive made by GitHub. The "bad" one is a manually generated and uploaded source code release, which can contain whatever you want.
Sorry, by "they" I mean "Debian and Fedora", which (when including derivatives) include most Linux systems which use a Linux distro in the standard sense.
Not in this case as the other commenter pointed out but for example Vivaldi on Arch Linux is just a repackaged upstream build.
https://gitlab.archlinux.org/archlinux/packaging/packages/vi...
Homebrew is currently shipping 5.6.1 (and was shipping 5.6.0 as well). Hopefully not affected on mac?
Homebrew reverted to 5.4.6 once the maintainers became aware. The current understanding is that macOS is not affected, but that's not certain.
The issue is caused by patches to add integration with systemd, so no, this won't affect SSH on a Mac.
Just because macs don't use systemd, doesn't mean the backdoor won't work. The oss-sec post talks about liblzma having backdoors in crc32_resolve() and crc64_resolve() and that it has not been fully reversed. This could perhaps affect more than just sshd on x86-64 linux?
> Just because macs don't use systemd, doesn't mean the backdoor won't work.
Practically speaking it can't. For one, the script injected into the build process tests that you're running on x86-64 Linux; for another, the injected code is ELF code, which wouldn't link on a Mac. It also needs to manipulate dynamic linker data structures, which would also not work the same way on a Mac.
> This could perhaps affect more than just sshd on x86-64 linux?
This however is true - /usr/sbin/sshd was the only argv[0] value that I found to "work", but it's possible there are others. "/usr/sbin/sshd" isn't a string directly visible in the injected code, so it's hard to tell.
The article explains numerous concurrent conditions that have to be met for the backdoor to even be activated (at build time, not runtime), which combined make it extremely unlikely this will affect SSH on macOS:
- linux
- x86-64
- building with gcc & the GNU linker
- part of a .deb or .rpm build
Add to that, as the article explains: openssh does not directly use liblzma, the only reason SSH is affected at all, is because some Linux Distros patch openssh to link it against systemd, which does depend on liblzma.
Could it affect things other than SSH on a Mac? Unlikely. The compromise was introduced in 5.6.0, but macOS Sonoma has 5.4.4 (from August last year).
Well, isn't this an interesting commit. He finished his inject macro to compose the payload at build time, so now he can start cleaning up the repo so none of that shit gets seen when cruising through it.
https://git.tukaani.org/?p=xz.git;a=commitdiff;h=4323bc3e0c1...
That's not what gitignore does. I can't think of a way it would let you hide this exploit.
Accidentally committing it.
Makes it a bit harder for sure. What actually happens if you git add something that's ignored? I assumed it would still let you do it, but never tried.
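For the record, plain git add refuses ignored paths by default; you have to force it. A quick throwaway-repo session (hint text varies by git version; the file name is made up):
$ echo payload.bin >> .gitignore
$ touch payload.bin
$ git add payload.bin
The following paths are ignored by one of your .gitignore files:
payload.bin
hint: Use -f if you really want to add them.
$ git add -f payload.bin    # forcing it works, and the file is then tracked as usual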
Everybody here is jumping on the pure-malice bandwagon; I have a better hypothesis.
Abandonment and inaction. The actual developers of these tools are elsewhere, oblivious to this drama, trying to make a living, because most of the time you are not compensated and no corporation cares about making things sustainable at all. This is the default status of everything your fancy cloud depends on underneath.
An attacker slowly took over the project and stayed dormant until recently.
Except that doesn't match reality.
Someone has worked on xz for several years. Are you saying that this somewhat active contributor was likely actively contributing, then all of a sudden stopped, also stopped paying attention, and also allowed their account to be compromised or otherwise handed it over to a nefarious party?
That fails the sniff test.
See, people drop out of OSS projects pretty frequently, usually because they take on other life responsibilities and there is no cushion or guard against the bus factor. Then it is very easy to get credentials compromised or have your project taken over by someone else.
Well, yeah. The attacker, operating largely under the name Jia Tan, successfully manipulated the original author (Lasse Collin) into making him a maintainer.
The attacker indeed lay dormant for two years, pretending to just be maintaining xz.
I really don't see any way how this wasn't malice on Jia's part. But I do think your hypothesis applies to Lasse, who was just happy someone could help him maintain xz.
The funding model of OSS work is obviously a problem, but these problems go deeper than that. Even a very well compensated OSS developer can get a knock on the door from a government agency (or anyone with a "$5 wrench")[1], and they might feel "compelled" to give up their maintainer creds.
I'm really curious whether the act of injecting a backdoor into OSS software is legal or illegal.
Are they somehow in the clear unless we can show they actively exploited it?
Probably depends on the criminal code of the country. Mine covers it (an EU country):
> Section 231 Obtaining and Possession of Access Device and Computer System Passwords and other such Data
> (1) Whoever with the intent to commit a criminal offence of Breach of secrecy of correspondence [...] or a criminal offence of Unauthorised access to computer systems and information media [...] produces, puts into circulation, imports, exports, transits, offers, provides, sells, or otherwise makes available, obtains for him/herself or for another, or handles
> a) a device or its component, process, instrument or any other means, including a computer programme designed or adapted for unauthorised access to electronic communications networks, computer system or a part thereof, or
> b) a computer password, access code, data, process or any other similar means by which it is possible to gain access to a computer system or a part thereof,
shall be sentenced ... (1 year as an individual, 3 years as a member of an organized group)
The way I see it: People are being charged for their speech all the time. Especially outside the US, but even in the US. And code is speech.
And that is even before all the hacking/cracking/espionage laws get involved.
There's a reason all the (sane) people doing grey/black hat work take their security and anonymity extremely seriously.
Oof, this is on my Sid laptop:
{0}[calvinow@mozart ~] dpkg-query -W liblzma5
liblzma5:amd64 5.6.0-0.2
{0}[calvinow@mozart ~] hexdump -ve '1/1 "%.2x"' /lib/x86_64-linux-gnu/liblzma.so.5 | grep -c f30f1efa554889f54c89ce5389fb81e7000000804883ec28488954241848894c2410
1
Glad I stopped running sshd on my laptop a long time ago... still probably going to reinstall :/
There's no obvious need to reinstall if you didn't expose sshd publicly and are not a politically important person. All signs suggest that it was a nation-state attack, and you are likely not a target.
We'll see... given that sshd is just one of many possible argv[0] values it may have chosen to act on, I'm going to be a little paranoid until it's been fully analyzed. It just takes half an hour to reinstall, and I have some shows to catch up on anyway :)
I was thinking about reinstalling, because I'm on Manjaro Linux, which has the version in question.
But it's unclear if earlier versions are also vulnerable.
And if it did nasty things to your machine, how do you make sure that the backups you have do not include ways for the backdoor to reinstate itself?
Sure, the backdoor could have e.g. injected a libav exploit into a video file to re-backdoor my system when I watch it... but that's too paranoid for me.
I don't backup the whole system, just a specific list of things in /home.
Anyone have any idea what the code in the malicious liblzma_la-crc64-fast.o is actually doing? It's difficult to follow statically.
The `pack`[0] compression utility that reached the HN front page the other day[1] is setting off my alarm bells right now. (It was at the time too, but now doubly so)
It's written in Pascal, and the only (semi-)documented way to build it yourself is to use a graphical IDE, and pull in pre-compiled library binaries (stored in the git repo of a dependency which afaict Pack is the only dependent of - appears to be maintained by the same pseudonymous author but from a different account).
I've opened an issue[2] outlining my concerns. I'm certainly not accusing them of having backdoored binaries, but if I was setting up a project to be deliberately backdoorable, it'd look a lot like this.
[0] https://pack.ac/
We need to get these complex & bloated build-systems under control.
What we need is to move away from 1970s build tools.
I'm not trying to troll, but I'm wondering if a distro like Gentoo is less susceptible to such attacks, since the source code feels more transparent with their approach. But then again, it seems that upstream was infected in this case, so I'm not sure if a culture of compiling from source locally would help.
It is not going to make a difference. If you run malicious code, you will get hacked. Compiling the code yourself does not prevent the code from being malicious.
The one way it might help is that it might make it easier to find the backdoor once you know there is one.
I am not embarrassed to say... is there anything in there that someone who runs a server with ssh needs to know?
I literally can't make heads or tails of the risk here. All I see is the very alarming and scary words "backdoor" and "ssh server" in the same sentence.
If I am keeping stuff up to date, is there anything at all to worry about?
You should probably not be running your own publicly-accessible ssh servers if this email is not sufficient to at least start figuring out what your next actions are.
The email itself comes with an evaluation script to figure out if anything is currently vulnerable to specifically this discovery. For affected distributions, openssh servers may have been backdoored for at least the past month.
Yet here I am, getting up every morning and getting dressed and tying my shoes all by myself, and then maintaining a small number of servers that have openssh on them!
Thanks, though, for pointing out the little script at the very end of that technical gauntlet of an email intended for specialists. I had gotten through the first 3 or 4 paragraphs and had given up.
What I should have done is just googled CVE-2024-3094, whatever, still glad I asked.
> You should probably not be running your own publicly-accessible ssh servers if this email is not sufficient to at least start figuring out what your next actions are.
That seems like a fairly unreasonable stance.
Not at all. For instance, I don't know what the next steps are, but I run SSH servers behind Wireguard, exactly to prevent them being accessible in the case of such events. Wireguard is simple to setup, even if I lack the expertise to understand exactly how to go forward.
> I literally can't make heads or tails of the risk here. All I see is the very alarming and scary words "backdoor" and "ssh server" in the same sentence.
From what I've read, there are still lots of unknowns about the scope of the problem. What has been uncovered so far indicates it involves bypassing authentication in SSH.
In https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78b..., Sam James points out
> If this payload is loaded in openssh sshd, the RSA_public_decrypt function will be redirected into a malicious implementation. We have observed that this malicious implementation can be used to bypass authentication. Further research is being done to explain why.
Thus, an attacker maybe could use this to connect to vulnerable servers without needing to authenticate at all.
Thanks, that gist is a really lucid explanation for normal folks.
Is it time to deprecate the ability for code to implement linker symbols in other libraries? Shouldn't there be a strict namespace separation between binaries/libraries? liblzma being able to implement openssh symbols seems like a symptom of a much larger problem.
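As a toy illustration of the interposition being questioned here (plain LD_PRELOAD, the most visible form of it; the backdoor itself abused ifunc resolvers and dynamic-linker internals instead), any preloaded library can shadow a symbol another library provides. All file names below are made up:
cat > fakepid.c <<'EOF'
#include <sys/types.h>
pid_t getpid(void) { return 31337; }   /* shadows libc's getpid via symbol precedence */
EOF
gcc -shared -fPIC fakepid.c -o fakepid.so
cat > show.c <<'EOF'
#include <stdio.h>
#include <unistd.h>
int main(void) { printf("%d\n", (int)getpid()); return 0; }
EOF
gcc show.c -o show
./show                              # prints the real pid
LD_PRELOAD=$PWD/fakepid.so ./show   # prints 31337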
Safety through obscurity and weirdness! If you disable ifunc, like any sensible person, this backdoor disables itself.
Interesting, I used https://ossinsight.io/analyze/JiaT75 to identify contributions from the account used by the author of the backdoor. It looks like the account made other potentially problematic contributions to other projects.
The disabling of ifunc in this PR against Google's oss-fuzz project may be one way they tried to prevent this particular backdoor being flagged by that tool: https://github.com/google/oss-fuzz/pull/10667
There is a related issue for LLVM/clang by this person:
I am curious: why didn't this clever hacker create multiple accounts instead of only this "JiaT75"?
I'm curious now. What is ifunc? (Had difficulty finding it through a search)
ifunc is a GNU method of interposing function calls with platform-optimized versions of the function. It is used to detect CPU features at runtime and insert, for example, AVX2-optimized versions of memcmp. It is seen in crypto a lot, because CPUs have many crypto-specific instructions.
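IFUNC symbols carry their own type in the ELF symbol table, so a quick (non-conclusive) way to see whether a binary or library uses the mechanism is something like the following; whether anything shows up depends on how it was configured and stripped:
readelf --dyn-syms /lib/x86_64-linux-gnu/liblzma.so.5 | grep -w IFUNC
readelf --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | grep -w IFUNC | head   # glibc uses it heavily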
However, I don't like it much and I think software should be compiled for the target machine in the first place. My 1 hardened system that is reachable from the public network is based on musl, built mostly with llvm, and with ifunc disabled.
> However, I don't like it much and I think software should be compiled for the target machine in the first place.
That means you either have to compile software locally on each machine, or you have a combinatorial explosion of possible features.
Compiling locally has several drawbacks. It needs the full compilation environment installed on every machine, which uses a lot of disk space, and some security people dislike it (because then attackers can also compile software locally on that machine); compiling needs a lot of memory and disk space, and uses a lot of processor time and electric power. It also means that signature schemes which only allow signed code cannot be used (or you need to have the signing key available on the target machine, making it somewhat pointless).
The combinatorial explosion of features has been somewhat tamed lately, by bundling sets of feature into feature levels (x86_64-v1, etc), but that still quadruples the amount of compiled code to be distributed, and newer features still have to be selected at runtime.
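A sketch of what the feature-level approach looks like in practice, assuming gcc >= 11 (which understands the -march level names) and glibc >= 2.33 (which introduced the hwcaps search path); the library name is made up:
# build the same library at two feature levels
gcc -O2 -march=x86-64-v2 -shared -fPIC foo.c -o libfoo.so.1.v2
gcc -O2 -march=x86-64-v3 -shared -fPIC foo.c -o libfoo.so.1.v3
# a distro installs them so ld.so picks the best match for the CPU at load time:
#   /usr/lib/x86_64-linux-gnu/libfoo.so.1                         (baseline)
#   /usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3/libfoo.so.1  (chosen on v3 CPUs)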
Compiled _on_ and compiled _for_ are not the same. There must be a way to go to the target machine, get some complete dump of CPU features, copy that to the compile-box, do the build, and copy the resulting binaries back.
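That dump more or less exists already: gcc can report what -march=native resolves to on the target, and you feed the result to the build box. A sketch (the skylake values are example output):
# on the target machine: ask gcc what -march=native expands to
gcc -march=native -Q --help=target | grep -E '^ *-m(arch|tune)='
# on the build box: substitute the reported values for native
gcc -O2 -march=skylake -mtune=skylake -c foo.c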
> That means you either have to compile software locally on each machine, or you have a combinatorial explosion of possible features.
Or you just have to buy a lot of the exact same hardware. Secure installations tend to do that.
I don't think you can really say it is "combinatorial" because there's not a mainstream machine with AES-NI but not, say, SSSE3. In any case if there were such a machine you don't need to support it. The one guy with that box can do scratch builds.
I have no issues compiling everything on my Gentoo box.
Obviously compiling for the target architecture is best, but for most software (things like crypto libraries excluded) 95% of the benefit of AVX2 is going to come from things like vectorized memcpy/memcmp. Building glibc using ifuncs to provide optimized implementations of these routines gives most users most of the benefit of AVX2 (or whatever other ISA extension) while still distributing binaries that work on older CPU microarchitectures.
ifunc memcpy also makes short copies suck ass on those platforms, since the dispatch cost dominates regardless of the vectorization. It's an open question whether ifunc helps or harms the performance of general use cases.
By "open question" I meant that there is compelling research indicating that GNU memcpy/memcmp is counterproductive, but the general Linux-using public did not get the memo.
https://storage.googleapis.com/gweb-research2023-media/pubto...
"AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers" Section 4.4 "Memcmp and the perils of micro-optimization"
On the other hand, it also means that your distro can supply a microarchitecture-specific libc and every program automatically gets the memcpy improvements. (Well, except for the golang/rust people.)
Wasn't this the point of Gentoo, back in the day? It was more about instruction scheduling and register allocation differences, but your system would be built with everything optimized for your uarch.
Is there a way to easily/reliably disable ifunc globally on a system (e.g. ubuntu/debian) without breaking a bunch of things?
FYI this looks for pkgs with liblzma:
> dpkg -l | grep liblzma
Versions >= 5.6 are compromised
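A slightly more direct variant that does the comparison for you, assuming the Debian/Ubuntu package name liblzma5:
v=$(dpkg-query -W -f='${Version}' liblzma5)
if dpkg --compare-versions "$v" ge 5.6.0; then
    echo "liblzma5 $v: in the affected range"
else
    echo "liblzma5 $v: predates the backdoored releases"
fi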
Why doesn’t GitHub force “releases” to be a simple repo tarball for sources and with binaries from GitHub actions or such…
I find it incredibly ironic that a “version control” site gives no assurance of reproducible builds (nor reproducible source!!)
The real villain is not the perpetrator, it is Microsoft, and it is all of us.
Too inflexibly ideological. There are infinite things that most properly belong in a release file and not in the source, that can't be generated from that source by GitHub Actions; and separately, no one should be compelled to use GitHub Actions.
Because then for autoconf codebases you have to commit `./configure` or you have to require that users have autoconf installed and run `autoreconf -fi` first.
Maybe autoconf-using projects should really just require that users have autoconf installed.
Not that that would prevent backdoors, mind you.
If committing configure is objectionable, perhaps there could be "service" repositories that are not directly writable and are guaranteed to be nothing more than the base repo + autoconf cruft used to generate the releases.
Well, for example in jq we do commit bison/flex outputs because for users ensuring that they have the right version of those can be tricky. We could do the same w.r.t. autoconf and its outputs, though again, that won't preclude backdoors.
Yeah, it's less about detecting backdoors specifically and more about having a way to compare releases to build jobs.
Committing built artifacts presents similar problems: how do you know that the committed artifacts are in fact derived from their sources? Or from non-backdoored versions of build tools for that matter? Hello Ken Thompson attacks.
I don't believe there's a nice easy answer to these questions.
What we do in jq is rely on GitHub Actions to run the build and `make dist`. In fact, we could now stop committing the bison/flex outputs, too, since we can make sure that the tarball includes them.
We do also publish the git repo snapshots that GitHub auto-generates for releases, though we do that because GitHub doesn't give one a choice.
Thinking about this more: maybe there would be some benefit to GitHub taking control of "release" repositories that may only be written to by GA (GitHub Actions). They'd write everything -- maybe as a docker image -- so anyone could pull down the image and compare shas, or whatever. And maybe this could also be done by their competitors. The ultimate goal would be to have multiple trusted parties performing the build on the same code producing the same output, and allowing any randos to do the same.
If the source is included in those images, we could conceivably prove that the target was based on the source.
It's not nice and easy, true.
Really disappointed in the number of posters here who are playing this down and rushing to suggest that perhaps a legitimate developer was compromised, when it's very clear this is sophisticated and not the work of a single person.
I'm recalling bad memories of the Juniper backdoor years ago.
Whoever did this, was playing the long game. As the top post pointed out, there was an effort to get this into Fedora.... which eventually makes its way into RHEL (read: high value targets). This was not for short term payoffs by some rogue developer trying to mine crypto or other such nonsense. What you are seeing here is the planting of seeds for something months or a year down the road.
It doesn't really relate to this issue other than that both issues share a common source, but I wish we'd never fallen for xz.
I agree with the lzip guy
So, it's been almost 24 hours since I read this yesterday. Is it confirmed that Jia Tan is the perpetrator? Do we know who he/she really is? Or are we going to live for the rest of our lives only knowing the pseudonym? Just like Satoshi Nakamoto did to us. ;)
https://github.com/tukaani-project/tukaani-project.github.io... Does this mean anything that it changed to a parameter??
no. unlikely.
So much for a quiet Easter holiday. Fuck
There's a bug in the detection script. The line:
if [ "$path" == "" ]
should be
if [ "$path" = "" ]
Bash accepts both variants of the equality operator. So it is not a bug.
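Both forms do work under bash; the doubled operator only matters if the script is ever run by a strictly POSIX shell, e.g. via a /bin/sh shebang on systems where that is dash (error text varies by shell):
$ bash -c '[ x == x ] && echo yes'
yes
$ dash -c '[ x == x ] && echo yes'
dash: 1: [: x: unexpected operator
$ dash -c '[ x = x ] && echo yes'
yes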
Could anyone please tell me if current stable version of Debian has that backdoor or not?
Debian stable has 5.4.1, the backdoored versions are 5.6.0-5.6.1
It does not contain the backdoor: <https://security-tracker.debian.org/tracker/CVE-2024-3094>
Debian Stable seems to be in the clear.
https://lists.debian.org/debian-security-announce/2024/msg00...
Python for Windows bundles liblzma from this project, but it appears to be version 5.2.5 [0] vendored into the Python project's repo on 2022-04-18 [1], so that should be fine, right?
[0] https://github.com/python/cpython/blob/main/PCbuild/get_exte...
A user offered 5.6.0 and 5.4.5 in an issue to microsoft/vcpkg.
5.4.5 could be compromised as well.
Which nation state (if any) is most likely behind this? China based on name, or is this a red herring?
The perpetrator did most GitHub actions between 10 and 18 UTC, which sort of rules out US based, unless the messages were scheduled. Consistent with Europe to Asia.
See clickhouse for data: https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...
Could easily be someone in the EST time zone. There aren't that many events that would be earlier than 8am EST.
What a disappointment.
It's something always in the back of our minds as developers using public libraries, but when something like this happens, non-developers that hear about it start to associate it with the rest of the open-source community.
It's essentially a terrorist attack on developer experience. Thankfully, management doesn't follow the same approach as the TSA.
Doesn't this call for criminal charges?
Good luck finding him/her.
GitHub probably already gave feds all logs and IPs, but I would bet 100:1 that it's all going to be a VPN or something like that.
Hello,
GitHub just disabled the repo: https://github.com/tukaani-project/xz
Does someone have an up-to-date fork so we can see the project history?
Is there any news concerning the payload analysis? Just curious to see if it can be correlated with something I have in my sshd logs (e.g. login attempt with specific RSA keys).
I think we have to assume that all community software is a target. The payoff for bad actors is too great.
For every one of these we spot, assume there are two we have not.
Now consider that your average Linux distribution pulls in tens of thousands of packages, each of which can be similarly compromised. Pretty scary to think about.
The terrible desktop software security model of weak/essentially non-existent security boundaries at run and compile time makes this all the more spicy.
Computer security for billions runs on the simultaneous goodwill of many thousand contributors. Optimistically said it's actually a giant compliment to the programming community.
And this is not even talking about hardware backdoors that are a million times worse and basically undetectable when done well. The myriad ways to betray user trust at any level of computation make me dizzy...
I have exactly 719 packages on my Gentoo box, just rebuilt everything as part of the profile 23 upgrade.
Also, the attacker included in the 5.6.0 release support for the long-awaited multi-threaded decompression (and a - broken - sandbox), making it very attractive to upgrade to...
It was probably a tactic to give people a reason to upgrade. Those who did, or tried to, aren't necessarily at fault.
Is there a proper reverse engineering of the payload yet?
Anyone keeping current with OpenSUSE Tumbleweed got a update...downgrade. Prior to `zypper dup --no-allow-vendor-change` I had 5.6.0, now I'm at 5.4.6.
I see `5.6.1.revertto5.4-3.2`
It was caught by luck, due to performance degradation. So nobody reads the code - not even once - prior to merging into the upstream supply chain?
https://x.com/bl4sty/status/1773780531143925959?s=20
So nobody reads release notes either.
But I'm sure this was a one-off and we're safe now.
This is why the less the better... even if it means less comfortable... to a certain point obviously. And that includes SDKs...
I don't understand why you were downvoted. Having fewer moving parts does make it easier to catch issues.
Everything on HN which is not engaged in licking Big Tech's balls (open source or not) is met with severe downvoting most of the time (probably by real trash human beings, or AI trolls running headless blink/gecko/webkit).
On Ubuntu there is a bug report asking to sync the 5.6 version from Debian experimental https://bugs.launchpad.net/ubuntu/+source/xz-utils/+bug/2055...
Saw this on nix, which was using a compromised version in the unstable channel, I hope not too many systems are affected.
State actor or not, let's not ignore that the backdoor has been discovered thanks to the open nature of the projects involved, which allowed digging into the code. Just another example like the infamous Borland InterBase backdoor in the early 2000s, which remained dormant for years and was discovered months after the source code had been released. If the xz malware authors worked for any corp that produced closed-source drivers or blobs that can't be properly audited, we would be fucked; I just hope this is not already happening, because the attack surface of all those devices and appliances out there running closed code is huge.
Why are projects like xz and sshd still active? Just freeze it, it works fine. Only changes should be fixes for vulnerabilities. None of this complicated new functionality. If you want something like that make a new project. If it is truly better people will use it.
After chmod u+x, running the detect_sh script just exits with no output on my Arch Linux box?
Yes, Arch Linux’s OpenSSH binary doesn’t even link to liblzma, which means your installation is not affected by this particular backdoor.
The authors of the `detect_sh` script didn’t have that scenario in mind, so the `ldd` invocation never finds a link and the script bails early without a message.
Thanks!
remove the -e option on the script and run it.
Anyway, Arch is not affected because they don't modify openssh to link against any of this nonsense.
Interestingly, one of the accounts that the GitHub account which introduced the backdoor follows was suspended very recently [1]; that account is also part of the org that runs XZ.
That JiaT75 account is also suspended, if you check https://github.com/Larhzu?tab=following you'll see that they're suspended as well. It's pretty weird that it's that hard to find out whether a user is suspended.
It seems that to counter this type of supply chain attack, the best practices for managing software dependencies are to pin the version numbers of dependencies instead of using `latest`, and to use static linking instead of dynamic linking.
In the future: automated `diff` or any other A/B check to see whether or not the tarball matches the source repo (if not, auto-flag with a mismatch warning attribute), is that feasible to implement?
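A rough sketch of such a check (the repo is currently disabled, so the URL, tag, and file names are illustrative; note that in autotools projects the release tarball legitimately contains generated files like configure, which is exactly the noise this attack hid in):
# regenerate a tarball straight from the tag and diff it against the uploaded asset
git clone https://github.com/tukaani-project/xz && cd xz
git archive --prefix=xz-5.6.1/ -o /tmp/from-git.tar v5.6.1
mkdir /tmp/a /tmp/b
tar -xf /tmp/from-git.tar -C /tmp/a
tar -xf /tmp/xz-5.6.1.tar.gz -C /tmp/b     # the uploaded release asset being checked
diff -r /tmp/a /tmp/b                      # an unexpected or modified m4 file stands out here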
For someone who does not understand the packages used, could you please summarize in layman, non-technical terms? Thanks. I did read the main post.
This link helped a little. https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78b...
This is a perfect summary: https://research.swtch.com/xz-timeline
I added there something about a possible planned kernel attack, which is not mentioned much yet.
https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78b...
that's... creative. and patient. 11/10 concerning - now I'm wondering how many other projects could have shit like this in them, or added right as I'm writing this *shudder*
Brain fart: would it be possible to attach passwords to a crypto based micro transaction such that every time you attempted a password entry your crypto account was charged a small fee for the login attempt?
This would thwart brute force attacks, but not be a significant cost for users. If you could attach your login to the crypto account it would mean the account would have to be funded to allow the attempt. The token wouldn't store passwords it would just be a gatekeeper to the login attempt.
The fees would be paid to the service providers as mining fees.
E.g. foo@bar.com needs a password and a token provided from a designated crypto address to gain access to the service.
Damn. I'm on macOS and use homebrew. To my surprise I had "xz" version 5.6.1 installed on my computer!
I ran "brew upgrade" and that downgraded it to version 5.4.6.
xz is just a horribly designed format, and always has been. If you use it, please switch to Lzip. Same compression level, but designed by someone competent.
Be an asshole elsewhere. This makes me LESS want to use lzip because of such aggressive non-useful slander and just plain nonsense.
Thanks for that link, lzip sounds useful
Someone competent? More like a drama queen butthurt that his pet project did not win the popularity contest. Not the kind of person I want to rely on for important tools.
Looks like Jonathan Blow was right about open source.
Why is the Long Range Zip lrzip compression format not used? It gives better compression than xz when using the correct switches.
Why isn't he identified personally? Very likely he is 'contributing' to other projects under different accounts.
Maybe @JiaT75 got forced to do it. Maybe someone has more personal contact with him and can check how he is doing.
Is there already a list of distributions that included the affected versions in non-prerelease channels?
None that I could find have included it. Not even NixOS 23.11.
Surely the real target of this was Tor (which links liblzma) not random SSH servers.
Has this affected OpenBSD at all?
Seems the backdoor relied on Debian and others patching their copies of openssh to support systemd notifications, and this would obviously not be the case on OpenBSD.
To be sure the current ports version of xz is 5.4.5: https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/ports/a...
Although the maintainer was working on updating to 5.6.1, but this news broke before the diff was landed: https://marc.info/?l=openbsd-ports&m=171174441521894&w=2
I wonder which browsers link liblzma and can this lead to https eavesdropping?
Fairly deep bugs for a Bazaar.
we should take this diagram and change "random person in nebraska" to "possibly a state-level attacker"
nice
Candidly, how would someone protect against a vulnerability like this?
Build from source AND run an AI agent that reviews every single line of code you compile (while hoping that any potential exploit doesn't also fool / exploit your AI agent).
Compile all your packages from source would be a start.
You’re not wrong. However, building from source wouldn’t have protected you against this specific backdoor. The upstream source tarball itself was compromised in a cleverly sneaky way.
You might read https://www.openwall.com/lists/oss-security/2024/03/29/4
"However, building from source wouldn’t have protected you against this specific backdoor." Depends on how exactly you build from source. A generic build was not the target. Andres Freund showed that the attack was targeted against a specific type of build system.
Building from git, or the github automatic tarball would have. The larger issue here is authenticating tarballs against the source.
There is no reason to believe the exploit would have been spotted earlier had the attacker included the final part in git.
Which OSes are affected by this compromise? Is Ubuntu affected?
How is the backdoor triggered, and what exactly does it do?
Maybe it's finally time to start sunsetting LZMA and xz altogether in favor of newer algorithms like Zstandard, which also offer better performance with compression ratios on par with LZMA.
Yes but don’t start thinking they’re immune to compromise
Nobody is. But it's a great window of opportunity.
Was Debian 12/stable unaffected? Only sid?
My understanding is that that's correct. I'm on fully upgraded stable (Debian 12) and my xz is 5.4.2 and liblzma as well.
- * _ring ring_ * - "Hello?" - "It's Lasse Collin." - "Why are you collin me? Why not just use the backdoor?"
Please note: the changes were made after GitHub enforced 2FA (certainly not for "better security", but to promote FIDO2 and Windows Hello's biometric implementation of FIDO2; see https://codeberg.org/KOLANICH/Fuck-GuanTEEnomo for more info). Until recently it was even possible to push into all repos one has access to by just using a single-factor SSH key, without enabling 2FA on the account (for now access via the git protocol is blocked for my account, I guess based on the lack of a 2FA setup). As I have warned, nothing will protect you when a backdoor is introduced by a malicious maintainer, or a "smart entrepreneur" who sold his project to an ad company, or a loyal "patriot" living and earning money within reach of some state, or just a powerless man who got an offer he can't refuse. In general, supply chain attacks by "legitimate" maintainers cannot be prevented. "Jia Tan" is just a sockpuppet to mitigate the consequences for the maintainers, to make it look like they are not involved. They surely are. At least according to the current info, it was they who gave the malicious account the permission to publish releases on behalf of the project and access to the repo.
IMHO all maintainers of the backdoored projects who were in any way involved in accepting the malicious changes should be considered accomplices and boycotted. We don't need evidence of their liability; it is they who need to maintain their reputation. We are just free to make our decisions based on that reputation. Even if they were hacked themselves, it is not our problem, it is their problem. Our problem is to keep ourselves safe. It may feel "unjust" to ruin the reputation of a person based on the fact that he may have been cheated or hacked... But if a person can be cheated or hacked, why should he/she have as good a reputation as everyone else?! So it makes a lot of sense to just exclude and replace everyone for whom there exists evidence of compromise, no matter whether due to unconcern or malice. But FOSS is a doocracy serving products at dumpling prices ($0, free of charge), and for the majority backdoored software is completely acceptable given that they get it free of charge. And powerful actors who can afford to pay for software will just hire devs to develop their private versions, while allowing the public to pay $0 for the free versions and using the backdoors placed in them themselves. In other words, a complete market failure.
I think that:
1. The xz project must be shut down completely. I mean projects should stop using it as a dependency, exclude it from distros, boycott it. The LZMA algo was developed by Igor Pavlov in the 7z project, but somehow it happened that liblzma was developed and maintained by unrelated folks. liblzma should be developed as part of the 7z project, taking no code from xz other than the trivial API-compatibility adapter.
2. Projects created by compromised authors should be boycotted.
3. Other projects touched by the compromised devs/maintainers should be audited.
4. All the projects using autotools should be audited and must replace autotools with cmake/meson. Autotools is a piece of shit, completely incomprehensible. There is no surprise it was used to hide a backdoor - in my experience in FOSS, no one likes to touch its scripts at all.
5. No project should be built from releases. Projects should be built from git directly. Implementing full support for SHA256 in git and git forges (GitHub, GitLab, Codeberg, sr.ht) should be accelerated to mitigate attacks using collisions to replace approved commits (I guess the randomness can be concealed from a reviewer's eye in binary resource files, like pictures).
TLDR: Some people have been throwing around “China,” but it seems also quite possible that Jia is from somewhere in Eastern Europe pretending to be from China. In addition, Lasse Collin and Hans Jansen are from the same EET time zone.
These are my notes on time stamps/zones. There are a few interesting bits that I haven't fully fleshed out.
The following analysis was conducted on JiaT75’s (https://github.com/JiaT75?tab=overview&from=2021-12-01&to=20...) commits to the XZ repository, and their time stamps.
Observation 1: Time zone basic analysis
Here is the data on Jia’s time zone and the number of times he was recorded in that time zone:
3: +0200 (in winter: February and November)
6: +0300 (in summer: June, July, early October)
440: +0800
1. The +0800 is likely CST: China (or Indonesia or the Philippines), given that Australia does daylight saving time and almost no one lives in Siberia or the Gobi desert.
2. The +0200/+0300, if we assume this is one location, is likely EET (Finland, Estonia, Latvia, Lithuania, Ukraine, Moldova, Romania, Bulgaria, Greece, Turkey). This is because we see a switch to +0200 in the winter (past the last weekend of October) and +0300 in the summer (past the last Sunday in March).
Incidentally, this seems to be the same time zone as Lasse Collin and Hans Jansen…
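For anyone who wants to reproduce these counts from a clone of the repository, one way to get the same histogram (the author's exact method is unknown):
# one commit per line as "epoch +ZZZZ"; count commits per recorded UTC offset
git log --author='Jia Tan' --format='%ad' --date=raw | awk '{print $2}' | sort | uniq -c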
Observation 2: Time zone inconsistencies
Let's analyze the few times where Jia was recorded in a non-+0800 time zone. Here, we notice that there are some situations where Jia switches between +0800 and +0300/+0200 within a seemingly implausible time, indicating that perhaps he is not actually in +0800 CST time, as his profile would like us to believe.
Jia Tan Tue, 27 Jun 2023 23:38:32 +0800 —> 23:38 + 8 = 7:30 (+1)
Jia Tan Tue, 27 Jun 2023 17:27:09 +0300 —> 17:27 + 3 = 20:30
—> about a 9 hour difference, but a flight from China to anywhere in Eastern Europe is at a minimum 10 hours
Jia Tan Thu, 5 May 2022 20:53:42 +0800
Jia Tan Sat, 19 Nov 2022 23:18:04 +0800
Jia Tan Mon, 7 Nov 2022 16:24:14 +0200
Jia Tan Sun, 23 Oct 2022 21:01:08 +0800
Jia Tan Thu, 6 Oct 2022 21:53:09 +0300 —> 21:53 + 3 = 1:00 (+1)
Jia Tan Thu, 6 Oct 2022 17:00:38 +0800 —> 17:00 + 8 = 1:00 (+1)
Jia Tan Wed, 5 Oct 2022 23:54:12 +0800
Jia Tan Wed, 5 Oct 2022 20:57:16 +0800
—> again, given the flight time, this is even more impossible
Jia Tan Fri, 2 Sep 2022 20:18:55 +0800
Jia Tan Thu, 8 Sep 2022 15:07:00 +0300
Jia Tan Mon, 25 Jul 2022 18:30:05 +0300
Jia Tan Mon, 25 Jul 2022 18:20:01 +0300
Jia Tan Fri, 1 Jul 2022 21:19:26 +0800
Jia Tan Thu, 16 Jun 2022 17:32:19 +0300
Jia Tan Mon, 13 Jun 2022 20:27:03 +0800
—> the ordering of these time stamps, and the switching back and forth looks strange.
Jia Tan Thu, 15 Feb 2024 22:26:43 +0800
Jia Tan Thu, 15 Feb 2024 01:53:40 +0800
Jia Tan Mon, 12 Feb 2024 17:09:10 +0200
Jia Tan Mon, 12 Feb 2024 17:09:10 +0200
Jia Tan Tue, 13 Feb 2024 22:38:58 +0800
—> this travel time is possible, but the duration of stay is unlikely
Observation 3: Strange record of time stamps
It seems that, from the commits, the time stamps are often out of order. I am not sure what would cause this other than some tampering.
Observation 4: Bank holiday inconsistencies
We notice that Jia’s work schedule and holidays seem to align much better with an Eastern European than a Chinese person.
Disclaimer: I am not an expert in Chinese holidays, so this very well could be inaccurate. I am referencing this list of bank holidays: (https://www.bankofchina.co.id/en-id/service/information/late...)
Chinese bank holidays (just looking at 2023):
- Working on 2023, 29 September: Mid Autumn Festival
- Working on 2023, 05 April: Tomb Sweeping Day
- Working on 2023, 26, 22, 23, 24, 26, 27 Jan: Lunar New Year
Eastern European holidays:
- Never working on Dec 25: Christmas (for many EET countries)
- Never working Dec 31 or Jan 1: New Years
Observation 5: No weekend work —> salary job?
The most common working days for Jia were Tue (86), Wed (85), Thu (89), and Fri (79). If we adjust his time zone to EET, then that means he is usually working 9 am to 6 pm. This makes much more sense than someone working at midnight and 1 am on a Tuesday night.
These times also line up well with Hans Jansen and Lasse Collin.
I think it is more likely that Jia does this as part of his work… somewhere in Eastern Europe. Likely working with, or in fact being one and the same as, Hans Jansen and Lasse Collin.
You say yourself that the time data could be tampered. It's trivial to change commit dates in git. So this analysis means nothing by itself, unfortunately.
I wouldn't say that. This guy seems to have tried hard to appear Chinese (and possibly tampered with the time stamps accordingly) – but based on that analysis, it seems plausible they did a bad job and were actually based in Eastern Europe.
I asked ChatGPT 4 based on Jia's GitHub avatar image:
The timezones that ChatGPT thinks the avatar comes from align with +2 and +3. See how it ranked them, and at the end the description of Jia's avatar:
---
Rank, Score, Country, City, Timezone, Criteria
1, 10, Saudi Arabia, Mecca, AST (UTC+3), Heartland of Islam, deeply rooted calligraphic traditions.
2, 9.5, Iran, Tehran, IRST (UTC+3:30), Integral Persian calligraphy with a distinct style and history.
3, 9, Turkey, Istanbul, TRT (UTC+3), Historical significance of Ottoman calligraphy, actively preserved.
4, 8.5, Egypt, Cairo, EET (UTC+2), Home to Al-Azhar University, with calligraphy in the curriculum.
5, 8, Morocco, Marrakech, WET (UTC+0), Calligraphy integrated into architecture and crafts.
6, 7.5, United Arab Emirates, Abu Dhabi, GST (UTC+4), Promotes Islamic arts through festivals and museums.
7, 7, Syria, Damascus, EET (UTC+2), Historical center of Arabic calligraphy, despite recent conflicts.
8, 6.5, Pakistan, Islamabad, PKT (UTC+5), Rich tradition, hosts several institutions and events dedicated to calligraphy.
9, 6, Indonesia, Jakarta, WIB (UTC+7), Largest Muslim-majority country with calligraphy in art and monuments.
10, 5.5, Spain, Cordoba, CET (UTC+1), Legacy of Islamic culture and appreciation for calligraphy, particularly in Andalusia.
--
GPT4: This image appears to be a stylized representation of the letter 'J' within an intricate border, possibly inspired by the art style of Islamic calligraphy. The ornate background is typical of arabesque patterns, which are characteristic of Islamic art and consist of repeating geometric forms that often echo the shapes of plants, flowers, and sometimes calligraphic writing. The letter 'J' stands out in a vibrant yellow, contrasting with the dark green of the surrounding design.
Interesting :). However, I think that EET is the only time zone that works. (This is mostly because it seems that the area follows DST, which most non-Western countries in the world do not.)
This 2011 addition to the XZ Utils Wikipedia page is interesting because a) why is this relevant, and b) who is Mike Kezner? He's not mentioned on the Tukaani project page (https://tukaani.org/about.html) under "Historical acknowledgments".
https://en.wikipedia.org/w/index.php?title=XZ_Utils&diff=pre...
Arch Linux played an important role in making this compression software trusted and depended upon. Perhaps not a coincidence, but at the very least, such a big project should more carefully consider the software they distribute and rely on, whether it's worth the risk.
> Arch Linux played an important role in making this compression software trusted and depended upon.
Because of the way Arch distributes packages? Then what do you think about FreeBSD?
If you check the history of that IP address, it added Mike Kezner to other pages. No clue why.
Made a more detailed write-up on this: https://rheaeve.substack.com/p/xz-backdoor-times-damned-time...
Thank the gods I didn't plan on having a life this weekend
Is this sev0?
Hello everybody.
I am taking the initiative to gather more information regarding the possible precursors and perpetrators of the backdoor.
The purpose of this commentary is focused on open source information (OSINT).
I am not here to judge anyone or any action that may occur; the objective of this comment is to help, through accurate and quick information, the core developers of the affected packages (and consequently the Linux kernel, which may have been directly or indirectly affected) take the necessary actions in relation to what occurred.
NOTE: This comment will be edited continually, so always check back for updated information.
Information I have so far.
Summary:
1. GitHub account suspension:
- The accounts of @JiaT75 and @Larhzu were suspended by GitHub.
- All Tukaani repositories, including downloads, were disabled.
- Investigate the cause of the account suspensions and whether there is any correlation with suspicious activities.
2. Possible backdoor in xz/liblzma:
- There are concerns about the presence of a backdoor in xz/liblzma.
- Investigate whether there is evidence of compromise in the source code and recent updates.
- Examine potential impacts, especially if the software is used in critical systems.
3. Updates and patches in packages:
- Note recent updates in packages such as MinGW w64, pacman-static, Alpine, and OpenSUSE.
- Review changelogs to understand if these updates are related to security fixes.
4. Jia's activities on platforms and projects:
- Investigate Jia's contributions to different projects and platforms, such as Arch Linux, Alpine Linux, and OpenSUSE.
- Check for correlations between Jia's activities and reported security issues.
5. Libera registration information:
- Analyze Jia's registration details on Libera to determine the timeline of their online activities.
- Consider correlating this information with other online activities of Jia.
6. VPN usage:
- Confirm Jia's use of a VPN and assess its impact on security investigations.
- Explore possible reasons for using a VPN and how it may affect the identification and tracking of online activities.
Links related to user JiaT75:
[xz] Remove JiaT75 as a contact, determine correct contacts #11760 - google/oss-fuzz: https://github.com/google/oss-fuzz/issues/11760
Tuktest index hash #7 - tukaani-project/xz/pull/7 https://web.archive.org/web/20240329230522/https://github.co...
Time for another OS wipe. Glad I keep bleeding-edge versions VM'd.
is this sev0?
Jesus! Does anyone know if Debian stable is affected?
The stable releases don't have this particular backdoor, but they're still using older versions of the library that were released by the same bad actor.
It's not. Neither Ubuntu.
Do you have a source my friend? I thought Ubuntu was built off of Debian testing or unstable
The latest version in 23.10 is 5.4.1-0.2
https://packages.ubuntu.com/mantic/liblzma5
And in unreleased 24.04 is 5.4.5-0.3
https://packages.ubuntu.com/noble/liblzma5
There are no changelog entries indicating that the package was reverted.
now I wonder which browsers link liblzma?
Another interesting data point: about 2 years ago there was a clear pressure campaign to name a new maintainer: https://www.mail-archive.com/xz-devel@tukaani.org/msg00566.h...
At the time I thought it was just rude, but maybe this is when it all started.
"Jigar Kumar" seems to have disappeared
True, that is suspicious as well. A person who hasn't even filed any bugs or issues suddenly has a big problem with the speed of development? Especially the way this was phrased: "You ignore the many patches bit rotting away on this mailing list. Right now you choke your repo. Why wait until 5.4.0 to change maintainer? Why delay what your repo needs?"
"Why delay what your repo needs?" This sounds like scammer lingo
Wow, people suck. I almost hope it's fake profiles urging the maintainer to take on a new member as a long con, because I sincerely hope Jigar Kumar is not a real person behaving like that towards volunteers working for free.
How many people are involved in this ?
Could be just a single person with a bunch of identities.
I would put money on government hackers. They're the sort of people that have the time to pull something like this off. Frankly I'm really surprised it isn't more common, though maybe it is and these guys were just super blatant. I would have expected more plausible deniability.
Good cop bad cop play maybe.
Wait, I'm on mobile. Did this partially slip by because of the ABSURD PRACTICE of publishing release tarballs that do not correspond 1:1 with the source?
Let me guess, autotools? I want to rage shit post but I guess I'll wait for confirmation first.
EDIT: YUP, AT LEAST PARTIALLY. Fucking god damn autotools.
Been saying this the whole day now: GitHub really needs an automated diff / A/B check on tarballs against the actual repo, flagging everything with at least a warning (+[insert additional scrutiny steps here]) when the tarball doesn't match the repo.
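A rough sketch of what such a check could look like, assuming a downloaded tarball next to a clone of the repo (the file name, tag, and prefix below are made up). Note that autotools release tarballs legitimately ship generated files (configure and friends) that aren't in git, so the realistic goal is a reviewable delta, not an empty one; the modified m4 file in the backdoored xz tarballs reportedly differed from the repo, which is exactly the kind of delta this would surface:

```
# Compare a release tarball's contents against `git archive` of the tag.
import hashlib, io, subprocess, tarfile

def digests(fileobj):
    # Map member path (minus the leading "proj-x.y.z/" dir) to its sha256.
    out = {}
    with tarfile.open(fileobj=fileobj) as tar:
        for member in tar:
            if member.isfile():
                _, _, path = member.name.partition("/")
                data = tar.extractfile(member).read()
                out[path or member.name] = hashlib.sha256(data).hexdigest()
    return out

with open("xz-5.6.1.tar.gz", "rb") as f:   # assumed local file name
    release = digests(f)

archive = subprocess.run(
    ["git", "archive", "--format=tar", "--prefix=xz-5.6.1/", "v5.6.1"],
    capture_output=True, check=True, cwd="xz",
).stdout
repo = digests(io.BytesIO(archive))

for path in sorted(set(release) | set(repo)):
    if release.get(path) != repo.get(path):
        status = "differs" if path in release and path in repo else "only in one"
        print(status + ":", path)
```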
I think it's much more likely this was not a bad actor, given their long history of commits.
It's a known fact that China will "recruit" people to operate for them. A quote:
> They talk to them, say my friend, I see you like our special menu. Are you from China? Are you here on a VISA? Do you have family back there? Would you like your family to stay alive? Is your loyalty to this temporary employer or is your loyalty to your motherland? You know, a whole bunch of stuff like that. That’s how Chinese intelligence operations acts...
This just gives feelings of less "compromised account" and more "Your account is now our account"
Isn't that still a "bad actor" even if they are coerced into it?
For the purposes of security discussions, I would say yes. You often don't know their real identity let alone their motivations and tribulations.
However if we were critiquing characters in a book-- especially ones where narrative voice tells us exactly their true motivations--then maybe not, and they get framed as a "dupe" or "manipulated" etc.
"bad actor" doesn't mean "bad faith", it's not a value judgement
I believe your parent is trying to make a distinction that the handle's history may not be suspect, only recent activity, positing a rubber-hose type compromise.
Yes.
It's also a known fact that China will coerce people by threatening family and friends.
Seen this happen to friends here in Australia who were attending pro-Taiwan protests.
We detached this subthread from https://news.ycombinator.com/item?id=39867106.
I think we should seriously consider something like a TS clearance as mandatory for work on core technologies. Many other projects, both open and closed, are probably compromised by foreign agents.
> I think we should seriously consider something like a TS clearance as mandatory for work on core technologies.
Was xz/lzma a core technology when it was created? Is my tiny "constant time equality" Rust crate a core technology? Even though it's used by the BLAKE3 crate? By the way, is the BLAKE3 crate a core technology? Will it ever become a core technology?
With free software in general, things do not start as a "core technology"; they become a "core technology" over time due to usage. At which point would a maintainer have to get a TS clearance? Would the equivalent of a TS clearance from my Latin American country be acceptable? And how would I obtain it? Is it even available to people outside the military and government (legit question, I never looked)?
We probably shouldn't use your code at all, is the real answer. You can get TS, it just costs a lot of money.
In the United States, you cannot apply for a clearance on your own. You must get a job that requires a clearance, then start the application process and wait.
Who is "we"? Are you from the US by any chance? Do you mean that the US government should rewrite every piece of core architecture (including Linux, Ssh, Nginx...) from scratch? Because they are all "contaminated" and actually were created by non-americans.
If that's the case, you do you. Do you also think that all other countries should do the same, and rewrite everything from scratch for their government use (without foreign, for example American, influence)? And what about companies? Should they be forced to switch to their government's "safe" software, or can they keep using Linux and ssh? What about multi-national companies? And what even counts as a "core" software?
So yeah, I don't think it's a good idea.
We can keep it between NATO plus friends.
Wow, I can't decide which is the bigger act of sabotage to open source, your ideas or the actual backdoor.
The Linux kernel is complaining about a lack of funding for CI, and it's one of the highest-visibility projects out there. Where will the money for this come from?
Corps? Aside from Intel, most of them barely pay to upstream their drivers.
The govt? The US federal government has cut so much of its support since the 70s and 80s.
You're right, but accepting code from random Gmail accounts can't be the solution. Honestly, the Linux kernel is a bloated mess and will probably never be secured.
Accepting code from any source without properly reviewing it is surely the actual problem, no? This person only infiltrated this project because there was no proper oversight.
Maintainers need to be more stringent and vigilant of the code they ship, and core projects that many other projects depend upon should receive better support, financial and otherwise, from users, open source funds and companies alike. This is a fragile ecosystem that this person managed to exploit, and they likely weren't the only one.
Maintainers can't fully review all code that comes in. They don't have the resources. Even if they could give it a good review, a good programmer could probably still sneak stuff in. That's assuming a maintainer wasn't compromised, like in this case. We need a certain level of trust that the contributors are not malicious.
Definitely this.
I’ve been a package maintainer for a decade. I make it a habit to spot check the source code of every update of every upstream package, hoping that if many others do the same, it might make a difference.
But this backdoor? I wouldn’t have been able to spot it to save my life.
This wasn't caused by not reviewing the code of a dependency. This was a core maintainer of xz, who gradually gained trust and control of the project, and was then able to merge changes with little oversight. The failure was in the maintenance of xz, which would of course be much more difficult to catch in dependent projects. Which is why it's so impressive that it was spotted by an OpenSSH user. Not even OpenSSH maintainers noticed this, which points to a failure in their processes as well, to a lesser degree.
I do agree that it's unreasonable to review the code of the entire dependency tree, but reviewing own code thoroughly and direct dependencies casually should be the bare minimum we should expect maintainers to do.
> Not even OpenSSH maintainers noticed this, which points to a failure in their processes as well, to a lesser degree.
The OpenSSH project has nothing to do with xz. The transitive dependency on liblzma was introduced by a patch written by a third party. [1] You can't hold OpenSSH project members accountable for something like this.
Alright, that's fair. But I mentioned them as an example. Surely liblzma is a dependency in many projects, and _none_ of them noticed anything strange, until an end user did?
This is a tragedy of the commons, and we can't place blame on a single project besides xz itself, yet we can all share part of the blame to collectively do better in the future.
One of the primary responsibilities of a maintainer is to ensure the security of the software. If they can't keep up with the pace of development in order to ensure this for their users, then this should be made clear to the community, and a decision should be made about how to proceed. Open source maintenance is an often stressful and thankless role, but this is part of the problem that allowed this to happen. Sure, a sophisticated attacker would be able to fool the eyes of a single tired maintainer, but the chances of that happening are much smaller if there's a stringent high bar of minimum quality, and at least one maintainer understands the code that is being merged in. Change proposals should never be blindly approved, regardless of who they come from.
At the end of the day we have to be able to answer why this happened, and how we can prevent it from happening again. It's not about pointing fingers, but about improving the process.
BTW, there have been several attempts at introducing backdoors in the Linux kernel. Some manage to go through, and perhaps we don't know about others, but many were thwarted due to the extreme vigilance of maintainers. Thankfully so, as everyone is well aware of how critical the project is. I'm not saying that all projects have the resources and visibility of Linux, but clearly vigilance is a requirement for lowering the chances of this happening.
> That's assuming a maintainer wasn't compromised, like in this case.
What makes you say that? Everything I've read about this (e.g. [1]) suggests that this was done by someone who also made valid contributions and gained gradual control of the project, where they were allowed to bypass any checks, if they existed at all. The misplaced trust in external contributions, and the lack of a proper peer review process are precisely what allowed this to happen.
[1]: https://boehs.org/node/everything-i-know-about-the-xz-backdo...
My understanding is that the attacker was the only maintainer of xz, that was trusted by upstream maintainers. They couldn't realistically check his work. The defence against this can't be "do better, volunteer maintainers". Maybe we could have better automated testing and analysis, but OSS is allergic to those.
Sure, I'm not saying this is the only solution, or that it's foolproof. But this should be a wake up call for everyone in the OSS community to do better.
Projects that end up with a single maintainer should raise some flags, and depending on their importance, help and resources should be made available. We've all seen that xkcd, and found it more amusing than scary.
One idea to raise awareness: a service that scans projects on GitHub and elsewhere, and assigns maintenance scores, depending on various factors. The bus factor should be a primary one. Make a scoreboard, badges, integrate it into package managers and IDEs, etc. GitHub itself would be the ideal company to implement this, if they cared about OSS as much as they claim to do.
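As a toy example of one such signal, here's a bus-factor sketch from commit counts alone (a naive proxy; real scoring would also weight recency, reviews, and so on; the checkout path is an assumption):

```
# Share of commits from the top author, via `git shortlog -sn HEAD`.
import subprocess

out = subprocess.run(
    ["git", "shortlog", "-sn", "HEAD"],
    capture_output=True, text=True, check=True, cwd="some-project",  # assumed path
).stdout

# Each line looks like "   745\tAuthor Name".
counts = [int(line.split("\t")[0]) for line in out.splitlines()]
print("authors:", len(counts))
print("top author share: %.0f%%" % (100 * counts[0] / sum(counts)))
```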
Okay, so instead of one random Gmail account taking over a critical project, we need two or three?
That's a very US centric view and would practically split the open source community along the atlantic at best and fracture it globally at worst. Be careful what you wish for.
I trust NATO members.
Oh, how generous.
Even Turkey?
That's hard to do when the development of these libraries is so international. Not to mention that it's already so hard to find maintainers for some of these projects. Given that getting a TS clearance is such a long and difficult process, it would almost guarantee more difficulty in finding people to do this thankless job.
It doesn't need to be TS for open source (but for closed, I'm leaning yes). But all code for these core technologies needs to be tied to a real person who can be charged in western nations. Yes, it will make it harder to find people, but with how important these technologies are, we really should not be using some random guy's code in the kernel.
Don't forget that the NSA bribed RSA (the company) to insert a backdoor into their RNG. Being in western jurisdiction doesn't mean you won't insert backdoors into code. It just changes whom you will target with these backdoors. But they all equally make our technology less trustworthy so they are all equally despicable.
It will significantly cut down on Russian and Chinese back doors, which is still an improvement, Mr. Just Made an Account.
That just means the bad actors will all have clearance while putting in a bunch of hurdles for amateur contributors. The only answer is the hard one, constant improvement in methods to detect and mitigate bugs.
"Constant improvement" sounds like "constantly playing catch-up". Besides that, someone with TS can be arrested and charged, and I don't want amateur contributors.
>and I don't want amateur contributors.
And you're free to not accept amateur contributions to the OS projects you maintain. Hell, you can require security clearance for your contributors right now, if you want.
Software like that already exists. I'm saying open source should do better.
This only ensures the backdoors come from the governments that issued the clearances, nothing more. I prefer more competition; at least then there is an incentive to detect those issues.
It will ensure that my OS doesn't have code from random Gmail accounts. If someone with U.S. clearance submits a backdoor, they should either be charged in the U.S., or extradited to somewhere that will charge them. We have no idea who this person is, and even if we did, we probably could not hold them accountable.
How many people in PRISM had such clearance? And how many of them would I trust? Precisely zero.
Killing your pipeline for innovation and talent development doesn't make you secure, it makes you fall behind. The Soviet Union found this out the hard way when they made a policy decision to steal chip technology instead of investing in their own people. They were outpaced and the world came to use chips, networks, and software designed by Americans.
That's the exact opposite of what I'm saying we do. We need to invest in engineers we can trust, and cut off those we can't.
Who's "we"? Americans? Sure, that's fine for you, but Americans aren't exactly trustworthy outside of the US either, and I say that as someone who's usually pro-US. This sort of mentality just shows a lack of understanding of how most of the world sees the US. Even in places like, say, France, the US is seen as an ally, but a very untrustworthy one. Especially since, of all the confirmed backdoors up until now, most were actually US-made.
If this backdoor turns out to be linked to the US, what would your proposal even solve?
"We" doesn't have to be the U.S. This is a false dichotomy that I see people in this thread keep pushing. I suspect in bad faith, by the people that want to insert backdoors. As a baseline, we could keep the contributors to NATO and friends. If a programmer is caught backdooring, they can be charged and extradited to and from whatever country.
If it's just an extradition issue, the US has extradition treaties with 116 countries. You'd still have to 1) ensure that user is who they say they are (an ID?) and 2) they are reliable and 3) no one has compromised their accounts.
1) and 3) (and, to an extent, 2)) are routinely done, to some degree, by your average security-conscious employer. Your employer knows who you are and has probably put some thought into how to avoid your accounts getting hacked.
But what is reliability? Could be anything from "this dude has no outstanding warrants" to "this dude has been extensively investigated by a law enforcement agency with enough resources to dig into their life, finances, friends and family, habits, and so on".
I might be willing to go through these hoops for an actual, "real world" job, but submitting myself to months of investigation just to be able to commit into a Github repository seems excessive.
Also, people change, so you'd need to keep track of everyone all the time, in case someone gets blackmailed or otherwise persuaded to do bad things. And what happens if you find out someone is a double agent? Rolling back years of commits can be incredibly hard.
Getting a TS equivalent is exactly what helps minimize the chances that someone is compromised. Ideally, such an investigation would be transferable between jobs/projects, like a normal TS clearance is. If someone is caught, then yes, rolling back years isn't practical, but we probably ought to look very closely at what they've done, as is presumably being done with xz.
I guess it depends on the ultimate goal.
If the ultimate goal is to avoid backdoors in critical infrastructures (think government systems, financial sector, transportation,...) you could force those organizations to use forks managed by an entity like CISA, NIST or whatever.
If the ultimate goal is to avoid backdoors in random systems (i.e. for "opportunistic attacks"), you have to keep in mind random people and non-critical companies can and will install unknown OSS projects as well as unknown proprietary stuff, known but unmaintained proprietary stuff (think Windows XP), self-maintained code, and so on. Enforcing TS clearances on OSS projects would not significantly mitigate that risk, IMHO.
Not to mention that, as we now know, allies spy on and backdoor allies (or at least they try)... so an international alliance doesn't mean intelligence agencies won't try to backdoor systems owned by other countries, even if they are "allies".
The core systems of Linux should be secured, regardless of who is using it. We don't need every single open source project to be secured. It's not okay to me that SSH is potentially vulnerable, just because it's my personal machine. As for allies spying on each other, that certainly happens, but is a lot harder to do without significant consequences. It will be even harder if we make sure that every commit is tied to a real person that can face real consequences.
The "core systems of Linux" include the Linux kernel, openssh, xz and similar libraries, coreutils, openssl, systemd, dns and ntp clients, possibly curl and wget (what if a GET on a remote system leaks data?),... which are usually separate projects.
The most practical way to establish some uniform governance over how people use those tools would involve a new OS distribution, kinda like Debian, Fedora, Slackware,... but managed by NIST or equivalent, which takes whatever they want from upstream and enrich it with other features.
But it doesn't stop here. What about browsers (think about how browsers protect us from XSS)? What about glibc, major interpreters and compilers? How do you deal with random Chrome or VS Code extensions? Not to mention "smart devices"...
Cybersecurity is not just about backdoors, it is also about patching software, avoiding data leaks or misconfigurations, proper password management, network security and much more.
Relying on trusted, TS cleared personnel for OS development doesn't prevent companies from using 5-years old distros or choosing predictable passwords or exposing critical servers to the Internet.
As the saying goes, security is not a product, it's a mindset.
We wouldn't have to change the structure of the project to ensure that everyone is trustworthy.
As for applications beyond the core system, that would fall on the individual organizations to weigh the risks. Most places already have a fairly limited stack and do not let you install whatever you want. But given that the core system isn't optional in most cases, it needs extra care. That's putting aside the fact that most projects are worked on by big corps that do go after rogue employees. Still, I would prefer if some of the bigger projects were more secure as well.
Your "mindset" is basically allowing bad code into the Kernel and hoping that it gets caught.
>Your "mindset" is basically allowing bad code into the Kernel and hoping that it gets caught.
Not at all. I'm talking about running more and more rigorous security tests because you have to catch vulnerabilities, 99% of which are probably introduced accidentally by an otherwise good, reliable developer.
This can be done in multiple ways. A downstream distribution which adds its own layers of security tests and doesn't blindly accept upstream commits. An informal standard for open source projects, kind of like all those GitHub projects with coverage badges shown on the main repo page. A more formal standard, forcing some critical companies to only adopt projects with a standardized set of security tests and a sufficiently high score. All these approaches focus on the content, not on the authors, since you can have a totally well-meaning developer introducing critical vulnerabilities (not the case here, apparently, but it happens all the time).
On top of that, however, you should also invest in training, awareness, and other "soft" areas that are crucial if you want to actually improve cybersecurity. Using the most battle-tested operating systems and kernels is not enough if someone puts sensitive data in an open S3 bucket, or only patches their systems once a decade, or uses admin/admin on an Internet-facing website.
This seems infeasible for projects like LLVM that depend on international collaboration.
A quote from... your arse?
That is what I thought too but it wasn't hard to find:
Yikes! Do you have any info on the individual's background or possible motivations?
I would presume it's a state actor. Generally in the blackhat world, attackers have very precise targets. They want to attack this company or this group of individuals. But someone who backdoors such a core piece of open source infrastructure wants to cast a wide net to attack as many as possible. So that fits the profile of a government intelligence agency who is interested in surveilling, well, everything.
Or it could in theory be malware authors (ransomware, etc.). However, these guys tend to go for the low-hanging fruit; they want to make a buck quickly. I don't think they have the patience and persistence to infiltrate an open source project for 2 long years to finally gain enough trust and access to backdoor it. A state actor, on the other hand, is in it for the long term, so they would spend that much time (and more) to accomplish this.
So that's my guess: Jia Tan is an employee of some intelligence agency. He chose to present an asian persona, but that's not necessarily who he truly represents. Could be anyone, really: Russia, China, Israel, or even the US, etc.
Edit: given that Lasse Collin was the only maintainer of xz utils in 2022 before Jia Tan, I wouldn't be surprised if the state actor interfered with Lasse somehow. They could have done anything to distract him from the project: introduce a mistress in his life, give him a high-paying job, make his spouse sick so he has to care for her, etc. With Lasse not having as many hours to spend on the project, he would have been more likely to give access to a developer who shows up around the same time and who is highly motivated to contribute code. I would be interested to talk to Lasse to understand his circumstances around 2022.
> I haven't lost interest but my ability to care has been fairly limited mostly due to longterm mental health issues but also due to some other things. Recently I've worked off-list a bit with Jia Tan on XZ Utils and perhaps he will have a bigger role in the future, we'll see.
https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.h...
That "Jigar Kumar" is like fake and one-time throw-off account, probably from the same state actor to orchestrate the painstakingly prepared supply chain attack (under the sun).
At first glance I thought it was a far-fetched conclusion but then I read in a subsequent reply he wrote:
> With your current rate, I very doubt to see 5.4.0 release this year. The only progress since april has been small changes to test code. You ignore the many patches bit rotting away on this mailing list. Right now you choke your repo. Why wait until 5.4.0 to change maintainer? Why delay what your repo needs?
https://www.mail-archive.com/xz-devel@tukaani.org/msg00568.h...
The last two sentences really make it look as if he were trying to pressure the original author.
Oh wow, all his posts are trying to pressure Lasse, or guilt him into getting Jia on board. They're definitely conspiring.
"Your efforts are good but based on the slow release schedule it will unfortunatly be years until the community actually gets this quality of life feature."
"Patches spend years on this mailing list. 5.2.0 release was 7 years ago. There is no reason to think anything is coming soon."
"With your current rate, I very doubt to see 5.4.0 release this year. The only progress since april has been small changes to test code. You ignore the many patches bit rotting away on this mailing list. Right now you choke your repo. Why wait until 5.4.0 to change maintainer? Why delay what your repo needs?"
"Progress will not happen until there is new maintainer. XZ for C has sparse commit log too. Dennis you are better off waiting until new maintainer happens or fork yourself. Submitting patches here has no purpose these days. The current maintainer lost interest or doesn't care to maintain anymore. It is sad to see for a repo like this."
"Is there any progress on this? Jia I see you have recent commits. Why can't you commit this yourself?"
"Over 1 month and no closer to being merged. Not a suprise."
Dated June 2022. Good find!
Given the details from another comment [1], it sounds like both maintainers are suspicious. Lasse's behavior has changed recently, and he's been pushing to get Jia Tan's changes into the Linux kernel. It's possible both accounts aren't even run by the original Lasse Collin and Jia Tan anymore.
Edit: Also, Github has suspended both accounts. Perhaps they know something we don't.
Where does that comment mention the other maintainer (Lasse Collin)?
Whoops, I linked the wrong comment. I meant to link this one [1]. Anyway, seems like there's potentially a whole trail of compromised and fake accounts [2]. Someone in a government agency somewhere is pretty disappointed right now.
According to the Web Archive, https://tukaani.org/contact.html changed very recently (between 11 Feb 2024 and 29 Feb 2024) to add Lasse Collin's PGP key fingerprint. That timing is weird, considering his git activity at that time was almost nonexistent. Although, I checked, and this key already existed back in 2012.
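If you want to check this sort of thing yourself, here's a sketch that diffs two Wayback Machine snapshots (the timestamps below are placeholders, not the actual capture times; the `id_` suffix requests the raw page without the archive toolbar):

```
# Fetch and diff two Wayback Machine captures of the same page.
import difflib, urllib.request

URL = "https://web.archive.org/web/{}id_/https://tukaani.org/contact.html"

def fetch(timestamp):
    # The archive redirects to the capture nearest the requested timestamp.
    with urllib.request.urlopen(URL.format(timestamp)) as resp:
        return resp.read().decode("utf-8", "replace").splitlines()

old, new = fetch("20240211000000"), fetch("20240229000000")
for line in difflib.unified_diff(old, new, "before", "after", lineterm=""):
    print(line)
```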
> considering his git activity at that time was almost nonexistent
Are you looking at the same repositories I am? He's made 88 commits to xz in that time period, two-thirds of the total.
> I wouldn't be surprised if the state actor interfered with Lasse somehow
People could also just get tired after years of active maintainership or become busier with life. Being the sole maintainer of an active open source project on top of work and perhaps family takes either a lot of enthusiasm or a lot of commitment. It's not really a given that people want to (or can) keep doing that forever at the same pace.
Someone then spots the opportunity.
I have no idea what the story is here but it might be something rather mundane.
Or they have just one or a small number of targets, but don’t want the target(s) to know that they were the only target(s), so they backdoor a large number of victims to “hide in the crowd”.
I agree that this is likely a state actor, or at least a very large & wealthy private actor who can play the long game…
If anyone here happens to know Lasse, it might be good to check up on him and see how he's doing.
> Generally in the blackhat world, attackers have very precise targets
Lol, what
> wants to cast a wide net to attack as many as possible. So that fits the profile of a government intelligence agency
That's quite backwards. Governments are far more likely to deploy a complex attack against a single target (see also: Stuxnet); other attackers (motivated primarily by money) are far more likely to cast a wide net.
> That's quite backwards. Governments are far more likely to deploy a complex attack against a single target (see also: Stuxnet); other attackers (motivated primarily by money) are far more likely to cast a wide net.
Governments are well known to keep vulnerabilities hidden (see EternalBlue). Intentionally introducing a vulnerability doesn’t seem that backwards tbh
Oh for sure. I'm not suggesting that this wasn't a government actor, although I'd only give you 50/50 odds on it myself. It coulda just been someone with a bunch of time, like phreakers of old.
According to top comment he committed multiple binary files to xz for the last two years.
Most likely this is not the first backdoor, just the first one to be discovered, so it wasn't two years of work until there were results.
But I still agree that he's probably a state actor.
Don't forget that you could have state actors who are otherwise interested in open source code, and working to actually improve it.
In fact, that'd be the best form of deep cover. It'll be interesting to watch as people more knowledgeable than I pour over every single commit and change.
(not to be overly pedantic, but you probably meant pore, not pour: https://www.merriam-webster.com/grammar/pore-over-vs-pour-ov... )
If you have a backdoor in a specific piece of software already, what is the purpose of trying to introduce another backdoor (and risk it getting caught)?
There are two general attack targets I'd use if I had access to a library/binary like xz:
(1) A backdoor like this one, which isn't really about its core functions, but about the fact that it's a library linked into critical code, so that you can use it to backdoor _other things_. Those are complex and tricky because you have to manipulate the linking/GOT specifically for a target.
(2) Insert an exploitable flaw such as a buffer overflow so that you can craft malicious .xz files that result in a target executing code if they process your file. This is a slightly more generic attack vector but that requires a click/download/action.
Not every machine or person you want to compromise has an exposed service like ssh, and not every target will download/decompress a file you send to them. These are decently orthogonal attack vectors even though they both involve a library.
(Note that there's as yet no evidence for #2 - I'm just noting how I'd try to leverage this to maximum effect if I wanted to.)
This backdoor targeted only sshd.
There could be other backdoors for other targets.
xz is a data compression tool, so it's natural to have compressed files for (de)compression tests.
These files are also useful to check that the library we just built works correctly, but they aren't necessary for installation.
We may need more sophisticated procedures that allow some parts of the distribution to be used only for tests. This could significantly reduce the attack surface; many projects have huge, sophisticated testing infrastructures where you could hide the whole of Wikipedia.
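A blunt way to at least inventory the opaque blobs in a tree, hedged by the fact that a compression project legitimately needs binary fixtures (the checkout path is an assumption):

```
# List files that aren't valid UTF-8 text, i.e. the ones a reviewer
# can't eyeball in a diff. A heuristic inventory, not a verdict.
import os

root = "xz"  # assumed local checkout
for dirpath, dirnames, filenames in os.walk(root):
    dirnames[:] = [d for d in dirnames if d != ".git"]
    for name in filenames:
        path = os.path.join(dirpath, name)
        with open(path, "rb") as f:
            data = f.read()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            print(len(data), path)
```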
> They want to attack this company or this group of individuals. But someone who backdoors such a core piece of open source infrastructure wants to cast a wide net to attack as many as possible.
The stuxnet malware, which compromised Siemens industrial controls to attack specific centrifuges in uranium enrichment plants in Iran, is a counterexample to that.
Stuxnet wasn't similar to this xz backdoor. The Stuxnet creators researched (or acquired) four Windows zero-days, a relatively short-term endeavor, whereas the xz backdoor was a long-term, 2.5-year operation to slowly gain trust from Lasse Collin.
But, anyway, I'm sure we can find other counter-examples.
If a government wants to cast a wide net and catch what they can, they'll just put a tap in some IXP.
If a government went to this much effort to plant this vulnerability, they absolutely have targets in mind - just like they did when they went to the effort of researching (or acquiring) four separate Windows zero-days, combining them, and delivering them...
> a long-term 2.5 years operation to slowly gain trust from Lasse Collin
Couldn't the account that committed the backdoor have been compromised recently?
A bit much, speculating about mistresses and poisoned spouses without, well, anything to go on...
Adding unreadable binary files to the source code is a really dangerous thing to do. We also need tools to quickly detect the addition of easily overlooked characters, such as indentation symbols.
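A sketch of that kind of detector, run over a unified diff; the character list is far from exhaustive (see the "trojan source" work for the full zoo):

```
# Flag added lines in a patch containing easily-overlooked characters:
# zero-width characters, bidi controls, odd spaces, and any other
# Unicode "format" (Cf) characters.
import sys, unicodedata

SUSPECT = {
    "\u200b", "\u200c", "\u200d", "\ufeff",            # zero-width
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # bidi embeddings
    "\u2066", "\u2067", "\u2068", "\u2069",            # bidi isolates
    "\u00a0", "\u2028", "\u2029",                      # odd spaces/breaks
}

for lineno, line in enumerate(
        open(sys.argv[1], encoding="utf-8", errors="replace"), 1):
    if not line.startswith("+") or line.startswith("+++"):
        continue  # only inspect lines the patch adds
    bad = [c for c in line if c in SUSPECT or unicodedata.category(c) == "Cf"]
    if bad:
        print(lineno, [hex(ord(c)) for c in bad], line.rstrip())
```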
BTW, I had a classmate who used to play DotA 1 (on War3) under this name at the University of Science and Technology of China a long time ago, and this was his first girlfriend's name (maybe). His father was a high-ranking official. He then joined the parent department of the Internal Security Detachment, a secret service that has gained a lot of power in the last few years. I hope I'm not awake. lol.
Yes, I believe it's a state actor, and the choice of a typical Chinese name like Jia Tan is intentional and malicious.
Literally this https://xkcd.com/2347/
Brand new anon HN account created 17 minutes ago to defend China? Hmm, suspicious :-)
Plus China does not care about obfuscation. They smash and grab and then deny, deny, deny + counter accuse.
I love that I get downvoted on this immediately, as if I haven't worked IR cases involving CN threat actors that did exactly this.
To be fair, if I worked for the responsible state and it wasn't China, then this is what I would do to deflect…
> Though I'm not a malicious actor
Yeah, the actor part seems unnecessary now.
Yeah, could be Venezuela. Though I'm not trying to make random statements to create uncertainty and doubt, so take this with a gigantic grain of salt.
It's ridiculous to think it's the US, as it would be an attack on Red Hat, a US company, and an attack on Americans. It's a good way to get dragged in front of Congress.
Hardly ridiculous.
You say that as if members of US government agencies didn't plot terror attacks on Americans (Operation Northwoods), steal the medical records of American whistleblowers (Ellsberg), have to be prevented from assassinating American journalists (Gordon Liddy, on Jack Anderson), collude to assassinate American political activists (Fred Hampton), spy on presidential candidates (Watergate), sell weapons to countries who'd allegedly supported groups who'd launched suicide bombing attacks on American soldiers (Iran-Contra), allow drug smugglers to flood the USA with cocaine so that they could supply illegal guns to terrorists abroad on their return trip (Iran-Contra again), and get caught conducting illegal mass surveillance of the American people as a whole (Snowden). Among others.
It's super-naive to suggest that government agencies wouldn't act against the interest of American citizens and companies because there might be consequences if they were caught. Most of the instances above actually were instances where the perpetrators did get caught, which is why we know about them.
Caught and, more importantly, nothing bad typically happened to anyone involved. Also worth noting that there is probably a survivorship bias in play.
You don’t even have to be this conspiratorially minded to believe the NSA is a legitimate suspect here. (For the record, I think literally every intelligence agency on Earth is plausible here.)
You kind of lost the thread when you say, “act against the interests of American citizens and companies”. Bro, literally anyone could be using xz, and anyone could be using Red Hat. You’re only “acting against Americans” if you use it against Americans. I don’t know who was behind this, but a perfectly plausible scenario would be the NSA putting the backdoor in with an ostensibly Chinese login and then activating on machines hosted and controlled by people outside of the US.
Focusing on a specific distro is myopic. Red Hat is popular.
> but a perfectly plausible scenario would be the NSA putting the backdoor in with an ostensibly Chinese login and then activating on machines hosted and controlled by people outside of the US.
There's a term for that: NOBUS (https://en.wikipedia.org/wiki/NOBUS). It won't surprise me at all if this backdoor can only be exploited if the attacker has the private key corresponding to a public key contained in the injected code. It also won't surprise me if this private key ends up being stolen by someone else, and used against its original owner.
>It also won't surprise me if this private key ends up being stolen by someone else, and used against its original owner.
And that is exactly why backdoored encryption is bad.
100%.
The HN crowd has come a long way from practically hero-worshipping Snowden to automatically assuming that 'state actor' must mean the countries marked evil by the US.
I love being called naive.
Seems like an appropriately used descriptor here.
Whisper it to me lover.
The US has backdoored RSA's RNG and thus endangered the security of American companies. It is naive to think that US intelligence agencies will act in the best interest of US citizens or companies.
Notably that was a "no-one-but-us" backdoor, that requires a specific secret key to exploit. We'll see when someone analyzes the payload further, but presumably this backdoor also triggers on a specific private key. If not there are ways to do it that would look far more like an innocent mistake, like a logic bug or failed bounds check.
I can see some arguments that might persuade the NSA to run an attack like this
- gathers real world data on detection of supply attacks
- serves as a wake-up call for a software community that has grown complacent on the security impact of dependencies
- in the worst case, if no one finds it then hey, free backdoor
What about the time it was shown they did the reverse (hardened security using math only they knew at the time) for DES?
What about it?
There's an implicit "always" in their second sentence, if you're confused by the wording. They aren't positing the equivalent of the guard that only lies.
It's an interesting story for those who haven't heard it and think the NSA could only be up to evil. You may not have read it as "the guard only ever lies", but that doesn't stop people from thinking that anyway.
It's an interesting story, but I still don't know what you wanted as an answer to "What about".
They were responding to:
> It is naive to think that US intelligence agencies will act in the best interest of US citizens or companies.
With an example of them doing exactly that.
This is addressed very directly by the second paragraph of my first comment. Please adjust your response to take that into account.
why are you so fight-y? do you have to be right, or have the last word? what is it?
I'm perfectly willing to have an actual discussion, but someone coming along to ignore what I said is kind of annoying.
Is there something more productive that I could have replied with? (I know I could have been less snippy, but I think being snippy is fair there.)
No I think that's it. "What about it?" kinda set me off, and then "if you're confused by the wording" was unnecessarily condescending.
You coulda just pointed out that just because they did right in the case of DES doesn't mean we should actually ever trust them, which I would agree is the correct stance.
Mostly I think that story is neat and wanted people to know about it, so I asked a question as a performative writing technique.
"What about it?" is a very real question that I still want to know the answer to. What did you want as a response when you asked that?
"If you're confused by the wording" was definitely condescending, but I think interpreting guinea-unicorn's post that way doesn't make sense. Even in your reply you didn't say you think it's the right interpretation, just that someone might believe the NSA could "only be up to evil". That followup gives the impression you were giving an FYI for readers. Which is nice to do, but then the "what about" doesn't fit.
So all of that is to say the words "what about" felt like you were deciding to read their post in an unfair way.
I'm happy to listen to an alternate explanation! But you ignored my request for why you said that, and I'm honestly kind of confused as to why that's what set you off.
So overall, I think my first post can come across as fight-y, but I don't think the follow-ups made things fight-y. I think my response to 2OEH8eoCRo0 was fine, given the way they ignored half of the four sentences I had typed.
That is speculation and has never been confirmed.
You are understating the level of evidence that points to the NSA being fully aware of what it was doing.
To be clear: the method of attack had been described in a paper years earlier, the NSA literally had a program (BULLRUN) around compromising and attacking encryption, and there were security researchers at NIST and elsewhere who raised concerns even before it was adopted as a standard. Oh, and the NSA paid RSA $10 million to implement it.
Heck, even the chairman of RSA implies they got used by the NSA:
In an impassioned speech, Coviello said RSA, like many in industry, has worked with the NSA on projects. But in the case of the NSA-developed algorithm, which he didn't directly name, Coviello told conference attendees that RSA feels the NSA exploited its position of trust. In its job, the NSA plays two roles, he pointed out. In the information assurance directorate (IAD) arm of the NSA, it decides on security technologies that might find use in the government, especially the military. The other side of the NSA is tasked with vacuuming up data for cyber-espionage purposes and now is prepared to take an offensive role in cyber-attacks and cyberwar.
“We can’t be sure which part of the NSA we’re working with,” said Coviello with a tone of anguish. He implied that if the NSA induced RSA to include a secret backdoor in any RSA product, it happened without RSA’s consent or awareness.
https://www.networkworld.com/article/687628/security-rsa-chi...
What type of confirmation do you want? The documents aren't going to be declassified in the next couple of decades, if ever.
I've never heard anyone claim that Dual_EC_DRBG is most likely not intentionally backdoored, but there's literally no way to confirm it because of how it's written. If we can't infer intention from the code, we can look at the broader context for clues. The NSA spent an unusual amount of effort pushing an algorithm that kept getting shot down because it was slower than similar algorithms with no additional benefits (the $10 million deal specified it as a requirement [1]). If you give the NSA the benefit of the doubt, they spent a lot of time and money to... intentionally slow down random number generation?!
As an American, I'd prefer a competent NSA than an incompetent NSA that spends my tax dollars to make technology worse for literally no benefit...
[1] https://www.reuters.com/article/us-usa-security-rsa-idUSBRE9...
Have you forgotten about the Snowden leaks exposing the surveillance on Americans by the American govt?
Every country spies on its own citizens.
By comparison, America is actually quite timid compared to other countries, e.g. the UK with its widespread CCTV network.
I'd say that CCTV is quite different from wiretapping. You (generally) don't have an expectation of privacy in a public place, while most people would expect phone calls, messages, etc. to remain private.
Now, GCHQ is no better than the NSA for that either, but I don't think CCTV is a good comparison.
While his leaks exposed surveillance, he was a useful idiot https://en.wikipedia.org/wiki/Useful_idiot in the hands of the Assange club. It may even be that the operation to rescue him was a trigger for Putin to start the war. So no, I'd rather see the whole camaraderie in front of a court and sentenced, regardless of the "heroism".
And yes, most modern supporters of Wikileaks / Assange / Snowden / etc, chanting "release Assange" and "pardon Snowden", are useful idiots in the hands of tyrannies like the BRICS club.
Yeah, as we know, intelligence agencies are very often held accountable in the US. As witnessed by all the individuals that got charged or punished for, uh... never mind.
I'm not very inclined to think this is the US govt, however, you should better acquaint yourself with the morals of some members of Congress.
I think the best reason to doubt USG involvement is the ease with which somebody discovered this issue, which is only a month or two old. I feel like NSA etc. knows not to get caught doing this so easily.
Seems to be a perfect project to hijack: not too much happening, widely used, long history, a single maintainer who no longer has time to manage the project and wants to pass it on.
I handed over all the emails I received to the security team, who I guess will send them "higher". I'll let them analyse it.
Yikes indeed. This fix is being rolled out very fast, but what about the rest of the codebase? And the scripts? I mean, years of access? I'd trust no aspect of this code until a full audit is done, at least of every patch this author contributed.
(Note: I'm not referring to Fedora here; a current fix is required. But just generally: everyone is rolling out this fix, but... I mean, this codebase is poison in my eyes without a solid audit.)
This seems to be the account, correct me if wrong (linked from the security email commit link):
I hope authors of all these projects have been alerted.
STest - Unit testing framework for C/C++. Easy to use by simply dropping stest.c and stest.h into your project!
libarchive/libarchive - Multi-format archive and compression library
Seatest - Simple C based Unit Testing
Everything this account has done should be investigated.
Whoa, is this legit, or some sort of scam on Google in some way?:
https://github.com/google/oss-fuzz/pull/11587
edit: I have to be missing something, or I'm confused. The above author seems to be the primary contact for xz? Have they just taken over? Or did the bad commit come from another source, and a legit person applied it?
A bit confused here.
The concern about other projects is fine, but let's be careful with attacks directed at the person.
Maybe their account is compromised, maybe the username borrows the identity of an innocent person with the same name.
Focus on the code, not people. No point forming a mob.
(e: post above was edited and is no longer directed at the person. thanks for the edit.)
It's important to focus on people, not just code, when suspecting an adversary. Now, I have no idea if this is the right account, and if it has recently been compromised/sold/lost, or if it has always been under the ownership of the person who committed the backdoor. But IF this is indeed the right account, then it's important to block any further commit from it to any project, no matter how innocuous it seems, and to review thoroughly any past commit. For the most security-conscious projects, it would be a good idea to even consider reverting and re-implementing any work coming from this account if it's not fully understood.
An account that has introduced a backdoor is not the same thing as an account who committed a bug.
I agree we should look at the account and its contributions, I make a distinction between the account and the person.
Sometimes the distinction is not meaningful, but better safe than sorry.
Oh, agreed then.
They appear to have moved carefully, spending weeks setting up the framework to perform this attack.
I would now presume this person to be a hostile actor, and their contributions anywhere and everywhere must be audited. I would not wait for them to cry "but my brother did it", because an actual malicious actor would say the same thing. The "mob" should be poring over everything they've touched.
Audit now and audit aggressively.
My post above shows the primary domain for xz moving from tukaani.org to xz.tukaani.org, which is hosted on GitHub:
$ host xz.tukaani.org
xz.tukaani.org is an alias for tukaani-project.github.io.
And originally it was not:
$ host tukaani.org
tukaani.org has address 5.44.245.25 (seemingly in Finland)
It was moved there in January of this year, as per the commit listed in my prior post, by this same person/account. This means that instead of Lasse Collin's more restrictive hosting, the untrusted account is now able to edit the webpage directly, without anyone else's involvement.
For example, to make subtle changes in where to report security issues to, and so on.
So far I don't see anything nefarious, but at the same time, isn't this the domain/page hosting bad tarballs too?
This account changed the instructions for reporting security issues in the xz github as their very last commit:
commit af071ef7702debef4f1d324616a0137a5001c14c (HEAD -> master, origin/master, origin/HEAD)
Author: Jia Tan <jiat0218@gmail.com>
Date: Tue Mar 26 01:50:02 2024 +0800
Docs: Simplify SECURITY.md.
diff --git a/.github/SECURITY.md b/.github/SECURITY.md
index e9b3458a..9ddfe8e9 100644
--- a/.github/SECURITY.md
+++ b/.github/SECURITY.md
@@ -16,13 +16,7 @@ the chance that the exploit will be used before a patch is released.
You may submit a report by emailing us at
[xz@tukaani.org](mailto:xz@tukaani.org), or through
[Security Advisories](https://github.com/tukaani-project/xz/security/advisories/new).
-While both options are available, we prefer email. In any case, please
-provide a clear description of the vulnerability including:
-
-- Affected versions of XZ Utils
-- Estimated severity (low, moderate, high, critical)
-- Steps to recreate the vulnerability
-- All relevant files (core dumps, build logs, input files, etc.)
+While both options are available, we prefer email.
This project is maintained by a team of volunteers on a reasonable-effort
basis. As such, please give us 90 days to work on a fix before
Seems innocuous, but maybe they were planning further changes.
> Seems innocuous, but maybe they were planning further changes.
Seems like an attempt to get 90 days of "use" out of this vulnerability after discovery. If only they had checked performance beforehand!
No, they just removed the bullet points about what to include in a report. The 90 days part was in both versions.
Yes. An incomplete report allows for dragging out "fixing" the issue longer.
True, but the "talk only to me" part was new, I think.
They didn't add any content, it was a pure removal commit
The website change reminds me a bit of lbzip2.org https://github.com/kjn/lbzip2/issues/26#issuecomment-1582645... Although, at the moment, it only seems to be spam. The last commit was 6 years ago, so I guess that's better than a maintainer change...
> tukaani.org has address 5.44.245.25 (seemingly in Finland)
Hetzner?
For what it's worth, tukaani is how you spell toucan (the bird) in Finnish, and Lasse is a common Finnish name; the site being previously hosted in Finland is very plausible.
Yeah, according to their website [0] it looks like the majority of past contributors were Finnish, so there's nothing odd about the hosting provider. The same page says that Jia Tan became co-maintainer of xz in 2022.
No:
route: 5.44.240.0/21
descr: Zoner Oy
origin: AS201692
mnt-by: MNT-ZONER
created: 2014-09-03T08:09:00Z
last-modified: 2014-09-03T08:09:00Z
source: RIPE
It's Finnish; "Oy" is short for "osakeyhtiö" (literally "share company", basically an LLC). Seems to be registered/hosted at https://www.zoner.fi/
So probably Suojelupoliisi, the Finnish Security and Intelligence Service, is behind all this
Zoner is a Finnish web hosting company, which has a history of providing hosting for Finnish open source projects, and the original maintainer (and most of the original crew) is Finnish as well. Nothing weird here.
Interesting, seems to be a tiny finnish hosting company: https://www.zoner.fi/english/
If the owner of the account is innocent and their account was compromised, it's on them to come out and say that. All signs currently point to the person being a malicious actor, so I'll proceed on that assumption.
Does the person exist at all? Maybe they're a persona of a team working at some three letter agency...
Probably not. I did some pattern-of-life analysis on their email and other identifiers. It looks exactly like when I set up a burner online identity: just enough to get past platform registration, but they didn't care enough to make it look real.
For example, their email is only registered to GitHub and Twitter. They haven't even logged into their Google account for almost a year. There's also no history of it being in any data breaches (because they never use it).
Burn the witch.
It would be interesting to hear the whole arc of social engineering behind getting access to the repo. Although, as a maintainer of a large-ish OSS project myself, I know that under a heavy workload any help will be welcomed with open arms, and I've never really talked about private stuff with any of my contributors.
Did you find the Twitter account associated with Jia's email?
Found it by myself! https://twitter.com/JiaT03868010
Or for some three letter party.
> The above author seems to be primary contact for xz?
They made themselves the primary contact for xz for Google oss-fuzz about one year ago: https://github.com/google/oss-fuzz/commit/6403e93344476972e9...
A SourceGraph search like this shows https://sourcegraph.com/search?q=context:global+JiaT75&patte...
- Jia Tan <jiat75@gmail.com>
- jiat75 <jiat0218@gmail.com>
```
amap = generate_author_map("xz")
test_author = amap.get_author_by_name("Jia Cheong Tan")
self.assertEqual(
    test_author.names, {"Jia Cheong Tan", "Jia Tan", "jiat75"}
)
self.assertEqual(
    test_author.mail_addresses,
    {"jiat0218@gmail.com", "jiat75@gmail.com"}
)
```

I tried to understand the significance of this (the parent maybe implied that they reused a completely fictitious identity generated by some test code), and I think this is benign.
That project just includes some metadata about a bunch of sample projects, and it links directly to a mirror of the xz project itself:
https://github.com/se-sic/VaRA-Tool-Suite/blob/982bf9b9cbf64...
I assume it downloads the project, examines the git history, and the test then ensures that the correct author name and email addresses are recognized.
(that said, I haven't checked the rest of the project, so I don't know if the code from xz is then subsequently built, and or if this other project could use that in an unsafe manner)
Additionally, even though the commit messages they've made are mostly plain, there may be features of their commit messages that could provide leads, such as their use of what looks like a very obscure racist joke: referring to a gitignore file as a "gitnigore". There's barely a handful of people on the whole planet making this "joke".
Can you point to where you saw that racist joke?
I don't see anything at https://sourcegraph.com/search?q=context:global+author:jiat0...
first commit made in one of JiaT75's other repos https://github.com/JiaT75/STest/commits/master/
Thank you. If you hadn't explained the background, I totally would've thought that this was just an innocent typo.
(I still think it's like... 60% a typo? don't know)
Anyhow, other people called the CCing of JiaT75 by Lasse suspicious:
https://news.ycombinator.com/item?id=39867593
https://lore.kernel.org/lkml/20240320183846.19475-2-lasse.co...
Someone pointed out the "mental health issues" and "some other things"
https://news.ycombinator.com/item?id=39868881
https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.h...
Lasse is of course a Nordic name, and the whole project has a Finnish name and hosting
https://news.ycombinator.com/item?id=39866902
If I wanted to go rogue and insert a backdoor into a project of mine, I'd probably create a new sockpuppet account and hand over management of the project to it. The above is worryingly compatible with this hypothesis.
OTOH, JiaT75 did not reuse the existing hosting provider, but rather switched the site to github.io and uploaded the old tarballs there:
https://github.com/tukaani-project/tukaani-project.github.io...
If JiaT75 were an old-timer in the project, wouldn't they have kept using the same hosting infra?
There are also some other grim possibilities: someone forced Lasse to hand over the project (violence or blackmail? as far-fetched as that sounds)... or maybe they stole Lasse's devices (and identity?) and now Lasse is incapacitated?
Or maybe it's just some other fellow Scandinavian who pretended to be Chinese and got Lasse's trust. In which case I wish Lasse all the best, and hope they'll be able to clear their name.
Is the same person sockpuppeting Hans Jansen? It's amusing (but unsurprising) that they are using both German-sounding and Chinese-sounding identities.
That said, I don't think it's unreasonable to think that Lasse genuinely trusted JiaT75, genuinely believed that the ifunc stuff was reasonable (it probably isn't: https://news.ycombinator.com/item?id=39869538 ) and handed over the project to them.
And at the end of the day, the only thing linking JiaT75 to a Nordic identity is a Nordic racist joke which could well be a typo. People already checked the timezone of the commits, but I wonder if anyone has already checked the time-of-day of those commits... does it actually match the working hours that a person genuinely living (and sleeping) in China would follow? (Of course, that's also easy to manipulate, but maybe they could've slipped up.)
Anyhow, I guess that security folks at Microsoft and Google (because of JiaT75's email accounts) are probably going to cooperate with authorities on trying to pin down JiaT75's identity (which might not be very useful, depending on where they live).
> does it actually match the working hours that a person genuinely living (and sleeping) in China would follow?
No, it doesn't:
https://play.clickhouse.com/play?user=play#U0VMRUNUIHRvSG91c...
The vast majority of their GitHub interactions are between 12:00 UTC and 18:00 UTC
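For anyone who wants to reproduce this against a local clone instead of the ClickHouse dataset, here's a minimal sketch (my own, not from the investigation above); the repo path and author filter are illustrative assumptions:

```
# Sketch: histogram a git author's commits by UTC hour of day.
# Assumes a local clone at `repo`; the author filter is illustrative.
import os
import subprocess
from collections import Counter

def commit_hours_utc(repo: str, author: str) -> Counter:
    env = dict(os.environ, TZ="UTC")  # make format-local render in UTC
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--author={author}",
         "--format=%ad", "--date=format-local:%H"],
        env=env, capture_output=True, text=True, check=True,
    ).stdout.split()
    return Counter(int(hour) for hour in out)

if __name__ == "__main__":
    hours = commit_hours_utc("xz", "Jia Tan")
    for h in range(24):
        print(f"{h:02d}:00 UTC | {'#' * hours.get(h, 0)}")
```

Commit timestamps carry the author's own UTC offset, so normalizing everything to UTC hours is what makes the "working hours" pattern comparable across contributors.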
It's worth mentioning Lasse is still online in the Libera chat room, idling. Nothing's been said.
From elsewhere in the comments:
https://news.ycombinator.com/item?id=39874621
> He came on IRC, he seemed ok. He did some cleanup of access and signed off for easter.
I think it's American trauma. Outside of the Western Hemisphere, sexist and racist jokes are just jokes
Pretty sure this is just a typo...
Interesting thing about this jiat75@gmail.com email is that it seems to not exist?
The google account: "Couldn't find your Google Account"
The email: "50 5.1.1 The email account that you tried to reach does not exist"
But then when you try to register it says it's taken.
Was it disabled?
I'd say at this point all major tech companies, ISPs, and authorities should have more than enough information, and disabling and freezing their accounts would be the first step.
This can happen if you delete your old gmail account. Source: I deleted a gmail account I shouldn't have years ago. It will say taken if it previously existed, and was deleted.
Oh no, not libarchive! GitHub search shows 6 pull requests were merged back in 2021.
https://github.com/search?q=repo%3Alibarchive%2Flibarchive+j...
It does look innocent enough though. Let's hope there's no unicode trickery involved...
Maybe not. They removed safe_fprintf() here and replaced it with the (unsafe) fprintf().
https://github.com/libarchive/libarchive/commit/e37efc16c866...
That seems to be fine. safe_fprintf() takes care of non-printable characters. It's used for archive_entry_pathname, which can contain them, while "unsafe" fprintf is used to print out archive_error_string, which is a library-provided error string, and strerror(errno) from libc.
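To illustrate what the safe_ variant is guarding against (a toy example of my own, not libarchive's actual escaping code): archive member names are attacker-controlled, and printing one unescaped lets embedded terminal escape sequences through.

```
# Toy illustration: why printing untrusted archive member names unescaped
# is risky. The escaping scheme below is illustrative, not libarchive's.

def safe_print(untrusted: str) -> None:
    # Replace non-printable characters with visible \xNN escapes.
    escaped = "".join(
        ch if ch.isprintable() else f"\\x{ord(ch):02x}"
        for ch in untrusted
    )
    print(escaped)

# A crafted member name embedding "erase line" + carriage return:
evil_name = "innocent.txt\x1b[2K\rtotally-fine.txt"
print(evil_name)       # unsafe: the terminal shows only "totally-fine.txt"
safe_print(evil_name)  # safe: the escape bytes are displayed, not interpreted
```

Error strings from the library itself and strerror(errno) aren't attacker-controlled in the same way, which is why the distinction drawn in that PR is at least arguable.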
We know there are long cons in action here, though. This PR needn't be the exploit. It needn't be anywhere _temporally_ close to the exploit. It could just be laying groundwork for later pull requests by potentially different accounts.
Exactly. If we assume the backdoor via liblzma as a template, this could be a ploy to hook/detour both fprintf and strerror in a similar way. Get it to diffuse into systems that rely on libarchive in their package managers.
Once the trap is in place, deploy a crafted package file that appears invalid at the surface level and triggers the trap. At that moment, fetch the payload from the (already opened) archive file descriptor and execute it, but also patch the internal state of libarchive so that it processes the rest of the archive file as if nothing happened, with the desired outcome also appearing in the system.
Assuming there isn't another commit somewhere modifying a library-provided error string or anything returned by libc. There are all kinds of mischief to be had there, which may or may not have already happened; e.g. now you do some i18n and introduce Unicode shenanigans.
If libarchive is also backdoored, would that allow specifically crafted http gzip encoded responses to do bad things?
No. There's no good reason HTTP response decoding would ever be implemented in terms of libarchive; using libz directly is simpler and supports some use cases (like streaming reads) which libarchive doesn't.
What software is using libarchive to decode HTTP responses?
Well for one, the Powershell script I just wrote to build all the 3rd-party library dependencies for a video game.
tar.exe was added to Windows this January, sourced from libarchive: https://learn.microsoft.com/en-us/virtualization/community/t...
Unlike the GNU tar I'm used to, it's actually a "full fat" command line archiving tool, compressing & decompressing zip, xz, bz2 on the command-line - really handy :-O
FreeBSD's archive tools are built on top of libarchive IIRC. Not sure about the other BSDs.
I don't know, way outside my domain. Possibly none I guess?
EDIT: Ahh, I was wrong and missed the addition of "strerror"
The PR is pretty devious.
JiaT75's claim is that it "Added the error text when printing out warning and errors in bsdtar when untaring. Previously, there were cryptic error messages", and cites this as fixing a previous issue.
https://github.com/libarchive/libarchive/pull/1609
However it doesn't actually do that!
The PR literally removes a newline between 2 arguments on the first `safe_fprintf()` call, and converts the `safe_fprintf()` calls to unsafe direct calls to `fprintf()`. In all cases, the arguments to these functions are exactly the same! So it doesn't actually make the error messages any different, and it doesn't actually solve the issue it references. And the maintainer accepted it with no comments!
reread it...
It does remove the safe_ prefixes... But it also adds "strerror()" to one print statement, which could plausibly give better explanations for the error code...
The only suspicious thing here is the lack of the safe_ prefix (and the potential for the strerror() function to already be backdoored elsewhere in another commit)
But I see the "strerror" call is added
JiaT75 also has commits in wasmtime according to https://hachyderm.io/@joeyh/112180082372196735
Just a documentation change, fortunately:
https://github.com/bytecodealliance/wasmtime/commits?author=...
They've submitted little documentation tweaks to other projects, too; for example:
https://learn.microsoft.com/en-us/cpp/overview/whats-new-cpp...
I don't know whether this is a formerly-legitimate open source contributor who went rogue, or a deep-cover persona spreading innocuous-looking documentation changes around to other projects as a smokescreen.
Minor documentation-change PRs are a well-known tactic used to make your GitHub profile look better (especially to potential employers).
He could be doing the same thing for other reasons; nobody really digs very deep into anything, so I could see someone handing over co-maintenance of a project based on a decent-looking GitHub graph and some plausibility.
Consider the possibility that those types of submissions were part of the adversary's strategy to make their account appear more legitimate, rather than appearing out of nowhere wanting to become the maintainer of some project.
per https://hachyderm.io/@bjorn3/112180226784517099, "The only contribution by them to Wasmtime is a doc change. No actual code or binary blobs have been changed by them."
>Woha, is this legit or some sort of scam on Google in some way?:
I work on OSS-Fuzz.
As far as I can tell, the author's PRs do not compromise OSS-Fuzz in any way.
OSS-Fuzz doesn't trust user code for this very reason.
It looks more like they disabled a feature of oss-fuzz that would've caught the exploit, no?
That's what people are saying though I haven't had the chance to look into this myself.
Fuzzing isn't really the best tool for catching bugs the maintainer intentionally inserted though.
It's more likely that fuzzing would blow up on new code and they wanted an excuse to remove it.
After all, if it hadn't had a performance regression (someone could submit a PR fixing whatever slowed it down, heh) it still wouldn't be known.
There is also a variety of new, parallelized implementations of compression algorithms which it would be good to have a close look at. Bugs causing undefined behaviour in parallel code are notoriously hard to see, and the parallel versions (which are actually much faster) could take the place of well-established programs that have earned a lot of trust.
That looks like a repo that would sound alarms if you look at it from a security standpoint.
Well that account also did most of the releases since 5.4.0.
+1. You can see from the project homepage http://web.archive.org/web/20240329165859/https://xz.tukaani... that they had some release responsibility from 5.2.12 on.
> Versions 5.2.12, 5.4.3 and later have been signed with Jia Tan's OpenPGP key. The older releases have been signed with Lasse Collin's OpenPGP key.
It must be assumed that before acquiring that privilege, they also contributed code to the project. Probably most of it was to establish a respectable record. Still, there could be malicious code going back some way.
Looks like the Jia Tan OpenPGP key was replaced a few months ago as well: https://github.com/tukaani-project/tukaani-project.github.io...
I get why people are focusing on this bad actor. But the question that interests me more: how many other apparent individuals fit the profile that this person presented before being caught?
Apparently, many.
It looks like gettext may contain part of their attack infrastructure.
https://github.com/microsoft/vcpkg/pull/37199#pullrequestrev...
https://github.com/microsoft/vcpkg/pull/37356/files#diff-e16...
Are you referencing the '-unsafe' suffix in the second link? That is not something to worry about.
This is from Gnulib, which is used by Gettext and other GNU projects. Using 'setlocale (0, NULL)' is not thread-safe on all platforms. Gnulib has modules to work around this, but not all projects want the extra locking. Hence the name '-unsafe'. :)
See: https://lists.gnu.org/archive/html/bug-gnulib/2024-02/msg001...
They may be right: https://git.alpinelinux.org/aports/log/main/gettext
The timeline matches and there is a sudden switch of maintainer. And they add a dependency on xz!
psykose was a prolific contributor to Alpine's aports, with thousands of commits over the past few years[0]. So, I doubt they're involved.
[0]: https://git.alpinelinux.org/aports/stats/?period=y&ofs=10
JiaT75 was also a prolific contributor to xz over the past few years, so your assumptions are generally invalid at this point.
There is zero web presence for this person and associated email address.
Looks more like a fake identity than a compromised account.
Actually the "jiat0218" user part in his email address jiat0218@gmail.com has a bunch of matches on Taiwanese sites:
https://char.tw/blog/post/24397301
I think it's just a coincidence.
- All the posts are from 2004/2006.
- "jiat" can be an abbreviation for many common Chinese names.
I agree, probably a coincidence. Just wanted to point out we can, actually, find the username online.
It might just be a coincidence, but the same username from that gmail account also appears to have a Proton Mail address
I think it's not a coincidence: Hans Jansen (hansjansen162@outlook.com) has a matching account on Proton mail too (hansjansen162@proton.me). Furthermore, the Outlook account is configured as recovery e-mail for the Proton account.
This is all I can find on them.
carrd.co jiat0218@gmail.com business https://jiat0218@gmail.com.carrd.co
eBay JiaT75 shopping https://www.ebay.com/usr/JiaT75
giters jiat0218 coding https://giters.com/jiat0218
giters JiaT75 coding https://giters.com/JiaT75
GitHub jiat0218 coding https://github.com/jiat0218
GitHub JiaT75 coding https://github.com/JiaT75
Mastodon-meow.so.. jiat0218@gmail.com social https://meow.social/@jiat0218@gmail.com
Beyond that, nothing surefire. (This is all publicly queryable information, if anyone is curious.)
JiaT75 also used "jiatan" on Libera.Chat using a Singapore IP address (possibly a proxy/VPN).
Where did you gather this information from?
I've never had a web presence for my associated emails due to wanting to avoid spammers. I don't have a false identity.
Keep in mind that having a "false identity" does not make you a malicious actor. I have a serious project I work on under another pseudonym, but it has to do more with the fact that I do not want my real name to be associated with that project AND having a serious case of impostor syndrome. :/
That, and I used to contribute to various games (forks of ioquake3) when I was a teen and I wanted to keep my real name private.
Someone named "John is good" claims they aren't a malicious actor... You're trying real hard to convince us, huh.
Oh yeah, I am using a pseudonym here as well, because I have controversial views in some topics. :P
> I don't have a false identity.
That's just what someone with a false identity would say.. get him boys!
The biggest /S
I am more interested in his git commits: https://github.com/JiaT75?tab=overview&from=2021-12-01&to=20... If JiaT75 is Chinese, then his work log should follow Chinese holidays, especially Spring Festival and the National Day holiday. Chinese people usually don't work on the first 3 days of Spring Festival or National Day: 2021 2/11-2/13 (few commits), 2021 10/1-10/3 (nothing); 2022 1/31-2/2 (huge number of commits on 1/31, suspect), 2022 10/1-10/3 (nothing); in 2023 and 2024, not very many commits. So the huge number of commits on 2022 1/31 is evidence that he does not follow Chinese holidays.
But wait: 2021 is his most active year, yet he missed almost all of August. Was he on holiday? Who can take such a long holiday? What I can think of is a soldier, who gets long home leave (探亲假). If we guess he is a soldier, it also makes sense that he worked over Spring Festival, since soldiers need to be on duty then. Let's double-check: if he is a soldier, he would have a holiday every Aug 1, Liberation Army Day. I checked, and there are no commits on Aug 1 in any of the 4 years.
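If anyone wants to check these date windows themselves, here's a quick sketch (my own; the windows and author filter are illustrative, taken from the guesses above) to run against a local clone:

```
# Sketch: count an author's commits inside specific calendar windows,
# e.g. Chinese public holidays. Windows below are illustrative.
import subprocess

WINDOWS = [
    ("2021-02-11", "2021-02-14"),  # Spring Festival 2021, first 3 days
    ("2021-10-01", "2021-10-04"),  # National Day 2021
    ("2022-01-31", "2022-02-03"),  # Spring Festival 2022
    ("2022-10-01", "2022-10-04"),  # National Day 2022
]

def commits_between(repo: str, author: str, since: str, until: str) -> int:
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--count", f"--author={author}",
         f"--since={since}", f"--until={until}", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out)

for since, until in WINDOWS:
    n = commits_between("xz", "Jia Tan", since, until)
    print(f"{since}..{until}: {n} commits")
```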
Did you check Chinese social media?
I found this link on Zhihu: https://www.zhihu.com/question/650826484
Why would you think the person would have social media (or would even be on Chinese social media specifically), given the sophistication and planning?
I mention Chinese social media specifically because I know it's not indexed so well by western search engines. You can't conclude someone has no social footprint until you've actually checked.
Regardless of how likely you think it is, finding a social media footprint would be useful information. Seek information first, reach conclusions second.
I wonder if that avatar, familiarity with C/C++ and Git, and "offering help with open source projects" is just a coincidence
https://twitter.com/JiaTan1337/status/1774931375994319244
Kind of interesting also to see this account was set up ~2 months ago. If it's a troll, it's a somewhat poor joke.
I found a user who seems suspicious to me.
https://github.com/snappyJack/CVE-request-XZ-5.2.5-has-denia...
He understood the software architecture quite early on while working on the following repository. He connected the dots from his other projects and went rogue (probably to profit from crypto?). Take a look at his other repositories, his code style, and his recent likes on GitHub. Is he our Jia Tan?
An Indian with the name Jigar (meaning "heart") would never refer to himself as "Jigar", as seen in the citation. That would be culturally a bit weird, unless he is being sarcastic or writing on a comic note.
Secondly, the use of English is not consistent with what you would expect from a typical Indian; he would have to be from a foreign background or a very reputable English-medium school.
The language, though seemingly simple to a native English speaker, suggests in this case a person whose first language is likely not English.
It is possible that Grammarly or autocorrect was used to write these, but nothing stated above can be certain.
I do think this is a sabotage account, with 60% odds, unless Mr. Kumar comes out clean publicly. He is likely a state-sponsored actor.
Not a developer, but reading the changelogs and commit history from this person is interesting, as they appear to show some effort to consolidate control and push things in the direction of supporting wider dissemination of their backdoor code:
Discussing commits that the other author has since reverted, IFUNC change with Project Zero tests, a focus on embedded, etc.:
https://www.mail-archive.com/xz-devel@tukaani.org/msg00642.h...
Trimming security reporting details:
https://git.tukaani.org/?p=xz.git;a=commitdiff;h=af071ef7702...
We detached this subthread from https://news.ycombinator.com/item?id=39866275. (It's fine; I'm just trying to prune the top-heavy subthread.)
"crazytan" is the LinkedIn profile of a security software engineer named Jia Tan in Sunnyvale working at Snowflake, who attended Shanghai Jiao Tong University from 2011 to 2015 and Georgia Institute of Technology from 2015 to 2017. However, this Jia Tan on LinkedIn might not be the same Jia Tan who worked on XZ Utils. Also, the person who inserted the malicious code might be someone else who hijacked the account of the Jia Tan who worked on XZ Utils.
Has Jia in any way posted a response to the incident?
My assumption would be that he knows the jig is up, and is probably going to do everything he can to jettison the JiaTan account, lest any IPs he uses be turned over to authorities.
May or may not be related: https://www.linkedin.com › crazytan Jia Tan - Snowflake | LinkedIn
Tukaani website states "jiatan" as the nickname of the malicious code committer on Libera Chat.
WHOWAS jiatan provided me the following information:
jiatan ~jiatan 185.128.24.163 * :Jia Tan
jiatan 185.128.24.163 :actually using host
jiatan jiatan :was logged in as
jiatan tungsten.libera.chat :Fri Mar 14:47:40 2024
WHOIS yields nothing, the user is not present on the network at the moment.
Given that 185.128.24.163 is covered with a range-block on the English Wikipedia, it appears this is a proxy.
> it appears this is a proxy.
Yes, that IP address appears associated with witopia[.]net, specifically vpn.singapore.witopia[.]net points to that IP address.
can someone ELI5 ?
House of cards experiences strong wind.
pRoBaBlY a StaTe AcToR
zero definition of what that means...
egos of people who just like to say cool words they don't understand
lol
this comment will probably get deleted, but let the deletion of this comment stand as evidence that in 2024 we're all allowed to use big words with no definition of what they mean -> bad
state actor? who? what motive? what country? all comments involving "state actor" are very broad and strange... i would like people to stop using words that have no meaning, as it really takes away from the overall conversation of what is going on.
i mean you're seriously going to say "state actor playing the long game" to what end? the issue was resolved in 2 hours... this is stupid
For starters, the backdoor was technically really sophisticated.
For example, the malicious code circumvents a hardening technique (RELRO) in a clever way, which would otherwise have blocked it from manipulating the sshd code in the same process space at runtime. This is not something that script kiddies usually cook up in an afternoon to make a quick buck. You need experts and a lot of time to pull off feats like that.
This points to an organization with excellent funding. I’m not surprised at all that people are attributing this to some unknown nation-level group.
It's always Debian, like last time, when they crippled the OpenSSL RNG (breaking SSH key generation) because of a valgrind warning.
This is why we never upgrade software versions. I’ve been asked by our customers why we use such an old AMI version. This is why.
This feels like the exact opposite of the takeaway you should have. Old software isn't inherently more secure; you're missing thousands of security and bug fixes. Yes, this was bad, but look how quickly the community came together to catch it and fix it.
It only took 6 days for it to be found and fixed.
Waiting for the new YouTube videos on this. "Woah! Linux has a back door dudes!". My distribution, Ubuntu (now Kubuntu) 2022 isn't affected.
Still better than TwoMinuteToiletPapers and other AI-bamboozled channels hyping over proprietary OpenAI crap (text/photo/video), what a time to be alive!
not sure why you're being downvoted. this is exactly what is going to happen.
I guess that rewriting liblzma in Rust would not have prevented this backdoor, but it would likely have increased confidence in its safety.
Using the build system (and potentially the compiler) to insert malicious backdoors is far from a new idea, and I don't see why this example would be the only case.
It would have made it worse, because there would be 300 crates with 250 different maintainers, all pulled in by several trivial/baseline dependencies. More dependencies = higher probability that a malicious maintainer has gotten maintainer rights for one of them, especially because many original authors/maintainers of Rust-style microdependency crates move on with their lives and eventually seek to exit their maintainer role. At least for classic C/C++ software, by virtue of it being very inconvenient to casually pull 300 dependencies for something trivial, there are fewer dependencies, i.e. separate projects/repos, and these tend to be more self-contained. There are also "unserious" distributions like Fedora, and something like the stable/testing/unstable pipeline in Debian, which help with catching the most egregious attempts. Crates.io and npm are unserious by their very design, which is focused on maximizing growth by eliminating as many "hindrances" as possible.
Why is rust beginning to sound like JavaScript?
Modern coders have been conditioned to import random libs to save 30 minutes of work.
Rust specifically chose a minimal standard library to not get stuck with the Python "dead batteries" problem. There's a strong culture as well of minimizing a project's dependencies in Rust.
> Rust specifically chose a minimal standard library to not get stuck with the Python "dead batteries" problem.
So did C++ in the past, although there seems to be a push for a more batteries-included approach recently.
> There's a strong culture as well of minimizing a project's dependencies in Rust.
This doesn't match what anyone can observe by looking at dependencies of Rust projects.
I don't know all the details, and Rust isn't immune to a build attack, but stuff like that tends to stand out a lot more, I think, in a build.rs than it would in some m4 automake soup.
There was a drama back then where serde tried to ship its derive macro as a precompiled binary: https://news.ycombinator.com/item?id=37189462
The backdoor hinged on hiding things in large shell scripts, obscure C "optimizations", and sanitizer disabling. I'd expect all of those would be a much bigger red flag in the Rust world.
This hack exploited a fairly unique quirk in the Linux C ecosystem/culture: packages are built from "tarballs" that are not exact copies of the git HEAD, as they also contain generated scripts with arbitrary code.
It would not have happened in any modern language. It probably wouldn't have happened in a Visual Studio C project for Windows either.
> It would not have happened in any modern language.
It would. pip, for example, installs from tarballs uploaded to PyPI, not from a git repository.
Pip and similar are their own can of worms, yeah. They trade convenience for an almost complete lack of oversight.
But in this case we are talking about people (distro packagers) manually downloading the source and building it, which is not quite the same thing.
`pip install` does do exactly the same thing: it downloads and executes code from a tarball uploaded to PyPI by its maintainer. There's no verification process that ensures the tarball matches what's in the git repository.
Yes I know, and that's what I meant when I said "their own can of worms".
Distro-provided Python packages don't use pip, however, at least AFAIK.
The distro-provided Python packages are usually still built from the source on PyPI as uploaded by the maintainer, not from what's in git.
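The kind of cross-check that would catch this is mechanical, by the way. A rough sketch (mine; the tarball name, repo path, and tag are illustrative assumptions) that lists files present in a release tarball but absent from the corresponding git tree:

```
# Sketch: find files that exist in a release tarball but not in git.
# The xz backdoor's build-time pieces hid in exactly such tarball-only files.
import subprocess
import tarfile

def tarball_files(path: str) -> set[str]:
    with tarfile.open(path) as tf:
        # Strip the top-level "name-version/" directory prefix.
        return {m.name.split("/", 1)[1] for m in tf.getmembers()
                if m.isfile() and "/" in m.name}

def git_files(repo: str, ref: str) -> set[str]:
    out = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "--name-only", ref],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(out.splitlines())

for name in sorted(tarball_files("xz-5.6.1.tar.gz") - git_files("xz", "v5.6.1")):
    print("tarball-only:", name)
```

Autotools projects will always show some expected noise here (configure, Makefile.in, etc.), but that's precisely the generated-script surface being discussed.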
Funny you should say that, given they definitely have exploit code in `vcpkg`
If it were using Cargo as its build system, it might make such manipulations more obvious / understandable?
Pretty much proof that OSS != automatically more secure. And proof that OSS projects can get backdoored. See this for more ideas on this issue: https://seirdy.one/posts/2022/02/02/floss-security/
The malware was hidden inside an opaque binary. If anything, this shows that we need more open source and more reproducibility.
"Lasse Collin," as other posters here have found, does not seem to exist as an experienced coder. Oddly, there is a Swedish jazz musician named Lasse Collin, which would otherwise be one of those names, especially the last name, that would stick out. Instead it is buried under a lot of mentions of a musician.
Lasse Collin has been working on xz for decades: https://sourceforge.net/p/sevenzip/discussion/45797/thread/0...
Now, whether his GitHub account is currently being controlled by him is another question.
Also, for some more context: In 2022, Lasse said he was struggling to work on xz and was looking for maintainers, and mentioned Jia Tan: https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.h...
Searching for my real name on Google doesn't return anything either, I don't think this means anything.
Lasse Collin the contributor is findable, especially if you add "tukaani" to the search. But not in any other context, unless that's what old jazz musicians do in their retirement.
I don't think that's what they meant. The idea is to find information about their personal life, not OSS contributions. Something that proves they're a real person.