Tangential at best but I work in a regulated industry and we're starting to get some heavy scrutiny from external auditors and regulators on the way we plan to address extended outages and forced exits from third party vendors. This goes beyond basic DR/BCP and plans are reviewed with at least a high level scenario exercise.
https://www.federalregister.gov/documents/2023/06/09/2023-12...
On the surface a product like managed git repos would seem to be relatively straightforward to deal with, but these same regulated firms are also under tremendous scrutiny for access management, change management, SDLC, etc etc. They also have a huge software footprint which just increases the impact.
Self-hosting is obviously the traditional answer, but that's not exactly a simple one either.
Just an interesting problem.
The nice thing about git, from my perspective, is that if your entire hosted service vanishes, you can still reconstruct what you need from your users’ working directories. All of the important branches should be there. Somewhere. And any important integration branches that aren’t cached can be reconstructed.
Of all the many dependencies on cloud services, git is by far the last one I'd worry about.
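To make that concrete, here's a minimal sketch of the recovery path (the function name and paths are mine, not a standard tool): given any surviving clone, stand up a fresh bare repository and push every branch and tag into it.

```shell
# rescue_remote CLONE NEW_BARE
# Rebuild a central bare repository from a surviving local clone.
rescue_remote() {
  clone=$1; bare=$2
  git init --bare "$bare"             # fresh, empty central repo
  git -C "$clone" remote add rescue "$bare"
  git -C "$clone" push rescue --all   # every local branch
  git -C "$clone" push rescue --tags  # every tag
}
```

The gap is anything that only ever lived on the server: branches nobody had fetched recently, plus all the non-git data (issues, PRs, wikis).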
The source code, perhaps, but a good many of the orgs I've worked at also use GitHub for PRs, Actions (CI), and for triggering deployments. Those things take time to set up, and across a whole org I wouldn't want to have to make an unplanned switch to another vendor.
In particular, I've moved CI for a large repository between different CI systems. It was anything but trivial: you want to believe "it's just YAML that runs commands, right? Just translate the format?" but it really isn't. Differences in how CI systems map commands to machines, external integrations (e.g., in this case, into Actions' artifact and output systems), etc. all make it more complicated.
Totally agree (and also that’s the reason I think all CI ecosystems are a nightmare nowadays).
But GitHub Actions are somewhat portable: there's the standalone act [0] runner, and the Forgejo/Gitea Actions (e.g. on Codeberg [1]) that use act under the hood and are pretty much a drop-in replacement; they even use GitHub-hosted actions transparently. It might not be a 100% compatible standard, but it's pretty nice. It would be nice for others to follow their lead!
This is a good reason for keeping your build steps in scripts or a Makefile rather than jumping head first into the ecosystem.
I think unless you've been burned by having to move CI providers before, it's easy to lean in. I had to move off Travis many years ago because of pricing changes.
Ugh, agreed - I remember doing a GitHub+TeamCity to GitLab migration and I had specifically designed all of my jobs to just directly call out to a bash script. Artifacts and GitLab's infinite hooks made that much harder than it seemed like it would be.
I think my point is we are losing the option to worry about it or not. There needs to be an answer with a runbook to restore services within a given window of time should we lose that vendor.
Problem is that often you also end up relying on GitHub for CI/CD, so it's not as easy a change. Imagine GH being down and you need to deploy a hotfix. How do you handle that? Especially if you followed best practices and set up a system where all PRs need to go through code review.
This is why I personally like to use none of the CI features, and mostly use it like a shell script executor. Images? Stick to OS images only so that you can easily spin them up with `docker run` locally. Artifacts? Read and write them into S3 buckets and avoid the native artifact features.
This is obviously more difficult in the Github actions ecosystem, but I have mostly used Gitlab CI so far. My CI pipelines mostly look like this:
image: ubuntu:24.04
before_script:
  - apt-get install ...
script:
  - ./ci/build-project.sh
after_script:
  - ./ci/upload-build-artifacts.sh
Systems like these should have an escape hatch of some sort. The key part is that it needs to be auditable.
Anything you do in CI should be possible outside of CI, at least by some subset of users.
I've seen numerous "escape hatches" over the years that turned out to be painted on the wall if you actually tried to use them. No one ever does, though.
I don't think it's malice. I just think it's pretty uncommon for anyone to intentionally back out of a structural tech decision, so it gets forgotten about and remains un-battle-tested. That, or the timeline is longer than SaaS has been around.
Yea - definitely. Just not ideal and something that needs to be built out, tested, etc.
Yes, it is easier said than done. At my company we use Buildkite, and many people wrote scripts that simply fail outside of Buildkite.
GitHub Actions is even worse; it seems like it was designed from the ground up to create lock-in.
Nix helps a bit on the bootstrapping and dependency management problem, but won't save you from writing a script that is too tightly coupled to its runtime environment.
I've run into a scenario where one of our rarely used environments needed a hotfix and the GitHub action we used to deploy there was broken. Was easy enough to translate GitHub action steps to shell scripting for a quick, manual deployment.
Git is only a small part of Github these days.
Let's say you self-host GitHub. Now you are responsible for maintaining uptime, and you have less expertise with the service and fewer resources to dedicate to keeping it up, so it's going to be hard to match, much less exceed, the uptime of GitHub cloud.
And it doesn't protect you from a "forced exit" either. GitHub could terminate your contract, change the terms of the license in a way that you found unacceptable, or even go out of business, and being self-hosted would leave you in no better position than if you had used the cloud with external backups. You can somewhat mitigate this risk by self-hosting an open-source solution, so that in the worst-case scenario you can fork the project and maintain it yourself, but there is still the risk that the project could be abandoned, or have its license changed in future versions.
To be clear, I'm not saying that you shouldn't self host and SaaS is always better. But it isn't a magic bullet that solves these problems.
We self-host GitHub using GitHub Enterprise Server. It is a mature product that requires next-to-no maintenance and is remarkably stable. (We did have a period of downtime caused by running it on an underprovisioned VM for our needs, but since resolving that it hasn't had problems.)
Of course we have a small and mostly unchanging number of users, don't have to deal with DDoS attacks, and can schedule the fairly-infrequent updates during maintenance windows that are convenient for us (since we don't need 100% availability outside of US working hours).
I don't have the metrics in front of me, but I would say we've easily exceeded github.com's uptime in the last 12 months.
Things start to go sideways when you have tens of thousands of users.
> Things start to go sideways when you have tens of thousands of users.
If that’s really the case, run another GitHub instance then. Not all tens of thousands of users need access to the same codebases. In the kind of environment described someone would want identity boundaries established around each project anyway…
It’s fairly stable, but with a large codebase I’ve seen it take a day or more to rebuild the search index, not to mention GHES relies on GitHub.com for the allowed-actions-list functionality, which is a huge PITA. It should not rely on the cloud-hosted version for any functionality. That having been said, I don’t think there’s much of an alternative and I quite like it.
you don't have to manage access to Actions that way.
on GHES you can use https://github.com/actions/actions-sync/ to pull the actions you want down to your local GHES instance, turn off the ability to automatically use actions from github.com via GitHub Connect, and use the list of actions you sync locally as your whitelist.
My employer did this for years. It worked very well. Once a day, pull each action that we had whitelisted into GHES and the runners would use those instead of the actions on github.com.
I would have thought if you had tens of thousands of developers all needing access to the same git repos, then you'd probably have a follow-the-sun team of maybe 50 or 100 engineers working on your git infra.
Most self hosted instances would not have tens of thousands of users.
Agreed, that’s why products of that nature start to break when you do.
> Things start to go sideways when you have tens of thousands of users.
Hm not really. I manage the GHES instance at my employer and we have 15k active users. We haven't needed to scale horizontally, yet.
GHES is amazingly reliable. Every outage we have ever had has been self-inflicted; either we were too cheap to give it the resources it needed to handle the amount of users who were using it, or we tried to outsmart the recommended and supported procedures by doing things in a non-supported way.
Along the way we have learned to never deviate from the supported ways to do things, and to keep user API quota as small as possible (the team which managed this service prior to my team would increase quota per user anytime anyone asked, which was a capital-M Mistake.)
I was the administrator of a GitHub Enterprise Server instance back in 2015-2016 (I think 2014 too).
Rock-solid stability, for a company with 300+ microservices, 10+ big environments, 50+ microenvironments, who knows how many Jenkins pipelines (more than 900, I’ll tell you that). We deployed several times a day, each service on average had 3 weekly deployments.
As a company, I think GitHub (public) should do better, much better, given this is happening more frequently as of late, but if big companies (even medium ones) don’t have their own package caches, they are all in for a ride.
At a previous startup we had GitHub + GitHub Actions, and we were on AWS. We set up an OCI image cache. Sure, if GitHub went down we could not deploy new stuff, but at least it wouldn’t take us down. If we really needed the pipelines, I suppose we could have set up some backup CLI or AWS CodePipeline (eww) workflows.
You could have a self-hosted git or CI/CD pipeline for deployments and code access during outages. It doesn't need to be up constantly; you just need to keep a backup of the code somewhere and have some way to run critical deployment scripts if GitHub Actions is unavailable.
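The "backup of the code somewhere" half is cheap. A sketch (function name, paths, and the repo list in the comment are placeholders): a periodic job that keeps bare mirrors current.

```shell
# mirror_repo SRC DEST
# First run clones a full bare mirror; later runs fetch and prune,
# so DEST always matches SRC's branches and tags.
mirror_repo() {
  src=$1; dest=$2
  if [ -d "$dest" ]; then
    git -C "$dest" remote update --prune
  else
    git clone --mirror "$src" "$dest"
  fi
}

# e.g. nightly, over every repo you care about:
# for r in org/app org/infra; do
#   mirror_repo "git@github.com:$r.git" "/srv/git-backup/$(basename "$r").git"
# done
```

The deployment-scripts half is the harder part, since it means keeping them runnable outside GitHub Actions in the first place.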
Sounds like a backup that was never tested, that nobody verified, and that absolutely will not work when needed.
Either you go all in, or you'd better not do it at all.
Well these things should be tested and verified periodically, just like ANY backup. You shouldn't just tick the "enable backups" box in AWS and then never actually test them
What self hosted CI do you recommend?
We've been using gitea since 2019 with no problems. Drone for CI, although these days you can just stick with the built-in "actions". It's always funny reading the news about yet another GitHub outage, when our gitea instance hasn't had a minute of unplanned downtime in 6 years. (And very little planned downtime, always in non-working hours.)
Whatever the current gitea fork is named this year; see codeberg.org. I think it's currently called Forgejo.
We use GitLab at my company. The k8s YAML files are a bit complicated, but the simpler shell-based ones (publishing npm packages, etc.) are pretty straightforward.
I don't see how self-hosting solves the problem of 3rd party vendors unless you're standing up a self-hosting solution as a hot/cold backup to your 3rd party vendor "in the event of an extended outage".
Kinda eliminates all those pennies saved (in theory) for outsourcing to "the cloud" if you have to duplicate your infra.
Hybrid has always seemed the most optimal approach, but there's always someone in charge who thinks spending money on safety nets is just wasted money.
Since git is distributed, I wonder if it's enough to demonstrate the capability to spin up an alternative, but not necessarily keep it up as a live backup 24/7.
If all you care about is insurance, I think you can sometimes merely attest "yeah, it will work" and check a box. No demo necessary (sadly).
If you actually care about uptime, then a real demo with usage is likely the better approach: switch over to your "backup" on a regular basis and make sure it works 100% as expected.
My hypothetical universe is "I believe GitHub is too big to fail and want to spend as little resources to please the auditor as is reasonably possible without resorting to fraud".
So really what I'm asking is "how strict are these audits really?"
Github is owned by Microsoft so I'm assuming, maybe incorrectly, that they're well funded.
Internal audits are always subject to gaps, but if the stated issue is correct ("a load balancer config change gone pear-shaped"), an audit wouldn't necessarily have caught that.
Unless the audit wants to test their change control, deployment methods, and redundancy.
Are they changing all of their load balancers all at once? Seems non optimal. Maybe change only one at a time, or a small batch.
Are they propagating load balancer changes from a canary to production without vetting it's good?
Or did they vet it and they were wrong: some difference in their canary or analysis had a shortcoming?
And even if all of that was A-OK why did a mistake (and we all make mistakes) not get reverted quickly?
Were there insufficient internal controls to revert small mistakes and keep them from becoming site wide outages? And so on.
I suspect these kinds of discussions are happening. Or, maybe not. Who knows?
It's a 3rd party, and even if your whole organization's life depends on it you only know what they tell you.
Welcome to "the cloud".
Doesn't it exactly solve "extended outages and forced exits from third party vendors."?
Self hosting as an alternative (excluding the 3rd party) or self-hosting as a method of redundancy (in addition to the third party)?
If you self-host as an ALTERNATIVE to the 3rd party, you have all of the same problems - more, really, because now you know about them - whereas the 3rd party can make all these claims you can't verify, until they fall over with a "load balancer misconfig" story you also can't verify.
If you self-host redundantly to a 3rd party you have no special benefit (it does the same thing) AND the additional cost of a redundant infrastructure.
Why not just have redundant 3rd parties (so-called "multi-cloud") if you can't or won't trust your 3rd party.
I believe the point was protecting against risk of 3rd party falling over. If you self host that reduces the risk. At least you have visibility into the falling over process.
It reduces the risk only if your self-hosted solution also doesn't fall over.
It's like saying "I can reduce the risk of my rental car failure by owning my own car", assuming your own car you keep undriven in your garage, doesn't have a dead battery, no gas, flat tires, and proves to be unusable for hauling.
The "cloud" was touted as the fix for all that nuisance in self-hosting. Magically Jeff's bit barn would run at five 9s of uptime, and you could sit back and write your code, unshackled from infra. Until Jeff's bit barn went tits up.
I say the "cloud" is just another guy's data center behind an API.
You wanna cloud experience? Put an API in front of your own servers and burn a $100 bill.
but if you self host, how do you anticipate power outages and natural disasters?
you shift it from a problem of software reliability to a problem of physical infrastructure. at some point in the chain somebody has to do that, but i'd prefer the person doing that was somebody with DEEP experience in it that could give you some nice confident assurances.
Most of the companies exposed to this are already going to be running a number of geographically diverse datacenters and have been exposed to regulations around basic DR/BCP for decades.
>at some point in the chain somebody has to do that, but i'd prefer the person doing that was somebody with DEEP experience in it that could give you some nice confident assurances.
Yes. This is part of the reason why services like git are being moved outside the datacenter. Most of the product offerings on the market don't scale well, have terrible reliability and are still very expensive to run.
Basically you rent space and machines in datacenters that are geographically distributed and designed to resist natural disasters.
That's exactly what Github does already. I wouldn't bet on my own org being better at hosting github than github.
Perhaps not better at hosting GitHub, but some sort of code repository. We've been running on-prem Bitbucket for years, and our uptime is easily better than GitHub's. The feature set is smaller, and the CI/CD pipelines are a separate issue, but I still think we come out on top.
Depending on your size, your requirements may be much lower and easier to manage. GitHub has to be all things to all people; that comes with complexity, and that can make things more fragile.
Agreed. I have worked for several companies who tried to self-host their VCS etc., and their uptime was way worse, the overall cost was higher, and there were fewer features. Having a backup is always a good idea though, and an emergency plan for how to onboard to a competitor.
You're just using your own data center instead of Jeff's bit barn.
I mean seriously people, a "cloud" is just someone elses' data center.
Am I missing something?
Yes, but if it’s somebody else’s data center, it’s on them to figure out how to earthquake-proof it or whatever. The cloud isn’t just renting a computer; it’s renting a computer with a guarantee of uptime that somebody else can be financially liable for.
If Jeff is using DataCenter XYZ, and then letting me use his API to manage my VMs on his "cloud", and I decide to rent a cage at the exact same DataCenter XYZ on my own servers, I'm now "self-hosting".
Same Datacenter. Same reliability infrastructure wise - power, earthquakes, tsunami, typhoons, black plague, monkey pox, etc.
You seem to equate some straw-man version of the cloud with simple rack hosting in a single location. They are nothing alike. A cloud service is almost always geographically diverse and highly available in a way that is beyond most people to build out.
Am I? Cloud providers usually quote price estimates for their least redundant, single-region services to give the impression of cost competitiveness.
But their virtual offerings are much less reliable than a standalone system, by a lot (they guarantee to refund you the 25 cents for your instance if it goes down, not the value of the service interruption or its cost. lol! Read that TOS).
What's the solution to their inherent unreliability? Redundancy, at more cost.
Well, hey, you can rent two colocation facilities if you really need redundancy across geographic regions. And maybe you can just use your colo as a source for a CDN that is geographically diverse (for latency, not hurricanes).
Geographic diversity and HA is beyond most people? If you're the kind of business that needs that, you can hire the exact same people that Amazon Pip'd and fired because they didn't hit some arbitrary ticket metrics, to scale your business.
e.g. https://www.forbes.com/sites/lucianapaulise/2022/10/27/amazo...
I work on one part of a huge suite of interconnected services that have extremely strict SLAs for uptime. It really is an interesting problem, and the quantity of engineering resources devoted to ensuring availability and avoiding downtime -- basically fighting to make inherently unstable, complex systems stable and reliable -- is dumbfounding.
When I'm trying to explain to people what it's like to work on this kind of software, I like to use an analogy: it's as though I have my own personal brick, or group of bricks, in the great pyramids of Egypt, just a tiny piece of a stupefyingly, inconceivably larger whole -- and when I twist my chisel and strike my block just so, at exactly the right (or rather the wrong) angle, I can shake the very foundations of Egypt.
That's a lot of overhead. Do you know how they are calculating the possible risk of these events? It feels like there are a million rabbit holes like this you could go down when modern infrastructure is so cloud connected.
Is the risk of your git repos higher than a chip shortage causing you to lose access to the infrastructure you need? So many factors to consider. A chip shortage doesn't seem that unlikely with geopolitics.
The list of scenarios you mitigate for seems like it could very easily be an arbitrary list of scenarios a single person came up with.
It's an astonishing amount of overhead that slows everything to a veritable crawl. It also creates downstream issues because you need to build in contingencies for your service consumers until you are able to fully abstract the dependency to a point where rehoming doesn't impact them.
The risk calculations are very primitive at the moment, I'm guessing they will be refined over time as industry feedback starts to resonate.
First pay a consultancy a fortune to tell you which SaaS you are able to use based on compliance requirements. Then run your own IdP, use service providers that have a data takeout mechanism, take regular backups, and use standards-based technology. Line up a self-hosted fallback for each non-self-hosted option for if you're forced to exit them. Basically you line up the exit strategy for the auditors but hope you never have to use it.
My company is replacing the entire system we built with Github actions. Our previous DR plan involved running our automation scripts to reprovision the infrastructure that manages the rest of the infrastructure (runs jenkins jobs, etc).
They are replacing everything with Github actions. I wonder what they are going to do when Github is down.
GitHub not being available for 45 minutes is not an "extended outage".
Agreed, that's why i said 'tangential at best'.
And it's the ultimate problem: you don't need a solution for it, until you _need_ it. Which often leads to "yeah we have backups", but without regular testing, can you trust them?
It's almost as if only Git is distributed, but people sold out to Github for convenience. Too bad Git lacks a distributed bug tracker and wiki system like Fossil. Guess Github has to fail a lot more for things to change.
people could not be more clear that their preference is for reliable and easy-to-use centralized services maintained by professionals, and not decentralized systems that require a great deal of user expertise
I actually don't care whether it's centralised or decentralised, or who's managing it.
But you are right that I want reliable and easy-to-use services. And centralisation is often one way to go there.
As an interesting counterpoint: Git itself is decentralised and replaced centralised services like Subversion. And that made git easier to use, especially easier to get started with: no need for a server, no need to be online, just do `git init` in any old directory.
A GitHub-clone could be more decentralised, but they'd need to use that decentralisation to drive those other features that people actually care about day to day.
> no need for a server, no need to be online, just do `git init` in any old directory
svn doesn’t require a server and there is no need to be online. It works perfectly fine over the file:// protocol.
Interesting.
Was that always the case? I remember it being quite a hassle to set up (following tutorials online), but that was about 15 to 20 years ago or so.
And a `git remote add name url` and you are setup to use another remote server.
Only if your repo doesn't have any other critical integrations like CI/CD, jira, etc
Why? We use github and its CI/CD system, but locally I still only need to add the git remote to work with it.
Ironically github is sort of the exception that proves the rule: decentralized in the one way that really matters (decoupled development on individual systems), but centralized for easy interaction in the way the market demands.
But it demands both. The ability to develop in a parallel, decentralized way, and the ability to integrate things at a central point, an authoritative source and a blessed official destination.
It's similar to how databases allow to begin parallel, concurrent, even contradictory transactions, and also guarantee serialized, consistent database state, and rejection of invalid updates, at commit time. Both aspects are utterly important.
A similar thing happens with crypto exchanges, and also SWIFT transfers and federal states, for that matter.
Yes, subsidiarity is a great principle in principle, but in practice it often gets kicked to the curb. See https://en.wikipedia.org/wiki/Subsidiarity
So far, having individual small countries seems to keep the centralisation at bay for longer than just having states in a federation.
(Look at Germany, Austria, Australia, the USA for examples of the latter. Interestingly, the UK is legally not made of federal states, but in practice they have granted more autonomy to eg Scotland over the years. And everyone knows that Scotland would secede and get away with it, if there was a power grab by London. In that sense, they are more federal than the US, where secession is very much verboten.)
There is not actually a need to choose between a single centralised monopoly and going full-on techno-prepper and running all your own services in your garage as an individual. We could have intermediate points, such as services run by professionals but based on portable open standards.
git is a great portable standard. But if I were GitHub, I'd make damn sure that issues, actions, and the rest of it weren't based on something users could just yank out in a portable format and take elsewhere, though.
Portable VCS is simple. Portable anything with the integration everyone expects (issues connects to source which connects to builds which connects to releases) is hard. Git being so open and portable means it isn't a moat.
And I'm sure they'll continue to feel that way right up until the first time they experience "the" internet from a minority partition for more than a few days. I just hope that the distributed stuff is easy enough to limp along with if/when that happens.
No, they'll continue to feel like that even after that. GitHub being down once in a blue moon is more acceptable to the vast majority of users than having to cobble together your own nerdy distributed version of everything.
I was imagining something a bit more disastrous than that. A big enough solar flare could take parts of the planet offline for months. Years if they can't source enough replacement transformers. There are also political reasons that countries go offline.
Then it'll be up to the nerds who manage to cobble together their own distributed version of everything--even if it's a significantly reduced definition of everything.
If large parts of the planet lose their digital infrastructure for months, I really don’t think that “finding a good platform to host my code” is going to be one of my biggest problems.
I think that how big those problems get is going to depend on how much critical infrastructure we can hack back into a working state despite the fact that it can't phone home, which is going to be a problem if nobody knows how to work offline anymore.
Or, if it's a political scenario, it may depend on how well we can coordinate en masse without the cut connection. If we can exceed a certain threshold then we'll have removed the incentive to cut it in the first place.
Especially if the problem was due to a lack of electric grid as they're suggesting.
Even if that's a concern in a doomsday scenario, self-hosting gitlab is super easy and a good (some would argue better) alternative.
Self-hosting GitLab is not and has never been easy if you do it right: it's very heavy on resources and takes lots of time and effort to upgrade. It's also extremely complex and has many moving parts. Stick to gitea or forgejo; they upgrade just by restarting the daemon. MySQL for the database if you want comparable ease of maintenance (same thing: upgrading between major versions requires replacing the binaries and restarting the daemon).
At that point the enormously powerful central players like big tech and militaries and tax collectors will be more than incentivized to use every remaining resource they have to bring everything back online and re-centralize power. And if they can't - society and power weren't exactly more distributed in the distant past lol. Your local warlord/military dictator/whatever will probably not be supportive of nerds acting independently.
You've described a scenario disastrous enough that my primary concern would be drinkable water.
ok fine, ill thank the nerds in that case.
I doubt it, I don't think the distributed stuff is anywhere near ready. Instead it'll be time to kiss the ring of whoever manages to grab control during the gap.
this should be on a wall somewhere
this, especially if it's a business
That's mostly because of herd mentality, not because of a considered approach. Decentralization assumes the majority are antisocial.
How so?
Decentralization can be hidden from the user, it's an implementation detail.
There's literally a popular decentralized social network.
It's less about the tech, and more about the execution.
Historically we can look at LimeWire or PopcornTime as an example.
Both decentralized, both popular due to the ease-of-use.
> There's literally a popular decentralized social network.
No there isn't. Not a single one.
There are a few federated social networks, which is a fancy way of saying that they are centralized networks that have (or can have, in principle) more than one "center".
In practice, the overwhelming majority of users of such networks gravitate towards one or a handful of large providers. And many of those providers actually refuse to federate with other providers unless they follow an ever-growing list of politically-charged rules. This is just centralization with extra steps.
bluesky has over 27 million users
Bluesky is federated, not decentralized.
No, it's because complexity comes with a cognitive cost, and delegating responsibility to other entities minimizes this
If you don't account for the benefit, it looks irrational, but this is true of absolutely anything
What a convoluted way to say you disagree with this. When you are dealing with people of all skill levels, every step you add to the development process costs real money. On top of that, not everything is or has to be open source; we all have source code that is proprietary and has value for us as it is. Making that decentralized and private is not something easy to achieve (and I don't really see a benefit). The problem, from my point of view, is that we have a single player (GitHub) that managed to attract a huge percent of the market, and the competition, while it exists, is minor.
...ironic
Yes, conveniences like:
- A canonical name and place on the web;
- Access policy enforcement (who can commit and when);
- The whole pull request thing, with tags, issues, review, discussion, etc linked to it;
- Code review, linked to said policy enforcement;
- Issue tracking, even as basic as what GitHub offers;
- A trusted store for signing keys, so commits are verified;
- CI/CD triggers and runners;
- A page with releases, including binary releases, and a CDN allowing to use the download links without fear.
This is way more than distributed version tracking. Actually, the above is not even married to git; it could be as valuable with Mercurial, or even Perforce.
This is a large product, actually a combination of many potentially self-contained products. It should not be compared to Git, but rather to Gitea or BitBucket. Not all of this can be reasonably decentralized, though quite a bit can.
The above is why my company left Mercurial for Git years ago. The Mercurial of 10 years ago was better version control than Git (Git still can't track a branch and how it changes), but the rest of GitHub is much better.
The git reflog tracks branches and how they changed. Is that roughly what mercurial has? (It’s been more than a decade since I switched from hg to git as well, so my remaining memory is minimal). GitHub also has an API for querying historical branch info, which is more permanent than reflog, though quite annoying to parse for that info if I recall right.
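The reflog side of this can be demonstrated in a throwaway repo (a sketch; assumes git is installed and ≥ 2.28 for `init -b`):

```shell
# Each time a branch moves (commit, reset, rebase...), its reflog gets an entry.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git -c user.name=demo -c user.email=d@example.com commit -q --allow-empty -m "first"
git -c user.name=demo -c user.email=d@example.com commit -q --allow-empty -m "second"
# Two movements of 'main', so two reflog entries:
git reflog show main
```

The catch, relative to hg: the reflog is purely local and expires, whereas Mercurial's named branches record the branch name in the commit itself, so it survives cloning.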
Hg names the branches and keeps the name. The other day I was looking at a sequence of commits trying to figure out where they came from and knowing the branch would have helped.
Hg always kept history, though. Git has always encouraged squashes and rebases to keep a linear history, so that information was lost.
We all want history information to be lost. (Unless you are running a version control system that timestamps every keystroke.) Reasonable people may disagree on what information should be kept.
While true, I still find it strange that Git still doesn't address this, which Hg has always pointed to when arguing it is better. There is something to it that Git fans who have never had good branching don't know what they are missing.
> - Access policy enforcement (who can commit and when);
Interestingly, what GitHub mostly enforces is where your branches point to. Not who can make commits. That's mostly because of how git works, not because of any grand design on GitHub's part.
It controls who can push commits to the main branch hosted by GitHub (and other branches if you want to configure that). You can have OWNERS files to control who can push commits touching particular parts of the tree, or who must approve such a push / merge (see "pull request").
Out of the box, git does not offer that, and this does require a single point of enforcement.
My point is that in git branches are just mutable pointers to commits. Tags are internally nearly the same, but socially they are meant to be immutable.
Anyone can make any commit they want in git. That includes merge commits, too. GitHub mostly lets anyone push any commits they feel like, too. (What restrictions are there on pushing commits is mostly to deal with denial of service and people being a nuisance.)
Where the policing comes in is in giving rules for how these pointers (aka branches) can be mutated. OWNERS files, PR reviews, CI automation etc is all about controlling that mutation.
See also the new-ish merge queues[0], which really bring out that difference: the merge queue machinery makes the merge commit of your approved PR branch with 'main', runs the CI against that, and iff that passes, moves the pointer that is 'main' to point to the newly created commit.
It's exactly the same commit (with exactly the same hash), whether it passes the CI or not. The only difference is in whether it gets the official blessing of being pointed to by the official 'main'.
It really speaks to the design of git, that conceptually the only thing they need to lock down is who can mutate this very small amount of data, these handfuls of pointers. Everything else is (conceptually) immutable, and thus you don't need to care about who can eg make commits.
[0] Really a re-implementation of bors-ng.
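That "handful of mutable pointers" model can be shown locally in a throwaway repo (a sketch; assumes git ≥ 2.28):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git -c user.name=demo -c user.email=d@example.com commit -q --allow-empty -m "c1"
first=$(git rev-parse main)
git -c user.name=demo -c user.email=d@example.com commit -q --allow-empty -m "c2"
# 'main' is a tiny piece of mutable state; the commits themselves are immutable.
git update-ref refs/heads/main "$first"   # move the pointer back to c1
git log --oneline main                    # shows only c1; c2 still exists, just unreferenced
```

Everything GitHub's branch protection guards is, at bottom, the right to run that one `update-ref` on the hosted copy.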
- CI/CD triggers and runners;
I've used up 17h of CI time these two (slow) January weeks, for free, testing stuff across ~20 different OS/CPU combinations.
That's on just one "personal" project; a bigger dependency of that, of which I'm a maintainer, spends an order of magnitude more.
Can you (GP post, people complaining, not parent) blame us? Should we instead self host everything and beg for donations just to cover the costs?
I don’t know anyone beyond hobbyist hackers who want to set up and maintain this stuff for themselves.
As a professional software developer, I want tools that just work that I can rely on. GitHub 99.99% uptime is something I can rely on.
four 9's uptime https://uptime.is/99.99
Daily: 8.6s
Weekly: 1m 0.48s
Monthly: 4m 21s
Quarterly: 13m 2.4s
Yearly: 52m 9.8s
If you assume that "uptime" means all tools are available
https://statusgator.com/services/github
this appears to be 45 minutes in just one day
Incident with Git Operations 30m Jan 14, 2025 9:01 AM Down
Incident with Git Operations 10m Jan 14, 2025 8:51 AM Down
Incident with Git Operations 5m Jan 14, 2025 8:46 AM Down
Not much margin to hit four 9's left for the rest of the year.
> GitHub 99.99% uptime is something I can rely on.
Their enterprise level SLA is only 99.9% (measured quarterly) and the remedy is a 10% credit, increasing to a 25% credit if they drop below 99%.
A bloke called Linus Torvalds created git for Linux development when the commercial service used for that ceased to be useful, for one non-technical reason or another.
github basically shoves a webby frontend and workflows on top of someone else's work. That's all fine and good but github is not git.
As a professional IT consultant, I want tools, I use lots of other's and I also create my own and I also generally insist on hosting my own. I'm also a fair carpenter and several other trades. I have tools for all my trades and I look after all of them. That way I can guarantee quality - or at least reproducible and documented quality.
I'm sure you are fine with abrogating responsibility for stuff that you are not able to deal with - that's all good: Your choice.
EDIT: Sorry, forgot to say: "Yay cloud"
Linux development is also centralized. Instead of Github they use an email list that has patches sent to it. If that goes down, your change isn't going into Linux today.
You do know what a mailing list is, right? You seem to be confusing it with GitHub.
A mailing list can go down and nothing would happen. The main point is to post patches to the maintainer. The mailing list is for a public record of things.
The only centralised thing is repo hosting on kernel.org. And that isn't the only official place; you can get the repo published on googlesource or GitHub too, so it isn't all that centralised after all.
I suppose that's true. In any case, though, getting a backup mailing list going is much much easier than something like GitHub, and you can always mail patches directly if maintainers allow it.
I contend that if you submit a patch under any circumstances your change isn’t going into linux today.
I believe he found the open source tools being used (CVS?) weren't good enough, and he started using a commercial closed-source tool called BitKeeper, which raised the ire of the FOSS community, who wanted to eat their own dogfood.
So Torvalds opted to "clone" the features of BitKeeper into an open source version he named git.
That's the story I heard, no idea if it's true.
They were using BitKeeper for years but git was created when BitKeeper pulled the rug licensing-wise.
Source: A Git Story from https://blog.brachiosoft.com/en/posts/git/
Bitkeeper pulled the rug because they agreed not to reverse engineer it and then someone did.
It was down for ~45 mins according to the linked page, which would put it at 99.897% uptime for the month, assuming no other downtime.
They had some more downtime a few days ago, too. And that's just the one I happened to notice.
And if your central hub for your distributed vcs needs more than 2-3 9s of uptime for your service to be reliable, honestly you’ve done something really wrong in the design phase like using version control as a database.
As a professional software developer, you rely on software written by those hobbyist hackers.
Whenever you do a clone or an npm install or apt get or pip install, etc...
You choose github because your dependencies chose git
Sure, but as you say those hobbyist developers aren't responsible for keeping a specific server up. They are 'just' writing some software.
(And even among professionals, there's a big difference between Site Reliability Engineering and Software Engineering.)
> Too bad Git lacks a distributed bug tracker
Not your point, really, but fortunately, git is easily extensible. This in-repo issue tracker is surprisingly feature complete: https://github.com/git-bug/git-bug. Anyone else given it a whirl?
Counterpoint: it's a 45 minute outage once in a blue moon. Very small price to pay for the convenience of a centralized VCS with many features that aren't easy to reliably set up in standalone installations.
Once in a blue moon? It’s like, monthly at best.
Well, despite our intents, we may have established the moon might be blue then.
When was the last time you noticed? Not saw a HN post, but it actually stopped you from pushing commits or commenting on an issue or whatever?
Today ;)
Once a full moon maybe? There's one tonight.
I've been keeping an eye on radicle[1] but the documentation for setting up a peer and web frontend is a bit complex. It seems to offer what you're describing: a "decentralized" Git frontend with issue tracking. Seems to be missing wiki functionality, however.
Git is distributed. Distributed system does not guarantee 100% uptime or real time consistency. You can take the whole history with you and push to a different remote.
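Concretely, the "push to a different remote" escape hatch is a couple of commands. A sketch, with a local bare repo standing in for the second host (in practice any other ssh:// or https:// URL):

```shell
set -e
backup=$(mktemp -d)/backup.git
git init -q --bare "$backup"
work=$(mktemp -d)
cd "$work"
git init -q -b main
git -c user.name=demo -c user.email=d@example.com commit -q --allow-empty -m "my work"
# The hub is down? Point at any other remote and push the full history:
git remote add backup "$backup"
git push -q backup --all
```

Every clone already holds the whole history, so the "backup" is complete the moment the push lands.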
Is there a world in which GitHub used an open protocol for the social network part of their product like BlueSky's AT protocol[0]?
not p2p, but federated: https://forgefed.org (ActivityPub extension)
I believe Gitea has support for it, not sure to what extent.
Forgejo (Gitea fork) has been working for multiple years to add support for this. It will still take a lot of effort to finish, I doubt we will see anything usable this year.
Originally the plan was to PR the federation support to Gitea as well. I'm not sure if this is still the case, considering the rising tensions between the two projects and the fact that Forgejo is now a hard fork.
Forgejo, a Gitea fork that I use, has support for it according to the page you linked. But the FAQ for Forgejo mentions it's on the roadmap so not sure how complete ActivityPub support is in Forgejo either.
https://forgejo.org/faq/#is-there-a-roadmap-for-forgejo
I only use my Forgejo instance for myself currently so I haven't looked at the ActivityPub features of it before.
...this is an interesting thought exercise, thank you.
Contrary to popular belief, sarcasm makes you harder to understand and is no longer cool
People said that when GitHub got bought out, and yet more people ended up moving there. It's really not fun to manage your own Git servers, and when a self-hosted instance goes down, it gets fixed much more slowly than GitHub does.
It's this type of negativity ruining the internet. Nothing thoughtful to say, nothing to add, and hoping for failure. I hope everything is okay over there...
They are advocating for decentralized tools. That’s actionable and hoping to prevent failure.
There is no reason to host 'decentralized' tools besides regulation. It's considerably cheaper to use GitHub (or other alternatives like GitLab) than hosting your own and hiring people to maintain and support the solution. Their issue tracking system is very convenient for small teams too.
> It's considerably cheaper to use GitHub
This is not true. The cheapest option is to not have services that require servers to maintain. Git continues to work if GitHub is down. So do shell scripts when CI is down. So why can’t we have an issue system where the underlying data is text files in a git branch?
I understand at scale you can pay people to optimize a process for the larger team, but there is a ton of unnecessary fragility before getting to that scale.
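A minimal sketch of the "issues as text files in a git branch" idea (the filenames and format here are made up for illustration; git-bug implements a far richer version of this):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git -c user.name=demo -c user.email=d@example.com commit -q --allow-empty -m "code lives here"
# An orphan branch shares no history with main, keeping the tracker out of the code tree:
git checkout -q --orphan issues
git rm -rfq . 2>/dev/null || true
printf '# Issue 1: login fails\nstatus: open\n' > 0001-login-fails.md
git add 0001-login-fails.md
git -c user.name=demo -c user.email=d@example.com commit -q -m "issue: login fails"
git checkout -q main
```

The tracker then clones, pushes, and survives outages exactly like the code does; what you give up is search, notifications, and a web UI.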
If you're collaborating with a small group of people (or you're not running a huge amount of CI/CD) then you can make almost anything work. Once you get big it's another story entirely.
Exactly, hire a team of 3 and pay 500K in compensation, or spend 100K on a system that works and you get a support person to call in the event of an issue. The math is so simple.
Except you’re not considering the cost of when you can’t deliver something on time for a customer because infrastructure you don’t control is down.
You don’t outsource things that prevent you from doing your core competency.
The costs would be trivial for the vast majority of Software Engineering companies. Talking about corner cases is useless as they often need a custom specialized solution anyways and wouldn't be using GitHub in the first place.
And for most companies, building and managing an SCM is absolutely not their core competency. Your point is valid, but not in the way you're trying to convey it.
> building and managing an SCM is absolutely not their core competency.
Building their software is - Github being down is currently preventing that for many companies.
Nope, sorry. Github offers cloud and on premise offerings. If you choose cloud and your company can't handle a 45 minute service outage, that's just a bad purchasing decision. You do realize they make most of their revenue from on premise enterprise customers and that none of those customers were impacted? The solution was there the entire time but they can't force people to use it.
They want people motivated to design systems that can handle github going down. That doesn't strike me as negativity, and especially not negativity ruining the internet. It's not the most thoughtful thing in the world but it's a reasonable opinion, and most comments are also not the most thoughtful things in the world.
You're quite a Mary Poppins for someone with the heavy handle of @bastardoperator.
AFAICT the internet was built on negativity.
Here's the 2nd post from a random USENET group I found:
https://www.usenetarchives.com/view.php?id=comp&mid=PDQ5ajZp...
If you have a local development setup, a few hours of not being able to push should not be a problem for most devs.
Except if you have a release planned, but statistically most don't at any given time.
Problem is that people get comfortable with pushing to branch -> deploying in dev and testing from there.
The GitHub wikis are actually git repos.
I forgot about that, I wonder if they were broken too.
> Too bad Git lacks a distributed bug tracker
Well there is https://github.com/git-bug/git-bug
> Too bad Git lacks a distributed bug tracker
Email and mailing lists?
That's the same as not having one.
Otherwise you can claim Facebook is distributed because you can email people links to Facebook pages.
> That's the same as not having one.
That's the way the Linux kernel [1] (the first Git repository) and Git [2] itself manage their code. There's even a git send-email command that prepares the commits as patches and sends them using the correct template.
[1] Linux kernel, IIO subsystem: https://lore.kernel.org/linux-iio/
[2] Git mailing list: https://lore.kernel.org/git/
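The round trip is all local tooling. A sketch where `git format-patch` produces the mbox that send-email would transmit, and `git am` plays the maintainer's side (throwaway repos; assumes git ≥ 2.28):

```shell
set -e
author=$(mktemp -d)/author
git init -q -b main "$author"
cd "$author"
git -c user.name=Dev -c user.email=dev@example.com commit -q --allow-empty -m "base"
maint=$(mktemp -d)/maintainer
git clone -q "$author" "$maint"
# Author commits a fix and renders it as the mbox patch that send-email would mail:
echo "fix" > fix.txt
git add fix.txt
git -c user.name=Dev -c user.email=dev@example.com commit -q -m "add fix"
patch=$(mktemp)
git format-patch -1 --stdout > "$patch"
# Maintainer applies the "emailed" patch; authorship and message arrive intact:
git -C "$maint" -c user.name=Maint -c user.email=m@example.com am -q "$patch"
```

No server in the middle: the mailing list is just transport plus public record, as the comment above says.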
I agree that Git has more of a claim to this than Facebook, but that's kinda like saying a turtle is more of a car than a banana is a car.
Like, yes, it's true. Unlike a banana, turtles have 4 movement-enabling things, they use them to move mostly forward and backwards and not sideways, and other things can ride on them. It's probably more of a car. But it's not a car.
Git has no issue tracker. It's really not a controversial statement. The git community has common practices using something else to work around that, but if that's all you need to say "therefore git has X" then you can claim git has a CI framework because everyone and their dog uses GitHub. Which also has email integrations.
A CI infra just needs to communicate. It can communicate over email, like it does for the kernel.
A bug tracker is just assorted communication. One can easily build it over email.
You're just indulging in hyperbole for the sake of it. Nobody said git has an issue tracker in it.
> You're just indulging in hyperbole for the sake of it. Nobody said git has an issue tracker in it.
Yes they did. That's what this comment thread is about. https://news.ycombinator.com/item?id=42691624
(Unless you're splitting some really fine hairs about what "in it" means?)
They just mentioned the most common communication tool used with Git? You're the one splitting hairs here.
They mentioned them as a counterargument to "Too bad Git lacks a distributed bug tracker".
Which makes it a claim that those tools are git's distributed bug tracker.
A bug tracker and an issue tracker are basically the same term. So that's a claim that git has an issue tracker.
So when you come along and say "Nobody said git has an issue tracker in it." you are either wrong, or you're saying the words "in it" completely change the meaning of the sentence.
If it's the latter, that is a very unhelpful way to communicate, and is definitely splitting hairs. And honestly it's a strawman too because the comment you replied to wasn't using the words "in it". They were saying that you shouldn't say "git has" email. Which is a direct reference to the ancestor comment's claim. It was not hyperbole.
I'm not splitting hairs anywhere. I'm saying that the ancestor comment has the same meaning as "git has an issue tracker". That's not splitting. It's the opposite of splitting.
But git has built in email features
And it was developed itself by email.
Makes me wish for the good old days of git-ssb
Convenience is nice.
I mean I could very easily still push to another remote until it comes back? I do not feel locked in at all.
git was never really decentralized, though. The whole system aggressively funnels toward central repositories; it's just that, because it deals in whole repos, every git repo has the potential to be promoted to that role.
Nothing is built into git to let it actually run decentralized: there's no server or protocol where someone can register, say, an identifying public key and then have a network of peers communicate updates to each other. It's even pretty unsafe to just run a repo through basic file sync (I do this myself with Syncthing and bare repos in a special folder; it seems to work fine, but I'm hardly loading it up enough to find out where it breaks).
It was down for ~2 hours. The status website claims "degraded performance", but in reality we get
git@github.com: Permission denied (publickey).
Either GitHub didn't know how to communicate, or they were not sure about the real impact. This is bad.
Status pages are rarely honest. The company will lie to salvage their SLA. "Degraded performance" or "some customers are experiencing an elevated error rate" should be interpreted as "service unavailable / outage"
Someone else lamenting the delayed status page updates almost 2 years ago: https://news.ycombinator.com/item?id=35887213
I'd be curious by how much they downplay downtime. Wouldn't be too hard to put together an honest status page that pulls & pushes something new to main every 5 minutes, creates an issue, comments on it etc. Very basic high level checks.
PMs would never let an automated system make the company look bad, nor would they let engineers have time to build such a system.
I was thinking of a guerilla uptime monitor here, not one maintained by GitHub but independently.
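A sketch of such an independent probe: `git ls-remote` is a cheap end-to-end check of a host's git serving path. Written as a function so it can be aimed at any remote URL (the github.com URL in the comment is illustrative); demonstrated against a local repo so it runs offline:

```shell
probe() {
  # $1 = remote URL, e.g. https://github.com/git/git.git
  if git ls-remote --exit-code "$1" HEAD >/dev/null 2>&1; then
    echo up
  else
    echo down
  fi
}
demo=$(mktemp -d)
git init -q -b main "$demo"
git -C "$demo" -c user.name=demo -c user.email=d@example.com commit -q --allow-empty -m "x"
probe "$demo"           # prints "up"
probe "$demo/missing"   # prints "down"
```

Run it from cron every few minutes, log the transitions, and you have the guerrilla uptime history the official status page won't give you.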
I'm not convinced it's deliberate dishonesty. Just a communication disconnect. Firstly, it can take time from the first yellow flags to the full realization that there really is an incident underway, secondly it needs someone to decide how to communicate that incident, and thirdly the engineers who are actually working on the incident need to be able to get on with it instead of being pestered for an update every 10 minutes.
They used to be. Github's is a prime example of how "useful info" has turned into "PR mouthpiece" — it used to display graphs of a few choice Github system metrics, and those spiking could often usefully indicate "yeah, they're having a problem" well before a human could update the page.
But yeah, also status pages seem to be under the domain of non-engineers who are too concerned with how things look, vs. conveying useful information in a timely manner, and ultimately, fail at both.
Yeah, I've been trying to check out the SDL3 suite for the last hour or so, and it's still failing on SDL_ttf...
fatal: clone of 'https://github.com/libsdl-org/freetype.git' into submodule path '~/src/SDL/SDL_ttf/external/freetype' failed
...
fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF fatal: fetch-pack: invalid index-pack output
...
fatal: clone of 'https://github.com/libsdl-org/harfbuzz.git' into submodule path '~/src/SDL/SDL_ttf/external/harfbuzz' failed Failed to clone 'external/harfbuzz'. Retry scheduled
...
Failed to clone 'external/freetype' a second time, aborting
For a problem that's supposedly "fixed" that's a whole lot of errors...
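These look like transient network failures. A hedged sketch of a generic retry wrapper for such cases (the `retry 3 git submodule update --init --recursive` usage is illustrative):

```shell
# retry N CMD ARGS... : run CMD up to N times, pausing between attempts.
retry() {
  n=$1; shift
  i=1
  while true; do
    "$@" && return 0
    [ "$i" -ge "$n" ] && return 1
    echo "attempt $i of $n failed; retrying..." >&2
    i=$((i + 1))
    sleep 1
  done
}
```

Git already schedules one retry per failed submodule clone (the "Retry scheduled" line above); a wrapper like this just widens that window for a flaky host.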
I was going insane doubting my SSH knowledge, stopped short of creating new keys thankfully!
well on my side I'm the proud owner of a new ed25519 key... The status page didn't update quickly enough
as am I!
For a second I thought I was getting fired
As someone who found out that way twice in the last two years, I also thought I had been let go. Anxiety is a hell of a drug.
Sorry for you, it should be illegal for companies to fire with no notice!
I thought my account was hacked and ssh keys removed. Panicked a little bit and then double checked that my keys were still there.
Then went to github status and calmed down.
I briefly entertained the notion that I was fired and my ssh keys revoked. But I still had access to slack and email etc, so I banished that thought.
Was trying to install homebrew on a device and kept wtfing because the clone step kept failing… thought I was crazy
I had just upgraded to Windows 11 last week, and for some godforsaken reason earlier today, I could SSH via WSL but not from the host OS even though they were both using keys served from the Windows OpenSSH agent! I'm just going to blame this service outage and hope for the best tomorrow.
Same - but created new keys :) ... which also didn't work, and then I went to check... oh well.
Experiencing this as well. I was a bit concerned something was wrong with my key but alas it is not my key but an outage.
Developer "snow day".
GIT is down today, Code rests in snowy silence, Developers play.
Spent like 30 mins trying to figure out why ssh auth was not working. Compared sha256 signatures, doubted my reading abilities, and pulled my hair.
this coincided with an os upgrade for me.. made for a very confusing 15min until i checked gh status
same..
same here I was scratching my head trying to figure out if something had corrupted my system or what
damn, why couldnt it happen during east coast business hours?
Not sure if I'm just noticing GitHub's issues more often because I use their tools pretty much every day, but their availability is kinda not great. Be it Actions failing or something "not core business" (read: git operations), I can't remember a month in recent memory where I was not annoyed by some outage on GitHub.
Agreed, I wonder what their downtime percentage is. I'd guess it's down on the order of one hour per month, so roughly 1/1000.
Update: They promise >99.9% on a quarterly basis for enterprise customers - https://github.com/github/docs/blob/main/content%2Fsite-poli...
99.9% uptime corresponds to about 2 hours downtime per quarter, if my maths is correct. If that is indeed the guarantee, based on the experience at my company, GitHub has failed its promise recently (or is getting damn close). I recall 2 decent outages in the past few weeks alone. It's making me begin to doubt if GitHub's reliability is appropriate for an enterprise service.
I'm not sure whether the uptime guarantee is 99.9% independently or jointly for each service, i.e. if service A is down for 0.06% and service B another 0.06% but not at the same time, will this count as overall uptime >99.9% or <99.9%.
Rereading the SLA, it looks like GitHub can have each service feature (issues, pull requests, git operations) be down 0.1% and still not reimburse. In your head you might not account separately for each feature, but GitHub does.
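For concreteness, the arithmetic behind those budgets (a sketch assuming a 91.25-day quarter):

```shell
# Allowed downtime in minutes for a given uptime percentage over a period in days.
budget() {
  awk -v p="$1" -v d="$2" 'BEGIN { printf "%.1f\n", (100 - p) / 100 * d * 24 * 60 }'
}
budget 99.9 91.25   # quarterly 99.9% -> 131.4 minutes (~2.2 hours)
budget 99.99 365    # yearly 99.99%  -> 52.6 minutes
```

And since the SLA measures each feature separately, issues, pull requests, and git operations each get their own ~131-minute quarterly budget.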
The whole product feels like it's getting progressively jankier.
The front-end is glacial nowadays and frequently has issues actually loading the page; Actions frequently has some kind of panic attack and breaks, or just grinds along at glacial speeds. The UX has gotten worse (why no merge-queue button?).
Agree with the second part. Github actually has merge queues, but only for the enterprise tier…
I wouldn't say glacial. If you want glacial try Jira. Mind-bogglingly slow.
Oh I know, I have to use both at work. God forbid you mis-click in Jira.
Navigating around takes so much time, it should probably have its own timesheet code.
I'm not sure if it's just me but I swear GitHub UI is one of the slowest web app I've used.
It's just a hub at the moment.
GitHub Actions are also failing with 500 (gateway timeout) errors when trying to fetch the repository.
Update: looks like the problem has been identified.
"We've identified a cause of degraded git operations, which may affect other GitHub services that rely upon git. We're working to remediate."
I guess it's as good a time as any to set up a backup or check out alternatives that could mirror or stand alone. GitLab, Gitea, any others worth checking out?
JFYI, the next version of gitea (which should come out in April) will have full GitHub mirroring functionality (which will let you set up a mirror and have it pull code/issues/wiki/PRs/etc. every few minutes). The current version can either migrate the full repo once, or mirror the code and nothing else.
Consider Forgejo: https://forgejo.org/.
What's the point of having a mirror? It'll only lead to conflicts that'll be difficult to resolve once the main service you use is back up again.
For example, I doubt you would be able to easily merge any pull-requests or use the same CI/CD code for the same services without hacky solutions.
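On the conflict worry: the usual pattern is a pull-based, read-only mirror, so there's nothing to merge back when the primary returns. A sketch with `git clone --mirror` (local paths stand in for the real URLs):

```shell
set -e
primary=$(mktemp -d)/primary
git init -q -b main "$primary"
git -C "$primary" -c user.name=demo -c user.email=d@example.com commit -q --allow-empty -m "v1"
mirror=$(mktemp -d)/mirror.git
git clone -q --mirror "$primary" "$mirror"
# New work lands on the primary; a cron job on the mirror just re-fetches everything:
git -C "$primary" -c user.name=demo -c user.email=d@example.com commit -q --allow-empty -m "v2"
git -C "$mirror" remote update >/dev/null 2>&1
git -C "$mirror" log -1 --format=%s main   # prints "v2"
```

You still lose PRs, issues, and CI history with a plain git mirror, which is what the Gitea/Forgejo mirroring features mentioned above try to cover.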
Gitea works for us.
I use a fork of Gitea called Forgejo. It works nicely as well.
What drove you to the fork?
First I was using Gogs. Someone forked Gogs and made Gitea, because Gogs was under control of a single person and some other people found that frustrating.
https://blog.gitea.com/welcome-to-gitea/
I was using Gitea for a long time, and then someone forked Gitea to create Forgejo. At this time, my installation of Gitea was already out of date a bit because I had previously been manually building and installing Gitea from source. Soon after Forgejo was created, it landed in FreeBSD ports and then it became available in the FreeBSD package manager.
So at this point, and having read a bit about Forgejo and seeing that Forgejo was maintained by people with connection to Codeberg, I thought “hey I need to migrate my current Gitea setup anyway. Either to Gitea installed from FreeBSD packages, or to something else. I might as well try Forgejo.”
And that’s how I ended up installing Forgejo and I’ve stuck with it since.
Aha, makes sense. Thanks for explaining. I see that Forgejo was created at least partially due to concerns that Gitea was trending towards freemium as well. Good to know.
I hope you didn't wait until now to push your day's work!
I did :(
Unclear if related, but the Terraform Registry is also having issues serving data: https://status.hashicorp.com/incidents/jypswvyh0h3z.
In a wonderful twist, we are relying on a couple modules served from GitHub!
Can't pull anything from my repos which is not great.
Is it possible to download a zip of a repo through the GUI?
Yeah, this is not good. I've experienced other outages on their other products (e.g. GitHub Actions), but not even being able to do a basic git pull is an entirely different level.
I'm effectively dead in the water here. I guess I'll go touch some grass. Thank you to the SRE's who are looking at this.
Looking forward to a blogpost or writeup from github on this.
All checkmarks just turned green and my push went through. Snow day over.
It's odd to me that there's nothing indicating an issue is ongoing on the github.com homepage
Thankfully I use Forgejo! No reason to stick with GitHub when Git itself is distributed by nature
True. Also there's no reason not to stick with GitHub when Git itself is distributed by nature. That's the beauty of distribution.
They were also down a few days ago, but HN flagged my submission as a dupe. :)
Yup, SSH rejecting my key, thought it was something at my end but no. Can't run any Yocto builds... I guess a good opportunity to take a walk or do some life admin.
Back up now for me.
Just the info I was expecting to see on HN. Unfortunately (or fortunately?), a better source for these kinds of things than anywhere else.
I thought it was my internet. Oh well, time to finish that game of Civ 6. Edit: and it literally came back up the moment I posted this.
Panic for a minute SSH rejected a key!
Yep, 100% of git operations are failing. CI/CD and dev operations are at a halt.
Ha so that's why my git pull were so slow. I should have checked HN faster.
it'd be great if they set up their CLI tools such that, on an error/timeout, they check their status page, so instead of the standard 'error' you get a 'don't worry, you're not doing anything wrong. it's borked, our bad' message.
That's just an indirect manual dependency on the human that gets to decide to update their status page.
*great not break :)
Make an issue in gh CLI repo, I like the idea
Damn you Azure! Or is Microsoft not even running their precious GitHub in Azure?
Who said it was an infrastructure failure?
There's innumerable causes of this kind of failure that aren't rooted cloud service provider shenanigans
(Admittedly the duration of the outage does "feel" like an infra outage)
GitHub git operations are down. Any git operation returns a 500 error response.
SSH access appears to be rejecting my key too.
Why does GitHub go down almost once a month?
Acquisition.
ducks
I've just ended my Pro plan; enough is enough. I've been debating moving my repositories to a self-hosted forge for a while; it'll be an interesting opportunity to move more CI operations out of Actions and into Nix.
lol I changed my ssh key 2 times until I figured out pubkey permission denied was an issue on their side
windows window operations are down
> all Git operations were unavailable due to a configuration change
eats popcorn waiting for explanation of why they didn't catch it in non-prod
Have you never fixed a prod-specific bug before? They’re common enough, and it sounds like they patched this one within a reasonable amount of time.
I expect more from billion-dollar companies. If you have that much money, you can pay somebody to ensure the changes are properly tested. (I've worked in billion dollar companies, and usually they are too cheap to do this, despite their massive wealth. They just leave it up to ICs and hope they are testing right, rather than making sure)
Yet another reason to self host your VCS. Only thing hosted to “the cloud” these days are my backups - split between S3 and GCS.
~$25/mo
That’s expensive, I get it for free with GitHub. And a loooot more functionality.
I just reset my ssh key lol.
They are going to get DDoS'd by people trying to do this haha.
the incident has been marked as resolved; I am now able to push
[dead]
lmao i deleted and readded a new key thinking my key was fucked
[dead]