• _russross 3 hours ago

    I am in the minority who thinks Raft is overrated.

    I tried teaching Raft one year instead of Paxos but ended up switching back. While it was much easier to understand how to implement Raft, I think my students gained deeper insight when focusing on single-decision Paxos. There is a lightbulb moment when they first understand that consensus is a property of the system that happens first (and they can point at the moment it happens) and then the nodes discover that it has been achieved later. Exploring various failure modes and coming to understand how Paxos is robust against them seems to work better in this setting as well.
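    To make "point at the moment it happens" concrete, here is a minimal in-process sketch of single-decree Paxos in Python. It is illustrative and not from the comment. `Acceptor`, `propose` and `chosen` are hypothetical names, there is no networking, and the set of acceptors a proposer can reach is simulated by passing a slice of the list. The value is chosen the instant a majority accepts it in one ballot, even if that proposer crashes before anyone hears about it. A later proposer with a higher ballot then discovers that value and re-proposes it instead of its own.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                   # highest ballot promised so far
    accepted_ballot: int = -1            # ballot of the last accepted proposal
    accepted_value: Optional[str] = None

    def prepare(self, ballot: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1: promise to ignore lower ballots and report what we accepted."""
        if ballot > self.promised:
            self.promised = ballot
            return (self.accepted_ballot, self.accepted_value)
        return None  # reject: already promised an equal or higher ballot

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(reachable: List[Acceptor], quorum: int, ballot: int, value: str) -> Optional[str]:
    """One proposer round against the acceptors it can reach.
    Returns the value a quorum accepted, or None if the round failed."""
    promises = [p for p in (a.prepare(ballot) for a in reachable) if p is not None]
    if len(promises) < quorum:
        return None
    # Safety rule: adopt the value with the highest accepted ballot, if any.
    _, prior_value = max(promises, key=lambda p: p[0])
    if prior_value is not None:
        value = prior_value
    acks = sum(a.accept(ballot, value) for a in reachable)
    return value if acks >= quorum else None


def chosen(acceptors: List[Acceptor], quorum: int) -> Optional[str]:
    """God's-eye view: the value a quorum accepted in one ballot, if any."""
    counts = {}
    for a in acceptors:
        if a.accepted_value is not None:
            key = (a.accepted_ballot, a.accepted_value)
            counts[key] = counts.get(key, 0) + 1
    for (_, v), n in counts.items():
        if n >= quorum:
            return v
    return None


nodes = [Acceptor() for _ in range(5)]
quorum = len(nodes) // 2 + 1  # 3 of 5

# Proposer A reaches nodes 0-2, gets "x" accepted, then crashes unheard.
propose(nodes[:3], quorum, ballot=1, value="x")
print(chosen(nodes, quorum))                             # x (already decided)

# Proposer B bumps the ballot, reaches nodes 2-4, and discovers "x".
print(propose(nodes[2:], quorum, ballot=2, value="y"))   # x, not y
```

    `chosen()` is a god's-eye view that no real node has. Actual nodes only find out by collecting accept acknowledgements or by running another round, which is the "discover it later" half of the lightbulb moment.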

    I think this paper by Heidi Howard and Richard Mortier is a great way to move on to Multipaxos:

    https://arxiv.org/abs/2004.05074

    They present Multipaxos in a similar style to how Raft is laid out and show that Multipaxos as it is commonly implemented and Raft are almost the same protocol.

    Raft was a great contribution to the engineering community to make implementing consensus more approachable, but in the end I don't think the protocol itself is actually more understandable. It was presented better for implementers, but the implementation focus obscures some of the deep insights that plain Paxos exposes.

    • alexchamberlain an hour ago

      I've read both the Paxos and Raft papers a few times, and hacked on some implementations, but never quite got one over the line to working...

      Raft strikes me as a particular set of decisions made within a Paxos framework, such as having one entity play the Proposer, Acceptor and Learner roles. It's frustrating that there isn't a clearly written de facto paper on Paxos - the story style confused the monkeys out of me.

      • senderista 44 minutes ago

        Agree, Raft is less modular and therefore harder to understand than MultiPaxos:

        https://maheshba.bitbucket.io/blog/2021/12/14/Modularity.htm...

        • withinboredom 3 hours ago

          Know that I join you in thinking Raft is overrated. I’m working on a Multi-Paxos implementation right now. There are some really neat capabilities/properties that Paxos has that Raft can never achieve (see WPaxos, for example, which lets keys migrate to nodes near the client).

          • weinzierl 2 hours ago

            This is an interesting insight into the educational side, but now I am curious about the implementation side. Raft is easier to implement, but that's just one factor. Looking at real-world usage, there seems to be a draw: I could easily count as many Paxos implementations as Raft ones. Is this just historical, or are there good reasons for a new project to still implement Paxos?

            • bfdes an hour ago

              Paxos -- and even Multi-Paxos -- have been around much longer than Raft. The paper that introduced Raft was published in 2014.

          • benbjohnson 4 hours ago

            Author here. I'm happy to answer any questions although this project was from 10+ years ago so I could be a little rusty.

            Over the years I've been trying to find better ways to do this kind of visualization but for other CS topics. Moving to video is the most realistic option but using something like After Effects takes A LOT of time and energy for long-form visualizations. It also doesn't produce a readable output file format that could be shared, diff'd, & tweaked.

            I spent some time on a project recently to build out an SVG-based video generation tool that can use a sidecar file for defining animations. It's still a work in progress but hopefully I can get it to a place where making this style of visualizations isn't so time intensive.

            • huntaub 2 hours ago

              I just want you to know how much this visualization was appreciated. In my time working at AWS, I recommended this website to every one of our new hires to learn how distributed consensus works. Know that this has taught probably 50+ people. Thank you for what you’ve built.

              • mgenglder 3 hours ago

                This is wonderful. Can I ask how you created it? What stack did you use, and is the source code available? I'd love to create something like this to help visualize things I'm currently working with.

                • kfrzcode 3 hours ago

                  What are your thoughts on Dr. Leemon Baird's Hedera Hashgraph?

                  https://www.swirlds.com/downloads/SWIRLDS-TR-2016-01.pdf

                • prydt 2 hours ago

                  I've run a reading group for distributed systems for the last 2 years now and I do think that Raft is a better introduction to Consensus than any Paxos paper I have seen (I mean the Paxos Made Simple paper literally has bugs in it). But when I learned consensus in school, we used Paxos and Multi-Paxos and I do believe that there was a lot to be gained by learning both approaches.

                  Heidi Howard has several amazing papers showing that the differences between Raft and Multi-Paxos are very surface-level, and that Raft's key contribution is its presentation, which is also more "complete" than the many fragmented presentations of Multi-Paxos.

                  As a bonus, one of my favorite papers I have read recently is Compartmentalized Paxos: https://vldb.org/pvldb/vol14/p2203-whittaker.pdf which is just a brilliant piece on how to scale Multi-Paxos.

                • MarkMarine 5 hours ago

                  This is one of my favorite pieces of software engineering because it took something difficult and tried to design something easy to understand, treating that as a main criterion for success. The PhD thesis has a lot more info about this if anyone is curious; it is approachable and easy to read:

                  https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pd...

                  I think this was core to Raft’s success, and I strive to create systems like this with understandability as a first goal.

                  • throwawaymaths 4 hours ago

                    Weirdly, it's also kind of "worse is better": raft is non-deterministic and has an unboundedly long election cycle time. IIRC:

                    - it assumes no hysteresis in network latencies and if there is a hysteresis it's possible that elections can be deterministically infinite.

                    - this fact and the use of raft in production has caused real, large scale network outages.
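                    A toy model of why randomized election timeouts give good expected election times but no hard bound. This is an assumption-heavy sketch, not how any particular Raft implementation behaves. Each round, every node draws a timeout uniformly from 150 to 300 ms (the example range given in the Raft paper). The hypothetical rule here is that the earliest node wins only if it beats the runner-up by more than one round trip (`rtt`); otherwise the vote splits and everyone retries. `election_rounds` is an illustrative name.

```python
import random
from typing import Optional


def election_rounds(n_nodes: int, rtt: float, lo: float = 0.150, hi: float = 0.300,
                    rng: Optional[random.Random] = None,
                    max_rounds: int = 10_000) -> Optional[int]:
    """Toy model: rounds until one node's timeout leads the runner-up by > rtt.

    Assumptions (not Raft's full behaviour): all nodes restart their timers
    together each round, a fresh timeout is drawn uniformly from [lo, hi],
    and any lead of rtt or less is treated as a split vote.
    """
    if n_nodes < 2:
        raise ValueError("need at least two nodes to split a vote")
    rng = rng or random.Random()
    for rounds in range(1, max_rounds + 1):
        timeouts = sorted(rng.uniform(lo, hi) for _ in range(n_nodes))
        if timeouts[1] - timeouts[0] > rtt:
            return rounds          # the earliest node wins cleanly
    return None                    # no winner within max_rounds


rng = random.Random(0)
print([election_rounds(5, rtt=0.010, rng=rng) for _ in range(10)])
print(election_rounds(5, rtt=0.200, rng=rng, max_rounds=1_000))  # None: 200 ms > 300 - 150 ms
```

                    Because each round is an independent draw, the number of rounds is geometrically distributed: usually small, but with no worst-case bound. Under this model, if the round trip exceeds `hi - lo`, no round can ever succeed, which is a crude picture of the latency concern above.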

                    Paxos is of course a beast and hard to understand. There is an alternative, VSR (Viewstamped Replication, developed around the same time as Paxos), which is easy to understand and does not have the issues caused by election nondeterminism in raft.

                    Of course everyone uses raft so raft dominates.

                    • eatonphil 3 hours ago

                      > - this fact and the use of raft in production has caused real, large scale network outages.

                      While this has surely happened, I am not so confident about what the reasons were for this. If you've got links on details I'd love to read.

                      > which is easy to understand

                      I've implemented core bits of Raft twice now and have looked at VSR a couple of times and VSR wasn't easier for me to understand. I'm sure I could implement VSR and would like to some day, but just comparing the papers alone I personally felt like Raft was better presented (i.e. easier to understand).

                      Also keep in mind that nobody ships consensus implementations exactly in line with the original paper. There are dozens or hundreds of papers on variations and extensions of Raft/Paxos and every actual implementation is going to implement some selection of these extensions/variations. You have to look at each implementation carefully to know how it diverges from the original paper.

                      • throwawaymaths 3 hours ago

                        > A known limitation of the base Raft protocol is that partial/asymmetric network partitions can cause a loss of liveness [27, 32]. For instance, if a leader can no longer make progress because it cannot receive messages from the other nodes, it continues to send AE heartbeats to followers, preventing them from timing out and from electing a new leader who can make progress.

                        (Howard, Abram et al)

                        Me: note this can also occur without a complete outage, if the latency back to the shit leader differs from the latency out of it.

                        > nobody ships consensus implementations exactly in line with the original paper. There are dozens or hundreds of papers on variations

                        As the paper above explains once you add extensions you might have broken the correctness proofs in raft. More to the original point, you're now in a state where it's no longer "simple"... I would go so far as to say if you have to consider the extensions, which are distributed over several papers and sometimes not even papers at all, you're in "deceptively simple" land.

                        As a pedagogical tool, raft is valuable because it can be a launching ground for conversations like these... But maybe we shouldn't use it in prod when there are better, straightforward options? I get the feeling that being hard-sold as simple nerd-sniped devs into writing it, someone r/very smart put it into prod, more people followed on social proof, and now here we are.

                        • lifeinthevoid 2 hours ago

                          > A known limitation of the base Raft protocol is that partial/asymmetric network partitions can cause a loss of liveness [27, 32]. For instance, if a leader can no longer make progress because it cannot receive messages from the other nodes, it continues to send AE heartbeats to followers, preventing them from timing out and from electing a new leader who can make progress.

                          Real-world raft implementations make the leader step down if it hasn’t heard from a quorum for a while. Not part of vanilla raft though.
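                          A hedged Python sketch of that leader-side check. The class and method names are hypothetical; etcd's raft library exposes a similar option called `CheckQuorum`. The leader remembers when each follower last acknowledged a heartbeat and steps down once fewer than a majority, counting itself, have been heard from within one election timeout. That is what breaks the asymmetric-partition livelock quoted above: once the stuck leader steps down it stops sending heartbeats, so the other nodes can time out and elect a leader who can make progress.

```python
import time
from typing import Callable, Dict, Iterable


class LeaderQuorumCheck:
    """Leader-side 'step down without a recent quorum' rule (hypothetical names).

    record_ack() is assumed to be called whenever a follower answers an
    AppendEntries/heartbeat; the clock is injectable for testing.
    """

    def __init__(self, followers: Iterable[str], election_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.election_timeout = election_timeout
        self.clock = clock
        now = clock()
        # Start as if every follower had just answered, so a new leader
        # gets one full timeout before it can be forced out.
        self.last_ack: Dict[str, float] = {f: now for f in followers}
        self.cluster_size = len(self.last_ack) + 1  # followers plus the leader

    def record_ack(self, follower: str) -> None:
        self.last_ack[follower] = self.clock()

    def should_step_down(self) -> bool:
        now = self.clock()
        recent = sum(1 for t in self.last_ack.values()
                     if now - t < self.election_timeout)
        majority = self.cluster_size // 2 + 1
        return recent + 1 < majority  # +1: the leader counts itself


# Usage with a fake clock: a 5-node cluster, election timeout of 1 s.
fake_now = [0.0]
check = LeaderQuorumCheck(["b", "c", "d", "e"], 1.0, clock=lambda: fake_now[0])
fake_now[0] = 0.5
check.record_ack("b")
check.record_ack("c")
fake_now[0] = 1.2
print(check.should_step_down())  # False: b, c and the leader are a majority
fake_now[0] = 1.6
print(check.should_step_down())  # True: nobody answered in the last second
```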

                          • eatonphil 2 hours ago

                            The thesis does describe doing this fwiw while the paper does not.

                      • convolvatron 2 hours ago

                        Don't forget about CALM. It's a shame that it isn't the default we reach for, only struggling when we really need stronger consistency.

                        https://arxiv.org/abs/1901.01930

                        • prydt 2 hours ago

                          I think the CALM theorem and this whole line of research is so interesting and it is still carried on by the CRDT people. But I would love to see more of this.

                          I feel like it doesn't get as much attention as it deserves.

                    • mbivert 2 hours ago

                      In case this is of interest, MIT's 6.5840[0], Distributed Systems, has a series of labs implementing Raft in Go. I haven't made it through the whole thing yet, but it's quite entertaining so far.

                      The teachers provide you with some code templates, a bunch of tests, and a progressive way to implement it all.

                      [0]: https://pdos.csail.mit.edu/6.824/index.html

                      • shiredude95 37 minutes ago

                        "Paxos Made Moderately Complex" by Robert van Renesse and Deniz Altinbuken: http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf is a great starting point for implementing multi-paxos. The authors also provide a working python implementation.

                        • dang an hour ago

                          Related:

                          Raft Consensus Animated (2014) - https://news.ycombinator.com/item?id=32484584 - Aug 2022 (67 comments)

                          Raft Visualization - https://news.ycombinator.com/item?id=25326645 - Dec 2020 (35 comments)

                          Raft: Understandable Distributed Consensus - https://news.ycombinator.com/item?id=8271957 - Sept 2014 (79 comments)

                          • skilning 4 hours ago

                            I was asked to click "continue" after each of the first two sentences, and the fade-in of the text took longer than reading the text.

                            This may be a great article, but I'll never know because it's frustrating to try and read.

                            • eatonphil 4 hours ago

                              Ben's visualization here is great.

                              The other biggest help to me aside from the paper and the thesis was Ongaro's TLA+ spec: https://github.com/ongardie/raft.tla/blob/master/raft.tla. It's the only super concise "implementation" I found that is free of production-grade tricks, optimizations, and abstractions.

                              And for building an intuition, TigerBeetle's sim.tigerbeetle.com is great. What happens to consensus when there's high latency to disk or network? Or as processes crash more frequently? It demonstrates.

                              • ko_pivot 5 hours ago

                                The writing and the visualizations are great. The ‘continue’ button is way too frequent.

                                • rapsey 4 hours ago

                                  It may be understandable, but implementing it is far from easy.

                                  • ukd1 4 hours ago

                                    Having implemented it twice (for fun/learning, though) I think you're right - yet it's also the easiest, imho. The paper, and resources around implementing it (posts, other implementations) are great. Also, the authors still reply on github when issues are opened, which is great.

                                    • MarkMarine 4 hours ago

                                      Right, but Paxos is double hard in comparison. I’ve read both papers multiple times, tried to implement and failed, and I still don’t think I understand Paxos.

                                      • mrkeen 4 hours ago

                                        I'm on the other side.

                                        I think Leslie Lamport asserted that Paxos is minimal, and that "all other consensus algorithms are just Paxos with more steps". I'm inclined to believe him.

                                        I've implemented Paxos but I can't get through "Raft for dummies" style blog posts.

                                        Regarding Raft [1]:

                                          > The consensus problem is divided into three sub-problems: Leader election, Replication and Safety.
                                        
                                        What is leader election? It's a distributed system coming to consensus on a fact (i.e. who the leader is.) Then once you have the leader, you do additional steps. The entirety of Paxos is a distributed system coming to consensus on a fact.

                                        When I read these posts, I see things like "timeout", "heartbeat", and I think: timeout according to whom? I read "once the leader has been elected", um, hang on, according to whom? Has node 1 finally agreed on the leader just as node 3 has given up and started another election? I don't doubt that Raft is correct, but the writing about it seems simple only by glossing over details.

                                        Paxos, on the other hand, seems timeless. (And the writing about it doesn't trigger my "distributed system fallacies" reaction)

                                        [1] https://www.brianstorti.com/raft/

                                        • spmurrayzzz 3 hours ago

                                          I tend to agree that many explanations of Raft don't get into the useful details and handwave some of the hard problems. But the original paper does do a good job of this and is pretty accessible to read, IMO.

                                          > I read "once the leader has been elected", um, hangon, according to whom? Has node 1 finally agreed on the leader, just while node 3 has given up and started another election?

                                          The simple response I think to "according to whom" is "the majority of voting nodes". When the leader assumes its role, it sends heartbeats which are then accepted by the other nodes in the cluster. Even if (in your example) node 3 starts a new election, it will only succeed if it can get a majority of votes. If node 2 has already acknowledged a leader, it won't vote for node 3 in the same term.
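                                          For reference, the vote-granting rule being described, as a minimal Python sketch. Persistence, RPC plumbing and the transition back to follower are omitted, and `RaftNodeState` and `handle_request_vote` are hypothetical names. A node grants at most one vote per term, adopts any higher term it sees, and, per the Raft paper, only votes for a candidate whose log is at least as up-to-date as its own.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RaftNodeState:
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_index: int = 0
    last_log_term: int = 0

    def handle_request_vote(self, term: int, candidate_id: str,
                            cand_last_index: int, cand_last_term: int) -> Tuple[int, bool]:
        """Return (current_term, vote_granted) for a RequestVote RPC."""
        if term < self.current_term:
            return self.current_term, False      # candidate is from an old term
        if term > self.current_term:
            self.current_term = term             # newer term: earlier vote no longer counts
            self.voted_for = None
        # "At least as up-to-date": compare last log term, then last log index.
        up_to_date = (cand_last_term, cand_last_index) >= (self.last_log_term, self.last_log_index)
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return self.current_term, True
        return self.current_term, False


node2 = RaftNodeState(current_term=4)
print(node2.handle_request_vote(5, "node1", 0, 0))  # (5, True)
print(node2.handle_request_vote(5, "node3", 0, 0))  # (5, False): already voted in term 5
print(node2.handle_request_vote(6, "node3", 0, 0))  # (6, True): new term, new vote
```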

                                          There are some implicit concessions inherent there around eventual consistency, but I don't think that's novel to Raft compared to other distributed consensus protocols.

                                          • ivankelly 3 hours ago

                                            100% agree. I haven't read the Raft paper in years, but I remember thinking there's just too much stuff in there. That stuff is important, but if you want people to understand what's happening, they need to internalize the fundamental idea of being able to block other writers by bumping a number, which is all covered in the single-decree Paxos section of The Part-Time Parliament.

                                            • kfrzcode 3 hours ago

                                              Paxos is nice, sure, but Hedera does DLT with aBFT, is much more efficient, and is faster while ensuring fairness. It's leaderless and achieves incredible TPS (10k+ in practice, 100k+ in theory).

                                              I am curious about your thoughts here.

                                            • Vervious 3 hours ago

                                              Raft has a problem where, in the protocol description, sometimes I have no idea why some line is there, but if you take it out the protocol comes to a grinding halt... it's really mumbo jumbo. It has good intuition, but the details are really messy; it's edge cases all the way down.

                                              Whereas I think each line of pseudocode in Paxos is much more motivated.

                                              In other words, if a philosopher had to design a crash-fault protocol from scratch, without having seen any before, I think 80% of the time it would look exactly like Paxos.

                                              • candiddevmike 4 hours ago

                                                IMO, Paxos has far fewer edge cases than Raft, mostly because the complexity of the implementation covers them/forces you to think about how to handle them.

                                                • hinkley 3 hours ago

                                                  How many people actually understand Paxos?

                                                  The assertion at the time was that only a few people understood it well enough to make a correct implementation, the others were full of bugs.

                                                  The problem we have with Lamport is that he’s very good at talking to computers but not so good at talking to humans. I think the world would be a better place today if someone had forced him to learn to speak human.

                                                  He did a presentation at MS just after he won the Turing Award. He prefaces it with how important writing is to thinking, and how you don’t really know what you think until you write it down. Those are the words and thinking of an introvert. Writing is still the shallow end of understanding. The deep end is teaching. If you understand something and you teach it to others, then you have proven that you understand it, and caused that understanding not to be lost to posterity. If you only write about it, it might work as instructional material, or it may require some very clever people who can teach themselves using your words. But they may also get it wrong, and not have you for feedback.

                                                  The latter is where we seem to be with Leslie’s works. The consensus is that few people actually understand what he’s talking about well enough to implement it correctly.

                                                  I had a coworker once who was shocked to learn I read the ACM SIGPLAN proceedings. “You can read those??” I knew what he meant and yeah, a lot of those were very unapproachable and I understood two thirds of them and only half of each of the rest. Before I committed to using Raft I gave Lamport’s paper a try. It was a slog and he doesn’t sell the why of each part. He’s just giving you a very, very long recipe without the mental models necessary to reproduce it robustly.

                                                • jeffbee 3 hours ago

                                                  Students of raft say they understand it but when they have to implement it they make as many mistakes as students of paxos. The simplicity is misleading.

                                              • cedws 3 hours ago

                                                Can't proof-of-work be used as a leader election algorithm? If the proof is hard enough to generate then one node should be able to generate one and broadcast it before the other nodes can, then that node becomes the leader.
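                                                A small sketch of what that race looks like, in Python using only `hashlib`. `mine`, `verify` and `race` are hypothetical helpers, and "every node hashes at the same speed" is an assumption of this toy. The puzzle picks a winner by lottery, but nothing stops two nodes from finishing at effectively the same time, which is why proof-of-work systems still need a rule for resolving competing winners (forks) rather than getting agreement from the puzzle alone.

```python
import hashlib
import itertools


def _digest(node_id: str, seed: bytes, nonce: int) -> int:
    # Hash seed, node id and a fixed-width nonce together.
    data = seed + b"|" + node_id.encode() + b"|" + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def verify(node_id: str, seed: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Cheap check anyone can run: the digest has difficulty_bits leading zero bits."""
    return _digest(node_id, seed, nonce) < (1 << (256 - difficulty_bits))


def mine(node_id: str, seed: bytes, difficulty_bits: int):
    """Expensive search: try nonces in order until one verifies. Returns (nonce, attempts)."""
    for nonce in itertools.count():
        if verify(node_id, seed, nonce, difficulty_bits):
            return nonce, nonce + 1


def race(node_ids, seed: bytes, difficulty_bits: int):
    """Toy race: with equal hash rates, the fewest attempts finishes first.
    Returns every node tied for first; more than one would be a fork."""
    attempts = {n: mine(n, seed, difficulty_bits)[1] for n in node_ids}
    best = min(attempts.values())
    return sorted(n for n, a in attempts.items() if a == best)


print(race(["n1", "n2", "n3"], seed=b"term-7", difficulty_bits=12))
```

                                                Exact ties are rare in this toy, but in a real network "finishing first" also depends on propagation delay, so two valid winners can exist for the same round.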

                                                • withinboredom 2 hours ago

                                                  There’s a paper about that, using paxos as the base. Can’t find it right now, it is called “chained paxos” or “block paxos” or something like that.

                                                  • ryanthemadone 2 hours ago

                                                    Proof of work is a leader election algorithm!

                                                  • kfrzcode 3 hours ago

                                                    DLT technology discussions are entirely incomplete without consideration of Hedera Hashgraph [0], an aBFT, leaderless, fair and fast DLT using a gossip-about-gossip consensus mechanism. It's absolutely a more robust and scalable technology than Paxos or any other DLT for that matter. I'd love to know what the HN crowd thinks about Hedera as the trust layer of the internet, but... nobody around here seems to have any opinions on it. It's like ignoring Linux while comparing Mac- and Windows-based computing.

                                                    [0]: https://www.swirlds.com/downloads/SWIRLDS-TR-2016-01.pdf

                                                    • ryanthemadone 2 hours ago

                                                      Paxos isn't a DLT, it's a consensus algorithm — granted, DLTs tend to require a consensus algorithm, but they're not the same things.

                                                      As for Hedera Hashgraph being the trust layer of the internet, we tend to build the internet through the IETF and standards setting. Unfortunately HH is an endeavour from a private company so isn't especially likely to be taken on in that context.

                                                      I'd also wonder what you mean by the trust layer of the internet, what are the use cases that you'd like to see solved with such a trust layer?