• foota 4 days ago

    Ok, I'm a bit surprised to see no comments here, I guess everyone is waiting to read the paper till after the work day? :)

    I thought it was interesting, iiuc the proposal is that you can design a protocol like Paxos that allows you to guarantee that some replica will win in the future, I'm not sure how useful this is in practice though, since you won't know that your (you being the client) writes are winning until after the fact, and no replica could give you the true state until convergence. I find the claim that this represents partial progress slightly misleading, but interesting nonetheless.

    • aphyr 2 days ago

      The fundamental definition of consensus already requires that some proposal will eventually (given sufficient communicating, non-faulty, non-malicious nodes) win, so that doesn't seem particularly novel: https://lamport.azurewebsites.net/pubs/lower-bound.pdf

      The core problem here that it's impossible for any replica to tell whether its speculatively-executed operations were legal or not; it might have to admit "sorry, I lied to you", and back up at any point. That seems to be what they mean by "partial progress". You can guess that things might happen, but you will often be wrong.

      This idea's been around for a while--Fekete et al outlined speculative execution at local replicas with eventual convergence on a Serializable history in their 1996 PODC paper "Eventually Serializable Data Services": https://groups.csail.mit.edu/tds/papers/Lynch/podc96-esds.pd.... These systems have essentially two tiers of operations: weak operations, which the system might re-order (which could cause totally different results, including turning successes to failures and vice-versa!), and strong operations, which are Serializable. I assume Cassandra does the same. Assuming it's correct, the strong operations work like any other consensus system: they must block (or abort) when there isn't sufficient communication with other replicas. The weak ones might give arbitrarily weird results. In CAP's terms, the strong operations are CP, the weak ones are AP.

      You see similar dynamics at play in probabilistic consensus systems, like Bitcoin, by the way. In Bitcoin technically all operations are weak ones, but the probability of non-Serializable outcomes should decrease quickly over time.

      Having a consensus system that merges conflicting proposals is a nice idea, but I don't think Cassandra is novel here either. I don't have a citation handy, but I recall a conversation with Heidi Howard at HPTS (maybe 2017?) where she explained that one of the advantages of leaderless Paxos is that when you're building a replicated state machine, you can treat what would normally be conflicting proposals from multiple replicas as sets of transitions. Instead of rejecting all but one proposal, you can union them, then apply them to the state machine in some (deterministic) order--say, by lexicographically sorting the proposals.

      • undefined 4 days ago
        [deleted]
      • sargun 4 days ago

        Wasn't there a paper from Heidi Howard that stated you it's not actually f failures in 2f+1 nodes, but you can actually make the number of nodes that can fail a lot higher, if you have some nodes that have to take part in the quorum (or the prior proposal)? This feels roughly similar?

        • foota 4 days ago

          I think that's a different thing... I believe you're talking about flexible paxos (https://arxiv.org/abs/1608.06696).

          Flexible paxos doesn't change the underlying guarantees at all, whereas this does.