• britannio 9 months ago

    Joseph explains the algorithm on YouTube too: https://www.youtube.com/watch?v=rjbEG7COj7o

    It's great work, combining the best of OT and CRDTs.

    • auggierose 9 months ago

      I find the formulation in the abstract slightly confusing. As far as I understand EG-Walker is a CRDT, an operation-based one.

      • josephg 9 months ago

        Author here. It’s kinda both a crdt and an operational transform system.

        It’s a crdt in that all peers share & replicate the set of all editing events. (A grow-only set crdt if we’re being precise). Peers can use those editing events to generate the document state at any point in time, merge changes and so on.

        But the editing events themselves are stored and expressed in their “original” form (unlike existing CRDTs, which need a prepare function). That means lower memory usage during use.

        The replaying / merging process itself is kind of a batch operational transform algorithm. It works by building a normal crdt state object in memory in order to transform the events so they can be replayed. In that sense, it’s an OT system. (But one which transforms by using a crdt, like Yjs, internally within each peer).
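
        To make the "grow-only set" part concrete: since the replicated state is just a set of events, syncing two peers' histories is essentially set union. A rough TypeScript sketch (hypothetical shapes, not the reference implementation's API):

          // Hypothetical sketch: peers converge on the union of their event sets.
          // Real implementations also track causal ordering; ids are assumed unique here.
          interface EditEvent { id: string }

          function mergeEventSets<E extends EditEvent>(ours: E[], theirs: E[]): E[] {
            const seen = new Set(ours.map(e => e.id))
            return ours.concat(theirs.filter(e => !seen.has(e.id)))
          }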

        I don’t know if that clarifies things. Feel free to ask more questions!

        • auggierose 9 months ago

          Ok, I started reading the paper now, and this seems to be a really cool method. I didn't understand all the details of apply/retreat/advance yet, though.

          I am wondering, the EG graph is a very general construct, and the events themselves (Insert(i, c) and Delete(i)) are very natural as well. You say in the paper this should also work for other applications than plain text, but I guess then another CRDT has to be constructed to implement apply/retreat/advance. Would it be possible to formulate all of this independently of the application and particular CRDT, together with corresponding correctness theorems? That would help with constructing versions of this for other applications, and maybe make understanding this particular application for plain text easier.

          • josephg 9 months ago

            Maybe. Here's another way to think of the algorithm:

            All the complexity comes about because we're trying to convert the insert / delete position from edits (expressed at their original version) to some later current version.

            There's lots of ways of solving this problem. For example, we could build a data structure which contains metadata for every inserted item in a text document. For every inserted character, we store when the item was inserted and when (if ever) the item was deleted.

            Then you could implement the algorithm in a simpler way. Let's say I'm trying to insert at position 1000, at some version V.

            - We scan the list of characters from the start of the document, looking for the 1000th item which was actually in the document at version V.

            - For each character in the list, we can tell whether that item was present in the document at version V by comparing V to its stored inserted-at / deleted-at times.

            This algorithm would be correct, and it avoids retreat / advance. The only problem with this approach is that it would be slow - because you're constantly scanning the document to convert insert positions. Inserting N items into a document takes O(N^2) time.
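
            To make that concrete, here's a rough sketch of the scan in TypeScript (illustrative only - a version is modelled simply as the set of event ids visible at that point, and concurrent inserts at the same position would still need tie-breaking):

              // A version, for this sketch, is just the set of event ids visible at that point.
              type Version = Set<string>

              interface Item {
                insertedBy: string        // id of the insert event that created this character
                deletedBy: string | null  // id of the delete event, if the character was deleted
              }

              // Was this character present in the document at version v?
              function visibleAt(item: Item, v: Version): boolean {
                if (!v.has(item.insertedBy)) return false
                return item.deletedBy === null || !v.has(item.deletedBy)
              }

              // Convert "position pos, as seen at version v" into an index into the
              // full item list (which still contains deleted characters).
              function rawIndexAt(items: Item[], pos: number, v: Version): number {
                let visible = 0
                for (let i = 0; i < items.length; i++) {
                  if (visibleAt(items[i], v)) {
                    if (visible === pos) return i
                    visible++
                  }
                }
                return items.length   // e.g. inserting at the end of the document
              }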

            The retreat / advance approach described in the paper is an optimization on top of this algorithm which performs the same work in O(N log N) time.

            I wish we made this more clear in the paper. In an earlier draft we spent about 5 pages simply talking about version theory. The algorithm was then described using that theory with a stronger theoretical grounding. But I think that description may have been even more confusing.

            > You say in the paper this should also work for other applications than plain text, but I guess then another CRDT has to be constructed to implement apply/retreat/advance. Would it be possible to formulate all of this independently of the application and particular CRDT, together with corresponding correctness theorems?

            "Independently of the application and particular CRDT"? I don't know, we might have to think through how that would work for every CRDT. Do you have any personal favorites that would be worth thinking through?

            For registers (eg in a variable, dictionary, hash map or array where indexes never change), you could implement a similar algorithm incredibly easily by just doing the version comparison operation on the graph. (The current value is the value set in the graph's frontier.) The retreat / advance optimisation isn't needed at all for registers.
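            Something like this rough sketch (illustrative only, not anything from the paper): the value is whatever was written at the graph's frontier, with a deterministic tie-break if there happen to be concurrent writes.

              interface WriteEvent<T> {
                id: string
                parents: string[]   // ids of the writes this one causally follows
                value: T
              }

              // The frontier: events that no other event lists as a parent.
              function frontier<T>(events: WriteEvent<T>[]): WriteEvent<T>[] {
                const referenced = new Set(events.flatMap(e => e.parents))
                return events.filter(e => !referenced.has(e.id))
              }

              // Current value: the frontier write, breaking ties between concurrent
              // writes deterministically (here: largest id) so every peer converges.
              function currentValue<T>(events: WriteEvent<T>[]): T | undefined {
                const f = frontier(events).sort((a, b) => (a.id < b.id ? 1 : -1))
                return f[0]?.value
              }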

            For a list - for example, a list of layers in photoshop - we might need something more complex, since layers can be inserted / deleted like text and as a result the index of subsequent items changes. But layers can also be reordered - and that requires some thought. For rich text, there's an approach that I think would work but I haven't implemented it yet.

            • auggierose 9 months ago

              > I wish we made this more clear in the paper. In an earlier draft we spent about 5 pages simply talking about version theory.

              I think it still comes across pretty clearly. I like the idea of thinking of a version in terms of the frontier, and it certainly feels like the right setting for all of this. Then it is just a question of how to implement replay efficiently, and in a way that also works incrementally.

              > "Independently of the application and particular CRDT"?

              Yes, but I don't know if this even makes sense. Or maybe your more elaborate version theory already covers this. And I should really understand the plain text case first, before asking for a general method :-) It just seems that your method really is a general framework, based on:

              * a set of operations

              * an event graph where each event corresponds to an operation

              * replay

              * the apply/retreat/advance method for efficient replay

              And it seems to me there is a conceptual gap here between the set of operations and the event graph, and what replay actually does purely in terms of semantics. In order to define replay, you need to say what it means to execute operations that are concurrent; this is the job of the CRDT, which makes operations commutative and thereby defines concurrent execution. But to implement apply/retreat/advance, you need something more complex than just the CRDT, let's call it an XCRDT (your "internal structure" in the paper). What are the laws the XCRDT must satisfy so that apply/retreat/advance work, and it does the same as the CRDT semantics for replay? Knowing such laws might help when constructing the XCRDT from the CRDT.

              Edit: Oh, and the XCRDT also somehow combines the original operations with the operations of the CRDT.

          • judofyr 9 months ago

            Let me see if I understand this correctly:

            CRDTs take an editor event such as "insert at position X" and turn it into a concrete operation like "insert to the right of node Y created by client C", which is then sent. This makes it super easy to apply concurrent operations since they have a direct reference to where they're located. However, it also means that you now have to keep track of these nodes. All of the nodes that have ever existed are present at all times, and deletion is handled as a state flag which marks a node as hidden.

            OTs take an editor event such as "insert at position X" and keep it like that. Whenever a concurrent event is received, the system tries to "rebase" it (i.e. patch that event) so that it makes sense on top of the current events. However, this (1) can be quite finicky to get right and (2) is based on there being One True Order of Events (i.e. a server).

            This approach takes an editor event such as "insert at position X" and keeps it like that. When applied, it can be inserted into an "ever-growing list of atoms with state flag". However, in this algorithm the data structure is actually capable of representing two different versions at the same time: A current version and a final version. This is handled by there being two state flags instead of one: Every node has a "current state = exists/deleted" and "final state = exists/deleted".

            This gives us the power of doing a "soft undo" (which is called "retreat" in the paper): we can take our own latest event which we've applied and revert its effect on the current version, while still keeping the final version the same. We're handling this very similarly to CRDTs: we keep all the nodes at all times, and just use state flags to keep track of whether each one exists or not.
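
            Concretely, I imagine each node in that structure looking something like this (field names made up; the real structure presumably also needs a "not inserted yet" state while walking the graph):

              interface Node {
                id: string
                content: string
                currentState: 'exists' | 'deleted'   // state at the version we've walked to
                finalState: 'exists' | 'deleted'     // state at the final, merged version
              }

              // "Retreating" one of our own deletes: the current version forgets it,
              // the final version keeps it.
              function retreatDelete(n: Node): void {
                n.currentState = 'exists'
                // n.finalState stays 'deleted'
              }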

            This is useful when we observe a concurrent event. That event has references to positions which are valid in the context of its parent. If we "retreat" all of our events until we reach the parent, we then have a data structure which represents the text at that point. We can now apply the "insert at position Y" events which we received, interpreting "Y" in terms of the current version. After we've applied all of those events, we can look at the final version, which now actually contains the combined result of both changes!

            And here comes the nice part: since the events themselves are always of the form "insert at position X", we can choose another way of applying them. For instance, if we know that there are no concurrent events, we might as well apply them directly to a string without bothering with the whole "current/final dual data structure".

            • josephg 9 months ago

              First paragraph: yes, exactly.

              > OTs take an event..

              This is how the early Jupiter OT works, yes. And most OT systems work like this. But there are also some papers on more recent OT systems which can work with more than 2 peers. Unfortunately, many of these systems have turned out to have convergence bugs and/or they are O(n^2). For our paper, one of our example datasets takes tens of milliseconds to replay with CRDTs and egwalker, but an hour with OT!

              > the data structure is capable of representing two different versions…

              With egwalker it’s important to distinguish between two different data structures. There’s a grow only set of original editing events. This is persisted to disk and replicated over the network. Then while actually replaying events or merging, we generate a second, temporary, in memory data structure which resembles a normal CRDT. (Except with an extra state field on each item like you said). This crdt state object isn’t persisted. It’s usually discarded as soon as the merge (transform) operation is complete. One big advantage of this approach is that this data structure does not need to represent all items ever inserted. Just the concurrent items, back to the most recent common branch. So it’s usually tiny. And that allows history to be pruned - which CRDTs typically don’t allow.

              But yes, everything else is right!

              • ProblemFactory 9 months ago

                This is probably a question about classic CRDTs as much as eg-walker:

                Do all possible topological sorts of the event graph result in the same final consensus document? If yes how do we know that, and if no, how do they resolve the order in which each branch is applied?

                • josephg 9 months ago

                  > Do all possible topological sorts of the event graph result in the same final consensus document?

                  Yes. That's usually referred to as the "convergence property".

                  > If yes how do we know that

                  Usually, careful design, mathematical proofs and randomized (fuzz) testing. Fuzz testing is absolutely essential - in over a decade of working on systems like this, I don't know if I've ever implemented something correctly on the first try. You shouldn't trust the correctness of any system which hasn't been sufficiently fuzzed. (Luckily, fuzzers are easy to write, and the convergence property is very easy to test for.)
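
                  A minimal convergence fuzzer is little more than this sketch (the Doc interface here is hypothetical - any implementation can be wrapped to fit it; the only property checked is that fully synced peers end up with identical text):

                    // Hypothetical document interface; any CRDT / egwalker implementation
                    // could be wrapped to fit it.
                    interface Doc {
                      insert(pos: number, text: string): void
                      delete(pos: number): void
                      mergeFrom(other: Doc): void
                      getText(): string
                    }

                    function fuzzConvergence(makeDoc: () => Doc, iterations = 1000): void {
                      for (let iter = 0; iter < iterations; iter++) {
                        const peers = [makeDoc(), makeDoc(), makeDoc()]
                        for (let step = 0; step < 100; step++) {
                          // Random edit on a random peer.
                          const p = peers[Math.floor(Math.random() * peers.length)]
                          const len = p.getText().length
                          if (len > 0 && Math.random() < 0.4) {
                            p.delete(Math.floor(Math.random() * len))
                          } else {
                            const ch = String.fromCharCode(97 + Math.floor(Math.random() * 26))
                            p.insert(Math.floor(Math.random() * (len + 1)), ch)
                          }
                          // Occasionally sync two random peers, creating concurrency.
                          if (Math.random() < 0.2) {
                            const a = peers[Math.floor(Math.random() * peers.length)]
                            const b = peers[Math.floor(Math.random() * peers.length)]
                            if (a !== b) { a.mergeFrom(b); b.mergeFrom(a) }
                          }
                        }
                        // Fully sync everyone, then check convergence.
                        for (const a of peers) for (const b of peers) if (a !== b) a.mergeFrom(b)
                        const texts = peers.map(p => p.getText())
                        if (!texts.every(t => t === texts[0])) throw new Error(`divergence at iteration ${iter}`)
                      }
                    }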

                  For Eg-walker, I think we've pumped around 100M randomly generated events (in horribly complex graphs) through our implementation to flush out any bugs.

                  • auggierose 9 months ago

                    This seems to be a field perfect for theorem proving; I think I've seen some work by Kleppmann using Isabelle.

                    I once tried to understand the Yjs paper, but I came to the conclusion that their proof is just wrong! They do some impressive-looking logical reasoning in the paper, but they define some order in terms of itself, so they don't really show anything, if I remember correctly. If you tried that in Isabelle, it would stop you right at the start of all that nonsense.

                    • josephg 9 months ago

                      I talked to Kevin Jahns (the author of the YATA paper & Yjs) about his paper a few years ago. He said he found errors in the algorithm described in the paper, after it was published. The algorithm he uses in Yjs is subtly different from YATA in order to fix the mistakes.

                      He was quite surprised the mistakes went unnoticed through the peer review process.

                      There have also been some (quite infamous) OT algorithm papers which contain proofs of correctness, but which later turned out to actually be incorrect. (Ie, the algorithms don't actually converge in some instances).

                      I'm embarrassed to say I don't know Isabelle well enough to know how you would use it to prove convergence properties. But I have gotten very good at fuzz testing over the years. It's wild how many bugs in seemingly-working software I've found using the technique.

                      I think ideally you'd use both approaches.

                      • auggierose 9 months ago

                        Ah, that makes sense! I thought that Yjs must be doing something different from what's described, because it seems to work well in practice, but I couldn't see how Yata would. Anyway, I learnt a lot by thinking through that paper :-)

                        Fuzz testing and proof are complementary, I think, both catch things the other one might not have caught. The advantage of Fuzz testing is that it tests the real thing, not a mathematical replica of it.

      • matlin 9 months ago

        Seph (author) also has a reference implementation in Typescript: https://github.com/josephg/eg-walker-reference

        I've stated before that I think the main thing holding back collaborative text / sequence CRDTs is integration with a production database.

        Eg-walker looks interesting because it might lend itself to being integrated into a database, since the operations are immutable and append-only. However, to demonstrate the effectiveness of these algorithms, library authors (see Yjs, DiamondTypes, etc) build stand-alone data structures (usually specialized search trees) of the kind most databases already provide.

        Personally, I've been trying to adapt a Piece Table[1] to be collaborative and stored in Triplit[2], which runs on both client and server and already implements logical clocks, but I might see how well I can adapt this algorithm instead!

        1. https://en.wikipedia.org/wiki/Piece_table 2. https://github.com/aspen-cloud/triplit

        • btown 9 months ago

          This seems to be a holy grail, to be honest! Super-simple database representations with barely any processing required on the "write path," instant startup, minimal memory requirements on both server and client without a need for CRDT data structures to be in memory, none of the O(n^2) complexity of OT. In fact, if I'm interpreting it correctly, it should be straightforward to get this working in a serverless environment without any notion of session fixation, nor active documents needing to be kept in memory.

          I can see this completely reshaping the landscape of what's possible with collaborative documents!

          • josephg 9 months ago

            Author here. Thanks! Yeah this is my hope too.

            Egwalker has one other advantage here: the data format will be stable and consistent. With CRDTs, every different crdt algorithm (Yjs, automerge/rga, fugue, etc) actually stores different fields on disk. So if someone figures out a new way to make text editing work better, we need to rip up our file formats and network protocols.

            Egwalker just stores the editing events in their original form. (Eg insert “a” at position 100). It uses a crdt implementation in memory to merge concurrent changes (and everyone needs to use the same crdt algorithm for convergence). But the network protocol and file format are stable no matter what algorithm you use.
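
            As a sketch, a persisted event could be as simple as this (field names are illustrative, not the actual diamond-types encoding):

              type EventId = [agent: string, seq: number]

              type EditEvent =
                | { id: EventId; parents: EventId[]; type: 'insert'; pos: number; content: string }
                | { id: EventId; parents: EventId[]; type: 'delete'; pos: number }

              // e.g. insert "a" at position 100, with concurrent history captured via `parents`:
              const example: EditEvent = {
                id: ['alice', 12],
                parents: [['bob', 41]],
                type: 'insert',
                pos: 100,
                content: 'a',
              }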

            • vlovich123 9 months ago

              I’ve loved learning all your detailed info on CRDT work. Thank you for progressing the field!

              Since it stores all the editing events, does this mean that the complexity of opening a document is at least O(N) in terms of number of edits? Or are there interim snapshots / merging / and/or intelligent range computations to reduce the number of edits that need to be processed?

              • josephg 9 months ago

                You can just store a snapshot on disk (ie, the raw text) and load that directly. You only ever need to look at historical edits when merging concurrent changes into the local document state. (And that's pretty rare in practice).

                Even when that happens, the algorithm only needs to look at operations as far back as the most recent "fork point" between the two branches in order to merge. (And we can compute that fork point in O(n) time - where n is the number of events that have happened since then). It's usually very, very fast.
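
                A naive way to find that set of events, as a sketch (this version walks the whole graph, whereas the real computation only needs to look at events since the fork point): take the symmetric difference of the two versions' causal pasts - only those events need to go through the in-memory crdt.

                  interface Ev { id: string; parents: string[] }

                  // All events at or before the given frontier (the frontier's causal past).
                  function causalPast(byId: Map<string, Ev>, frontier: string[]): Set<string> {
                    const past = new Set<string>()
                    const stack = [...frontier]
                    while (stack.length > 0) {
                      const id = stack.pop()!
                      if (past.has(id)) continue
                      past.add(id)
                      stack.push(...byId.get(id)!.parents)
                    }
                    return past
                  }

                  // Events that are in exactly one of the two branches since they diverged.
                  function eventsToReplay(events: Ev[], localFrontier: string[], remoteFrontier: string[]): Ev[] {
                    const byId = new Map<string, Ev>()
                    for (const e of events) byId.set(e.id, e)
                    const local = causalPast(byId, localFrontier)
                    const remote = causalPast(byId, remoteFrontier)
                    return events.filter(e => local.has(e.id) !== remote.has(e.id))
                  }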

                In an application like google docs or a wiki, the browser will usually never need to look at any historical changes at all in order to edit a document.

                • vlovich123 9 months ago

                  Very clever idea. Thanks for explaining

          • benjismith 9 months ago

            Awesome, I've been following Seph's work for many years! Always thoughtful and well-executed. Probably the most prolific and insightful engineer in the "collaborative text editing" universe.

            I use ShareDB every day, which originated from Seph's excellent work on OT algorithms. Good stuff!

            • josephg 9 months ago

              Good to hear it’s still in use! That’s very kind.

            • riedel 9 months ago

              There was a recent thread about the 2001 post that afaik eventually led to this paper (diamond types is the rust implementation): https://news.ycombinator.com/item?id=41372833

            • no1youknowz 9 months ago

              I've seen a comment from the YT page:

              > While the downside of OT is p2p, the one up side is that you get GIT like history that is super valuable for us especially if we want to build a CDC system.

                How trivial would it be to implement a CDC system from a CRDT? Does anyone know any GitHub repos or documentation I could refer to? Thanks

              • abdullahkhalids 9 months ago

                Do collaborative whiteboard like software use the same algorithms, or are there more suitable algorithms for picture collaborations?

                • sno6 9 months ago

                  They usually use a central server and last-writer-wins semantics.

                  Figma for example https://www.figma.com/blog/how-figmas-multiplayer-technology...

                  I've seen CF Durable Objects used quite a lot.

                  There are other emerging patterns too: https://www.instantdb.com/

                  • josephg 9 months ago

                      There are usually more suitable algorithms for picture collaborations.

                    Text is hard because it’s a list of characters, and when items are inserted and deleted the operations change the index of all subsequent elements.

                    Usually, editing a digital whiteboard is much simpler.

                  • Palmik 9 months ago

                    Saw the YouTube video when it was first posted, and it could be a great match for a new project I have in mind.

                    Is there a practical implementation yet that supports not just strings, but also lists and maps?

                    Would be great to see it integrated into yjs / y-crdt.

                    • eclectic29 9 months ago

                      If Martin Kleppmann is the author I know this stuff will be worth watching out for.

                      • canadiantim 9 months ago

                        Looks like amazing work, congrats!! Excited to see implementations in the wild, definitely would be keen to play around with.

                        • 1attice 9 months ago

                          s/e.g./EG/

                          • deathanatos 9 months ago

                            s/e.g./Eg/, which is how the paper stylizes it?