• Animats 6 hours ago

    There's no such thing as finite-time arbitration for multiprocessor requests to memory, but it works anyway. This is similar. You can't guarantee time-bounded exactly once delivery, but after enough interchanges the probability of being wrong approaches zero.

    This shows up explicitly in blockchain systems, where confidence improves as the number of confirmation cycles increases.

    • YZF 3 hours ago

      This is the first time I'm hearing about the topic of arbitrating multiprocessor access to memory. Isn't there a trivial counterexample where the bandwidth is divided by the number of processors and each processor gets a timeslot?

      Processors, despite being a distributed system (well really every ASIC is), don't typically suffer from your classical distributed systems style issues because they have no failures, or rather they only have certain catastrophic failure modes that are low probability and well controlled for. Your typical distributed system to contrast has constant failures.

      • Animats 3 hours ago

        > This is the first time I'm hearing about the topic of arbitrating multiprocessor access to memory.

        See [1]. This was a very real problem in early multiprocessor computer design.

        [1] https://en.wikipedia.org/wiki/Arbiter_%28electronics%29

        • fanf2 2 hours ago

          Are there many (or even any) multi-drop buses in modern computers? I get the impression that these days almost everything is point-to-point links. But I know less about what’s happening on silicon. I have seen talk of crossbar interconnects on microcontrollers, which (aiui) are more active, unlike passive buses?

        • rcxdude 3 hours ago

          It's more to do with the underlying electronics: digital is an abstraction, everything is analog underneath, and logic gates and registers are actually just very non-linear analog circuits. What this means is that it's always in principle possible for an asynchronous edge (i.e. one that is not aligned to the other edges in the system) to be arbitrarily non-determinate: never actually being decided as a 1 or a 0 as it propagates through the digital circuit. This isn't an entirely theoretical issue, either: it can and will cause very strange behaviour in a digital system because a marginal or metastable value can be interpreted differently by different parts of the circuit that should be seeing the same value. It's possible in practice to push the chance of this happening extremely low, but it comes at the cost of latency: you need to spend more time to allow the circuit to actually settle into one state or the other.

          Within a fully synchronous system, it's possible to avoid this, though. But most multi-processor systems are not synchronous.

          • londons_explore 3 hours ago

            > But most multi-processor systems are not synchronous.

            Is this actually true? A modern AMD or Intel CPU doesn't have one quartz crystal per core, so therefore all per-core clocks have to be derived from one clock source, meaning the relationship between all the clock frequencies and phases is deterministic, even if not specced in the datasheet.

            • YZF 2 hours ago

              There would be a single clock typically but OP is right in the sense that as the signal propagates through the system you can get nondeterministic effects: https://en.wikipedia.org/wiki/Metastability_(electronics)

              In practice the probability of that is just driven low enough that it's not an issue.

        • dheera 2 hours ago

          Easy exactly-once delivery: Send a transaction ID with the transaction using an at-least-once system. The receiver ignores duplicate ID transactions. The sender can guarantee never sending the same ID twice.

          • Maxatar an hour ago

            That's what the article calls exactly once processing.

            An analogy would be pizza delivery. You wouldn't call someone who delivered 100 identical pizzas to you even though you only ate one of them "exactly once delivery".

            • kelnos an hour ago

              Agreed, if I had to be the one who dealt with the extra pizzas. But if there was someone standing outside in my yard in order to discard the 99 duplicate pizzas before they got to my door (a travesty in and of itself, but that's another discussion), I would think of the system as a whole as "exactly once delivery".

              And, analogously, that does seem to be what Sequin's system provides.

            • magicalhippo an hour ago

              But then it's not the transport system that provides exactly-once delivery, but rather the users of the system.

              • dheera an hour ago

                Wrap that up and abstract it as a transport system. The users don't need to know the inner workings, they just need a transport API that guarantees exactly-once.

                • magicalhippo 26 minutes ago

                  The point is you can't wrap that up, because the faults can happen at the edges, ie where the systems interact.

                  The users still need to interact with this wrapped system, and it's those interactions that can fail, thus preventing the overall exactly-once delivery.

                  To flip it around, if this wasn't an issue then what you're suggesting already exists in the form of ACID databases.

                  • orf 16 minutes ago

                    > they just need a transport API that guarantees exactly-once

                    Which is the same problem, but at a different layer.

            • dang 5 hours ago

              Recent and related:

              Yes, you can have exactly-once delivery - https://news.ycombinator.com/item?id=41598006 - Sept 2024 (133 comments)

              Also. Others?

              You cannot have exactly-once delivery (2015) - https://news.ycombinator.com/item?id=34986691 - March 2023 (217 comments)

              Show HN: Exactly-once delivery counter using Watermill messaging library - https://news.ycombinator.com/item?id=26888303 - April 2021 (9 comments)

              High-performance, exactly-once, failure-oblivious distributed programming (2018) - https://news.ycombinator.com/item?id=20562733 - July 2019 (26 comments)

              Exactly-once Semantics: How Kafka Does it - https://news.ycombinator.com/item?id=14670801 - June 2017 (39 comments)

              Delivering Billions of Messages Exactly Once - https://news.ycombinator.com/item?id=14664405 - June 2017 (133 comments)

              You Cannot Have Exactly-Once Delivery - https://news.ycombinator.com/item?id=9266725 - March 2015 (55 comments)

              Exactly-Once Messaging in Kafka - https://news.ycombinator.com/item?id=8880612 - Jan 2015 (8 comments)

              Exactly-Once Delivery May Not Be What You Want - https://news.ycombinator.com/item?id=8614264 - Nov 2014 (10 comments)

              • griffgeorge 5 hours ago

                > That's why we don't say Sequin offers exactly-once delivery anywhere. We say it offers at-least-once delivery and exactly-once processing.

                The author should note that the homepage at https://sequinstream.com/ does in fact claim that Sequin supports "Exactly-once delivery" under the "Key features" heading.

                • notfed 5 hours ago

                  Well that doesn't seem consistent.

                  • bhaney 5 hours ago

                    It will be eventually

                    • kibwen 2 hours ago

                      Unless it's available and partition-tolerant, that is.

                    • hinkley 3 hours ago

                      Engineers often don't get to touch the website.

                    • _acco 5 hours ago

                      Aye, good catch! Honest mistake. Fixed.

                      • notfed 5 hours ago

                        Ack.

                        • hinkley 3 hours ago

                          This is my new favorite video on TCP/IP:

                          https://www.youtube.com/shorts/R6WN4_bBB1Q

                          • kortilla an hour ago

                            Out of order does not cause a reset. Some networks with unstable topologies like wireless meshes frequently cause reordering and it doesn’t destroy TCP streams. It usually just results in inefficient retransmits when the receiver registers it as a drop.

                            It was a good video until the RST.

                          • rzzzt 3 hours ago

                            I didn't get this one.

                      • tombert 6 hours ago

                        Reminds me of this: https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...

                        Part of the reason I find distributed and concurrent computing so fascinating is that we lose so many assumptions about how computers are supposed to work; ordering guarantees, data actually arriving to destinations, fairness, etc.

                        While it is of course possible for data to get lost or mangled in a single-threaded, non-distributed application, it's rare enough to where you often don't need to plan much around it. Once you add concurrency, and especially if you make it distributed, you suddenly have to plan around the eventuality of the messages not arriving, or arriving in the wrong order, or arriving multiple times.

                        Personally, I've gotten into the habit of PlusCal and TLA+ for everything, and modeling in the assumption that stuff might never arrive or arrive multiple times, which can work as a very good sanity check (though it's of course not a silver bullet).

                        • exabrial 9 minutes ago

                          I mean… activemq delivers messages once and reliably… we push 12k msgs/s with a gig of ram and 2 cores about 25% cpu. Msg size is about 500 bytes - 500kb.

                          It’d be faster but we don’t care. It also participates in an XA transaction…

                          In 10 years of running it practically non stop up to volumes of 100kmsg/s with hundreds of network and disk issues, I’ve never ever seen even once a dupe message. Sure as the article is it could happen but in practice it doesn’t.

                          • orf 3 minutes ago

                            You wouldn’t expect to see issues in a simple system in a steady-state. The problems arise when the system responds to other transient or unexpected issues.

                            Try `kill -9`ing your consumers or unplugging a network cable and see what happens.

                            Also, AMQP is most certainly not exactly once by any measure at all. Do you have any consistency checks or anything to validate duplicates?

                          • kelnos an hour ago

                            > we don't say Sequin offers exactly-once delivery anywhere. We say it offers at-least-once delivery and exactly-once processing.

                            Ugh, this is just a matter of semantics. Boring. If the receiver only acts on messages once, and will always have the opportunity to act on every single message, then for all practical purposes, it's exactly-once delivery.

                            • orf 13 minutes ago

                              And how do you guarantee it only acts on messages once? It’s the same problem. Even so, it’s very explicitly not exactly once delivery.

                            • magicalhippo 3 hours ago

                              This feels a bit like those relativity puzzles, for example the barn-pole paradox[1].

                              That is, delivered exactly-once seems to be a relative statement where one party can state they did deliver exactly-once while another party can disagree, and they're both correct relative to their local frames of reference.

                              So, like how GR has no global conservation of energy, just local, could one say that distributed systems have no global conservation of messages, only local?

                              Or am I just late night rambling again...

                              [1]: https://en.wikipedia.org/wiki/Ladder_paradox

                              • socketcluster 3 hours ago

                                The "exactly once delivery is impossible" narrative got so strong that people feel that they have to redefine the word 'delivery' to make it fit the premise.

                                If a message can be guaranteed to be received one or more times but results in a single entry being inserted into a database (e.g. in an idempotent way using UUIDs to enforce uniqueness), then clearly, a consumer reading the records from that database would only receive the message exactly once. If you consider the intended recipient to be the process, not the host, then the definition of 'delivery' allows you to claim that exactly-once delivery is possible.

                                Of course this is a pragmatic, not mathematical argument. In theory you can't mathematically guarantee even at least once delivery because you can't guarantee that there's not going to be a Carrington event which will bring down the internet and drive humanity to extinction; thus preventing the consumer server from being rebooted.

                                • BeeOnRope 2 hours ago

                                  In a sense it's just kicking the can down the road: how you ensure the consumer reads the message only once? It may fail at some point after it has read the message but updated some state to reflect that.

                                  So it's not just a pointless argument about the semantics of the term "delivery": the fact that no communication channel can have exactly once delivery means that these systems are much more difficult to implement. For example, you can't just chain together 2 "exactly once" delivery systems into some longer one with a stateless middle node: instead you need some concept of idempotency that spans both systems or the middle node itself needs to de-duplicate (statefully) the first link and forward messages on to the second link.

                                  • YZF 3 hours ago

                                    What about other messages though? The question isn't usually limited to one message in isolation, it's related to the state of the system as a whole. So sure, trivially, setting one bit where that bit is never cleared can only be done once, so any process setting that bit is also trivially (and not very usefully?) "exactly once delivery".

                                    • mgsouth 3 hours ago

                                      On the contrary, that is an extremely useful process, and is the basis for vast swathes of multiprocessing algorithms (such as "has this thing finished initializing", or "has it been freed"). However, it is "at most once", not "exactly once". The bit may never be set.

                                      • YZF 3 hours ago

                                        Fair enough. A single bit that can only get set can indeed be useful.

                                  • benlivengood 3 hours ago

                                    Messages don't have to be delivered more than once to achieve exactly-once processing; the receiver can offer a query interface for delivery status. If every attempt at delivery checks for a unique message ID and only sends the message if it has not been received then you get exactly-once delivery of the message, if not the metadata.

                                    Optimistically assuming the message has not been delivered yet will sometimes reduce average latency (most databases), but for very expensive messages it is better to check first (rsync)

                                    • losvedir 2 hours ago

                                      Isn't there a race condition in the check, though?

                                      * X in transit to B

                                      * A asks B: Have you received X? B says No.

                                      * X gets to B.

                                      * A sends X.

                                      • 3np an hour ago

                                        Right, and as you solve for this on a different layer, you end up having to choose between at-most-once and at-least-once at some point anyway. Proof is in the pudding if GP won't take my word for it ;)

                                        I think it's a useful excercise to attempt to disprove notions such as this. We shouldn't take all these "truths" for granted. But it does take some diving into it properly to appreciate the nuances and eventually get the underlying (as opposed to reasoning about it on a higher level, as we all are here).

                                    • thadt 5 hours ago

                                      > With that definition, SQS, Kafka, and Sequin are all systems that guarantee exactly-once processing. The term processing captures both the delivery of the message and the successful acknowledgment of the message.

                                      This terminology confuses my face. Maybe we need a better name for "exactly-once processing"? When I say "processing" I'm thinking of the actions that my program is going to take on said message - the very processing that they're talking about can fail. Can we not just say 'the message is held until acknowledgement'?

                                      • theptip 5 hours ago

                                        The simple way to visualize this is that all the processing can happen within a serializable transaction. If anything fails, the transaction rolls back. Marking the message as delivered is part of the same transaction.

                                        • thadt 4 hours ago

                                          Which means that the 'processing' in my application has to have the semantics of a serializable transaction. And it might not - as in the very example given in the post, where a processing application sends an email.

                                          Hence my complaint that making statements like "our system offers exactly once processing" can be completely correct - from the point of view of the messaging system. But it can be misleading to the casual reader in understanding how it fits into the broader solution, because the messaging system cannot guarantee that my application processes it exactly once. I'm saying the terminology doesn't feel right.

                                      • dondraper36 6 hours ago

                                        It's admirable when vendors don't try to abuse theory to make their product look better.

                                        There was some blog post last week that supposedly proved the impossibility of exactly once delivery wrong, but that was quite expectedly another attempt to rename the standard terms.

                                        • sidewndr46 6 hours ago

                                          This didn't stop AWS from at one point declaring "none of the above" as their delivery guarantees for S3 events:

                                          https://www.hydrogen18.com/blog/aws-s3-event-notifications-p...

                                          From what I can tell my blog post eventually motivated them to change this behavior

                                          • convolvatron 6 hours ago

                                            if you employ persistence to keep sequence numbers, and/or retransmissions and redundancy, you can drive the likelihood of distributed consensus over time arbitrarily close to probabilities where it doesn't matter anyways (like the earth suddenly being coincident with the sun).

                                            so what are we arguing about?

                                            • sethammons 6 hours ago

                                              there is a group of people that say that there is no such thing as exactly-once-delivery and they are correct. There is also a group of people who cannot hear nor read nor process the word "delivery" in that term and instead insist that exactly-once-processing is easily achievable via [deduplication, idempotent actions, counters, ...]. They are correct as long as they acknowledge "processing" vs "delivery" -- and there are several who refuse to acknowledge that and don't believe the distinction matters and they are wrong.

                                              • andrewflnr 5 hours ago

                                                Not that I disagree with getting terms exactly right, but if the "processing" that happens exactly once in your messaging system is firing a "delivery" event to a higher level of the stack (over a local channel), that's observationally indistinguishable from exactly-once delivery to that higher level, isn't it? What am I missing?

                                                • wmf 5 hours ago

                                                  Nope, because that event could also be lost/duplicated.

                                                  • Izkata 4 hours ago

                                                    In the systems I'm used to, that "passing up" is a function call within the same codebase. There isn't a second delivery system for a message to be lost/duplicated, though sometimes they obscure it by making it look more like such a system.

                                                • ViewTrick1002 5 hours ago

                                                  The problems always stem from the side effects. You can achieve exactly once like properties in a contained system.

                                                  But if the processing contains for example sending an email then you need to apply said actions for all downstream actions, which is where exactly once as seen by the outside viewer gets hard.

                                                  • rwiggins 5 hours ago

                                                    Out of curiosity, how would you describe TCP in these terms? Does the TCP stack's handling of sequence numbers constitute processing (on the client and server both I assume)? Which part(s) of a TCP connection could be described as delivery?

                                                    • wmf 5 hours ago

                                                      From TCP's perspective it has delivered the data when read() returns successfully. This is also the point at which TCP frees the packet buffers. From the app's perspective, TCP is an at-most-once system because data can be lost in corner cases caused by failures.

                                                      (Of course plenty of people on the Internet use different definitions to arrive at the opposite conclusion but they are all wrong and I am right.)

                                                      • YZF 3 hours ago

                                                        I agree this is what you'd consider delivery. Also agree everyone else is wrong an I am right ;)

                                                        The similar question in TCP is what happens when the sender writes to their socket and then loses the connection. At this point the sender doesn't know whether the receiver has the data or not. Both are possible. For the sender to recover it needs to re-connect and re-send the data. Thus the data is potentially delivered more than once and the receiver can use various strategies to deal with that.

                                                        But sure, within the boundary of a single TCP connection data (by definition) is never delivered twice.

                                                    • wellpast 5 hours ago

                                                      is there a real difference between delivery and processing?

                                                      In a streaming app framework (like Flink or Beam, or Kafka Streams), I can get an experience of exactly once "processing" by reverting to checkpoints on failure and re-processing.

                                                      But what's the difference between doing this within the internal stores of a streaming app or coordinating a similar checkpointing/recovery/"exactly-once" mechanism with an external "delivery destination"?

                                                      • ithkuil 5 hours ago

                                                        Yes there is a difference and it's quite crucial to understanding how two groups of people can claim seemingly opposite things and both be correct.

                                                        It's important to note that by processing here we don't just mean any "computing" but the act of "committing the side effects of the computation"

                                                        • wellpast 4 hours ago

                                                          A typical delivery target is a data store or cache. Writing to such a delivery can be construed of as a “side effect”. But if you accept eventual consistency, achieving exactly once semantics is possible.

                                                          Eg a streaming framework will let you tally an exactly once counter. You can flush the value of that tally to a data store. External observers will see an eventually consistent exactly-once-delivery result.

                                                    • mgsouth 3 hours ago

                                                      Vendor: "Exactly once is possible. We do it!"

                                                      No, as a vendor you're either deliberately lying, or completely incompentent. Let's insert the words they don't want to.

                                                      "We do exactly once good enough for your application!"

                                                      How could you possibly know that?

                                                      "We do exactly once good enough for all practical purposes!"

                                                      So are my purposes practical? Am I a true scotsman?

                                                      "We do exactly once good enough for applications we're good enough for!"

                                                      Finally, a truthful claim. I'm not holding my breath.

                                                      Edit: More directly address parent:

                                                      It's not possible to even guarentee at-least-once in a finite amount of time, let alone exactly-once. For example, a high-frequency trading system wants to deliver exactly one copy of that 10-million-share order. And do it in under a microsecond. If you've got a 1.5 microsecond round trip, it's not possible to even get confirmation an order was received in that time limit, much less attempt another delivery. Your options are a) send once and hope. b) send lots of copies, with some kind of unique identifier so the receiver can do process-at-most-once, and hope you don't wind up sending to two different receiving servers.

                                                      • SpicyLemonZest 5 hours ago

                                                        What you can't do, and what naive consumers of distributed systems often expect, is drive the probability arbitrarily low solely from the producer side. The consumer has to be involved, even if they consider themselves to be a single-node process and don't want to bother with transactionality or idempotency keys.

                                                    • notfed 5 hours ago

                                                      Tangent, but: this reminds me of many philosophical debates, e.g. "do we have free will?".

                                                      What I've found is that usually what's happening is that the "debaters" assume different definitions of the thing they're debating about, which means they're really debating about two entirely different things, with the same name, and that's the real reason why they've come to different conclusions.

                                                      • dang 34 minutes ago

                                                        (We detached this comment from https://news.ycombinator.com/item?id=41701678. There's nothing wrong with a good tangent now and then, but it's probably better as a top-level subthread, plus lower down on the page.)

                                                        • rzzzt 4 hours ago

                                                          One recommendation I've read is that in such cases you should treat the single label you and your opponent have attached to two different ideas as forbidden. Describe it in more detail to reveal the difference (don't just swap words in the starting combination with their synonyms).

                                                          • wmf 4 hours ago

                                                            That's fine for discussions between experts but when a hundred noobs enter the discussion it's just unpaid tutoring. These threads are bringing out a lot of "I've discovered something all the distributed systems experts missed" energy.

                                                          • aidenn0 4 hours ago

                                                            For me "do we have free will" debate is useless because if the answer is "no" then that implies the result of the debate is predetermined and all involved lack any agency to change the outcome.

                                                            Therefore any adult arguing the "no free will" side is either

                                                            1. Disingenuous

                                                            2. Purposefully playing Candyland as an adult.

                                                            Either of which is so damaging to the Ethos that I lack any desire to continue the debate.

                                                            • kelnos an hour ago

                                                              I feel similarly, but for a different reason: even if I don't have free will, I'm not going to change anything about how I live my life. My life is still important to me, either way. Free will vs. determinism might be a fascinating academic question to some, but I just don't find it all that interesting.

                                                              Same when it comes to arguments about whether or not our reality is a simulation, because, again, I'm not going to live any differently either way. My life is still important to me, even if it's just the result of an algorithm run on some computer built by an advanced civilization.

                                                              • aidenn0 14 minutes ago

                                                                I'm making a similar argument: if I don't have free will, I cannot change anything about by life because I don't have agency.

                                                              • jchw 3 hours ago

                                                                > that implies the result of the debate is predetermined and all involved lack any agency to change the outcome.

                                                                In a superdeterministic world, the presence of an illusion of free will is baked into everything humans do anyways. If the world is superdeterministic, but you can't actually predict what's about to happen with any additional certainty, then it doesn't really change anything anyway. So, there's no reason to change the way you argue based on the answer to the question in my opinion.

                                                                Of course, now I'm arguing about whether arguing about free will makes any sense, which is perhaps even more silly, but alas.

                                                                • yarg 3 hours ago

                                                                  That doesn't follow.

                                                                  There's nothing disingenuous or moronic about believing that physical processes might be fundamentally deterministic.

                                                                  • aidenn0 3 hours ago

                                                                    > There's nothing disingenuous or moronic about believing that physical processes might be fundamentally deterministic.

                                                                    My argument is that a game of Candyland is fundamentally deterministic in the same way.

                                                                    • yarg 2 hours ago

                                                                      Fair enough, but the parameter space is incomparable (and in the case of space-time irreplicable and chaotic, making events unrepeatable).

                                                                      And, unlike the game, it is impossible to hold the oracle's view (so even if freewill is illusory, it's an unbreakable illusion).

                                                                • klabb3 2 hours ago

                                                                  Not sure why you’re downvoted. I think this is completely true. To add more examples, I’ve seen a lot of debates about async IO and “non-blocking” where people talk over each other because they clearly mean different things. I tend to avoid ambiguous terms, but it’s an uphill battle.

                                                                • dataflow 3 hours ago

                                                                  If there is no such thing as exactly-once delivery because the recipient might not feel like deduplicating, then there can be no such thing as at-least-once delivery either, because the recipient might not feel like being endlessly available for delivery either. Or the mailman might not feel like (re-)attempting deliveries endlessly either. Or the sender might not feel like re-sending things endlessly either.

                                                                  Is this insightful? Surprising? What's the point of the debate?

                                                                  • 0xbadcafebee 5 hours ago

                                                                    I mean, you can do exactly-once delivery, it's just a more complex state machine with a specific design requirement. Here it is using mail:

                                                                      # Receive new mail message, incoming id '1234'.
                                                                      # No other writer knows about this message.
                                                                      $ echo "hello world" > incoming/1.1234.txt
                                                                      
                                                                      # If files stay in incoming/ too long, it means they probably failed writing,
                                                                      # and are thus stale and can be removed.
                                                                      
                                                                      # When the message is done being written, the writer moves it to a new state.
                                                                      $ mv incoming/1.1234.txt new/1.1234.txt
                                                                      
                                                                      # The writer can then report to the sender that the message was accepted.
                                                                      # In mail, this is seen as optional, as the whole point is to deliver the message,
                                                                      # not to make sure you inform the sender that you will deliver the message.
                                                                      
                                                                      # Here's the exactly-once bit:
                                                                      #   If you need assurance of exactly-once delivery, split out the delivery portion 
                                                                      #   from the confirmation portion.
                                                                      #   
                                                                      #   Have one actor talk to the sender, and another actor queue the message in another 
                                                                      #   temporary queue, like 'accepted/'. Once the message is queued there, you tell the 
                                                                      #   client it is accepted, wait for successful termination of the connection, and then
                                                                      #   move the message to the 'new/' queue.
                                                                      # 
                                                                      #   This way, if there is an error between accepting the message and confirming with
                                                                      #   the sender, you can detect that and leave the message in 'accepted/' where you can
                                                                      #   later decide what to do with it. (But it won't get picked up for processing)
                                                                      
                                                                      # The message is now ready to be picked up for processing.
                                                                      # To process the message, a processor picks up the new message and moves it to a new state.
                                                                      # 
                                                                      # The processor adds an identifier '5678', so the processor can identify which file it's working on.
                                                                      # It can have extra logic to re-attempt processing on this ID if it fails.
                                                                      $ mv new/1.1234.txt process/1.1234.5678.txt
                                                                      
                                                                      # If the processor dies or takes too long, you can identify that (stale file, processor tracking its work, etc)
                                                                      # and this can be moved back to 'new/' for another attempt. Moving it back to 'new/' is still atomic so
                                                                      # there is no danger of another processor continuing to work on it.
                                                                      
                                                                      # Once the processor is done processing, it will move to the complete state.
                                                                      $ mv process/1.1234.5678.txt done/1.txt
                                                                      
                                                                      # If the processing file no longer exists, it means it was either already done,
                                                                      # or previously died and was moved back to 'new/'.
                                                                    
                                                                    This process all depends on a database [filesystem] using synchronous atomic operations [mv]. If your distributed system can't handle that, yeah, you're gonna have a hard time.
                                                                    • bcrl 4 hours ago

                                                                      You have made a false assumption: directory operations across directories are not atomic. The typical way around that is to hardlink the file into the target directory and then sync the target directory then delete the old name, but now you no longer know if the file is new and has been processed or old when the systems is interrupted with the intermediate state on disk with the file in both directories. Which is exactly the problem messaging systems have. There is always a point in the state machines across multiple systems where a crash at the right point in time can lead to replay of a message. I worked on persistent messaging for a few years, and it is impossible to resolve this in real world systems. The best you can do is to narrow the window by reducing latency of the network and for commits to persistent storage.

                                                                      • 0xbadcafebee 2 hours ago

                                                                        mv implementation that uses rename() syscall is atomic, even across directories, according to POSIX standard. https://pubs.opengroup.org/onlinepubs/9699919799/functions/r... What you describe as the correct way is what rename() does.

                                                                        It is of course "from the perspective of the host issuing the call", but that can be resolved by a network server by blocking operations on a given file when such an operation has begun. And of course the filesystem has to actually do the right thing. I'm no filesystem expert, but I would assume one way to do it is to write the block(s) that have the updated inode maps in one operation. A journal should help prevent corruption from making a mess of this. And of course disk / filesystem / OS tuning to further ensure data durability.

                                                                        • lmm an hour ago

                                                                          The idea of a filesystem, much less a distributed one, that complies with the POSIX standard, is vaporware.

                                                                      • mgsouth 4 hours ago

                                                                        So that everyone's on the same page, perhaps you could clarify what "delivery" means to you? For example:

                                                                        * The human to whom the message is addressed reads it on their screen.

                                                                        * The email is inserted into that person's inbox. It's possible something will happen to the message after than point, before the addressee actually reads it, but that isn't what you're talking about.

                                                                        * The email has been accepted by a server operating as an endpoint for the addressee (their company's Exchange server), _and_ acknowledgement has been received by the sending server.

                                                                        * The above, but no guarentees about the sending server getting the acknowledgement.

                                                                        etc.

                                                                        [Edit: also, what "guarentee" means to you. 100%, 99.99999% is good enough, and so on.]