• limit499karma 7 minutes ago

    I do not believe 'pipelining' and parallelism are interchangeable models, and conflating them is a mistake. For example, consider a parallel processing system that in fact works strictly using a 'pipeline' of length 0: there is a hand-off from input to processing stage, and then processing of that input. And you can have n such parallel processing stages and, voila, 'parallelism'.

    Pipelines are strictly processing stages where the 'production of the input' and the processing of those inputs are not synchronized. For example, one sends n requests via a pipelined protocol to a remote server without waiting for an ack for each input from the server. There may be only one such processing pipeline (and thus no parallelism) while there is pipelining.
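    A minimal sketch of this distinction (all names here are mine, not from any real protocol): a single consumer, so no parallelism, yet the producer pipelines requests by never waiting for a per-request ack.

```python
# Sketch: one producer hands n requests to a single consumer without
# waiting for per-request acks -- pipelining with no parallelism
# (one processing stage, one worker, one queue).
import queue
import threading

def pipelined_client(requests, inbox):
    # Fire-and-forget: enqueue every request immediately, never block on acks.
    for r in requests:
        inbox.put(r)
    inbox.put(None)  # sentinel: no more requests

def single_server(inbox, results):
    # One worker drains the queue in order; still only one "lane" of work.
    while (r := inbox.get()) is not None:
        results.append(r * 2)  # stand-in for real processing

inbox, results = queue.Queue(), []
server = threading.Thread(target=single_server, args=(inbox, results))
server.start()
pipelined_client(range(5), inbox)  # client never waits for an ack
server.join()
print(results)
```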

    • js8 2 hours ago

      I am not sure why you would want to do this. If I take the example of the bottling pipeline, it seems to me that the best way to process the bottles on an SMP system is to have multiple threads, where each thread does all the work for a single bottle, sequentially.

      Maybe, if the processes at each stage are I/O-bound, it might make sense. But if they are CPU-bound, then I am not sure this kind of pipelining helps - you're moving data between different CPUs, destroying cache locality.
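      As a concrete sketch of that suggestion (the stage functions are invented for illustration), each worker thread carries one bottle through every stage, so a bottle's data never leaves its thread:

```python
# Sketch: one thread per bottle, running all stages sequentially,
# instead of one thread per stage.
from concurrent.futures import ThreadPoolExecutor

def fill(b):  return b + ["filled"]
def cap(b):   return b + ["capped"]
def label(b): return b + ["labeled"]

def process_bottle(bottle_id):
    # All stages for one bottle on one thread: good per-bottle locality.
    bottle = [f"bottle-{bottle_id}"]
    for stage in (fill, cap, label):
        bottle = stage(bottle)
    return bottle

with ThreadPoolExecutor(max_workers=4) as pool:
    done = list(pool.map(process_bottle, range(3)))
print(done[0])  # ['bottle-0', 'filled', 'capped', 'labeled']
```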

      • adrianmonk an hour ago

        I was about to suggest the same thing, and I still think it makes a lot of sense.

        However, the cache locality thing is complicated. Each bottle has data associated with it, but each processing stage might also have data. For example, maybe a particular stage uses a lookup table. Or maybe stages keep statistics as they process bottles.

        If you have one CPU doing all the work for a particular stage, then per-stage data stays in its cache. But if you have one CPU doing all the work for a particular bottle, then all the per-bottle data stays in its cache. So there's a trade-off that depends on the specifics.
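        A toy illustration of the two kinds of data (the lookup table and statistics are hypothetical): the bottle itself is per-bottle state, while the table and counter are per-stage state that only stays cache-warm if the same CPU keeps running that stage.

```python
# Per-STAGE data: read-mostly lookup table plus running statistics.
LABEL_TABLE = {"cola": "red", "soda": "green"}
stats = {"labeled": 0}

def label_stage(bottle):
    # `bottle` is the per-BOTTLE data; the table and stats belong to the stage.
    bottle["label"] = LABEL_TABLE[bottle["kind"]]
    stats["labeled"] += 1
    return bottle

bottles = [{"kind": "cola"}, {"kind": "soda"}]
labeled = [label_stage(b) for b in bottles]
print(stats["labeled"])  # 2
```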

        • anonymoushn 26 minutes ago

          You probably gain some ILP or avoid some pipeline stalls by handling a lot of bottles at a time rather than one.

        • shermantanktop 2 hours ago

          I’ve done something like this in large service ecosystems. It has two very unfortunate properties:

          - if the actual performance deviates from the predicted (scored) performance, the system easily enters a degenerate bottlenecked state.

          - and if that happens, the many internal queues make diagnosis, root-causing, and confidence in a fix all exponentially worse.

          Now you might assert that this will be applied in situations where scores are accurate and brownout failures do not occur. Those aren’t the situations I deal with.

          • Joker_vD an hour ago

            The algorithm in TFA, as I understand it, requires the scheduler to constantly monitor the queue lengths and reschedule the workers appropriately, as opposed to manually estimating the required number of workers for each stage and then never changing it.
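            One way to picture that kind of queue-driven rescheduling (a guess at the shape of the idea, not TFA's actual algorithm): on each tick, reassign workers toward the stages with the longest backlogs rather than fixing the split up front.

```python
# Sketch: allocate workers to stages in proportion to current queue lengths.
def rebalance(queue_lengths, total_workers):
    """Give each stage at least one worker, extras to the longest backlogs."""
    n = len(queue_lengths)
    assert total_workers >= n
    alloc = [1] * n
    spare = total_workers - n
    # Hand out spare workers one at a time to the currently longest backlog,
    # assuming each extra worker drains roughly one item per rebalance tick.
    backlog = list(queue_lengths)
    for _ in range(spare):
        i = backlog.index(max(backlog))
        alloc[i] += 1
        backlog[i] = max(backlog[i] - 1, 0)
    return alloc

print(rebalance([10, 2, 2], total_workers=6))  # most workers go to stage 0
```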

            • shermantanktop an hour ago

              This is what happens with PID controllers. A PID controller is by definition a trailing view, and a single catastrophically bad request can cause catastrophic deviation of the whole system before the queue monitoring can "catch up."
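              A stripped-down PID-style sketch (gains and queue numbers invented) of why the view trails: the controller only reacts after the bad request has already inflated the measured queue depth.

```python
# Sketch: a PID controller driven by measured queue depth. It can only
# respond to error it has already observed -- a trailing view.
def pid_step(error, state, kp=0.5, ki=0.1, kd=0.2):
    integral = state["integral"] + error
    derivative = error - state["prev_error"]
    state.update(integral=integral, prev_error=error)
    return kp * error + ki * integral + kd * derivative

state = {"integral": 0.0, "prev_error": 0.0}
queue_depth = [0, 0, 100, 100, 100]  # one bad request blows up the backlog
target = 0
outputs = [pid_step(q - target, state) for q in queue_depth]
# The corrective output is zero until *after* the spike has been measured.
print(outputs[1], outputs[2])
```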

          • jerf an hour ago

            This also contains a hidden assumption that reallocating resources from one task to another is very, very expensive, so it's necessary to plan the distribution of resources well in advance. This is clearly true for a physical production line. But if you are in the very common circumstance in the computer world where the switching time is significantly less than the amount of progress that can be made on a given task in that time, then the optimum strategy is just to do work as it becomes available, and in many cases almost any scheduling algorithm that isn't deliberately constructed for maximum pathology will ensure that significant progress is made.

            That's why, in the vast majority of circumstances where you'll be running many things on many CPUs, you just throw all the work at the CPUs and let the chips fall where they may. Deliberate scheduling is a tool, but an unusual one, especially since many times the correct solution to a tight scheduling situation is to throw more resources at it anyhow. (Trying to eke out wins by changing your scheduling implies that you're also in a situation where slight increases in workload will back the entire system up no matter what you do.)
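            The "just throw all the work at the CPUs" approach can be sketched as a single shared queue that any idle worker pulls from (toy task, arbitrary thread count):

```python
# Sketch: no per-stage planning -- workers grab whatever task is next.
import queue
import threading

work = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        item = work.get()
        if item is None:
            break  # sentinel: shut this worker down
        with lock:
            results.append(item * item)  # any worker, any task

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for n in range(8):
    work.put(n)
for _ in threads:
    work.put(None)  # one sentinel per worker
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49]
```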

            • bdcravens 2 hours ago

              Jefferson said we should rewrite the constitution every 19 years, so I assumed that means if threads haven't completed in some amount of time, to just blow them all away and start over lol