• solidasparagus 9 hours ago

    Nice work! There is a gap when it comes to writing single-machine, concurrent CPU-bound python code. Ray is too big, pykka is threads only, builtins are poorly abstracted. The syntax is also very nice!

    But I'm not sure I can use this even though I have a specific use-case that feels like it would work well (high-performance pure Python downloading from cloud object storage). The examples are a bit too simple and I don't understand how I can do more complicated things.

    I chunk up my work, run it in parallel and then I need to do a fan-in step to reduce my chunks - how do you do that in Pyper?

    Can the processes have state? Pure functions are nice, but if I'm reaching for multiprocess, I need performance and if I need performance, I'll often want a cache of some sort (I don't want to pickle and re-instantiate a cloud client every time I download some bytes for instance).

    How do exceptions work? Observability? Logs/prints?

    Then there's stuff that is probably asking too much from this project, but I get it if I write my own python pipeline so it matters to me - rate limiting WIP, cancellation, progress bars.
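    To show the kind of thing I mean by rate limiting: this is roughly the thread-safe token bucket you end up hand-rolling in your own pipeline code (a minimal sketch, not anything Pyper provides):

```python
# Minimal thread-safe token bucket -- the sort of rate limiter you end up
# writing yourself when a pipeline library doesn't provide one.
import threading
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        # block until a token is available, then consume it
        while True:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
            time.sleep(1 / self.rate)

bucket = TokenBucket(rate=100, capacity=5)
start = time.monotonic()
for _ in range(10):  # 5 acquire instantly, 5 wait for refill at 100/s
    bucket.acquire()
elapsed = time.monotonic() - start
```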

    But if some of these problems are/were solved and it offers an easy way to use multiprocessing in python, I would probably use it!

    • globular-toast 5 hours ago

      Do you really need to reinvent the wheel every time for parallel workloads? Just learn GNU parallel and write single-threaded code.

      Concurrency in general isn't about parallelism. It's just about doing multiple things at the same time.

      • halfcat 8 hours ago

        > I don't want to pickle and re-instantiate a cloud client every time I download some bytes for instance

        Have you tried multiprocessing.shared_memory to address this?
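          For reference, the API is roughly this: you get a named raw byte buffer that other processes can attach to by name, and nothing higher-level than that.

```python
# Basic shape of the multiprocessing.shared_memory API: create a segment,
# write bytes, attach a second handle (as another process would, by name).
from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# a second handle; another process would attach with the same name
other = shared_memory.SharedMemory(name=shm.name)
data = bytes(other.buf[:5])

other.close()
shm.close()
shm.unlink()  # free the segment once every process is done with it
print(data)  # b'hello'
```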

        • solidasparagus 6 hours ago

          I haven't played with that much! This isn't really a problem in general for my approach to writing this sort of code - when I use multiprocessing, I use a Process class or a worker task function with a setup step followed by a while loop that pulls from a work/control queue. But in the Pyper functional programming world, it would be a concern.

          IIRC multiprocessing.shared_memory is a much lower level of abstraction than most Python stuff, so I think I'd need to figure out how to make the client use the shared memory, and I'm not sure if I could.

      • rtpg 8 hours ago

        You really should dive more into the `multiprocess` support option and highlight how this gets around issues with the GIL. This feels like a major value add, and "does this help with CPU-bound work" being "yes" is a big deal!

        I don't really need pipelining that much, but pipelining along with a certain level of durability and easy multiprocessing support? Now we're talking

        • t43562 5 hours ago

          ...although Python 3.13 can be built without the GIL, and it really does make threading useful. I did some comparisons with and without.

          I suppose one excellent thing about this would be if you could just change 1 parameter and switch from multiprocessing to threaded.
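          The stdlib already shows how that switch can look: ProcessPoolExecutor and ThreadPoolExecutor share an interface, so the executor class itself can be the one parameter (a sketch, not Pyper's actual API):

```python
# One-parameter switch between threads and processes, using the shared
# concurrent.futures Executor interface.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def work(n):
    return n * n

def run(items, use_processes=False):
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=4) as pool:
        return list(pool.map(work, items))

if __name__ == "__main__":
    print(run(range(5)))                      # threads
    print(run(range(5), use_processes=True))  # processes (work must pickle)
```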

          • rtpg 2 hours ago

            I think you could build off of threading. I do think it's good to acknowledge here that Python async is fundamentally single-threaded (or rather, "single thread per event loop"), so if you do go for a multi-threaded version you might have to do some bookkeeping to make it all work well.
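            "Single thread per event loop" concretely: each asyncio.run() call spins up an independent loop, so one loop per thread is easy to construct, and that's exactly where the bookkeeping questions start.

```python
# One independent asyncio event loop per thread, via asyncio.run().
import asyncio
import threading

async def job(name):
    await asyncio.sleep(0.01)
    return name

results = {}

def run_loop(name):
    # each thread creates (and tears down) its own event loop
    results[name] = asyncio.run(job(name))

threads = [threading.Thread(target=run_loop, args=(f"loop-{i}",))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```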

            I'm not sure how well async Python libs are tested against working in a world with multiple event loops, but I bet there are a _lot_ of latent bugs in that space.

        • urduntupu 2 hours ago

          Very good README.md, teasing and explaining the provided value very well. Well done!

• yablak 6 hours ago

• minig33 3 days ago

    This is cool - I’ve been looking for something like this. I really liked the syntax of Prefect v1, but it was overcomplicated with execution configuration in subsequent versions. I just want something to help me run async pipelines and prevent AsyncIO weirdness - going to test this out.

• grandma_tea 10 hours ago

    Nice! I'm looking forward to trying it out. This seems very similar to https://github.com/cgarciae/pypeln/

• kissgyorgy 7 hours ago

    Very simple and elegant API!