Submitted by tetron 7 hours ago
  • moandcompany 6 hours ago

    It might be useful to mention that Arvados appears to be meant for biomedical data. It doesn't appear to point this out on the homepage, and you have to read the About page for context:

    "Arvados is a modern open source platform for managing and processing large biomedical data. By combining robust data and workflow management capabilities in a single platform, Arvados can organize and analyze petabytes of data and run reproducible and versioned computational workflows."

    • nine_k 4 hours ago

      What would make it unfit for running more general workflows?

    • tetron 7 hours ago

      I don't think this has been discussed on Hacker News before, but I wonder if people have any opinions about Arvados?

      • epistasis 6 hours ago

        Lots of people love it. I don't like CWL much so I don't like it, but it's a perfectly cromulent choice.

        The primary thing with all the workflow systems is to see what is abstracted and find out if it matches your needs. Airflow would never be suitable for what I need (diverse tools in many languages, some needing conflicting library versions); Arvados would work, but I prefer WDL or Snakemake or even Nextflow. Snakemake for incrementally building up a prototype pipeline or a one-off analysis. WDL when I need a production pipeline for years. Nextflow or CWL/Arvados when I need to fit into somebody else's culture or compute infrastructure.

        Edit: and something I have never seen in any workflow system is a looping mechanism that allows testing for convergence, dynamic parameter sweeps, etc. Only the homegrown systems, built on top of cluster management like SLURM, have been that flexible. But these homegrown systems for managing compute clusters were never quite mature and generalizable enough to release as open source, even if the company had been willing to open source them.

        • kinow 4 hours ago

          CWL has a loops extension now that can be used for your example of converging algorithms: https://cwltool.readthedocs.io/en/latest/loop.html

          And most workflow systems that support loops/cycles could be used for that too (e.g. Cylc, ecFlow, Prefect, Orquesta/StackStorm, Covenant, etc.).

          • epistasis 2 hours ago

            Thanks for the CWL pointer. Either our team got that one wrong when we evaluated CWL many years ago, or CWL has added it since; either way, it's good to know.
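
For readers unfamiliar with the "loop until convergence" pattern discussed in this thread, here is a minimal sketch in plain Python. It is not tied to CWL, Arvados, or any other workflow system mentioned above; the names `run_until_converged` and `newton_sqrt2_step` are hypothetical, and the Newton iteration is only a stand-in for an expensive pipeline step whose output feeds the next round.

```python
from typing import Callable, Tuple


def run_until_converged(
    step: Callable[[float], float],
    state: float,
    tol: float = 1e-6,
    max_iters: int = 100,
) -> Tuple[float, int]:
    """Repeat `step` until two successive states differ by less than `tol`.

    Returns the final state and the number of iterations performed.
    Raises RuntimeError if `max_iters` is reached without converging.
    """
    for iteration in range(1, max_iters + 1):
        new_state = step(state)
        # Convergence test: stop once the update becomes negligible.
        if abs(new_state - state) < tol:
            return new_state, iteration
        state = new_state
    raise RuntimeError(f"no convergence after {max_iters} iterations")


def newton_sqrt2_step(x: float) -> float:
    """One Newton update for solving x**2 = 2 (illustrative stand-in step)."""
    return 0.5 * (x + 2.0 / x)


result, iterations = run_until_converged(newton_sqrt2_step, 1.0)
print(f"converged to {result:.8f} after {iterations} iterations")
```

The point of the sketch is that the number of iterations is not known when the loop starts, which is what makes this hard to express as a static task graph. A workflow engine with loop support plays the role of `run_until_converged`, with each call to `step` replaced by a scheduled job.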