• vblanco 19 hours ago

    Interesting library, but I see it falls into the same trap as almost all SIMD libraries: they hardcode the vector target completely, so you can't mix and match feature levels within a build. The documentation recommends compiling your kernels into DLLs and loading them dynamically, which is a huge mess: https://jfalcou.github.io/eve/multiarch.html

    Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects, which lets you branch at runtime between SIMD levels as you wish. I find it's a far better way of doing things if you actually want to ship the SIMD code to users.

    • janwas 15 hours ago

      +1, dynamic dispatch is important. Our Highway library has extensive support for this.

      Detailed intro by kfjahnke here: https://github.com/kfjahnke/zimt/blob/multi_isa/examples/mul...

      • spacechild1 19 hours ago

        Thanks, that's an important caveat!

        > Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects

        That's pretty cool because you can write function templates and instantiate different versions that you can select at runtime.

        • vblanco 17 hours ago

          Yeah, that's the fun of it: you write your kernel/function so that the SIMD level is a template parameter, and then you can use simple branching like:

          if(supports<avx512>){ myAlgo<avx512>(); } else{ myAlgo<avx>(); }

          I've also used it for benchmarking, to see whether my code scales well across different SIMD widths, and it's a huge help.
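To make the pattern concrete, here is a minimal sketch of a kernel templated on an architecture tag, plus a runtime branch that picks the instantiation. The tags `scalar_tag` and `wide4_tag` and the `supports<>()` query are hypothetical stand-ins, not xsimd's actual API, and the "wide" path is plain C++ that only mimics processing several lanes per step.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical architecture tags; a real library such as xsimd ships its own.
struct scalar_tag { static constexpr std::size_t lanes = 1; };
struct wide4_tag  { static constexpr std::size_t lanes = 4; };

// Hypothetical stand-in for a runtime CPU feature query.
// Assumption: every tag is reported as supported in this sketch.
template <class Arch>
bool supports() { return true; }

// Kernel templated on the tag: accumulates Arch::lanes elements per step.
template <class Arch>
float sum(const std::vector<float>& v) {
    constexpr std::size_t n = Arch::lanes;
    float acc[n] = {};
    std::size_t i = 0;
    for (; i + n <= v.size(); i += n)
        for (std::size_t l = 0; l < n; ++l) acc[l] += v[i + l];
    float total = 0.0f;
    for (std::size_t l = 0; l < n; ++l) total += acc[l];  // horizontal reduce
    for (; i < v.size(); ++i) total += v[i];              // scalar tail
    return total;
}

// Runtime branch between instantiations of the same kernel.
float sum_dispatch(const std::vector<float>& v) {
    if (supports<wide4_tag>()) return sum<wide4_tag>(v);
    return sum<scalar_tag>(v);
}
```

Because every width is just another instantiation, the same `sum<Arch>` can be called directly with each tag in a benchmark loop.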

          • dyaroshev 11 hours ago

            FYI: You don't want to do this. `supports<avx512>` is an expensive check. You really want to put this check in a static.
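A minimal sketch of "put this check in a static", assuming a hypothetical `probe_avx512()` that stands in for the expensive query. The call counter exists only to show that the probe runs once.

```cpp
#include <atomic>

// Counts how often the probe runs (for illustration only).
std::atomic<int> probe_calls{0};

// Hypothetical stand-in for an expensive CPU feature probe.
// Assumption: the feature is reported as absent in this sketch.
bool probe_avx512() {
    ++probe_calls;
    return false;
}

// A function-local static is initialized once, thread-safely (C++11 and later),
// so every later call only reads the cached bool.
bool has_avx512() {
    static const bool cached = probe_avx512();
    return cached;
}

// Float lanes for the chosen path: 16 for 512-bit, 8 for 256-bit vectors.
int pick_width() {
    return has_avx512() ? 16 : 8;
}
```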

        • kookamamie 18 hours ago

          100% agreed. This is the main reason ISPC is my go-to tool for explicit vectorization.

          • dyaroshev 12 hours ago

            Our answer to this is dynamic dispatch: if you want multiple versions of the same kernel compiled, compile multiple DLLs.

            The big problem here is: ODR violations. We really didn't want to do the xsimd thing of forcing the user to pass an arch everywhere.

            Also, that kind of defeats the purpose of "SIMD portability": any code written against AVX2 can't work on an ARM platform.

            eve just works everywhere.

            Example: https://godbolt.org/z/bEGd7Tnb3

            • janwas 11 hours ago

              It is possible to avoid ODR violations :) We put the per-target code into unique namespaces, and export a function pointer to them.
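A rough sketch of the namespace-plus-function-pointer idea, under stated assumptions: the namespaces `target_scalar` and `target_avx2`, the `cpu_has_avx2()` check, and the kernel are all hypothetical, and this is not Highway's actual macro machinery. In a real build each namespace would sit in its own translation unit compiled with different target flags; here both are plain C++ in one file.

```cpp
#include <cstddef>

namespace target_scalar {
    int dot(const int* a, const int* b, std::size_t n) {
        int acc = 0;
        for (std::size_t i = 0; i < n; ++i) acc += a[i] * b[i];
        return acc;
    }
}

namespace target_avx2 {
    // Same function name, different namespace: a distinct entity, so no ODR clash.
    int dot(const int* a, const int* b, std::size_t n) {
        int acc0 = 0, acc1 = 0;
        std::size_t i = 0;
        for (; i + 2 <= n; i += 2) {
            acc0 += a[i] * b[i];
            acc1 += a[i + 1] * b[i + 1];
        }
        if (i < n) acc0 += a[i] * b[i];
        return acc0 + acc1;
    }
}

using dot_fn = int (*)(const int*, const int*, std::size_t);

// Hypothetical CPU check. Assumption: AVX2 is reported as absent here.
bool cpu_has_avx2() { return false; }

// Only a function pointer crosses the boundary between targets.
dot_fn choose_dot() {
    return cpu_has_avx2() ? &target_avx2::dot : &target_scalar::dot;
}

// Public entry point: resolves the pointer once, then forwards.
int dot(const int* a, const int* b, std::size_t n) {
    static const dot_fn impl = choose_dot();
    return impl(a, b, n);
}
```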

              • dyaroshev 11 hours ago

                You can do many things with macros and inline namespaces, but I believe they run into problems when modules come into play. Can you compile the same code twice with different flags when using modules?

            • vlovich123 14 hours ago

              Since you seem knowledgeable about this, what does this do differently from other SIMD libraries like xsimd / highway? Is it the addition of algorithms similar to the STD library that are explicitly SIMD optimized?

              • dyaroshev 6 hours ago

                The algorithms I tried to make as good as I knew how. Maybe 95% there. Nice tail handling. A lot of things supported. I like our interface over other alternatives, but I'm biased here. Really massive math library.

            • thrtythreeforty 3 hours ago

              This library's eve::soa_vector is the first attempt I've seen at dealing with the "SOA problem," which is that if you write good, parallel-friendly code, all your types go to hell and never come back because the language can't express concepts like "my object is made from element 7 of each of these 6 pointers." Instead you write really FORTRAN-looking array processing code with no types or methods in sight.
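For illustration, here is one common way to get types and methods back on top of struct-of-arrays storage: a proxy reference that bundles "element i of each array". Everything here (`ParticlesSoA`, `Ref`, the fields) is hypothetical and is not eve::soa_vector's API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle container stored as struct-of-arrays: one array per field.
class ParticlesSoA {
public:
    // Proxy that acts as "element i of each array", so methods and names survive.
    struct Ref {
        float& x;
        float& y;
        float& mass;
        void move(float dx, float dy) { x += dx; y += dy; }
    };

    void push_back(float x, float y, float mass) {
        xs_.push_back(x);
        ys_.push_back(y);
        masses_.push_back(mass);
    }

    std::size_t size() const { return xs_.size(); }

    Ref operator[](std::size_t i) { return Ref{xs_[i], ys_[i], masses_[i]}; }

    // Contiguous per-field access stays available for vectorized loops.
    const float* xs() const { return xs_.data(); }

private:
    std::vector<float> xs_, ys_, masses_;
};
```

The proxy keeps call sites object-like (`p[1].move(1, 1)`) while each field remains a contiguous array.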

              Does anyone know of other libraries that help a C++ programmer deal with struct-of-arrays?

              • Conscat 13 hours ago

                EVE is personally my favorite SIMD library in any programming language. It's the only one I've tried that provides masked lane operations in a declarative style, aside from SPMD languages like CUDA or OpenMP. The [] syntax for that is admittedly pretty exotic C++, but I think the usefulness of the feature is worth it. I wish the documentation was better, though. When I first started, I struggled to figure out how to simply make a 4-lane float vector that I can pass into shaders, because almost all of the examples are written for the "wide" native-SIMD size.

                • dyaroshev 11 hours ago

                  Hi!

                  Thanks for your interest in the library.

                  Here is a godbolt example: https://godbolt.org/z/bEGd7Tnb3

                  Here is a bunch of simple examples: https://github.com/jfalcou/eve/blob/fb093a0553d25bb8114f1396...

                  I personally think we have the following strengths:

                  * Algorithms. Writing SIMD loops is very hard. We give you a lot of ready-to-go loops (find, search, remove, set_intersection, to name a few).

                  * zip and SOA support out of the box.

                  * High quality codegen. I haven't seen other libraries care about unrolling/aligning data accesses, and these give you substantial improvements.

                  * Supporting more than transform/reduce. For example, we have a really decent compress implemented for sse/avx/neon.

                  The following weaknesses:

                  * We don't support runtime-sized sve/rvv (only fixed size). We tried really hard, but unfortunately the C++ language itself refuses to play ball there. Here is a discussion about that: https://stackoverflow.com/questions/73210512/arm-sve-wrappin...

                  If this is something you need, we recommend compiling a few dynamic libraries with the correct fixed lengths. Google Highway manages to pull it off, but the trade-off is a variadics interface that I personally find very difficult.

                  * Runtime dispatch based on arch.

                  We again recommend DLLs for this. The problem here is ODR. I believe there is a solution based on the preprocessor and namespaces that I could use, but it breaks as soon as modules become a thing. So in the module world we don't have an option. I'm open to suggestions.

                  * No MSVC support

                  C++20 on MSVC is still not mature enough. And each new version breaks something that was already working. Sad times.

                  * Just tricky to get started.

                  I don't know what to do about that. I'm happy to just write examples for people. If you want to try the library, please create an issue or discussion; I'm happy to take some time and try to solve your case.

                  We talked about the library at CppCon: https://youtu.be/WZGNCPBMInI?si=buFteQB1e1vXRT5M

                  If you want to learn how SIMD algorithms work, here are a couple of talks I gave: https://youtu.be/PHZRTv3erlA?si=b87DBYMDskvzYcq1 https://youtu.be/vGcH40rkLdA?si=WL2e5gYQ7pSie9bd

                  Feel free to ask any questions.

                  • nickpsecurity 15 hours ago

                    I also found this looking for portable SIMD:

                    https://github.com/google/highway

                    • shadowpho 15 hours ago

                      Wait, what about AMD? They only claim support for Intel and ARM.

                      • dyaroshev 12 hours ago

                        AMD we support pretty well. I tested on Zen 1 and a bit on Zen 4.

                        • Sadiinso 13 hours ago

                          « AMD » is x86