From the readme:
Parallel-hashmap or GTL?
The observant among us may have noticed that I have two github repos, parallel-hashmap and gtl, which both provide very similar functionality. Indeed the hash tables in both are equivalent and the code mostly the same. The main difference is that parallel-hashmap only requires a C++11 compiler, while gtl requires a C++20 compiler.
My recommendation would be to use gtl if you are compiling with C++20 or higher, and parallel-hashmap otherwise. While the included hash maps are equivalent, gtl is where new development occurs, and it will include useful new classes.
I think the "parallel" in this hashmap comes from the use of SIMD instructions for probing. I guess that's clever and legitimate. There is a mention of thread safety in the readme, but nothing about the hashmap itself using multicore parallelism, which doesn't make much sense anyway.
By default they are not thread safe, i.e., they offer the same thread safety as std::map or any stdlib type; however, the map can optionally be made thread safe and is apparently optimized for this usage. Details at: https://github.com/greg7mdp/parallel-hashmap?tab=readme-ov-f....
No, it's optimized for parallel usage. Unlike single-threaded hash maps or databases which need to lock the entire table.
It should be the default hashmap for everybody, I'm using it for years.
I think that the work looks quite interesting but it seriously lacks some important points to be covered.
Benchmarks [1] only cover the random insert workload. Why doesn't it include other types of workloads? Inserting into the hashmap is not the only interesting workload that there is. How about mixed workloads, read-only workloads, workloads that fit in LLC and ones that do not etc
Benchmarks only contrast the implementation against std::unordered_map. Why not against Abseil's flat_hash_map as well because that's a library that this work, according to information on the page, is based on?
Benchmarks only display 8-threads concurrency scenario and again only in random insert workload. This isn't a particularly high concurrency figure. I could make a "for-concurrency" wrapper around std::unordered_map, or Abseil's flat_hash_map, with RW-lock and modulo arithmetic to minimize the contention in probably no more than 100 lines of code. And it would scale to as many cores as there are on the machine.
For thread-level parallelism and reading, I guess the thing to do would be to do multiple reads in parallel, right? So there isn’t much for the implementation to do. Mixed could be interesting.
No, concurrent writes are the problem. That's why it spreads writability into buckets, so that they are mostly independent.
That’s a different problem. I was responding to a comment asking about benchmarks for concurrent reads.
In general we have CPU systems with hundreds of cores nowadays, so I think it’s hard to say something is “the problem” as far as parallelism goes. For example, I have a problem where I got a 100x speed up in the “computationally difficult” part of the problem… suddenly all the stuff that looked too cheap to bother with became more noticeable!
Yes, I mean if you have a hashmap for read-only workload where insertion is not going to take place after initial build-up then there isn't much to do.
Concurrent reads are also an issue if you hold an exclusive lock.
What’s the issue here? (Why would a lock be needed for reading). Or is it a read in parallel with a write (seems very tricky! But people are very clever).
Because under general case you cannot read from concurrent hashmap unless you make sure that write is not taking place. And to do that we need locks, either exclusive (mutex) or more fine grained locks such as rw-lock.
Are there any drawbacks, like maybe it’s slower for single-threaded code?
You may find the docs for Abseil's containers (upon which these appear to have been built) helpful: https://abseil.io/docs/cpp/guides/container#recommendation
In my experience, the main drawback is cognitive complexity: there are not one but four different implementations of map and set provided, each with slightly different memory and compatibility tradeoffs, and using the wrong one may break existing code that depends on (for example) stability of pointers to elements or iterators during set mutation.
Not much, still got all the swiss table tricks
I remember dropping parallel hashmap into my C++ app after years of using the standard library containers, and being honestly positively surprised my app got significantly faster after that.
So thanks for the developer of this!
How does it compare vs unordered_dense, which was the successor to robin_hood?
[for C++]