Comments Page - Parallel-hashmap: drop-in replacement for unordered_map, unordered

« Back Parallel-hashmap: drop-in replacement for unordered_map, unordered_setgithub.comSubmitted by klaussilveira 10 months ago

syspec 10 months ago
From the readme:
Parallel-hashmap or GTL?
The observant among us may have noticed that I have two github repos, parallel-hashmap and gtl, which both provide very similar functionality. Indeed the hash tables in both are equivalent and the code mostly the same. The main difference is that parallel-hashmap only requires a C++11 compiler, while gtl requires a C++20 compiler.
My recommendation would be to use gtl if you are compiling with C++20 or higher, and parallel-hashmap otherwise. While the included hash maps are equivalent, gtl is where new development occurs, and it will include useful new classes.
undefined 10 months ago
[deleted]
throwaway81523 10 months ago
I think the "parallel" in this hashmap comes from the use of SIMD instructions for probing. I guess that's clever and legitimate. There is a mention of thread safety in the readme, but nothing about the hashmap itself using multicore parallelism, which doesn't make much sense anyway.
- BeeOnRope 10 months ago
  By default they are not thread safe, i.e., they offer the same thread safety as std::map or any stdlib type; however, the map can optionally be made thread safe and is apparently optimized for this usage. Details at: https://github.com/greg7mdp/parallel-hashmap?tab=readme-ov-f....
- rurban 10 months ago
  No, it's optimized for parallel usage. Unlike single-threaded hash maps or databases which need to lock the entire table.
  It should be the default hashmap for everybody, I'm using it for years.
  menaerus 10 months ago
  I think that the work looks quite interesting but it seriously lacks some important points to be covered.
  Benchmarks [1] only cover the random insert workload. Why doesn't it include other types of workloads? Inserting into the hashmap is not the only interesting workload that there is. How about mixed workloads, read-only workloads, workloads that fit in LLC and ones that do not etc
  Benchmarks only contrast the implementation against std::unordered_map. Why not against Abseil's flat_hash_map as well because that's a library that this work, according to information on the page, is based on?
  Benchmarks only display 8-threads concurrency scenario and again only in random insert workload. This isn't a particularly high concurrency figure. I could make a "for-concurrency" wrapper around std::unordered_map, or Abseil's flat_hash_map, with RW-lock and modulo arithmetic to minimize the contention in probably no more than 100 lines of code. And it would scale to as many cores as there are on the machine.
  [1] https://greg7mdp.github.io/parallel-hashmap/
  bee_rider 10 months ago
  For thread-level parallelism and reading, I guess the thing to do would be to do multiple reads in parallel, right? So there isn’t much for the implementation to do. Mixed could be interesting.
  rurban 10 months ago
  No, concurrent writes are the problem. That's why it spreads writability into buckets, so that they are mostly independent.
  bee_rider 10 months ago
  That’s a different problem. I was responding to a comment asking about benchmarks for concurrent reads.
  In general we have CPU systems with hundreds of cores nowadays, so I think it’s hard to say something is “the problem” as far as parallelism goes. For example, I have a problem where I got a 100x speed up in the “computationally difficult” part of the problem… suddenly all the stuff that looked too cheap to bother with became more noticeable!
  menaerus 10 months ago
  Yes, I mean if you have a hashmap for read-only workload where insertion is not going to take place after initial build-up then there isn't much to do.
  menaerus 10 months ago
  Concurrent reads are also an issue if you hold an exclusive lock.
  bee_rider 10 months ago
  What’s the issue here? (Why would a lock be needed for reading). Or is it a read in parallel with a write (seems very tricky! But people are very clever).
  menaerus 10 months ago
  Because under general case you cannot read from concurrent hashmap unless you make sure that write is not taking place. And to do that we need locks, either exclusive (mutex) or more fine grained locks such as rw-lock.
  kstrauser 10 months ago
  Are there any drawbacks, like maybe it’s slower for single-threaded code?
  cbhl 10 months ago
  You may find the docs for Abseil's containers (upon which these appear to have been built) helpful: https://abseil.io/docs/cpp/guides/container#recommendation
  In my experience, the main drawback is cognitive complexity: there are not one but four different implementations of map and set provided, each with slightly different memory and compatibility tradeoffs, and using the wrong one may break existing code that depends on (for example) stability of pointers to elements or iterators during set mutation.
  rurban 10 months ago
  Not much, still got all the swiss table tricks
inDigiNeous 10 months ago
I remember dropping parallel hashmap into my C++ app after years of using the standard library containers, and being honestly positively surprised my app got significantly faster after that.
So thanks for the developer of this!
egberts1 10 months ago
This article is also important for nftables (Linux kernel firewall) for it also has the same 'set' and 'map' in its rule definition.
Last time I reviewed the nftables code, they are experiementing with multiple algorithm selections based on size of its 'set' and 'map'.
Night_Thastus 10 months ago
How does it compare vs unordered_dense, which was the successor to robin_hood?
undefined 10 months ago
[deleted]
remram 10 months ago
[for C++]