But if you are then mapping the elements to (most likely vastly fewer than 2^32) buckets... why do the hash collisions matter? Just use the top 64 bits of the hash to compute a bucket, using ((hash >> 64) * buckets as u128) >> 64.

The reason this works is that hash64 / 2^64 (where hash64 is the top 64 bits of the hash) is almost uniformly distributed on [0, 1), so floor(hash64 * buckets / 2^64) is almost uniformly distributed on [0, buckets). The multiply-shift compiles to a single mul instruction on x86-64 and a single umulh instruction on AArch64.
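
For concreteness, a minimal Rust sketch of that reduction (function and variable names are illustrative, not from the original comment):

    /// Map the top 64 bits of a 128-bit hash onto `buckets` buckets
    /// without a division: since hi / 2^64 is almost uniform on [0, 1),
    /// the high word of hi * buckets is almost uniform on [0, buckets).
    fn bucket_index(hash: u128, buckets: u64) -> u64 {
        let hi = (hash >> 64) as u64; // top 64 bits of the hash
        ((hi as u128 * buckets as u128) >> 64) as u64
    }

    fn main() {
        let hash: u128 = 0xdead_beef_cafe_f00d_0123_4567_89ab_cdef;
        let idx = bucket_index(hash, 1024);
        assert!(idx < 1024);
        println!("bucket = {idx}");
    }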



Agreed on using a multiply on the top 64 bits rather than a division: if the hash is calculated via prime multiplications, the top bits are "more random" than the bottom bits.
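
To see why the bottom bits can be weak, consider a plain multiplicative hash (an assumed example, not necessarily what the parent has in mind): for odd K, x * K mod 2 == x mod 2, so the lowest bit of the hash just copies the lowest bit of the input, while the top bits mix in all of it.

    // Assumed example: Fibonacci-style multiplicative hash.
    fn mul_hash(x: u64) -> u64 {
        const K: u64 = 0x9E37_79B9_7F4A_7C15; // odd constant, ~2^64 / golden ratio
        x.wrapping_mul(K)
    }

    fn main() {
        for x in 0..4u64 {
            let h = mul_hash(x);
            // The low bit tracks the input's low bit; the top bits do not.
            println!("x={x}  low bit={}  top 16 bits={:04x}", h & 1, h >> 48);
        }
    }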

But we don't really know the hash table type. It could be doing something like hash table -> linked list -> items, in which case they might still be utilising the full 128-bit hash (I'd usually just store the lower 64 bits TBH).


I'm not saying to throw away the rest of the hash, just to ignore it for the bucket calculation. You can utilize the full 128-bit hash for other parts.
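
As one hypothetical illustration of those "other parts", a chained table could pick the bucket from the top 64 bits (as above) but keep the full hash in each entry, so probes can reject mismatches before comparing keys:

    // Hypothetical entry layout, not from the original thread.
    struct Entry<K, V> {
        hash: u128, // full 128-bit hash, kept for fast mismatch rejection
        key: K,
        value: V,
    }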


Ah, misunderstood you. Thanks for clarifying.



