Feature #22876
closed
Change message digest from SHA1/VARCHAR(40) to XXHASH/BIGINT
Description
Using 160-bit SHA1 to detect duplicates is overkill. A simple 64-bit number from a hash function such as CRC64 (https://github.com/postmodern/digest-crc) can do the trick: collisions are very unlikely even with tens or hundreds of millions of imported strings. An index on a 64-bit number on a 64-bit system should also be much faster than an index on a VARCHAR, and it will save a lot of memory and disk space on the SQL server.
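To put a rough number on the collision risk, here is a small sketch using the standard birthday-bound approximation. It assumes the hash output is uniformly distributed over 64 bits, which a CRC does not strictly guarantee, so treat the result as an estimate only. The helper name `collision_probability` is made up for this illustration.

```ruby
# Birthday-bound approximation: probability of at least one collision
# among n hash values drawn uniformly from a space of 2**bits values.
# Assumption: the digest behaves like a uniform random function.
def collision_probability(n, bits)
  pairs = n * (n - 1) / 2.0          # number of distinct pairs of strings
  1.0 - Math.exp(-pairs / 2.0**bits) # p ~= 1 - e^(-pairs / 2^bits)
end

[10_000_000, 100_000_000, 1_000_000_000].each do |n|
  puts format('%d strings, 64-bit digest: p = %.3e', n, collision_probability(n, 64))
end
```

The probability grows roughly with the square of the number of strings, so the margin shrinks quickly once the table gets into the billions.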
This would need:
- changing the digest from string to int64
- rehashing all entries
- code changes
- benchmark to verify it performs better (I will set up a production instance with real data to get real numbers)
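As a rough sketch of the first step, the snippet below maps a string to a signed 64-bit integer that fits a SQL BIGINT column. It uses only the Ruby standard library and takes the first 8 bytes of a SHA1 digest as a stand-in; this is not the CRC64 or xxHash function proposed in this ticket, and the helper name `digest64` is hypothetical. The point is the shape of the value: BIGINT is typically signed, so an unsigned 64-bit hash has to be reinterpreted as a signed one before it is stored.

```ruby
require 'digest'

# Hypothetical helper: derive a signed 64-bit integer digest from a message.
# Truncated SHA1 is only a stand-in here; a CRC64 or xxHash implementation
# would slot in the same way, as long as its result is read as signed int64.
def digest64(message)
  # SHA1 yields 20 raw bytes; keep the first 8 and decode them as a
  # big-endian signed 64-bit integer ('q>').
  Digest::SHA1.digest(message).byteslice(0, 8).unpack1('q>')
end

puts digest64('Could not retrieve catalog from remote server')
```

The same input always gives the same integer, so rehashing existing entries amounts to running each stored message through the helper and writing the result to the new column.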