Too many collisions? #27

@doughboyks

Maybe I am doing it wrong, but I ran about 1,250 images (out of more than 350,000) through the perceptual hash. I then stored each hash in the database, and I am using SQL's BIT_COUNT to get the Hamming distance.

I took a random hash and ran the query, and ended up with 2 images at a distance of 0 (even though their hashes differ) and maybe 50 or more at a distance of 1. The farthest match is at a distance of 27.

These are the two images that have different hashes yet were still reported at a distance of 0.

Hash: 12d2552c66ddc94b (image: https://10deb7fbfece20ff53da-95da5b03499e7e5b086c55c243f676a1.ssl.cf1.rackcdn.com/a1afc58c6ca9540d057299ec3016d726_l.jpg)

Hash: 12e627593dbc2307 (image: https://10deb7fbfece20ff53da-95da5b03499e7e5b086c55c243f676a1.ssl.cf1.rackcdn.com/3b8a614226a953a8cd9526fca6fe9ba5_l.jpg)

As you can see, these two images are nowhere close to the same.

SELECT i.*, BIT_COUNT('12e627593dbc2307' ^ i.hash) AS hamming_distance
FROM images i
WHERE i.hash IS NOT NULL
ORDER BY hamming_distance ASC
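
One way to sanity-check the query result is to compute the distance outside the database. The sketch below is a hypothetical check in Python, not part of the library. It assumes the hashes are 64-bit values written as 16 hexadecimal digits, which is how the two values quoted above appear. The names hamming_hex and rank_by_distance are invented for illustration.

```python
def hamming_hex(a: str, b: str) -> int:
    """Hamming distance between two hashes given as hexadecimal strings."""
    # Parse explicitly as base 16, then count the bits that differ.
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def rank_by_distance(query: str, hashes: list[str]) -> list[tuple[int, str]]:
    """Client-side counterpart of ORDER BY hamming_distance ASC."""
    return sorted((hamming_hex(query, h), h) for h in hashes)


# The two hashes quoted in this report.
a = "12d2552c66ddc94b"
b = "12e627593dbc2307"

print(hamming_hex(a, b))            # 28
print(rank_by_distance(b, [a, b]))  # [(0, b), (28, a)]
```

Parsed this way, the two hashes differ in 28 of 64 bits rather than 0. That would be consistent with the database converting the hex strings to numbers in some way other than base 16 before applying ^, though this is an assumption about the schema and not something the query output confirms.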

Will this not work on "created" images? Maybe the sample size is too small for an accurate comparison. From what I have read, the image is reduced to 8x8 before hashing; maybe in my case it should be MUCH larger, but I am not sure where to start.
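
To make the size question concrete, here is a small arithmetic sketch. It assumes the hash stores one bit per pixel of the reduced image, which matches an 8x8 reduction producing the 16-hex-digit hashes shown above. Whether the library accepts a larger size is not stated here, and hash_layout is a hypothetical helper.

```python
def hash_layout(side: int) -> dict:
    """Storage needed for a hash with one bit per pixel of a side x side image."""
    bits = side * side  # assumption: one bit per pixel
    return {
        "bits": bits,
        "hex_digits": bits // 4,           # 4 bits per hexadecimal digit
        "fits_in_64_bit_int": bits <= 64,  # can be XORed as a single 64-bit integer
    }


for side in (8, 16):
    print(side, hash_layout(side))
```

Under that assumption, going from 8x8 to 16x16 grows the hash from 64 to 256 bits, so it would no longer fit in a single 64-bit integer for the XOR and bit-count step.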
