notes on hashing #449

Open
aatmanvaidya opened this issue Nov 30, 2024 · 0 comments

I found an interesting paper on perceptual hashing for images, video, and audio (paper link). We explored some of this a while back, and I think it is relevant to Feluda.

Researchers studied the virality of information diffusion on WhatsApp (paper link). To identify and analyze how messages propagate in the dataset, they used hashing techniques to track multiple instances of the same message, tolerating minor alterations, in a privacy-preserving manner.

  • For images and videos, they used PDQ hashing.
    • We could look into the video hashing method (something we could not find last time) and consider implementing it for DAU.
    • Inspired by pHash, researchers at Facebook developed the PDQ and TMK+PDQF hashing techniques for images and video respectively (both open source). Official GitHub link: https://github.com/facebook/ThreatExchange
      • some other implementations - 1 - 2
    • Instead of requiring an exact match between hash values to find duplicates, Hamming distance is commonly used to compare two hash values, so that near-duplicates fall within a small distance of each other.
    • Apple has also developed NeuralHash, intended for detecting known CSAM images stored in iCloud Photos.
  • For text, they applied Locality Sensitive Hashing (LSH).
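
To make the Hamming distance point concrete, here is a minimal sketch of comparing two perceptual hashes by bit difference rather than by exact equality. It assumes the hashes are hex strings of equal length (a 256-bit PDQ hash would be 64 hex characters). The function names and the default threshold are hypothetical and only for illustration; a real threshold would need tuning on our own data.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 wherever the two hashes disagree; count those bits.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_near_duplicate(hash_a: str, hash_b: str, threshold: int = 32) -> bool:
    """Treat two hashes as the same item if they differ in few enough bits.

    The default threshold is an illustrative assumption, not a recommended value.
    """
    return hamming_distance(hash_a, hash_b) <= threshold


original = "f" * 64               # stand-in for a 256-bit hash
slightly_edited = "f" * 63 + "e"  # differs from `original` in one bit
unrelated = "0" * 64              # differs from `original` in every bit

print(hamming_distance(original, slightly_edited))
print(is_near_duplicate(original, slightly_edited))
print(is_near_duplicate(original, unrelated))
```

An exact-match lookup would miss `slightly_edited` entirely, whereas the distance check still groups it with `original`.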

I thought these could be some minor (and computationally cheap) improvements to the existing pHash operator in Feluda.
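
For the text side, here is a rough sketch of one common form of Locality Sensitive Hashing: MinHash signatures with banding. This is not necessarily the scheme used in the paper, and it is not existing Feluda code. The shingle size, number of hash functions, number of bands, and all function names are assumptions made up for illustration, and it uses only the Python standard library.

```python
import hashlib


def shingles(text: str, k: int = 5) -> set:
    """Character k-grams of the lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(text: str, num_hashes: int = 64) -> list:
    """One minimum per salted hash function, taken over all shingles."""
    grams = shingles(text)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for g in grams
        ))
    return signature


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """The fraction of matching positions approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(signature: list, bands: int = 16) -> set:
    """Split the signature into bands; texts sharing any band key are candidates."""
    rows = len(signature) // bands
    return {(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)}


sig_1 = minhash_signature("Forwarded many times: please share this message")
sig_2 = minhash_signature("forwarded many times:  please share this message")
print(estimated_jaccard(sig_1, sig_2))
print(bool(band_keys(sig_1) & band_keys(sig_2)))
```

The band keys are what would go into a lookup table, so candidate near-duplicates can be found without comparing every pair of messages.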

These are just some notes (no action needed as of now).
