This repository has been archived by the owner on Mar 14, 2024. It is now read-only.

How to calculate similarity score between two embeddings if affine operator was used? #259

Open
ahakanbaba opened this issue May 17, 2022 · 2 comments

@ahakanbaba

According to the affine operator docs, the affine operator applies a linear transformation followed by a translation, i.e. a matrix multiplication followed by a vector addition.

Say I would like to calculate the similarity between two embeddings belonging to two entities of the same entity type.
How should I calculate it if one or more relations in the graph use the affine operator?

According to the scoring function definition here https://torchbiggraph.readthedocs.io/en/latest/scoring.html , do I need to apply the same linear transformation and translation to the right-hand side of the two embeddings? I'm not sure what "right-hand side" means in this context of calculating similarity post-training.

According to the I/O docs, the parameters for the affine operator are stored in the model.h5 file.

Should I read the linear transformation and translation parameters from the model.h5 file and apply those operations to one of the embeddings when calculating the similarity score?
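For concreteness, something like the following is what I have in mind. The dataset key names below are guesses on my part (the real layout depends on your config; `h5py.File("model.h5").visit(print)` will list the actual keys), and the demo writes a tiny stand-in checkpoint so the snippet runs end to end:

```python
import h5py
import numpy as np

# Hypothetical key names -- inspect your own checkpoint to find the real ones,
# e.g. with: h5py.File("model.h5", "r").visit(print)
A_KEY = "model/relations/0/operator/rhs/linear_transformation"
B_KEY = "model/relations/0/operator/rhs/translation"

def load_affine_params(path):
    """Read the affine operator's matrix A and offset b from a checkpoint."""
    with h5py.File(path, "r") as f:
        A = np.array(f[A_KEY])
        b = np.array(f[B_KEY])
    return A, b

# Build a tiny stand-in checkpoint so the sketch is self-contained.
dim = 4
rng = np.random.default_rng(0)
with h5py.File("model_demo.h5", "w") as f:
    f.create_dataset(A_KEY, data=rng.standard_normal((dim, dim)))
    f.create_dataset(B_KEY, data=rng.standard_normal(dim))

A, b = load_affine_params("model_demo.h5")
print(A.shape, b.shape)  # (4, 4) (4,)
```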

If you could point me to a code sample that does this it would be much appreciated.

Also, if you have some example use cases or intuition of the benefits of such an affine operator, it would be immensely useful.

@yaoruoyangfb

If two entities of the same type are connected by multiple relation types, it's less clear what "similarity" means for two embeddings; instead, the score measures how likely a given relation is between those two entities. For example, you could use one affine operator for a user "likes" another user and a different affine operator for a user "dislikes" another user. After training, you apply the corresponding trained operator parameters to the rhs embedding, then apply the score function to the lhs embedding and the transformed rhs embedding to predict how likely that relation is between the two entities. Hope this helps.
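A minimal NumPy sketch of that idea, with random stand-ins for the trained parameters (in practice the per-relation `(A, b)` pairs come from your checkpoint) and assuming a dot-product comparator:

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 8

# One (A, b) pair per relation type -- illustrative random values here;
# in practice these are read from the trained checkpoint.
operators = {
    "like":    (rng.standard_normal((dim, dim)), rng.standard_normal(dim)),
    "dislike": (rng.standard_normal((dim, dim)), rng.standard_normal(dim)),
}

def score(lhs, rhs, relation):
    """Dot-product comparator after applying the relation's affine operator to rhs."""
    A, b = operators[relation]
    return float(lhs @ (A @ rhs + b))

u = rng.standard_normal(dim)
v = rng.standard_normal(dim)

# The same pair of entities gets a different score per relation type.
print(score(u, v, "like"), score(u, v, "dislike"))
```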

@adamlerer
Contributor

adamlerer commented May 31, 2022

Adding a little more color here: let's differentiate between two types of similarity you may be interested in.

First-order similarity: i.e. the model predicts a high likelihood of an edge between u and v. Extending this to the multi-relation case, if your relations use the affine operator, the score for an edge (u, v) would be <u, Av + b>. So you'd need to extract the relation parameters A and b and compute this score function. Beware that, depending on your loss function, the score may only correspond to a relative likelihood of an edge vs. corrupted edges; e.g. the ranking loss only forces scores for true edges to be higher than scores for edges with the src or dst corrupted, and doesn't guarantee anything about the absolute scores.
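To illustrate the caveat about relative scores, here is a sketch (random stand-ins for the trained A, b, and embeddings) that interprets <u, Av + b> not as an absolute likelihood but as a rank among sampled corrupted destinations:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
A = rng.standard_normal((dim, dim))  # stand-in for the trained relation matrix
b = rng.standard_normal(dim)         # stand-in for the trained translation

def edge_score(u, v):
    # <u, Av + b>: dot-product comparator with the affine operator on the dst
    return float(u @ (A @ v + b))

u = rng.standard_normal(dim)
candidates = rng.standard_normal((100, dim))  # corrupted dst embeddings
v = candidates[0]                             # treat one as the "true" dst

# A ranking loss only pushes true edges above corrupted ones, so read the
# score relatively: where does v rank among the candidate destinations?
scores = np.array([edge_score(u, c) for c in candidates])
rank = int((scores > edge_score(u, v)).sum()) + 1
print("rank of true dst among 100 candidates:", rank)
```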

Second-order similarity: the model predicts that u and v have a similar distribution of edges (in NLP parlance, u and v "occur in similar contexts"). This is closer to saying "u and v are similar", and it sounds like this is what you're actually interested in. Formally, second-order similarity requires that for all nodes w, score(u, w) ~= score(v, w). So you should compare u and v directly, with no relation operators applied. Note: second-order similarity is only well-defined for nodes of the same entity type.
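In code, "compare u and v directly" can be as simple as a cosine similarity on the raw embeddings (cosine is one common choice of comparator; whether it matches the comparator you trained with is an assumption):

```python
import numpy as np

def cosine_similarity(u, v):
    """Compare raw embeddings directly -- no relation operator applied."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)
u = rng.standard_normal(16)
v = u + 0.1 * rng.standard_normal(16)  # a near-duplicate of u
w = rng.standard_normal(16)            # an unrelated embedding

print(cosine_similarity(u, v))  # close to 1
print(cosine_similarity(u, w))  # typically much smaller for random vectors
```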
