
Batch processing is causing inconsistencies #93

Open
roquelopez opened this issue Jan 7, 2025 · 1 comment
roquelopez commented Jan 7, 2025

When we run the ct_learning method (contrastive learning), it produces inconsistent matches for the same column under different scenarios. For example, if we provide the source columns A, B, and C together, the method returns the pair (A, Target1). However, if we provide only column A as input, it returns (A, Target2).

When we force batch_size=1 at inference time, the results become consistent. This behavior suggests a potential issue with the padding strategy or with batched inference.
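For reference, here is a minimal sketch (not bdi-kit's actual code) of how padding can make a column's embedding depend on its batch mates: if pooling averages over padded positions instead of masking them out, a column padded to a longer batch length gets a different embedding than the same column alone.

```python
import numpy as np

def pad_batch(seqs, pad_len):
    """Right-pad each sequence of token vectors to pad_len with zeros."""
    dim = seqs[0].shape[1]
    out = np.zeros((len(seqs), pad_len, dim))
    mask = np.zeros((len(seqs), pad_len), dtype=bool)
    for i, s in enumerate(seqs):
        out[i, : len(s)] = s
        mask[i, : len(s)] = True
    return out, mask

def naive_mean_pool(batch):
    # BUG: averages over padded positions too, so the result
    # depends on how long the batch was padded
    return batch.mean(axis=1)

def masked_mean_pool(batch, mask):
    # Correct: average only over real (unpadded) token positions
    return (batch * mask[..., None]).sum(axis=1) / mask.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
col_a = rng.normal(size=(3, 4))  # column A: 3 tokens
col_b = rng.normal(size=(7, 4))  # column B: 7 tokens, forces longer padding

# A alone (batch_size=1, padded only to its own length)
solo, solo_mask = pad_batch([col_a], 3)
# A batched together with B (padded to length 7)
both, both_mask = pad_batch([col_a, col_b], 7)

naive_solo = naive_mean_pool(solo)[0]
naive_batch = naive_mean_pool(both)[0]
print(np.allclose(naive_solo, naive_batch))   # False: embedding changes with the batch

masked_solo = masked_mean_pool(solo, solo_mask)[0]
masked_batch = masked_mean_pool(both, both_mask)[0]
print(np.allclose(masked_solo, masked_batch))  # True: embedding is batch-independent
```

If the matcher's pooling (or attention) step is missing the mask, this would reproduce exactly the symptom above: consistent results at batch_size=1, different matches when columns are batched together.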

@lyrain2001 (Collaborator) commented:
Hi Roque, I believe you are right. I didn't fully grasp the problem when Eden discussed it with me yesterday, but I now understand that the issue is related to padding. I created a completely new version for Magneto, so this issue slipped my mind. We now have two options:

  1. Update the entire bdi-kit codebase to incorporate Magneto's version.
  2. Rewrite this particular part.

Could you please let me know which option you prefer? Thank you very much!
