
Batch processing is causing inconsistencies #93

Open
roquelopez opened this issue Jan 7, 2025 · 1 comment
roquelopez commented Jan 7, 2025

When we run the ct_learning method (contrastive learning), it produces inconsistent matches for the same column under different scenarios. For example, if we provide the source columns A, B, and C together, the method returns the pair (A, Target1). However, if we provide only column A as input, it returns (A, Target2).

When we force batch_size=1 at inference time, the results become consistent. This behavior suggests a potential issue with the padding strategy or with batched inference.
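For reference, here is a minimal sketch (not bdi-kit's actual code) of how padding can make a column's embedding depend on its batch mates: if pooling averages over padded positions instead of masking them out, a column padded to a longer batch length gets a different embedding than the same column alone.

```python
import numpy as np

def pad_batch(seqs, pad_len):
    """Right-pad each sequence of token vectors to pad_len with zeros."""
    dim = seqs[0].shape[1]
    out = np.zeros((len(seqs), pad_len, dim))
    mask = np.zeros((len(seqs), pad_len), dtype=bool)
    for i, s in enumerate(seqs):
        out[i, : len(s)] = s
        mask[i, : len(s)] = True
    return out, mask

def naive_mean_pool(batch):
    # BUG: averages over padded positions too, so the result
    # depends on how long the batch was padded
    return batch.mean(axis=1)

def masked_mean_pool(batch, mask):
    # Correct: average only over real (unpadded) token positions
    return (batch * mask[..., None]).sum(axis=1) / mask.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
col_a = rng.normal(size=(3, 4))  # column A: 3 tokens
col_b = rng.normal(size=(7, 4))  # column B: 7 tokens, forces longer padding

# A alone (batch_size=1, padded only to its own length)
solo, solo_mask = pad_batch([col_a], 3)
# A batched together with B (padded to length 7)
both, both_mask = pad_batch([col_a, col_b], 7)

naive_solo = naive_mean_pool(solo)[0]
naive_batch = naive_mean_pool(both)[0]
print(np.allclose(naive_solo, naive_batch))   # False: embedding changes with the batch

masked_solo = masked_mean_pool(solo, solo_mask)[0]
masked_batch = masked_mean_pool(both, both_mask)[0]
print(np.allclose(masked_solo, masked_batch))  # True: embedding is batch-independent
```

If the matcher's pooling (or attention) step is missing the mask, this would reproduce exactly the symptom above: consistent results at batch_size=1, different matches when columns are batched together.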

@lyrain2001 (Collaborator) commented:
Hi Roque, I believe you are right. I didn't fully grasp the problem when Eden discussed it with me yesterday, but I now understand that the issue is related to padding. I created a completely new version for Magneto, so this issue slipped my mind. We now have two options:

  1. Update the entire bdi-kit codebase to incorporate Magneto's version.
  2. Rewrite this particular part.

Could you please let me know which option you prefer? Thank you very much!
