Congrats on the insightful paper!
I noticed a few points in the appendix figure that I find a bit confusing, and I have two questions:

1. Since the 'Training Long Language Model' step uses a context length of only 224k, why does the model still show high accuracy when the context length reaches 512k?
2. When the number of distractors is set to 5, the distribution of the NIAH results looks unusual: the 224k context length appears to perform better than 64k, which is quite different from what is typically seen in NIAH results for other models.
Looking forward to your insights on these points.
Best regards