I have tried the following code:
a = torch.tensor([3.0])
out = float_quantize(a, 8, 23, "nearest")
The output is printed as -3.0.
This happens only when the rounding mode is "nearest". I don't understand why the sign flips. Could you please explain what I am missing here?
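This comes from the round_bitwise function in quant_cpu.cpp.
Specifically, rand_prob = 1 << (23 - man_bits - 1); when man_bits = 23 this becomes rand_prob = 1 << -1. Shifting by a negative count is undefined behavior in C++, so the rounding offset added to the bit pattern is garbage.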
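To make the failure mode concrete, here is a minimal Python sketch of bitwise round-to-nearest on a float32 bit pattern. It is not the library's code: the helper names (float_to_bits, bits_to_float, round_nearest_bits) are made up for illustration, and exponent clamping is ignored. It also assumes, as a guess about what the hardware does, that the bad shift behaves like 1 << 31 (x86 commonly masks a 32-bit shift count to 5 bits), which would set the sign bit and turn 3.0 into -3.0.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 float32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Interpret the low 32 bits of an int as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def round_nearest_bits(bits: int, man_bits: int) -> int:
    """Round a float32 bit pattern to man_bits mantissa bits (ties round up in magnitude).

    Hypothetical helper: guards the man_bits == 23 case, where the shift
    amount 23 - man_bits - 1 would be negative.
    """
    if not 0 <= man_bits <= 23:
        raise ValueError("man_bits must be in [0, 23]")
    shift = 23 - man_bits - 1
    if shift < 0:
        # All 23 mantissa bits are kept, so there is nothing to round.
        return bits
    half = 1 << shift  # half of the last kept mantissa step
    mask = ~((1 << (23 - man_bits)) - 1) & 0xFFFFFFFF  # clears the dropped bits
    return (bits + half) & mask & 0xFFFFFFFF


bits = float_to_bits(3.0)

# Guarded version: 3.0 survives a full-precision "quantization".
guarded = bits_to_float(round_nearest_bits(bits, 23))

# Assumed behavior of the unguarded shift: offset acts like 1 << 31,
# and the mask for man_bits == 23 keeps every bit.
unguarded = bits_to_float((bits + (1 << 31)) & 0xFFFFFFFF)

print(guarded, unguarded)
```

If that assumption holds, the guarded call returns 3.0 while the unguarded arithmetic returns -3.0, matching the reported output. Returning the input unchanged when man_bits is 23 is one possible fix; the maintainers may prefer a different one.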