DHVC complexity and number of parameters #4

Open
mwindsp opened this issue Feb 5, 2025 · 5 comments

@mwindsp

mwindsp commented Feb 5, 2025

Thank you for publishing your work!

I was calculating the complexity in kMAC/pix and the number of parameters, but I cannot reproduce the results in your paper.
I used PyTorch-OpCounter and the DHVC model generated by the function dhvc_base() in dhvc2.py.
The paper states that it is calculated for a 1080p video (thus: frame = torch.rand([1, 3, 1088, 1920]), due to padding).

import torch
import thop

model = dhvc_base()  # defined in dhvc2.py
frame = [torch.rand([1, 3, 1088, 1920])]  # one 1080p frame, height padded to 1088
macs, params = thop.profile(model=model, inputs=(frame,))
macs_per_pix = macs / (1080 * 1920)

My results lead to 2236 kMAC/pix and 118M parameters, while the paper states 433 kMAC/pix and 112M parameters.
So, while the number of parameters is at least close to the results in the paper, the gap in kMAC/pix is quite significant.

How have you calculated your results? Are there any changes between the model in dhvc_base() and the one in the paper? Or am I making any errors in the calculation?

Update:
I found out that the MAC calculation is not always consistent. Repeating the full profiling process seems to help: I sometimes also got 894 kMAC/pix and, most often, 447 kMAC/pix. For this to work, however, I had to reload the DHVC model using model = dhvc_base().
I don't know whether this behaviour is related to the DHVC source code, the thop library, or specifics of my system.

However, I still don't get exactly the same results in terms of MACs and number of parameters, so I am still curious whether the source code is identical to the model in the paper.
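
As a quick sanity check on the figures above, the inconsistent results are close to integer multiples of the most frequent one. The sketch below only does that arithmetic. Reading it as a sign that MAC counts accumulated over repeated profiling of the same model instance is an assumption, not something verified against thop or the DHVC code.

```python
# kMAC/pix values reported in this comment (thop, 1080p input).
reported = [2236, 894, 447]
baseline = min(reported)  # the most frequently observed value

def ratios_to_baseline(values, base):
    """Return each value divided by the baseline, rounded to 3 decimals."""
    return [round(v / base, 3) for v in values]

ratios = ratios_to_baseline(reported, baseline)
for value, ratio in zip(reported, ratios):
    print(f"{value} kMAC/pix = {ratio} x {baseline}")
```

The two outliers come out at roughly 5x and 2x the 447 kMAC/pix figure.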

@congwuyang
Collaborator

Thank you for your question. The code tested in the paper is consistent with the current code. Our test was conducted with ptflops (from ptflops import get_model_complexity_info), so could this be a problem with the thop library?

@mwindsp
Author

mwindsp commented Feb 10, 2025

I tested it using ptflops, but I still get 447 kMAC/pix and 118M parameters.
I noticed that TemporalLatentBlock2, which is used in the DHVC model, returns the feature $f_t^l$ via additional['z']. According to the paper, it should return $z_t^l$ if get_latent = True.
I assume that this is the reason for the increase, since the channel dimension of the context changes from $2\cdot \text{zdim}$ to $2\cdot \text{width}$: zdim is [32, 32, 96, 8] for the four latent blocks, while width is [512, 512, 384, 256].
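
To make the size of this effect concrete, the sketch below compares the context channel counts for both readings and the resulting MACs of a single convolution that consumes the context. The kernel size and output channel count are hypothetical placeholders, not taken from the DHVC code. Only zdim and width come from the values quoted above.

```python
# Per-block dimensions quoted above for the four latent blocks.
zdim = [32, 32, 96, 8]
width = [512, 512, 384, 256]

def conv_macs_per_output_position(c_in, c_out, kernel_size):
    """MACs of a dense 2-D convolution per output position (bias ignored)."""
    return kernel_size * kernel_size * c_in * c_out

# Context = two previous frames concatenated along the channel axis.
ctx_from_z = [2 * z for z in zdim]    # if z_t^l were buffered
ctx_from_f = [2 * w for w in width]   # if f_t^l is buffered

# Hypothetical first layer on the context: 3x3 kernel, `width` output channels.
KERNEL = 3
for l, (cz, cf, w) in enumerate(zip(ctx_from_z, ctx_from_f, width)):
    macs_z = conv_macs_per_output_position(cz, w, KERNEL)
    macs_f = conv_macs_per_output_position(cf, w, KERNEL)
    print(f"block {l}: {cz} vs {cf} context channels, "
          f"MAC ratio {macs_f / macs_z:.0f}x")
```

For a layer of this kind the cost scales linearly with the input channels, so the per-layer ratio equals width / zdim for each block. How much this contributes to the total kMAC/pix depends on the real layers and on the resolution each block runs at.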

@congwuyang
Collaborator

Thank you for your question. We apologize for any ambiguity in the code. To clarify: as shown in Figure 2(b) of the paper, $z_t^l$ and $f_t^l$ correspond to z and additional['z'] in the code, respectively. The feature returned by the TemporalLatentBlock2 module when get_latent=True is additional['z']. In addition, we have updated the code for DHVC-1.0 and released the code for measuring kMACs/pix and Params (M). We tested with ptflops=0.6.9 and obtained kMACs/pix=443.509 and params=117.924M. There may be slight deviations due to different versions of the ptflops library.
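
Part of the remaining gap between 447 and 443.509 kMAC/pix may come from normalization rather than from the ptflops version. The sketch below assumes the 447 figure was obtained by dividing by the unpadded 1080 x 1920 pixels, as in the snippet at the top of this thread, and rescales it to the padded 1088 x 1920 input. It also computes how far the measured values deviate from the paper. Which pixel count the released test code divides by is not stated in this thread, so this is only a plausibility check.

```python
UNPADDED = 1080 * 1920  # nominal 1080p pixel count
PADDED = 1088 * 1920    # input size after padding the height to 1088

def renormalize(kmac_per_pix, old_pixels, new_pixels):
    """Convert a per-pixel figure from one pixel count to another."""
    return kmac_per_pix * old_pixels / new_pixels

def relative_deviation(measured, reference):
    """Relative deviation of a measured value from a reference, in percent."""
    return 100.0 * (measured - reference) / reference

kmac_padded = renormalize(447.0, UNPADDED, PADDED)
print(f"447 kMAC/pix over unpadded pixels = {kmac_padded:.1f} over padded pixels")

# Values from this thread: ptflops 0.6.9 result vs. figures stated in the paper.
print(f"kMAC/pix deviation from paper: {relative_deviation(443.509, 433.0):.1f} %")
print(f"params deviation from paper:   {relative_deviation(117.924, 112.0):.1f} %")
```

Under that assumption the rescaled value is about 443.7 kMAC/pix, which is close to the 443.509 reported above.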

@mwindsp
Author

mwindsp commented Feb 13, 2025

OK, so do I understand correctly that $Z_{<t}^l$ is $[f_{t-2}^l, f_{t-1}^l]$ and not $[z_{t-2}^l, z_{t-1}^l]$? From the paper I understood that it is the latter, and that DHVC 2.0 switched to $[d_{t-2}^l, d_{t-1}^l]$.
As the paper on DHVC 2.0 states:

Instead of accumulating latent residual variables
across frames in DHVC 1.0, DHVC 2.0 has buffered
and utilized multiscale decoded features

And I interpreted "latent residual variables" as $[z_{t-2}^l, z_{t-1}^l]$.

@congwuyang
Collaborator

Yes, your understanding is correct. There is indeed some ambiguity in the notation. According to Figure 2(b) in DHVC-1.0, we believe that $f_t^l$ contains the information of $z_t^l$ (after the fuse_features_and_z() function) and can bring better performance.
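
The clarified reading, in which the temporal context at scale $l$ consists of the features of the two previous frames, can be pictured as a per-scale buffer of length two. The sketch below is purely illustrative: the class and method names are hypothetical and not from the DHVC code, and strings stand in for feature tensors.

```python
from collections import deque

class TemporalContextBuffer:
    """Keeps the features of the two most recent frames for each scale."""

    def __init__(self, num_scales, history=2):
        self.buffers = [deque(maxlen=history) for _ in range(num_scales)]

    def push(self, features_per_scale):
        """Store the features f_t^l of the frame just coded, one per scale."""
        for buf, feat in zip(self.buffers, features_per_scale):
            buf.append(feat)

    def context(self, scale):
        """Return [f_{t-2}^l, f_{t-1}^l] (shorter for the first frames)."""
        return list(self.buffers[scale])

buffer = TemporalContextBuffer(num_scales=4)
for t in range(3):  # code three frames
    buffer.push([f"f_{t}^{l}" for l in range(4)])

print(buffer.context(scale=0))  # features of frames 1 and 2 at scale 0
```

Using a bounded deque means the oldest entry is dropped automatically once a third frame is pushed, so the context never grows beyond two frames per scale.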
