DHVC complexity and number of parameters #4

mwindsp · 2025-02-05T15:44:31Z

Thank you for publishing your work!

I was calculating the complexity in kMAC/pix and the number of parameters and cannot reproduce the results in your paper.
I used PyTorch-OpCounter (thop) and the DHVC model generated by the function dhvc_base() in dhvc2.py.
The paper states that the complexity is calculated for a 1080p video (thus: frame = torch.rand([1, 3, 1088, 1920]), due to padding).

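```python
import torch
import thop

# dhvc_base() is defined in dhvc2.py of this repository (import omitted here).
model = dhvc_base()
frame = [torch.rand([1, 3, 1088, 1920])]  # one 1080p frame, height padded to 1088
macs, params = thop.profile(model=model, inputs=([frame]))
macs_per_pix = macs / (1080 * 1920)  # normalized by the unpadded pixel count
```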

I get 2236 kMAC/pix and 118M parameters, while the paper states 433 kMAC/pix and 112M parameters.
So, while the number of parameters is at least close to the results in the paper, the gap in kMAC/pix is quite significant.

How have you calculated your results? Are there any changes between the model in dhvc_base() and the one in the paper? Or am I making any errors in the calculation?

Update:
I found out that the MAC calculation is not always consistent. Repeating the full profiling process seems to help: I sometimes also got 894 kMAC/pix, and eventually most often 447 kMAC/pix. However, for this to work I had to reload the DHVC model using model = dhvc_base().
I don't know whether this behaviour is related to the DHVC source code, the thop library, or something specific to my system.

However, I still don't get exactly the same results in terms of MACs and number of parameters, so I am still curious whether the source code is identical to the model in the paper.
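
As a quick sanity check on the varying readings, the three values reported above can be compared with the smallest one. The sketch below only does that arithmetic. Reading near-integer multiples as counts that were accumulated across repeated profiling runs is an assumption and is not confirmed anywhere in this thread.

```python
# Readings reported in this issue, in kMAC/pix.
readings = [2236, 894, 447]
baseline = min(readings)

# Ratio of each reading to the smallest one.
ratios = {r: r / baseline for r in readings}
for reading, ratio in ratios.items():
    print(f"{reading} kMAC/pix = {ratio:.2f} x {baseline} kMAC/pix")
```

If the ratios come out close to whole numbers, re-creating the model before each profiling call, as described in the update, is a reasonable precaution.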


congwuyang · 2025-02-08T02:21:40Z

Thank you for your question. The code tested in the paper is consistent with the current code. Our measurement used get_model_complexity_info from ptflops (from ptflops import get_model_complexity_info), so could this be a problem with the thop library?

mwindsp · 2025-02-10T06:30:19Z

I tested it using ptflops, but I still get 447 kMAC/pix and 118M parameters.
I noticed that TemporalLatentBlock2, which is used in the DHVC model, returns the feature $f_t^l$ via additional['z']. According to the paper, it should return $z_t^l$ if get_latent = True.
I assume that this is the reason for the increase, since the channel dimension of the context changes from $2\cdot \text{zdim}$ to $2\cdot \text{width}$. While zdim is [32, 32, 96, 8] for the four latent blocks, width is [512, 512, 384, 256].
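
To make the size of that change concrete, the following sketch computes the context channel counts for both cases from the zdim and width values quoted above. The variable names are illustrative and not taken from the DHVC code. Treating the factor 2 as "two previous frames" is an assumption based on the notation used in this thread.

```python
# Values quoted in the comment above, one entry per latent block.
zdim = [32, 32, 96, 8]
width = [512, 512, 384, 256]

# Context channels if the buffered tensors were z (2 * zdim) or f (2 * width).
ctx_from_z = [2 * c for c in zdim]
ctx_from_f = [2 * c for c in width]

# Growth factor of the context channel count per latent block.
growth = [f // z for f, z in zip(ctx_from_f, ctx_from_z)]

print(ctx_from_z)  # [64, 64, 192, 16]
print(ctx_from_f)  # [1024, 1024, 768, 512]
print(growth)      # [16, 16, 4, 32]
```

Since the MAC count of a convolution scales linearly with its number of input channels, the first layer that consumes the context would become correspondingly more expensive. How much this affects the total depends on the rest of the network.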

congwuyang · 2025-02-13T03:13:30Z

Thank you for your question. We apologize for any ambiguity in the code. To clarify: as shown in Figure 2(b) of the paper, $z_t^l$ and $f_t^l$ correspond to z and additional['z'] in the code, respectively. The feature returned by the TemporalLatentBlock2 module when get_latent=True is additional['z']. In addition, we have updated the code for DHVC-1.0 and released the code for measuring kMACs/pix and Params (M). We tested with ptflops 0.6.9 and obtained 443.509 kMACs/pix and 117.924M parameters. Slight deviations may be due to different versions of the ptflops library.
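
One possible source of the remaining gap between 447 and 443.509 kMAC/pix, offered only as a hypothesis: the snippet at the top of this issue profiles a padded 1088 x 1920 input but divides by 1080 x 1920 pixels. The sketch below rescales the 443.509 figure under the assumption that it was normalized by the padded pixel count. The thread does not state which normalization the released test code uses.

```python
PADDED_H, ORIG_H, W = 1088, 1080, 1920

reported_kmac_per_pix = 443.509  # figure quoted above (ptflops 0.6.9)

# Assumption: 443.509 was normalized by the padded pixel count.
total_kmacs = reported_kmac_per_pix * PADDED_H * W

# Same total, normalized by the unpadded 1080p pixel count instead.
rescaled = total_kmacs / (ORIG_H * W)
print(f"{rescaled:.2f} kMAC/pix")
```

Under that assumption the rescaled value is about 446.79 kMAC/pix, which rounds to the 447 reported earlier in this thread.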

mwindsp · 2025-02-13T09:19:42Z

Ok, so do I understand correctly that $Z_{<t}^l$ is $[f_{t-2}^l, f_{t-1}^l]$ and not $[z_{t-2}^l, z_{t-1}^l]$? From the paper I understood that it is the latter, and that DHVC 2.0 switched to $[d_{t-2}^l, d_{t-1}^l]$.
As the paper on DHVC 2.0 states:

Instead of accumulating latent residual variables
across frames in DHVC 1.0, DHVC 2.0 has buffered
and utilized multiscale decoded features

And I interpreted "latent residual variables" as $[z_{t-2}^l, z_{t-1}^l]$.

congwuyang · 2025-02-13T09:57:26Z

Yes, your understanding is correct. There is indeed some ambiguity in our notation. According to Figure 2(b) of DHVC-1.0, we believe that $f_t^l$ contains the information of $z_t^l$ (after the fuse_features_and_z() function) and can bring better performance.
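
The buffering discussed here, where the context at each level holds the features of the two previous frames, can be illustrated with a small sketch. ContextBuffer and its methods are hypothetical names for illustration only and are not part of the DHVC code. Strings stand in for feature tensors.

```python
from collections import deque


class ContextBuffer:
    """Keeps the most recent per-level features (hypothetical helper)."""

    def __init__(self, num_levels, history=2):
        # One bounded queue per latent level; old entries drop out automatically.
        self.buffers = [deque(maxlen=history) for _ in range(num_levels)]

    def push(self, level, feature):
        self.buffers[level].append(feature)

    def context(self, level):
        # Oldest first, i.e. [f_{t-2}, f_{t-1}] once two frames have been seen.
        return list(self.buffers[level])


buf = ContextBuffer(num_levels=4)
for t in range(5):
    for level in range(4):
        buf.push(level, f"f_{t}^{level}")  # placeholder for a feature tensor

print(buf.context(0))  # ['f_3^0', 'f_4^0']
```

Whether the buffered items are $z$, $f$, or $d$ only changes what is pushed. The channel count of the resulting context, and therefore the cost of the layers consuming it, changes accordingly.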
