Can nvdla_small spec be modified with secondary SRAMIF and batch mode for NN_L0_1_small_fbuf? #265

Open
nookfoo opened this issue Feb 9, 2019 · 5 comments

Comments

@nookfoo

nookfoo commented Feb 9, 2019

Hello there,

I am running nvdla_small on a ZCU102 FPGA and am curious whether enabling batch mode or the secondary memory interface (sramif) will improve performance. I intend to try it anyway, but wanted to ask beforehand whether this is even sensible, since NN_L0_1_small_fbuf cannot be modified while the compiler remains unreleased.

Can NN_L0_1_small_fbuf make use of batch mode or the secondary sramif if they are enabled in the nvdla_small spec?

@JSnobody

@nookfoo Have you ever run nvdla_small on the ZCU102 FPGA successfully? Maybe we can discuss it together.

@ghost

ghost commented Feb 13, 2019

@nookfoo Let me know what you find out about batch mode. From my previous observations, enabling it hardly increases FPGA resource usage, which was quite confusing.

Enabling sramif would require BDMA, which is more or less a separate engine. My guess is that NN_L0_1_small_fbuf would need to physically contain instructions (operations) that drive BDMA.

I don't know whether sramif will work with the current nv_small spec, but at least you can measure the actual DDR performance with Xilinx's AXI Performance Monitor IP core. A nice thing about this core is that it also measures peak wait-state cycles on the AXI4 bus.

With 64-bit DDR4-2400 on the PS side, it is hardly possible to use the entire bandwidth from the PL side (with a single AXI4 bus). However, you may observe a considerable number of wait cycles. For example, with the FPGA running at 250 MHz and a single write channel (AXI4 Traffic Generator IP core), we measured the following performance:

  • Average bandwidth: 3.711 GB/s (95% of theoretical bandwidth for 128-bit AXI4 bus)
  • Peak latency (wait state): 1507 cycles - I guess this is where you would like to use PRIMARY_MEMIF_LATENCY_1024 in place of PRIMARY_MEMIF_LATENCY_64.
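
The figures above can be put in context with a little arithmetic. The following is only a rough sketch: the function names are made up for illustration, and it assumes ideal buses that move one full-width beat per clock (AXI4) or per transfer (DDR4), with no protocol overhead. Only the bus widths, clock rates and the 1507-cycle latency come from the comment.

```python
# Back-of-the-envelope numbers for the measurement described above.
# Function names are illustrative; only the numeric inputs come from the thread.

def peak_bandwidth_bytes_per_s(bus_width_bits: int, transfers_per_s: float) -> float:
    """Ideal peak throughput of a bus that moves bus_width_bits per transfer."""
    return bus_width_bits / 8 * transfers_per_s


def cycles_to_microseconds(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count at a given clock frequency to microseconds."""
    return cycles / clock_hz * 1e6


axi_peak = peak_bandwidth_bytes_per_s(128, 250e6)   # 128-bit AXI4 at 250 MHz
ddr_peak = peak_bandwidth_bytes_per_s(64, 2400e6)   # 64-bit DDR4-2400 (2400 MT/s)

print(f"AXI4 peak: {axi_peak / 1e9:.1f} GB/s")
print(f"DDR4 peak: {ddr_peak / 1e9:.1f} GB/s")
print(f"AXI4 share of DDR4 peak: {axi_peak / ddr_peak:.1%}")
print(f"1507 cycles at 250 MHz: {cycles_to_microseconds(1507, 250e6):.3f} us")
```

Under these assumptions a single 128-bit AXI4 port tops out at about 4.0 GB/s against roughly 19.2 GB/s for the DDR4 interface, which is why one port cannot saturate the memory. The measured 3.711 GB/s sits close to that single-port ceiling; the exact percentage depends on whether "GB" is taken as 10^9 or 2^30 bytes.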

Alas, the ZU9EG does not have URAM blocks, which would be very handy for an efficient sramif implementation.

@ghost

ghost commented Feb 25, 2019

@peterzh2018888 Have you ever heard of the "one bug, one bug report" rule? Whether it's a bug or a problem in your setup, in the last two days I've received roughly 17 notifications from you describing the same or a similar problem. It is starting to be indistinguishable from spamming...

Don't get me wrong... People are reading your posts. For example, I am subscribed to both the sw and hw projects, and I see everything that's happening here. But this is not a big community, so you are unlikely to get a response immediately (and hopefully the maintainers are busy with releasing the compiler 👍).

You may increase your chances by describing the problem in detail, including the environment setup (compiler version, compiler flags, ...) and the steps to reproduce the problem. A few people have already worked on Zynq. And as I recall, some have already had problems with DRM and interrupts, so maybe they will be kind enough to compare your workflow with theirs... By spamming them you only increase their level of annoyance :)

@shgangchen

Hi @nookfoo, I received your update about the synthesis problem; however, I can't find it here. Is it no longer a problem?

@huangwei858

Hey, I've posted some questions in other issues about how to configure BRAM to replace RAMDP/RAMPDP. Could you share your experience with using a logical RAM wrapper instead of the simulation RAM?
