Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using static for LUTs #354

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 6 additions & 6 deletions TrackletAlgorithm/MatchEngineUnit_parameters.h
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@

template<int kNBitsBuffer>
static const ap_uint<(1 << (2 * kNBitsBuffer))> nearFullUnit() {
ap_uint<(1 << (2 * kNBitsBuffer))> lut;
static ap_uint<(1 << (2 * kNBitsBuffer))> lut;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryates For this to work, I think you also need logic to prevent the reinitialization of the LUT. For example:

template<int kNBitsBuffer>
static const ap_uint<(1 << (2 * kNBitsBuffer))> nearFullUnit() {
  static ap_uint<(1 << (2 * kNBitsBuffer))> lut;
  static bool initialized = false;
  if (!initialized) {
    initialized = true;
    for(int i = 0; i < (1 << (2 * kNBitsBuffer)); ++i) {
#pragma HLS unroll
      ap_uint<kNBitsBuffer> wptr, rptr;
      ap_uint<2 * kNBitsBuffer> address(i);
      (rptr,wptr) = address;
      auto wptr1 = wptr+1;
      auto wptr2 = wptr+2;
      bool result = wptr1==rptr || wptr2==rptr;
      lut[i] = result;
    }
  }
  return lut;
}

Of course, this applies to the other functions in this file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryates You might also remove the static const on the return types of these functions, since they don't really have any effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryates For this to work, I think you also need logic to prevent the reinitialization of the LUT. For example:

template<int kNBitsBuffer>
static const ap_uint<(1 << (2 * kNBitsBuffer))> nearFullUnit() {
  static ap_uint<(1 << (2 * kNBitsBuffer))> lut;
  static bool initialized = false;
  if (!initialized) {
    initialized = true;
    for(int i = 0; i < (1 << (2 * kNBitsBuffer)); ++i) {
#pragma HLS unroll
      ap_uint<kNBitsBuffer> wptr, rptr;
      ap_uint<2 * kNBitsBuffer> address(i);
      (rptr,wptr) = address;
      auto wptr1 = wptr+1;
      auto wptr2 = wptr+2;
      bool result = wptr1==rptr || wptr2==rptr;
      lut[i] = result;
    }
  }
  return lut;
}

Of course, this applies to the other functions in this file.

The one issue with this solution is that it relies on "static bool". Non const statics are not thread safe, so this would not be useable from the Future CMSSW SW (unless we used pragmas to provide a thread protection for this static variable when run inside CMSSW)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point. @aehart what do you think? I guess I can't say for sure if this bool did anything, but the resource utilization and timing didn't change much. Maybe @tomalin can test this in CMSSW with and without the bool to see if the CPU usage still blows up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bool is key to this solution. Without it, every call to these functions will cause the LUTs to be reinitialized. The static keyword alone doesn't magically change that.

In any case, the LUT is also a non-const static. I didn't realize this was a constraint. I'll need to think about an alternative solution.

for(int i = 0; i < (1 << (2 * kNBitsBuffer)); ++i) {
#pragma HLS unroll
ap_uint<kNBitsBuffer> wptr, rptr;
Expand All @@ -30,7 +30,7 @@ static bool nearFullUnitBool(ap_uint<kNBitsBuffer> rptr, ap_uint<kNBitsBuffer> w

template<int kNBitsBuffer>
static const ap_uint<(1 << (2 * kNBitsBuffer))> nearFull3Unit() {
ap_uint<(1 << (2 * kNBitsBuffer))> lut;
static ap_uint<(1 << (2 * kNBitsBuffer))> lut;
for(int i = 0; i < (1 << (2 * kNBitsBuffer)); ++i) {
#pragma HLS unroll
ap_uint<kNBitsBuffer> wptr, rptr;
Expand All @@ -55,7 +55,7 @@ static bool nearFull3UnitBool(ap_uint<kNBitsBuffer> rptr, ap_uint<kNBitsBuffer>

template<int kNBitsBuffer>
static const ap_uint<(1 << (2 * kNBitsBuffer))> nearFull4Unit() {
ap_uint<(1 << (2 * kNBitsBuffer))> lut;
static ap_uint<(1 << (2 * kNBitsBuffer))> lut;
for(int i = 0; i < (1 << (2 * kNBitsBuffer)); ++i) {
#pragma HLS unroll
ap_uint<kNBitsBuffer> wptr, rptr;
Expand All @@ -82,7 +82,7 @@ static bool nearFull4UnitBool(ap_uint<kNBitsBuffer> rptr, ap_uint<kNBitsBuffer>

template<int kNBitsBuffer>
static const ap_uint<(1 << (2 * kNBitsBuffer))> emptyUnit() {
ap_uint<(1 << (2 * kNBitsBuffer))> lut;
static ap_uint<(1 << (2 * kNBitsBuffer))> lut;
for(int i = 0; i < (1 << (2 * kNBitsBuffer)); ++i) {
#pragma HLS unroll
ap_uint<kNBitsBuffer> wptr, rptr;
Expand All @@ -104,7 +104,7 @@ static bool emptyUnitBool(ap_uint<kNBitsBuffer> wptr, ap_uint<kNBitsBuffer> rptr

template<int kNBitsBuffer>
static const ap_uint<(1 << (2 * kNBitsBuffer))> geq() {
ap_uint<(1 << (2 * kNBitsBuffer))> lut;
static ap_uint<(1 << (2 * kNBitsBuffer))> lut;
for(int i = 0; i < (1 << (2 * kNBitsBuffer)); ++i) {
#pragma HLS unroll
ap_uint<kNBitsBuffer> istub, nstubs;
Expand All @@ -118,7 +118,7 @@ static const ap_uint<(1 << (2 * kNBitsBuffer))> geq() {

template<int kNBitsBuffer>
static const ap_uint<(1 << kNBitsBuffer)> nextUnit() {
ap_uint<(1 << kNBitsBuffer)> lut;
static ap_uint<(1 << kNBitsBuffer)> lut;
for(int i = 0; i < (1 << kNBitsBuffer); ++i) {
#pragma HLS unroll
ap_uint<kNBitsBuffer> ptr(i);
Expand Down
Loading