Roialign fix and half_pixel mode support #3482
base: develop
…ng in a dimension of size 1
…X into stride_ordering_for_mlir
…code. Tests need to be completed, including updating generated onnx test files.
…_half_pixel_verify_test for first roi but fails for second
…k in progress with debug code.
…ive correct result only for ROI in bounds.
@@ -41,7 +41,7 @@ TEST_CASE(roialign_test)
 {{"coordinate_transformation_mode", "output_half_pixel"},
 {"spatial_scale", 2.0f},
 {"output_height", 5},
-{"output_width", 5},
+{"output_width", 3},
This particular change was a big deal, btw, since the old code appeared to work fine until I gave it an output_height and output_width that were not the same.
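One way to see why a square output can hide indexing errors: the sketch below compares a correct row-major flat index against a hypothetical buggy one that uses the output height as the row stride. The function names and the bug itself are illustrative assumptions, not taken from the MIGraphX source.

```python
def flat_index(ph, pw, pooled_width):
    """Row-major flat index of output bin (ph, pw)."""
    return ph * pooled_width + pw


def flat_index_buggy(ph, pw, pooled_height):
    """Hypothetical bug: the row stride is the height instead of the width."""
    return ph * pooled_height + pw


def mismatches(pooled_height, pooled_width):
    """List every output bin where the two index formulas disagree."""
    return [
        (ph, pw)
        for ph in range(pooled_height)
        for pw in range(pooled_width)
        if flat_index(ph, pw, pooled_width) != flat_index_buggy(ph, pw, pooled_height)
    ]


# With a square 5x5 output the two formulas agree everywhere, so the bug is invisible.
print(len(mismatches(5, 5)))
# With a 5x3 output every bin below the first row disagrees.
print(len(mismatches(5, 3)))
```

A test whose output_height and output_width differ exercises this class of error, which is why the 5x3 shape is a stronger test than 5x5.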
migraphx::shape srois{migraphx::shape::float_type, {2, 4}};
std::vector<float> rois_data = {1.1, 0.73, 1.7, 1.13, 1.1, 0.73, 2.6, 1.13};
migraphx::shape sbi{migraphx::shape::int64_type, {2}}; // batch_index
std::vector<int64_t> bi_data = {0, 1};
For good measure, can you add tests with the following cases:
- Repeated batch indices
- Missing batch indices (i.e. not all batch items are computed on)
- Number of ROIs != batch_size
You can probably just create one test case to cover all of these. Make the input batch_size 3 and the batch_indices something like {1,2,2,1} (and hence the rois shape will be {4,4}).
It would be good to have a GPU verify test for this same case too, just to be sure the GPU implementation matches.
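A minimal sketch of the inputs such a combined test case could use, written with NumPy for brevity. Only batch_size 3, the batch indices {1,2,2,1}, and the {4,4} rois shape come from the comment above; the channel count, spatial size, and ROI coordinates are arbitrary assumptions.

```python
import numpy as np

# batch_size comes from the review comment; the other dimensions are assumed.
batch_size, channels, height, width = 3, 2, 4, 4

x = np.arange(batch_size * channels * height * width, dtype=np.float32)
x = x.reshape(batch_size, channels, height, width)

# One batch index per ROI: index 0 is never used, indices 1 and 2 repeat.
batch_indices = np.array([1, 2, 2, 1], dtype=np.int64)

# Four ROIs as (x1, y1, x2, y2); the coordinate values are arbitrary.
rois = np.array(
    [
        [0.5, 0.5, 2.5, 2.5],
        [0.0, 0.0, 3.0, 3.0],
        [1.0, 0.5, 3.5, 2.0],
        [0.5, 1.0, 2.0, 3.5],
    ],
    dtype=np.float32,
)

# Confirm that the single case covers all three requested conditions.
used = set(batch_indices.tolist())
has_repeats = len(used) < len(batch_indices)
missing = sorted(set(range(batch_size)) - used)
rois_differ_from_batch = rois.shape[0] != batch_size

print(rois.shape, has_repeats, missing, rois_differ_from_batch)
```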
Added such a case in ref. tests, along with other updates. Some of the new cases fail and I'm now debugging those.
A whitespace in the table needs to be removed.
docs/dev/onnx_operators.rst
Outdated
@@ -697,7 +697,7 @@ Operator Support Matrix | |||
| | | | functions are | | |||
| | | | not enabled | | |||
+--------------------------+-----------+-----------------+------------------------------+ | |||
| RoiAlign | ✅ | FP8, FP16, | | | |||
| RoiAlign | ✅ | FP8, FP16, | | |
The extra whitespace at the end of this row is causing the table to be improperly formatted and not appear on the doc page.
| RoiAlign | ✅ | FP8, FP16, | | | |
| RoiAlign | ✅ | FP8, FP16, | | |
…variety of options including pooling mode, transformation type, spatial scale, multiple input channels, non-symmetrical output shape, and roi index list with skips and duplicates. Changed roialign_half_pixel_verify_test to match one of the new ref test cases. Cases using max pooling do not pass test.
test/ref/roialign.cpp
Outdated
@@ -84,114 +84,164 @@ TEST_CASE(roialign_out_of_bound_test)
}
}

auto create_program(const std::string& trans_mode = "half_pixel",
Notes on ref tests: added cases with all 4 combinations of trans_mode and pooling_mode and split them apart into separate named cases. The modified create_program() has a reshaped input with multiple channels, multiple layers, an ROI list that doesn't match 1-1 with the layers, non-unity scale and sampling ratio, one negative data value (any float value is legal) and an ROI that goes out of bounds (also legal). Also, the output height and width are no longer equal; having them equal masked errors in the original implementation.
Need to debug the max pooling cases!
…ues for roialign with max pooling were found to be erroneous.
The licensing check failure now occurring is for a file not related to this PR.
I think it would be possible to add a test following the model of the existing tests in test/py/. With luck it wouldn't be very much extra work, half a day or so. @pfultz2 what do you think? The rationale for adding an op test here is that the ROIAlign op is defined in terms of the Onnxruntime implementation so it makes sense to have a specialized test with ORT as the reference. Note my recent comment that I learned the ORT implementation of the max pooling option is buggy and can't be used for a test reference until the fix is released. I don't know whether max pooling is widely used with this op or not.
Do you want me to go over it with you? I can explain the intent of nearly everything but the indexing is still very difficult to unravel.
test/py/test_roialign.py
Outdated
# XXXXX 0x562d956ec8f0 (0x562d956ec8f0 + 0 * 2 + channel 0) * 4 * 3
# XXXXX 0x562d956ec920 (0x562d956ec8f0 + 0 * 2 + channel 1) * 4 * 3
res = sess.run(['y'], {'x': data, 'rois': roi_data, 'batch_ind': index_data})
assert np.allclose(mgx_result, res, rtol=1e-05, atol=1e-08, equal_nan=False)
The tolerances are the NumPy defaults.
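For reference, np.allclose with rtol=1e-05 and atol=1e-08 accepts an element when |actual - expected| <= atol + rtol * |expected|. The sketch below spells that check out by hand and compares it with np.allclose; the helper name within_tolerance and the sample values are illustrative only.

```python
import numpy as np

RTOL, ATOL = 1e-05, 1e-08  # the values passed explicitly in the test


def within_tolerance(actual, expected, rtol=RTOL, atol=ATOL):
    """Hand-written form of the element-wise check np.allclose performs."""
    actual = np.asarray(actual, dtype=np.float64)
    expected = np.asarray(expected, dtype=np.float64)
    return bool(np.all(np.abs(actual - expected) <= atol + rtol * np.abs(expected)))


expected = np.array([1.0, 100.0, 0.0])
# Each error is below atol + rtol * |expected| for its element.
close = expected + np.array([5e-6, 5e-4, 5e-9])
# The first error is about five times the allowed bound.
far = expected + np.array([5e-5, 0.0, 0.0])

print(within_tolerance(close, expected), np.allclose(close, expected))
print(within_tolerance(far, expected), np.allclose(far, expected))
```

Because the relative term scales with the second argument, the check is not symmetric: the reference result (here the onnxruntime output) belongs in the second position.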
Requesting re-review after a recent change: added a Python test. To repeat an earlier comment: we can't do a similar check vs. onnxruntime for "max" pooling mode because the ORT implementation of max pooling in ROIAlign has a known bug.
This build is not recommended to merge 🔴
🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
You should capture the onnxruntime results and just create a ref test.
if __name__ == "__main__":
    test_roialign()
This file is not even used in our test suite. It should just be removed and a ref test should be used.
Fix bugs in the implementation of the ROIAlign operation which were found when attempting to run it with the half_pixel coordinate conversion mode, and add more thorough tests. Some bugs are mode-specific and some are not.

The ROIAlign operation was first proposed in a paper at https://arxiv.org/abs/1703.06870v3 which introduced the Mask R-CNN model. It was a variant of the ROIPool operation which was found to give significantly better accuracy. In the implementations in Torch, Onnxruntime, and MIGraphX, ROIPool and ROIAlign are implemented in the same op with different choices for the mode attribute: output_half_pixel for ROIPool and half_pixel for ROIAlign. Thus, there is no ROIAlign op without fixing the half_pixel mode.

Note, by the way, that these same coordinate conversion modes are also attributes of the Resize op.

MIGraphX uses the Onnxruntime implementation of ROIAlign as its functional specification and should give identical results.

This change is a prerequisite for torch-migraphx PR #143 but does not close it.
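As a rough illustration of what the two coordinate conversion modes change, the sketch below maps an ROI from input coordinates onto the feature map. It follows my reading of the ONNX RoiAlign operator documentation, where half_pixel shifts the scaled coordinates by -0.5 and output_half_pixel omits that shift. The function name roi_on_feature_map is hypothetical, and the sketch leaves out everything else the operator does (bin sampling, bilinear interpolation, pooling), so it is not a reference for the MIGraphX or Onnxruntime code.

```python
def roi_on_feature_map(roi, spatial_scale, mode):
    """Scale an (x1, y1, x2, y2) ROI and apply the mode's pixel shift.

    Assumption: "half_pixel" subtracts 0.5 after scaling and
    "output_half_pixel" applies no shift.
    """
    if mode == "half_pixel":
        offset = 0.5
    elif mode == "output_half_pixel":
        offset = 0.0
    else:
        raise ValueError(f"unknown coordinate_transformation_mode: {mode}")
    return tuple(coord * spatial_scale - offset for coord in roi)


# First ROI and spatial_scale from the ref test quoted earlier in this PR.
roi = (1.1, 0.73, 1.7, 1.13)
for mode in ("half_pixel", "output_half_pixel"):
    print(mode, roi_on_feature_map(roi, 2.0, mode))
```

The shift moves the whole ROI by half a pixel without changing its width or height, so the two modes sample different locations of the same input.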