Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests fail with nvfortran compiler #531

Open
timofeymukha opened this issue Apr 18, 2023 · 12 comments
Open

Tests fail with nvfortran compiler #531

timofeymukha opened this issue Apr 18, 2023 · 12 comments

Comments

@timofeymukha
Copy link

Hello! We are trying to use json-fortran with the nvfortran compiler. Unfortunately, the test suite reports failure of several tests

The following tests FAILED:
	  3 - jf_test_01 (SEGFAULT)
	  5 - jf_test_03 (SEGFAULT)
	  9 - jf_test_07 (SEGFAULT)
	 10 - jf_test_08 (SEGFAULT)
	 12 - jf_test_10 (Failed)
	 16 - jf_test_14 (SEGFAULT)
	 18 - jf_test_16 (SEGFAULT)
	 25 - jf_test_23 (SEGFAULT)
	 31 - jf_test_29 (SEGFAULT)
	 32 - jf_test_30 (Failed)
	 47 - jf_test_45 (SEGFAULT)
	 54 - regression-test2.json (Failed)

Perhaps the pattern in which tests fail give a hint of where to look for the issue? The full failing CI run is available here. Any help is highly appreciated!

@jacobwilliams
Copy link
Owner

Interesting. I've never used the nvfortran compiler. I can try to take a look when I get a chance.

@timofeymukha
Copy link
Author

Thanks, it would be really appreciated. But chances are that it is the compiler's fault.

@EmilyBourne
Copy link
Contributor

EmilyBourne commented Jan 31, 2024

I would also like to use the NVHPC compiler.

I am testing with version 23.3. It raises a warning during compilation:

NVFORTRAN-W-0194-INTENT(IN) argument cannot be defined - p (/home/json-fortran/src/json_value_module.F90: 1935)
  0 inform,   1 warnings,   0 severes, 0 fatal for json_value_rename

This is coming from this code:

subroutine json_value_rename(json,p,name)
implicit none
class(json_core),intent(inout) :: json
type(json_value),pointer,intent(in) :: p
character(kind=CK,len=*),intent(in) :: name !! new variable name
if (json%trailing_spaces_significant) then
p%name = name
else
p%name = trim(name)
end if
end subroutine json_value_rename

The compiler seems to be correct. The argument p is marked as intent(in) so it should not be modified. I think that the correct solution is to change the intent to inout. This then has repercussions on multiple functions.

However fixing this issue doesn't seem to fix the tests (although I only managed to run 7 of the 56 tests due to the cmake issue mentioned in #539). The (almost identical) results are:

	  3 - jf_test_01 (SEGFAULT)
	  5 - jf_test_03 (SEGFAULT)
	  9 - jf_test_07 (SEGFAULT)
	 10 - jf_test_08 (SEGFAULT)
	 12 - jf_test_10 (Failed)
	 16 - jf_test_14 (SEGFAULT)
	 18 - jf_test_16 (SEGFAULT)
	 25 - jf_test_23 (Failed)
	 31 - jf_test_29 (SEGFAULT)
	 32 - jf_test_30 (Failed)
	 47 - jf_test_45 (SEGFAULT)
	 54 - regression-test2.json (Failed)

@jacobwilliams
Copy link
Owner

Interesting. However, intent(in) for pointer arguments doesn't mean their attributes can't be changed, it only means we can't change what the pointer is pointing too. So I think this code is legal.

It could be a compile bug?

@EmilyBourne
Copy link
Contributor

Interesting. However, intent(in) for pointer arguments doesn't mean their attributes can't be changed, it only means we can't change what the pointer is pointing too. So I think this code is legal.

It could be a compile bug?

It does indeed seem to be a compile bug. It has already been reported and fixed here:
https://forums.developer.nvidia.com/t/incorrect-warning-for-intent-in-pointer-to-type-in-nvfortran-22-1/201471

As far as I can tell this particular problem has minimal impact as it is only a warning anyway.

Given this has been fixed I tried a more recent compiler (23.11 second-to-most-recent). The results are:

	  3 - jf_test_01 (SEGFAULT)
	  5 - jf_test_03 (SEGFAULT)
	 10 - jf_test_08 (SEGFAULT)
	 12 - jf_test_10 (Failed)
	 16 - jf_test_14 (SEGFAULT)
	 18 - jf_test_16 (SEGFAULT)
	 25 - jf_test_23 (Failed)
	 31 - jf_test_29 (SEGFAULT)
	 32 - jf_test_30 (Failed)
	 47 - jf_test_45 (SEGFAULT)

So good news! jf_test_07 no longer segfaults! But that still leaves several problematic tests.

I looked into the first segfault. The behaviour is quite strange. I am struggling to put together a reproducer but it seems that when json_value_module::json_traverse::traverse calls json_value_module::json_count it fails to pass through the pointer correctly. In json_value_module::json_count p is a nullptr. The easiest/cleanest fix that I have found for this is to make json_value_module::json_traverse recursive directly and avoid the function inside the function. Do you see any reason not to do this? If not then I can fork and open a PR.

@EmilyBourne
Copy link
Contributor

EmilyBourne commented Feb 1, 2024

Interestingly jf_test_03 also fails in a subroutine due to problems passing arguments (in json_value_module::json_check_all_for_duplicate_keys::duplicate_key_func)

@jacobwilliams
Copy link
Owner

I'm not sure why I did json_traverse like that. I wonder if there was some reason... Maybe we can try making it recursive and eliminating the function within a function.

@EmilyBourne
Copy link
Contributor

I'm not sure why I did json_traverse like that. I wonder if there was some reason... Maybe we can try making it recursive and eliminating the function within a function.

I can open a PR for this

@jacobwilliams
Copy link
Owner

Current status on this is at: #549

It doesn't seem like the proposed solution is going to work. Probably a workaround could be created but I'm not sure it can be done without changing the api in some backwards-incompatible way, which I don't really want to do for a compiler bug. Can we report this bug to the nvfortran folks?

@EmilyBourne
Copy link
Contributor

Can we report this bug to the nvfortran folks?

I have already reported the whole repo to the Nvidia guys.

Here is an update for version 23.4 of Nvfortran:

	  5 - jf_test_03 (SEGFAULT)
	  9 - jf_test_07 (SEGFAULT)
	 10 - jf_test_08 (SEGFAULT)
	 12 - jf_test_10 (SEGFAULT)
	 18 - jf_test_16 (SEGFAULT)
	 25 - jf_test_23 (Failed)
	 31 - jf_test_29 (SEGFAULT)
	 32 - jf_test_30 (Failed)

jf_test_01 is now fixed as is jf_test_14 and jf_test_45, however jf_test_07 which was working previously now appears to be broken

@jacobwilliams jacobwilliams mentioned this issue May 27, 2024
@EmilyBourne
Copy link
Contributor

Here is an update for version 23.4 of Nvfortran:

The same tests (but no extras) are failing with version 24.5

@JorgeG94
Copy link

Here is an update for version 23.4 of Nvfortran:

The same tests (but no extras) are failing with version 24.5

I am interested in building also with nvhpc so I might take a squiz over the weekend

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants