
[onert] Series run of unittest fail on ubuntu 24.04 arm device #14391

Closed
hseok-oh opened this issue Dec 2, 2024 · 14 comments
Labels
area/onert (ONE runtime) · type/issue (There is something strange)

Comments

@hseok-oh
Contributor

hseok-oh commented Dec 2, 2024

Running all unittests (release build) on XU4, Ubuntu 24.04:

$ ./Product/armv7l-linux.release/out/test/onert-test unittest

...

[       OK ] GenModelTest.neg_OneOp_FullyConnected_NoBias (1 ms)
[ RUN      ] GenModelTest.OneOp_Gather_Q4_0
/home/nfs/git/ONE/Product/armv7l-linux.release/out/test/command/unittest: line 78:  1563 Illegal instruction     $TEST_BIN $(get_gtest_option)
/home/nfs/git/ONE/Product/armv7l-linux.release/out/unittest/nnfw_api_gtest failed... return code: 132
============================================
Finishing set 6: /home/nfs/git/ONE/Product/armv7l-linux.release/out/unittest/nnfw_api_gtest...
============================================

But a standalone nnfw_api_gtest run passed:

$ ./Product/out/unittest/nnfw_api_gtest

...

[       OK ] GenModelTest/WhileWrongSubgraphIndex.neg_Test/4 (0 ms)
[----------] 5 tests from GenModelTest/WhileWrongSubgraphIndex (3 ms total)

[----------] Global test environment tear-down
[==========] 650 tests from 35 test suites ran. (12359 ms total)
[  PASSED  ] 650 tests.

This issue occurs only on release builds (not on debug builds).
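Aside (not from the thread): the `return code: 132` in the log above follows the shell convention of reporting 128 + signal number for a child killed by a signal, so 132 decodes to signal 4, SIGILL. A quick sketch:

```python
import signal

def decode_exit(code):
    # Shells report 128 + signal number when a child process dies from a signal.
    if code > 128:
        return signal.Signals(code - 128).name
    return None  # normal exit status, not a signal

print(decode_exit(132))  # SIGILL (signal 4 on Linux)
```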

@hseok-oh added the area/onert (ONE runtime) and type/issue (There is something strange) labels on Dec 2, 2024
@glistening
Contributor

glistening commented Dec 3, 2024

On my RPI4, ubuntu mate 22.04 (armv7l):

$ valgrind Product/armv7l-linux.release/out/unittest/nnfw_api_gtest \
--gtest_filter="GenModelTest.neg*"

...
[       OK ] GenModelTest.neg_OneOp_FullyConnected_NoBias (189 ms)
[ RUN      ] GenModelTest.neg_OneOp_Gather_Q4_0_InvalidOutType
disInstr(thumb): unhandled instruction: 0xEEB3 0x6A46

==14108== Process terminating with default action of signal 4 (SIGILL)
==14108==  Illegal opcode at address 0x36B6DB
==14108==    at 0x36B6DA: quantize_row_q4_0_ref (ggml-quants.c:721)
==14108==    by 0x36CE0F: quantize_q4_0 (ggml-quants.c:3182)
==14108==    by 0x36B5AF: ggml_quantize_chunk (ggml.c:20756)
==14108==    by 0x154803: quantData(std::vector<float, std::allocator<float> > const&, circle::TensorType) (common.cc:56)
==14108==    by 0x239C7F: GenModelTest_neg_OneOp_Gather_Q4_0_InvalidOutType_Test::TestBody() (Gather.test.cc:62)
==14108==    by 0x368A67: HandleSehExceptionsInMethodIfSupported<testing::Test, void> (gtest.cc:2599)
==14108==    by 0x368A67: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2635)
==14108==    by 0x35D21B: testing::Test::Run() [clone .part.0] (gtest.cc:2674)
==14108==    by 0x35D6A1: Run (gtest.cc:2665)
==14108==    by 0x35D6A1: testing::TestInfo::Run() (gtest.cc:2853)
==14108==    by 0x35DDCD: testing::TestSuite::Run() [clone .part.0] (gtest.cc:3012)
==14108==    by 0x35ED7B: Run (gtest.cc:2986)
==14108==    by 0x35ED7B: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5870)
==14108==    by 0x35D7DB: HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (gtest.cc:2599)
==14108==    by 0x35D7DB: HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (gtest.cc:2635)
==14108==    by 0x35D7DB: testing::UnitTest::Run() (gtest.cc:5444)
==14108==    by 0x12C21B: RUN_ALL_TESTS (gtest.h:2293)
==14108==    by 0x12C21B: main (main.cc:38)

@glistening
Contributor

glistening commented Dec 3, 2024

One more:

$ valgrind Product/armv7l-linux.release/out/unittest/nnfw_api_gtest \
--gtest_filter="GenModelTrain.TestModel_Trainability_full_training_enabled"

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from GenModelTrain
[ RUN      ] GenModelTrain.TestModel_Trainability_full_training_enabled
disInstr(thumb): unhandled instruction: 0xEF41 0x0CBD
==14444== valgrind: Unrecognised instruction at address 0x5db4a7b.
==14444==    at 0x5DB4A7A: vfma_f32 (arm_neon.h:1723)
==14444==    by 0x5DB4A7A: pmadd<__vector(2) float> (PacketMath.h:1045)
==14444==    by 0x5DB4A7A: pmadd (ConjHelper.h:98)
==14444==    by 0x5DB4A7A: madd<__vector(2) float, __vector(2) float, __vector(2) float, Eigen::internal::FixedInt<0> > (GeneralBlockPanelKernel.h:529)
==14444==    by 0x5DB4A7A: madd<__vector(2) float, __vector(2) float, Eigen::internal::FixedInt<0> > (GeneralBlockPanelKernel.h:538)
==14444==    by 0x5DB4A7A: peeled_kc_onestep (GeneralBlockPanelKernel.h:1212)
==14444==    by 0x5DB4A7A: operator() (GeneralBlockPanelKernel.h:1408)
==14444==    by 0x5DB4A7A: Eigen::internal::gebp_kernel<float, float, int, Eigen::internal::blas_data_mapper<float, int, 0, 0, 1>, 12, 4, false, false>::operator()(Eigen::internal::blas_data_mapper<float, int, 0, 0, 1> const&, float const*, float const*, int, int, int, float, int, int, int, int) [clone .constprop.0] (GeneralBlockPanelKernel.h:2415)
==14444==    by 0x5DED1F3: void Eigen::TensorContractionEvaluatorBase<Eigen::TensorEvaluator<Eigen::TensorContractionOp<std::array<Eigen::IndexPair<int>, 1u> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorImagePatchOp<-1, -1, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice> >::evalGemmPartial<true, true, false, 0, true>(float*, int, int, int) const (TensorContraction.h:900)
==14444==    by 0x5E15D4B: evalGemm<true, true, false, 0> (TensorContraction.h:787)
==14444==    by 0x5E15D4B: evalProductSequential<true, true, false, 0> (TensorContraction.h:724)
==14444==    by 0x5E15D4B: void Eigen::TensorEvaluator<Eigen::TensorContractionOp<std::array<Eigen::IndexPair<int>, 1u> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorImagePatchOp<-1, -1, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::evalProductImpl<Eigen::TensorEvaluator<Eigen::TensorContractionOp<std::array<Eigen::IndexPair<int>, 1u> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorImagePatchOp<-1, -1, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, 0>(float*, Eigen::TensorEvaluator<Eigen::TensorContractionOp<std::array<Eigen::IndexPair<int>, 1u> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorImagePatchOp<-1, -1, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback) const [clone .isra.0] (TensorContractionThreadPool.h:184)
==14444==    by 0x5E16771: evalProduct<0> (TensorContractionThreadPool.h:77)
==14444==    by 0x5E16771: evalTo (TensorContraction.h:703)
==14444==    by 0x5E16771: evalSubExprsIfNeeded (TensorContraction.h:609)
==14444==    by 0x5E16771: evalSubExprsIfNeeded (TensorMorphing.h:163)
==14444==    by 0x5E16771: evalSubExprsIfNeeded (TensorAssign.h:155)
==14444==    by 0x5E16771: Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 4, 1, int>, 16, Eigen::MakePointer>, Eigen::TensorReshapingOp<Eigen::DSizes<int, 4> const, Eigen::TensorContractionOp<std::array<Eigen::IndexPair<int>, 1u> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorImagePatchOp<-1, -1, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const, Eigen::NoOpOutputKernel const> const> const> const, Eigen::ThreadPoolDevice, true, (Eigen::internal::TiledEvaluation)0>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 4, 1, int>, 16, Eigen::MakePointer>, Eigen::TensorReshapingOp<Eigen::DSizes<int, 4> const, Eigen::TensorContractionOp<std::array<Eigen::IndexPair<int>, 1u> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorImagePatchOp<-1, -1, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const, Eigen::NoOpOutputKernel const> const> const> const&, Eigen::ThreadPoolDevice const&) (TensorExecutor.h:336)
==14444==    by 0x5DCC045: operator=<Eigen::TensorReshapingOp<const Eigen::DSizes<int, 4>, const Eigen::TensorContractionOp<const std::array<Eigen::IndexPair<int>, 1>, const Eigen::TensorReshapingOp<const Eigen::DSizes<int, 2>, const Eigen::TensorImagePatchOp<-1, -1, const Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16> > >, const Eigen::TensorReshapingOp<const Eigen::DSizes<int, 2>, const Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16> >, const Eigen::NoOpOutputKernel> > > (TensorDevice.h:40)
==14444==    by 0x5DCC045: operator() (Conv.h:247)
==14444==    by 0x5DCC045: Conv (Conv.h:282)
==14444==    by 0x5DCC045: operator() (Conv.h:101)
==14444==    by 0x5DCC045: operator() (Conv.h:86)
==14444==    by 0x5DCC045: onert::backend::cpu::ops::ConvolutionLayer::convFloat32() (ConvolutionLayer.cc:63)
==14444==    by 0x4DD2B89: onert::exec::train::TrainableFnSequence::forward(bool) (TrainableFnSequence.cc:30)
==14444==    by 0x4DD02E1: onert::exec::train::TrainableExecutor::forwardImpl(onert::exec::ExecutionObservee const&, bool) (TrainableExecutor.cc:132)
==14444==    by 0x4DD0483: onert::exec::train::TrainableExecutor::forward(std::vector<onert::backend::IPortableTensor*, std::allocator<onert::backend::IPortableTensor*> > const&, std::vector<onert::backend::IPortableTensor*, std::allocator<onert::backend::IPortableTensor*> > const&, onert::exec::ExecutionOptions const&, bool) (TrainableExecutor.cc:93)
==14444==    by 0x4DD2351: onert::exec::train::TrainableExecutors::forward(onert::exec::ExecutionContext const&, std::vector<std::unique_ptr<onert::backend::builtin::UserTensor, std::default_delete<onert::backend::builtin::UserTensor> >, std::allocator<std::unique_ptr<onert::backend::builtin::UserTensor, std::default_delete<onert::backend::builtin::UserTensor> > > >&, bool) (TrainableExecutors.cc:123)
==14444==    by 0x4DD2759: onert::exec::train::TrainableExecutors::train(onert::exec::ExecutionContext const&, unsigned int) (TrainableExecutors.cc:81)
==14444==    by 0x4DA5A2B: onert::exec::Execution::train(unsigned int) (Execution.cc:172)
==14444==    by 0x489CC49: nnfw_session::train_run(bool) (nnfw_api_internal.cc:1593)
==14444== Your program just tried to execute an instruction that Valgrind
==14444== did not recognise.  There are two possible reasons for this.
==14444== 1. Your program has a bug and erroneously jumped to a non-code
==14444==    location.  If you are running Memcheck and you just saw a
==14444==    warning about a bad jump, it's probably your program's fault.
==14444== 2. The instruction is legitimate but Valgrind doesn't handle it,
==14444==    i.e. it's Valgrind's fault.  If you think this is the case or
==14444==    you are not sure, please let us know and we'll try to fix it.
==14444== Either way, Valgrind will now raise a SIGILL signal which will
==14444== probably kill your program.
==14444== 
==14444== Process terminating with default action of signal 4 (SIGILL)
==14444==  Illegal opcode at address 0x5DB4A7B
==14444==    at 0x5DB4A7A: vfma_f32 (arm_neon.h:1723)
==14444==    by 0x5DB4A7A: pmadd<__vector(2) float> (PacketMath.h:1045)
==14444==    by 0x5DB4A7A: pmadd (ConjHelper.h:98)
==14444==    by 0x5DB4A7A: madd<__vector(2) float, __vector(2) float, __vector(2) float, Eigen::internal::FixedInt<0> > (GeneralBlockPanelKernel.h:529)
==14444==    by 0x5DB4A7A: madd<__vector(2) float, __vector(2) float, Eigen::internal::FixedInt<0> > (GeneralBlockPanelKernel.h:538)
==14444==    by 0x5DB4A7A: peeled_kc_onestep (GeneralBlockPanelKernel.h:1212)
==14444==    by 0x5DB4A7A: operator() (GeneralBlockPanelKernel.h:1408)
==14444==    by 0x5DB4A7A: Eigen::internal::gebp_kernel<float, float, int, Eigen::internal::blas_data_mapper<float, int, 0, 0, 1>, 12, 4, false, false>::operator()(Eigen::internal::blas_data_mapper<float, int, 0, 0, 1> const&, float const*, float const*, int, int, int, float, int, int, int, int) [clone .constprop.0] (GeneralBlockPanelKernel.h:2415)
==14444==    by 0x5DED1F3: void Eigen::TensorContractionEvaluatorBase<Eigen::TensorEvaluator<Eigen::TensorContractionOp<std::array<Eigen::IndexPair<int>, 1u> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorImagePatchOp<-1, -1, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice> >::evalGemmPartial<true, true, false, 0, true>(float*, int, int, int) const (TensorContraction.h:900)
==14444==    by 0x5E15D4B: evalGemm<true, true, false, 0> (TensorContraction.h:787)
==14444==    by 0x5E15D4B: evalProductSequential<true, true, false, 0> (TensorContraction.h:724)
==14444==    by 0x5E15D4B: void Eigen::TensorEvaluator<Eigen::TensorContractionOp<std::array<Eigen::IndexPair<int>, 1u> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorImagePatchOp<-1, -1, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::evalProductImpl<Eigen::TensorEvaluator<Eigen::TensorContractionOp<std::array<Eigen::IndexPair<int>, 1u> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorImagePatchOp<-1, -1, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, 0>(float*, Eigen::TensorEvaluator<Eigen::TensorContractionOp<std::array<Eigen::IndexPair<int>, 1u> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorImagePatchOp<-1, -1, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback) const [clone .isra.0] (TensorContractionThreadPool.h:184)
==14444==    by 0x5E16771: evalProduct<0> (TensorContractionThreadPool.h:77)
==14444==    by 0x5E16771: evalTo (TensorContraction.h:703)
==14444==    by 0x5E16771: evalSubExprsIfNeeded (TensorContraction.h:609)
==14444==    by 0x5E16771: evalSubExprsIfNeeded (TensorMorphing.h:163)
==14444==    by 0x5E16771: evalSubExprsIfNeeded (TensorAssign.h:155)
==14444==    by 0x5E16771: Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 4, 1, int>, 16, Eigen::MakePointer>, Eigen::TensorReshapingOp<Eigen::DSizes<int, 4> const, Eigen::TensorContractionOp<std::array<Eigen::IndexPair<int>, 1u> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorImagePatchOp<-1, -1, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const, Eigen::NoOpOutputKernel const> const> const> const, Eigen::ThreadPoolDevice, true, (Eigen::internal::TiledEvaluation)0>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 4, 1, int>, 16, Eigen::MakePointer>, Eigen::TensorReshapingOp<Eigen::DSizes<int, 4> const, Eigen::TensorContractionOp<std::array<Eigen::IndexPair<int>, 1u> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorImagePatchOp<-1, -1, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorReshapingOp<Eigen::DSizes<int, 2> const, Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16, Eigen::MakePointer> const> const, Eigen::NoOpOutputKernel const> const> const> const&, Eigen::ThreadPoolDevice const&) (TensorExecutor.h:336)
==14444==    by 0x5DCC045: operator=<Eigen::TensorReshapingOp<const Eigen::DSizes<int, 4>, const Eigen::TensorContractionOp<const std::array<Eigen::IndexPair<int>, 1>, const Eigen::TensorReshapingOp<const Eigen::DSizes<int, 2>, const Eigen::TensorImagePatchOp<-1, -1, const Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16> > >, const Eigen::TensorReshapingOp<const Eigen::DSizes<int, 2>, const Eigen::TensorMap<Eigen::Tensor<float const, 4, 1, int>, 16> >, const Eigen::NoOpOutputKernel> > > (TensorDevice.h:40)
==14444==    by 0x5DCC045: operator() (Conv.h:247)
==14444==    by 0x5DCC045: Conv (Conv.h:282)
==14444==    by 0x5DCC045: operator() (Conv.h:101)
==14444==    by 0x5DCC045: operator() (Conv.h:86)
==14444==    by 0x5DCC045: onert::backend::cpu::ops::ConvolutionLayer::convFloat32() (ConvolutionLayer.cc:63)
==14444==    by 0x4DD2B89: onert::exec::train::TrainableFnSequence::forward(bool) (TrainableFnSequence.cc:30)
==14444==    by 0x4DD02E1: onert::exec::train::TrainableExecutor::forwardImpl(onert::exec::ExecutionObservee const&, bool) (TrainableExecutor.cc:132)
==14444==    by 0x4DD0483: onert::exec::train::TrainableExecutor::forward(std::vector<onert::backend::IPortableTensor*, std::allocator<onert::backend::IPortableTensor*> > const&, std::vector<onert::backend::IPortableTensor*, std::allocator<onert::backend::IPortableTensor*> > const&, onert::exec::ExecutionOptions const&, bool) (TrainableExecutor.cc:93)
==14444==    by 0x4DD2351: onert::exec::train::TrainableExecutors::forward(onert::exec::ExecutionContext const&, std::vector<std::unique_ptr<onert::backend::builtin::UserTensor, std::default_delete<onert::backend::builtin::UserTensor> >, std::allocator<std::unique_ptr<onert::backend::builtin::UserTensor, std::default_delete<onert::backend::builtin::UserTensor> > > >&, bool) (TrainableExecutors.cc:123)
==14444==    by 0x4DD2759: onert::exec::train::TrainableExecutors::train(onert::exec::ExecutionContext const&, unsigned int) (TrainableExecutors.cc:81)
==14444==    by 0x4DA5A2B: onert::exec::Execution::train(unsigned int) (Execution.cc:172)
==14444==    by 0x489CC49: nnfw_session::train_run(bool) (nnfw_api_internal.cc:1593)

@glistening
Contributor

glistening commented Dec 3, 2024

Two SIGILLs caught my eye. However, I am not sure this is our bug. It may be the case of:

==14444== 2. The instruction is legitimate but Valgrind doesn't handle it,
==14444==    i.e. it's Valgrind's fault.  If you think this is the case or
==14444==    you are not sure, please let us know and we'll try to fix it.

(ADD)

For example, I got a similar SIGILL while running ggml:

disInstr(thumb): unhandled instruction: 0xEEF2 0x7A67

But it turned out to be valid. Disassembling 0xEEF2 0x7A67 gives:

VMOV s15, r7

@batcheu
Contributor

batcheu commented Dec 3, 2024

First of all, as you may know, XU4 is armv7 and RPI4 is armv8.
The instruction vfma_f32, blamed as an illegal instruction in the second callstack, also looks valid on both instruction sets.

So the SIGILL is unexpected, since both machines support NEON intrinsics.

Then why does SIGILL occur?

  • I'm not sure about the root cause of this fault, but it seems the code executes in Thumb mode (T32 in armv7) even though it was compiled for ARM mode (A32 in armv8).
  • The PC addresses in all the callstacks @glistening attached show that execution was in Thumb mode, since they all end with bit 0 set (e.g. Illegal opcode at address 0x36B6DB).
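The bit-0 convention mentioned here (on 32-bit ARM, a code address with bit 0 set denotes the Thumb execution state) can be checked with a one-liner:

```python
def is_thumb(pc):
    # On 32-bit ARM, bit 0 of a code address marks the Thumb (T32) state;
    # a cleared bit 0 means the ARM (A32) state.
    return (pc & 1) == 1

# Addresses taken from the valgrind reports in this thread.
addresses = [0x36B6DB, 0x5DB4A7B]
print([hex(a) for a in addresses if is_thumb(a)])
```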

@glistening
Contributor

I checked with AddressSanitizer, but it was not helpful.

@glistening
Contributor

I cannot reproduce the error on my RPI4.
I tried @hseok-oh's command:

$ Product/armv7l-linux.debug/out/test/onert-test unittest

But it shows many failures:

Couldn't find any of the following OpenCL library: libOpenCL.so libGLES_mali.so libmali.so 
Error during model prepare : No loaded backends available.
/mnt/ONE/tests/nnfw_api/lib/GenModelTest.h:319: Failure
Expected equality of these values:
  (nnfw_prepare(_so.session))
    Which is: 1
  NNFW_STATUS_NO_ERROR
    Which is: 0
[  FAILED  ] GenModelTest.UnusedConstOutputOnly (126 ms)
[ RUN      ] GenModelTest.UnusedConstOutputAndAdd
Error during model prepare : No loaded backends available.
/mnt/ONE/tests/nnfw_api/lib/GenModelTest.h:319: Failure
Expected equality of these values:
  (nnfw_prepare(_so.session))
    Which is: 1
  NNFW_STATUS_NO_ERROR
    Which is: 0
[  FAILED  ] GenModelTest.UnusedConstOutputAndAdd (2 ms)
...
[==========] 674 tests from 35 test suites ran. (4399 ms total)
[  PASSED  ] 512 tests.
[  FAILED  ] 162 tests, listed below:
[  FAILED  ] GenModelTest.UnusedConstOutputOnly
[  FAILED  ] GenModelTest.UnusedConstOutputAndAdd
...

It seems this is not reproducible on RPI4.

@hseok-oh
Contributor Author

hseok-oh commented Dec 4, 2024

Couldn't find any of the following OpenCL library: libOpenCL.so libGLES_mali.so libmali.so

You need an OpenCL library to use the ARMCompute backend, and the unittest runner tries to test all backends.
Please build without the ARMCompute backend by setting the OPTIONS='-DBUILD_ARMCOMPUTE=OFF' environment variable.

@hseok-oh
Contributor Author

hseok-oh commented Dec 4, 2024

Currently, I can reproduce this issue by running nnfw_api_gtest alone.

I tried commenting out the body of the OneOp_Gather_Q4_0 test except for the quantization, but it still fails there.

--- a/tests/nnfw_api/src/GenModelTests/one_op_tests/Gather.test.cc
+++ b/tests/nnfw_api/src/GenModelTests/one_op_tests/Gather.test.cc
@@ -34,6 +34,7 @@ TEST_F(GenModelTest, OneOp_Gather_Q4_0)
   }
 
   auto input_vector = quantData(params, circle::TensorType::TensorType_GGML_Q4_0);
+#if 0
   auto input_buf = cgen.addBuffer(input_vector);
   int input = cgen.addTensor({{4, 32}, circle::TensorType::TensorType_GGML_Q4_0, input_buf});
   int indice = cgen.addTensor({{1, 1}, circle::TensorType::TensorType_INT32});
@@ -49,7 +50,7 @@ TEST_F(GenModelTest, OneOp_Gather_Q4_0)
   tc.addOutput<float>(std::vector<float>{params.begin() + 64, params.begin() + 96});
   _context->addTestCase(tc);
   _context->setBackends({"cpu"});
-
+#endif
   SUCCEED();
 }
$ ./Product/out/unittest/nnfw_api_gtest --gtest_filter=GenModelTest.OneOp_Gather_Q4_0
Note: Google Test filter = GenModelTest.OneOp_Gather_Q4_0
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from GenModelTest
[ RUN      ] GenModelTest.OneOp_Gather_Q4_0
Illegal instruction

Maybe the issue is in the quantization function.

@glistening
Contributor

glistening commented Dec 4, 2024

I will set up the same environment as yours (i.e. Ubuntu 24.04 on XU4, instead of 22.04 on RPI4) and try to reproduce it by following your guide. It will take some time (it requires cross-build environment setup and so on).

@glistening
Contributor

@hseok-oh Which OS version did you install? 24.04 (240506) or 24.04.1 (240911)?

@hseok-oh
Contributor Author

hseok-oh commented Dec 5, 2024

I don't know the exact reason, but #14418 resolves this issue.
As a long-term plan, we may need to internalize the kernels for q4_0 and q8_0 weights instead of using the ggml port.
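For context on what such a kernel does (a pure-Python sketch based on ggml's public Q4_0 block layout in ggml-quants.c, not this project's code): each Q4_0 block stores one fp16 scale followed by 32 weights packed as 4-bit nibbles.

```python
import struct

QK4_0 = 32  # ggml Q4_0 block size: 32 weights per block

def quantize_block_q4_0(values):
    """Sketch of ggml's quantize_row_q4_0_ref for a single block."""
    assert len(values) == QK4_0
    # The scale is derived from the value with the largest magnitude (sign kept).
    max_val = max(values, key=abs)
    d = max_val / -8 if max_val else 0.0
    inv_d = 1.0 / d if d else 0.0
    # Two 4-bit quants per byte: first half in low nibbles, second half in high.
    qs = bytearray(QK4_0 // 2)
    for j in range(QK4_0 // 2):
        lo = min(15, int(values[j] * inv_d + 8.5))
        hi = min(15, int(values[j + QK4_0 // 2] * inv_d + 8.5))
        qs[j] = lo | (hi << 4)
    # Block layout: fp16 scale, then 16 bytes of packed nibbles (18 bytes total).
    return struct.pack('<e', d) + bytes(qs)

block = quantize_block_q4_0([float(i - 16) for i in range(QK4_0)])
assert len(block) == 18  # 2-byte fp16 scale + 16 packed bytes
```

An internalized kernel would implement this (and its q8_0 counterpart) natively, avoiding the hand-written SIMD paths in the ggml port that triggered the SIGILL here.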

@glistening
Contributor

glistening commented Dec 6, 2024

Now, on Ubuntu Mate 24.04.1 on an ODROID XU4, I can reproduce the SIGILL with:

$ Product/armv7l-linux.release/out/bin/onert_run -w 1 5.circle

where 5.circle contains only the 5th node (= ggml fully connected) of decblk.circle:

$ python tools/tflitefile_tool/select_operator.py decblk.circle <( echo 5 ) 5.circle

@glistening
Contributor

glistening commented Dec 9, 2024

The Ubuntu 24.04 release binary always gets SIGILL in quantize_row_q8_0.

It doesn't need series run of unittest. ← Seems fixed with #14418.

This is another bug. It occurs even with a single thread.
It is deterministic, which suggests neither memory corruption nor a threading problem.

Backtrace

$ gdb Product/armv7l-linux.release/out/bin/onert_run
...
(gdb) r 5.circle

Thread 1 "onert_run" received signal SIGILL, Illegal instruction.
0xb6c80a6c in quantize_row_q8_0 () from Product/out/lib/nnfw/backend/libbackend_cpu.so
(gdb) bt
#0  0xb6c80a6c in quantize_row_q8_0 () from Product/out/lib/nnfw/backend/libbackend_cpu.so
#1  0x00000000 in ?? ()

Only -O0 works

I suspect gcc-13 optimization, since it only happens in release builds.

-O1 and -O2 also fail; only -O0 works.

Some optimization activated at -On (n = 1, 2, 3) generates illegal instructions.

See more in #14436.

@hseok-oh
Contributor Author

hseok-oh commented Dec 9, 2024

#14436 is merged and it resolves this failure.

@hseok-oh hseok-oh closed this as completed Dec 9, 2024