[Compile FE] Build ARM32 target #8379

Open
seanshpark opened this issue Feb 7, 2022 · 27 comments

@seanshpark
Contributor

seanshpark commented Feb 7, 2022

Let's enable building the compiler frontend for the ARM32 target.

  • the first module to enable is luci
  • refer to the runtime's ARM32 compilation
  • to enable running inside Tizen

Why ARM32 and not AArch64?

  • Our Tizen target device runs ARM32.
  • From my point of view, the ARM32 compilation test is meant to validate correct execution on ARM32.

@seanshpark
Contributor Author

seanshpark commented Feb 7, 2022

TODO

  • prepare ARM32 rootfs
  • cmake configure
    • armv7l compiler (arm-linux-gnueabihf-gcc, arm-linux-gnueabihf-g++)
    • build dependency libraries for ARM32
      • protobuf
      • flatbuffers
      • HDF5
      • gtest
      • jsoncpp
      • ...
    • native flatc for mio-circle, mio-tflite
    • native protoc for tflchef
  • run luci tests on the ARM32 device
  • run unit tests on the ARM32 device
  • build common-artifacts
  • run luci-value-test
    • fix Sqrt_000 failure
  • enable NEON for kernels
  • CI test with an ARM32 device
  • do not use locomotiv?
  • ...

@seanshpark
Contributor Author

seanshpark commented Feb 7, 2022

Issue: executables built for the host architecture are needed for compile-time file generation

  • native flatc is required to compile .fbs file(s)
  • native protoc and js_embed are required to compile .proto file(s) and generate well_known_types_js
  • tflchef, circlechef

--> try: (1) build only these modules for the host (x86-64), (2) provide these executables to the cross build via environment variables, (3) cross build
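The three steps can be sketched as a dry-run plan of command lines. This is a minimal Python sketch under stated assumptions: the CMake target names match the tool names, the host tools land in `<host build>/bin`, and the `EXTERNAL_*` environment variable names are hypothetical (the thread does not say which names the cross build actually reads).

```python
import os
import shlex

# Host tools needed at cross-build time (from the list above).
# Assumption: each CMake target name matches the executable name.
HOST_TOOLS = ["flatc", "protoc", "js_embed", "tflchef", "circlechef"]


def env_name(tool):
    """Hypothetical env variable telling the cross build where a host tool lives."""
    return "EXTERNAL_" + tool.upper()


def build_plan(src, host_dir, cross_dir, toolchain):
    """Return (env, argv) pairs for the three steps, in order (nothing is executed)."""
    steps = [
        # (1) configure and build only the generator tools for the host (x86-64)
        ({}, ["cmake", "-S", src, "-B", host_dir]),
        ({}, ["cmake", "--build", host_dir, "--target", *HOST_TOOLS]),
    ]
    # (2) expose the host executables through environment variables
    # Assumption: tools end up directly under host_dir/bin.
    tool_env = {env_name(t): os.path.join(host_dir, "bin", t) for t in HOST_TOOLS}
    # (3) configure and build everything for ARM32 with the cross toolchain file
    steps.append((tool_env, ["cmake", "-S", src, "-B", cross_dir,
                             f"-DCMAKE_TOOLCHAIN_FILE={toolchain}"]))
    steps.append((tool_env, ["cmake", "--build", cross_dir]))
    return steps


if __name__ == "__main__":
    # Print the plan as shell command lines (placeholder paths).
    for env, argv in build_plan("one", "build/host", "build/arm32", "arm32.cmake"):
        prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
        print((prefix + " " if prefix else "") + shlex.join(argv))
```

Keeping the host build separate means the cross build never has to run an ARM32 binary on the x86-64 machine; it only consumes the already-built host tools.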

@seanshpark
Contributor Author

For protobuf, find_package(protobuf EXACT 3.5.2 QUIET) fails when cross compiling.

@seanshpark
Contributor Author

seanshpark commented Feb 10, 2022

Run tests on RPi4

sudo apt-get install software-properties-common
sudo apt-get install build-essential cmake scons git

Issue:

  • running make test inside luci/tests fails
  • the Makefile has absolute paths to cmake and ctest, which are in my ~/bin
  • had to create an alias in ~/bin

Test results on RPi4/Ubuntu 18.04

Running tests...
Test project /home/ubuntu/one/build/arm32.debug/compiler/luci/tests
    Start 1: luci_unit_readtest
1/2 Test #1: luci_unit_readtest ...............   Passed   35.23 sec
    Start 2: luci_unit_writetest
2/2 Test #2: luci_unit_writetest ..............   Passed   48.20 sec

100% tests passed, 0 tests failed out of 2

Total Test time (real) =  83.66 sec

@seanshpark
Contributor Author

seanshpark commented Feb 10, 2022

HDF5 uses TRY_RUN, which doesn't seem to work when cross compiling...
Need to find a workaround.

-- Checking for appropriate format for 64 bit long:
CMake Error: TRY_RUN() invoked in cross-compiling mode, please set the following cache variables appropriately:
H5_PRINTF_LL_TEST_RUN (advanced)
H5_PRINTF_LL_TEST_RUN__TRYRUN_OUTPUT (advanced)
H5_LDOUBLE_TO_LONG_SPECIAL_RUN (advanced)
H5_LDOUBLE_TO_LONG_SPECIAL_RUN__TRYRUN_OUTPUT (advanced)
...

Try: build natively on ARM32, get the values, and set them in the cache (how?)

cmake .. \
-DHDF5_ENABLE_SZIP_SUPPORT:BOOL=OFF \
-DHDF5_ENABLE_Z_LIB_SUPPORT:BOOL=OFF \
-DHDF5_BUILD_TOOLS:BOOL=ON

--> doesn't work. Need to patch HDF5 so that it does not call TRY_RUN when cross compiling.
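For the "set them in the cache (how?)" question, CMake's general mechanism is an initial-cache script passed with `cmake -C <file>`; the `try_run` documentation also describes a `TryRunResults.cmake` template that CMake writes for this purpose when cross compiling. The minimal sketch below copies named entries from a native build's `CMakeCache.txt` into such a script. Assumptions: cache names are unquoted, values fit on one line, and the script name is hypothetical. Names absent from the native cache are reported rather than guessed.

```python
import re
import sys

# A CMakeCache.txt entry is written as NAME:TYPE=VALUE; "//" and "#" lines are comments.
_ENTRY = re.compile(r"^([A-Za-z0-9_.+-]+):([A-Z]+)=(.*)$")
# Types accepted by set(... CACHE <type> ...); anything else falls back to STRING.
_SET_TYPES = {"BOOL", "FILEPATH", "PATH", "STRING", "INTERNAL"}


def read_cmake_cache(text):
    """Map cache variable name -> (type, value) from CMakeCache.txt contents."""
    values = {}
    for line in text.splitlines():
        match = _ENTRY.match(line)
        if match:
            name, var_type, value = match.groups()
            values[name] = (var_type, value)
    return values


def _quote(value):
    """Write value as a CMake quoted argument (escape backslash, double quote, dollar)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("$", "\\$")
    return f'"{escaped}"'


def initial_cache(values, names):
    """Build a script for `cmake -C`; return (script text, names not found)."""
    lines, missing = [], []
    for name in names:
        if name not in values:
            missing.append(name)
            continue
        var_type, value = values[name]
        var_type = var_type if var_type in _SET_TYPES else "STRING"
        lines.append(f'set({name} {_quote(value)} CACHE {var_type} "from native ARM32 build")')
    return "\n".join(lines) + "\n", missing


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: tryrun_cache.py <native-build>/CMakeCache.txt NAME [NAME ...]
    with open(sys.argv[1]) as cache_file:
        script, missing = initial_cache(read_cmake_cache(cache_file.read()), sys.argv[2:])
    sys.stdout.write(script)
    for name in missing:
        print(f"# not found in native cache: {name}", file=sys.stderr)
```

The generated file would then be passed to the cross configure step, e.g. `cmake -C hdf5-tryrun.cmake ...`.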

Try: patch the source after download
-->

infra/cmake/packages/HDF5Config.cmake:56 (include):
include could not find load file:
HDF5_CONFIG_DIR-NOTFOUND/hdf5-config.cmake

@seanshpark
Contributor Author

seanshpark commented Feb 11, 2022

HDF5_CONFIG_DIR-NOTFOUND/hdf5-config.cmake

--> the install step had a problem --> fixed
--> find_path still doesn't work for the cross build...
--> passing NO_CMAKE_FIND_ROOT_PATH seems to work
--> recheck this after a clean build

@seanshpark
Contributor Author

record-minmax requires luci-interpreter

  • need to check kernel code that uses x86-specific code

@seanshpark
Contributor Author

seanshpark commented Feb 11, 2022

Tests are disabled because gtest isn't found when cross compiling.

  • at least I want to run the unit tests on the target device
  • later we can enable this in CI with target devices

--> fixed in cmake and seems to work

@seanshpark
Contributor Author

How to enable common-artifacts?

  • issue: a Python virtual environment with TensorFlow 2.6 is needed

Maybe?

  • the host dumps all reference data
  • the target executes luci-interpreter and compares the results with the reference data

@seanshpark
Contributor Author

Newer versions of CMake (Ubuntu 20.04 and Tizen) fail in find_package for some packages that were built from source.
nnas_find_package_folder() seems to solve this issue.

@chunseoklee
Contributor

chunseoklee commented Feb 22, 2022

Higher version of CMake (Ubuntu 20.04 and Tizen) fails for some find_package that was built from source. nnas_find_package_folder() seems to solve this issue.

This does not solve the infinite recursion loop with GTest. I succeeded with the following patch: bd2bb66

@seanshpark
Contributor Author

I have succeed with the following patch

OK thanks for the check!

@seanshpark
Contributor Author

seanshpark commented Feb 23, 2022

How can we run value tests of circle2circle and luci-interpreter with executables from the cross build?

  • basic idea: prepare what can be done on the host -> run on the target and compare with the host-prepared data

Try

  • in the host build, prepare (1) model (2) input data (3) expected output data
  • on the target, (4) produce model' with the target circle2circle (5) run luci-interpreter (6) save output data
  • on the target, compare (3) and (6)?

To run the Python test script on RPi4/Ubuntu 18.04:

sudo apt-get install python3-pip

python3 -m pip install --user cython
python3 -m pip install --user numpy
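The "compare (3) and (6)" step can be sketched with the standard library only, so it runs on the target without extra packages. Assumptions: both outputs are raw little-endian float32 files without a header (luci-value-test's actual data format may differ), the file names are hypothetical, and the default tolerances of 1e-5 are placeholders.

```python
import math
import struct
import sys
from pathlib import Path


def read_f32(path):
    """Read a raw little-endian float32 dump (assumed format, no header)."""
    data = Path(path).read_bytes()
    if len(data) % 4:
        raise ValueError(f"{path}: {len(data)} bytes is not a multiple of 4")
    return list(struct.unpack(f"<{len(data) // 4}f", data))


def mismatches(expected, actual, atol=1e-5, rtol=1e-5):
    """Return indices where actual is not close to expected.

    math.isclose passes when |a - e| <= max(rtol * max(|a|, |e|), atol).
    """
    if len(expected) != len(actual):
        raise ValueError(f"length mismatch: {len(expected)} vs {len(actual)}")
    return [i for i, (e, a) in enumerate(zip(expected, actual))
            if not math.isclose(a, e, rel_tol=rtol, abs_tol=atol)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: compare_output.py expected.bin actual.bin  (hypothetical names for (3) and (6))
    bad = mismatches(read_f32(sys.argv[1]), read_f32(sys.argv[2]))
    for i in bad[:10]:
        print(f"mismatch at element {i}")
    sys.exit(1 if bad else 0)
```

`math.isclose` combines the two tolerances with `max`, whereas `numpy.allclose` adds them (`atol + rtol * |b|`); pick whichever matches the host-side reference test.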

@chunseoklee
Contributor

Have you figured out how the arm32 cross build succeeds w.r.t. the neon_tensor_utils.cc issue (#8480 (comment))?

@seanshpark
Contributor Author

Have you figure out how arm32 cross build succeed w.r.t neon_tensor_utils.cc

Nope... I'm working on validating the execution results of luci-interpreter.
Currently I'm not sure the ARM32 build of luci-interpreter is correct.

@chunseoklee
Contributor

Nope... I'm working on to validate excution result of luci-interpreter.
Currently I'm not sure luci-interpreter for ARM32 build is correct.

If luci-interpreter runs on an arm32 device, the build seems OK. It is very strange.

@seanshpark
Contributor Author

seanshpark commented Mar 2, 2022

Test on ARM32: luci_value_cross_test fails only with Sqrt_000?

      Start 44: luci_value_cross_test
44/45 Test #44: luci_value_cross_test ............***Failed   82.64 sec
FAILED
- Sqrt_000

@seanshpark
Contributor Author

seanshpark commented Mar 2, 2022

I found that (on my local Ubuntu) the -mfpu=neon flag is not passed in the arm32 cross build.

config_armv7l-linux.cmake is not included during configuration. I found that some changes are missing. With this file included, there are link errors like the ones you got for the Tizen build.
I'll first finish landing the current draft and then work on enabling NEON in luci-interpreter.
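One way to confirm whether the flag actually reaches the compiler is to inspect the compilation database that CMake writes when `CMAKE_EXPORT_COMPILE_COMMANDS` is ON (Makefile and Ninja generators). A minimal sketch, assuming the database is at `<build-dir>/compile_commands.json` and that any `-mfpu=neon*` variant (for example `-mfpu=neon-vfpv4`) counts; the script name is hypothetical:

```python
import json
import shlex
import sys


def units_without_neon(entries, path_filter="luci-interpreter", flag_prefix="-mfpu=neon"):
    """List source files under path_filter whose compile command has no -mfpu=neon* flag."""
    missing = []
    for entry in entries:
        if path_filter not in entry["file"]:
            continue
        # A compilation database entry carries either "arguments" (list) or "command" (string).
        args = entry["arguments"] if "arguments" in entry else shlex.split(entry["command"])
        if not any(arg.startswith(flag_prefix) for arg in args):
            missing.append(entry["file"])
    return missing


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: check_neon.py <build-dir>/compile_commands.json
    with open(sys.argv[1]) as db_file:
        missing = units_without_neon(json.load(db_file))
    print("\n".join(missing) if missing else "all luci-interpreter units use -mfpu=neon*")
```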

@seanshpark
Contributor Author

With neon,

2: [ RUN      ] L2Pool2DTest.FloatPaddingSameStride
2: compiler/luci-interpreter/src/kernels/L2Pool2D.test.cpp:209: Failure
2: Value of: extractTensorData<float>(output_tensor)
2: Expected: has 8 elements where
2: element #0 is approximately 3.5 (absolute error <= 9.9999997e-06),
2: element #1 is approximately 6 (absolute error <= 9.9999997e-06),
2: element #2 is approximately 6.5 (absolute error <= 9.9999997e-06),
2: element #3 is approximately 5.7008801 (absolute error <= 9.9999997e-06),
2: element #4 is approximately 2.54951 (absolute error <= 9.9999997e-06),
2: element #5 is approximately 7.2111001 (absolute error <= 9.9999997e-06),
2: element #6 is approximately 8.63134 (absolute error <= 9.9999997e-06),
2: element #7 is approximately 7 (absolute error <= 9.9999997e-06)
2:   Actual: { 3.49999, 5.99999, 6.49999, 5.70087, 2.5495, 7.21109, 8.63132, 6.99999 }, whose element #1 doesn't match, which is -1.14441e-05 from 6
2: [  FAILED  ] L2Pool2DTest.FloatPaddingSameStride (0 ms)

@chunseoklee
Contributor

6 vs 5.9999
which is -1.14441e-05 from 6

IDK why this difference results in a mismatch.

@seanshpark
Contributor Author

IDK why this diff results in mismatch.

Using neon + ruy gives this difference.
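The mismatch follows from the test's absolute tolerance: the reported difference of 1.14441e-05 is larger than the 9.9999997e-06 bound (float32 `1e-5`), even though 5.99999 looks like 6 when printed with six digits. Near 6 the float32 spacing is 2^-21 (about 4.77e-07), so the reported difference corresponds to 24 float32 steps; a different accumulation order, such as NEON + ruy may use, can plausibly shift a result by a few steps. A small Python sketch of that arithmetic, assuming the actual value is exactly 24 steps below 6:

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_ulp(x):
    """Gap between |x| and the next larger float32 value."""
    bits = struct.unpack("<I", struct.pack("<f", abs(x)))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0] - to_f32(abs(x))


expected = 6.0
tolerance = to_f32(1e-5)                  # gtest prints this as 9.9999997e-06
# Assumption: the NEON + ruy result is exactly 24 float32 steps below 6,
# which reproduces the reported difference of -1.14441e-05.
actual = to_f32(expected - 24 * f32_ulp(expected))

diff = actual - expected
print(f"ulp at 6       = {f32_ulp(expected):.6g}")   # 4.76837e-07
print(f"diff           = {diff:.6g}")                # -1.14441e-05
print(f"within abs tol = {abs(diff) <= tolerance}")  # False
```

A relative tolerance, or a slightly larger absolute one, would accept this result; whether that is appropriate for the kernel test is a separate judgment.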

@seanshpark
Contributor Author

Done, except for the build test in CI, which raised another issue.

@tomdol
Contributor

tomdol commented Jan 9, 2025

I'd like to ask a question related to the following PR #14535 where I'm attempting to update the protobuf version.

There used to be a patch applied to the protobuf source, introduced as part of this issue's resolution (#8505).

Since the new version of Protobuf no longer contains the js_embed library, is there any other infrastructure change needed so that the external (native) js_embed is neither set nor used?

seanshpark reopened this Jan 9, 2025
@seanshpark
Contributor Author

... should be done so that the external (native) js_embed is not set nor used?

I do not know.

If you'd like to upgrade protobuf, which may affect the ARM32 build, please check the ARM32 build
and share the results so that I can confirm the upgrade is OK.
