workload error; It runs for a long time with no results #3817

Open
3 tasks done
yaoyuexiaogege opened this issue Oct 30, 2024 · 8 comments
Labels
problem Problem requiring help

Comments

@yaoyuexiaogege

yaoyuexiaogege commented Oct 30, 2024

Before start

  • I have read the XiangShan Documents.
  • I have searched the previous issues and did not find anything relevant.
  • I have searched the previous discussions and did not find anything relevant.

Describe your problem

I am running a workload on the XiangShan core simulator with the command:
./build/emu -i $NOOP_HOME/ready-to-run/linux.bin
It has been running for a long time (over 2 hours) without producing any result. The output is as follows:
emu compiled at Oct 24 2024, 19:22:33
Using simulated 32768B flash
Using simulated 8192MB RAM
The image is /home/xiaoyuge/Desktop/test1/xs-env/XiangShan/ready-to-run/linux.bin
The reference model is /home/xiaoyuge/Desktop/test1/xs-env/NEMU/build/riscv64-nemu-interpreter-so
The first instruction of core 0 has commited. Difftest enabled.
...
(A long wait with no result)

What did you do before

source ./env.sh
cd NEMU
make clean
make riscv64-xs-ref_defconfig
make -j
cd xs-env/XiangShan
./build/emu -i $NOOP_HOME/ready-to-run/linux.bin

Environment

  • XiangShan branch:
  • XiangShan commit id:
  • NEMU commit id:
  • SPIKE commit id:
  • Operating System:
  • gcc version:
  • mill version:
  • java version:
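The fields above were left blank in this report. A minimal sketch of how some of them might be collected is shown here; `commit_of` is a hypothetical helper name, and `NEMU_HOME` is assumed to be set by `env.sh` alongside `NOOP_HOME` (only `NOOP_HOME` appears in this thread). The remaining fields (SPIKE, mill, java) would be filled in by hand.

```shell
#!/bin/sh
# Sketch for filling in part of the Environment template.
# commit_of is a hypothetical helper; NEMU_HOME is an assumed variable.

# Print the HEAD commit of the git checkout in directory $1, or "unknown"
# when the directory is missing, is not a git checkout, or git is absent.
commit_of() {
    if [ -n "$1" ] && [ -d "$1" ]; then
        git -C "$1" rev-parse --verify HEAD 2>/dev/null || echo "unknown"
    else
        echo "unknown"
    fi
}

echo "XiangShan commit id: $(commit_of "$NOOP_HOME")"
echo "NEMU commit id: $(commit_of "$NEMU_HOME")"
echo "Operating System: $(uname -sr)"
echo "gcc version: $(gcc --version 2>/dev/null | head -n 1)"
```

Pasting the printed lines into the issue gives maintainers the commit ids they need to reproduce a problem.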

Additional context

No response

@yaoyuexiaogege yaoyuexiaogege added the problem Problem requiring help label Oct 30, 2024
@cebarobot
Member

cebarobot commented Oct 30, 2024

Please provide the command you used to build the XiangShan emu (mainly so we can check the number of simulation threads), along with your machine's configuration.

In our experience, on an AMD EPYC 9684X server with a 16-thread emu (built with EMU_THREADS=16), linux.bin (opensbi+linux+hello) takes about 8 minutes to produce its first output, the OpenSBI logo. On a slower machine or with fewer emu threads, it may take longer to see any output.

@yaoyuexiaogege
Author

yaoyuexiaogege commented Oct 30, 2024

Hello, the command I used to build XiangShan is:
make emu CONFIG=MinimalConfig EMU_TRACE=1 -j4

@cebarobot
Member

cebarobot commented Oct 30, 2024

The build command you used produces a single-threaded simulator, so simulation is expected to be very slow. You will probably need a more powerful machine, or you can follow this document to speed up the simulation.

@quanchenliu

quanchenliu commented Nov 27, 2024

Hello, I ran into exactly the same problem as the original poster. Based on your reply, do I understand correctly that I need to regenerate the XiangShan core simulator?

If so, how should I clean up the previously generated files before rebuilding? Should I run make clean directly in the xs-env/XiangShan directory and then run the build command below?

make emu CONFIG=MinimalConfig EMU_TRACE=1 EMU_THREADS=2 -j4

@Ma-YX
Contributor

Ma-YX commented Nov 28, 2024

Yes. First run make clean in the xs-env/XiangShan directory to remove the previously generated files, then rebuild the XiangShan simulator. The procedure you described is correct.

Note, however, that your command produces a two-thread simulator, which may still take a long time on some machines. If you can, follow this document to speed up the simulation further, or switch to a more powerful machine.
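The clean-and-rebuild sequence confirmed above can be sketched as a dry run that only prints the commands. The helper names `pick_threads` and `print_rebuild` are hypothetical, the requested value of 8 threads is just an example, and capping EMU_THREADS at the host's hardware thread count is an assumption made here, not advice given in this thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the rebuild commands instead of running them.
# pick_threads and print_rebuild are hypothetical helper names.

# Cap the requested emu thread count ($1) at the host thread count ($2).
pick_threads() {
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Emit the clean-and-rebuild sequence for a given EMU_THREADS value ($1).
print_rebuild() {
    echo "cd \$NOOP_HOME"
    echo "make clean"
    echo "make emu CONFIG=MinimalConfig EMU_TRACE=1 EMU_THREADS=$1 -j4"
}

# nproc reports the available hardware threads; fall back to 1 if it is missing.
host_threads=$(nproc 2>/dev/null || echo 1)
print_rebuild "$(pick_threads 8 "$host_threads")"
```

Printing rather than executing keeps the sketch safe to try; the printed lines can be reviewed and then run by hand.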

@quanchenliu

quanchenliu commented Nov 28, 2024

Thank you for your kind help!

@quanchenliu

quanchenliu commented Dec 4, 2024

I re-ran the following commands:

make clean
make riscv64-xs-ref_defconfig
make -j
cd ..
cd XiangShan/
./build/emu -i $NOOP_HOME/ready-to-run/linux.bin

I did not see the OpenSBI logo, so I am not sure whether the command did what was expected. To help you understand the problem I ran into, I have uploaded the output to output.txt.

@NewPaulWalker
Contributor


Using an emu built with the default config, I was able to see the OpenSBI logo printed before 3 million instructions had been simulated. The output.txt you provided shows that XiangShan and NEMU diverged at around 900,000 instructions. I also ran an emu built with the minimal config; it reached nearly 2 million instructions without reporting an error, but the run was very slow. With the minimal config and EMU_THREADS=2, simulating 1.8 million instructions of linux.bin took more than 30 minutes, whereas with the default config and EMU_THREADS=8, simulating 3 million instructions took only 10 minutes.

Therefore, first, I recommend updating to the latest XiangShan code, including submodules, then rebuilding emu and checking whether the divergence is resolved; I could not reproduce it on a recent master branch.
Second, you should keep your code up to date and tell us the XiangShan commit id you are using when you ask a question.
Third, if you keep using the minimal config, it may take more than an hour before the OpenSBI logo is printed, even when no error occurs.
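The timing figures above imply a rough throughput for each build, worked through in the sketch below. The helper names `rate` and `eta_minutes` are hypothetical, and the estimate assumes the logo appears after roughly the same 3 million instructions under the minimal config, which the thread does not state. Since the minimal-config run took "more than 30 minutes", the computed rate is an upper bound and the estimated wait a lower bound.

```shell
#!/bin/sh
# Rough throughput arithmetic from the figures quoted in this comment.
# rate and eta_minutes are hypothetical helper names.

# Instructions per second, given an instruction count ($1) and elapsed minutes ($2).
rate() {
    echo $(( $1 / ($2 * 60) ))
}

# Whole minutes needed to simulate $1 instructions at $2 instructions per second.
eta_minutes() {
    echo $(( $1 / $2 / 60 ))
}

minimal=$(rate 1800000 30)    # minimal config, EMU_THREADS=2
default=$(rate 3000000 10)    # default config, EMU_THREADS=8
echo "minimal config: about $minimal instructions/s"
echo "default config: about $default instructions/s"
echo "minimal config, 3000000 instructions: at least $(eta_minutes 3000000 "$minimal") minutes"
```

The result, about 1000 versus 5000 instructions per second and at least 50 minutes to reach 3 million instructions, is consistent with the estimate of more than an hour for the minimal config.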

@quanchenliu quanchenliu mentioned this issue Dec 10, 2024
3 tasks