Add Support For Executing Runtime Tests Over a Serial Connection #1

Open: wants to merge 1 commit into master


oppelt-boeing

Adds support for executing runtime tests over a serial connection. This is done with a new serial test target, which is a drop-in replacement for the ssh target. The serial target has the same API as the ssh target, with the runtime limitations discussed below.

To use, set the following:

  • TEST_TARGET to "serial"
  • TEST_SERIALCONTROL_CMD to a shell command or script which connects to the serial console of the target and forwards that connection to standard input/output.
  • TEST_SERIALCONTROL_EXTRA_ARGS (optional): any parameters that must be passed to the serial control command.
  • TEST_SERIALCONTROL_PS2: a regex string representing an empty prompt on the target terminal. Example: "root@target:.*#". This is used to find an empty shell after each command is run.
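TEST_SERIALCONTROL_CMD only has to shuttle bytes between the target console and its own standard input/output. The sketch below is one hypothetical way to do that in Python when the console is exposed as a raw TCP port (for example by QEMU's `-serial tcp:` backend or by ser2net, with no telnet option negotiation). The script, its HOST PORT arguments and the stop-on-EOF behaviour are assumptions, not part of this PR. It uses `selectors` on raw file descriptors, so it is POSIX-only.

```python
import os
import selectors
import socket
import sys
from typing import List


def _write_all(fd: int, data: bytes) -> None:
    # os.write may write fewer bytes than requested; loop until everything is out.
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def bridge(sock: socket.socket, in_fd: int, out_fd: int) -> None:
    """Copy bytes between a connected socket and (in_fd, out_fd) until either side hits EOF."""
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    sel.register(in_fd, selectors.EVENT_READ)
    try:
        while True:
            for key, _ in sel.select():
                if key.fileobj is sock:
                    data = sock.recv(4096)
                    if not data:
                        return  # the target console closed the connection
                    _write_all(out_fd, data)
                else:
                    data = os.read(in_fd, 4096)
                    if not data:
                        return  # the test harness closed our stdin
                    sock.sendall(data)
    finally:
        sel.close()


def main(argv: List[str]) -> int:
    # Hypothetical invocation: serial-bridge.py HOST PORT
    host, port = argv[0], int(argv[1])
    with socket.create_connection((host, port)) as conn:
        bridge(conn, sys.stdin.fileno(), sys.stdout.fileno())
    return 0
```

Saved as, say, `serial-bridge.py` with `sys.exit(main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard, it could be used as `TEST_SERIALCONTROL_CMD = "python3 /path/to/serial-bridge.py"` with `TEST_SERIALCONTROL_EXTRA_ARGS = "localhost 4444"` (path and port are placeholders).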

The serial target has some additional limitations compared with the ssh target.

  1. Only supports one "run" command at a time. If two threads attempt to call "run", one will block until the other finishes. This is a limitation of the serial link, since two connections cannot be opened at once.
  2. For file transfer, the target needs a shell and the base32 program. The file transfer implementation was chosen to be as generic as possible, so it could support as many targets as possible.
  3. Transferring files is significantly slower. On a 115200 baud serial connection, the fastest observed speed was 31kbps. This is caused by the slower link speed and by implementation overhead resulting from the decisions described in item 2 above.
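Limitations 2 and 3 are connected: a shell-only transfer has to push the file as printable text, and that text costs link time. A minimal sketch of both halves follows. The helper names, the `.b32` staging file and the 512-character chunk size are hypothetical, not taken from serial.py; it assumes the target's `base32 -d` skips newlines (as GNU coreutils does), 8N1 framing (10 wire bits per character), and reads "kbps" as kilobits per second.

```python
import base64
import shlex
from typing import List


def base32_upload_commands(data: bytes, remote_path: str, chunk_chars: int = 512) -> List[str]:
    """Shell lines that rebuild `data` at `remote_path` using only echo, a redirect and base32 -d."""
    if chunk_chars <= 0 or chunk_chars % 8:
        raise ValueError("chunk_chars must be a positive multiple of 8")
    encoded = base64.b32encode(data).decode("ascii")
    staging = shlex.quote(remote_path + ".b32")  # hypothetical staging file name
    commands = [f": > {staging}"]  # create or truncate the staging file
    for start in range(0, len(encoded), chunk_chars):
        # Base32 text is A-Z, 2-7 and '=', so it needs no shell quoting.
        commands.append(f"echo {encoded[start:start + chunk_chars]} >> {staging}")
    commands.append(f"base32 -d {staging} > {shlex.quote(remote_path)} && rm {staging}")
    return commands


def base32_serial_ceiling_bps(baud: int, wire_bits_per_char: int = 10) -> float:
    """Best-case payload rate: each character costs 10 wire bits and carries 5 payload bits."""
    return baud / wire_bits_per_char * 5


print(f"{base32_serial_ceiling_bps(115200) / 1000:.1f} kbit/s")  # 57.6 kbit/s
```

Under these assumptions the best case at 115200 baud is 57.6 kbit/s of payload, so the observed 31 kbps is a little over half the ceiling. The estimate ignores command echo and waiting for the prompt after every line, which plausibly account for much of the rest.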

Signed-off-by: Andrew Oppelt [email protected]

@oppelt-boeing oppelt-boeing added the enhancement New feature or request label Jun 26, 2024
@oppelt-boeing oppelt-boeing self-assigned this Jun 26, 2024
Contributor

@matthew-l-weber matthew-l-weber left a comment


I'm good with these changes.

Signed-off-by: Matthew Weber [email protected]

@oppelt-boeing oppelt-boeing requested a review from mk3890 June 27, 2024 17:35
github-actions bot pushed a commit that referenced this pull request Jun 27, 2024
…ctor

Integrating the following commit(s) to linux-yocto/6.5:

1/2 [
    Author: Thomas Gleixner
    Email: [email protected]
    Subject: x86/alternatives: Sync core before enabling interrupts
    Date: Thu, 7 Dec 2023 20:49:24 +0100

    text_poke_early() does:

       local_irq_save(flags);
       memcpy(addr, opcode, len);
       local_irq_restore(flags);
       sync_core();

    That's not really correct because the synchronization should happen before
    interrupts are reenabled to ensure that a pending interrupt observes the
    complete update of the opcodes.

    It's not entirely clear whether the interrupt entry provides enough
    serialization already, but moving the sync_core() invocation into interrupt
    disabled region does no harm and is obviously correct.

    Signed-off-by: Thomas Gleixner <[email protected]>
    Signed-off-by: Bruce Ashfield <[email protected]>
]

2/2 [
    Author: Thomas Gleixner
    Email: [email protected]
    Subject: x86/alternatives: Disable interrupts and sync when optimizing NOPs in place
    Date: Thu, 7 Dec 2023 20:49:26 +0100

    apply_alternatives() treats alternatives with the ALT_FLAG_NOT flag set
    special as it optimizes the existing NOPs in place.

    Unfortunately this happens with interrupts enabled and does not provide any
    form of core synchronization.

    So an interrupt hitting in the middle of the update and using the affected
    code path will observe a half updated NOP and crash and burn. The following
    3 NOP sequence was observed to expose this crash halfways reliably under
    QEMU 32bit:

       0x90 0x90 0x90

    which is replaced by the optimized 3 byte NOP:

       0x8d 0x76 0x00

    So an interrupt can observe:

       1) 0x90 0x90 0x90		nop nop nop
       2) 0x8d 0x90 0x90		undefined
       3) 0x8d 0x76 0x90		lea    -0x70(%esi),%esi
       4) 0x8d 0x76 0x00		lea     0x0(%esi),%esi

    Where only #1 and #4 are true NOPs. The same problem exists for 64bit obviously.

    Disable interrupts around this NOP optimization and invoke sync_core()
    before reenabling them.

    Fixes: 270a69c4485d ("x86/alternative: Support relocations in alternatives")
    Reported-by: Paul Gortmaker <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Cc: [email protected]
    Signed-off-by: Bruce Ashfield <[email protected]>
]

Signed-off-by: Bruce Ashfield <[email protected]>
Signed-off-by: Alexandre Belloni <[email protected]>
Signed-off-by: Richard Purdie <[email protected]>
(cherry picked from commit 1c8d29a)
Signed-off-by: Steve Sakoman <[email protected]>
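The four states listed in commit 2/2 above are exactly the prefixes of an in-order copy of the new bytes over the old ones. The Python sketch below enumerates them; byte-granular, front-to-back stores are a simplifying assumption (the kernel's memcpy may use wider or differently ordered stores, which changes which mixes an interrupt could observe).

```python
from typing import List


def torn_states(old: bytes, new: bytes) -> List[bytes]:
    """States visible mid-update if `new` overwrites `old` one byte at a time, front to back."""
    if len(old) != len(new):
        raise ValueError("old and new must be the same length")
    return [new[:k] + old[k:] for k in range(len(old) + 1)]


for state in torn_states(bytes([0x90, 0x90, 0x90]), bytes([0x8D, 0x76, 0x00])):
    print(state.hex(" "))
# 90 90 90
# 8d 90 90
# 8d 76 90
# 8d 76 00
```

Only the first and last entries are complete instructions, which is why the fix disables interrupts and calls sync_core() around the update.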
github-actions bot pushed a commit that referenced this pull request Jun 27, 2024
QEMU introduced the commit "target/i386: Enable AVX cpuid bits when using TCG"
in v7.2.0. It causes qemu-system-i386 to hang with the following error:

traps: rndc-confgen[342] general protection fault ip:b7ef5545 sp:bfcc6e6c error:0
------------[ cut here ]------------
Bad FPU state detected at __restore_fpregs_from_fpstate+0x2f/0x60, reinitializing FPU registers.
WARNING: CPU: 7 PID: 353 at arch/x86/mm/extable.c:65 fixup_exception+0x29c/0x2d0
Modules linked in: cfg80211 8021q parport_pc parport sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 kvm irqbypass fuse configfs
CPU: 7 PID: 353 Comm: in:imklog Not tainted 5.15.78-yocto-standard #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
EIP: fixup_exception+0x29c/0x2d0
Code: 05 ed da 89 df 01 68 b0 cb 5f df e8 4f e7 b6 00 0f 0b 58 e9 9d fe ff ff c6 05 ef da 89 df 01 50 68 f0 cb 5f df e8 35 e7 b6 00 <0f> 0b 5b 5e e9 0a ff ff ff ba 01 00 00 00 89 f0 e8 8a c1 b6 00 0f
EAX: 00000060 EBX: df734b60 ECX: f5be9cd0 EDX: f5be9ccc
ESI: c3485eec EDI: 0000000d EBP: c3485e64 ESP: c3485e4c
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00000096
CR0: 80050033 CR2: b79fdde0 CR3: 03cbe000 CR4: 001506d0
Call Trace:
 ? __restore_fpregs_from_fpstate+0x2f/0x60
 exc_general_protection+0x9a/0x390
 ? exc_bounds+0x90/0x90
 handle_exception+0x133/0x133

Upstream has fixed this issue [1], so backport the patch.

Ref:
[1] https://gitlab.com/qemu-project/qemu/-/commit/48b60eb6c917646df9efa7ddb4c25929f358d647

Signed-off-by: Xiangyu Chen <[email protected]>
Signed-off-by: Steve Sakoman <[email protected]>
meta/lib/oeqa/core/target/serial.py (resolved)
meta/classes-recipe/testexport.bbclass (resolved)
meta/lib/oeqa/core/target/serial.py (outdated, resolved)
meta/lib/oeqa/core/target/serial.py (outdated, resolved)
meta/lib/oeqa/core/target/serial.py (resolved)
meta/lib/oeqa/runtime/context.py (resolved)
@matthew-l-weber
Contributor

Could you post the test sequence/command listing you used to validate this? That would also be good to add below a -- line in the commit description.

@matthew-l-weber
Contributor

Would this submission be to https://lists.openembedded.org/g/openembedded-devel or -core?

@oppelt-boeing
Author

> Could you post the test sequence/command listing you used to validate this? That would also be good to add below a -- line in the commit description.

Added this information to the commit message.

github-actions bot pushed a commit that referenced this pull request Jul 9, 2024
Integrating the following commit(s) to linux-yocto/6.6:

1/1 [
    Author: Bruce Ashfield
    Email: [email protected]
    Subject: cpu/amd: inhibit SMP check for qemux86
    Date: Fri, 28 Jun 2024 12:55:18 -0400

    When booting with KVM enabled on an AMD host, the following
    trace is thrown:

      [    0.084519] ------------[ cut here ]------------
      [    0.084519] WARNING: This combination of AMD processors is not suitable for SMP.
      [    0.084519] WARNING: CPU: 1 PID: 0 at /arch/x86/kernel/cpu/amd.c:341 init_amd+0xaee/0xbcc
      [    0.084519] Modules linked in:
      [    0.084519] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.6.32-yocto-standard #1
      [    0.084519] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014

    This warning is not valid in our configuration and is unnecessarily
    causing issues when debugging.

    This has been known for some time (10+ years), but no acceptable
    solution has been found upstream:

       https://lists.gnu.org/archive/html/qemu-devel/2010-03/msg01428.html
       https://lkml.org/lkml/2010/3/30/397

    We have a configuration CONFIG_QEMUX86 that has been added for
    situations like this. When that value is defined, we inhibit the
    warning, but leave it as-is for other BSPs.

    Signed-off-by: Bruce Ashfield <[email protected]>
]

Signed-off-by: Bruce Ashfield <[email protected]>
Signed-off-by: Richard Purdie <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 9, 2024
Integrating the following commit(s) to linux-yocto/6.6:

1/1 [
    Author: Bruce Ashfield
    Email: [email protected]
    Subject: cpu/amd: inhibit SMP check for qemux86
    Date: Fri, 28 Jun 2024 12:55:18 -0400

    When booting with KVM enabled on an AMD host, the following
    trace is thrown:

      [    0.084519] ------------[ cut here ]------------
      [    0.084519] WARNING: This combination of AMD processors is not suitable for SMP.
      [    0.084519] WARNING: CPU: 1 PID: 0 at /arch/x86/kernel/cpu/amd.c:341 init_amd+0xaee/0xbcc
      [    0.084519] Modules linked in:
      [    0.084519] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.6.32-yocto-standard #1
      [    0.084519] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014

    This warning is not valid in our configuration and is unnecessarily
    causing issues when debugging.

    This has been known for some time (10+ years), but no acceptable
    solution has been found upstream:

       https://lists.gnu.org/archive/html/qemu-devel/2010-03/msg01428.html
       https://lkml.org/lkml/2010/3/30/397

    We have a configuration CONFIG_QEMUX86 that has been added for
    situations like this. When that value is defined, we inhibit the
    warning, but leave it as-is for other BSPs.

    Signed-off-by: Bruce Ashfield <[email protected]>
]

Signed-off-by: Bruce Ashfield <[email protected]>
Signed-off-by: Richard Purdie <[email protected]>
(cherry picked from commit f0c0300)
Signed-off-by: Steve Sakoman <[email protected]>
@oppelt-boeing oppelt-boeing force-pushed the serial_support branch 2 times, most recently from 7adb0d3 to 436b990 Compare July 23, 2024 22:36
Contributor

@chuckwolber chuckwolber left a comment


Some additional findings after discussing use cases on MM (testexport vs. testimage). Taking a closer look, I found some issues beyond what was discussed. Apologies for not pointing them out sooner.

Once all reviews are complete and resolved, please add Signed-off-by: lines for Matthew Weber and me to the commit message. Mine should be: Signed-off-by: Chuck Wolber <[email protected]>.

meta/classes-recipe/testimage.bbclass (outdated, resolved)
meta/lib/oeqa/core/target/serial.py (outdated, resolved)
meta/lib/oeqa/core/target/serial.py (outdated, resolved)
meta/lib/oeqa/runtime/context.py (resolved)
chuckwolber previously approved these changes Jul 30, 2024
@chuckwolber
Contributor

Great work, all changes look good to me!

Please add me as a signer below @matthew-l-weber in your commit message: Signed-off-by: Chuck Wolber <[email protected]>.

@matthew-l-weber
Contributor

Looks good to send out!

@chuckwolber chuckwolber self-requested a review August 12, 2024 22:04
@chuckwolber chuckwolber dismissed their stale review August 12, 2024 22:07

Autobuilder findings require an update to this patch set.

Contributor

@chuckwolber chuckwolber left a comment


Looks like there are some autobuilder failures from this patch:

https://autobuilder.yoctoproject.org/typhoon/#/builders/83/builds/7227/steps/25/logs/stdio
https://autobuilder.yoctoproject.org/typhoon/#/builders/117/builds/5177/steps/13/logs/stdio
https://autobuilder.yoctoproject.org/typhoon/#/builders/44/builds/9390/steps/13/logs/stdio
https://autobuilder.yoctoproject.org/typhoon/#/builders/47/builds/9314/steps/13/logs/stdio

I asked @kraj about this and he pointed me at the GLIBC selftest code. Looking at it, as well as several other selftests, it looks like we need to add a meta/lib/oeqa/selftest/cases/serial.py that manages the actual test and its dependencies.

The autobuilder messages look like they have enough information to reproduce this on our end before pushing a V2 patch set.

Uses TEST_SERIALCONTROL_CMD to open a serial connection to the target
and execute commands. This is a drop-in replacement for the ssh target,
fully supporting the same API. Supported with testexport.

To use, set the following in local.conf:
- TEST_TARGET to "serial"
- TEST_SERIALCONTROL_CMD to a shell command or script which connects to
  the serial console of the target and forwards that connection to
  standard input/output.
- TEST_SERIALCONTROL_EXTRA_ARGS (optional) any parameters that must be
  passed to the serial control command.
- TEST_SERIALCONTROL_PS1 (optional) A regex string representing an empty
  prompt on the target terminal. Example: "root@target:.*# ". This is
  used to find an empty shell after each command is run. If no value is
  given, it defaults to "root@{MACHINE}:.*# ".
- TEST_SERIALCONTROL_CONNECT_TIMEOUT (optional) Specifies the timeout in
  seconds for the initial connection to the target. Defaults to 10 if no
  other value is given.
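The prompt regex is what lets the serial target tell where one command's output ends. Below is a minimal sketch of that framing using only Python's `re` module. The `__SERIAL_EXIT__=` marker and the `wrap_command`/`parse_console` helpers are hypothetical (serial.py may recover the exit status differently); `default_ps1` mirrors the documented "root@{MACHINE}:.*# " default, additionally escaping the machine name.

```python
import re
from typing import Optional, Tuple

EXIT_MARKER = "__SERIAL_EXIT__="  # hypothetical marker, not taken from serial.py


def default_ps1(machine: str) -> str:
    # Same shape as the documented default "root@{MACHINE}:.*# ".
    return "root@" + re.escape(machine) + ":.*# "


def wrap_command(cmd: str) -> str:
    # Echo $? after the command so the exit status shows up in the console text.
    return f'{cmd}; echo "{EXIT_MARKER}$?"'


def parse_console(text: str, sent: str, ps1: str) -> Optional[Tuple[int, str]]:
    """Return (status, output) once an empty prompt follows the marker, else None."""
    text = text.replace("\r\n", "\n")
    # The terminal echoes the command line back first; drop it.
    if text.startswith(sent + "\n"):
        text = text[len(sent) + 1:]
    marker = re.search(re.escape(EXIT_MARKER) + r"(\d+)\n", text)
    if marker is None or not re.compile(ps1).search(text, marker.end()):
        return None  # not finished yet: keep reading from the link
    return int(marker.group(1)), text[:marker.start()].rstrip("\n")


sent = wrap_command("uname -r")
raw = sent + "\r\n6.6.32-yocto-standard\r\n__SERIAL_EXIT__=0\r\nroot@qemux86-64:~# "
print(parse_console(raw, sent, default_ps1("qemux86-64")))  # (0, '6.6.32-yocto-standard')
```

The echoed command line contains `$?` literally, so the marker regex (which requires digits) only matches the real status line. Returning `None` tells the caller to keep reading until the prompt appears or TEST_SERIALCONTROL_CONNECT_TIMEOUT-style timeouts expire.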

The serial target has some additional limitations compared with the ssh
target.
1. Only supports one "run" command at a time. If two threads attempt to
   call "run", one will block until the other finishes. This is a
   limitation of the serial link, since two connections cannot be opened
   at once.
2. For file transfer, the target needs a shell and the base32 program.
   The file transfer implementation was chosen to be as generic as
   possible, so it could support as many targets as possible.
3. Transferring files is significantly slower. On a 115200 baud serial
   connection, the fastest observed speed was 30kbps. This is due to
   implementation overhead resulting from the decisions described in
   item 2 above.
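Limitation 1 amounts to serializing access to one shared link. A minimal sketch with a hypothetical `SerialConsole` class (not the class in serial.py): a `threading.Lock` held for the whole send-and-wait-for-prompt cycle makes a second caller block until the first command returns, instead of interleaving bytes on the console.

```python
import threading
import time
from typing import List, Tuple


class SerialConsole:
    """Hypothetical stand-in for a single serial link shared by test threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.log: List[Tuple[str, str]] = []  # ("start" | "end", command), in order

    def run(self, command: str) -> Tuple[int, str]:
        # Only one command may own the link; other callers block here.
        with self._lock:
            self.log.append(("start", command))
            time.sleep(0.05)  # placeholder for sending the command and waiting for the prompt
            self.log.append(("end", command))
            return 0, ""  # placeholder (status, output)
```

Blocking rather than failing keeps the ssh-style `run` contract intact for callers; the cost is that parallel test threads effectively run one at a time.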

Signed-off-by: Andrew Oppelt <[email protected]>
Signed-off-by: Matthew Weber <[email protected]>
Signed-off-by: Chuck Wolber <[email protected]>

--

Tested with core-image-sato on real hardware. TEST_SERIALCONTROL_CMD
was set to a bash script which connected with telnet to the target.

Additionally tested with QEMU by setting TEST_SERIALCONTROL_CMD to
"ssh -o StrictHostKeyChecking=no [email protected]". This imitates
a serial connection to the QEMU instance.

Steps:
1) Set the following in local.conf:
  - IMAGE_CLASSES += "testexport"
  - TEST_TARGET = "serial"
  - TEST_SERIALCONTROL_CMD="ssh -o StrictHostKeyChecking=no [email protected]"
2) Build an image
  - bitbake core-image-sato
3) Run the test export
  - bitbake -c testexport core-image-sato
4) Run the image in qemu
  - runqemu nographic core-image-sato
5) Navigate to the test export directory
6) Run the exported tests with target-type set to serial
 - ./oe-test runtime --test-data-file ./data/testdata.json --packages-manifest ./data/manifest --debug --target-type serial
@oppelt-boeing
Author

> Looks like there are some autobuilder failures from this patch:
>
> https://autobuilder.yoctoproject.org/typhoon/#/builders/83/builds/7227/steps/25/logs/stdio
> https://autobuilder.yoctoproject.org/typhoon/#/builders/117/builds/5177/steps/13/logs/stdio
> https://autobuilder.yoctoproject.org/typhoon/#/builders/44/builds/9390/steps/13/logs/stdio
> https://autobuilder.yoctoproject.org/typhoon/#/builders/47/builds/9314/steps/13/logs/stdio
>
> I asked @kraj about this and he pointed me at the GLIBC selftest code. Looking at the GLIBC selftest code, as well as several others, it looks like we need to add a meta/lib/oeqa/selftest/cases/serial.py that manages the actual test and its dependencies.
>
> The autobuilder messages look like they have enough information to reproduce this on our end before pushing a V2 patch set.

I resubmitted the patch, moving the pexpect requirement out of build time so that it is only required when specifically using a serial test export.
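Making pexpect a run-time-only requirement boils down to deferring the import until the serial target is actually used. A minimal sketch of that pattern, with a hypothetical `require_module` helper rather than whatever serial.py actually does:

```python
import importlib
from types import ModuleType


def require_module(name: str, purpose: str) -> ModuleType:
    """Import `name` only when a feature needs it, with an actionable error if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        raise RuntimeError(
            f"the '{name}' Python module is required for {purpose}; "
            "install it on the host that runs oe-test"
        ) from exc
```

A serial target could then call `pexpect = require_module("pexpect", "the serial test target")` inside its connect step, so builds and ssh-only test exports never touch pexpect.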

@chuckwolber
Contributor

Patchwork link for tracking purposes: https://patchwork.yoctoproject.org/project/oe-core/patch/[email protected]/

github-actions bot pushed a commit that referenced this pull request Aug 25, 2024
Fixes
ERROR: systemd-1_256.5-r0 do_patch: QA Issue: Fuzz detected:

Applying patch 0017-missing_syscall.h-Define-MIPS-ABI-defines-for-musl.patch
patching file src/basic/missing_syscall.h
Hunk #1 succeeded at 20 with fuzz 1.

The issue surfaces when building with musl.

Signed-off-by: Khem Raj <[email protected]>
Signed-off-by: Richard Purdie <[email protected]>
github-actions bot pushed a commit that referenced this pull request Sep 6, 2024
In systemd/systemd@924453c,
ProtectHome was set to true for systemd-coredump to reduce risk, since an attacker could craft a malicious binary to compromise systemd-coredump.
At that point, the object analysis was done in the main systemd-coredump process.
Because of this, systemd-coredump is unable to produce symbolicated call-stacks for binaries running under /home ("n/a" is shown instead of function names).

However, systemd/systemd@61aea45 later changed systemd-coredump to do the object analysis in a forked process,
which addresses those security concerns.

Let's set ProtectHome to read-only so that systemd-coredump produces symbolicated call-stacks for processes running under /home.

Note: it still does not work in /tmp (because of PrivateTmp=yes) and in /root (for unknown reasons).

Before the change (with minidebuginfo enabled):

    root@qemux86-64:~# /home/sleep 1000 &
    [1] 426
    root@qemux86-64:~# kill -11 $(pidof sleep)
    root@qemux86-64:~# coredumpctl info
               PID: 426 (sleep)
               UID: 0 (root)
               GID: 0 (root)
            Signal: 11 (SEGV)
         Timestamp: Fri 2024-09-06 17:25:18 UTC (3s ago)
      Command Line: /home/sleep 1000
        Executable: /home/sleep
     Control Group: /system.slice/system-serial\x2dgetty.slice/[email protected]
              Unit: [email protected]
             Slice: system-serial\x2dgetty.slice
           Boot ID: 44ef4ddfaad249ceaa29d1e9f330d3b5
        Machine ID: fb279f18f2c849c59768754c7a274ee3
          Hostname: qemux86-64
           Storage: /var/lib/systemd/coredump/core.sleep.0.44ef4ddfaad249ceaa29d1e9f330d3b5.426.1725643518000000.zst (present)
      Size on Disk: 16.5K
           Message: Process 426 (sleep) of user 0 dumped core.

                    Stack trace of thread 426:
                    #0  0x00007f365f3849a7 clock_nanosleep (libc.so.6 + 0xd49a7)
                    #1  0x00007f365f38f667 __nanosleep (libc.so.6 + 0xdf667)
                    #2  0x0000561fee703737 n/a (/home/sleep + 0x7737)
                    #3  0x000000003a6227c5 n/a (n/a + 0x0)
                    ELF object binary architecture: AMD x86-64
    [1]+  Segmentation fault      (core dumped) /home/sleep 1000

After the change (with minidebuginfo enabled):

    root@qemux86-64:~# /home/sleep 1000 &
    [1] 450
    root@qemux86-64:~# kill -11 $(pidof sleep)
    root@qemux86-64:~# coredumpctl info
               PID: 450 (sleep)
               UID: 0 (root)
               GID: 0 (root)
            Signal: 11 (SEGV)
         Timestamp: Fri 2024-09-06 17:30:12 UTC (4s ago)
      Command Line: /home/sleep 1000
        Executable: /home/sleep
     Control Group: /system.slice/system-serial\x2dgetty.slice/[email protected]
              Unit: [email protected]
             Slice: system-serial\x2dgetty.slice
           Boot ID: 44ef4ddfaad249ceaa29d1e9f330d3b5
        Machine ID: fb279f18f2c849c59768754c7a274ee3
          Hostname: qemux86-64
           Storage: /var/lib/systemd/coredump/core.sleep.0.44ef4ddfaad249ceaa29d1e9f330d3b5.450.1725643812000000.zst (present)
      Size on Disk: 16.5K
           Message: Process 450 (sleep) of user 0 dumped core.

                    Stack trace of thread 450:
                    #0  0x00007f795dd689a7 clock_nanosleep (libc.so.6 + 0xd49a7)
                    #1  0x00007f795dd73667 __nanosleep (libc.so.6 + 0xdf667)
                    #2  0x0000561965c9d737 rpl_nanosleep (sleep + 0x7737)
                    #3  0x0000561965c9d0c1 xnanosleep (sleep + 0x70c1)
                    #4  0x0000561965c985c8 main (sleep + 0x25c8)
                    #5  0x00007f795dcba01b __libc_start_call_main (libc.so.6 + 0x2601b)
                    #6  0x00007f795dcba0d9 __libc_start_main (libc.so.6 + 0x260d9)
                    #7  0x0000561965c98685 _start (sleep + 0x2685)
                    ELF object binary architecture: AMD x86-64
    [1]+  Segmentation fault      (core dumped) /home/sleep 1000

Signed-off-by: Etienne Cordonnier <[email protected]>
Signed-off-by: Richard Purdie <[email protected]>
github-actions bot pushed a commit that referenced this pull request Sep 7, 2024
github-actions bot pushed a commit that referenced this pull request Sep 9, 2024
github-actions bot pushed a commit that referenced this pull request Sep 9, 2024
In systemd/systemd@924453c,
ProtectHome was set to true for systemd-coredump to reduce risk, since an attacker could craft a malicious binary to compromise systemd-coredump.
At that point, the object analysis was done in the main systemd-coredump process.
Because of this, systemd-coredump is unable to produce symbolicated call-stacks for binaries running under /home ("n/a" is shown instead of function names).

However, systemd/systemd@61aea45 later changed systemd-coredump to do the object analysis in a forked process,
which addresses those security concerns.

Let's set ProtectHome to read-only so that systemd-coredump produces symbolicated call-stacks for processes running under /home.

Note: it still does not work in /tmp (because of PrivateTmp=yes) and in /root (for unknown reasons).

Before the change (with minidebuginfo enabled):

    root@qemux86-64:~# /home/sleep 1000 &
    [1] 426
    root@qemux86-64:~# kill -11 $(pidof sleep)
    root@qemux86-64:~# coredumpctl info
               PID: 426 (sleep)
               UID: 0 (root)
               GID: 0 (root)
            Signal: 11 (SEGV)
         Timestamp: Fri 2024-09-06 17:25:18 UTC (3s ago)
      Command Line: /home/sleep 1000
        Executable: /home/sleep
     Control Group: /system.slice/system-serial\x2dgetty.slice/[email protected]
              Unit: [email protected]
             Slice: system-serial\x2dgetty.slice
           Boot ID: 44ef4ddfaad249ceaa29d1e9f330d3b5
        Machine ID: fb279f18f2c849c59768754c7a274ee3
          Hostname: qemux86-64
           Storage: /var/lib/systemd/coredump/core.sleep.0.44ef4ddfaad249ceaa29d1e9f330d3b5.426.1725643518000000.zst (present)
      Size on Disk: 16.5K
           Message: Process 426 (sleep) of user 0 dumped core.

                    Stack trace of thread 426:
                    #0  0x00007f365f3849a7 clock_nanosleep (libc.so.6 + 0xd49a7)
                    #1  0x00007f365f38f667 __nanosleep (libc.so.6 + 0xdf667)
                    #2  0x0000561fee703737 n/a (/home/sleep + 0x7737)
                    #3  0x000000003a6227c5 n/a (n/a + 0x0)
                    ELF object binary architecture: AMD x86-64
    [1]+  Segmentation fault      (core dumped) /home/sleep 1000

After the change (with minidebuginfo enabled):

    root@qemux86-64:~# /home/sleep 1000 &
    [1] 450
    root@qemux86-64:~# kill -11 $(pidof sleep)
    root@qemux86-64:~# coredumpctl info
               PID: 450 (sleep)
               UID: 0 (root)
               GID: 0 (root)
            Signal: 11 (SEGV)
         Timestamp: Fri 2024-09-06 17:30:12 UTC (4s ago)
      Command Line: /home/sleep 1000
        Executable: /home/sleep
     Control Group: /system.slice/system-serial\x2dgetty.slice/[email protected]
              Unit: [email protected]
             Slice: system-serial\x2dgetty.slice
           Boot ID: 44ef4ddfaad249ceaa29d1e9f330d3b5
        Machine ID: fb279f18f2c849c59768754c7a274ee3
          Hostname: qemux86-64
           Storage: /var/lib/systemd/coredump/core.sleep.0.44ef4ddfaad249ceaa29d1e9f330d3b5.450.1725643812000000.zst (present)
      Size on Disk: 16.5K
           Message: Process 450 (sleep) of user 0 dumped core.

                    Stack trace of thread 450:
                    #0  0x00007f795dd689a7 clock_nanosleep (libc.so.6 + 0xd49a7)
                    #1  0x00007f795dd73667 __nanosleep (libc.so.6 + 0xdf667)
                    #2  0x0000561965c9d737 rpl_nanosleep (sleep + 0x7737)
                    #3  0x0000561965c9d0c1 xnanosleep (sleep + 0x70c1)
                    #4  0x0000561965c985c8 main (sleep + 0x25c8)
                    #5  0x00007f795dcba01b __libc_start_call_main (libc.so.6 + 0x2601b)
                    #6  0x00007f795dcba0d9 __libc_start_main (libc.so.6 + 0x260d9)
                    #7  0x0000561965c98685 _start (sleep + 0x2685)
                    ELF object binary architecture: AMD x86-64
    [1]+  Segmentation fault      (core dumped) /home/sleep 1000

Signed-off-by: Etienne Cordonnier <[email protected]>
Signed-off-by: Richard Purdie <[email protected]>
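The commit above changes the unit shipped by the systemd recipe. To try the same ProtectHome=read-only setting on an image that is already built, a standard systemd drop-in (a `.conf` file in `/etc/systemd/system/<unit>.d/`) can override it. The sketch below is a minimal illustration under that assumption, not the recipe change itself; the helper `write_service_dropin` and the drop-in name `protect-home` are hypothetical.

```python
import tempfile
from pathlib import Path


def write_service_dropin(root: Path, unit: str, name: str, settings: dict[str, str]) -> Path:
    """Write a drop-in that overrides [Service] settings of `unit` under `root`.

    For a template unit such as "systemd-coredump@.service", a drop-in in
    "<unit>.d/" applies to every instance of the template.
    """
    # systemd only reads drop-in files ending in ".conf".
    dropin_dir = root / "etc" / "systemd" / "system" / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / f"{name}.conf"
    lines = ["[Service]"] + [f"{key}={value}" for key, value in settings.items()]
    path.write_text("\n".join(lines) + "\n")
    return path


if __name__ == "__main__":
    # Demo against a throwaway directory standing in for an image rootfs.
    with tempfile.TemporaryDirectory() as rootfs:
        out = write_service_dropin(
            Path(rootfs),
            "systemd-coredump@.service",
            "protect-home",  # hypothetical drop-in name
            {"ProtectHome": "read-only"},
        )
        print(out.relative_to(rootfs))
        print(out.read_text(), end="")
```

On a running system, `systemctl daemon-reload` is needed before new systemd-coredump instances pick up the override. It does not change the /tmp case, which is limited by PrivateTmp=yes.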
github-actions bot pushed a commit that referenced this pull request Oct 10, 2024
Fixed:
1) $ bitbake virtual/kernel -cmenuconfig
Make some changes and save the new config to the default .config.
2) $ bitbake virtual/kernel -cdiffconfig
The config fragment is dumped into ${WORKDIR}/fragment.cfg.

But the .config saved in step #1 is overwritten by .config.orig, so
the changes are lost when 'bitbake virtual/kernel' is run.

And the following comment is for subprocess.call(), not for shutil.copy(),
so move subprocess.call() to the correct location.
    # No need to check the exit code as we know it's going to be
    # non-zero, but that's what we expect.

Signed-off-by: Robert Yang <[email protected]>
Signed-off-by: Richard Purdie <[email protected]>
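The fragment written by diffconfig is, in effect, the set of lines that are new or changed in .config compared with .config.orig, and the bug described above was .config itself being overwritten by .config.orig. A minimal sketch of that idea, assuming the fragment should contain exactly those added or changed lines: it uses Python's difflib in place of the external command the task runs through subprocess.call() (a substitution for illustration only, not the class's actual code), and `write_config_fragment` is a hypothetical name. It only reads the two configs, so the user's .config survives. (`diff`, for instance, exits with status 1 whenever the files differ, which fits the comment about the expected non-zero exit code.)

```python
import difflib
import tempfile
from pathlib import Path


def write_config_fragment(orig: Path, new: Path, fragment: Path) -> bool:
    """Write the lines that are new or changed in `new` relative to `orig`.

    Both configs are only read, so the user's .config is never overwritten.
    Returns True if a fragment was written, False if there was nothing to write.
    """
    old_lines = orig.read_text().splitlines(keepends=True)
    new_lines = new.read_text().splitlines(keepends=True)

    # autojunk=False: kernel configs are long and repetitive, and the junk
    # heuristic could otherwise treat frequently repeated lines as junk.
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" opcodes cover lines only present in `new`,
        # e.g. "CONFIG_FOO=y" turning into "# CONFIG_FOO is not set".
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])

    if not added:
        # Nothing new or changed: drop any stale fragment from an earlier run.
        fragment.unlink(missing_ok=True)
        return False
    fragment.write_text("".join(added))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        (work / ".config.orig").write_text("CONFIG_A=y\nCONFIG_B=y\n")
        (work / ".config").write_text("CONFIG_A=y\n# CONFIG_B is not set\n")
        write_config_fragment(work / ".config.orig", work / ".config", work / "fragment.cfg")
        print((work / "fragment.cfg").read_text(), end="")
```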
github-actions bot pushed a commit that referenced this pull request Oct 11, 2024
Fixed:
1) $ bitbake virtual/kernel -cmenuconfig
Make some changes and save the new config to the default .config.
2) $ bitbake virtual/kernel -cdiffconfig
The config fragment is dumped into ${WORKDIR}/fragment.cfg.

But the .config saved in step #1 is overwritten by .config.orig, so
the changes are lost when 'bitbake virtual/kernel' is run.

And the following comment is for subprocess.call(), not for shutil.copy(),
so move subprocess.call() to the correct location.
    # No need to check the exit code as we know it's going to be
    # non-zero, but that's what we expect.

Signed-off-by: Robert Yang <[email protected]>
Signed-off-by: Richard Purdie <[email protected]>
github-actions bot pushed a commit that referenced this pull request Nov 26, 2024
Fixed:
1) $ bitbake virtual/kernel -cmenuconfig
Make some changes and save the new config to the default .config.
2) $ bitbake virtual/kernel -cdiffconfig
The config fragment is dumped into ${WORKDIR}/fragment.cfg.

But the .config saved in step #1 is overwritten by .config.orig, so
the changes are lost when 'bitbake virtual/kernel' is run.

And the following comment is for subprocess.call(), not for shutil.copy(),
so move subprocess.call() to the correct location.
    # No need to check the exit code as we know it's going to be
    # non-zero, but that's what we expect.

Signed-off-by: Robert Yang <[email protected]>
Signed-off-by: Richard Purdie <[email protected]>
(cherry picked from commit 6cccf6b)
Signed-off-by: Steve Sakoman <[email protected]>
Labels
enhancement New feature or request

3 participants