Add oscap debug #264

Merged
merged 9 commits into from
Oct 7, 2024


@comps (Contributor) commented Sep 18, 2024

(Not sure we should merge this in any form; this is just for possibly temporary testing, and having it as a PR is convenient because our CI can run it.)

@comps (Contributor Author) commented Sep 19, 2024

Added 3 tests in total:

  • vm-scan uses a nested VM to run a scan and try to reproduce the freeze. This seems to be MUCH faster than trying to run it directly on the host OS, possibly because the host has a lot more packages (files on disk) due to being installed by Beaker. The freeze typically happens within 1-100 runs; the longest took ~300.
  • sysctl-only deselects everything except sysctl rules and runs those on the host OS (which is fast enough). The only freeze I was able to reproduce came after ~5000 runs, so it's not very reliable.
  • helgrind does not try to freeze openscap; it just runs oscap via valgrind --tool=helgrind on the host OS to try to debug threads deadlocking.
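A minimal sketch of the command line the helgrind test might assemble, assuming it reuses the oscap invocation that appears in the sysctl-only traceback further down this thread. The helper name `helgrind_cmd` and its defaults are hypothetical, not taken from helgrind.py.

```python
from typing import List


def helgrind_cmd(profile: str, datastream: str = "scan-ds.xml") -> List[str]:
    """Build a command running an oscap XCCDF scan under valgrind's helgrind tool.

    Hypothetical helper: the oscap arguments mirror the command seen in the
    sysctl-only traceback, not necessarily what helgrind.py actually runs.
    """
    oscap = ["oscap", "xccdf", "eval", "--profile", profile, "--progress", datastream]
    # helgrind is valgrind's thread-error detector (data races, lock-ordering
    # problems), which is what hunting a deadlock between threads calls for.
    return ["valgrind", "--tool=helgrind"] + oscap


if __name__ == "__main__":
    # Would normally be handed to subprocess.run(); printed here instead.
    print(" ".join(helgrind_cmd("anssi_bp28_high")))
```

Wrapping the unchanged oscap command keeps the scan itself identical to the other tests, so any difference in behavior comes from running under helgrind.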

@mildas (Contributor) left a comment

cis_workstation_l1 is a fast-to-scan profile and OK-ish for oscap-debug/vm-scan.py.
But for oscap-debug/sysctl-only.py it would be better to use another profile with more sysctl rules, for example ANSSI High (71 sysctl rules vs. 45 in cis_workstation_l1).

Moreover, I've also seen freezes on file_* rules, which should be fast to scan as well (they check file permissions/ownership). That's another 76 rules with a chance to hit the freeze. So changing sysctl-only.py to something like textfilecontent-rules.py (as textfilecontent54 is the suspected problematic oscap probe) should make it more efficient. Never mind, file_* rules are noticeably slower than sysctl.
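Comparing profiles by how many sysctl rules they select, or narrowing a scan to one rule family, starts with picking rules out by name. A minimal sketch, assuming SSG-style rule IDs of the form `xccdf_org.ssgproject.content_rule_<name>` and a list of selected rule IDs already extracted from the profile (that extraction is not shown). `rules_with_prefix` is a hypothetical helper, not code from this PR.

```python
from typing import Iterable, List

# Assumption: SSG rule IDs follow "xccdf_org.ssgproject.content_rule_<short name>".
RULE_PREFIX = "xccdf_org.ssgproject.content_rule_"


def rules_with_prefix(rule_ids: Iterable[str], short_prefix: str) -> List[str]:
    """Return the rule IDs whose short name starts with short_prefix (e.g. 'sysctl_')."""
    matched = []
    for rule_id in rule_ids:
        # Strip the common ID prefix (if present) so only short names are compared.
        short = rule_id[len(RULE_PREFIX):] if rule_id.startswith(RULE_PREFIX) else rule_id
        if short.startswith(short_prefix):
            matched.append(rule_id)
    return matched


if __name__ == "__main__":
    profile_rules = [RULE_PREFIX + name for name in (
        "sysctl_kernel_kptr_restrict",
        "file_permissions_etc_passwd",
        "package_aide_installed",
    )]
    print(len(rules_with_prefix(profile_rules, "sysctl_")))
```

Running the same count with `"sysctl_"` and `"file_"` against two profiles' rule lists would give the kind of per-family comparison discussed above.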

test_metadata = yaml.safe_load(f)
self.update(test_metadata)

# dict's .copy() returns 'dict', not 'TestMetadata'
Contributor

The comment confused me: I read it as saying you want .copy() to return a dict, not a TestMetadata, but it's the opposite.

Change it to something like:

# return `TestMetadata` for `.copy()`, not `dict`

Contributor Author

Honestly, that piece of code is over a year old, and when I recently re-read it myself it wasn't confusing to me, so I see both versions as reasonable. I'll use yours just to be safe. 😄

We could probably even remove it; it was for the relatively rare case where somebody wants to create a copy of the instance, which was the case in the original codebase this snippet came from (early runcontest).
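The override exists because `dict.copy()` on a subclass returns a plain `dict`. A minimal sketch, assuming a constructor that optionally loads a YAML file the way the snippet above does; the `path` parameter and the body of `copy()` are illustrative, not the PR's actual code.

```python
from typing import Optional


class TestMetadata(dict):
    """Hypothetical sketch: a dict optionally filled from a YAML metadata file."""

    def __init__(self, path: Optional[str] = None):
        super().__init__()
        if path is not None:
            import yaml  # PyYAML; only needed when a file is actually loaded
            with open(path) as f:
                test_metadata = yaml.safe_load(f)
            self.update(test_metadata)

    # return `TestMetadata` for `.copy()`, not `dict`
    def copy(self):
        new = type(self)()  # empty instance, no file loaded
        new.update(self)    # shallow copy, same semantics as dict.copy()
        return new
```

Building the copy through an empty instance plus `update()` avoids re-running the file-loading path in `__init__`.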

@mildas (Contributor) commented Sep 24, 2024

Got an error on rhel9 sysctl-only:

Using host libthread_db library "/lib64/libthread_db.so.1".
__futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55f8267d8774) at futex-internal.c:57
57      return INTERNAL_SYSCALL_CANCEL (futex_time64, futex_word, op, expected,
warning: target file /proc/1301254/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile oscap.core
gdb.script:5: Error in sourced command file:
Undefined set logging command: "enabled on".  Try "help set logging".
[Inferior 1 (process 1301254) detached]
Traceback (most recent call last):
  File "/var/tmp/tmt/run-004/default/plan/discover/default-0/tests/scanning/oscap-debug/sysctl-only.py", line 61, in <module>
    returncode = oscap_proc.wait(oscap_timeout)
  File "/usr/lib64/python3.9/subprocess.py", line 1189, in wait
    return self._wait(timeout=timeout)
  File "/usr/lib64/python3.9/subprocess.py", line 1925, in _wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['oscap', 'xccdf', 'eval', '--profile', 'anssi_bp28_high', '--progress', 'scan-ds.xml']' timed out after 10 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/tmp/tmt/run-004/default/plan/discover/default-0/tests/lib/runtest.py", line 74, in <module>
    runpy.run_path(str(test_script), run_name='__main__')
  File "/usr/lib64/python3.9/runpy.py", line 288, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/lib64/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/var/tmp/tmt/run-004/default/plan/discover/default-0/tests/scanning/oscap-debug/sysctl-only.py", line 86, in <module>
    util.subprocess_run(
  File "/var/tmp/tmt/run-004/default/plan/discover/default-0/tests/lib/util/subprocess.py", line 20, in subprocess_run
    return subprocess.run(cmd, **kwargs)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['gdb', '-n', '-batch', '-x', 'gdb.script', '-p', '1301254']' returned non-zero exit status 1.
2024-09-24 11:17:17 runtest.py:77: lib.results.report_plain:160: ERROR / (CalledProcessError: Command '['gdb', '-n', '-batch', '-x', 'gdb.script', '-p', '1301254']' returned non-zero exit status 1.)
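The first traceback shows the pattern sysctl-only.py follows: wait for oscap with a timeout and, on `TimeoutExpired`, attach gdb to the still-running PID. A minimal sketch of that loop, assuming the diagnostic step (such as the `gdb -n -batch -x gdb.script -p <pid>` call from the log) is passed in as a callback. `run_until_freeze` and `hunt` are hypothetical names.

```python
import subprocess
import sys
from typing import Callable, List, Optional


def run_until_freeze(cmd: List[str], timeout: float,
                     on_freeze: Callable[[int], None]) -> bool:
    """Run cmd once; if it is still running after `timeout` seconds, call on_freeze(pid).

    Returns True if the process timed out (a suspected freeze), False otherwise.
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout)
        return False
    except subprocess.TimeoutExpired:
        # The process is still alive, so it can be inspected, e.g. by attaching
        # gdb with ['gdb', '-n', '-batch', '-x', 'gdb.script', '-p', str(pid)].
        on_freeze(proc.pid)
        return True
    finally:
        # Never leave a hung process behind for the next iteration.
        if proc.poll() is None:
            proc.kill()
            proc.wait()


def hunt(cmd: List[str], timeout: float, max_runs: int,
         on_freeze: Callable[[int], None]) -> Optional[int]:
    """Repeat cmd until it freezes; return the 1-based attempt number, or None."""
    for attempt in range(1, max_runs + 1):
        if run_until_freeze(cmd, timeout, on_freeze):
            return attempt
    return None


if __name__ == "__main__":
    # Stand-in for oscap that "freezes" by sleeping well past the timeout.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(hunt(hung, timeout=0.5, max_runs=3, on_freeze=lambda pid: print("froze:", pid)))
```

Killing the process after the callback keeps a frozen scan from lingering into the next iteration, which matters when a freeze may take hundreds or thousands of runs to show up.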

@comps (Contributor Author) commented Sep 25, 2024

> Got error on rhel9 sysctl-only

Okay, I was able to reproduce it on 9.6 (it worked on 9.5); will fix.

> gdb.script:5: Error in sourced command file:
> Undefined set logging command: "enabled on". Try "help set logging".

@comps (Contributor Author) commented Sep 26, 2024

Actually, it wasn't 9.6 but RHEL-8, which I presume was your version as well, and "rhel9" was a typo.

I tried using the older syntax of

set logging on

and that got rid of the error on RHEL-8 (while still working fine on 9), but it didn't make the test work on 8; gdb now just prints

Saved corefile oscap.core

and exits with an error.

So I will probably limit the gdb-style tests to RHEL-9+, which should be good enough. We'll see if any freezing fixes helped on RHEL-8 simply by running normal tests.
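Limiting the gdb-based tests to RHEL-9+ needs a version check. A minimal sketch, assuming the check reads `/etc/os-release`; `parse_os_release` and `gdb_tests_supported` are hypothetical, and the test library may well have its own version helper.

```python
from pathlib import Path
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release style KEY=value lines into a dict, stripping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def gdb_tests_supported(os_release_text: str, min_major: int = 9) -> bool:
    """True on RHEL min_major or newer (hypothetical gate for the gdb-based tests)."""
    info = parse_os_release(os_release_text)
    if info.get("ID") != "rhel":
        return False
    # VERSION_ID looks like "8.10" or "9.5"; only the major part matters here.
    major = info.get("VERSION_ID", "0").split(".")[0]
    return major.isdigit() and int(major) >= min_major


if __name__ == "__main__":
    os_release = Path("/etc/os-release")
    if os_release.exists():
        print(gdb_tests_supported(os_release.read_text()))
```

Gating on the major version alone matches the plan above: RHEL-8 is skipped entirely rather than patched with the older `set logging on` syntax.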

@comps marked this pull request as ready for review on September 27, 2024 17:16
@comps (Contributor Author) commented Oct 2, 2024

Rebased and added a commit to avoid running these in productization.

I.e.:

$ tmt tests show ./helgrind
/scanning/oscap-debug/helgrind
                 summary Runs oscap via valgrind - helgrind
               component scap-security-guide
                    test python3 -m lib.runtest ./helgrind.py
                    path /scanning/oscap-debug
               framework shell
                  manual false
                     tty false
                 require - type: file
                           pattern: /lib
                         - type: file
                           pattern: /conf
                         - scap-security-guide
                         - valgrind
               recommend python3
                         python36
                         python3-requests
                         python36-requests
                         python3-pyyaml
                         python36-pyyaml
                         python3-rpm
                         python36-rpm
                         rpm-build
             environment AVC_ERROR: +no_avc_check
                         TMPDIR: /var/tmp
                         PYTHONPATH: ../..
                duration 4h
                 enabled true
                  result custom
       restart_max_count 1
     restart_with_reboot false
                     tag needs-param
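The listing above shows the helgrind test tagged `needs-param`. Assuming that tag is how productization runs exclude these tests (the plan-side filter isn't shown in this thread), the selection logic amounts to the following; `runnable_tests` and the dict layout are hypothetical, not tmt's API.

```python
from typing import Dict, Iterable, List


def runnable_tests(tests: Iterable[Dict], skip_tag: str = "needs-param") -> List[str]:
    """Return names of tests that do not carry skip_tag (hypothetical helper)."""
    return [t["name"] for t in tests if skip_tag not in t.get("tag", [])]


if __name__ == "__main__":
    tests = [
        {"name": "/scanning/oscap-debug/helgrind", "tag": ["needs-param"]},
        {"name": "/scanning/example"},  # hypothetical untagged test
    ]
    print(runnable_tests(tests))
```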

@mildas self-assigned this on Oct 7, 2024
@comps added 7 commits on October 7, 2024 17:05
In other instances of util.subprocess_run() and Guest.ssh() we simply
let the user use subprocess.PIPE directly when wanted.

This allows for flexibility like 'stderr=subprocess.STDOUT' as well.

Further, newer Python versions have 'capture_output=True' as
a shorthand for 'stdout=PIPE' + 'stderr=PIPE', so the user can always
use that.

So let's get rid of the special ssh()-specific 'capture'.

Signed-off-by: Jiri Jaburek <[email protected]>
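With `capture` gone, callers pass `subprocess.run()` keyword arguments straight through. A minimal sketch, assuming a wrapper that forwards `**kwargs` unchanged; `ssh_like` is a hypothetical stand-in that runs the command locally instead of over SSH.

```python
import subprocess
import sys
from typing import List


def ssh_like(cmd: List[str], **kwargs) -> subprocess.CompletedProcess:
    """Hypothetical stand-in for Guest.ssh(): kwargs go to subprocess.run() untouched.

    A real implementation would wrap cmd in an ssh invocation; this runs it locally.
    """
    return subprocess.run(cmd, **kwargs)


# A command that writes one line to stdout and one to stderr.
CMD = [sys.executable, "-c", "import sys; print('out'); print('err', file=sys.stderr)"]

if __name__ == "__main__":
    # Separate streams; capture_output=True (Python 3.7+) means stdout=PIPE + stderr=PIPE.
    separate = ssh_like(CMD, capture_output=True, text=True)
    print(repr(separate.stdout), repr(separate.stderr))

    # Merged streams: the 'stderr=subprocess.STDOUT' flexibility the commit mentions.
    merged = ssh_like(CMD, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    print(repr(merged.stdout))
```

Note that `capture_output` and `text` need Python 3.7+; on older interpreters the explicit `stdout=subprocess.PIPE, stderr=subprocess.PIPE` form still works.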
This is for (repeated) snapshotting within one test, not for sharing
snapshots across tests.

The random tag (instead of an empty '') will also safeguard against
accidentally re-using snapshots by multiple tests using Guest()
without a tag, but then trying to g.snapshotted() without installation.

Signed-off-by: Jiri Jaburek <[email protected]>
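A minimal sketch of the random default tag, assuming a UUID is an acceptable source of randomness. This hypothetical `Guest` only models the tag and a made-up snapshot name, not the real class's VM handling.

```python
import uuid
from typing import Optional


class Guest:
    """Hypothetical sketch of the tag handling only; no VM is created."""

    def __init__(self, tag: Optional[str] = None):
        # Default to a random tag rather than '', so two untagged Guest()
        # instances (e.g. from different tests) never share snapshot names.
        self.tag = tag if tag is not None else uuid.uuid4().hex

    def snapshot_name(self) -> str:
        # Hypothetical naming scheme for a per-test snapshot.
        return f"snapshot-{self.tag}"


if __name__ == "__main__":
    print(Guest().snapshot_name(), Guest().snapshot_name())
```

An explicit tag still yields a stable, shareable name; only the untagged default becomes unique per instance.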
Default to using all space for '/'.

Signed-off-by: Jiri Jaburek <[email protected]>
@comps added 2 commits on October 7, 2024 17:07
The sysctl-only scan is much simpler and faster, but doesn't
seem to reliably reproduce the freeze.

The nested VM test runs much faster than a full host OS scan,
and is able to easily reproduce it.

Signed-off-by: Jiri Jaburek <[email protected]>
@mildas merged commit fb4ee5f into main on Oct 7, 2024
3 checks passed
@mildas deleted the add_oscap_debug branch on October 7, 2024 15:14
2 participants