Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't scan after a failed Ansible remediation #290

Merged
merged 1 commit into from
Nov 29, 2024

Conversation

comps
Copy link
Contributor

@comps comps commented Nov 28, 2024

The original idea was to provide at least some scan results when (a few) Ansible tasks fail to remediate - to print the failed remediations as error + failing oscap rules as a nice double-check.

However RHEL-10 work has shown that this is not generally a good idea - we cannot tell if Ansible failed on something fatal (and thus about 100+ rules will fail) or if only 1-2 tasks failed and the scan won't be a wall of red lines.

So, to be on the safer side, don't do the scan when Ansible fails for any reason.

All the non-fatal remediation errors should still be reported (instead of only the first failing one) as check=True raises an exception only when all the output lines have been processed.

The original idea was to provide at least some scan results
when (a few) Ansible tasks fail to remediate - to print the failed
remediations as 'error' + failing oscap rules as a nice double-check.

However RHEL-10 work has shown that this is not generally a good
idea - we cannot tell if Ansible failed on something fatal (and thus
about 100+ rules will fail) or if only 1-2 tasks failed and the scan
won't be a wall of red lines.

So, to be on the safer side, don't do the scan when Ansible fails
for any reason.

All the non-fatal remediation errors should still be reported
(instead of only the first failing one) as check=True raises an
exception only when all the output lines have been processed.

Signed-off-by: Jiri Jaburek <[email protected]>
@comps comps changed the title don't scan after a failed Ansible remediation Don't scan after a failed Ansible remediation Nov 28, 2024
@mildas
Copy link
Contributor

mildas commented Nov 29, 2024

All the non-fatal remediation errors should still be reported (instead of only the first failing one) as check=True raises an exception only when all the output lines have been processed.

No errors will be reported by Contest as the reporting (ansible.report_from_output()) is done after playbook process.

Results will report that ansible-playbook returned non-zero return code and reviewer would need to go over output to find out which tasks failed.

Wouldn't it be better to just update the condition to

if proc.returncode != 0:
    raise RuntimeError(f"ansible-playbook failed with {proc.returncode}")

and not use check=True?

@comps
Copy link
Contributor Author

comps commented Nov 29, 2024

No errors will be reported by Contest as the reporting (ansible.report_from_output()) is done after playbook process.

I don't think that's true, any errors found during processing Ansible output are actually recorded (written) to results.yaml in realtime (making tmt read it regardless of how we exit), and the raised exception is further caught by https://github.com/RHSecurityCompliance/contest/blob/main/lib/runtest.py#L73-L77 which does call report_and_exit().

Also, as I mentioned in PR description,

All the non-fatal remediation errors should still be reported (instead of only the first failing one) as check=True raises an exception only when all the output lines have been processed.

this works because the tests call Ansible via subprocess_stream(), which is a wrapper for subprocess.Popen(), which runs the process in the background. And only when the interprocess pipe is emptied does the process actually end (otherwise it remains a zombie as the kernel caches IO for it).

>>> import subprocess
>>> from lib import util
>>> proc, lines = util.subprocess_stream(['ls', '/', '/sdasds'], stderr=subprocess.STDOUT, check=True)
2024-11-29 13:04:58 <stdin>:1: running: ls / /sdasds
>>> proc
<Popen: returncode: None args: ['ls', '/', '/sdasds']>   # notice rc:None
>>> for line in lines:
...   print(line)
... 
ls: cannot access '/sdasds': No such file or directory
/:
afs
bin
boot
dev
etc
home
lib
lib64
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jjaburek/gitit/contest/lib/util/subprocess.py", line 48, in generate_lines
    raise subprocess.CalledProcessError(cmd=cmd, returncode=code)
subprocess.CalledProcessError: Command '['ls', '/', '/sdasds']' returned non-zero exit status 2.
>>>

But maybe an explicit return code check provides a better explanation than 'ansible-playbook -abcd' returned non-zero exit status ?

@mildas mildas merged commit 97e1bc8 into main Nov 29, 2024
3 checks passed
@mildas mildas deleted the dont_scan_failing_ansible branch November 29, 2024 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants