Fix nested alternative parsing
msm-code committed Sep 18, 2024
1 parent 58f8d89 commit 669f2ec
Showing 5 changed files with 50 additions and 4 deletions.
30 changes: 27 additions & 3 deletions docs/yara.md
@@ -231,9 +231,33 @@ rule UnluckyExample
 }
 ```
 
-This is not necessarily a bad rule, but there's not a single full 3gram that can
+This is not necessarily a bad rule, but it will generate the following ursadb query:
+
+```
+{}
+```
+
+In other words, we ask for every file in the malware collection. That's because there's not a single full 3gram that can
 be used to narrow the set of suspected files. Due to how mquery works, this will
-yara scan every malware file in the dataset, and will be very slow. Becaue of this,
-such queries are by defauly disasllowed. They can be enabled by setting
+cause a yara scan of every file in the dataset, and will usually be very slow. Because of this,
+such queries are disallowed by default. They can be enabled by setting
 `query_allow_slow` config key to true. In this case mquery will allow such
 queries, but it'll ask for confirmation first.
+
+## Caveats and advanced topics
+
+There are some things that could be parsed better, but currently aren't.
+
+**Mquery ignores alternatives in hex strings**
+
+```
+rule alternative_edge_case {
+    strings:
+        $test1 = { 11 (22 | 33) 44 }
+        $test2 = { ( 11 11 11 | 22 22 22 ) }
+    condition:
+        all of them
+}
+```
+The first string could be parsed as `{11 22 44} | {11 33 44}`, and the second as `{11 11 11} | {22 22 22}`, but as of mquery v1.4 everything that's part of an alternative is ignored.
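The expansion described in the last sentence can be sketched in a few lines of Python. This is not mquery code: `expand_alternatives` is a hypothetical helper that turns parenthesized alternatives, including nested ones such as `((BB)|(68))`, into every concrete hex sequence they stand for. It assumes the input contains only hex digits, `?` wildcards, spaces, and `( | )`; jumps like `[2-4]` are out of scope.

```python
from typing import List, Tuple


def _parse_seq(s: str, i: int) -> Tuple[List[str], int]:
    """Parse a sequence up to ')' or '|' (or the end); return all expansions."""
    results = [""]
    while i < len(s) and s[i] not in ")|":
        if s[i] == "(":
            options, i = _parse_alt(s, i + 1)
            # Every prefix so far combines with every option of the group.
            results = [r + o for r in results for o in options]
        else:
            results = [r + s[i] for r in results]
            i += 1
    return results, i


def _parse_alt(s: str, i: int) -> Tuple[List[str], int]:
    """Parse 'a|b|...)' starting just after '('; return options and index past ')'."""
    options: List[str] = []
    while True:
        seq, i = _parse_seq(s, i)
        options.extend(seq)
        if i >= len(s):
            raise ValueError("unbalanced parenthesis")
        if s[i] == "|":
            i += 1
            continue
        return options, i + 1  # s[i] == ")"


def expand_alternatives(hex_str: str) -> List[str]:
    """Hypothetical helper (not part of mquery): expand all hex alternatives."""
    s = hex_str.replace(" ", "").upper()
    results, i = _parse_seq(s, 0)
    if i != len(s):
        raise ValueError(f"unexpected {s[i]!r} at position {i}")
    return results


print(expand_alternatives("11 (22 | 33) 44"))          # ['112244', '113344']
print(expand_alternatives("( 11 11 11 | 22 22 22 )"))  # ['111111', '222222']
print(expand_alternatives("((BB)|(68))020000"))        # ['BB020000', '68020000']
```

The number of sequences is the product of the option counts of all groups, so a full expansion can grow quickly for strings with many alternatives.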
2 changes: 1 addition & 1 deletion src/lib/yaraparse.py
@@ -132,7 +132,7 @@ def ursify_hex(hex_str: str) -> UrsaExpression:
     hex_str = hex_str.replace(" ", "")
 
     # alternatives, are nested alternatives a thing?
-    hex_parts = re.split(r"\(.*?\)", hex_str)
+    hex_parts = re.split(r"\(.*\)", hex_str)
     hex_parts = [x for y in hex_parts for x in re.split(r"\[[\d-]+\]", y)]
 
     output: List[bytes] = []
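To see what the one-character regex change does, the snippet below runs both patterns through `re.split` on the nested alternative from the new `parse_exception_example.yar` test data, with spaces already removed as `ursify_hex` does.

```python
import re

# Hex string from parse_exception_example.yar, spaces removed.
nested = "((BB)|(68))??020000"

# Old pattern: non-greedy, stops at the first ")" and leaves "|" and ")" behind.
old_parts = re.split(r"\(.*?\)", nested)
# New pattern: greedy, spans from the first "(" to the last ")".
new_parts = re.split(r"\(.*\)", nested)

print(old_parts)  # ['', '|', ')??020000']
print(new_parts)  # ['', '??020000']

# With two separate groups, the greedy pattern also swallows the
# literal bytes between them, while the non-greedy one keeps them.
print(re.split(r"\(.*\)", "11(22|33)44(55|66)77"))   # ['11', '77']
print(re.split(r"\(.*?\)", "11(22|33)44(55|66)77"))  # ['11', '44', '77']
```

The leftover `|` and `)` fragments produced by the old pattern are not valid hex, which is presumably what triggered the parse exception the new test data covers.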
14 changes: 14 additions & 0 deletions src/tests/test_yaraparse.py
@@ -11,6 +11,20 @@ def test_literal():
     assert result.query == "({3f2504e0})"
 
 
+def test_literal_wildcard():
+    hex_str = "3F25??04E0"
+    result = ursify_hex(hex_str)
+
+    assert result.query == "({3f25} & {04e0})"
+
+
+def test_literal_alternative():
+    hex_str = "11(22|33)44"
+    result = ursify_hex(hex_str)
+
+    assert result.query == "({11} & {44})"
+
+
 def test_literal_to_hex():
     rule = yaramod.YaraRuleBuilder().with_plain_string("$str", "abc").get()
 
1 change: 1 addition & 0 deletions src/tests/yararules/testdata/parse_exception_example.txt
@@ -0,0 +1 @@
+(min 2 of (({020000}), ({ffff68747470})))
7 changes: 7 additions & 0 deletions src/tests/yararules/testdata/parse_exception_example.yar
@@ -0,0 +1,7 @@
+rule parse_exception_example {
+    strings:
+        $xor_key_size = { ((BB)|(68))??020000}
+        $c2 = { FF FF 68 74 74 70 }
+    condition:
+        all of them
+}
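The new tests and test data pin down the expected output for a few inputs. The sketch below is a hypothetical, simplified re-implementation (`toy_ursify_hex` and `literal_runs` are made-up names, not mquery's API) that reuses the two `re.split` steps from the `yaraparse.py` hunk and reproduces those expected strings. It assumes byte-aligned hex and treats any byte containing `?` as a gap; this commit does not show what mquery does when no literal bytes survive, so the sketch simply returns `None` in that case.

```python
import re
from typing import List, Optional


def literal_runs(hex_part: str) -> List[str]:
    """Split a byte-aligned hex string into runs of fully known bytes."""
    runs: List[str] = []
    current = ""
    for k in range(0, len(hex_part), 2):
        byte = hex_part[k : k + 2]
        if "?" in byte:
            # A (partially) wildcarded byte ends the current literal run.
            if current:
                runs.append(current)
            current = ""
        else:
            current += byte
    if current:
        runs.append(current)
    return runs


def toy_ursify_hex(hex_str: str) -> Optional[str]:
    """Hypothetical simplification: AND together all literal runs."""
    s = hex_str.replace(" ", "")
    # Same two split steps as in the yaraparse.py hunk.
    parts = re.split(r"\(.*\)", s)
    parts = [x for y in parts for x in re.split(r"\[[\d-]+\]", y)]
    terms = ["{" + run.lower() + "}" for part in parts for run in literal_runs(part)]
    if not terms:
        return None  # nothing left to narrow the search with
    return "(" + " & ".join(terms) + ")"


print(toy_ursify_hex("3F25??04E0"))             # ({3f25} & {04e0})
print(toy_ursify_hex("11(22|33)44"))            # ({11} & {44})
print(toy_ursify_hex("((BB)|(68))??020000"))    # ({020000})
print(toy_ursify_hex("FF FF 68 74 74 70"))      # ({ffff68747470})
```

The last two calls yield the per-string terms that appear in `parse_exception_example.txt`; the `min 2 of` wrapper there comes from the rule's `all of them` condition over two strings and is outside the scope of this sketch.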
