Refactor: update regex check in give_clean #7704

Draft: wants to merge 7 commits into base: master
26 changes: 26 additions & 0 deletions src/Helpers/Utils.php
@@ -144,6 +144,7 @@ public static function recursiveUrlDecode(string $data): string
/**
* The regular expression attempts to capture the basic structure of all data types that can be serialized by PHP.
*
* @unreleased Add a regular expression that removes characters located in disallowed positions inside a serialized data structure
* @since 3.19.4 Decode the string and remove any character not allowed in a serialized string
* @since 3.19.3 Support all types of serialized data instead of only objects and arrays
* @since 3.17.2
@@ -162,6 +163,31 @@ public static function containsSerializedDataRegex($data): bool
*/
$data = preg_replace('/[^a-zA-Z0-9:{};"\'.\[\](),]/', '', $data);

/**
* Matches a delimiter (one of the type letters a, O, s, i, b, d) followed by any run of non-colon characters,
* then a colon, digits, and ':"'. The lookahead requires an even number of double quotes after the delimiter,
* so a delimiter that sits inside a quoted string is skipped.
*
* Example: In 'O63:8:"stdClass":1:{s63:4:"name";}', it matches 'O63:8:"' and 's63:4:"'.
*/
$data = preg_replace_callback(
    '/([aOsibd])(?=(?:[^"]*"[^"]*")*[^"]*$)[^:]*:(\d+):"/',
    function ($matches) {
        static $count = 0;
        $count++;

        if ($count === 1) {
            // Preserve the first occurrence unchanged (useful when the serialized data is hidden inside a string).
            // Example: for input 'O63:8:"stdClass":1:{s63:4:"name";s63:5:"James";}', the first match is 'O63:8:"'.
            return $matches[0]; // $matches[0] contains the full matched string
        }

        // For subsequent matches, remove the unwanted characters between the delimiter and the first colon.
        // Example: for the same input, the second match is 's63:4:"', so $matches[1] is 's' and $matches[2] is '4'.
        // The result is 's:4:"'.
        return $matches[1] . ':' . $matches[2] . ':"';
    },
    $data
);

$pattern = '/
(a:\d+:\{.*}) | # Matches arrays (e.g: a:2:{i:0;s:5:"hello";i:1;i:42;})
(O:\d+:"[^"]+":\{.*}) | # Matches objects (e.g: O:8:"stdClass":1:{s:4:"name";s:5:"James";})
7 changes: 6 additions & 1 deletion tests/Unit/Helpers/UtilsTest.php
@@ -102,7 +102,7 @@ public function serializedDataProvider(): array
['Lorem ipsum b:1; dolor sit amet', true], // boolean
['Lorem ipsum d:3.14; dolor sit amet', true], // float
['Lorem ipsum N; dolor sit amet', true], // NULL
// Strings with special characters (e.g: emojis, spaces, control characters) that are not part of a predefined set of safe characters for serialized data structures (used to try to bypass the validations)
// Strings with special characters (e.g: emojis, spaces, control characters etc.) that are not part of a predefined set of safe characters for serialized data structures (used to try to bypass the validations)
[
// emojis bypass sample
'O😼:8:"stdClass":1:{s😼:4:"name";s😼:5:"James";}',
@@ -113,6 +113,11 @@ public function serializedDataProvider(): array
'O :8:"stdClass":1:{s :4:"name";s :5:"James";}',
true,
],
[
// %6\3 bypass sample
'O%6\3:8:"stdClass":1:{s%6\3:4:"name";s%6\3:5:"James";}',
true,
],
// Bypass with simple methods
[
// backslash
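The two normalization steps in `containsSerializedDataRegex` can be traced on the new `%6\3` test input with a small standalone sketch. The function name `normalizeSerializedCandidate` is hypothetical and not part of the PR, and the sketch swaps the `static $count` counter for a by-reference `$count` so each call starts from zero. Both regular expressions are copied from the diff.

```php
<?php
// Hypothetical helper (not in the PR): applies only the two cleanup steps from the diff.
function normalizeSerializedCandidate(string $data): string
{
    // Step 1 (from the diff): drop every character outside the allowed set,
    // so 'O%6\3:8:"' becomes 'O63:8:"'.
    $data = preg_replace('/[^a-zA-Z0-9:{};"\'.\[\](),]/', '', $data);

    // Step 2 (from the diff): for every match after the first, drop whatever sits
    // between the type letter and the first colon, so 's63:4:"' becomes 's:4:"'.
    // Assumption: a by-reference counter replaces the static counter used in the PR.
    $count = 0;

    return preg_replace_callback(
        '/([aOsibd])(?=(?:[^"]*"[^"]*")*[^"]*$)[^:]*:(\d+):"/',
        function ($matches) use (&$count) {
            $count++;

            return $count === 1
                ? $matches[0]
                : $matches[1] . ':' . $matches[2] . ':"';
        },
        $data
    );
}

$input = 'O%6\3:8:"stdClass":1:{s%6\3:4:"name";s%6\3:5:"James";}';
echo normalizeSerializedCandidate($input), PHP_EOL;
// prints: O63:8:"stdClass":1:{s:4:"name";s:5:"James";}
```

The first match (`O63:8:"`) is left as is, while the nested `s63:4:"` and `s63:5:"` tokens are rewritten to valid string headers, which is what lets the later `$pattern` check recognize the serialized strings.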