Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RequestFactory: optimized script path detection performance #35

Closed

Conversation

JanTvrdik
Copy link
Contributor

Micro-optimization.

Benchmark:

$datasets = [
    'a' => [new UrlScript('/projects/blog/www/'), '/projects/blog/www/index.php', '/projects/blog/www/'],
    'b' => [new UrlScript('/projects/blog/www/contact'), '/projects/blog/www/index.php', '/projects/blog/www/'],
    'c' => [new UrlScript('/projects/blog/www/index.php'), '/projects/blog/www/index.php', '/projects/blog/www/index.php'],
    'd' => [new UrlScript('/test/in'), '/test/index.php', '/test/'],
    'e' => [new UrlScript('/test//'), '/test/index.php', '/test/'],
];

$tests = [
    'old' => function ($testCount, UrlScript $url, $script, $expected) {
        for ($x = 0; $x < $testCount; $x++) {
            $path = strtolower($url->getPath()) . '/';
            $script = strtolower($script) . '/';
            $max = min(strlen($path), strlen($script));
            for ($i = 0; $i < $max; $i++) {
                if ($path[$i] !== $script[$i]) {
                    break;
                } elseif ($path[$i] === '/') {
                    $url->setScriptPath(substr($url->getPath(), 0, $i + 1));
                }
            }
        }
        Assert::same($expected, $url->scriptPath);
    },
    'new1' => function ($testCount, UrlScript $url, $script, $expected) {
        for ($x = 0; $x < $testCount; $x++) {
            $path = $url->getPath();
            $min = min(strlen($path), strlen($script));
            for ($i = 0, $j = 0; $i < $min; $i++) {
                if ($path[$i] !== $script[$i] && strcasecmp($path[$i], $script[$i])) {
                    break;
                } elseif ($path[$i] === '/') {
                    $j = $i;
                }
            }
            $url->setScriptPath($i === $min && strlen($path) === strlen($script) ? $path : substr($path, 0, $j + 1));
        }
        Assert::same($expected, $url->scriptPath);
    },
    'new2' => function ($testCount, UrlScript $url, $script, $expected) {
        for ($x = 0; $x < $testCount; $x++) {
            $path = $url->getPath();
            $max = min(strlen($path), strlen($script));
            for ($i = 0; $i < $max; $i++) {
                if ($path[$i] !== $script[$i] && strcasecmp($path[$i], $script[$i])) {
                    break;
                }
            }
            $url->setScriptPath($i === $max && strlen($path) === strlen($script)
                ? $path
                : substr($path, 0, strrpos($path, '/', $i - $max - 1) + 1)
            );
        }
        Assert::same($expected, $url->scriptPath);
    },
    'new3' => function ($testCount, UrlScript $url, $script, $expected) {
        for ($x = 0; $x < $testCount; $x++) {
            $path = $url->getPath();
            $max = min(strlen($path), strlen($script));
            for ($i = 0; $i < $max && ($path[$i] === $script[$i] || !strcasecmp($path[$i], $script[$i])); $i++);
            $url->setScriptPath($i === $max && strlen($path) === strlen($script)
                ? $path
                : substr($path, 0, strrpos($path, '/', $i - $max - 1) + 1)
            );
        }
        Assert::same($expected, $url->scriptPath);
    },

];

$testCount = 10000;
$padLength = max(array_map('strlen', array_keys($datasets))) + 3;
foreach ($tests as $testName => $testCallback) {
    printf("Test %s:\n", $testName);
    foreach ($datasets as $datasetName => $dataset) {
        $time = -microtime(TRUE);
        $testCallback($testCount, ...$dataset);
        $time += microtime(TRUE);
        printf("  %s%5.0f ms\n", str_pad($datasetName, $padLength), $time * 1e3);
    }
    printf("\n");
}

Results:

Test old:
  a    1045 ms
  b    1049 ms
  c    1075 ms
  d    1002 ms
  e    1007 ms

Test new1:
  a      56 ms
  b      58 ms
  c      75 ms
  d      31 ms
  e      28 ms

Test new2:
  a      44 ms
  b      46 ms
  c      54 ms
  d      27 ms
  e      25 ms

Test new3:
  a      46 ms
  b      46 ms
  c      54 ms
  d      27 ms
  e      26 ms

@JanTvrdik JanTvrdik force-pushed the request_factory_scriptpath_performance branch from e241565 to 8851c4d Compare December 18, 2014 22:06
if ($path[$i] !== $script[$i]) {
$path = $url->getPath();
$max = min(strlen($path), strlen($script)) - 1;
for ($i = 0, $j = 0; $i <= $max; $i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

j should be defined outside. ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This a common pattern. Is that confusing? Or is there a bug I don't see?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's common pattern if it's used only in scope of cycle. your code use it outside, there is no bug, it's just less readable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works in PHP, because it has crappy scope boundaries. But would not work e.g. in C++ (unless -fpermissive is set). +1 for moving it outside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's readable enough. And I would not say that PHP has crappy scope boundaries – they just follow different rules than C/C++ (and other languages, of course) and I took advantage of those rules. So I'm keeping it unless @dg says otherwise.

@dg
Copy link
Member

dg commented Dec 20, 2014

You should use sentinel (i.e. / at end of string):

        $script .= '/';     
        $path = $url->getPath() . '/';
        $max = min(strlen($path), strlen($script));
        for ($i = 0, $j = 0; $i < $max; $i++) {
            if ($path[$i] !== $script[$i] && strcasecmp($path[$i], $script[$i])) {
                break;
            } elseif ($path[$i] === '/') {
                $j = $i;
            }
        }

Test it with:

    $_SERVER = array(
        'REQUEST_URI' => '/test/in',
        'SCRIPT_NAME' => '/test/index.php',
    );

@JanTvrdik JanTvrdik force-pushed the request_factory_scriptpath_performance branch from 8851c4d to 1c1ff1f Compare December 21, 2014 07:55
@JanTvrdik
Copy link
Contributor Author

Good job with finding the failing use-case, I added it to the tests. The trailing slash was something I intentionally wanted to avoid.

@JanTvrdik JanTvrdik force-pushed the request_factory_scriptpath_performance branch 7 times, most recently from fa5188e to e8350c4 Compare December 21, 2014 09:07
@JanTvrdik
Copy link
Contributor Author

One thing I don't understand is why we bother with case-insensitive comparison at all. Shouldn't URL path be generally considered case-sensitive? Also – we don't have unit tests for that. We should either add them or remove the case-insensitive behavior altogether.

Just to make it clear – this is no longer about performance, the optimized script path detection algorithm has close to zero slowdown due to the case-insensitivity. This is about what is technically right and wrong.

@JanTvrdik
Copy link
Contributor Author

One more thing – the case sensitivity was introduced in nette/nette@838abc3. However I'm unable to find a failing use-case for the previous implementation. Both known script path issues (http://forum.nette.org/cs/5932-lepsi-detekce-requesturi-a-scriptpath and nette/nette#489) worked on the ancient version properly.

@dg
Copy link
Member

dg commented Dec 21, 2014

It was introduced sooner, in 2008. But I have no idea why… I think it can be removed.

@xificurk
Copy link

My guess is that the intention was to be compatible with Win-world. How does your webserver on Win handle URLs like http://localhost/DIR/INDEX.HTML or http://localhost/dir/index.html without URL rewriting?
Imho it makes sense to be case insensitive in the path - BFU doesn't care about case-sensitivity and might type in URL or its part in upper case.

@dg
Copy link
Member

dg commented Dec 21, 2014

Mac world is case insensitive too.

@JanTvrdik
Copy link
Contributor Author

@xificurk But nette/http is (IMHO) a low-level library, it's not its job to be smart and care much about BFU (that is also why I consider the default URL filters in RouteFactory wrong). Route is case-insensitive by default, but Route is not a low-level library.

@xificurk
Copy link

@JanTvrdik You're right, this should be solved somewhere else... I was just guessing (not sure if correctly), what was the reason to introduce it in the first place.

@JanTvrdik JanTvrdik force-pushed the request_factory_scriptpath_performance branch from e8350c4 to 98cc19a Compare December 21, 2014 18:44
@JanTvrdik JanTvrdik force-pushed the request_factory_scriptpath_performance branch from 98cc19a to c5aa73e Compare December 21, 2014 19:13
@JanTvrdik
Copy link
Contributor Author

I've found on forum and added the missing failing case for the ancient script path detection algorithms.


I've also found what does removing the case-insensitivity break: On Windows (and possibly other platforms with case-insensitive file system) if you access http://localhost/nette/Sandbox/www/ when the directory name is actually lowercase sandbox you get:

REQUEST_URI => "/nette/Sandbox/www/" (19)
SCRIPT_NAME => "/nette/Sandbox/www/index.php" (28)
SCRIPT_FILENAME => "C:/Projects/nette/sandbox/www/index.php" (39)
DOCUMENT_ROOT => "C:/Projects" (11)

Therefore it still works fine even with case-sensitive behavior. However if for some reasons the SCRIPT_NAME would not be available, it fallbacks to SCRIPT_FILENAME and DOCUMENT_ROOT which results in /nette/sandbox/www/index.php which will fail with case-sensitive behavior. Which leads to the very same question I asked in #31 (comment). Under what circumstances is SCRIPT_NAME not available.

@dg dg closed this in f49fd2a Dec 27, 2014
dg added a commit to dg/nette-http that referenced this pull request Dec 27, 2014
@JanTvrdik JanTvrdik deleted the request_factory_scriptpath_performance branch December 30, 2014 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants