Increase the amount of request in the stress test, using multiple threads #90
Conversation
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB.
@ahandan if this PR is ready for a review from me, please explicitly set me as the reviewer; otherwise I won't be touching it. That said, I was curious and opened it up in ReviewNB (https://app.reviewnb.com/Ouranosinc/PAVICS-e2e-workflow-tests/pull/90/) and I ended up with this error: "Invalid notebook file. Please make sure both before and after version of the file contains valid notebook JSON."
First pass review, I'll have to come back to it again. I haven't wrapped my head around the full changes yet. Thanks for the fix.
see comments
Functionality-wise, all looks good.
I only have a few cosmetic comments regarding the output format, not blocking by themselves.
- Move the print(f'\n Execution # {i+1}') statement within run_threads, with an input parameter that provides the "run_index".
- Remove the extra spaces before " Timestamps" (this is hackish, not how it is intended to be used). The field should be w=24 (4 spaces before + 20 from the timestamp length itself) for right alignment, or w=14 for left alignment (4 spaces before + len("Timestamp") == 10).
- Execution #<index> (to align on the left of the output)
- Thread(T2) Stress Test ... (to align on the left of the output)
@fmigneault, I agree for the …
You have to look at the fields this way: in the case of the timestamp values themselves, they are 20 characters. This is > 16, so the field overflows to the left.
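For illustration, here is a minimal sketch (assuming Python's f-string width specifiers, which the notebook output discussed above appears to use; the exact widths shown are the reviewers' suggestions, not the notebook's actual code) of how a field pads or overflows:

```python
# Minimal sketch of f-string field widths (illustrative only).
timestamp = "2021-11-17 18:56:51"  # roughly 20 characters

print(f"|{timestamp:>16}|")    # width 16 is smaller than the value: the field simply overflows
print(f"|{timestamp:>24}|")    # width 24: right-aligned, padded with leading spaces
print(f"|{'Timestamp':<14}|")  # width 14: left-aligned header, padded with trailing spaces
```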
I'm fine with the way results are reported.
@fmigneault as for the weird outputs, these (ESC[0;31m) are ANSI color escape codes used to color the outputs. Looks like it's a Jenkins problem.
It is possible to actually show the error, like previously in comment #90 (comment).
The current error displayed in Jenkins does not have enough information (https://daccs-jenkins.crim.ca/job/PAVICS-e2e-workflow-tests/job/multi_threading_stress_test/46/console):
18:57:14 _____________________ notebooks/stress-tests.ipynb::Cell 4 _____________________
18:57:14 Notebook cell execution failed
18:57:14 Cell 4: Cell execution caused an exception
18:57:14
18:57:14 Input:
18:57:14 # NBVAL_IGNORE_OUTPUT
18:57:14
18:57:14 # Executing THREDDS test on multiple threads
18:57:14 thredds_url = f"{TWITCHER_URL}/thredds/catalog/birdhouse/testdata/catalog.html?dataset=birdhouse/testdata/ta_Amon_MRI-CGCM3_decadal1980_r1i1p1_199101-200012.nc"
18:57:14
18:57:14 run_threads(target_test = stress_test,
18:57:14 url = thredds_url,
18:57:14 n_threads = TEST_N_THREADS_THREDDS,
18:57:14 runs_per_threads = TEST_RUNS_THREDDS)
18:57:14
18:57:14 Traceback:
18:57:14
18:57:14 ---------------------------------------------------------------------------
18:57:14 AssertionError Traceback (most recent call last)
18:57:14 /tmp/ipykernel_1129/1504455324.py in <module>
18:57:14 7 url = thredds_url,
18:57:14 8 n_threads = TEST_N_THREADS_THREDDS,
18:57:14 ----> 9 runs_per_threads = TEST_RUNS_THREDDS)
18:57:14
18:57:14 /tmp/ipykernel_1129/4243812779.py in run_threads(target_test, url, execution_id, n_threads, runs_per_threads)
18:57:14 29 thread.join()
18:57:14 30
18:57:14 ---> 31 assert stressTestProgression.failed_count == 0, stressTestProgression.fail_report()
18:57:14 32 print(f"All threads passed {target_test.__name__} function.")
18:57:14 33
18:57:14
18:57:14 AssertionError: This test failed due to bad code received from the requests.
18:57:14 Number of failed thread(s) : 2/ 5
18:57:14 Number of failed requests : 3/1000
18:57:14 First failed request timestamp : 2021-11-17 18:56:51
We do not know which of the WPS services or Thredds is failing, what bad HTTP code was returned, or how long it took.
You spent a lot of time formatting the output, so why not simply show it? Only for the runs with errors of course, not all the runs.
In case it's not clear:
For the Jenkins use case, the result displayed when the build fails is what matters. All the nice-looking cell outputs do not make it to Jenkins, and there is no way to retrieve them after the fact. In the previous implementation, the failed outputs were saved and passed to the assert error message. That is how we displayed only the failed cell outputs when a failure occurred during a Jenkins run.
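To make the pattern concrete, here is a hypothetical sketch of the reporting approach described above (class and method names such as record() are illustrative; only failed_count and fail_report() appear in the traceback, and the notebook's real code differs):

```python
# Hypothetical sketch: keep per-request failure details and surface them through the
# assertion message, so they appear in the Jenkins console when the cell fails.
from dataclasses import dataclass, field
from typing import List


@dataclass
class StressTestProgression:
    total_requests: int = 0
    failures: List[str] = field(default_factory=list)

    def record(self, url: str, code: int, reason: str, timestamp: str, duration: float) -> None:
        """Count every request; keep a one-line summary for each failed one."""
        self.total_requests += 1
        if code != 200:
            self.failures.append(
                f"{timestamp}  {url}  code={code}  reason={reason}  took={duration:.2f}s"
            )

    @property
    def failed_count(self) -> int:
        return len(self.failures)

    def fail_report(self) -> str:
        header = f"{self.failed_count}/{self.total_requests} requests failed:"
        return "\n".join([header, *self.failures])


# At the end of run_threads(), the detailed report becomes the assertion message,
# so it is printed in the Jenkins console on failure:
# assert progression.failed_count == 0, progression.fail_report()
```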
I agree with @tlvu that, for convenience, it would be great to have the failing subset of threads/URLs reported directly in the raised message. Do you have an ETA to implement this?
It's not just a "convenience". A Jenkins build failure is useless if we are unable to get the failing info needed to diagnose it!
If it helps remove pressure and unblock things, I can accept merging this PR as-is, as long as a follow-up PR with proper error reporting when Jenkins fails is done promptly right after.
Everything is available in the notebook outputs. It's just more trouble to load them in Jupyter than to have them already in the logs.
Ok, no problem for changing the outputs. I changed the assert printout to print only the failed runs, using the same output format as the results. @tlvu let me know what you would rather have in the Exceptions area.
Here is an example of the output from the test:
The actual notebook outputs are discarded! The outputs that are being saved are the result of another run, and in the case of an intermittent problem like this caching issue, there is no guarantee we will get the same result. Also, another run means all the timestamps do not match. See PAVICS-e2e-workflow-tests/Jenkinsfile, lines 83 to 85, in 9ad91bd.
Can we combine both, so we have the bad code and the actual error matching that code? This will be particularly useful in the case where we have multiple failing codes. For example, we have seen in the past that both 500 and 408 can happen in the same run. Having the actual error matching the code will speed up the diagnostic.
Oh, this is great! So I see that combining both the status code and the actual error for the exception will look like:
@tlvu @fmigneault
The new error reporting in Jenkins is very good. I just have comments about what seems to me like an accidental way of hiding wrong behavior from the server (Magpie, Twitcher, or the service behind it).
I'd rather have all the errors exposed clearly so we can fix them, but I do not know enough about the design decisions to make a call.
notebooks/stress-tests.ipynb (Outdated)
```diff
@@ -330,7 +330,7 @@
 " resp = requests.request(method, url, timeout=abort_timeout, **req_kwargs)\n",
 " resp.raise_for_status()\n",
 " except requests.exceptions.Timeout as errT:\n",
-" result.codes.append(code) # read timeout\n",
+" result.codes.append(408) # read timeout\n",
```
What was the wrong code for the timeout? Why not expose it instead of hardcoding 408? Was hardcoding the previous behavior that you are trying to maintain?
In general, tests are meant to expose errors, so masking the error is usually the wrong thing to do. My 2 cents, because I do not really know for sure what the design decision of Magpie is.
I'll answer both here.
I might be wrong on this, please correct me if that is the case.
Because the request times out after a specified time (timeout=abort_timeout), it means you don't get any answer from the server, which results in not knowing the response, only the request that was made. So we can't have the exact status code from the server, and we force a 408 code after the timeout limit. And because there is no answer from the server, I force the "Request Timeout" message and I print the requests.exceptions.Timeout (request) as the exception_text message.
Also, the thing with this test is that a 408 error happens randomly, so trying to debug it is like flipping a coin: sometimes it takes 1 minute, sometimes an entire day. So I can try to optimise it on my side, but it would be important that @fmigneault can be unblocked.
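For reference, a hypothetical sketch of the handling being described (the result.reasons and exception_text containers are illustrative assumptions; only result.codes.append(408) appears in the diff above, and the notebook's real code differs):

```python
# Hypothetical sketch of the client-side timeout handling discussed above.
import requests
from types import SimpleNamespace


def do_request(result, method, url, abort_timeout, **req_kwargs):
    try:
        resp = requests.request(method, url, timeout=abort_timeout, **req_kwargs)
        resp.raise_for_status()
        result.codes.append(resp.status_code)
    except requests.exceptions.Timeout as errT:
        # The client gave up after `abort_timeout` seconds: there is no response and
        # therefore no real status code, so 408 is forced and the exception text kept.
        result.codes.append(408)
        result.reasons.append("Request Timeout")
        result.exception_text.append(repr(errT))
    except requests.exceptions.HTTPError as errH:
        # A response was received but flagged by raise_for_status(): keep the real code.
        result.codes.append(errH.response.status_code)
        result.exception_text.append(repr(errH))


# Illustrative usage with a simple result container (placeholder URL):
result = SimpleNamespace(codes=[], reasons=[], exception_text=[])
do_request(result, "GET", "https://example.com/", abort_timeout=10)
```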
OHH, I misunderstood: so it's the test that is timing out, not the actual response from the server. In that case, the error reporting should not make it look like a 408 was received from the server!
Maybe just put "xxx" as the code, and the reason would be "test timeout after ### seconds"?
That way we can differentiate a real timeout returned by Magpie (because the server behind Magpie takes too long to respond) from the test timing out because we enforce an arbitrary abort_timeout.
But I'd suggest you wait to hear from @fmigneault before changing the code. He knows best how to test Magpie.
@ahandan's interpretation is correct.
We manually set a limit we want to wait for, and if no response from the server is received by then, the client (us) aborts.
So there is no real code from the server.
Code 408 is picked because that's the code for "Client Timeout", in contrast to "Server Timeout", which should be 504 (or some other 5xx).
Just to be clear, there is no confusion to be had here.
When the request is initiated, our client establishes a connection with a limit of "wait for x seconds".
The server accepts that connection and keeps it open for that amount of time to receive the request.
If either the client fails to produce the request or the server fails to receive it / read the data in time, a 408 happens.
In other words, the request did not reach the server (completely or at all).
If the server fails to produce a response in time (e.g. it did not itself receive a result in time from the service it was supposed to redirect to), a 504 happens instead. This is based on the timeout value configured in nginx.
OK, it's clear for me now. Just so it's clear for others as well, can we simply add "(client side)" to the reason, i.e. "Request TimeOut (client side)"? Thanks.
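A small hypothetical sketch of the distinction settled on above (names and values are illustrative, not the notebook's code): a client-side abort raises requests.exceptions.Timeout and never yields a status code, whereas a gateway timeout comes back as a real response with status 504.

```python
# Hypothetical sketch: client-side abort (forced 408) vs. real server-side 504.
import requests


def classify(url: str, abort_timeout: float = 60):
    try:
        resp = requests.get(url, timeout=abort_timeout)
        if resp.status_code == 504:
            # A real response from the proxy (e.g. nginx): server-side timeout.
            return 504, "Gateway Timeout (server side)"
        return resp.status_code, resp.reason
    except requests.exceptions.Timeout as err:
        # No response at all: the client aborted after `abort_timeout` seconds.
        return 408, f"Request TimeOut (client side) after {abort_timeout}s: {err!r}"
```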
Magpie integrate new feats

## Overview
Update to the latest Magpie version 3.19.0. Tested on a local birdhouse setup.

## Changes
- Adds new environment variables to handle new Magpie features, which rely on email usage. These features include user registration and approvals, plus user assignment to groups with terms and conditions.

The following PR will have to be merged prior to merging this current PR: [pavics-e2e #90](Ouranosinc/PAVICS-e2e-workflow-tests#90). Also, the content of PR [birdhouse #197](#197) has been transferred to this PR.
Overview
Enhancement for the stress_test.
Increase the amount of requests by using multiple threads, to maximise the chance of making requests in parallel.
The increased amount of requests also increases the chance of forcing the cache to run a refresh cycle: by filling it up and making requests that overlap the cache refresh cycle, the test can trigger a known issue (NoTransaction, DetachedInstance).
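For context, a minimal sketch of the multi-threaded pattern this PR introduces, reconstructed from the call visible in the Jenkins traceback above (function bodies, defaults, and constants here are assumptions, not the notebook's actual code):

```python
# Hypothetical sketch of the multi-threaded stress-test pattern described above.
import threading
import requests


def stress_test(url, runs_per_threads, results):
    """Hit `url` repeatedly and record the HTTP status code of each response."""
    for _ in range(runs_per_threads):
        try:
            resp = requests.get(url, timeout=30)
            results.append(resp.status_code)
        except requests.exceptions.Timeout:
            results.append(408)  # client-side abort, see the 408 discussion above


def run_threads(target_test, url, n_threads=5, runs_per_threads=200):
    """Start several threads running `target_test` so that requests overlap."""
    results = []  # list.append is atomic in CPython, so the threads can share it
    threads = [
        threading.Thread(target=target_test, args=(url, runs_per_threads, results))
        for _ in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    failed = [code for code in results if code != 200]
    assert not failed, f"{len(failed)}/{len(results)} requests failed with codes {sorted(set(failed))}"
    print(f"All threads passed {target_test.__name__} function.")


# Example call, mirroring the notebook cell shown in the traceback (constants are placeholders):
# run_threads(target_test=stress_test, url=thredds_url,
#             n_threads=TEST_N_THREADS_THREDDS, runs_per_threads=TEST_RUNS_THREDDS)
```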
Changes
Related Issue / Discussion
This enhancement of the stress test references:
Magpie issue: More caching/session re-connect and logging #473 (Ouranosinc/Magpie#473)