Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return an error if the path is not permanent #1698

Merged
merged 5 commits into from
Mar 13, 2024

Conversation

KershawChang
Copy link
Collaborator

For issue #1697
This is just a workaround for stop crashing.
The root cause of this crash is still not clear. Maybe necko does something wrong.

Note that the crash signature seems wrong. The frame 9 is neqo_transport::path::Path::local_cid(), but the provided link to the code indicates the function is neqo_transport::path::Path::remote_cid().

Copy link
Member

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way that you could generate a test case for this?

Copy link

codecov bot commented Mar 1, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 92.97%. Comparing base (ad027cf) to head (baa9b52).

Files Patch % Lines
neqo-transport/src/connection/mod.rs 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1698      +/-   ##
==========================================
- Coverage   92.98%   92.97%   -0.01%     
==========================================
  Files         120      120              
  Lines       37340    37344       +4     
==========================================
+ Hits        34719    34721       +2     
- Misses       2621     2623       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@KershawChang
Copy link
Collaborator Author

Is there any way that you could generate a test case for this?

It appears that make_permanent is called at the very beginning when a connection is created, which suggests that the only place setting remote_cid to none is here. Maybe this is caused by receiving an unexpected NewConnectionId frame?
Unfortunately, I'm uncertain about how to write a test case to confirm this hypothesis

@KershawChang
Copy link
Collaborator Author

@martinthomson , I think I am out of my ideas.
I'd really appreciate if you can point me a new direction to investigate. Thanks.

@martinthomson
Copy link
Member

I spent a little time with process_input and we have several cases that are relevant here:

  1. The packet is processed successfully. No problem.
  2. There was an error in packet preprocessing:
    1. The packet was garbage, which we ignore. No problem.
    2. We received a Retry packet. Again, no close message sent then. No problem.
    3. We received a bad Version Negotiation packet. We go straight to closed without sending anything. No problem.
    4. We received a Version Negotiation packet and were not able to create a new client. PROBLEM?
  3. We want to update keys, but we have run out of keys. We deliberately do not send any close message in this case, so no problem.
  4. We received a stateless reset. We don't send anything more from this state. No problem.
  5. There was an error in the packet processing. We always setup the path from this state. No problem?

That last one contains a problem. If we receive a bad packet from a new path AND we don't have a connection ID AND the peer requires a connection ID, the attempt to make a new path will fail.

That seems like a reasonable test case to try out.

@KershawChang
Copy link
Collaborator Author

I found a way to trigger the crash by calling process_output in the end of test retire_prior_to_migration_failure.
Note that we do call ensure_permanent, but we are not able to make the path permanent.
Given this situation, it seems our options are limited. One possible solution might be adding a check to see if the path is temporary before building a packet.

Comment on lines 901 to 904

// See issue #1697. We had a crash when the client had a temporary path and
// process_output is called.
mem::drop(client.process_output(now()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. I'd prefer to see this as a separate test case though.

Did you not have any luck with pushing a bad packet in on a new path? You can use the test frame writer to make a garbage packet (a single varint should do the trick, as that will be interpreted as a frame type that we don't support).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Will try to write another test case.

Copy link

QUIC Interop Runner

Saving logs to logs.
Run took 0:03:33.925181
Server: neqo-latest. Client: neqo-latest. Running test case: handshake
Server: quic-go. Client: neqo-latest. Running test case: handshake
Server: ngtcp2. Client: neqo-latest. Running test case: handshake
Server: neqo. Client: neqo-latest. Running test case: handshake
Server: msquic. Client: neqo-latest. Running test case: handshake
Server: neqo-latest. Client: quic-go. Running test case: handshake
Server: neqo-latest. Client: ngtcp2. Running test case: handshake
Server: neqo-latest. Client: neqo. Running test case: handshake
Server: neqo-latest. Client: msquic. Running test case: handshake
+-------------+-------------+---------+--------+------+--------+
|             | neqo-latest | quic-go | ngtcp2 | neqo | msquic |
+-------------+-------------+---------+--------+------+--------+
| neqo-latest |     ✓(H)    |   ✓(H)  |  ✓(H)  | ✓(H) |  ✓(H)  |
|             |     ?()     |   ?()   |  ?()   | ?()  |  ?()   |
|             |     ✕()     |   ✕()   |  ✕()   | ✕()  |  ✕()   |
+-------------+-------------+---------+--------+------+--------+
|   quic-go   |     ✓(H)    |         |        |      |        |
|             |     ?()     |         |        |      |        |
|             |     ✕()     |         |        |      |        |
+-------------+-------------+---------+--------+------+--------+
|    ngtcp2   |     ✓(H)    |         |        |      |        |
|             |     ?()     |         |        |      |        |
|             |     ✕()     |         |        |      |        |
+-------------+-------------+---------+--------+------+--------+
|     neqo    |     ✓(H)    |         |        |      |        |
|             |     ?()     |         |        |      |        |
|             |     ✕()     |         |        |      |        |
+-------------+-------------+---------+--------+------+--------+
|    msquic   |     ✓(H)    |         |        |      |        |
|             |     ?()     |         |        |      |        |
|             |     ✕()     |         |        |      |        |
+-------------+-------------+---------+--------+------+--------+

⬇️ Download logs

Copy link
Member

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That test is very nice. I'm sure that it took longer to write than it looks, because it looks effortless. Thanks Kershaw. Good work.

neqo-transport/src/connection/tests/migration.rs Outdated Show resolved Hide resolved
neqo-transport/src/connection/tests/migration.rs Outdated Show resolved Hide resolved
Copy link
Member

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clippy complained, but sort that out and this is good to land.

neqo-transport/src/connection/mod.rs Outdated Show resolved Hide resolved
neqo-transport/src/connection/tests/migration.rs Outdated Show resolved Hide resolved
@KershawChang KershawChang merged commit 2ee8c7a into mozilla:main Mar 13, 2024
12 checks passed
@KershawChang KershawChang deleted the issue_1697 branch March 13, 2024 10:27
Copy link

Benchmark results

  • coalesce_acked_from_zero 1+1 entries
    time: [253.08 ns 254.92 ns 258.36 ns]
    change: [+0.8737% +1.4957% +2.1808%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • coalesce_acked_from_zero 3+1 entries
    time: [289.37 ns 290.09 ns 290.83 ns]
    change: [+0.6054% +0.9470% +1.2682%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • coalesce_acked_from_zero 10+1 entries
    time: [289.58 ns 290.52 ns 291.62 ns]
    change: [+0.8819% +1.7147% +2.7910%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • coalesce_acked_from_zero 1000+1 entries
    time: [267.54 ns 271.62 ns 281.06 ns]
    change: [+2.9370% +8.6965% +19.705%] (p = 0.05 > 0.05)
    No change in performance detected.

  • RxStreamOrderer::inbound_frame()
    time: [125.95 ms 126.12 ms 126.37 ms]
    change: [+3.1957% +3.3536% +3.5923%] (p = 0.00 < 0.05)
    💔 Performance has regressed.

  • Run multiple transfers with varying seeds
    time: [187.97 ms 188.68 ms 189.38 ms]
    change: [-6.8699% -6.3892% -5.9527%] (p = 0.00 < 0.05)
    💚 Performance has improved.

  • Run multiple transfers with the same seed
    time: [189.52 ms 189.93 ms 190.35 ms]
    change: [-5.2761% -5.0096% -4.7463%] (p = 0.00 < 0.05)
    💚 Performance has improved.

⬇️ Download logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants