Rustls support #1190

Open
wants to merge 29 commits into
base: main

wprzytula
Collaborator

@wprzytula wprzytula commented Feb 2, 2025

No cover letter, because GitHub ate it.

The Big Picture

Before

Before, openssl was the only supported TLS backend (hence all TLS abstractions in the driver used the Ssl prefix). As it is written in C, enabling it in the driver is a bit cumbersome: one needs to install the openssl library using the system package manager.

There were two distinct cases:

  1. Global SSL context, set by a user for all connections to add an encryption layer.
  2. Serverless Cloud, where TLS is used mainly to set up SNI. There, the SSL context is different for every node (endpoint).

SSL abstractions before

┌─←─ openssl::SslContext
│
│ gets wrapped in
│
↳ SslConfig (same for all connections)
    │               │
    │ gets cloned   ├─ either of these, depending on
    │               │  cloud or non-cloud scenario
    │  sets up SNI  │    
    ├─←─────────── CloudConfig (powered by openssl)
    │
    ↳ SslConfig (specific for the particular connection)
        │
        │ produces
        │
        ↳ openssl::Ssl (wrapper over TCP stream which adds encryption)

After

The main goal of this PR is to add support for another TLS backend: rustls, which is written in Rust and hence easier to include in the project as a usual Rust dependency. As a bonus, its authors claim it's faster than openssl.

As we're going to support more than one TLS backend, we need to create abstractions that hide the implementation details, including which backend is used.

  • -> done in this PR: to that end, naming is changed from Ssl* abstractions specific to openssl (e.g., SslContext) to Tls* (e.g., TlsContext), which are made "generic" over the backend (actually, they are enums with #[cfg(feature)]-activated backend-specific variants).

Now, the two cases (global context and cloud) can both be done with openssl or rustls as a backend, so there are 4 combinations in total. As all combinations should be supported, new abstractions were designed to encapsulate the exact case and backend used. Thanks to that, the code is not only not more complicated after this PR, but it's actually less complicated! Comparison of the following schema with the previous one should convince you.

TLS abstractions after

┌─←─ TlsContext (openssl::SslContext / rustls::ClientConfig)
│       │
│       ├─ either of these, depending on cloud or non-cloud scenario
│       │  
├─←─ CloudConfig (powered by either TLS backend)
│
│ gets wrapped in
│
↳ TlsProvider (same for all connections)
    │
    │ produces
    │
    ↳ TlsConfig (specific for the particular connection)
        │
        │ produces
        │
        ↳ Tls (wrapper over TCP stream which adds encryption)
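
The chain in the schema above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `OpensslContext` and `RustlsClientConfig` are hypothetical stand-ins for `openssl::ssl::SslContext` and `rustls::ClientConfig`, the variant names and `make_tls_config` are assumptions, and the `<host_id>.<node_domain>` SNI format is assumed for illustration only. The `#[cfg(feature = "...")]` gates on the variants are left out so the sketch compiles on its own.

```rust
// Hypothetical stand-ins for the backend-specific context types.
#[derive(Clone, Debug, PartialEq)]
pub struct OpensslContext(pub &'static str);
#[derive(Clone, Debug, PartialEq)]
pub struct RustlsClientConfig(pub &'static str);

/// Backend-agnostic context. In the driver each variant would sit behind
/// its own `#[cfg(feature = "...")]` gate (omitted here).
#[derive(Clone, Debug, PartialEq)]
pub enum TlsContext {
    OpenSsl010(OpensslContext),
    Rustls023(RustlsClientConfig),
}

impl From<OpensslContext> for TlsContext {
    fn from(ctx: OpensslContext) -> Self {
        TlsContext::OpenSsl010(ctx)
    }
}

impl From<RustlsClientConfig> for TlsContext {
    fn from(cfg: RustlsClientConfig) -> Self {
        TlsContext::Rustls023(cfg)
    }
}

/// Same for all connections: either a global context or the cloud case.
#[derive(Clone, Debug)]
pub enum TlsProvider {
    GlobalContext(TlsContext),
    Cloud { context: TlsContext, node_domain: String },
}

/// Specific to one connection; this is what would later produce the
/// encrypting wrapper over the TCP stream.
#[derive(Clone, Debug, PartialEq)]
pub struct TlsConfig {
    pub context: TlsContext,
    pub sni: Option<String>,
}

impl TlsProvider {
    /// Derives the per-connection config. Only the cloud case sets SNI
    /// (assumed format: `<host_id>.<node_domain>`).
    pub fn make_tls_config(&self, host_id: &str) -> TlsConfig {
        match self {
            TlsProvider::GlobalContext(context) => TlsConfig {
                context: context.clone(),
                sni: None,
            },
            TlsProvider::Cloud { context, node_domain } => TlsConfig {
                context: context.clone(),
                sni: Some(format!("{}.{}", host_id, node_domain)),
            },
        }
    }
}
```

The point of the split is that the provider is never mutated: each connection gets a fresh `TlsConfig` derived from it.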

The Details

Features

  • introduced "__tls", which gates the common Tls* abstractions.
    Rationale: As those Tls* are enums that are empty (and this is problematic) when no TLS backend is enabled, an explicit compile error is raised in such a scenario. Users should not enable this feature on their own; it's a dependency of all TLS-dependent features.

rustls support

Global TLS context case

  • SessionBuilder now accepts Option<impl Into<TlsContext>>, allowing passing both openssl::SslContext and rustls::ClientConfig.
  • Both backends can be supported by a single build of the driver, so users can even have distinct Sessions that differ by the TLS backend used, which may be handy for comparative benchmarks between the backends and choosing the best fit.
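
A rough sketch of how a builder method taking `Option<impl Into<TlsContext>>` behaves from the caller's side. The types below are hypothetical stand-ins (the real ones are `openssl::ssl::SslContext` and `rustls::ClientConfig`), and the method name `tls_context` and the helper `backend_name` are assumptions made for this illustration.

```rust
// Hypothetical stand-ins for the two backend context types.
#[derive(Clone, Debug, PartialEq)]
pub struct OpensslContext;
#[derive(Clone, Debug, PartialEq)]
pub struct RustlsClientConfig;

#[derive(Clone, Debug, PartialEq)]
pub enum TlsContext {
    OpenSsl010(OpensslContext),
    Rustls023(RustlsClientConfig),
}

impl From<OpensslContext> for TlsContext {
    fn from(ctx: OpensslContext) -> Self {
        TlsContext::OpenSsl010(ctx)
    }
}

impl From<RustlsClientConfig> for TlsContext {
    fn from(cfg: RustlsClientConfig) -> Self {
        TlsContext::Rustls023(cfg)
    }
}

/// Simplified builder holding only the TLS setting.
#[derive(Default)]
pub struct SessionBuilder {
    tls_context: Option<TlsContext>,
}

impl SessionBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Accepts either backend's context, or `None` to disable TLS.
    pub fn tls_context(mut self, ctx: Option<impl Into<TlsContext>>) -> Self {
        self.tls_context = ctx.map(Into::into);
        self
    }

    /// Illustrative helper: reports which backend this builder would use.
    pub fn backend_name(&self) -> Option<&'static str> {
        match self.tls_context {
            Some(TlsContext::OpenSsl010(_)) => Some("openssl"),
            Some(TlsContext::Rustls023(_)) => Some("rustls"),
            None => None,
        }
    }
}
```

One consequence of the `impl Into` signature is that a bare `None` no longer infers its type, so callers would need to write something like `None::<TlsContext>` when disabling TLS explicitly.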

Serverless Cloud

Which backend is employed is now chosen based on the set of enabled features:

  • if "openssl-010" feature is enabled, openssl is used for cloud;
  • else if "rustls-023" feature is enabled, rustls is used for cloud;
  • else a compile error is raised.
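
The priority rule above can be modelled as a plain function, which makes the precedence easy to check. This is only a sketch: `CloudBackend`, `select_cloud_backend` and `cloud_backend_of_this_build` are hypothetical names, and in the driver the decision would be made at compile time with `#[cfg(...)]` attributes rather than at run time.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CloudBackend {
    OpenSsl,
    Rustls,
}

/// Models the rule: openssl wins if enabled, then rustls, else nothing.
pub const fn select_cloud_backend(openssl_010: bool, rustls_023: bool) -> Option<CloudBackend> {
    if openssl_010 {
        Some(CloudBackend::OpenSsl)
    } else if rustls_023 {
        Some(CloudBackend::Rustls)
    } else {
        None
    }
}

/// Evaluates the rule for the features this build was compiled with.
pub const fn cloud_backend_of_this_build() -> Option<CloudBackend> {
    select_cloud_backend(cfg!(feature = "openssl-010"), cfg!(feature = "rustls-023"))
}

// The `None` case would become a build failure with something like:
//
// #[cfg(not(any(feature = "openssl-010", feature = "rustls-023")))]
// compile_error!("cloud requires either the openssl-010 or the rustls-023 feature");
```

Note that with both features enabled, openssl silently takes precedence for cloud; rustls is then available only for the global-context case.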

Miscellaneous

Newly-introduced Host*Config

As @Lorak-mmk correctly observed, we could leverage the type system to distinguish global pool/connection config from one that is already made specific for a particular endpoint. This way we avoid non-obvious mutation that happened in ConnectionConfig before this PR in the cloud case.

Host{Pool,Connection}Config is introduced to bring type safety and hide the necessary adjustments done in the cloud case (setting up SNI for a particular node) in the ConnectionConfig -> HostConnectionConfig conversion. In the future, there might be more differences between *Config and Host*Config, which will further justify this duplication.
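
A minimal sketch of the idea, assuming simplified stand-in types: the shared config holds a provider, the per-host config holds the already-specialised TLS config, and the conversion takes `&self`, so the shared config cannot be mutated by accident. The method name `to_host_connection_config`, the `sni` helper and the SNI format are assumptions for illustration.

```rust
use std::time::Duration;

/// Stand-in for the shared TLS provider (contexts omitted for brevity).
#[derive(Clone, Debug, PartialEq)]
pub enum TlsProvider {
    GlobalContext,
    Cloud { node_domain: String },
}

/// Stand-in for the per-connection TLS config.
#[derive(Clone, Debug, PartialEq)]
pub struct TlsConfig {
    pub sni: Option<String>,
}

/// Same for all connections.
#[derive(Clone, Debug)]
pub struct ConnectionConfig {
    pub tcp_keepalive_interval: Option<Duration>,
    pub tls_provider: Option<TlsProvider>,
}

/// Customised for one endpoint; cannot be confused with the shared one.
#[derive(Clone, Debug)]
pub struct HostConnectionConfig {
    pub tcp_keepalive_interval: Option<Duration>,
    pub tls_config: Option<TlsConfig>,
}

impl ConnectionConfig {
    /// The cloud-specific adjustment (SNI) happens only here.
    pub fn to_host_connection_config(&self, host_id: &str) -> HostConnectionConfig {
        let tls_config = self.tls_provider.as_ref().map(|provider| match provider {
            TlsProvider::GlobalContext => TlsConfig { sni: None },
            TlsProvider::Cloud { node_domain } => TlsConfig {
                sni: Some(format!("{}.{}", host_id, node_domain)),
            },
        });
        HostConnectionConfig {
            tcp_keepalive_interval: self.tcp_keepalive_interval,
            tls_config,
        }
    }
}

impl HostConnectionConfig {
    /// Convenience accessor for the SNI chosen for this host, if any.
    pub fn sni(&self) -> Option<&str> {
        self.tls_config.as_ref().and_then(|cfg| cfg.sni.as_deref())
    }
}
```

Because functions that open connections would accept only `HostConnectionConfig`, forgetting the per-host step becomes a type error rather than a silent misconfiguration.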

Refactored Serverless address translation

Use AddressTranslator instead of hand-crafted logic for serverless.

It seems that when I originally wrote serverless, for some reason I hand-crafted address translation logic specifically for serverless, making it fully separate from the standard AddressTranslator trait.

In this PR I successfully implement cloud address translation inside the standard AddressTranslator framework, decluttering PoolRefiller logic from cloud-related code. CloudConfig becomes an AddressTranslator itself, so it needn't be passed separately anymore. This further reduces code complication arising from cloud-related codepaths.
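
To illustrate the shape of this change, here is a sketch in which a cloud config implements a translator trait and is then used through the same code path as any other translator. Everything here is simplified and assumed: the trait below is a synchronous stand-in for the driver's `AddressTranslator`, the error variants are invented, and the per-datacenter proxy mapping is an assumption about how the cloud case resolves addresses.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Simplified view of a peer before translation.
pub struct UntranslatedPeer<'a> {
    pub datacenter: Option<&'a str>,
    pub untranslated_address: SocketAddr,
}

/// Hypothetical error variants for this sketch.
#[derive(Debug, PartialEq)]
pub enum TranslationError {
    MissingDatacenter,
    UnknownDatacenter(String),
}

/// Synchronous stand-in for the driver's translator trait.
pub trait AddressTranslator {
    fn translate_address(&self, peer: &UntranslatedPeer<'_>) -> Result<SocketAddr, TranslationError>;
}

/// Assumed cloud model: every node of a datacenter is reached via one proxy.
pub struct CloudConfig {
    proxy_by_datacenter: HashMap<String, SocketAddr>,
}

impl CloudConfig {
    pub fn new(proxies: Vec<(&str, SocketAddr)>) -> Self {
        let proxy_by_datacenter = proxies
            .into_iter()
            .map(|(dc, addr)| (dc.to_string(), addr))
            .collect();
        Self { proxy_by_datacenter }
    }
}

impl AddressTranslator for CloudConfig {
    fn translate_address(&self, peer: &UntranslatedPeer<'_>) -> Result<SocketAddr, TranslationError> {
        let dc = peer.datacenter.ok_or(TranslationError::MissingDatacenter)?;
        self.proxy_by_datacenter
            .get(dc)
            .copied()
            .ok_or_else(|| TranslationError::UnknownDatacenter(dc.to_string()))
    }
}

/// Generic code path: no cloud-specific branch is needed any more.
pub fn resolve(
    translator: Option<&dyn AddressTranslator>,
    peer: &UntranslatedPeer<'_>,
) -> Result<SocketAddr, TranslationError> {
    match translator {
        Some(t) => t.translate_address(peer),
        None => Ok(peer.untranslated_address),
    }
}
```

With this shape, the pool-refilling code only ever sees "an optional translator", whether it came from the user or from the cloud config.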

Changes to UntranslatedPeer

UntranslatedPeer now:

  • has getters and private fields (compared to #[non_exhaustive] public fields before) - for more flexibility in the future;
  • is parametrised by a lifetime and stores datacenter and rack as &str instead of String, in order to elide allocations upon translation. UntranslatedPeer is constructed from PeerEndpoint right before translation, so the String allocations are now avoided.
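
A sketch of the borrowed layout, using the name `UntranslatedPeer` as it appears in the semver-checks report. The field set of `PeerEndpoint`, the `u128` host id (the driver presumably uses a UUID type) and the method `to_untranslated` are assumptions made for this illustration.

```rust
use std::net::SocketAddr;

/// Owned peer data as kept in cluster metadata (simplified, hypothetical).
pub struct PeerEndpoint {
    pub host_id: u128,
    pub address: SocketAddr,
    pub datacenter: Option<String>,
    pub rack: Option<String>,
}

/// Borrowed view handed to address translation: private fields, getters only.
pub struct UntranslatedPeer<'a> {
    host_id: u128,
    untranslated_address: SocketAddr,
    datacenter: Option<&'a str>,
    rack: Option<&'a str>,
}

impl<'a> UntranslatedPeer<'a> {
    pub fn host_id(&self) -> u128 {
        self.host_id
    }

    pub fn untranslated_address(&self) -> SocketAddr {
        self.untranslated_address
    }

    pub fn datacenter(&self) -> Option<&'a str> {
        self.datacenter
    }

    pub fn rack(&self) -> Option<&'a str> {
        self.rack
    }
}

impl PeerEndpoint {
    /// Builds the view by borrowing; no `String` is cloned.
    pub fn to_untranslated(&self) -> UntranslatedPeer<'_> {
        UntranslatedPeer {
            host_id: self.host_id,
            untranslated_address: self.address,
            datacenter: self.datacenter.as_deref(),
            rack: self.rack.as_deref(),
        }
    }
}
```

Keeping the fields private means the storage could later change (for example to an interned string) as long as the getters keep their signatures.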

Bumped openssl version

Older releases miss important security patches.

Other fixes

Minor ones, you'll see them yourselves in the commits.

Credits

Special thanks to @nrxus, who created a PR that I based mine on.
Thanks to @Lorak-mmk and @nemosupremo for ideas and code inspiration.

Fixes: #293
Closes: #1182
Closes: #911

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

wprzytula and others added 4 commits February 2, 2025 19:42
ConnectionKeeper is no longer there.
The impl of Default for PoolConfig is only useful for tests.

Co-authored-by: Karol Baryła <[email protected]>
It was only passed around back and forth, but unused apart from that.
The keepalive interval that is used in keepaliver comes from
ConnectionConfig, not PoolConfig.
@wprzytula wprzytula added this to the 0.16.0 milestone Feb 2, 2025
@wprzytula wprzytula self-assigned this Feb 2, 2025
wprzytula and others added 2 commits February 2, 2025 21:59
This updates openssl to its latest available patch release.
The version that we required had some security vulnerabilities reported.
As the openssl crate is still before its 1.0 (stable) release, we need to
prepare for the possibility that it introduces breaking changes. As it's
present in our public API, we hide it behind a versioned feature flag. This
way, if in the future openssl is released in 0.11 or 1.0 version, we'll
simply add the new feature flag and gradually deprecate the old one.

As a side note, openssl crate hasn't issued a major release for 7 years
now, so it's plausible that it's not going to happen (at least soon).

Co-authored-by: Wojciech Przytuła <[email protected]>

github-actions bot commented Feb 2, 2025

cargo semver-checks detected some API incompatibilities in this PR.
Checked commit: a59b80d

See the following report for details:

cargo semver-checks output
./scripts/semver-checks.sh --baseline-rev 448502ca1db6228ab3d1d6108db78664e1dfddb6
+ cargo semver-checks -p scylla -p scylla-cql --baseline-rev 448502ca1db6228ab3d1d6108db78664e1dfddb6
     Cloning 448502ca1db6228ab3d1d6108db78664e1dfddb6
    Building scylla v0.15.0 (current)
       Built [  34.145s] (current)
     Parsing scylla v0.15.0 (current)
      Parsed [   0.049s] (current)
    Building scylla v0.15.0 (baseline)
       Built [  22.727s] (baseline)
     Parsing scylla v0.15.0 (baseline)
      Parsed [   0.049s] (baseline)
    Checking scylla v0.15.0 -> v0.15.0 (no change)
     Checked [   0.131s] 127 checks: 117 pass, 10 fail, 0 warn, 0 skip

--- failure auto_trait_impl_removed: auto trait no longer implemented ---

Description:
A public type has stopped implementing one or more auto traits. This can break downstream code that depends on the traits being implemented.
        ref: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.39.0/src/lints/auto_trait_impl_removed.ron

Failed in:
  type TranslationError is no longer UnwindSafe, in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla/src/errors.rs:600
  type TranslationError is no longer RefUnwindSafe, in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla/src/errors.rs:600

--- failure enum_missing: pub enum removed or renamed ---

Description:
A publicly-visible enum cannot be imported by its prior path. A `pub use` may have been removed, or the enum itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.39.0/src/lints/enum_missing.ron

Failed in:
  enum scylla::cloud::CloudConfigError, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-448502ca1db6228ab3d1d6108db78664e1dfddb6/65b2eff398a4ccb7c9c581eed694c6b5b77c88c7/scylla/src/cloud/config.rs:12
  enum scylla::client::session_builder::CloudMode, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-448502ca1db6228ab3d1d6108db78664e1dfddb6/65b2eff398a4ccb7c9c581eed694c6b5b77c88c7/scylla/src/client/session_builder.rs:49

--- failure feature_missing: package feature removed or renamed ---

Description:
A feature has been removed from this package's Cargo.toml. This will break downstream crates which enable that feature.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#cargo-feature-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.39.0/src/lints/feature_missing.ron

Failed in:
  feature cloud in the package's Cargo.toml
  feature ssl in the package's Cargo.toml

--- failure inherent_method_missing: pub method removed or renamed ---

Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.39.0/src/lints/inherent_method_missing.ron

Failed in:
  GenericSessionBuilder::ssl_context, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-448502ca1db6228ab3d1d6108db78664e1dfddb6/65b2eff398a4ccb7c9c581eed694c6b5b77c88c7/scylla/src/client/session_builder.rs:357

--- failure method_parameter_count_changed: pub method parameter count changed ---

Description:
A publicly-visible method now takes a different number of parameters.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.39.0/src/lints/method_parameter_count_changed.ron

Failed in:
  scylla::client::session_builder::GenericSessionBuilder::new now takes 0 parameters instead of 1, in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla/src/client/session_builder.rs:88

--- failure module_missing: pub module removed or renamed ---

Description:
A publicly-visible module cannot be imported by its prior path. A `pub use` may have been removed, or the module may have been renamed, removed, or made non-public.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.39.0/src/lints/module_missing.ron

Failed in:
  mod scylla::cloud, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-448502ca1db6228ab3d1d6108db78664e1dfddb6/65b2eff398a4ccb7c9c581eed694c6b5b77c88c7/scylla/src/cloud/mod.rs:1

--- failure struct_missing: pub struct removed or renamed ---

Description:
A publicly-visible struct cannot be imported by its prior path. A `pub use` may have been removed, or the struct itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.39.0/src/lints/struct_missing.ron

Failed in:
  struct scylla::cloud::CloudConfig, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-448502ca1db6228ab3d1d6108db78664e1dfddb6/65b2eff398a4ccb7c9c581eed694c6b5b77c88c7/scylla/src/cloud/config.rs:33

--- failure struct_pub_field_missing: pub struct's pub field removed or renamed ---

Description:
A publicly-visible struct has at least one public field that is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.39.0/src/lints/struct_pub_field_missing.ron

Failed in:
  field ssl_context of struct SessionConfig, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-448502ca1db6228ab3d1d6108db78664e1dfddb6/65b2eff398a4ccb7c9c581eed694c6b5b77c88c7/scylla/src/client/session.rs:171
  field cloud_config of struct SessionConfig, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-448502ca1db6228ab3d1d6108db78664e1dfddb6/65b2eff398a4ccb7c9c581eed694c6b5b77c88c7/scylla/src/client/session.rs:232
  field host_id of struct UntranslatedPeer, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-448502ca1db6228ab3d1d6108db78664e1dfddb6/65b2eff398a4ccb7c9c581eed694c6b5b77c88c7/scylla/src/cluster/metadata.rs:176
  field untranslated_address of struct UntranslatedPeer, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-448502ca1db6228ab3d1d6108db78664e1dfddb6/65b2eff398a4ccb7c9c581eed694c6b5b77c88c7/scylla/src/cluster/metadata.rs:177
  field datacenter of struct UntranslatedPeer, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-448502ca1db6228ab3d1d6108db78664e1dfddb6/65b2eff398a4ccb7c9c581eed694c6b5b77c88c7/scylla/src/cluster/metadata.rs:178
  field rack of struct UntranslatedPeer, previously in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/target/semver-checks/git-448502ca1db6228ab3d1d6108db78664e1dfddb6/65b2eff398a4ccb7c9c581eed694c6b5b77c88c7/scylla/src/cluster/metadata.rs:179

--- failure struct_pub_field_now_doc_hidden: pub struct field is now #[doc(hidden)] ---

Description:
A pub field of a pub struct is now marked #[doc(hidden)] and is no longer part of the public API.
        ref: https://doc.rust-lang.org/rustdoc/write-documentation/the-doc-attribute.html#hidden
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.39.0/src/lints/struct_pub_field_now_doc_hidden.ron

Failed in:
  field UntranslatedPeer.host_id in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla/src/cluster/metadata.rs:174
  field UntranslatedPeer.untranslated_address in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla/src/cluster/metadata.rs:174
  field UntranslatedPeer.datacenter in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla/src/cluster/metadata.rs:174
  field UntranslatedPeer.rack in file /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla/src/cluster/metadata.rs:174

--- failure type_mismatched_generic_lifetimes: type now takes a different number of generic lifetimes ---

Description:
A type now takes a different number of generic lifetime parameters. Uses of this type that name the previous number of parameters will be broken.
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.39.0/src/lints/type_mismatched_generic_lifetimes.ron
Failed in:
  Struct UntranslatedPeer (0 -> 1 lifetime params) in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla/src/cluster/metadata.rs:174

     Summary semver requires new major version: 10 major and 0 minor checks failed
    Finished [  58.013s] scylla
    Building scylla-cql v0.4.0 (current)
       Built [  11.042s] (current)
     Parsing scylla-cql v0.4.0 (current)
      Parsed [   0.033s] (current)
    Building scylla-cql v0.4.0 (baseline)
       Built [  11.423s] (baseline)
     Parsing scylla-cql v0.4.0 (baseline)
      Parsed [   0.035s] (baseline)
    Checking scylla-cql v0.4.0 -> v0.4.0 (no change)
     Checked [   0.127s] 127 checks: 127 pass, 0 skip
     Summary no semver update required
    Finished [  23.177s] scylla-cql
make: *** [Makefile:61: semver-rev] Error 1

@github-actions github-actions bot added the semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes label Feb 2, 2025
nrxus and others added 15 commits February 3, 2025 11:16
This allows for future TLS providers to be added, including rustls.

Co-authored-by: Wojciech Przytuła <[email protected]>
otherwise the error shows up too late in a non-terminal way whereas
the error is terminal
The idea is that `{Connection,Pool}Config` is the same for all
connections, whereas `Host{Connection,Pool}Config` is customized for
a specific endpoint. This commit leverages type system to better convey
this difference.
This step improves type safety and prepares for next steps.
This just changes the style from `if let` to `let else` for better
readability.

This commit should be viewed without whitespace difference.
…nfig

It makes great sense, because the CloudConfig is the subject of the
operation.
TlsProvider is the last abstraction about TLS in the driver.
The full picture looks like this:

┌─←─ TlsContext (openssl::SslContext / rustls::ClientConfig)
│
├─←─ CloudConfig (powered by either TLS backend)
│
│ gets wrapped in
│
↳ TlsProvider (same for all connections)
   │
   │ produces
   │
   ↳ TlsConfig (specific for the particular connection)
      │
      │ produces
      │
      ↳ Tls (wrapper over TCP stream which adds encryption)
As HostPoolConfig is created by cloning required fields of PoolConfig
anyway, there's no point in taking PoolConfig by move.
Before, PoolConfig would be re-created for each attempt to recreate the
control connection pool. It's redundant, because each time it would be
exactly the same, and it requires cloning ConnectionConfig.

Now, PoolConfig is only created once and stored in MetadataReader.
Before, only ConnectionConfig would be stored.
It seems that when I originally wrote serverless, for some reason I
hand-crafted address translation logic specifically for serverless,
making it fully separate from the standard AddressTranslator trait.

In this commit I successfully implement cloud address translation inside
the standard AddressTranslator framework, decluttering PoolRefiller
logic from cloud-related code. CloudConfig becomes an AddressTranslator
itself, so it needn't be passed separately anymore. This further reduces
code complication arising from cloud-related codepaths.
It's no longer necessary, as CloudConfig is now passed inside
TlsProvider.
If in the future we choose to store data differently in
UntranslatedPeer, we will now have freedom to do so, only preserving the
getters. Otherwise, we would have to retain legacy fields in such case.
As address translation should not need ownership of either datacenter
or rack, let's not require cloning them.
@wprzytula wprzytula force-pushed the rustls-support branch 2 times, most recently from dbaba83 to a07efdc Compare February 3, 2025 10:52
nrxus and others added 2 commits February 3, 2025 12:13
Rustls is now supported as an ordinary TLS provider, so far only for the
non-cloud case.

Co-authored-by: Wojciech Przytuła <[email protected]>
rustls is now supported in cloud, too. The choice between rustls and
openssl is done based on enabled features:
- if openssl-010 feature is enabled, openssl is used for cloud;
- else if rustls-023 feature is enabled, rustls is used for cloud;
- else compile error is raised.

It is subject to discussion in the PR whether such logic based on
features is acceptable.

Co-authored-by: Andres Medina <[email protected]>
nrxus and others added 6 commits February 3, 2025 12:13
Co-authored-by: Wojciech Przytuła <[email protected]>
Co-authored-by: Wojciech Przytuła <[email protected]>
This is analogous to tls-rustls.rs example.
As address translation no longer needs owned Strings, it's enough to
pass references to UntranslatedEndpoint to `open_connection` & friends.

Note that, unfortunately, we can't fully elide the clone in
`start_opening_connection()`, because the endpoint may be mutated (in
the shard-aware case) for the purpose of opening a connection to the
shard-aware port. Nonetheless, now the limitation that requires us to
clone is in internal code, not in the user-facing API, which is better
for us.
As the Serverless Cloud is a highly unstable feature, it must be marked
as such in order not to break API after 1.0 in case we need to introduce
breaking changes to the feature.
As the tls module got quite large, it makes sense to extract it to the
network supermodule.
@wprzytula wprzytula marked this pull request as ready for review February 3, 2025 12:21
@wprzytula wprzytula mentioned this pull request Feb 3, 2025
8 tasks
Contributor

@muzarski muzarski left a comment

I noticed that we do not do cargo checks when unstable-cloud and rustls-x features are enabled. I think we should do that. Probably should check the following combinations:

  • openssl-x, rustls-x, unstable-cloud
  • openssl-x, unstable-cloud
  • rustls-x, unstable-cloud
  • openssl-x, rustls-x
  • openssl-x (done in tls.yml, but it currently does not work)
  • rustls-x (makes sense to put it in tls.yml once it's fixed)

Also, previously we would run a tls example in tls.yml. We should add a step to run tls-rustls example as well (once workflow is fixed).

scylla/src/client/session.rs (outdated, resolved)
scylla/Cargo.toml (resolved)
scylla/src/cloud/config.rs (outdated, resolved)
scylla/src/network/connection.rs (outdated, resolved)
Comment on lines 271 to 276
#[cfg(feature = "openssl-010")]
impl From<openssl::error::ErrorStack> for TlsError {
    fn from(error: openssl::error::ErrorStack) -> Self {
        TlsError::OpenSsl010(error)
    }
}
Contributor

Why not use #[from] attribute?

Collaborator Author

I took this code from @nrxus; will fix this.

scylla/src/network/connection_pool.rs (outdated, resolved)
scylla/src/network/connection.rs (outdated, resolved)
@@ -369,7 +460,7 @@ pub(crate) struct ConnectionConfig {
     pub(crate) tcp_keepalive_interval: Option<Duration>,
     pub(crate) timestamp_generator: Option<Arc<dyn TimestampGenerator>>,
     #[cfg(feature = "__tls")]
-    pub(crate) tls_config: Option<TlsConfig>,
+    pub(crate) tls_provider: Option<TlsProvider>,
Contributor

Ok, now I understand the difference between HostConnectionConfig and ConnectionConfig.

scylla/src/network/connection.rs (resolved)
README.md (resolved)
@Lorak-mmk Lorak-mmk modified the milestones: 0.16.0, 1.0.0 Feb 5, 2025
Labels
semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes
Successfully merging this pull request may close these issues.

rustls support
4 participants