Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More iobuf cross-shard checks, better doc #23263

Conversation

travisdowns
Copy link
Member

@travisdowns travisdowns commented Sep 10, 2024

Increase the iobuf strictness in checking for "same shard" restrictions in debug mode.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

iobuf is a tricky class which is unsafe to use across threads even in
cases which would be safe with almost all other classes, due to
hidden sharing of the underlying buffers.

Make this restriction clearer in the class commends and in the
documentation for share() and copy().

Ref CORE-7061.
We do a check (in debug mode) in many mutating methods that we are
on the origin shard of the iobuf. Move this check into its own method
to reduce verbosity and to allow a split between mutating and
non-mutating methods.
Add the mutating method check to more mutating methods. The only check
that this does, that the "origin shard" is the same as the current
shard, also should apply to const methods, but for now I'll add it only
to mutating methods which are somehow "even worse" to call when
internal sharing is present.

Ref CORE-7061.
@travisdowns
Copy link
Member Author

This is still missing the one line where we add the same-shard check to share(), but I already expect some test failures from this one so I wanted to get those out the way first.

I don't think this precludes any of the stuff that @ballard26 was thinking about doing in terms of making internal sharing safe rather than unsafe-and-sometimes-checked-in-debug: if that goes in we just need to update the doc & checks, but since this is a real footgun it seems like we can do this conservative check now (and even if we make internal sharing safe this will catch cases where the same iobuf is shared across threads which wouldn't be allowed even if we made internal sharing safe).

@travisdowns travisdowns changed the title Td core 7061 more checks in iobuf More iobuf cross-shard checks, better doc Sep 10, 2024
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Sep 10, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/54271#0191ddcb-a4ad-4e40-9286-c37678c13515:

"rptest.tests.audit_log_test.AuditLogTestAdminAuthApi.test_excluded_principal"
"rptest.tests.audit_log_test.AuditLogTestKafkaApi.test_management"
"rptest.tests.audit_log_test.AuditLogTestSchemaRegistry.test_sr_audit_bad_authn"
"rptest.tests.audit_log_test.AuditLogTestKafkaAuthnApi.test_authn_messages"
"rptest.tests.audit_log_test.AuditLogTestsAppLifecycle.test_recovery_mode"
"rptest.tests.compatibility.kafka_streams_test.KafkaStreamsSessionWindow.test_kafka_streams"
"rptest.tests.compatibility.kafka_streams_test.KafkaStreamsWikipedia.test_kafka_streams_wikipedia"
"rptest.tests.pandaproxy_test.PandaProxyInvalidInputsTest.test_invalid_topic_commit_offset"

new failures in https://buildkite.com/redpanda/redpanda/builds/54271#0191ddcb-a4b0-447b-a8d5-9f492e43ca22:

"rptest.tests.audit_log_test.AuditLogTestOauth.test_kafka_oauth.authz_match=AuthorizationMatch.ACL"
"rptest.tests.audit_log_test.AuditLogTestKafkaAuthnApi.test_excluded_principal"
"rptest.tests.audit_log_test.AuditLogTestSchemaRegistry.test_sr_audit_bad_authz"
"rptest.tests.compatibility.kafka_streams_test.KafkaStreamsJsonToAvro.test_kafka_streams"
"rptest.tests.audit_log_test.AuditLogTestKafkaApi.test_audit_topic_protections"
"rptest.tests.audit_log_test.AuditLogTestKafkaApi.test_no_auth_enabled"
"rptest.tests.pandaproxy_test.PandaProxyAutoAuthTest.test_remove_consumer_validation"
"rptest.tests.pandaproxy_test.PandaProxyAutoAuthTest.test_subscribe_consumer_validation"
"rptest.tests.pandaproxy_test.PandaProxyConsumerGroupTest.test_moving_group_coordinator"
"rptest.tests.pandaproxy_test.PandaProxyTest.test_create_consumer_validation"
"rptest.tests.pandaproxy_test.PandaProxyTest.test_remove_consumer_validation"

new failures in https://buildkite.com/redpanda/redpanda/builds/54271#0191ddcb-a4ab-44ed-862d-e86b6e63d109:

"rptest.tests.audit_log_test.AuditLogTestAdminApi.test_config_rejected"
"rptest.tests.audit_log_test.AuditLogTestKafkaApi.test_excluded_topic"
"rptest.tests.audit_log_test.AuditLogTestKafkaAuthnApi.test_authn_failure_messages"
"rptest.tests.audit_log_test.AuditLogTestSchemaRegistry.test_sr_audit"
"rptest.tests.audit_log_test.AuditLogTestKafkaTlsApi.test_mtls"
"rptest.tests.compatibility.kafka_streams_test.KafkaStreamsPageView.test_kafka_streams_page_view"
"rptest.tests.pandaproxy_test.PandaProxyAutoAuthTest.test_consumer_group_binary_v2"
"rptest.tests.pandaproxy_test.PandaProxyTest.test_consumer_group_binary_v2"
"rptest.tests.pandaproxy_test.PandaProxyTest.test_subscribe_consumer_validation"
"rptest.tests.rpk_registry_test.RpkRegistryTest.test_registry_schema"
"rptest.tests.schema_registry_test.SchemaRegistryModeMutableTest.test_mode_readonly"

new failures in https://buildkite.com/redpanda/redpanda/builds/54271#0191ddcb-a4b2-464e-8345-35a6e8f13361:

"rptest.tests.audit_log_test.AuditLogTestKafkaApi.test_consume"
"rptest.tests.audit_log_test.AuditLogTestKafkaApi.test_produce"
"rptest.tests.audit_log_test.AuditLogTestKafkaAuthnApi.test_no_audit_user_authn"
"rptest.tests.audit_log_test.AuditLogTestAdminApi.test_audit_log_metrics"
"rptest.tests.compatibility.kafka_streams_test.KafkaStreamsTopArticles.test_kafka_streams"
"rptest.tests.audit_log_test.AuditLogTestOauth.test_kafka_oauth.authz_match=AuthorizationMatch.RBAC"

new failures in https://buildkite.com/redpanda/redpanda/builds/54336#0191e23c-59bd-47ec-9df9-174d8a6e5372:

"rptest.tests.audit_log_test.AuditLogTestAdminApi.test_audit_log_metrics"

new failures in https://buildkite.com/redpanda/redpanda/builds/54337#0191e267-7b33-42b1-a8a0-94c65b52e5cb:

"rptest.tests.audit_log_test.AuditLogTestsAppLifecycle.test_recovery_mode"

new failures in https://buildkite.com/redpanda/redpanda/builds/54348#0191e2be-ea10-4287-a00f-0898528f1796:

"rptest.tests.audit_log_test.AuditLogTestsAppLifecycle.test_recovery_mode"

* shard from the source and so on).
*
* The only safe way to get the contents of an iobuf from one shard to another
* is to pass the iobuf to the other shard and then call copy() on it, which is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like a safe interface to me?

It means the submit_to caller always has to wait until the submit_to future resolves to do anything with its iobuf and hope the submit_to side is not keeping the original iobuf around somehow? It seems as unsafe to me as the more idiomatic way of just doing copy/share first and then moving the result to the other shard. It further pessimizes the cases where no copy is needed and the iobuf is simply EOL on the source shard.

I think the oncore stuff is just broken and doesn't really work as is. I tried to extend it in the past as well.

This really needs language support, maybe there is some thread_local hackery that might work?

Copy link
Member Author

@travisdowns travisdowns Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good feedback, thanks.

It means the submit_to caller always has to wait until the submit_to future resolves to do anything with its iobuf and hope the submit_to side is not keeping the original iobuf around somehow?

Right, though waiting until the future resolves is almost always the existing use case for submit_to and related calls?

That said, I wrote this as a codification of the oncore existing restrictions of the class, and the existing behavior of the share/copy/move ctor methods: the thing that creates the new iobuf (here, copy() needs to happen on the target shard since it grabs its origin shard ID implicitly from the current shard.

As far as I can tell, iobuf does not currently allow any reasonable way to move an iobuf from one shard to another: the origin ID is preserved on move, so you can never get a moved to iobuf on another shard with the right origin ID. It may only happen to work in some cases because the verify check wasn't uniformly added everywhere so if you avoid the methods where it does exist you at least won't get an assert (but you may crash).

I don't think it's a slam dunk though that copy()-on-source is definitely better at least if we have any concept of origin shard: consider the case where you do an invoke_on (all shards), you really want to create N copies (at east with the current restriction), one for each shard. In that case copy() on the target just works: makes the right number of copies of the iobuf with right shard ID.

It further pessimizes the cases where no copy is needed and the iobuf is simply EOL on the source shard.

Yes, but as above this is "already the case" and don't think there's any way to move safely currently.

I think the oncore stuff is just broken and doesn't really work as is.

Well the restriction is "too strict", but we already have it so I'm trying to lean on that for safety at the cost of re-enforcing the existing pessimization. So it is at least partly workable in the sense that it has been like this since iobuf introduction.

That said, totally open to better approaches here too, either for the doc or the implementation. Just relying on 100% adherence to complex and mostly undocumented lifetime runs isn't going to cut it, I think: especially because the one of the main failure mode occurs due to a non-atomic increment race which is going to be very hard to pick up with testing and likely to slip through to production.

I tried to #14024 it in the past as well.

Ah, I wish I had seen that. Yeah it's the same vein. There's also more discussion in https://redpandadata.atlassian.net/browse/CORE-7061 if you hadn't seen it.

This really needs language support, maybe there is some thread_local hackery that might work?

Sure, we should be open to anything here. Maybe like @ballard26 said it's easier just to make the refcounting atomic, which would make the semantics more sane and hence easier to document :). That does look like a potentially invasive change performance-wise though since I think it means temporary_buffer needs use atomic refcounting everywhere in seastar (i.e., not opt-in e.g., via template parameter) since we buffers created and consumed in seastar to be like that from the start: plus the work of combing through all the deleter implementations to see if they are truly compatible with cross-shard use beyond the refcounting, since they can do arbitrary things at deletion time.

Copy link
Member

@StephanDollberg StephanDollberg Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, iobuf does not currently allow any reasonable way to move an iobuf from one shard to another: the origin ID is preserved on move, so you can never get a moved to iobuf on another shard with the right origin ID. It may only happen to work in some cases because the verify check wasn't uniformly added everywhere so if you avoid the methods where it does exist you at least won't get an assert (but you may crash).

Yes, but as above this is "already the case" and don't think there's any way to move safely currently.

Right yes, I guess I should have clarified that I was thinking more about the release build usecase or with the oncore check removed as I think it doesn't make much sense in its current form as described otherwise. I do very much suspect it wasn't added to all methods because it would break some common current usage patterns that are fine from a thread safety perspective.

I don't think it's a slam dunk though that copy()-on-source is definitely better at least if we have any concept of origin shard: consider the case where you do an invoke_on (all shards), you really want to create N copies (at east with the current restriction), one for each shard. In that case copy() on the target just works: makes the right number of copies of the iobuf with right shard ID.

In that example, is a copy even needed at all? If it's safe to copy/read from N other threads in parallel then it would also be safe to just read directly without copying for the duration of the invoke_on_all call? It would require const == threadsafe, not sure whether that is given at least today.

said it's easier just to make the refcounting atomic ... potentially invasive change performance-wise though

I do think that is the best way forward. Back when I was looking into this because of the same bug/misuse in a different part of the code this was my conclusion as well. Note I think this could be a performance gain as it would mean that we can drop the extra copies on the produce and fetch path which are likely a lot more expensive.

For the checks, I am wondering whether we can add a method along the signature of:

iobuf&& move_to_shard(shard_id target_shard_id)

That invalidates the current object and creates a new one with the source shard id adapted. Then we could possibly add the checks in more places like share (outside of the constructors which are probably still needed implicitly as seastar might move the lambda under the hood). Haven't really thought this through.

Copy link
Member

@StephanDollberg StephanDollberg Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But no, move_to_shard is never really safe because at least in its current form the iobuf doesn't know whether it uniquely owns the temporary_buffer (in #13447 (comment) the issue was definitely that we could have already gotten a shared buffer out of ss::input_stream::read_up_to).

So I guess this really already pessimizes this case:

It further pessimizes the cases where no copy is needed and the iobuf is simply EOL on the source shard.

as it can never be correct unfortunately. And in that sense share with oncore check failing anywhere is a bug. As are the failures in this PR? Well maybe not if you can guarantee that the temporary_buf is not shared possibly because you constructed it yourself somewhere.

So possibly copy_to_shard(shard_id target_shard_id) might be a good interface but in practice just doing the copy in the submit_to callback is just easier then.

Copy link
Member Author

@travisdowns travisdowns Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right yes, I guess I should have clarified that I was thinking more about the release build usecase or with the oncore check removed as I think it doesn't make much sense in its current form as described otherwise.

Well I feel like we shouldn't do be doing anything in the release build that would assert in the debug build, right? Both "in principle" but also actually it should be unlikely since we run (some subset of) our tests in debug and those don't assert. So I would say that the methods that have the oncore check are off-limits for cross-origin calls regardless of the build mode.

That aside, I basically agree with you that the oncore check seems like it's not really doing the right thing or at least pessimizes things unnecessarily. So I'll consider this on hold for now.

Edit: To add this was responding to your comment 2 up, I hadn't see the one immediately above so I'll have to consider that further.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as it can never be correct unfortunately.

Right. Though some correct cases can be detected, conservatively, though it would require more tracking. E.g., every operation either potentially causes shared buffers to be added to the iobuf, or not. E.g., the constructors that just pass char*/size which is copied into the iobuf don't result in sharing, and something like the move-ctor should inherit the sharing state. clear() removes all sharing, etc. So you track this boolean state "not shared vs maybe shared" and only assert if it's in the maybe shared state or some other kind of logic. E.g., moving across shards is safe for not shared stuff.

As are the failures in this PR?

I did look at two cases and they both looked safe in the sense above: on one shard an iobuf is created "unshared" (e.g., as the result of parsing some json), then it is moved to another shard (produce flow in this case).

@travisdowns travisdowns marked this pull request as draft September 11, 2024 16:51
@travisdowns
Copy link
Member Author

/ci-repeat 1
debug
tests/rptest/tests/audit_log_test.py::AuditLogTestAdminApi.test_audit_log_metrics

@travisdowns
Copy link
Member Author

/ci-repeat 1
debug
skip-redpanda-build
tests/rptest/tests/audit_log_test.py::AuditLogTestAdminApi.test_audit_log_metrics

@travisdowns
Copy link
Member Author

/ci-repeat 1
debug
skip-redpanda-build
tests/rptest/tests/audit_log_test.py::AuditLogTestsAppLifecycle.test_recovery_mode

@travisdowns
Copy link
Member Author

/ci-repeat 1
debug
skip-redpanda-build
tests/rptest/tests/audit_log_test.py::AuditLogTestsAppLifecycle.test_recovery_mode

@travisdowns
Copy link
Member Author

See also here for an upstream discussion on making temporary_buffer thread-safe: scylladb/seastar#2450.

@travisdowns
Copy link
Member Author

This is obsoleted by redpanda-data/seastar#149.

@travisdowns travisdowns closed this Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants