Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grand dispatch queue #661

Open
wants to merge 169 commits into
base: main
Choose a base branch
from
Open

Grand dispatch queue #661

wants to merge 169 commits into from

Conversation

sbSteveK
Copy link
Contributor

@sbSteveK sbSteveK commented Jul 29, 2024

Integration of Apple's grand dispatch queue with event loop

The CI is failing for dispatch queue because of the apple network socket is not in. The socket related changes is in PR: #662

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-commenter
Copy link

codecov-commenter commented Jul 29, 2024

Codecov Report

Attention: Patch coverage is 63.15789% with 7 lines in your changes missing coverage. Please review.

Project coverage is 79.68%. Comparing base (fdb053b) to head (4fadfee).

Files with missing lines Patch % Lines
source/event_loop.c 63.15% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #661      +/-   ##
==========================================
- Coverage   79.69%   79.68%   -0.01%     
==========================================
  Files          30       30              
  Lines        6117     6119       +2     
==========================================
+ Hits         4875     4876       +1     
- Misses       1242     1243       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Base automatically changed from runtime_select_event_loop to main January 8, 2025 17:19
* when we need verify if the dispatch_loop is alive. This makes sure that the dispatch_loop will not be destroyed
* from other thread while we are using it.
*/
struct aws_rw_lock lock;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial: The comment above refers to this lock as the "conetxt lock". Spelling on "context lock" and also we should clarify by either calling it "The dispatch_loop_context's lock" or renaming the lock member here to "context_lock"

DEFAULT_QUEUE_SIZE,
sizeof(struct scheduled_service_entry *),
&s_compare_timestamps)) {
goto clean_up;
Copy link
Contributor Author

@sbSteveK sbSteveK Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fails and we goto clean_up and dispatch_loop_context *context has not yet been assigned to dispatch_loop->context.

This means when s_dispatch_event_loop_destroy(loop) checks whether it should decref context by doing a NULL check on dispatch_loop->context, it fails and leaks memory.

We should assign dispatch_loop->context with this context immediately following its ref_count being initialized so it gets cleaned properly if we goto cleanup here.


s_lock_cross_thread_data(dispatch_loop);
if (dispatch_loop->synced_data.suspended) {
AWS_LOGF_INFO(AWS_LS_IO_EVENT_LOOP, "id=%p: Starting event-loop thread.", (void *)event_loop);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since dispatch loops are created in a running state when we retrieve it from Apple's framework, we should add this log to the point where the dispatch loop is started in the aws_event_loop_new_with_dispatch_queue function. This one will never get called on a newly created and run dispatch loop but we want the log that one was created and Started for debugging purposes.

if (!dispatch_loop->synced_data.suspended) {
dispatch_loop->synced_data.suspended = true;
AWS_LOGF_INFO(AWS_LS_IO_EVENT_LOOP, "id=%p: Stopping event-loop thread.", (void *)event_loop);
/* Suspend will increase the dispatch reference count. It is required to call resume before
Copy link
Contributor Author

@sbSteveK sbSteveK Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment clarification is required. Specify that the reference count discussed here is Apple's reference count on the dispatch loop and not ours. Instead of stating we need to call "resume", specify that we either need to call Apple's API dispatch_resume() or that we need to call run() which detects that the dispatch loop is suspended and calls dispatch_resume().

None of this is necessary if we simply remove the ability to suspend the dispatch loop entirely.

return AWS_OP_SUCCESS;
}

static int s_stop(struct aws_event_loop *event_loop) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dispatch loop shouldn't ever be put into a stop state (dispatch_suspend). If this is called, we should either ignore it, or return an error (platform not supported maybe?) and not mess with any complexity we need to add to support a thing that shouldn't be happening.

s_stop and s_run can be emptied, we will no longer need to check that the dispatch loop is in a non-suspended state before attempting to destroy it, and we can clean up the dispatch_loop by removing the bool suspended from synced_data. Let's only add complexity where it's required.

*
* The function should be wrapped with the following locks:
* dispatch_loop->context->lock: To retain the dispatch loop
* dispatch_loop->synced_data.lock : To verify if the dispatch loop is suspended
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are removing the ability to suspend a dispatch queue, this is no longer a necessary comment.

/* Synced data handle cross thread tasks and events, and event loop operations*/
struct {
/**
* The lock is used to protect synced_data across the threads. It should be acquired whenever we touched the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial: "It should be acquired whenever data in the synced_data struct is accessed or modified." ?

*
* If timestamp==0, the function will always schedule a new iteration as long as the event loop is not suspended.
*
* The function should be wrapped with the following locks:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial: "This function"

* 2. Process cross-thread tasks.
* 3. Execute all runnable tasks.
*
* Apple Dispatch queues are FIFO queues to which the application can submit tasks in the form of block objects, and the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should state that we create a serial dispatch queue and not that Apple Dispatch queues are FIFO. Concurrent Apple dispatch queues can be created so we want to be clear here what we are doing.

*
* Apple Dispatch queues are FIFO queues to which the application can submit tasks in the form of block objects, and the
* block objects will be executed on a system defined thread pool. Instead of executing the loop on a single thread, we
* tried to recurrently run a single iteration of the execution loop as a dispatch queue block object.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial: remove "tried to". I think we simply are doing it here.

/**
* The function should be wrapped around scheduling_status->lock
*/
static void s_scheduled_service_entry_destroy(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only called as part of cleanup on s_run_iteration. Can we remove this function and add its execution code into s_run_iteration.

If we keep the function, we should pull in the locking and unlocking of the scheduling_status->lock into this function itself to keep the other end cleaner as well as update the comment to explain what this function is doing in the comment description above.

s_acquire_dispatch_loop_context(dispatch_queue_context);
s_rlock_dispatch_loop_context(dispatch_queue_context);

if (!begin_iteration(entry)) {
Copy link
Contributor Author

@sbSteveK sbSteveK Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The begin_iteration() function is only used in this one spot. It might be cleaner and easier to follow if we move the internals of that check (it looks to be checking whether the entry->dispatch_queue_context->io_dispatch_loop is NULL).

struct aws_io_handle {
union {
int fd;
/* on Apple systems, handle is of type nw_connection_t. On Windows, it's a SOCKET handle. */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarity needed. I don't think we're using nw_connection_t unless dispatch queue is turned on.

"On Apple systems using dispatch queue event loops handle is of type nw_connection_t"

/**
* @internal - Don't use outside of testing.
*
* Return the default event loop type. If the return value is `AWS_ELT_PLATFORM_DEFAULT`, the function failed to
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you changed this to AWS_EVENT_LOOP_PLATFORM_DEFAULT as AWS_ELT_PLATFORM_DEFAULT doesn't exist.

Also, it looks like there is nothing returned on a failure to retrieve the default type value. I think its a compile time error?

* */
static void end_iteration(struct scheduled_service_entry *entry) {

struct dispatch_loop_context *context = entry->dispatch_queue_context;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename this variable to be consistent in its name across various functions for clarity. context -> dispatch_queue_context

s_get_unique_dispatch_queue_id(dispatch_queue_id);

dispatch_loop->dispatch_queue = dispatch_queue_create(dispatch_queue_id, DISPATCH_QUEUE_SERIAL);
if (!dispatch_loop->dispatch_queue) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dispatch_queue_create Apple API will always return a dispatch_queue_t. If it can't a crash on the OS level is expected on this API call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants