Merge release/2.6 into google/2.6 #15914

Merged: 20 commits, Feb 15, 2025

jolivier23
Contributor

Nasf-Fan and others added 20 commits January 29, 2025 11:41
…5793)

It is a temporary workaround for the collective punch crash at large scale.

Signed-off-by: Fan Yong <[email protected]>
Intercepting io_queue_init() is needed on Ubuntu. There is a compatibility issue between the pil4dfs interception library and the fio libaio engine. In some cases, fio initializes the aio context through io_queue_init() when loading the libaio engine. Although pil4dfs already intercepts io_setup(), the io_setup() call made by io_queue_init() is sometimes not intercepted, which results in an invalid aio context for I/O processing. Add an interception for io_queue_init() so that this case works.

Signed-off-by: Jun Zeng <[email protected]>
Signed-off-by: Lei Huang <[email protected]>
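
To illustrate the io_queue_init() interception described in the commit above, here is a minimal sketch. It assumes io_queue_init() is essentially a thin wrapper that validates maxevents, clears the context, and calls io_setup(); the commit does not state this. All `sketch_` names are hypothetical stand-ins, not pil4dfs or libaio symbols. The idea is that the replacement routes explicitly through the intercepted setup path instead of relying on the library's internal call.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for libaio's opaque context type. */
struct sketch_ctx { int maxevents; };
typedef struct sketch_ctx *sketch_ctx_t;

static struct sketch_ctx sketch_storage;
int sketch_io_setup_calls; /* counts calls that reached the intercepted path */

/* Stand-in for the interception library's own io_setup() replacement. */
int sketch_io_setup(int maxevents, sketch_ctx_t *ctxp)
{
	sketch_io_setup_calls++;
	sketch_storage.maxevents = maxevents;
	*ctxp = &sketch_storage;
	return 0;
}

/*
 * Replacement for io_queue_init(): validate the argument, clear the
 * context, then call the intercepted setup function directly so the
 * resulting context is one the interception library knows about.
 */
int sketch_io_queue_init(int maxevents, sketch_ctx_t *ctxp)
{
	if (maxevents <= 0)
		return -EINVAL;
	*ctxp = NULL;
	return sketch_io_setup(maxevents, ctxp);
}
```

With this shape, a caller that only ever uses the queue-init entry point still ends up with a context created by the intercepted setup function.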
The backport from master missed a couple of pool tests that
needed to be updated for the new JSON output from pool query.
Query results always include a dead_ranks array, even when
it's empty.

Signed-off-by: Michael MacDonald <[email protected]>
…re. (#15696) (#15776)

Summary: Pass the ior_timeout to avoid the test hanging under certain situations.

Signed-off-by: Padmanabhan <[email protected]>
Whenever stopping an engine process from within the control-plane, use
SIGKILL rather than asking nicely (SIGTERM). This has been requested
to try to avoid situations that could result in data loss.

This change preserves the behaviour where ds_mgmt_drpc_prep_shutdown()
and then ds_pool_disable_exclude() are called during a controlled
shutdown, where dmg system stop is called with the new --full argument.

Notable behavior changes with this PR:
  * Always performs SIGKILL on dmg system stop unless the --full command
option is supplied.
  * Attempts a prepare-shutdown step to disable exclusions across the
cluster during a “controlled” shutdown, where dmg system stop is called
with the --full option. This should be regarded as experimental and not
for use in production environments.
  * The force option is a no-op and is retained for backward compatibility
and future use.

Signed-off-by: Tom Nabarro <[email protected]>
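
As a rough illustration of the SIGTERM versus SIGKILL distinction in the commit above, the sketch below forks a child that ignores SIGTERM and then stops it with SIGKILL. It is a plain POSIX example, not the control-plane code, and the function name is hypothetical. It shows that SIGKILL cannot be ignored and that the parent sees the child as terminated by a signal rather than as having exited normally.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that ignores SIGTERM, then stop it with SIGKILL.
 * Returns the terminating signal number, 0 if the child exited
 * normally, or -1 on error.
 */
int sketch_stop_child_with_sigkill(void)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: a polite SIGTERM would have no effect here. */
		signal(SIGTERM, SIG_IGN);
		for (;;)
			pause();
	}
	/* Parent: SIGKILL cannot be caught, blocked, or ignored. */
	if (kill(pid, SIGKILL) != 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The return value is the signal number because the child never gets a chance to run its own exit path.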
…15833) (#15837)

This is a workaround for DAOS-16990 and DAOS-17011.

When using the CXI provider, retry HG_Init_opt2() on error cases since
it seems CXI has intermittent issues on initialization. A new
environment variable is added (CRT_CXI_INIT_RETRY) to control the retry
count (default is 3) and to be able to test future SS fixes without
retry.

Signed-off-by: Mohamad Chaarawi <[email protected]>
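
The retry behaviour described in the commit above could be sketched roughly as follows. This is not the CaRT code: the `sketch_` functions are hypothetical, the real init call is replaced by a function pointer, and it is an assumption that the count means "retries after the first attempt". Only the variable name CRT_CXI_INIT_RETRY and the default of 3 come from the commit message.

```c
#include <stdlib.h>

#define SKETCH_CXI_INIT_RETRY_DEFAULT 3

/* Parse a retry count from a string such as getenv("CRT_CXI_INIT_RETRY"). */
int sketch_parse_retry_count(const char *val)
{
	char *end;
	long n;

	if (val == NULL || *val == '\0')
		return SKETCH_CXI_INIT_RETRY_DEFAULT;
	n = strtol(val, &end, 10);
	/* Fall back to the default on garbage or out-of-range input. */
	if (*end != '\0' || n < 0 || n > 1000)
		return SKETCH_CXI_INIT_RETRY_DEFAULT;
	return (int)n;
}

/* Call init_fn once, then retry up to 'retries' more times on failure. */
int sketch_init_with_retry(int (*init_fn)(void *), void *arg, int retries)
{
	int rc = init_fn(arg);

	while (rc != 0 && retries-- > 0)
		rc = init_fn(arg);
	return rc;
}

/* Hypothetical flaky init: fails until the counter in arg reaches zero. */
int sketch_flaky_init(void *arg)
{
	int *remaining_failures = arg;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return -1;
	}
	return 0;
}
```

A caller would pass `sketch_parse_retry_count(getenv("CRT_CXI_INIT_RETRY"))` as the retry count. Setting the variable to 0 disables retries, which matches the stated goal of testing future fixes without retry.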
Increase the "Unit Test bdev with memcheck on EL 8.8" step timeout to be
in sync with the master branch.

Signed-off-by: Phil Henderson <[email protected]>
With this change, three ULTs in pool and container code launched via
ds_pool_thread_collective() are changed to specify a larger ("deep")
stack size of 64KiB rather than the default 16KiB stack size, i.e., the
flags parameter is specified as DSS_ULT_DEEP_STACK. The three ULT
function entrypoints are:
cont_open_one, cont_snap_update_one, and update_vos_prop_on_targets.

Before this change, intermittently in CI testing, shortly after
daos_engine startup, a dmg pool list (pool query on the back end)
would occasionally result in a segmentation fault in an engine, in
these three particular areas of the code. Specifically, the faults
occurred within the ABT thread create, inside ABTI_mem_pool_alloc().

This change is based on a guess that the stack size parameter may have
some effect.

Signed-off-by: Kenneth Cain <[email protected]>
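
The effect of the flag in the commit above can be pictured with a small sketch. The flag value and the helper below are hypothetical and are not the engine's definitions. Only the 16KiB default and the 64KiB deep size come from the commit message.

```c
#include <stddef.h>

#define SKETCH_ULT_DEEP_STACK (1U << 0)      /* hypothetical flag bit */
#define SKETCH_STACK_DEFAULT  (16U * 1024U)  /* 16KiB default stack */
#define SKETCH_STACK_DEEP     (64U * 1024U)  /* 64KiB "deep" stack */

/* Pick the stack size for a new ULT from its creation flags. */
size_t sketch_ult_stack_size(unsigned int flags)
{
	return (flags & SKETCH_ULT_DEEP_STACK) ? SKETCH_STACK_DEEP
					       : SKETCH_STACK_DEFAULT;
}
```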
Tag second release build for 2.6.3.

Signed-off-by: Phil Henderson <[email protected]>
The third argument is of type "void *" in the libc source code, so
"va_arg(arg, int);" retrieves the wrong argument.
Also return ENOTSUP for flock when compatible mode is not enabled.

Signed-off-by: Lei Huang <[email protected]>
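
The va_arg issue in the commit above can be shown with a minimal sketch. The wrapper name and its fixed parameters are hypothetical; the commit does not name the intercepted function. Reading a pointer argument with `va_arg(ap, int)` is undefined behavior, and on platforms where int is narrower than a pointer it typically loses the upper bits.

```c
#include <stdarg.h>

/*
 * Hypothetical variadic wrapper whose third argument is a pointer.
 * The argument must be fetched with the type it was passed as.
 */
void *sketch_third_arg(int fd, int cmd, ...)
{
	va_list ap;
	void *arg;

	(void)fd;
	(void)cmd;
	va_start(ap, cmd);
	arg = va_arg(ap, void *); /* correct; va_arg(ap, int) would be wrong */
	va_end(ap);
	return arg;
}
```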
…) (#15859)

Remove the calling of cleanup methods for multiple containers and ior
commands that can be handled by destroying the pool and a single ior
kill command.

Signed-off-by: Phil Henderson <[email protected]>
Otherwise, the partially committed DTX entry will be re-committed when
the container is reopened. Accessing the related dangling DTX record(s)
may then trigger an assertion and cause corruption.

Signed-off-by: Fan Yong <[email protected]>
…#15882)

Skip existing partially committed DTX records that were generated by
DAOS-2.6.3-rc{1,2} to avoid repeated DTX commit after engine upgrade.

To be safe, the user/admin must explicitly set the server-side
environment variable "DAOS_SKIP_OLD_PARTIAL_DTX" when upgrading
from DAOS-2.6.3-rc{1,2}.

The environment variable can be ignored when upgrading from earlier versions.

Signed-off-by: Fan Yong <[email protected]>
…15879)

For dfs_readx/writex and array_read/write operations, limit the number
of IODs passed to DAOS to 16k when the range lengths are under 16 bytes
(best-effort checking).

Signed-off-by: Mohamad Chaarawi <[email protected]>
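
A check of the kind described in the commit above might look roughly like this. The sketch makes several assumptions the commit does not confirm: that 16k means 16384, that "best effort" means sampling only the first and last range, and that the error is reported as -E2BIG. The struct and function names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SMALL_IODS  (16 * 1024) /* assumed meaning of "16k" */
#define SKETCH_SMALL_RANGE_LEN 16          /* bytes */

struct sketch_range {
	uint64_t offset;
	uint64_t len;
};

/*
 * Reject very long range lists made of tiny ranges. Best effort:
 * only the first and last ranges are inspected, not every entry.
 */
int sketch_check_ranges(const struct sketch_range *rgs, size_t nr)
{
	if (nr <= SKETCH_MAX_SMALL_IODS)
		return 0;
	if (rgs[0].len < SKETCH_SMALL_RANGE_LEN &&
	    rgs[nr - 1].len < SKETCH_SMALL_RANGE_LEN)
		return -E2BIG;
	return 0;
}
```

Sampling keeps the check cheap for large requests, at the cost of missing lists that mix small and large ranges.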
Tag third release build for 2.6.3.

Signed-off-by: Phil Henderson <[email protected]>
Updated the expected journalctl message from "exited with 0" to "killed",
since #15811 changed the default dmg system stop to use --force.

Signed-off-by: Dalton Bohning <[email protected]>
Suppress io.netty:netty-common 4.1.115.Final CVE - no fix available
Suppress io.netty:netty-handler 4.1.100.Final CVE -
    fix available in 4.1.118.Final

Signed-off-by: Tomasz Gromadzki <[email protected]>
…le/2.6

Revert e1393d8

Change-Id: Ica1c2d04f7f54d60a616282323824baae1350f72
Signed-off-by: Jeff Olivier <[email protected]>

Errors are: component not formatted correctly, ticket number prefix incorrect, PR title is malformatted. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments. Unable to load ticket data
https://daosio.atlassian.net/browse/Merge

@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15914/1/testReport/

@jxiong jxiong self-requested a review February 15, 2025 01:50
@jolivier23 jolivier23 merged commit 0b7ff54 into google/2.6 Feb 15, 2025
66 of 72 checks passed
@jolivier23 jolivier23 deleted the jeffolivier/google/2.6 branch February 15, 2025 01:54