Merge release/2.6 into google/2.6 #15914
jolivier23 commented on Feb 14, 2025
- DAOS-16936 object: disable object collective operation by default (#15793)
- DAOS-16807 client: intercept io_queue_init in libpil4dfs (#15784)
- DAOS-16968 test: Fix pool query tests (#15770)
- DAOS-15604 test: Address intermittent scrubber aggregation test failure. (#15696) (#15776)
- DAOS-16153 test: Do not run NLT fi tests for release builds. (#15171) (#15258)
- DAOS-16257 mercury: Update build.config to include ep flush patch. (#15824)
- DAOS-16312 control: Always use --force for dmg system stop (#15811)
- DAOS-16990 cart: workaround to CXI init errors with retrying HG init (#15833) (#15837)
- DAOS-17020 test: Increase unit test bdev memcheck timeout (#15835)
- DAOS-16768 pool: larger ABT ULT stack sizes (#15832)
- DAOS-17021 build: Tag 2.6.3 rc2 (#15842)
- DAOS-17059 client: fcntl 3rd parameter should be void * (#15869)
- DAOS-16969 test: Reduce cleanup operations for metadata.py test (#15779) (#15859)
- DAOS-16876 vos: remove DTX record after partial commit - b26 (#15858)
- DAOS-16876 vos: skip DTX record when load partial committed DTX - b26 (#15882)
- DAOS-17055 client: add a soft limit of 4k to nr ranges for list-io (#15879)
- DAOS-17060 build: Tag 2.6.3 rc3 (#15885)
- DAOS-17052 test: update expected msg for critical_integration (#15855)
- DAOS-17108 common: suppress io.netty 4.1.115 CVE (#15889) (#15890)
…5793) It is a temporary workaround for the collective punch crash at large scale. Signed-off-by: Fan Yong <[email protected]>
Intercepting io_queue_init() is needed on Ubuntu. There is a compatibility issue when the pil4dfs interception library is used with the fio libaio engine. In some cases, fio initializes the aio context through the io_queue_init function when loading the libaio engine. Although pil4dfs already intercepts the io_setup function, it seems that the io_setup called by io_queue_init is sometimes not intercepted, which results in an invalid aio context for I/O processing. Add an interception for io_queue_init so that this case works. Signed-off-by: Jun Zeng <[email protected]> Signed-off-by: Lei Huang <[email protected]>
The backport from master missed a couple of pool tests that needed to be updated for the new JSON output from pool query. Query results always include a dead_ranks array, even when it's empty. Signed-off-by: Michael MacDonald <[email protected]>
…re. (#15696) (#15776) Summary: Pass the ior_timeout to avoid the test hanging under certain situations. Signed-off-by: Padmanabhan <[email protected]>
…#15258) Signed-off-by: Ashley Pittman <[email protected]>
…15824) Signed-off-by: Joseph Moore <[email protected]>
Whenever stopping an engine process from within the control plane, use SIGKILL rather than asking nicely (SIGTERM). This has been requested to try to avoid situations that could result in data loss. This change preserves the behaviour where ds_mgmt_drpc_prep_shutdown() and then ds_pool_disable_exclude() are called during a controlled shutdown, that is, when dmg system stop is called with the new --full argument.
Notable behavior changes with this PR:
* dmg system stop always performs SIGKILL unless the --full command option is supplied.
* During a "controlled" shutdown, where dmg system stop is called with the --full option, a prepare-shutdown step is attempted to disable exclusions across the cluster. This should be regarded as experimental and not for use in production environments.
* The force option is a no-op and is retained for backward compatibility and future use.
Signed-off-by: Tom Nabarro <[email protected]>
…15833) (#15837) This is a workaround for DAOS-16990 and DAOS-17011. When using the CXI provider, retry HG_Init_opt2() on error cases since it seems CXI has intermittent issues on initialization. A new environment variable is added (CRT_CXI_INIT_RETRY) to control the retry count (default is 3) and to be able to test future SS fixes without retry. Signed-off-by: Mohamad Chaarawi <[email protected]>
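The retry behaviour described in this commit message can be sketched as follows. This is a minimal illustration, not the CaRT implementation: `parse_retry_count`, `init_with_retry`, `init_fn_t`, and `flaky_init` are hypothetical names, the callback stands in for the real HG_Init_opt2() call, and treating the count as "additional attempts after the first" is an assumption. Only the variable name CRT_CXI_INIT_RETRY and the default of 3 come from the message above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Default retry count; the commit message states the default is 3. */
#define CXI_INIT_RETRY_DEFAULT 3

/* Hypothetical callback standing in for the real init call. Returns 0 on success. */
typedef int (*init_fn_t)(void *arg);

/* Parse a retry count from an environment value; fall back to the default
 * when the value is missing, malformed, negative, or implausibly large. */
static int
parse_retry_count(const char *val)
{
	char *end = NULL;
	long  n;

	if (val == NULL || *val == '\0')
		return CXI_INIT_RETRY_DEFAULT;
	errno = 0;
	n = strtol(val, &end, 10);
	if (errno != 0 || *end != '\0' || n < 0 || n > 1000)
		return CXI_INIT_RETRY_DEFAULT;
	return (int)n;
}

/* Call init once, then up to `retries` more times while it keeps failing
 * (assumed semantics). Returns the last return code and stores the number
 * of attempts made in *attempts. */
static int
init_with_retry(init_fn_t init, void *arg, int retries, int *attempts)
{
	int rc = -1;
	int i;

	assert(init != NULL && attempts != NULL);
	*attempts = 0;
	for (i = 0; i <= retries; i++) {
		(*attempts)++;
		rc = init(arg);
		if (rc == 0)
			break;
	}
	return rc;
}

/* Illustration helper: fails until the counter behind arg reaches zero. */
static int
flaky_init(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return -1;
	}
	return 0;
}
```

A caller would typically obtain the count with `parse_retry_count(getenv("CRT_CXI_INIT_RETRY"))`. Under the assumed semantics, a value of 0 yields a single attempt with no retry, which matches the stated goal of being able to test future fixes without retrying.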
Increase the "Unit Test bdev with memcheck on EL 8.8" step timeout to be in sync with the master branch. Signed-off-by: Phil Henderson <[email protected]>
With this change, three ULTs in pool and container code launched via ds_pool_thread_collective() specify a larger ("deep") stack size of 64KiB rather than the default 16KiB stack size, i.e., the flags parameter is specified as DSS_ULT_DEEP_STACK. The three ULT function entry points are cont_open_one, cont_snap_update_one, and update_vos_prop_on_targets. Before this change, shortly after daos_engine startup, a dmg pool list (pool query on the back end) would intermittently result in a segmentation fault in an engine during CI testing, in these three particular areas of the code. Specifically, the faults occurred during ABT thread creation, inside ABTI_mem_pool_alloc(). This change is based on a guess that the stack size parameter may have some effect. Signed-off-by: Kenneth Cain <[email protected]>
Tag second release build for 2.6.3. Signed-off-by: Phil Henderson <[email protected]>
The third argument is of type "void *" in the libc source code. Retrieving it with "va_arg(arg, int);" yields the wrong argument. Also return ENOTSUP for flock when compatible mode is not enabled. Signed-off-by: Lei Huang <[email protected]>
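The va_arg issue above can be illustrated with a small variadic function. This is a sketch only, not the libpil4dfs code: `third_arg_as_ptr` and `struct lock_like` are hypothetical names, and the function only mimics the shape of an intercepted fcntl(fd, cmd, ...). It reads the optional argument as `void *`, so a pointer argument keeps its full width, whereas reading it as `int` would keep only `sizeof(int)` bytes on platforms where pointers are wider.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical payload, standing in for something like a lock description. */
struct lock_like {
	int value;
};

/* Hypothetical stand-in for an intercepted fcntl(): fetch the optional
 * third argument as "void *" so that a pointer is retrieved intact. */
static void *
third_arg_as_ptr(int fd, int cmd, ...)
{
	va_list ap;
	void   *param;

	(void)fd;
	(void)cmd;
	va_start(ap, cmd);
	param = va_arg(ap, void *); /* not va_arg(ap, int) */
	va_end(ap);
	return param;
}
```

For example, `third_arg_as_ptr(3, 0, (void *)&lk)` returns the address of `lk` unchanged, so the callee can dereference it safely.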
…) (#15859) Remove the calls to cleanup methods for multiple containers and ior commands; these can be handled by destroying the pool and issuing a single ior kill command. Signed-off-by: Phil Henderson <[email protected]>
Otherwise, the partially committed DTX entry will be re-committed when the container is reopened. Subsequent access to the related dangling DTX record(s) may then trigger an assertion and cause corruption. Signed-off-by: Fan Yong <[email protected]>
…#15882) Skip existing partially committed DTX records that were generated by DAOS-2.6.3-rc{1,2} to avoid repeated DTX commits after an engine upgrade. To be safe, the user/admin must explicitly set the server-side environment variable "DAOS_SKIP_OLD_PARTIAL_DTX" when upgrading from DAOS-2.6.3-rc{1,2}. The environment variable can be ignored when upgrading from earlier versions. Signed-off-by: Fan Yong <[email protected]>
…15879) For dfs_readx/writex and array_read/write operations, add a limit for the number of IODs being passed to DAOS of 16k if the range lengths are under 16 bytes (best effort checking). Signed-off-by: Mohamad Chaarawi <[email protected]>
Tag third release build for 2.6.3. Signed-off-by: Phil Henderson <[email protected]>
Updated the expected journalctl message from "exited with 0" to "killed", since #15811 changed the default dmg system stop to use --force. Signed-off-by: Dalton Bohning <[email protected]>
Suppress io.netty:netty-common 4.1.115.Final CVE - no fix available
Suppress io.netty:netty-handler 4.1.100.Final CVE - fix available in 4.1.118.Final
Signed-off-by: Tomasz Gromadzki <[email protected]>
…le/2.6 Revert e1393d8 Change-Id: Ica1c2d04f7f54d60a616282323824baae1350f72 Signed-off-by: Jeff Olivier <[email protected]>
Errors are: component not formatted correctly; ticket number prefix incorrect; PR title is malformatted (see https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments); unable to load ticket data.
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15914/1/testReport/