Add rollout restart agent e2e test #2799
Conversation
📝 Walkthrough

This pull request introduces enhancements to the end-to-end (E2E) testing infrastructure for the Vald project. The changes focus on adding configurability to the K3D storage setup, creating a new rollout-agent configuration, and implementing a new E2E test scenario for agent rollout restart. The modifications span multiple files, including GitHub Actions workflows, Makefiles, Kubernetes client implementations, and test scripts, with the primary goal of improving testing capabilities and system resilience.
Force-pushed from 4ae3d97 to 9985751.
Deploying vald with Cloudflare Pages

Latest commit: caa6b5a
Status: ✅ Deploy successful!
Preview URL: https://aad08764.vald.pages.dev
Branch Preview URL: https://test-e2e-add-rolloutrestart.vald.pages.dev
Force-pushed from 96c2ce3 to 00434e0.
Force-pushed from 573283f to bd747ff.
@coderabbitai review

✅ Actions performed: review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (4)
tests/e2e/kubernetes/kubectl/kubectl.go (1)
45-86: LGTM! Well-structured implementation of resource rollout management. The implementation is robust with:
- Proper error handling and resource cleanup with deferred Close() calls
- Concurrent output handling using goroutines and bufio.Scanner
- Clear command construction with timeout parameter
However, consider adding debug logging using t.Logf for better test observability.
```diff
 func RolloutResourceName(
 	ctx context.Context, t *testing.T, resource string, name string, timeout string,
 ) error {
 	t.Helper()
+	t.Logf("Rolling out resource %s/%s with timeout %s", resource, name, timeout)
 	cmd := exec.CommandContext(ctx, "kubectl", "rollout", "restart", resource, name)
 	if err := runCmd(t, cmd); err != nil {
+		t.Logf("Failed to restart resource: %v", err)
 		return err
 	}
 	r := strings.Join([]string{resource, name}, "/")
 	to := strings.Join([]string{"--timeout", timeout}, "=")
 	cmd = exec.CommandContext(ctx, "kubectl", "rollout", "status", r, "--watch", to)
```
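For readers unfamiliar with the pattern being praised here, below is a minimal stand-alone sketch of streaming a subprocess's output with `bufio.Scanner`; the `kubectl get pods` invocation is just a placeholder command, not the project's code:

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
)

// streamCmd starts a command and forwards its stdout line by line,
// mirroring the goroutine + bufio.Scanner approach used in kubectl.go.
func streamCmd(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}
	go func() {
		scanner := bufio.NewScanner(stdout)
		for scanner.Scan() {
			fmt.Println(scanner.Text()) // print each line as it arrives
		}
	}()
	return cmd.Wait() // Wait closes the pipe once the process exits
}

func main() {
	if err := streamCmd("kubectl", "get", "pods"); err != nil {
		fmt.Println("command failed:", err)
	}
}
```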
tests/e2e/kubernetes/client/client.go (1)
210-233: LGTM! Robust implementation of StatefulSet readiness check. The implementation includes:
- Proper context handling with timeout
- Clear readiness condition check
- Efficient polling with ticker
Consider adding exponential backoff to reduce API server load.
```diff
+import "k8s.io/client-go/util/retry"

 func (cli *client) WaitForStatefulSetReady(
 	ctx context.Context, namespace, name string, timeout time.Duration,
 ) (ok bool, err error) {
 	ctx, cancel := context.WithTimeout(ctx, timeout)
 	defer cancel()
-	tick := time.NewTicker(time.Second)
-	defer tick.Stop()
-
-	for {
+	err = retry.OnError(retry.DefaultBackoff, func(error) bool { return true }, func() error {
 		ss, err := cli.clientset.AppsV1().StatefulSets(namespace).Get(ctx, name, metav1.GetOptions{})
 		if err != nil {
-			return false, err
+			return err
 		}
 		if ss.Status.UpdatedReplicas == ss.Status.Replicas && ss.Status.ReadyReplicas == ss.Status.Replicas {
-			return true, nil
+			return nil
 		}
-		select {
-		case <-ctx.Done():
-			return false, ctx.Err()
-		case <-tick.C:
-		}
-	}
+		return fmt.Errorf("statefulset %s/%s not ready", namespace, name)
+	})
+	return err == nil, err
 }
```
Makefile.d/e2e.mk (1)
92-96: LGTM! Well-structured test target for agent rollout restart. The target follows the established pattern for e2e test targets.
Note: There's a typo in the target name: `e2e/rollaout/restart/agent` should be `e2e/rollout/restart/agent`.
```diff
-.PHONY: e2e/rollaout/restart/agent
+.PHONY: e2e/rollout/restart/agent
 ## run rollout-restart agent e2e
-e2e/rollaout/restart/agent:
+e2e/rollout/restart/agent:
 	$(call run-e2e-crud-test,-run TestE2EAgentRolloutRestart)
```
tests/e2e/crud/crud_test.go (1)
999-1103: Consider improving error handling and test reliability. The test implementation has a few areas that could be enhanced:
- The sleep duration between search attempts is hardcoded to 10 seconds.
- The error aggregation could potentially consume a lot of memory if many errors occur.
- The test might not catch all DEADLINE_EXCEEDED errors due to the sleep interval.
Consider these improvements:
```diff
-		time.Sleep(10 * time.Second)
+		time.Sleep(time.Second) // More frequent checks to catch errors
+		if serr != nil && errors.Is(serr, context.DeadlineExceeded) {
+			// Early exit on first deadline exceeded error
+			return
+		}
```
Also, consider adding a timeout context for the entire test to prevent it from running indefinitely if something goes wrong:
```diff
-	ctx, cancel := context.WithCancel(context.Background())
+	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- .github/actions/setup-e2e/action.yaml (2 hunks)
- .github/actions/setup-k3d/action.yaml (2 hunks)
- .github/helm/values/values-rollout-agent.yaml (1 hunks)
- .github/workflows/e2e.yaml (2 hunks)
- Makefile (1 hunks)
- Makefile.d/e2e.mk (1 hunks)
- Makefile.d/functions.mk (1 hunks)
- tests/e2e/crud/crud_test.go (5 hunks)
- tests/e2e/kubernetes/client/client.go (2 hunks)
- tests/e2e/kubernetes/kubectl/kubectl.go (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Run formatter
.github/helm/values/values-rollout-agent.yaml
[error] 2-2: File needs formatting. Copyright year needs to be updated from 2024 to 2025. Please execute `make format` locally to fix this issue.
🪛 actionlint (1.7.4)
.github/workflows/e2e.yaml
362-362: shellcheck reported issue in this script: SC2086:info:1:42: Double quote to prevent globbing and word splitting (shellcheck). For example, an unquoted $VAR is split on whitespace and glob-expanded, while "$VAR" is passed as a single word.

377-377: shellcheck reported issue in this script: SC2086:info:12:23: Double quote to prevent globbing and word splitting (shellcheck)
🔇 Additional comments (12)
tests/e2e/kubernetes/client/client.go (1)
71-75: LGTM! Clear interface definition for StatefulSet readiness check. The interface method signature is well-defined with appropriate parameters.
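For context, the method shape under discussion reduces to roughly the following sketch, reconstructed from the diffs shown later in this review; the surrounding interface members and package layout are assumptions:

```go
// Package client sketch: only the method under review is shown.
package client

import (
	"context"
	"time"
)

type Client interface {
	// WaitForStatefulSetReady polls the named StatefulSet until all replicas
	// are updated and ready, returning ok=false on timeout or API error.
	WaitForStatefulSetReady(ctx context.Context, namespace, name string, timeout time.Duration) (ok bool, err error)
}
```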
Makefile.d/functions.mk (1)
166-166: LGTM! Good addition of resource readiness wait parameter. The parameter follows the established pattern and integrates well with the existing function.
tests/e2e/crud/crud_test.go (1)
66-67: LGTM! New duration variable for resource readiness. The addition of `waitResourceReadyDuration` aligns with the PR's objective to handle agent rolling updates effectively.
.github/helm/values/values-rollout-agent.yaml (1)
34-60: Review agent configuration for production readiness. The agent configuration has several points to consider:
- The `terminationGracePeriodSeconds` of 600 seconds might be too long for CI/CD environments.
- The storage size of 500Mi might be insufficient for production workloads.
- The auto index duration limit of 2m might be too short for large datasets.
Please verify these settings in your environment:
.github/actions/setup-e2e/action.yaml (1)
35-38: LGTM! Storage configuration for k3d setup. The addition of the `require_k3d_storage` parameter and its usage in the k3d setup step is well-implemented and properly documented.
Also applies to: 92-92
.github/actions/setup-k3d/action.yaml (1)
39-42: LGTM! K3d storage deployment configuration. The implementation of storage deployment is clean and follows the action's existing patterns.
Also applies to: 131-135
.github/workflows/e2e.yaml (5)
351-359: LGTM! Job configuration is well-structured. The job configuration follows GitHub Actions best practices with appropriate timeout and container setup.
364-368: Good use of persistent storage for test stability. The `require_k3d_storage: true` configuration aligns with the PR objective of using persistent volumes and disabling in-memory mode.
376-393: Test configuration looks comprehensive. The E2E test configuration includes:
- Appropriate dataset setup
- Balanced operation counts (1000 for search operations, 100 for others)
- Reasonable wait duration for index creation (3m)
- Proper environment variable passing
🧰 Tools
🪛 actionlint (1.7.4)
377-377: shellcheck reported issue in this script: SC2086:info:12:23: Double quote to prevent globbing and word splitting (shellcheck)
463-463: Good job including the new test in slack notifications. The new test job is correctly added to the slack-notification job dependencies.
369-375: Verify the Helm values file for rollout agent configuration. The deployment uses a specific values file for rollout agent configuration.
Run this script to inspect the Helm values:
✅ Verification successful
✓ Helm values file for rollout agent is properly configured
The values file contains appropriate configurations including:
- Persistent volume settings with 500Mi storage and ReadWriteOncePod access mode
- Resource allocations for all components
- Agent replica configuration with parallel pod management
- NGT indexing parameters
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify the Helm values file for rollout agent configuration

# Check if the values file exists and inspect its content for PV and memory settings
if [ -f ".github/helm/values/values-rollout-agent.yaml" ]; then
  echo "Checking values-rollout-agent.yaml content:"
  cat .github/helm/values/values-rollout-agent.yaml
else
  echo "Warning: values-rollout-agent.yaml not found"
fi
```
Length of output: 2351
Makefile (1)
370-370: LGTM! Good addition of resource wait timeout. The new `E2E_WAIT_FOR_RESOURCE_READY` variable with a default value of 3m is a good practice for ensuring test stability.
Actionable comments posted: 0
🧹 Nitpick comments (6)
tests/e2e/operation/operation.go (1)
204-204: Consider making the timeout configurable. Since different E2E test scenarios might require different timeout values (e.g., regular operations vs rolling updates), consider making this timeout configurable through an environment variable.
Example implementation:
```diff
+const defaultGRPCTimeout = 60 * time.Second
+
+func getGRPCTimeout() time.Duration {
+	if timeout := os.Getenv("E2E_GRPC_TIMEOUT"); timeout != "" {
+		if d, err := time.ParseDuration(timeout); err == nil {
+			return d
+		}
+	}
+	return defaultGRPCTimeout
+}

 func (c *client) getGRPCConn() (*grpc.ClientConn, error) {
 	return grpc.NewClient(
 		c.host+":"+strconv.Itoa(c.port),
 		grpc.WithInsecure(),
 		grpc.WithKeepaliveParams(
 			keepalive.ClientParameters{
 				Time:                time.Second,
-				Timeout:             60 * time.Second,
+				Timeout:             getGRPCTimeout(),
 				PermitWithoutStream: true,
 			},
 		),
 	)
 }
```
tests/e2e/operation/stream.go (2)
875-885: Consider using defer for mutex unlock. While the error handling is correct, consider using `defer mu.Unlock()` right after the lock to prevent potential lock leaks in case of panics.
```diff
-	mu.Lock()
-	rerr = ierr
-	mu.Unlock()
+	mu.Lock()
+	defer mu.Unlock()
+	rerr = ierr
```
924-927: Consider consolidating error handling patterns. The error handling pattern here differs from the goroutine's error handling. Consider using the same pattern with a local error variable for consistency.
```diff
-	mu.Lock()
-	rerr = errors.Join(rerr, err)
-	mu.Unlock()
-	return
+	mu.Lock()
+	defer mu.Unlock()
+	rerr = errors.Join(rerr, err)
+	return
```
tests/e2e/crud/crud_test.go (1)
999-1103: Comprehensive test implementation with proper concurrency handling. The test implementation is well-structured with:
- Proper context handling with cancellation
- Concurrent search operations during rollout
- Thread-safe error aggregation
- Clean resource management with WaitGroup and channels
However, consider adding a timeout context for the entire test to prevent indefinite hanging.
```diff
-	ctx, cancel := context.WithCancel(context.Background())
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
```
Also, consider adding a constant for the sleep duration:
```diff
+const searchRetryInterval = 10 * time.Second
 }
-	time.Sleep(10 * time.Second)
+	time.Sleep(searchRetryInterval)
```
1028-1034: Consider adding retry mechanism with backoff. The search function could benefit from a retry mechanism with exponential backoff to handle transient failures more gracefully.
```diff
+	searchWithRetry := func() error {
+		backoff := time.Second
+		maxBackoff := 30 * time.Second
+		for i := 0; i < 3; i++ {
+			err := searchFunc()
+			if err == nil {
+				return nil
+			}
+			if st, ok := status.FromError(err); ok && st.Code() != codes.DeadlineExceeded {
+				return err
+			}
+			time.Sleep(backoff)
+			backoff *= 2
+			if backoff > maxBackoff {
+				backoff = maxBackoff
+			}
+		}
+		return searchFunc()
+	}
```
1047-1057: Consider adding metrics or logging for error tracking. The error handling in the search loop could benefit from metrics or logging to track the frequency and patterns of deadline exceeded errors.
```diff
 err = searchFunc()
 if err != nil {
 	st, ok := status.FromError(err)
 	if ok && st.Code() == codes.DeadlineExceeded {
+		t.Logf("Search operation timed out: %v", err)
 		_, _, rerr := status.ParseError(err, codes.DeadlineExceeded, "an error occurred")
 		mu.Lock()
 		serr = errors.Join(serr, rerr)
 		mu.Unlock()
 	}
+} else {
+	t.Log("Search operation completed successfully")
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tests/e2e/crud/crud_test.go (5 hunks)
- tests/e2e/operation/operation.go (1 hunks)
- tests/e2e/operation/stream.go (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: runner / go build (×10)
🔇 Additional comments (5)
tests/e2e/operation/operation.go (1)
204-204: LGTM! Increasing the timeout to handle rolling updates. The increase from 5s to 60s timeout is a reasonable change to handle potential delays during agent rolling updates and should help prevent DEADLINE_EXCEEDED errors in the E2E tests.
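For background, the keepalive settings in question look roughly like the sketch below; the address is a placeholder and this illustrates the grpc-go keepalive API rather than the project's exact wiring:

```go
package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// Ping the server every second; if no acknowledgement arrives within the
	// (raised) 60s Timeout, the connection is torn down and re-established.
	conn, err := grpc.NewClient(
		"localhost:8081", // placeholder address
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                time.Second,
			Timeout:             60 * time.Second,
			PermitWithoutStream: true,
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```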
tests/e2e/operation/stream.go (2)
858-869: Improve error handling synchronization. The addition of mutex synchronization and a local error variable improves thread safety when handling errors in the streaming operation.
898-898: LGTM: Thread-safe error aggregation. The error joining operation is correctly using the local error variable before synchronizing with the shared error.
tests/e2e/crud/crud_test.go (2)
66-67: LGTM: Clear variable naming. The new duration variables have clear, descriptive names that indicate their purpose.
137-141: LGTM: Proper error handling in initialization. The initialization of `waitResourceReadyDuration` includes appropriate error handling.
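A minimal sketch of that kind of initialization, assuming a string flag parsed with `time.ParseDuration` (the flag name here is hypothetical, not the test's actual flag):

```go
package main

import (
	"flag"
	"log"
	"time"
)

func main() {
	// Hypothetical flag; the real test wires this up in its own init path.
	raw := flag.String("wait-resource-ready", "3m", "timeout for waiting until resources are ready")
	flag.Parse()

	waitResourceReadyDuration, err := time.ParseDuration(*raw)
	if err != nil {
		// Fail fast on a malformed duration instead of silently defaulting.
		log.Fatalf("invalid -wait-resource-ready value %q: %s", *raw, err)
	}
	log.Printf("will wait up to %s for resources to become ready", waitResourceReadyDuration)
}
```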
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/e2e/crud/crud_test.go (1)
1035-1062: Well-implemented concurrent search with proper error handling. The implementation correctly uses sync primitives (WaitGroup, Mutex) and channels for coordination. The error handling specifically targets DEADLINE_EXCEEDED errors as required.
Consider making the sleep duration (10s) configurable via a flag parameter for better test flexibility:
```diff
+var searchRetryInterval time.Duration
+
 func init() {
+	searchRetry := flag.String("search-retry-interval", "10s", "interval between search retries during rollout")
+	var err error
+	searchRetryInterval, err = time.ParseDuration(*searchRetry)
+	if err != nil {
+		panic(err)
+	}
 }
```
Then update line 1059:
```diff
-	time.Sleep(10 * time.Second)
+	time.Sleep(searchRetryInterval)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tests/e2e/crud/crud_test.go (5 hunks)
- tests/e2e/operation/operation.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/e2e/operation/operation.go
⏰ Context from checks skipped due to timeout of 90000ms (70)
- GitHub Check: CodeQL (×10)
- GitHub Check: runner / go build (×10)
- GitHub Check: check-format-diff (×10)
- GitHub Check: build / build (×40)
🔇 Additional comments (3)
tests/e2e/crud/crud_test.go (3)
66-67: LGTM! Well-structured variable initialization. The addition of `waitResourceReadyDuration` with proper error handling and a reasonable default timeout value of 3 minutes is appropriate for managing resource readiness checks.
Also applies to: 101-141
1071-1099: Comprehensive post-rollout verification. The test properly verifies the system state after rollout by:
- Checking index information
- Verifying object existence
- Testing object retrieval
- Confirming remove operations
1064-1069: Verify rollout restart command execution. The implementation uses kubectl for rollout restart. Let's verify the command execution and logging.
✅ Verification successful
Implementation of rollout restart command is correct and complete
The kubectl rollout restart implementation includes proper command execution, error handling, status monitoring with timeout, and appropriate logging. The code follows best practices for both the command execution and test implementation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify kubectl rollout restart command usage

# Test: Search for kubectl rollout restart command usage
rg -A 5 'kubectl.*rollout.*restart'
```
Length of output: 916
Script:
```bash
#!/bin/bash
# Get the complete implementation of kubectl package functions
ast-grep --pattern 'func RolloutResourceName($$$) {
  $$$
}'
echo "---"
ast-grep --pattern 'func runCmd($$$) {
  $$$
}'
```
Length of output: 127
Script:
```bash
#!/bin/bash
# Search for RolloutResourceName function implementation
rg -A 15 "^func RolloutResourceName" tests/e2e/kubernetes/kubectl/
echo "---"
# Search for runCmd function implementation
rg -A 10 "^func runCmd" tests/e2e/kubernetes/kubectl/
```
Length of output: 2022
Actionable comments posted: 3
🧹 Nitpick comments (7)
tests/e2e/operation/multi.go (1)
33-38: LGTM! Consider using a constant for the timeout value. The search configuration changes look good. The increased number of results and timeout settings align well with testing during agent rolling updates.
Consider extracting the timeout duration to a constant at the package level for better maintainability:
```diff
+const defaultSearchTimeout = 3 * time.Second

 func (c *client) MultiSearch(t *testing.T, ctx context.Context, ds Dataset) error {
-	to := time.Second * 3
+	to := defaultSearchTimeout
```
tests/e2e/kubernetes/kubectl/kubectl.go (1)
45-86: Consider improving goroutine management and error handling. The implementation looks good overall, but there are a few areas for improvement:
- The goroutines for output streaming could leak if `cmd.Start()` fails.
- Error messages from stderr could be more structured for better debugging.
Consider this improved implementation:
```diff
 func RolloutResourceName(ctx context.Context, t *testing.T, resource string, name string, timeout string) error {
 	t.Helper()
 	cmd := exec.CommandContext(ctx, "kubectl", "rollout", "restart", resource, name)
 	if err := runCmd(t, cmd); err != nil {
 		return err
 	}
 	r := strings.Join([]string{resource, name}, "/")
 	to := strings.Join([]string{"--timeout", timeout}, "=")
 	cmd = exec.CommandContext(ctx, "kubectl", "rollout", "status", r, "--watch", to)
 	stdout, err := cmd.StdoutPipe()
 	if err != nil {
 		return err
 	}
 	defer stdout.Close()
 	stderr, err := cmd.StderrPipe()
 	if err != nil {
 		return err
 	}
 	defer stderr.Close()
+	errCh := make(chan error, 2)
+	done := make(chan struct{})
+	defer close(done)
+
 	if err := cmd.Start(); err != nil {
 		return err
 	}
 	go func() {
 		scanner := bufio.NewScanner(stdout)
 		for scanner.Scan() {
 			fmt.Println(scanner.Text())
 		}
+		if err := scanner.Err(); err != nil {
+			select {
+			case errCh <- fmt.Errorf("stdout scanner error: %w", err):
+			case <-done:
+			}
+		}
 	}()
 	go func() {
 		scanner := bufio.NewScanner(stderr)
+		var errMsgs []string
 		for scanner.Scan() {
-			fmt.Println("Error:", scanner.Text())
+			errMsg := scanner.Text()
+			errMsgs = append(errMsgs, errMsg)
+			fmt.Println("Error:", errMsg)
 		}
+		if err := scanner.Err(); err != nil {
+			select {
+			case errCh <- fmt.Errorf("stderr scanner error: %w", err):
+			case <-done:
+			}
+		}
+		if len(errMsgs) > 0 {
+			select {
+			case errCh <- fmt.Errorf("kubectl errors:\n%s", strings.Join(errMsgs, "\n")):
+			case <-done:
+			}
+		}
 	}()
-	return cmd.Wait()
+	err = cmd.Wait()
+	if err != nil {
+		return err
+	}
+
+	select {
+	case err := <-errCh:
+		return err
+	default:
+		return nil
+	}
 }
```
This improved version:
- Properly manages goroutine lifecycles
- Collects and structures error messages
- Handles scanner errors
- Returns a more detailed error message when kubectl fails
tests/e2e/kubernetes/client/client.go (1)
71-75: Align error handling with existing patterns. The implementation looks good, but consider aligning the error handling with the existing `WaitForPodReady` method for consistency.
Consider this more consistent implementation:
```diff
 func (cli *client) WaitForStatefulSetReady(
 	ctx context.Context, namespace, name string, timeout time.Duration,
 ) (ok bool, err error) {
 	ctx, cancel := context.WithTimeout(ctx, timeout)
 	defer cancel()
 	tick := time.NewTicker(time.Second)
 	defer tick.Stop()
 	for {
 		ss, err := cli.clientset.AppsV1().StatefulSets(namespace).Get(ctx, name, metav1.GetOptions{})
-		if err != nil {
-			return false, err
-		}
+		if err == nil {
+			if ss.Status.UpdatedReplicas == ss.Status.Replicas && ss.Status.ReadyReplicas == ss.Status.Replicas {
+				return true, nil
+			}
+		}
-		if ss.Status.UpdatedReplicas == ss.Status.Replicas && ss.Status.ReadyReplicas == ss.Status.Replicas {
-			return true, nil
-		}
 		select {
 		case <-ctx.Done():
 			return false, ctx.Err()
 		case <-tick.C:
+			if err != nil {
+				// Log the error but continue polling
+				continue
+			}
 		}
 	}
 }
```
This version:
- Aligns with the error handling pattern in `WaitForPodReady`
- Continues polling on transient errors
- Maintains better readability with error handling at the end of the loop

Also applies to: 210-233
.github/workflows/e2e.yaml (1)
351-393: LGTM! Consider adding documentation for the new test scenario. The new E2E test job is well-structured and follows the established patterns. The configuration properly handles the requirements for testing agent rollout restart scenarios.
Consider adding a comment block above the job to document:
- The purpose of this specific test scenario
- Why K3D storage is required
- What the values-rollout-agent.yaml configures
- Expected behavior during the test
Example:
```yaml
# Test agent rollout restart behavior:
# - Requires K3D storage for persistent volumes
# - Uses values-rollout-agent.yaml to configure agent StatefulSet
# - Performs rollout restart after index creation
# - Verifies search operations during agent pod updates
e2e-stream-crud-with-rollout-restart-agent:
```
tests/e2e/operation/stream.go (2)
71-79: Improved timeout configuration using time.Duration. The change from hardcoded nanoseconds to a more readable duration improves code maintainability.
Consider making the timeout duration configurable through a parameter instead of hardcoding 3 seconds:
```diff
-to := time.Second * 3
+func (c *client) Search(t *testing.T, ctx context.Context, ds Dataset, timeout time.Duration) error {
+	return c.SearchWithParameters(
+		t,
+		ctx,
+		ds,
+		100,
+		-1.0,
+		0.1,
+		timeout.Nanoseconds(),
+		DefaultStatusValidator,
+		ParseAndLogError,
+	)
+}
```
653-681: Thread-safe error handling with proper error collection. The error handling is now thread-safe with proper mutex protection and error collection.
Consider extracting the common error handling pattern into a reusable helper function to reduce code duplication across stream operations:
```go
type streamErrorHandler struct {
	mu   sync.Mutex
	rerr error
}

func (h *streamErrorHandler) handleError(err error) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.rerr = errors.Join(h.rerr, err)
}

func (h *streamErrorHandler) getError() error {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.rerr
}
```
tests/e2e/crud/crud_test.go (3)
87-88: Configuration improvements for test parameters. Good changes to make search parameters configurable, but the default values could use documentation.
Add comments explaining the reasoning behind the default values:
```diff
+// searchByIDNum is set to 3 to minimize test duration while still providing adequate coverage
 flag.IntVar(&searchByIDNum, "search-by-id-num", 3, "number of id-vector pairs used for search-by-id")
+// searchConcurrency controls the number of concurrent search operations to stress test the system
 flag.IntVar(&searchConcurrency, "search-conn", 100, "number of search concurrency")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- hack/docker/gen/main.go is excluded by `!**/gen/**`
📒 Files selected for processing (60)
- .github/actions/setup-e2e/action.yaml (2 hunks)
- .github/actions/setup-k3d/action.yaml (2 hunks)
- .github/helm/values/values-rollout-agent.yaml (1 hunks)
- .github/workflows/dockers-agent-faiss-image.yaml (1 hunks)
- .github/workflows/dockers-agent-image.yaml (1 hunks)
- .github/workflows/dockers-agent-ngt-image.yaml (1 hunks)
- .github/workflows/dockers-agent-sidecar-image.yaml (1 hunks)
- .github/workflows/dockers-benchmark-job-image.yaml (1 hunks)
- .github/workflows/dockers-benchmark-operator-image.yaml (1 hunks)
- .github/workflows/dockers-dev-container-image.yaml (1 hunks)
- .github/workflows/dockers-discoverer-k8s-image.yaml (1 hunks)
- .github/workflows/dockers-example-client-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-filter-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-lb-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-mirror-image.yaml (1 hunks)
- .github/workflows/dockers-helm-operator-image.yaml (1 hunks)
- .github/workflows/dockers-index-correction-image.yaml (1 hunks)
- .github/workflows/dockers-index-creation-image.yaml (1 hunks)
- .github/workflows/dockers-index-deletion-image.yaml (1 hunks)
- .github/workflows/dockers-index-operator-image.yaml (1 hunks)
- .github/workflows/dockers-index-save-image.yaml (1 hunks)
- .github/workflows/dockers-manager-index-image.yaml (1 hunks)
- .github/workflows/dockers-readreplica-rotate-image.yaml (1 hunks)
- .github/workflows/e2e.yaml (2 hunks)
- Makefile (1 hunks)
- Makefile.d/e2e.mk (1 hunks)
- Makefile.d/functions.mk (1 hunks)
- dockers/agent/core/agent/Dockerfile (1 hunks)
- dockers/agent/core/faiss/Dockerfile (1 hunks)
- dockers/agent/core/ngt/Dockerfile (1 hunks)
- dockers/agent/sidecar/Dockerfile (1 hunks)
- dockers/binfmt/Dockerfile (1 hunks)
- dockers/buildbase/Dockerfile (1 hunks)
- dockers/buildkit/Dockerfile (1 hunks)
- dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
- dockers/ci/base/Dockerfile (1 hunks)
- dockers/dev/Dockerfile (1 hunks)
- dockers/discoverer/k8s/Dockerfile (1 hunks)
- dockers/example/client/Dockerfile (1 hunks)
- dockers/gateway/filter/Dockerfile (1 hunks)
- dockers/gateway/lb/Dockerfile (1 hunks)
- dockers/gateway/mirror/Dockerfile (1 hunks)
- dockers/index/job/correction/Dockerfile (1 hunks)
- dockers/index/job/creation/Dockerfile (1 hunks)
- dockers/index/job/deletion/Dockerfile (1 hunks)
- dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
- dockers/index/job/save/Dockerfile (1 hunks)
- dockers/index/operator/Dockerfile (1 hunks)
- dockers/manager/index/Dockerfile (1 hunks)
- dockers/operator/helm/Dockerfile (1 hunks)
- dockers/tools/benchmark/job/Dockerfile (1 hunks)
- dockers/tools/benchmark/operator/Dockerfile (1 hunks)
- dockers/tools/cli/loadtest/Dockerfile (1 hunks)
- pkg/agent/core/ngt/service/ngt.go (1 hunks)
- tests/e2e/crud/crud_test.go (6 hunks)
- tests/e2e/kubernetes/client/client.go (2 hunks)
- tests/e2e/kubernetes/kubectl/kubectl.go (2 hunks)
- tests/e2e/operation/multi.go (3 hunks)
- tests/e2e/operation/operation.go (1 hunks)
- tests/e2e/operation/stream.go (18 hunks)
🚧 Files skipped from review as they are similar to previous changes (53)
- dockers/index/job/correction/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/dev/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/buildkit/Dockerfile
- dockers/index/operator/Dockerfile
- .github/workflows/dockers-dev-container-image.yaml
- dockers/tools/benchmark/operator/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- dockers/gateway/lb/Dockerfile
- dockers/gateway/mirror/Dockerfile
- dockers/agent/core/agent/Dockerfile
- dockers/manager/index/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- dockers/index/job/deletion/Dockerfile
- dockers/binfmt/Dockerfile
- .github/workflows/dockers-index-creation-image.yaml
- .github/workflows/dockers-agent-ngt-image.yaml
- dockers/index/job/creation/Dockerfile
- .github/workflows/dockers-discoverer-k8s-image.yaml
- .github/workflows/dockers-agent-faiss-image.yaml
- dockers/tools/benchmark/job/Dockerfile
- .github/workflows/dockers-agent-image.yaml
- .github/workflows/dockers-readreplica-rotate-image.yaml
- .github/workflows/dockers-index-deletion-image.yaml
- .github/workflows/dockers-manager-index-image.yaml
- .github/workflows/dockers-gateway-lb-image.yaml
- .github/workflows/dockers-index-operator-image.yaml
- dockers/example/client/Dockerfile
- Makefile
- dockers/tools/cli/loadtest/Dockerfile
- .github/workflows/dockers-benchmark-job-image.yaml
- pkg/agent/core/ngt/service/ngt.go
- .github/workflows/dockers-example-client-image.yaml
- tests/e2e/operation/operation.go
- .github/workflows/dockers-agent-sidecar-image.yaml
- .github/workflows/dockers-gateway-mirror-image.yaml
- .github/workflows/dockers-benchmark-operator-image.yaml
- .github/actions/setup-e2e/action.yaml
- .github/workflows/dockers-index-save-image.yaml
- .github/workflows/dockers-helm-operator-image.yaml
- .github/actions/setup-k3d/action.yaml
- dockers/index/job/readreplica/rotate/Dockerfile
- dockers/ci/base/Dockerfile
- dockers/agent/core/ngt/Dockerfile
- .github/helm/values/values-rollout-agent.yaml
- dockers/index/job/save/Dockerfile
- dockers/agent/sidecar/Dockerfile
- Makefile.d/functions.mk
- Makefile.d/e2e.mk
- dockers/operator/helm/Dockerfile
- .github/workflows/dockers-index-correction-image.yaml
🔇 Additional comments (3)
.github/workflows/dockers-gateway-filter-image.yaml (1)
262-262: LGTM! Good addition of multi-architecture support. Adding support for both AMD64 and ARM64 architectures is a positive change that improves cross-platform compatibility of the gateway-filter component.
Let's verify that the shared workflow template supports this parameter:
✅ Verification successful
Multi-architecture build support is properly configured and supported
The shared workflow template fully supports the platforms parameter and has the necessary infrastructure setup for multi-architecture builds using Docker Buildx.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify that the shared workflow template supports the platforms parameter

# Check if the _docker-image.yaml workflow accepts the platforms parameter
rg -A 5 "platforms:" .github/workflows/_docker-image.yaml

# Verify that the GitHub Actions runners support multi-arch builds
rg -A 5 "runs-on:" .github/workflows/_docker-image.yaml
```
Length of output: 1623
tests/e2e/operation/multi.go (1)
61-61: LGTM! Useful logging addition. The log statement helps track response counts during the E2E test, which is valuable for debugging.
.github/workflows/e2e.yaml (1)
463-463: LGTM! Slack notification dependency added correctly. The new job is properly added to the slack-notification job dependencies.
```go
mu.Lock()
rerr = errors.Join(err, err)
mu.Unlock()
return
```
Fix redundant error joining
The error is being joined with itself, which is redundant and potentially confusing.
Apply this fix to all similar occurrences:
```diff
-rerr = errors.Join(err, err)
+rerr = err
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
mu.Lock()
rerr = err
mu.Unlock()
return
```
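As a side note on why the original line was redundant: `errors.Join` simply wraps every non-nil argument, so joining an error with itself records the same error twice without adding information. A small stand-alone illustration:

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	err := errors.New("deadline exceeded")

	joined := errors.Join(err, err) // wraps the identical error twice
	single := err                   // carries the same information once

	fmt.Println(joined)                                          // message printed twice, once per wrapped entry
	fmt.Println(errors.Is(joined, err), errors.Is(single, err)) // true true
}
```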
tests/e2e/crud/crud_test.go (outdated)
```go
_ = op.Upsert(t, ctx, operation.Dataset{
	Train: ds.Train[insertFrom : insertFrom+insertNum],
})
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

sleep(t, waitAfterInsertDuration)

searchFunc := func(ctx context.Context) error {
	return op.MultiSearch(t, ctx, operation.Dataset{
		Test:      ds.Test[searchFrom : searchFrom+searchNum],
		Neighbors: ds.Neighbors[searchFrom : searchFrom+searchNum],
	})
}

wg := sync.WaitGroup{}
mu := sync.Mutex{}
var serr error
wg.Add(1)
done := make(chan struct{})
go func() {
	defer wg.Done()
	for {
		select {
		case <-done:
			return
		default:
			eg, egctx := errgroup.New(ctx)
			for i := 0; i < searchConcurrency; i++ {
				eg.Go(func() (e error) {
					ierr := searchFunc(egctx)
					if ierr != nil {
						st, ok := status.FromError(ierr)
						if ok && st.Code() == codes.DeadlineExceeded {
							_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred")
							mu.Lock()
							e = errors.Join(e, rerr)
							mu.Unlock()
						}
					}
					return
				})
			}
			egerr := eg.Wait()
			mu.Lock()
			serr = errors.Join(serr, egerr)
			mu.Unlock()
			time.Sleep(5 * time.Second)
		}
	}
}()

// Wait for StatefulSet to be ready
t.Log("rollout restart agent and waiting for agent pods ready...")
err = kubectl.RolloutResourceName(ctx, t, "statefulset", "vald-agent", waitResourceReadyDuration.String())
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

cnt, err := op.IndexInfo(t, ctx)
if err != nil {
	if cnt == nil {
		t.Fatalf("an error occurred: err = %s", err)
	}
	t.Fatalf("an error occurred: count = %d, err = %s", cnt.Stored, err)
}

err = op.Exists(t, ctx, "0")
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

err = op.GetObject(t, ctx, operation.Dataset{
	Train: ds.Train[getObjectFrom : getObjectFrom+getObjectNum],
})
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

err = op.Remove(t, ctx, operation.Dataset{
	Train: ds.Train[removeFrom : removeFrom+removeNum],
})
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

// Remove all vector data after the current - 1 hour.
err = op.RemoveByTimestamp(t, ctx, time.Now().Add(-time.Hour).UnixNano())
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

close(done)
wg.Wait()
if serr != nil {
	t.Fatalf("an error occurred: %s", serr.Error())
}
}
```
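The loop above fans searches out through an errgroup and collects the first failure via `Wait`. A minimal stand-alone illustration of that fan-out pattern, using `golang.org/x/sync/errgroup` as a stand-in for Vald's internal errgroup package (whose `New(ctx)` constructor mirrors `WithContext`):

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup" // stand-in for Vald's internal errgroup
)

func main() {
	eg, egctx := errgroup.WithContext(context.Background())
	for i := 0; i < 4; i++ {
		i := i // pin loop variable for pre-1.22 Go
		eg.Go(func() error {
			select {
			case <-egctx.Done():
				return egctx.Err() // another worker failed; stop early
			default:
				fmt.Println("search worker", i, "finished")
				return nil
			}
		})
	}
	// Wait blocks until all workers return and yields the first non-nil error.
	if err := eg.Wait(); err != nil {
		fmt.Println("group error:", err)
	}
}
```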
🛠️ Refactor suggestion
Reduce code duplication in test implementations
The test shares significant code with TestE2EAgentRolloutRestart.
Extract common test patterns into helper functions:
```go
type searchTestConfig struct {
	searchFunc   func(context.Context) error
	concurrency  int
	waitDuration time.Duration
	cleanupSteps []func() error
}

func runConcurrentSearchTest(t *testing.T, cfg searchTestConfig) error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		serr error
	)
	done := make(chan struct{})
	// Run concurrent searches
	wg.Add(1)
	go func() {
		defer wg.Done()
		// ... existing search loop ...
	}()
	// Run cleanup steps
	for _, cleanup := range cfg.cleanupSteps {
		if err := cleanup(); err != nil {
			return err
		}
	}
	close(done)
	wg.Wait()
	return serr
}
```
Then use it in both tests:
```go
func TestE2EHighConcurrencyMultiSearch(t *testing.T) {
	// ... setup ...
	err := runConcurrentSearchTest(t, searchTestConfig{
		searchFunc:   searchFunc,
		concurrency:  searchConcurrency,
		waitDuration: 5 * time.Second,
		cleanupSteps: []func() error{
			// ... cleanup steps ...
		},
	})
	if err != nil {
		t.Fatal(err)
	}
}
```
tests/e2e/crud/crud_test.go (outdated)
```go
_ = op.Upsert(t, ctx, operation.Dataset{
	Train: ds.Train[insertFrom : insertFrom+insertNum],
})
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

sleep(t, waitAfterInsertDuration)

searchFunc := func(ctx context.Context) error {
	return op.Search(t, ctx, operation.Dataset{
		Test:      ds.Test[searchFrom : searchFrom+searchNum],
		Neighbors: ds.Neighbors[searchFrom : searchFrom+searchNum],
	})
}

wg := sync.WaitGroup{}
mu := sync.Mutex{}
done := make(chan struct{})
var serr error

wg.Add(1)
go func() {
	defer wg.Done()
	for {
		select {
		case <-done:
			return
		default:
			eg, egctx := errgroup.New(ctx)
			for i := 0; i < searchConcurrency; i++ {
				eg.Go(func() (e error) {
					ierr := searchFunc(egctx)
					if ierr != nil {
						st, ok := status.FromError(ierr)
						if ok && st.Code() == codes.DeadlineExceeded {
							_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred")
							mu.Lock()
							e = errors.Join(e, rerr)
							mu.Unlock()
						}
					}
					return
				})
			}
			egerr := eg.Wait()
			mu.Lock()
			serr = errors.Join(serr, egerr)
			mu.Unlock()
			time.Sleep(10 * time.Second)
		}
	}
}()

// Wait for StatefulSet to be ready
t.Log("rollout restart agent and waiting for agent pods ready...")
err = kubectl.RolloutResourceName(ctx, t, "statefulset", "vald-agent", waitResourceReadyDuration.String())
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

cnt, err := op.IndexInfo(t, ctx)
if err != nil {
	if cnt == nil {
		t.Fatalf("an error occurred: err = %s", err)
	}
	t.Fatalf("an error occurred: count = %d, err = %s", cnt.Stored, err)
}

err = op.Exists(t, ctx, "0")
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

err = op.GetObject(t, ctx, operation.Dataset{
	Train: ds.Train[getObjectFrom : getObjectFrom+getObjectNum],
})
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

err = op.Remove(t, ctx, operation.Dataset{
	Train: ds.Train[removeFrom : removeFrom+removeNum],
})
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

// Remove all vector data after the current - 1 hour.
err = op.RemoveByTimestamp(t, ctx, time.Now().Add(-time.Hour).UnixNano())
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

close(done)
wg.Wait()
if serr != nil {
	t.Fatalf("an error occurred: %s", serr)
}
}
```
🛠️ Refactor suggestion
Improve test robustness and cleanup
The test implementation is good but could be improved in several areas.
Consider these improvements:
- Add timeout context to prevent test from hanging:
```diff
-ctx, cancel := context.WithCancel(context.Background())
+ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
```
- Ensure proper cleanup of resources:
```diff
 t.Cleanup(teardown)
+t.Cleanup(func() {
+	cancel()
+	close(done)
+	wg.Wait()
+})
```
- Add test failure conditions for specific scenarios:
```diff
 if serr != nil {
+	if strings.Contains(serr.Error(), "DeadlineExceeded") {
+		t.Fatal("Search operations timed out during agent rollout")
+	}
 	t.Fatalf("an error occurred: %s", serr)
 }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Current:
```go
// TestE2EAgentRolloutRestart tests that search requests succeed with rollout restart vald-agent.
func TestE2EAgentRolloutRestart(t *testing.T) {
	t.Cleanup(teardown)
	if kubeClient == nil {
		var err error
		kubeClient, err = client.New(kubeConfig)
		if err != nil {
			t.Skipf("TestE2EReadReplica needs kubernetes client but failed to create one: %s", err)
		}
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	op, err := operation.New(host, port)
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	_ = op.Upsert(t, ctx, operation.Dataset{
		Train: ds.Train[insertFrom : insertFrom+insertNum],
	})
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	sleep(t, waitAfterInsertDuration)
	searchFunc := func(ctx context.Context) error {
		return op.Search(t, ctx, operation.Dataset{
			Test:      ds.Test[searchFrom : searchFrom+searchNum],
			Neighbors: ds.Neighbors[searchFrom : searchFrom+searchNum],
		})
	}
	wg := sync.WaitGroup{}
	mu := sync.Mutex{}
	done := make(chan struct{})
	var serr error
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-done:
				return
			default:
				eg, egctx := errgroup.New(ctx)
				for i := 0; i < searchConcurrency; i++ {
					eg.Go(func() (e error) {
						ierr := searchFunc(egctx)
						if ierr != nil {
							st, ok := status.FromError(ierr)
							if ok && st.Code() == codes.DeadlineExceeded {
								_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred")
								mu.Lock()
								e = errors.Join(e, rerr)
								mu.Unlock()
							}
						}
						return
					})
				}
				egerr := eg.Wait()
				mu.Lock()
				serr = errors.Join(serr, egerr)
				mu.Unlock()
				time.Sleep(10 * time.Second)
			}
		}
	}()
	// Wait for StatefulSet to be ready
	t.Log("rollout restart agent and waiting for agent pods ready...")
	err = kubectl.RolloutResourceName(ctx, t, "statefulset", "vald-agent", waitResourceReadyDuration.String())
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	cnt, err := op.IndexInfo(t, ctx)
	if err != nil {
		if cnt == nil {
			t.Fatalf("an error occurred: err = %s", err)
		}
		t.Fatalf("an error occurred: count = %d, err = %s", cnt.Stored, err)
	}
	err = op.Exists(t, ctx, "0")
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	err = op.GetObject(t, ctx, operation.Dataset{
		Train: ds.Train[getObjectFrom : getObjectFrom+getObjectNum],
	})
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	err = op.Remove(t, ctx, operation.Dataset{
		Train: ds.Train[removeFrom : removeFrom+removeNum],
	})
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	// Remove all vector data after the current - 1 hour.
	err = op.RemoveByTimestamp(t, ctx, time.Now().Add(-time.Hour).UnixNano())
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	close(done)
	wg.Wait()
	if serr != nil {
		t.Fatalf("an error occurred: %s", serr)
	}
}
```
Suggested:
```go
// TestE2EAgentRolloutRestart tests that search requests succeed with rollout restart vald-agent.
func TestE2EAgentRolloutRestart(t *testing.T) {
	t.Cleanup(teardown)
	if kubeClient == nil {
		var err error
		kubeClient, err = client.New(kubeConfig)
		if err != nil {
			t.Skipf("TestE2EReadReplica needs kubernetes client but failed to create one: %s", err)
		}
	}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()
	wg := sync.WaitGroup{}
	mu := sync.Mutex{}
	done := make(chan struct{})
	// done is closed by both the t.Cleanup below and the normal test path,
	// so the close is guarded with sync.Once to avoid a double-close panic.
	var closeOnce sync.Once
	closeDone := func() { closeOnce.Do(func() { close(done) }) }
	var serr error
	t.Cleanup(func() {
		cancel()
		closeDone()
		wg.Wait()
	})
	op, err := operation.New(host, port)
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	_ = op.Upsert(t, ctx, operation.Dataset{
		Train: ds.Train[insertFrom : insertFrom+insertNum],
	})
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	sleep(t, waitAfterInsertDuration)
	searchFunc := func(ctx context.Context) error {
		return op.Search(t, ctx, operation.Dataset{
			Test:      ds.Test[searchFrom : searchFrom+searchNum],
			Neighbors: ds.Neighbors[searchFrom : searchFrom+searchNum],
		})
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-done:
				return
			default:
				eg, egctx := errgroup.New(ctx)
				for i := 0; i < searchConcurrency; i++ {
					eg.Go(func() (e error) {
						ierr := searchFunc(egctx)
						if ierr != nil {
							st, ok := status.FromError(ierr)
							if ok && st.Code() == codes.DeadlineExceeded {
								_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred")
								mu.Lock()
								e = errors.Join(e, rerr)
								mu.Unlock()
							}
						}
						return
					})
				}
				egerr := eg.Wait()
				mu.Lock()
				serr = errors.Join(serr, egerr)
				mu.Unlock()
				time.Sleep(10 * time.Second)
			}
		}
	}()
	// Wait for StatefulSet to be ready
	t.Log("rollout restart agent and waiting for agent pods ready...")
	err = kubectl.RolloutResourceName(ctx, t, "statefulset", "vald-agent", waitResourceReadyDuration.String())
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	cnt, err := op.IndexInfo(t, ctx)
	if err != nil {
		if cnt == nil {
			t.Fatalf("an error occurred: err = %s", err)
		}
		t.Fatalf("an error occurred: count = %d, err = %s", cnt.Stored, err)
	}
	err = op.Exists(t, ctx, "0")
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	err = op.GetObject(t, ctx, operation.Dataset{
		Train: ds.Train[getObjectFrom : getObjectFrom+getObjectNum],
	})
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	err = op.Remove(t, ctx, operation.Dataset{
		Train: ds.Train[removeFrom : removeFrom+removeNum],
	})
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	// Remove all vector data after the current - 1 hour.
	err = op.RemoveByTimestamp(t, ctx, time.Now().Add(-time.Hour).UnixNano())
	if err != nil {
		t.Fatalf("an error occurred: %s", err)
	}
	closeDone()
	wg.Wait()
	if serr != nil {
		if strings.Contains(serr.Error(), "DeadlineExceeded") {
			t.Fatal("Search operations timed out during agent rollout")
		}
		t.Fatalf("an error occurred: %s", serr)
	}
}
```
Actionable comments posted: 2
🧹 Nitpick comments (4)
tests/e2e/kubernetes/client/client.go (1)
210-233: Consider making the polling interval configurable. The implementation looks good but could benefit from a configurable polling interval instead of the hardcoded 1-second interval.
Apply this diff to make the polling interval configurable:
```diff
 func (cli *client) WaitForStatefulSetReady(
-	ctx context.Context, namespace, name string, timeout time.Duration,
+	ctx context.Context, namespace, name string, timeout time.Duration, pollInterval time.Duration,
 ) (ok bool, err error) {
 	ctx, cancel := context.WithTimeout(ctx, timeout)
 	defer cancel()
-	tick := time.NewTicker(time.Second)
+	if pollInterval <= 0 {
+		pollInterval = time.Second
+	}
+	tick := time.NewTicker(pollInterval)
 	defer tick.Stop()
```
.github/workflows/e2e.yaml (1)
351-393: Consider aligning test parameters with other e2e jobs. The implementation looks good, but for consistency:
- Consider increasing the insert and search counts to 10000 to match other e2e jobs
- Consider adding comments to explain the requirement for k3d storage
Apply this diff to align the test parameters:
```diff
 make E2E_BIND_PORT=8081 \
 	E2E_DATASET_NAME=${{ env.DATASET }} \
-	E2E_INSERT_COUNT=1000 \
-	E2E_SEARCH_COUNT=1000 \
-	E2E_SEARCH_BY_ID_COUNT=1000 \
+	E2E_INSERT_COUNT=10000 \
+	E2E_SEARCH_COUNT=10000 \
+	E2E_SEARCH_BY_ID_COUNT=10000 \
```
Also, add a comment above the k3d storage requirement:
```diff
   uses: ./.github/actions/setup-e2e
   with:
+    # Required for persistent volumes during agent rollout
     require_k3d_storage: true
```
tests/e2e/operation/stream.go (1)
71-71: Consider making the timeout duration configurable. The hardcoded 3-second timeout might be too restrictive for large-scale search operations. Consider making it configurable through parameters or environment variables to support different deployment scenarios.
```diff
 type client struct {
+	searchTimeout time.Duration
 }

 func (c *client) Search(t *testing.T, ctx context.Context, ds Dataset) error {
-	to := time.Second * 3
+	to := c.searchTimeout
 	return c.SearchWithParameters(
```
Also applies to: 79-79, 273-273, 280-280
tests/e2e/crud/crud_test.go (1)
1002-1121: Comprehensive test for agent rollout restart. The test effectively validates system behavior during agent rollout restart. However, consider improving error type checking.
Consider extracting the error type checking into a helper function for better reusability:
```diff
+func isDeadlineExceeded(err error) bool {
+	if st, ok := status.FromError(err); ok {
+		return st.Code() == codes.DeadlineExceeded
+	}
+	return false
+}

 func TestE2EAgentRolloutRestart(t *testing.T) {
 	// ...
 	eg.Go(func() (e error) {
 		ierr := searchFunc(egctx)
 		if ierr != nil {
-			st, ok := status.FromError(ierr)
-			if ok && st.Code() == codes.DeadlineExceeded {
+			if isDeadlineExceeded(ierr) {
 				_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred")
 				mu.Lock()
 				e = errors.Join(e, rerr)
 				mu.Unlock()
 			}
 		}
 		return
 	})
 	// ...
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- hack/docker/gen/main.go is excluded by `!**/gen/**`
📒 Files selected for processing (60)
- .github/actions/setup-e2e/action.yaml (2 hunks)
- .github/actions/setup-k3d/action.yaml (2 hunks)
- .github/helm/values/values-rollout-agent.yaml (1 hunks)
- .github/workflows/dockers-agent-faiss-image.yaml (1 hunks)
- .github/workflows/dockers-agent-image.yaml (1 hunks)
- .github/workflows/dockers-agent-ngt-image.yaml (1 hunks)
- .github/workflows/dockers-agent-sidecar-image.yaml (1 hunks)
- .github/workflows/dockers-benchmark-job-image.yaml (1 hunks)
- .github/workflows/dockers-benchmark-operator-image.yaml (1 hunks)
- .github/workflows/dockers-dev-container-image.yaml (1 hunks)
- .github/workflows/dockers-discoverer-k8s-image.yaml (1 hunks)
- .github/workflows/dockers-example-client-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-filter-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-lb-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-mirror-image.yaml (1 hunks)
- .github/workflows/dockers-helm-operator-image.yaml (1 hunks)
- .github/workflows/dockers-index-correction-image.yaml (1 hunks)
- .github/workflows/dockers-index-creation-image.yaml (1 hunks)
- .github/workflows/dockers-index-deletion-image.yaml (1 hunks)
- .github/workflows/dockers-index-operator-image.yaml (1 hunks)
- .github/workflows/dockers-index-save-image.yaml (1 hunks)
- .github/workflows/dockers-manager-index-image.yaml (1 hunks)
- .github/workflows/dockers-readreplica-rotate-image.yaml (1 hunks)
- .github/workflows/e2e.yaml (2 hunks)
- Makefile (2 hunks)
- Makefile.d/e2e.mk (1 hunks)
- Makefile.d/functions.mk (2 hunks)
- dockers/agent/core/agent/Dockerfile (1 hunks)
- dockers/agent/core/faiss/Dockerfile (1 hunks)
- dockers/agent/core/ngt/Dockerfile (1 hunks)
- dockers/agent/sidecar/Dockerfile (1 hunks)
- dockers/binfmt/Dockerfile (1 hunks)
- dockers/buildbase/Dockerfile (1 hunks)
- dockers/buildkit/Dockerfile (1 hunks)
- dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
- dockers/ci/base/Dockerfile (1 hunks)
- dockers/dev/Dockerfile (1 hunks)
- dockers/discoverer/k8s/Dockerfile (1 hunks)
- dockers/example/client/Dockerfile (1 hunks)
- dockers/gateway/filter/Dockerfile (1 hunks)
- dockers/gateway/lb/Dockerfile (1 hunks)
- dockers/gateway/mirror/Dockerfile (1 hunks)
- dockers/index/job/correction/Dockerfile (1 hunks)
- dockers/index/job/creation/Dockerfile (1 hunks)
- dockers/index/job/deletion/Dockerfile (1 hunks)
- dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
- dockers/index/job/save/Dockerfile (1 hunks)
- dockers/index/operator/Dockerfile (1 hunks)
- dockers/manager/index/Dockerfile (1 hunks)
- dockers/operator/helm/Dockerfile (1 hunks)
- dockers/tools/benchmark/job/Dockerfile (1 hunks)
- dockers/tools/benchmark/operator/Dockerfile (1 hunks)
- dockers/tools/cli/loadtest/Dockerfile (1 hunks)
- pkg/agent/core/ngt/service/ngt.go (1 hunks)
- tests/e2e/crud/crud_test.go (6 hunks)
- tests/e2e/kubernetes/client/client.go (2 hunks)
- tests/e2e/kubernetes/kubectl/kubectl.go (2 hunks)
- tests/e2e/operation/multi.go (3 hunks)
- tests/e2e/operation/operation.go (1 hunks)
- tests/e2e/operation/stream.go (18 hunks)
🚧 Files skipped from review as they are similar to previous changes (52)
- dockers/buildkit/Dockerfile
- dockers/index/job/correction/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/binfmt/Dockerfile
- dockers/agent/core/agent/Dockerfile
- .github/workflows/dockers-index-correction-image.yaml
- .github/workflows/dockers-helm-operator-image.yaml
- .github/workflows/dockers-dev-container-image.yaml
- .github/workflows/dockers-index-operator-image.yaml
- .github/workflows/dockers-index-creation-image.yaml
- dockers/operator/helm/Dockerfile
- .github/workflows/dockers-manager-index-image.yaml
- .github/workflows/dockers-agent-image.yaml
- dockers/gateway/lb/Dockerfile
- .github/workflows/dockers-index-save-image.yaml
- dockers/tools/benchmark/operator/Dockerfile
- dockers/dev/Dockerfile
- dockers/agent/core/ngt/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- .github/workflows/dockers-benchmark-operator-image.yaml
- dockers/gateway/mirror/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- .github/workflows/dockers-gateway-lb-image.yaml
- dockers/index/job/save/Dockerfile
- Makefile.d/functions.mk
- dockers/manager/index/Dockerfile
- dockers/ci/base/Dockerfile
- pkg/agent/core/ngt/service/ngt.go
- .github/actions/setup-e2e/action.yaml
- .github/workflows/dockers-benchmark-job-image.yaml
- .github/workflows/dockers-agent-faiss-image.yaml
- .github/workflows/dockers-index-deletion-image.yaml
- .github/workflows/dockers-example-client-image.yaml
- .github/workflows/dockers-agent-ngt-image.yaml
- .github/workflows/dockers-readreplica-rotate-image.yaml
- .github/workflows/dockers-gateway-mirror-image.yaml
- .github/workflows/dockers-agent-sidecar-image.yaml
- tests/e2e/operation/operation.go
- Makefile.d/e2e.mk
- dockers/index/job/deletion/Dockerfile
- dockers/tools/cli/loadtest/Dockerfile
- .github/workflows/dockers-discoverer-k8s-image.yaml
- dockers/index/operator/Dockerfile
- dockers/index/job/readreplica/rotate/Dockerfile
- dockers/example/client/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- .github/helm/values/values-rollout-agent.yaml
- dockers/agent/sidecar/Dockerfile
- dockers/index/job/creation/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/gateway/filter/Dockerfile
- Makefile
⏰ Context from checks skipped due to timeout of 90000ms (150)
- GitHub Check: build / build (repeated 150 times)
🔇 Additional comments (8)
.github/workflows/dockers-gateway-filter-image.yaml (1)
262-262: LGTM! Good addition of multi-platform support.
The addition of AMD64 and ARM64 platform support enhances deployment flexibility and follows container best practices.
To ensure the multi-platform build works as expected, please verify:
- The build completes successfully for both architectures
- The resulting images are properly tagged and pushed
- The images can be pulled and run on their respective architectures
Run this script to verify the image build and platform support:
✅ Verification successful
Multi-platform support properly configured across all Docker workflows ✅
The configuration is consistently implemented across all 24 Docker workflow files, including base images and build infrastructure, indicating a well-coordinated implementation of multi-architecture support.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify multi-platform image build configuration

# Check if the workflow file references a valid reusable workflow
echo "Checking reusable workflow reference..."
test -f .github/workflows/_docker-image.yaml || echo "Error: Referenced workflow file not found"

# Verify platform configuration in other similar workflow files
echo "Checking platform configuration consistency across workflow files..."
rg -l 'platforms:\s*linux/amd64,linux/arm64' .github/workflows/dockers-*-image.yaml

Length of output: 2909
.github/actions/setup-k3d/action.yaml (2)
39-42: LGTM! Well-defined input parameter.
The new storage input parameter is well-documented and follows the existing parameter style. The default value of "false" ensures backward compatibility.
131-135: Verify the make target and add storage setup validation.
The step looks good but requires verification of the following:
- Ensure the k3d/storage make target exists and properly sets up local-path-storage
- Consider adding verification of successful storage setup
Let's verify the make target and its implementation:
Consider adding storage validation after the setup:
  - name: Set local path storage
    if: ${{ inputs.storage == 'true' }}
    shell: bash
    run: |
      make k3d/storage
+     # Verify storage setup
+     kubectl wait --for=condition=ready pod -l app=local-path-provisioner -n local-path-storage --timeout=60s

✅ Verification successful
The k3d/storage target exists and is properly implemented, but could benefit from additional validation
The target is correctly defined in Makefile.d/k3d.mk and:
- Deploys the official Rancher local-path-provisioner
- Verifies StorageClass creation
- Sets local-path as the default StorageClass
Consider enhancing the validation as suggested:
.PHONY: k3d/storage
k3d/storage:
	kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
	kubectl get storageclass
	kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
+	kubectl wait --for=condition=ready pod -l app=local-path-provisioner -n local-path-storage --timeout=60s
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the k3d/storage make target and its implementation

# Check if the make target exists
echo "Checking for k3d/storage make target..."
make -n k3d/storage 2>/dev/null || echo "k3d/storage target not found"

# Look for local-path-storage related files
echo -e "\nSearching for local-path-storage configuration..."
rg -l "local-path-storage" --type yaml

Length of output: 331
Script:
#!/bin/bash
# Check Makefile content and search for storage-related configurations
echo "=== Checking Makefile content ==="
fd Makefile --type f --exec cat {}

echo -e "\n=== Searching for storage-related patterns ==="
rg -l "storage" --type yaml
rg -l "provisioner" --type yaml

echo -e "\n=== Checking k3d-related targets in Makefile ==="
rg "k3d[:/]" Makefile

Length of output: 33547
Script:
#!/bin/bash
# Check k3d-specific Makefile and configurations
echo "=== Checking k3d Makefile includes ==="
cat Makefile.d/k3d.mk

echo -e "\n=== Searching for k3d storage configurations ==="
rg -l "k3d.*storage" --type yaml
rg -l "local-path.*storage" --type yaml

Length of output: 2568
tests/e2e/operation/multi.go (2)
33-38: Enhance search configuration with timeout and increased result count.
The changes improve the robustness of the MultiSearch operation by the following (a short sketch follows the list):
- Adding a 3-second timeout to catch potential DEADLINE_EXCEEDED errors during agent rollout
- Increasing the number of search results from 3 to 10 for better coverage
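A minimal, self-contained sketch of the timeout mechanism, using only the standard library; the slowSearch function is a stand-in for the real MultiSearch RPC, whose client and payload types live in the Vald codebase:

package main

import (
	"context"
	"fmt"
	"time"
)

// slowSearch stands in for a search call against an agent that is
// restarting; it only returns once the context deadline fires.
func slowSearch(ctx context.Context) error {
	select {
	case <-time.After(10 * time.Second): // simulated slow response
		return nil
	case <-ctx.Done():
		return ctx.Err() // a real gRPC call would surface DEADLINE_EXCEEDED
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	fmt.Println(slowSearch(ctx)) // prints "context deadline exceeded"
}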
61-61: Add logging for better observability.
The added logging statement helps track the number of responses received during the test execution.
.github/workflows/e2e.yaml (1)
463-463: LGTM!
The job is correctly added to the slack-notification dependencies.
tests/e2e/operation/stream.go (1)
653-681: Well-implemented thread-safe error handling.
The error handling improvements demonstrate good practices (a stdlib sketch of the pattern follows below):
- Proper mutex synchronization for thread safety
- Consistent error aggregation pattern across streaming operations
- Clear separation of error collection and propagation
Also applies to: 767-795, 883-910, 995-1023, 1156-1183
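A minimal sketch of the mutex-guarded error aggregation pattern described above, using only the standard library; the worker logic is illustrative:

package main

import (
	"errors"
	"fmt"
	"sync"
)

func main() {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		serr error // aggregated error, written only while holding mu
	)
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if err := work(id); err != nil {
				mu.Lock()
				serr = errors.Join(serr, err) // thread-safe aggregation
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	if serr != nil {
		fmt.Println("aggregated:", serr)
	}
}

// work fails for even ids so the aggregation path is exercised.
func work(id int) error {
	if id%2 == 0 {
		return fmt.Errorf("worker %d failed", id)
	}
	return nil
}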
tests/e2e/crud/crud_test.go (1)
59-69: Well-structured configuration management.
The new configuration options enhance test flexibility (an illustrative flag sketch follows below):
- Search concurrency control
- Resource wait duration configuration
- Clear default values
Also applies to: 87-88, 104-104, 141-144
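A sketch of how such options are typically wired with the standard flag package; the flag names and default values here are illustrative assumptions, not the exact ones defined in crud_test.go:

package main

import (
	"flag"
	"fmt"
	"time"
)

var (
	// illustrative names; the real test defines its own flags and defaults
	searchConcurrency = flag.Int("search-concurrency", 10, "number of concurrent search goroutines")
	waitResourceReady = flag.Duration("wait-resource-ready", 3*time.Minute, "timeout for waiting on rollout status")
)

func main() {
	flag.Parse()
	fmt.Printf("concurrency=%d wait=%s\n", *searchConcurrency, *waitResourceReady)
}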
func RolloutResourceName(
	ctx context.Context, t *testing.T, resource string, name string, timeout string,
) error {
	t.Helper()
	cmd := exec.CommandContext(ctx, "kubectl", "rollout", "restart", resource, name)
	if err := runCmd(t, cmd); err != nil {
		return err
	}

	r := strings.Join([]string{resource, name}, "/")
	to := strings.Join([]string{"--timeout", timeout}, "=")
	cmd = exec.CommandContext(ctx, "kubectl", "rollout", "status", r, "--watch", to)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	defer stdout.Close()

	stderr, err := cmd.StderrPipe()
	if err != nil {
		return err
	}
	defer stderr.Close()

	if err := cmd.Start(); err != nil {
		return err
	}
	go func() {
		scanner := bufio.NewScanner(stdout)
		for scanner.Scan() {
			fmt.Println(scanner.Text())
		}
	}()
	go func() {
		scanner := bufio.NewScanner(stderr)
		for scanner.Scan() {
			fmt.Println("Error:", scanner.Text())
		}
	}()

	return cmd.Wait()
}
🛠️ Refactor suggestion
Add error handling for scanner errors.
The implementation looks good but should include error handling for scanner errors in the goroutines.
Apply this diff to add error handling:
 go func() {
 	scanner := bufio.NewScanner(stdout)
 	for scanner.Scan() {
 		fmt.Println(scanner.Text())
 	}
+	if err := scanner.Err(); err != nil {
+		fmt.Printf("Error reading stdout: %v\n", err)
+	}
 }()
 go func() {
 	scanner := bufio.NewScanner(stderr)
 	for scanner.Scan() {
 		fmt.Println("Error:", scanner.Text())
 	}
+	if err := scanner.Err(); err != nil {
+		fmt.Printf("Error reading stderr: %v\n", err)
+	}
 }()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
func RolloutResourceName(
	ctx context.Context, t *testing.T, resource string, name string, timeout string,
) error {
	t.Helper()
	cmd := exec.CommandContext(ctx, "kubectl", "rollout", "restart", resource, name)
	if err := runCmd(t, cmd); err != nil {
		return err
	}
	r := strings.Join([]string{resource, name}, "/")
	to := strings.Join([]string{"--timeout", timeout}, "=")
	cmd = exec.CommandContext(ctx, "kubectl", "rollout", "status", r, "--watch", to)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	defer stdout.Close()
	stderr, err := cmd.StderrPipe()
	if err != nil {
		return err
	}
	defer stderr.Close()
	if err := cmd.Start(); err != nil {
		return err
	}
	go func() {
		scanner := bufio.NewScanner(stdout)
		for scanner.Scan() {
			fmt.Println(scanner.Text())
		}
		if err := scanner.Err(); err != nil {
			fmt.Printf("Error reading stdout: %v\n", err)
		}
	}()
	go func() {
		scanner := bufio.NewScanner(stderr)
		for scanner.Scan() {
			fmt.Println("Error:", scanner.Text())
		}
		if err := scanner.Err(); err != nil {
			fmt.Printf("Error reading stderr: %v\n", err)
		}
	}()
	return cmd.Wait()
}
tests/e2e/crud/crud_test.go (Outdated)
_ = op.Upsert(t, ctx, operation.Dataset{
	Train: ds.Train[insertFrom : insertFrom+insertNum],
})
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

sleep(t, waitAfterInsertDuration)

searchFunc := func(ctx context.Context) error {
	return op.MultiSearch(t, ctx, operation.Dataset{
		Test:      ds.Test[searchFrom : searchFrom+searchNum],
		Neighbors: ds.Neighbors[searchFrom : searchFrom+searchNum],
	})
}

wg := sync.WaitGroup{}
mu := sync.Mutex{}
var serr error
wg.Add(1)
done := make(chan struct{})
go func() {
	defer wg.Done()
	for {
		select {
		case <-done:
			return
		default:
			eg, egctx := errgroup.New(ctx)
			for i := 0; i < searchConcurrency; i++ {
				eg.Go(func() (e error) {
					ierr := searchFunc(egctx)
					if ierr != nil {
						st, ok := status.FromError(ierr)
						if ok && st.Code() == codes.DeadlineExceeded {
							_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred")
							mu.Lock()
							e = errors.Join(e, rerr)
							mu.Unlock()
						}
					}
					return
				})
			}
			egerr := eg.Wait()
			mu.Lock()
			serr = errors.Join(serr, egerr)
			mu.Unlock()
			time.Sleep(5 * time.Second)
		}
	}
}()

// Wait for StatefulSet to be ready
t.Log("rollout restart agent and waiting for agent pods ready...")
err = kubectl.RolloutResourceName(ctx, t, "statefulset", "vald-agent", waitResourceReadyDuration.String())
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

cnt, err := op.IndexInfo(t, ctx)
if err != nil {
	if cnt == nil {
		t.Fatalf("an error occurred: err = %s", err)
	}
	t.Fatalf("an error occurred: count = %d, err = %s", cnt.Stored, err)
}

err = op.Exists(t, ctx, "0")
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

err = op.GetObject(t, ctx, operation.Dataset{
	Train: ds.Train[getObjectFrom : getObjectFrom+getObjectNum],
})
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

err = op.Remove(t, ctx, operation.Dataset{
	Train: ds.Train[removeFrom : removeFrom+removeNum],
})
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

// Remove all vector data after the current - 1 hour.
err = op.RemoveByTimestamp(t, ctx, time.Now().Add(-time.Hour).UnixNano())
if err != nil {
	t.Fatalf("an error occurred: %s", err)
}

close(done)
wg.Wait()
if serr != nil {
	t.Fatalf("an error occurred: %s", serr.Error())
}
}
🛠️ Refactor suggestion
Consider reducing code duplication in test implementations.
The high concurrency test is well-structured but shares significant code with TestE2EAgentRolloutRestart. Consider extracting common functionality.
Extract shared test logic into reusable functions:
+type searchTestConfig struct {
+ t *testing.T
+ ctx context.Context
+ op *operation.Operation
+ searchFunc func(context.Context) error
+ concurrency int
+ sleepDuration time.Duration
+}
+
+func runConcurrentSearchTest(cfg searchTestConfig) error {
+ var (
+ wg sync.WaitGroup
+ mu sync.Mutex
+ serr error
+ )
+ done := make(chan struct{})
+
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ for {
+ select {
+ case <-done:
+ return
+ default:
+ eg, egctx := errgroup.New(cfg.ctx)
+ for i := 0; i < cfg.concurrency; i++ {
+ eg.Go(func() (e error) {
+ if ierr := cfg.searchFunc(egctx); ierr != nil {
+ if isDeadlineExceeded(ierr) {
+ _, _, rerr := status.ParseError(
+ ierr,
+ codes.DeadlineExceeded,
+ "an error occurred",
+ )
+ mu.Lock()
+ e = errors.Join(e, rerr)
+ mu.Unlock()
+ }
+ }
+ return
+ })
+ }
+ egerr := eg.Wait()
+ mu.Lock()
+ serr = errors.Join(serr, egerr)
+ mu.Unlock()
+ time.Sleep(cfg.sleepDuration)
+ }
+ }
+ }()
+
+ // Run test-specific operations here
+
+ close(done)
+ wg.Wait()
+ return serr
+}
func TestE2EHighConcurrencyMultiSearch(t *testing.T) {
// ... setup code ...
- wg := sync.WaitGroup{}
- mu := sync.Mutex{}
- var serr error
- wg.Add(1)
- done := make(chan struct{})
- go func() {
- // ... duplicate concurrent search logic ...
- }()
+ err := runConcurrentSearchTest(searchTestConfig{
+ t: t,
+ ctx: ctx,
+ op: op,
+ searchFunc: searchFunc,
+ concurrency: searchConcurrency,
+ sleepDuration: 5 * time.Second,
+ })
// ... remaining test code ...
}
Committable suggestion skipped: line range outside the PR's diff.
Force-pushed from 40a3ccc to 8d9e061 (Compare)
Actionable comments posted: 0
🧹 Nitpick comments (5)
Makefile.d/k3d.mk (1)
43-43: LGTM! Local registry setup will improve E2E test reliability.
The addition of a local registry at 0.0.0.0:5000 is a good practice for E2E testing as it:
- Reduces external dependencies on remote registries
- Improves test execution speed
- Provides better control over image versions
tests/e2e/kubernetes/client/client.go (1)
210-233: Consider adding logging and configurable polling interval.
The implementation is correct but could be enhanced:
- Add logging to track the StatefulSet status during polling
- Make the polling interval configurable instead of hardcoding to 1 second
Apply this diff to add logging and configurable polling:
 func (cli *client) WaitForStatefulSetReady(
-	ctx context.Context, namespace, name string, timeout time.Duration,
+	ctx context.Context, namespace, name string, timeout time.Duration, pollInterval time.Duration,
 ) (ok bool, err error) {
 	ctx, cancel := context.WithTimeout(ctx, timeout)
 	defer cancel()
-	tick := time.NewTicker(time.Second)
+	if pollInterval == 0 {
+		pollInterval = time.Second
+	}
+	tick := time.NewTicker(pollInterval)
 	defer tick.Stop()
 	for {
 		ss, err := cli.clientset.AppsV1().StatefulSets(namespace).Get(ctx, name, metav1.GetOptions{})
 		if err != nil {
+			fmt.Printf("Error getting StatefulSet %s/%s: %v\n", namespace, name, err)
 			return false, err
 		}
+		fmt.Printf("StatefulSet %s/%s status: %d/%d replicas ready, %d/%d replicas updated\n",
+			namespace, name, ss.Status.ReadyReplicas, ss.Status.Replicas,
+			ss.Status.UpdatedReplicas, ss.Status.Replicas)
 		if ss.Status.UpdatedReplicas == ss.Status.Replicas && ss.Status.ReadyReplicas == ss.Status.Replicas {
 			return true, nil
 		}

Makefile.d/e2e.mk (1)
92-95: Fix typo in target name.
The target name contains a typo: "rollaout" should be "rollout".
Apply this diff to fix the typo:
-.PHONY: e2e/rollaout/restart/agent
+.PHONY: e2e/rollout/restart/agent
tests/e2e/operation/stream.go (1)
170-177: Clean up commented code.
Remove the commented out code as it's no longer needed and can cause confusion.
Apply this cleanup:
-// idx, err := strconv.Atoi(left)
-_, err = strconv.Atoi(left)
+_, err := strconv.Atoi(left)
-// t.Logf("algo: %s, id: %d, results: %d, recall: %f", right, idx, len(topKIDs), c.recall(topKIDs, ds.Neighbors[idx][:len(topKIDs)]))
tests/e2e/crud/crud_test.go (1)
1123-1241: Reduce code duplication in test implementations.
The test shares significant code with TestE2EAgentRolloutRestart.
Extract common test patterns into helper functions:

type searchTestConfig struct {
	searchFunc   func(context.Context) error
	concurrency  int
	waitDuration time.Duration
	cleanupSteps []func() error
}

func runConcurrentSearchTest(t *testing.T, cfg searchTestConfig) error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()

	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		serr error
	)
	done := make(chan struct{})

	// Run concurrent searches
	wg.Add(1)
	go func() {
		defer wg.Done()
		// ... existing search loop ...
	}()

	// Run cleanup steps
	for _, cleanup := range cfg.cleanupSteps {
		if err := cleanup(); err != nil {
			return err
		}
	}

	close(done)
	wg.Wait()
	return serr
}

Then use it in both tests:

func TestE2EHighConcurrencyMultiSearch(t *testing.T) {
	// ... setup ...
	err := runConcurrentSearchTest(t, searchTestConfig{
		searchFunc:   searchFunc,
		concurrency:  searchConcurrency,
		waitDuration: 5 * time.Second,
		cleanupSteps: []func() error{
			// ... cleanup steps ...
		},
	})
	if err != nil {
		t.Fatal(err)
	}
}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
hack/docker/gen/main.go is excluded by !**/gen/**
📒 Files selected for processing (61)
- .github/actions/setup-e2e/action.yaml (2 hunks)
- .github/actions/setup-k3d/action.yaml (2 hunks)
- .github/helm/values/values-rollout-agent.yaml (1 hunks)
- .github/workflows/dockers-agent-faiss-image.yaml (1 hunks)
- .github/workflows/dockers-agent-image.yaml (1 hunks)
- .github/workflows/dockers-agent-ngt-image.yaml (1 hunks)
- .github/workflows/dockers-agent-sidecar-image.yaml (1 hunks)
- .github/workflows/dockers-benchmark-job-image.yaml (1 hunks)
- .github/workflows/dockers-benchmark-operator-image.yaml (1 hunks)
- .github/workflows/dockers-dev-container-image.yaml (1 hunks)
- .github/workflows/dockers-discoverer-k8s-image.yaml (1 hunks)
- .github/workflows/dockers-example-client-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-filter-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-lb-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-mirror-image.yaml (1 hunks)
- .github/workflows/dockers-helm-operator-image.yaml (1 hunks)
- .github/workflows/dockers-index-correction-image.yaml (1 hunks)
- .github/workflows/dockers-index-creation-image.yaml (1 hunks)
- .github/workflows/dockers-index-deletion-image.yaml (1 hunks)
- .github/workflows/dockers-index-operator-image.yaml (1 hunks)
- .github/workflows/dockers-index-save-image.yaml (1 hunks)
- .github/workflows/dockers-manager-index-image.yaml (1 hunks)
- .github/workflows/dockers-readreplica-rotate-image.yaml (1 hunks)
- .github/workflows/e2e.yaml (2 hunks)
- Makefile (2 hunks)
- Makefile.d/e2e.mk (1 hunks)
- Makefile.d/functions.mk (2 hunks)
- Makefile.d/k3d.mk (1 hunks)
- dockers/agent/core/agent/Dockerfile (1 hunks)
- dockers/agent/core/faiss/Dockerfile (1 hunks)
- dockers/agent/core/ngt/Dockerfile (1 hunks)
- dockers/agent/sidecar/Dockerfile (1 hunks)
- dockers/binfmt/Dockerfile (1 hunks)
- dockers/buildbase/Dockerfile (1 hunks)
- dockers/buildkit/Dockerfile (1 hunks)
- dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
- dockers/ci/base/Dockerfile (1 hunks)
- dockers/dev/Dockerfile (1 hunks)
- dockers/discoverer/k8s/Dockerfile (1 hunks)
- dockers/example/client/Dockerfile (1 hunks)
- dockers/gateway/filter/Dockerfile (1 hunks)
- dockers/gateway/lb/Dockerfile (1 hunks)
- dockers/gateway/mirror/Dockerfile (1 hunks)
- dockers/index/job/correction/Dockerfile (1 hunks)
- dockers/index/job/creation/Dockerfile (1 hunks)
- dockers/index/job/deletion/Dockerfile (1 hunks)
- dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
- dockers/index/job/save/Dockerfile (1 hunks)
- dockers/index/operator/Dockerfile (1 hunks)
- dockers/manager/index/Dockerfile (1 hunks)
- dockers/operator/helm/Dockerfile (1 hunks)
- dockers/tools/benchmark/job/Dockerfile (1 hunks)
- dockers/tools/benchmark/operator/Dockerfile (1 hunks)
- dockers/tools/cli/loadtest/Dockerfile (1 hunks)
- pkg/agent/core/ngt/service/ngt.go (1 hunks)
- tests/e2e/crud/crud_test.go (6 hunks)
- tests/e2e/kubernetes/client/client.go (2 hunks)
- tests/e2e/kubernetes/kubectl/kubectl.go (2 hunks)
- tests/e2e/operation/multi.go (3 hunks)
- tests/e2e/operation/operation.go (1 hunks)
- tests/e2e/operation/stream.go (19 hunks)
🚧 Files skipped from review as they are similar to previous changes (54)
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/index/job/correction/Dockerfile
- dockers/buildkit/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/ci/base/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- dockers/operator/helm/Dockerfile
- dockers/index/job/deletion/Dockerfile
- .github/workflows/dockers-index-operator-image.yaml
- .github/workflows/dockers-manager-index-image.yaml
- dockers/tools/benchmark/operator/Dockerfile
- .github/workflows/dockers-helm-operator-image.yaml
- dockers/gateway/mirror/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- .github/workflows/dockers-agent-faiss-image.yaml
- dockers/discoverer/k8s/Dockerfile
- .github/workflows/dockers-index-correction-image.yaml
- .github/workflows/dockers-dev-container-image.yaml
- dockers/tools/cli/loadtest/Dockerfile
- dockers/dev/Dockerfile
- .github/workflows/dockers-discoverer-k8s-image.yaml
- .github/workflows/dockers-gateway-lb-image.yaml
- .github/helm/values/values-rollout-agent.yaml
- .github/workflows/dockers-benchmark-operator-image.yaml
- dockers/binfmt/Dockerfile
- dockers/gateway/lb/Dockerfile
- dockers/index/operator/Dockerfile
- .github/workflows/dockers-gateway-filter-image.yaml
- dockers/manager/index/Dockerfile
- .github/workflows/dockers-gateway-mirror-image.yaml
- .github/workflows/dockers-readreplica-rotate-image.yaml
- dockers/index/job/creation/Dockerfile
- Makefile
- .github/workflows/dockers-agent-ngt-image.yaml
- .github/workflows/dockers-example-client-image.yaml
- dockers/index/job/save/Dockerfile
- .github/workflows/dockers-index-creation-image.yaml
- .github/workflows/dockers-agent-image.yaml
- dockers/agent/core/ngt/Dockerfile
- tests/e2e/operation/operation.go
- .github/actions/setup-e2e/action.yaml
- dockers/agent/sidecar/Dockerfile
- dockers/index/job/readreplica/rotate/Dockerfile
- tests/e2e/operation/multi.go
- .github/workflows/dockers-agent-sidecar-image.yaml
- .github/workflows/dockers-index-save-image.yaml
- .github/actions/setup-k3d/action.yaml
- .github/workflows/dockers-index-deletion-image.yaml
- dockers/example/client/Dockerfile
- dockers/agent/core/agent/Dockerfile
- pkg/agent/core/ngt/service/ngt.go
- Makefile.d/functions.mk
- .github/workflows/dockers-benchmark-job-image.yaml
🔇 Additional comments (8)
tests/e2e/kubernetes/kubectl/kubectl.go (1)
72-83: Add error handling for scanner errors.
The implementation looks good but should include error handling for scanner errors in the goroutines.
Makefile.d/e2e.mk (1)
97-100
: LGTM! Target follows existing patterns.The target is well-structured and consistent with other E2E test targets in the file.
tests/e2e/operation/stream.go (3)
71-71: LGTM! Improved timeout configuration.
The changes make the timeout values more explicit and maintainable by using time.Duration variables.
Also applies to: 274-274
654-666: LGTM! Improved thread-safe error handling.
The changes enhance error handling by:
- Using a mutex to protect error assignments
- Using a dedicated error variable for consistent error aggregation
Also applies to: 672-682
720-723: Fix redundant error joining.
The error is being joined with itself, which is redundant and potentially confusing.
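A quick standard-library demonstration of the redundancy; errors.Join simply wraps both arguments, so joining an error with itself duplicates it:

package main

import (
	"errors"
	"fmt"
)

func main() {
	err := errors.New("boom")
	joined := errors.Join(err, err)
	fmt.Println(joined)                  // "boom" is printed twice, once per duplicate
	fmt.Println(errors.Is(joined, err))  // true; the extra copy adds nothing
}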
Apply this fix:
-rerr = errors.Join(err, err)
+rerr = err
tests/e2e/crud/crud_test.go (2)
1025-1114: Verify if commented operations are needed.
Several operations are commented out. Please verify if these operations should be:
- Re-enabled for complete testing
- Removed if no longer needed
- Left commented for future implementation
Run this script to check the test coverage without these operations:
✅ Verification successful
Commented operations are adequately covered elsewhere.
- The functions IndexInfo, Exists, and GetObject are still defined and exercised in other tests (see files under tests/e2e/operation), which confirms that their functionalities are being tested.
- This indicates that the commented out code in this high concurrency multi-search test is either intentionally disabled for focus or reserved for future use.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check test coverage for commented operations

# Check if these operations are tested elsewhere
rg --type go 'func.*IndexInfo|func.*Exists|func.*GetObject' tests/

Length of output: 427
1002-1121: 🛠️ Refactor suggestion
Improve test robustness and cleanup.
The test implementation is good but could be improved in several areas:
- Add timeout context to prevent test from hanging
- Ensure proper cleanup of resources
- Add test failure conditions for specific scenarios
Apply these improvements:
-ctx, cancel := context.WithCancel(context.Background())
+ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
 t.Cleanup(teardown)
+t.Cleanup(func() {
+	cancel()
+	close(done)
+	wg.Wait()
+})
 if serr != nil {
+	if strings.Contains(serr.Error(), "DeadlineExceeded") {
+		t.Fatal("Search operations timed out during agent rollout")
+	}
 	t.Fatalf("an error occurred: %s", serr)
 }

Likely invalid or redundant comment.
.github/workflows/e2e.yaml (1)
351-394: LGTM! Well-structured E2E test job configuration.
The new job follows the established pattern and includes:
- Proper timeout configuration
- Required environment setup
- Correct test execution parameters
Force-pushed from 8d9e061 to 67be97f (Compare)
@@ -140,4 +140,4 @@ RUN --mount=type=bind,target=.,rw \
    && make faiss/install \
    && rm -rf ${GOPATH}/src/github.com/${ORG}/${REPO}/*
 # skipcq: DOK-DL3002
-USER root:root
+USER root:root
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/e2e/kubernetes/kubectl/kubectl.go (1)
72-77: ⚠️ Potential issue
Add error handling for scanner errors.
The goroutines should handle scanner errors to prevent silent failures.
Apply this diff to add error handling:
 go func() {
 	scanner := bufio.NewScanner(stdout)
 	for scanner.Scan() {
 		fmt.Println(scanner.Text())
 	}
+	if err := scanner.Err(); err != nil {
+		fmt.Printf("Error reading stdout: %v\n", err)
+	}
 }()
 go func() {
 	scanner := bufio.NewScanner(stderr)
 	for scanner.Scan() {
 		fmt.Println("Error:", scanner.Text())
 	}
+	if err := scanner.Err(); err != nil {
+		fmt.Printf("Error reading stderr: %v\n", err)
+	}
 }()

Also applies to: 78-83
tests/e2e/crud/crud_test.go (1)
1002-1241: 🛠️ Refactor suggestion
Reduce code duplication in test implementations.
The test functions share significant code. Extract common test patterns into helper functions.
Apply this diff to create a helper function:
type searchTestConfig struct {
	searchFunc   func(context.Context) error
	concurrency  int
	waitDuration time.Duration
	cleanupSteps []func() error
}

func runConcurrentSearchTest(t *testing.T, cfg searchTestConfig) error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()

	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		serr error
	)
	done := make(chan struct{})

	// Run concurrent searches
	wg.Add(1)
	go func() {
		defer wg.Done()
		// ... existing search loop ...
	}()

	// Run cleanup steps
	for _, cleanup := range cfg.cleanupSteps {
		if err := cleanup(); err != nil {
			return err
		}
	}

	close(done)
	wg.Wait()
	return serr
}

Then use it in both tests:
func TestE2EAgentRolloutRestart(t *testing.T) {
	// ... setup ...
	err := runConcurrentSearchTest(t, searchTestConfig{
		searchFunc:   searchFunc,
		concurrency:  searchConcurrency,
		waitDuration: 10 * time.Second,
		cleanupSteps: []func() error{
			func() error {
				return kubectl.RolloutResourceName(
					ctx, t, "statefulset", "vald-agent",
					waitResourceReadyDuration.String(),
				)
			},
		},
	})
	if err != nil {
		t.Fatal(err)
	}
}

func TestE2EHighConcurrencyMultiSearch(t *testing.T) {
	// ... setup ...
	err := runConcurrentSearchTest(t, searchTestConfig{
		searchFunc:   searchFunc,
		concurrency:  searchConcurrency,
		waitDuration: 5 * time.Second,
		cleanupSteps: []func() error{
			func() error {
				return kubectl.RolloutResourceName(
					ctx, t, "statefulset", "vald-agent",
					waitResourceReadyDuration.String(),
				)
			},
			// ... other cleanup steps ...
		},
	})
	if err != nil {
		t.Fatal(err)
	}
}
🧹 Nitpick comments (1)
tests/e2e/crud/crud_test.go (1)
1022-1028: Clean up commented code.
Remove commented-out code to improve readability. If these operations are needed, they should be uncommented and properly tested.
Also applies to: 1083-1115
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
hack/docker/gen/main.go is excluded by !**/gen/**
📒 Files selected for processing (61)
- .github/actions/setup-e2e/action.yaml (2 hunks)
- .github/actions/setup-k3d/action.yaml (2 hunks)
- .github/helm/values/values-rollout-agent.yaml (1 hunks)
- .github/workflows/dockers-agent-faiss-image.yaml (1 hunks)
- .github/workflows/dockers-agent-image.yaml (1 hunks)
- .github/workflows/dockers-agent-ngt-image.yaml (1 hunks)
- .github/workflows/dockers-agent-sidecar-image.yaml (1 hunks)
- .github/workflows/dockers-benchmark-job-image.yaml (1 hunks)
- .github/workflows/dockers-benchmark-operator-image.yaml (1 hunks)
- .github/workflows/dockers-dev-container-image.yaml (1 hunks)
- .github/workflows/dockers-discoverer-k8s-image.yaml (1 hunks)
- .github/workflows/dockers-example-client-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-filter-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-lb-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-mirror-image.yaml (1 hunks)
- .github/workflows/dockers-helm-operator-image.yaml (1 hunks)
- .github/workflows/dockers-index-correction-image.yaml (1 hunks)
- .github/workflows/dockers-index-creation-image.yaml (1 hunks)
- .github/workflows/dockers-index-deletion-image.yaml (1 hunks)
- .github/workflows/dockers-index-operator-image.yaml (1 hunks)
- .github/workflows/dockers-index-save-image.yaml (1 hunks)
- .github/workflows/dockers-manager-index-image.yaml (1 hunks)
- .github/workflows/dockers-readreplica-rotate-image.yaml (1 hunks)
- .github/workflows/e2e.yaml (2 hunks)
- Makefile (2 hunks)
- Makefile.d/e2e.mk (1 hunks)
- Makefile.d/functions.mk (2 hunks)
- Makefile.d/k3d.mk (1 hunks)
- dockers/agent/core/agent/Dockerfile (1 hunks)
- dockers/agent/core/faiss/Dockerfile (1 hunks)
- dockers/agent/core/ngt/Dockerfile (1 hunks)
- dockers/agent/sidecar/Dockerfile (1 hunks)
- dockers/binfmt/Dockerfile (1 hunks)
- dockers/buildbase/Dockerfile (1 hunks)
- dockers/buildkit/Dockerfile (1 hunks)
- dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
- dockers/ci/base/Dockerfile (1 hunks)
- dockers/dev/Dockerfile (1 hunks)
- dockers/discoverer/k8s/Dockerfile (1 hunks)
- dockers/example/client/Dockerfile (1 hunks)
- dockers/gateway/filter/Dockerfile (1 hunks)
- dockers/gateway/lb/Dockerfile (1 hunks)
- dockers/gateway/mirror/Dockerfile (1 hunks)
- dockers/index/job/correction/Dockerfile (1 hunks)
- dockers/index/job/creation/Dockerfile (1 hunks)
- dockers/index/job/deletion/Dockerfile (1 hunks)
- dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
- dockers/index/job/save/Dockerfile (1 hunks)
- dockers/index/operator/Dockerfile (1 hunks)
- dockers/manager/index/Dockerfile (1 hunks)
- dockers/operator/helm/Dockerfile (1 hunks)
- dockers/tools/benchmark/job/Dockerfile (1 hunks)
- dockers/tools/benchmark/operator/Dockerfile (1 hunks)
- dockers/tools/cli/loadtest/Dockerfile (1 hunks)
- pkg/agent/core/ngt/service/ngt.go (1 hunks)
- tests/e2e/crud/crud_test.go (6 hunks)
- tests/e2e/kubernetes/client/client.go (2 hunks)
- tests/e2e/kubernetes/kubectl/kubectl.go (2 hunks)
- tests/e2e/operation/multi.go (3 hunks)
- tests/e2e/operation/operation.go (1 hunks)
- tests/e2e/operation/stream.go (19 hunks)
🚧 Files skipped from review as they are similar to previous changes (57)
- dockers/buildkit/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/gateway/mirror/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- .github/workflows/dockers-index-correction-image.yaml
- .github/workflows/dockers-gateway-filter-image.yaml
- dockers/index/job/creation/Dockerfile
- dockers/binfmt/Dockerfile
- dockers/index/job/correction/Dockerfile
- .github/workflows/dockers-benchmark-job-image.yaml
- dockers/tools/cli/loadtest/Dockerfile
- .github/workflows/dockers-agent-faiss-image.yaml
- .github/workflows/dockers-gateway-mirror-image.yaml
- .github/workflows/dockers-agent-ngt-image.yaml
- dockers/ci/base/Dockerfile
- dockers/example/client/Dockerfile
- dockers/agent/sidecar/Dockerfile
- .github/workflows/dockers-index-operator-image.yaml
- dockers/agent/core/agent/Dockerfile
- dockers/manager/index/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/index/operator/Dockerfile
- dockers/operator/helm/Dockerfile
- dockers/index/job/save/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- Makefile
- .github/workflows/dockers-discoverer-k8s-image.yaml
- dockers/gateway/lb/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- Makefile.d/k3d.mk
- .github/workflows/dockers-dev-container-image.yaml
- tests/e2e/operation/operation.go
- .github/workflows/dockers-manager-index-image.yaml
- .github/workflows/dockers-example-client-image.yaml
- .github/workflows/dockers-gateway-lb-image.yaml
- dockers/agent/core/ngt/Dockerfile
- Makefile.d/functions.mk
- dockers/tools/benchmark/operator/Dockerfile
- .github/workflows/dockers-index-deletion-image.yaml
- .github/workflows/dockers-index-save-image.yaml
- .github/workflows/dockers-agent-image.yaml
- pkg/agent/core/ngt/service/ngt.go
- .github/workflows/dockers-benchmark-operator-image.yaml
- dockers/index/job/readreplica/rotate/Dockerfile
- .github/workflows/dockers-index-creation-image.yaml
- .github/workflows/dockers-agent-sidecar-image.yaml
- .github/workflows/dockers-readreplica-rotate-image.yaml
- .github/helm/values/values-rollout-agent.yaml
- .github/workflows/dockers-helm-operator-image.yaml
- .github/actions/setup-e2e/action.yaml
- tests/e2e/operation/multi.go
- .github/actions/setup-k3d/action.yaml
- tests/e2e/kubernetes/client/client.go
- dockers/index/job/deletion/Dockerfile
- dockers/dev/Dockerfile
- tests/e2e/operation/stream.go
⏰ Context from checks skipped due to timeout of 90000ms (190)
- GitHub Check: build / build (repeated 190 times)
🔇 Additional comments (2)
Makefile.d/e2e.mk (1)
92-100: LGTM!
The new phony targets are well-defined and properly documented.
.github/workflows/e2e.yaml (1)
351-394: LGTM!
The new job is well-structured and properly integrated with the existing workflow.
	}
}

ctx, cancel := context.WithCancel(context.Background())
Add timeout to prevent test from hanging.
Replace context.WithCancel with context.WithTimeout to prevent the test from running indefinitely.
Apply this diff:
-ctx, cancel := context.WithCancel(context.Background())
+ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
Also applies to: 1135-1135
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: kpango <[email protected]>
Signed-off-by: kpango <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Signed-off-by: vankichi <[email protected]>
Force-pushed from 67be97f to 6a61e26 (Compare)
Actionable comments posted: 0
♻️ Duplicate comments (4)
tests/e2e/kubernetes/kubectl/kubectl.go (1)
72-77: 🛠️ Refactor suggestion
Add error handling for scanner errors.
The scanner error handling is missing in both goroutines. This could lead to silent failures if the scanner encounters an error.
Apply this diff to add error handling:
 go func() {
 	scanner := bufio.NewScanner(stdout)
 	for scanner.Scan() {
 		fmt.Println(scanner.Text())
 	}
+	if err := scanner.Err(); err != nil {
+		fmt.Printf("Error reading stdout: %v\n", err)
+	}
 }()
 go func() {
 	scanner := bufio.NewScanner(stderr)
 	for scanner.Scan() {
 		fmt.Println("Error:", scanner.Text())
 	}
+	if err := scanner.Err(); err != nil {
+		fmt.Printf("Error reading stderr: %v\n", err)
+	}
 }()

Also applies to: 78-83
tests/e2e/operation/stream.go (1)
720-723
: 🛠️ Refactor suggestionFix redundant error joining.
The error is being joined with itself, which is redundant.
Apply this fix:
-rerr = errors.Join(err, err)
+rerr = err
tests/e2e/crud/crud_test.go (2)
1014-1015: 🛠️ Refactor suggestion
Add timeout to prevent test from hanging.
Replace context.WithCancel with context.WithTimeout to prevent the test from running indefinitely. Apply this diff:
-ctx, cancel := context.WithCancel(context.Background())
+ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
Also applies to: 1135-1136
1002-1121: 🛠️ Refactor suggestion
Reduce code duplication in test implementations.
The test functions share significant code. Consider extracting common test patterns into helper functions.
Extract shared test logic into reusable functions:
type searchTestConfig struct {
	t             *testing.T
	ctx           context.Context
	op            *operation.Operation
	searchFunc    func(context.Context) error
	concurrency   int
	sleepDuration time.Duration
}

func runConcurrentSearchTest(cfg searchTestConfig) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		serr error
	)
	done := make(chan struct{})
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-done:
				return
			default:
				eg, egctx := errgroup.New(cfg.ctx)
				for i := 0; i < cfg.concurrency; i++ {
					eg.Go(func() (e error) {
						if ierr := cfg.searchFunc(egctx); ierr != nil {
							if isDeadlineExceeded(ierr) {
								_, _, rerr := status.ParseError(
									ierr,
									codes.DeadlineExceeded,
									"an error occurred",
								)
								mu.Lock()
								e = errors.Join(e, rerr)
								mu.Unlock()
							}
						}
						return
					})
				}
				egerr := eg.Wait()
				mu.Lock()
				serr = errors.Join(serr, egerr)
				mu.Unlock()
				time.Sleep(cfg.sleepDuration)
			}
		}
	}()
	close(done)
	wg.Wait()
	return serr
}

Also applies to: 1123-1241
🧹 Nitpick comments (1)
tests/e2e/crud/crud_test.go (1)
1118-1120: Add test failure conditions for specific scenarios.
Improve error handling by adding specific failure conditions.
Apply this diff:
 if serr != nil {
+	if strings.Contains(serr.Error(), "DeadlineExceeded") {
+		t.Fatal("Search operations timed out during agent rollout")
+	}
 	t.Fatalf("an error occurred: %s", serr)
 }

Also applies to: 1238-1240
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
hack/docker/gen/main.go is excluded by !**/gen/**
📒 Files selected for processing (61)
- .github/actions/setup-e2e/action.yaml (2 hunks)
- .github/actions/setup-k3d/action.yaml (2 hunks)
- .github/helm/values/values-rollout-agent.yaml (1 hunks)
- .github/workflows/dockers-agent-faiss-image.yaml (1 hunks)
- .github/workflows/dockers-agent-image.yaml (1 hunks)
- .github/workflows/dockers-agent-ngt-image.yaml (1 hunks)
- .github/workflows/dockers-agent-sidecar-image.yaml (1 hunks)
- .github/workflows/dockers-benchmark-job-image.yaml (1 hunks)
- .github/workflows/dockers-benchmark-operator-image.yaml (1 hunks)
- .github/workflows/dockers-dev-container-image.yaml (1 hunks)
- .github/workflows/dockers-discoverer-k8s-image.yaml (1 hunks)
- .github/workflows/dockers-example-client-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-filter-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-lb-image.yaml (1 hunks)
- .github/workflows/dockers-gateway-mirror-image.yaml (1 hunks)
- .github/workflows/dockers-helm-operator-image.yaml (1 hunks)
- .github/workflows/dockers-index-correction-image.yaml (1 hunks)
- .github/workflows/dockers-index-creation-image.yaml (1 hunks)
- .github/workflows/dockers-index-deletion-image.yaml (1 hunks)
- .github/workflows/dockers-index-operator-image.yaml (1 hunks)
- .github/workflows/dockers-index-save-image.yaml (1 hunks)
- .github/workflows/dockers-manager-index-image.yaml (1 hunks)
- .github/workflows/dockers-readreplica-rotate-image.yaml (1 hunks)
- .github/workflows/e2e.yaml (2 hunks)
- Makefile (2 hunks)
- Makefile.d/e2e.mk (1 hunks)
- Makefile.d/functions.mk (2 hunks)
- Makefile.d/k3d.mk (1 hunks)
- dockers/agent/core/agent/Dockerfile (1 hunks)
- dockers/agent/core/faiss/Dockerfile (1 hunks)
- dockers/agent/core/ngt/Dockerfile (1 hunks)
- dockers/agent/sidecar/Dockerfile (1 hunks)
- dockers/binfmt/Dockerfile (1 hunks)
- dockers/buildbase/Dockerfile (1 hunks)
- dockers/buildkit/Dockerfile (1 hunks)
- dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
- dockers/ci/base/Dockerfile (1 hunks)
- dockers/dev/Dockerfile (1 hunks)
- dockers/discoverer/k8s/Dockerfile (1 hunks)
- dockers/example/client/Dockerfile (1 hunks)
- dockers/gateway/filter/Dockerfile (1 hunks)
- dockers/gateway/lb/Dockerfile (1 hunks)
- dockers/gateway/mirror/Dockerfile (1 hunks)
- dockers/index/job/correction/Dockerfile (1 hunks)
- dockers/index/job/creation/Dockerfile (1 hunks)
- dockers/index/job/deletion/Dockerfile (1 hunks)
- dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
- dockers/index/job/save/Dockerfile (1 hunks)
- dockers/index/operator/Dockerfile (1 hunks)
- dockers/manager/index/Dockerfile (1 hunks)
- dockers/operator/helm/Dockerfile (1 hunks)
- dockers/tools/benchmark/job/Dockerfile (1 hunks)
- dockers/tools/benchmark/operator/Dockerfile (1 hunks)
- dockers/tools/cli/loadtest/Dockerfile (1 hunks)
- pkg/agent/core/ngt/service/ngt.go (1 hunks)
- tests/e2e/crud/crud_test.go (6 hunks)
- tests/e2e/kubernetes/client/client.go (2 hunks)
- tests/e2e/kubernetes/kubectl/kubectl.go (2 hunks)
- tests/e2e/operation/multi.go (3 hunks)
- tests/e2e/operation/operation.go (1 hunks)
- tests/e2e/operation/stream.go (19 hunks)
🚧 Files skipped from review as they are similar to previous changes (57)
- dockers/buildkit/Dockerfile
- dockers/gateway/lb/Dockerfile
- dockers/dev/Dockerfile
- dockers/binfmt/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/index/job/correction/Dockerfile
- .github/workflows/dockers-manager-index-image.yaml
- dockers/index/job/creation/Dockerfile
- dockers/index/operator/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- .github/workflows/dockers-agent-ngt-image.yaml
- dockers/example/client/Dockerfile
- .github/workflows/dockers-index-deletion-image.yaml
- Makefile.d/k3d.mk
- dockers/discoverer/k8s/Dockerfile
- .github/workflows/dockers-benchmark-operator-image.yaml
- dockers/gateway/mirror/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- .github/workflows/dockers-discoverer-k8s-image.yaml
- dockers/buildkit/syft/scanner/Dockerfile
- .github/workflows/dockers-index-save-image.yaml
- dockers/agent/core/ngt/Dockerfile
- .github/workflows/dockers-benchmark-job-image.yaml
- .github/workflows/dockers-example-client-image.yaml
- dockers/ci/base/Dockerfile
- .github/workflows/dockers-dev-container-image.yaml
- .github/workflows/dockers-helm-operator-image.yaml
- .github/workflows/dockers-gateway-filter-image.yaml
- .github/workflows/dockers-index-creation-image.yaml
- Makefile.d/functions.mk
- .github/workflows/dockers-agent-faiss-image.yaml
- .github/workflows/dockers-gateway-mirror-image.yaml
- tests/e2e/operation/multi.go
- dockers/tools/benchmark/operator/Dockerfile
- .github/workflows/dockers-agent-image.yaml
- .github/actions/setup-k3d/action.yaml
- .github/workflows/dockers-index-operator-image.yaml
- dockers/tools/cli/loadtest/Dockerfile
- .github/helm/values/values-rollout-agent.yaml
- .github/workflows/dockers-readreplica-rotate-image.yaml
- tests/e2e/kubernetes/client/client.go
- dockers/manager/index/Dockerfile
- .github/actions/setup-e2e/action.yaml
- dockers/index/job/save/Dockerfile
- tests/e2e/operation/operation.go
- dockers/index/job/readreplica/rotate/Dockerfile
- dockers/index/job/deletion/Dockerfile
- dockers/operator/helm/Dockerfile
- dockers/agent/sidecar/Dockerfile
- Makefile.d/e2e.mk
- Makefile
- dockers/agent/core/agent/Dockerfile
- .github/workflows/dockers-index-correction-image.yaml
- .github/workflows/dockers-agent-sidecar-image.yaml
- .github/workflows/dockers-gateway-lb-image.yaml
- pkg/agent/core/ngt/service/ngt.go
🔇 Additional comments (3)
tests/e2e/kubernetes/kubectl/kubectl.go (1)
`45-86`: LGTM! Well-structured implementation.

The function is well-structured with proper (a hedged usage sketch follows this list):
- Resource name and timeout parameter handling
- Command execution and output streaming
- Error handling for command execution
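For orientation, here is a minimal sketch of how an e2e test could drive this helper. The import path, resource names, and both timeouts are illustrative assumptions, not values taken from the PR:

```go
package e2e_test

import (
	"context"
	"testing"
	"time"

	"github.com/vdaas/vald/tests/e2e/kubernetes/kubectl" // assumed import path
)

func TestAgentRolloutRestartSketch(t *testing.T) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()

	// Restart the agent StatefulSet and block until the rollout settles.
	// "statefulset", "vald-agent", and "5m" are illustrative values.
	if err := kubectl.RolloutResourceName(ctx, t, "statefulset", "vald-agent", "5m"); err != nil {
		t.Fatalf("rollout restart failed: %v", err)
	}
}
```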
tests/e2e/operation/stream.go (1)
`71-72`: LGTM! Improved timeout management.

Good improvement replacing hardcoded timeouts with configurable duration variables (sketched after this list):
- Search timeout: 1 second
- SearchByID timeout: 3 seconds
Also applies to: 274-275
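As a sketch of the approved pattern; the variable and function names here are assumptions, only the durations come from this review:

```go
package operation

import (
	"context"
	"time"
)

// Named, tunable deadlines instead of literals buried in call sites.
var (
	searchTimeout     = time.Second
	searchByIDTimeout = 3 * time.Second
)

func withSearchDeadline(ctx context.Context) (context.Context, context.CancelFunc) {
	return context.WithTimeout(ctx, searchTimeout)
}

func withSearchByIDDeadline(ctx context.Context) (context.Context, context.CancelFunc) {
	return context.WithTimeout(ctx, searchByIDTimeout)
}
```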
.github/workflows/e2e.yaml (1)
`351-394`: LGTM! Well-configured E2E test job.

The new job is well-structured with:
- Proper environment setup using K3D storage
- Consistent configuration with other E2E test jobs
- Appropriate test parameters for rollout restart testing
Signed-off-by: vankichi <[email protected]>
Actionable comments posted: 3
♻️ Duplicate comments (1)
tests/e2e/operation/stream.go (1)
`719-722`: ⚠️ Potential issue: Fix redundant error joining.
The error is being joined with itself, which is redundant and potentially confusing.
Apply this fix to all similar occurrences:
```diff
-rerr = errors.Join(err, err)
+rerr = err
```
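For reference, joining an error with itself adds no information; it only duplicates the message. A runnable demonstration:

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	err := errors.New("deadline exceeded")

	joined := errors.Join(err, err)     // the flagged pattern
	fmt.Println(errors.Is(joined, err)) // true: nothing err alone didn't tell us
	fmt.Println(joined)                 // "deadline exceeded" printed twice
}
```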
🧹 Nitpick comments (8)
tests/e2e/operation/unaly.go (4)
`38-43`: Consider making the timeout configurable.

The timeout value is hardcoded to 3 seconds. Consider making it configurable through a parameter to accommodate different test environments and scenarios.
```diff
-	var (
-		num     = uint32(100)
-		radius  = -1
-		epsilon = 0.01
-		timeout = time.Second * 3
-	)
+	var (
+		num     = uint32(100)
+		radius  = -1
+		epsilon = 0.01
+		timeout = time.Duration(c.config.SearchTimeout) * time.Second
+	)
```
`103-105`: Implement SearchByID functionality.

The function is currently a placeholder. Based on the interface contract, this function should implement search by ID functionality.
Would you like me to help implement this function based on the Search implementation above?
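If it helps, here is a minimal sketch of how such a placeholder could be filled in, modeled on the concurrent Search implementation above. The client interface, payload types, ID scheme, and concurrency bound of 100 are assumptions based on Vald's public vald-client-go API (golang.org/x/sync/errgroup is used for brevity), and the same shape would apply to the two LinearSearch placeholders below:

```go
package operation

import (
	"context"
	"strconv"
	"testing"
	"time"

	"github.com/vdaas/vald-client-go/v1/payload"
	"github.com/vdaas/vald-client-go/v1/vald"
	"golang.org/x/sync/errgroup"
)

// searchByIDSketch issues n concurrent SearchByID requests. Names and
// parameters are illustrative, not the PR's actual implementation.
func searchByIDSketch(ctx context.Context, t *testing.T, vc vald.Client, n int) error {
	t.Helper()
	eg, egctx := errgroup.WithContext(ctx)
	eg.SetLimit(100) // bound in-flight requests, as the Search path does
	for i := 0; i < n; i++ {
		id := strconv.Itoa(i) // assumes vectors were inserted with stringified indices as IDs
		eg.Go(func() error {
			cctx, cancel := context.WithTimeout(egctx, 3*time.Second)
			defer cancel()
			_, err := vc.SearchByID(cctx, &payload.Search_IDRequest{
				Id: id,
				Config: &payload.Search_Config{
					Num:     100,
					Radius:  -1,
					Epsilon: 0.01,
					Timeout: (3 * time.Second).Nanoseconds(),
				},
			})
			return err
		})
	}
	return eg.Wait()
}
```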
`107-109`: Implement LinearSearch functionality.

The function is currently a placeholder. Based on the interface contract, this function should implement linear search functionality.
Would you like me to help implement this function based on the Search implementation above?
`111-113`: Implement LinearSearchByID functionality.

The function is currently a placeholder. Based on the interface contract, this function should implement linear search by ID functionality.
Would you like me to help implement this function based on the Search implementation above?
Makefile.d/e2e.mk (2)
`92-95`: Fix typo in target name.

The target name contains a typo in "rollaout".
```diff
-.PHONY: e2e/standard/rollaout/restart/agent
+.PHONY: e2e/standard/rollout/restart/agent
```
`97-100`: Fix typo in target name.

The target name contains a typo in "rollaout".
```diff
-.PHONY: e2e/stream/rollaout/restart/agent
+.PHONY: e2e/stream/rollout/restart/agent
```

tests/e2e/crud/crud_test.go (2)
`1031-1037`: Extract common search function pattern.

The search function pattern is duplicated across multiple tests. Consider extracting it into a reusable helper (a usage sketch follows the diff):
```diff
+type searchConfig struct {
+	op       *operation.Operation
+	t        *testing.T
+	dataset  operation.Dataset
+	isStream bool
+	isMulti  bool
+}
+
+func createSearchFunc(cfg searchConfig) func(context.Context) error {
+	return func(ctx context.Context) error {
+		if cfg.isMulti {
+			return cfg.op.MultiSearch(cfg.t, ctx, cfg.dataset)
+		}
+		if cfg.isStream {
+			return cfg.op.StreamSearch(cfg.t, ctx, cfg.dataset)
+		}
+		return cfg.op.Search(cfg.t, ctx, cfg.dataset)
+	}
+}
```

Also applies to: 1152-1158, 1262-1268
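A hedged usage sketch for the helper above; the `operation` types mirror the suggestion, not necessarily the test's real signatures:

```go
package crud_test

import (
	"context"
	"testing"

	"github.com/vdaas/vald/tests/e2e/operation" // assumed import path
)

// runStreamSearchOnce assumes the createSearchFunc helper from the suggestion.
func runStreamSearchOnce(t *testing.T, ctx context.Context, op *operation.Operation, ds operation.Dataset) {
	t.Helper()
	search := createSearchFunc(searchConfig{op: op, t: t, dataset: ds, isStream: true})
	if err := search(ctx); err != nil {
		t.Fatalf("stream search failed: %v", err)
	}
}
```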
`1363-1367`: Remove the unnecessary mutex lock around the error check.

The mutex lock around the error check is unnecessary since no other goroutines are accessing `serr` at this point.

```diff
-	mu.Lock()
-	if serr != nil {
-		t.Fatalf("an error occurred: %s", serr.Error())
-	}
-	mu.Unlock()
+	if serr != nil {
+		t.Fatalf("an error occurred: %s", serr.Error())
+	}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- `Makefile.d/e2e.mk` (1 hunks)
- `tests/e2e/crud/crud_test.go` (12 hunks)
- `tests/e2e/operation/operation.go` (3 hunks)
- `tests/e2e/operation/stream.go` (21 hunks)
- `tests/e2e/operation/unaly.go` (1 hunks)
🔇 Additional comments (9)
tests/e2e/operation/unaly.go (1)
`45-47`: LGTM! Well-structured concurrent implementation.

The use of errgroup with a limit of 100 is a good practice for managing concurrent search operations while preventing resource exhaustion.
tests/e2e/operation/operation.go (1)
`207-208`: LGTM! Appropriate keepalive adjustments for rollout scenarios.

The increased keepalive parameters (Time: 10m, Timeout: 60s) are well-suited for maintaining connections during agent rollout restarts, which aligns with the PR objectives. A sketch of the equivalent raw gRPC dial options follows.
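For context, the equivalent settings on a raw grpc-go dial look roughly like this; the e2e client presumably wires them through its own option layer, so this is illustrative only:

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func dialWithKeepalive(target string) (*grpc.ClientConn, error) {
	return grpc.NewClient(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                10 * time.Minute, // ping an idle connection this often
			Timeout:             60 * time.Second, // wait this long for the ping ack
			PermitWithoutStream: true,             // keep pinging while no RPC is active
		}),
	)
}
```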
Makefile.d/e2e.mk (1)
`102-105`: LGTM! Well-structured test target.

The target is appropriately named and correctly configured to run the high concurrency multi-search test.
tests/e2e/crud/crud_test.go (6)
`59-66`: LGTM! New variables are well-organized and follow naming conventions.

The new variables are properly grouped with other similar variables and follow consistent naming patterns.
`68-69`: LGTM! Duration variables are well-named and grouped together.

The new duration variables follow the established pattern and are properly grouped.
`104-104`: LGTM! New flag with clear description.

The new `wait-resource-ready` flag is well-documented and follows the established pattern.
`141-144`: LGTM! Proper error handling for duration parsing.

The error handling follows the established pattern in the file; a minimal version of the idiom is sketched below.
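The idiom being approved, in miniature. The flag name comes from this review; the default value and exact error handling are assumptions:

```go
package main

import (
	"flag"
	"log"
	"time"
)

func main() {
	raw := flag.String("wait-resource-ready", "3m", "how long to wait for resources to become ready")
	flag.Parse()

	// Parse once, fail fast on malformed input before any test work starts.
	waitResourceReady, err := time.ParseDuration(*raw)
	if err != nil {
		log.Fatalf("invalid -wait-resource-ready value %q: %v", *raw, err)
	}
	log.Printf("waiting up to %s for resources", waitResourceReady)
}
```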
`1014-1014`: Add timeout to prevent the test from hanging.

Replace `context.WithCancel` with `context.WithTimeout` to prevent the test from running indefinitely.

Also applies to: 1135-1135
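Concretely, the suggested change amounts to the following; the 15-minute bound is illustrative:

```go
package main

import (
	"context"
	"time"
)

func boundedTestContext() (context.Context, context.CancelFunc) {
	// Before: context.WithCancel(context.Background()), which can hang forever
	// if the rollout never settles. After: the test is always bounded.
	return context.WithTimeout(context.Background(), 15*time.Minute)
}
```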
`87-88`: Consider adjusting the default values for better test coverage.

The default values might need adjustment:

- `searchByIDNum` is reduced from 100 to 3, which might not provide sufficient test coverage
- `searchConcurrency` default of 100 might be too high for some environments

Consider validating these values:
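The concrete snippet appears to have been lost in rendering; a plausible reconstruction of the kind of validation meant here, where the bounds are assumptions rather than values from the PR:

```go
package crud_test

import "testing"

// validateSearchParams fails fast on nonsensical settings before any
// requests are issued. Hypothetical helper; names follow the flags above.
func validateSearchParams(t *testing.T, searchByIDNum, searchConcurrency int) {
	t.Helper()
	if searchByIDNum <= 0 {
		t.Fatalf("search-by-id-num must be positive, got %d", searchByIDNum)
	}
	if searchConcurrency <= 0 || searchConcurrency > 1000 {
		t.Fatalf("search-concurrency must be in (0, 1000], got %d", searchConcurrency)
	}
}
```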
```go
wg := sync.WaitGroup{}
mu := sync.Mutex{}
done := make(chan struct{})
var serr error

wg.Add(1)
go func() {
	defer wg.Done()
	for {
		select {
		case <-done:
			return
		default:
			eg, egctx := errgroup.New(ctx)
			for i := 0; i < searchConcurrency; i++ {
				eg.Go(func() (e error) {
					ierr := searchFunc(egctx)
					if ierr != nil {
						st, ok := status.FromError(ierr)
						if ok && st.Code() == codes.DeadlineExceeded {
							_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred")
							mu.Lock()
							e = errors.Join(e, rerr)
							mu.Unlock()
						}
					}
					return
				})
			}
			egerr := eg.Wait()
			mu.Lock()
			serr = errors.Join(serr, egerr)
			mu.Unlock()
			time.Sleep(10 * time.Second)
		}
	}
}()
```
🛠️ Refactor suggestion
Extract common concurrent search pattern.
The concurrent search pattern is duplicated across multiple tests. Consider extracting it into a reusable helper.
```diff
+type concurrentSearchConfig struct {
+	ctx           context.Context
+	t             *testing.T
+	searchFunc    func(context.Context) error
+	concurrency   int
+	sleepDuration time.Duration
+}
+
+// runConcurrentSearch keeps issuing search bursts in the background until
+// done is closed. The accumulated error is exposed via the returned getter
+// so it is read after the goroutine finishes, not at call time.
+func runConcurrentSearch(cfg concurrentSearchConfig) (chan struct{}, *sync.WaitGroup, func() error) {
+	var (
+		wg   sync.WaitGroup
+		mu   sync.Mutex
+		serr error
+	)
+	done := make(chan struct{})
+
+	wg.Add(1)
+	go func() {
+		defer wg.Done()
+		for {
+			select {
+			case <-done:
+				return
+			default:
+				eg, egctx := errgroup.New(cfg.ctx)
+				for i := 0; i < cfg.concurrency; i++ {
+					eg.Go(func() (e error) {
+						if ierr := cfg.searchFunc(egctx); ierr != nil {
+							st, ok := status.FromError(ierr)
+							if ok && st.Code() == codes.DeadlineExceeded {
+								_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred")
+								e = rerr
+							}
+						}
+						return
+					})
+				}
+				egerr := eg.Wait()
+				mu.Lock()
+				serr = errors.Join(serr, egerr)
+				mu.Unlock()
+				time.Sleep(cfg.sleepDuration)
+			}
+		}
+	}()
+
+	return done, &wg, func() error {
+		mu.Lock()
+		defer mu.Unlock()
+		return serr
+	}
+}
```
Also applies to: 1159-1194, 1269-1306
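A hedged usage sketch for the helper above, reading the accumulated error only after the background loop has stopped; the constants mirror the values used in the tests:

```go
package crud_test

import (
	"context"
	"testing"
	"time"
)

// runRolloutScenario assumes the runConcurrentSearch helper from the
// suggestion above; searchFunc is whatever search variant the test exercises.
func runRolloutScenario(t *testing.T, ctx context.Context, searchFunc func(context.Context) error) {
	t.Helper()
	done, wg, getErr := runConcurrentSearch(concurrentSearchConfig{
		ctx:           ctx,
		t:             t,
		searchFunc:    searchFunc,
		concurrency:   100,              // mirrors searchConcurrency's default
		sleepDuration: 10 * time.Second, // pause between bursts, as in the tests
	})

	// ... trigger the rollout restart and wait for readiness here ...

	close(done)
	wg.Wait()
	if err := getErr(); err != nil {
		t.Fatalf("background search failed: %v", err)
	}
}
```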
```go
close(done)
wg.Done()
mu.Lock()
```
Fix incorrect wait group usage.

The code calls `wg.Done()` after `close(done)`, but this is incorrect since the wait group is used to wait for the goroutine to finish; the goroutine already calls `wg.Done()` itself via its deferred call, so the test should call `wg.Wait()` here instead.
```diff
-	close(done)
-	wg.Done()
+	close(done)
+	wg.Wait()
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
close(done)
wg.Wait()
mu.Lock()
```
```go
wg := sync.WaitGroup{}
wg.Add(1)
mu := sync.Mutex{}
var serr error
wg.Add(1)
done := make(chan struct{})
```
Fix duplicate `wg.Add(1)` call.

`wg.Add(1)` is called twice, which will cause the wait group to wait for an extra goroutine that doesn't exist.
```diff
-	wg := sync.WaitGroup{}
-	wg.Add(1)
-	mu := sync.Mutex{}
-	var serr error
-	wg.Add(1)
+	var (
+		wg   sync.WaitGroup
+		mu   sync.Mutex
+		serr error
+	)
+	wg.Add(1)
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
var (
	wg   sync.WaitGroup
	mu   sync.Mutex
	serr error
)
wg.Add(1)
done := make(chan struct{})
```
Description
Fatal
Related Issue
Versions
Checklist
Special notes for your reviewer
Summary by CodeRabbit
Release Notes
New Features
Testing Improvements
Enhanced the `MultiSearch` method to handle more results and introduced timeout configuration.

Configuration Updates