Stabilize CI for dashboards repo #1624

Closed
2 tasks
peternied opened this issue Oct 19, 2023 · 4 comments

@peternied
Member

peternied commented Oct 19, 2023

Description

I've done a couple of quick tests trying to validate how unstable the CI is, using this commit main...peternied:security-dashboards-plugin:baseline-ci-failures

| Experiment | Pass Rate | Platform | Test Run Link | Code |
| --- | --- | --- | --- | --- |
| Control (No Changes) | 90% | ubuntu-latest | link | PR |
| Control (No Changes) | 10% | windows-latest | link | PR |
| Disable windows tests | 80% | ubuntu-latest | link | PR |
| Disable disk threshold | 80% | ubuntu-latest | link | PR |
| Disable disk threshold | 30% | windows-latest | link | PR |

Exit Criteria

  • Jest integration tests are running
  • CI passes 10 out of 10 runs for every enabled workflow, with each workflow tested separately
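
To make the "10 out of 10" criterion concrete, here is a minimal sketch of how the pass rate for one workflow could be tallied from repeated runs. The `RunResult` type, the `summarizeRuns` helper, and the `requiredRuns` parameter are hypothetical names introduced for illustration; they are not part of this repository or of GitHub Actions.

```typescript
// Hypothetical helper (not from the repo): summarize repeated CI runs of one workflow.
type RunResult = "success" | "failure";

interface StabilityReport {
  passed: number;
  total: number;
  passRate: number; // percentage in the range 0-100
  meetsExitCriteria: boolean;
}

function summarizeRuns(results: RunResult[], requiredRuns = 10): StabilityReport {
  const total = results.length;
  const passed = results.filter((r) => r === "success").length;
  // Avoid dividing by zero when no runs have been recorded yet.
  const passRate = total === 0 ? 0 : (passed / total) * 100;
  return {
    passed,
    total,
    passRate,
    // The criterion is met only if there are enough runs and every one of them passed.
    meetsExitCriteria: total >= requiredRuns && passed === total,
  };
}
```

Under this sketch, nine successes and one failure give a 90% pass rate and do not meet the exit criteria; only ten passing runs out of ten do.
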
@peternied
Member Author

peternied commented Oct 19, 2023

@DarshitChanpura I think you are taking this issue on. I've added some details; please feel free to make it your own and edit the description to best represent what you are tracking.

@DarshitChanpura
Member

Jest integration tests are passing for the most part. The only flaky behavior comes from agentkeepalive throwing socket timeouts at random. This suggests the cause is network latency or some other factor outside the tests.

D:\a\security-dashboards-plugin\security-dashboards-plugin\OpenSearch-Dashboards\node_modules\agentkeepalive\lib\agent.js:350
        const error = new Error('Socket timeout');
                      ^
Error: Unhandled error. (Error: Socket timeout
    at TLSSocket.onTimeout (D:\a\security-dashboards-plugin\security-dashboards-plugin\OpenSearch-Dashboards\node_modules\agentkeepalive\lib\agent.js:350:23)
    at TLSSocket.emit (node:events:525:35)
    at TLSSocket.emit (node:domain:489:12)
    at TLSSocket.Socket._onTimeout (node:net:570:8)
    at listOnTimeout (node:internal/timers:569:17)
    at processTimers (node:internal/timers:512:7) {
  code: 'ERR_SOCKET_TIMEOUT',
  timeout: 120000
})
    at new NodeError (node:internal/errors:399:5)
    at ClientRequest.emit (node:events:502:17)
    at ClientRequest.emit (node:domain:489:12)
    at TLSSocket.socketErrorListener (node:_http_client:502:9)
    at TLSSocket.emit (node:events:525:35)
    at TLSSocket.emit (node:domain:489:12)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at processTicksAndRejections (node:internal/process/task_queues:82:21)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at processTimers (node:internal/timers:509:9)
Error: Process completed with exit code 1.
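
One common way to tolerate transient timeouts like the one in this trace is to retry the affected request. The sketch below is not the fix applied in this issue, and it rests on an assumption: that the request is awaited as a promise, so the timeout surfaces as a rejection carrying `code: 'ERR_SOCKET_TIMEOUT'` (the code shown in the log) rather than as an unhandled `error` event. `isSocketTimeout` and `retryOnSocketTimeout` are hypothetical helpers, not part of agentkeepalive or OpenSearch Dashboards.

```typescript
// Hypothetical helpers (not from agentkeepalive or OpenSearch Dashboards).
interface CodedError extends Error {
  code?: string;
}

// True only for errors carrying the code seen in the log above.
function isSocketTimeout(err: unknown): boolean {
  return err instanceof Error && (err as CodedError).code === "ERR_SOCKET_TIMEOUT";
}

// Retry an async operation when it rejects with a socket timeout.
// Any other error is rethrown immediately so real failures stay visible.
async function retryOnSocketTimeout<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (!isSocketTimeout(err)) throw err;
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait before the next attempt to give the network time to recover.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

A retry like this only masks flakiness; if timeouts persist across attempts, the underlying cause still needs investigation.
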

@davidlago

Cypress test fixes are tracked in #1599. This issue does not include them.

@RyanL1997
Collaborator

The previous flakiness was also related to the SAML integration test cases, which #1641 has addressed. For more context, please see the original PR and this comment.

The results of a 10-run pass on both runners after applying this fix are here: https://github.com/opensearch-project/security-dashboards-plugin/actions/runs/6815132175/job/18533579549?pr=1641
