Skip to content
This repository has been archived by the owner on Jan 6, 2025. It is now read-only.

Implement graceful termination #39

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions config/ssh.go
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,10 @@ type SSHConfig struct {
// allowed to be sent without a response being received. If this number
// is exceeded the connection is considered dead
ClientAliveCountMax int `json:"clientAliveCountMax" yaml:"clientAliveCountMax" default:"3" comment:"Maximum number of failed keepalives"`
// GracefulTerminationDeadline is the amount of time ContainerSSH will
// wait after receiving a SIGTERM for all the clients to disconnect.
// After this duraction all connections are forcefully terminated.
GracefulTerminationDeadline time.Duration `json:"gracefulTerminationDeadline" yaml:"gracefulTerminationDeadline" default:"0"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ContainerSSH will not always run on SSH only, we plan a web client too. This deadline should not be tied to SSH.

}

// GenerateHostKey generates a random host key and adds it to SSHConfig
Expand Down
4 changes: 3 additions & 1 deletion internal/sshserver/serverImpl.go
Original file line number Diff line number Diff line change
Expand Up @@ -86,12 +86,13 @@ func (s *serverImpl) RunWithLifecycle(lifecycle service.Lifecycle) error {
s.wg.Add(1)
go s.handleConnection(tcpConn)
}

lifecycle.Stopping()
s.shuttingDown = true
allClientsExited := make(chan struct{})
shutdownHandlerExited := make(chan struct{}, 1)
s.disconnectClients(lifecycle, allClientsExited)
go s.shutdownHandlers.Shutdown(lifecycle.ShutdownContext())
go s.disconnectClients(lifecycle, allClientsExited)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this does not work, this is actually a bug. The shutdown handlers should get a chance to run before the clients are disconnected forcefully.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work the way it's intended currently: disconnectClients() waits until the context timeout (hardcoded 30 seconds currently in main) and then force-disconnects all clients. All this PR does is make this delay configurable.

I delay the shutdown handler until all clients are disconnected because running them means shutting down things like the metrics server which may be needed if the delay is too long, we want to know what happened during that time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future it may be beneficial to add a message when a shutdown is triggered prompting all users to re-connect.

go s.shutdownHandler(lifecycle, shutdownHandlerExited)

s.wg.Wait()
Expand Down Expand Up @@ -322,6 +323,7 @@ func (s *serverImpl) createConfiguration(
ServerVersion: s.cfg.ServerVersion.String(),
BannerCallback: func(conn ssh.ConnMetadata) string { return s.cfg.Banner },
}

for _, key := range s.hostKeys {
serverConfig.AddHostKey(key)
}
Expand Down
9 changes: 6 additions & 3 deletions main.go
Original file line number Diff line number Diff line change
Expand Up @@ -182,10 +182,10 @@ func startServices(cfg config.AppConfig, loggerFactory log.LoggerFactory) error
return err
}

return startPool(pool, lifecycle)
return startPool(pool, lifecycle, cfg)
}

func startPool(pool Service, lifecycle service.Lifecycle) error {
func startPool(pool Service, lifecycle service.Lifecycle, cfg config.AppConfig) error {
starting := make(chan struct{})
lifecycle.OnStarting(
func(s service.Service, l service.Lifecycle) {
Expand All @@ -204,12 +204,15 @@ func startPool(pool Service, lifecycle service.Lifecycle) error {
rotateSignals := make(chan os.Signal, 1)
signal.Notify(exitSignals, exitSignalList...)
signal.Notify(rotateSignals, rotateSignalList...)

deadline := cfg.SSH.GracefulTerminationDeadline + 5 * time.Second
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 seconds plus is a very arbitrary number, we should come up with something better.


go func() {
if _, ok := <-exitSignals; ok {
// ok means the channel wasn't closed
shutdownContext, cancelFunc := context.WithTimeout(
context.Background(),
20*time.Second,
deadline,
)
defer cancelFunc()
lifecycle.Stop(
Expand Down