Improve IP Monitor documentation #675

Merged
merged 1 commit into from
Oct 22, 2024
4 changes: 3 additions & 1 deletion .github/workflows/trivy.yaml
@@ -97,4 +97,6 @@ jobs:

- name: Image Scan
shell: bash
run: make trivy-scan
run: |
echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u $ --password-stdin
make trivy-scan
41 changes: 37 additions & 4 deletions docs/coherence/090_ipmonitor.adoc
@@ -1,6 +1,6 @@
///////////////////////////////////////////////////////////////////////////////

Copyright (c) 2021, Oracle and/or its affiliates.
Copyright (c) 2021, 2024, Oracle and/or its affiliates.
Licensed under the Universal Permissive License v 1.0 as shown at
http://oss.oracle.com/licenses/upl.

@@ -10,9 +10,41 @@

== Coherence IPMonitor

The Coherence IPMonitor is a failure detection mechanism used by Coherence to detect machine failures. It does this by pinging the echo port, (port 7) on remote hosts that other cluster members are running on. When running in Kubernetes, every Pod has its own IP address, so it looks to Coherence like every member is on a different host. Failure detection using IPMonitor is less useful in Kubernetes than it is on physical machines or VMs, so the Operator disables the IPMonitor by default. This is configurable though and if it is felt that using IPMonitor is useful to an application, it can be re-enabled.
The Coherence IPMonitor is a failure detection mechanism used by Coherence to detect machine failures.
It does this by pinging the echo port (port 7) on the remote hosts that other cluster members are running on.
When running in Kubernetes, every Pod has its own IP address, so it looks to Coherence like every member is on a different host.
Failure detection using IPMonitor is less useful in Kubernetes than it is on physical machines or VMs, so the Operator disables
the IPMonitor by default. This is configurable, though, and if using IPMonitor would be useful to an application,
it can be re-enabled.

To re-enable IPMonitor set the boolean flag `enableIpMonitor` in the `coherence` section of the Coherence resource yaml:
=== Coherence Warning Message

Disabling IP Monitor causes Coherence to print a warning in the logs similar to the one shown below.
This can be ignored when using the Operator.

[source]
----
2024-07-01 14:43:55.410/3.785 Oracle Coherence GE 14.1.1.2206.10 (dev-jonathanknight) <Warning> (thread=Coherence, member=n/a): IPMonitor has been explicitly disabled, this is not a recommended practice and will result in a minimum death detection time of 300 seconds for failed machines or networks.
----

=== Re-Enable the IP Monitor

To re-enable IPMonitor, set the boolean flag `enableIpMonitor` to `true` in the `coherence` section of the Coherence resource yaml.

[CAUTION]
====
The Coherence IP Monitor works by using Java's `InetAddress.isReachable()` method to "ping" another cluster member's IP address.
Under the covers the JDK will typically use an ICMP echo request if it has the privilege to do so, otherwise it will try to
establish a TCP connection to port 7 (the echo port) of the server. This can fail if ICMP or port 7 is blocked,
for example using firewalls, or in Kubernetes using Network Policies or tools such as Istio.
In particular when using Network Policies it is impossible to add a rule that allows ICMP, as currently Network Policies
only support the TCP, UDP and SCTP protocols and not ICMP.

If the Coherence IP Monitor is enabled in a Kubernetes cluster where port 7 is blocked then the cluster will fail to start.
Typically, the issue will be seen as one member starting and becoming the senior member. None of the other cluster members
will be able to get IP Monitor to connect to the senior member, so they will fail to start.
====
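
The reachability check described in the caution above can be tried by hand from inside a Pod before enabling the IP Monitor.
The following is a minimal sketch only, not Coherence's own implementation: the class name `ReachabilityCheck`, its helper methods
and the two second timeout are all hypothetical and chosen for illustration. It calls the same JDK method, `InetAddress.isReachable()`,
against an address passed on the command line.

```java
import java.io.IOException;
import java.net.InetAddress;

// Hypothetical helper for illustration; not part of Coherence or the Operator.
public class ReachabilityCheck {

    // Returns true if the JDK reports the address as reachable within the timeout.
    // An I/O error while probing is treated as "not reachable".
    static boolean isReachable(InetAddress address, int timeoutMillis) {
        try {
            return address.isReachable(timeoutMillis);
        } catch (IOException e) {
            return false;
        }
    }

    // Builds a human readable summary of the probe result.
    static String describe(String host, boolean reachable) {
        return reachable
                ? host + " is reachable"
                : host + " is NOT reachable (ICMP or TCP port 7 may be blocked)";
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the loopback address when no host is supplied.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        InetAddress address = InetAddress.getByName(host);
        // The 2000 ms timeout is an arbitrary value for this sketch.
        System.out.println(describe(host, isReachable(address, 2000)));
    }
}
```

Running this with the IP address of another cluster member's Pod as the argument gives a rough indication of whether the
IP Monitor would be able to reach that member; a "NOT reachable" result suggests the IP Monitor should stay disabled.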

The yaml below shows an example of re-enabling the IP Monitor.

[source,yaml]
.coherence-storage.yaml
@@ -26,4 +58,5 @@ spec:
enableIpMonitor: true
----

Setting `enableIpMonitor` will disable the IPMonitor, which is the default behaviour when `enableIpMonitor` is not specified in the yaml.
Setting the `enableIpMonitor` field to `false` will disable the IPMonitor, which is the default behaviour when `enableIpMonitor` is
not specified in the yaml.
15 changes: 15 additions & 0 deletions docs/troubleshooting/01_trouble-shooting.adoc
@@ -35,6 +35,8 @@ This page will be updated and maintained over time to include common issues we s

* <<arm-java8, I'm using Arm64 and Java 8 and the JVM will not start due to using G1GC>>

* <<ipmon, Why do I see warnings about IPMonitor being disabled when Coherence starts>>

== Issues

[#no-operator]
@@ -250,3 +252,16 @@ This will cause errors on Arm64 Java 8 JMS unless the JVM option `-XX:+UnlockExp
added in the Coherence resource spec (see <<docs/jvm/030_jvm_args.adoc,Adding Arbitrary JVM Arguments>>).
Alternatively specify a different garbage collector, ideally on a version of Java this old, use CMS
(see <<docs/jvm/040_gc.adoc,Garbage Collector Settings>>).

[#ipmon]
=== Why do I see warnings about IPMonitor being disabled when Coherence starts

When Coherence starts, a message similar to the following is displayed in the Coherence container's log:

[source]
----
2024-07-01 14:43:55.410/3.785 Oracle Coherence GE 14.1.1.2206.10 (dev-jonathanknight) <Warning> (thread=Coherence, member=n/a): IPMonitor has been explicitly disabled, this is not a recommended practice and will result in a minimum death detection time of 300 seconds for failed machines or networks.
----

This message is displayed because the default behaviour of the Operator is to disable the Coherence IP Monitor,
see the <<_coherence_operator_api_docs/coherence/090_ipmonitor.adoc,IP Monitor documentation>> for an explanation.