Add pt-galera-log-explainer #669

Merged on Nov 30, 2023 (48 commits)
Commits
b4cad31
Add pt-galera-log-explainer
ylacancellera Aug 30, 2023
b054429
Improve: "not found" error handling
ylacancellera Sep 1, 2023
e03fe18
Fix: forgot to inherit conflicts
ylacancellera Sep 1, 2023
5bbb11d
Add: merge by directory, Simplified: timeline merges
ylacancellera Sep 1, 2023
7f8071c
Simplified the main loop
ylacancellera Sep 8, 2023
bbba087
Fix: crash when timeline is empty
ylacancellera Sep 12, 2023
5638387
Fix: --version formatting
ylacancellera Sep 14, 2023
6d6f303
Fix: error msg with uppercase, usage missed pt-
ylacancellera Sep 14, 2023
fea9be2
Add: regression tests
ylacancellera Sep 19, 2023
75ce300
Fix: "not found" should not be an error
ylacancellera Sep 20, 2023
0d554e9
Add: merge regression tests
ylacancellera Sep 20, 2023
6613880
Change: README to pt standard, regex-list --json
ylacancellera Sep 25, 2023
a7a4b4a
Add: --no-color regression test
ylacancellera Sep 25, 2023
3ad09d8
Update README.rst
ylacancellera Sep 25, 2023
72c55a1
Remove: extra space in --version out
ylacancellera Oct 4, 2023
94c89c2
Add: requirements in README.rst
ylacancellera Oct 4, 2023
b0477bc
Fix: don't cut node name if it's an ip
ylacancellera Oct 5, 2023
12ea803
Add: proper error handling for files
ylacancellera Oct 10, 2023
161c2af
Fix: pointer dereference if votes was missing
ylacancellera Oct 10, 2023
117d683
Change: simplify verbose mode
ylacancellera Oct 11, 2023
208708a
Add: concurrent SSTs handling
ylacancellera Oct 16, 2023
ce6ea4e
Remove: propagation of ip propagation to older hash, Add: operator re…
ylacancellera Oct 18, 2023
41a593b
Improve: main_test
ylacancellera Oct 18, 2023
f5d0ef7
Add: operator member assocations regex
ylacancellera Oct 18, 2023
478066d
Add: same ip/name limitation on README.rst
ylacancellera Oct 20, 2023
978daa6
Add: shortuuid check, new date layout found
ylacancellera Oct 20, 2023
4c2272e
Add: inconsistent vote regex corner-case
ylacancellera Oct 23, 2023
230bece
Refactoring: migrate translations to singleton
ylacancellera Oct 26, 2023
602ca34
Commenting out whois+sed, will be re-added later
ylacancellera Nov 7, 2023
14b6a77
Add: translate tests
ylacancellera Nov 7, 2023
7400fd8
Remove old comments, dead code
ylacancellera Nov 13, 2023
ecf068a
Fix: typos
ylacancellera Nov 21, 2023
2c9b853
Remove: --grep-args
ylacancellera Nov 21, 2023
17c8026
PR-669 - Add pt-galera-log-explainer percona#669
ylacancellera Nov 21, 2023
6baca93
Fix: README with -vvv, --grep-args, regex-list
ylacancellera Nov 21, 2023
923a199
Update src/go/pt-galera-log-explainer/README.rst
ylacancellera Nov 21, 2023
6cc6435
Add: parallel on 2 unit tests
ylacancellera Nov 21, 2023
304bc91
Imp: use strings.builder for conflicts
ylacancellera Nov 22, 2023
e3d48a0
Fix: missed errors, minor formatting issues
ylacancellera Nov 22, 2023
7360c2d
Rename ctx to logCtx, remove any mention of ctx
ylacancellera Nov 23, 2023
400e343
Move: chan closing in "main" func
ylacancellera Nov 24, 2023
d82556e
preallocate lastcontexts map
ylacancellera Nov 24, 2023
d4a3146
PR-669 - Add pt-galera-log-explainer
svetasmirnova Nov 30, 2023
22b9600
Merge branch '3.x' into galera-log-explainer
svetasmirnova Nov 30, 2023
e8bf46c
PR-669 - Add pt-galera-log-explainer
svetasmirnova Nov 30, 2023
6da3c9f
PR-669 - Add pt-galera-log-explainer
svetasmirnova Nov 30, 2023
fc838f2
PR-669 - Add pt-galera-log-explainer
svetasmirnova Nov 30, 2023
e7ef29a
PR-669 - Add pt-galera-log-explainer
svetasmirnova Nov 30, 2023
13 changes: 13 additions & 0 deletions .typos.toml
@@ -4,11 +4,13 @@ extend-exclude = [
"sandbox/sakila.sql",
"sandbox/sakila-db/",
"sandbox/servers/",
"src/go/pt-galera-log-explainer/tests/",
"src/go/pt-secure-collect/testdata/",
"util/NaturalDocs/",
"util/pod2rst-fixed.packed",
# For now
"t/",
"**/*_test.go",
]
ignore-hidden = false

@@ -30,12 +32,15 @@ extend-ignore-re = [
]

[default.extend-identifiers]
"Bootstraping" = "Bootstraping" # typo in Galera lib
"Continuent" = "Continuent"
"END_ND_SUMMARY" = "END_ND_SUMMARY"
"END_ND_TOOLTIPS" = "END_ND_TOOLTIPS"
"EXPLAINed" = "EXPLAINed"
"FH_ND_FILE" = "FH_ND_FILE"
"INSERTs" = "INSERTs"
"IST" = "IST"
"istError" = "istError"
"JOINed" = "JOINed"
"MERCHANTIBILITY" = "MERCHANTIBILITY"
"ND" = "ND"
@@ -46,8 +51,16 @@ extend-ignore-re = [
"NDOnResize" = "NDOnResize"
"NULLable" = "NULLable"
"O_WRONLY" = "O_WRONLY"
"receving" = "receving" # typo in Galera lib
"RegexFailedToPrepareIST" = "RegexFailedToPrepareIST"
"RegexISTFailed" = "RegexISTFailed"
"RegexISTReceived" = "RegexISTReceived"
"RegexISTReceiver" = "RegexISTReceiver"
"RegexISTSender" = "RegexISTSender"
"RegexXtrabackupISTReceived" = "RegexXtrabackupISTReceived"
"START_ND_SUMMARY" = "START_ND_SUMMARY"
"START_ND_TOOLTIPS" = "START_ND_TOOLTIPS"
"TOI" = "TOI"
"UNIONed" = "UNIONed"
"UNIONs" = "UNIONs"
"xtrabackup_ist" = "xtrabackup_ist"
233 changes: 233 additions & 0 deletions docs/pt-galera-log-explainer.rst
@@ -0,0 +1,233 @@
.. _pt-galera-log-explainer:

==================================
:program:`pt-galera-log-explainer`
==================================

Filter, aggregate, and summarize multiple Galera logs together.
This is a toolbox to help navigate Galera logs.

Usage
=====

.. code-block:: bash

pt-galera-log-explainer [--since=] [--until=] [-vv] [--merge-by-directory] [--pxc-operator] <command> <paths ...>


Commands available
==================

list
~~~~

.. code-block:: bash

pt-galera-log-explainer [flags] list { --all | [--states] [--views] [--events] [--sst] [--applicative] } <paths ...>

List key events in chronological order from any number of nodes (SST, view changes, general errors, maintenance operations).
It aggregates logs together by identifying them using node names, IPs, and internal Galera identifiers.



Logs can come from a single node:

.. code-block:: bash

pt-galera-log-explainer list --all --since 2023-01-05T03:24:26.000000Z /var/log/mysql/*.log

or from multiple nodes:

.. code-block:: bash

pt-galera-log-explainer list --all *.log

You can filter by event type:

.. code-block:: bash

pt-galera-log-explainer list --sst --views *.log

..
whois
~~~~~
Find out information about nodes, using any type of info

.. code-block:: bash

pt-galera-log-explainer whois '218469b2' mysql.log
{
"input": "218469b2",
"IPs": [
"172.17.0.3"
],
"nodeNames": [
"galera-node2"
],
"hostname": "",
"nodeUUIDs:": [
"218469b2",
"259b78a0",
"fa81213d"
]
}

Using any type of information

.. code-block:: bash

pt-galera-log-explainer whois '172.17.0.3' mysql.log
pt-galera-log-explainer whois 'galera-node2' mysql.log


conflicts
~~~~~~~~~

List all replication failure votes (Galera 4).

.. code-block:: bash

pt-galera-log-explainer conflicts [--json|--yaml] *.log

ctx
~~~

Get the context that the tool built for a single log.
It contains everything the tool extracted from the log file: version, SST information, known UUID-IP-node name mappings, ...

.. code-block:: bash

pt-galera-log-explainer ctx mysql.log

regex-list
~~~~~~~~~~

Prints every implemented regex:

* regex: the regex that will be used against the log files
* internalRegex: the Go regex that will be used to extract pieces of information
* type: the regex group it belongs to
* verbosity: the verbosity level required for it to be printed

.. code-block:: bash

pt-galera-log-explainer regex-list

Available flags
~~~~~~~~~~~~~~~

``-h``, ``--help``
Show help and exit.

``--no-color``
Remove all special color characters.

``--since``
Only list events after this date. It will affect the regex applied to the logs.
Format: 2023-01-23T03:53:40Z (RFC3339)

``--until``
Only list events before this date. This is only implemented in the tool loop; it does not alter regexes.
Format: 2023-01-23T03:53:40Z (RFC3339)

``--merge-by-directory``
Instead of relying on extracted information, logs will be merged by their base directory.
It is useful when logs are very sparse and already organized by nodes.

``-v``, ``--verbosity``
``-v``: display in the timeline every mysql info the tool used
``-vv``: internal tool debug

``--pxc-operator``
Analyze logs from the Percona PXC operator.
It will prevent logs from being merged together, add operator-specific regexes, and fine-tune regexes for logs taken from pt-k8s-debug-collector.
Off by default because it negatively impacts performance for non-k8s setups.

``--exclude-regexes``
Remove regexes from analysis. Use 'pt-galera-log-explainer regex-list | jq .' to get the list.

``--grep-cmd``
Path to the grep v3 binary. On Darwin systems, it may need to be set to ``ggrep``.
Default: ``grep``

``--version``
Show version and exit.
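
Both ``--since`` and ``--until`` take RFC3339 timestamps. As a rough illustration of what such a time window means, here is a minimal Go sketch. The ``inWindow`` helper is hypothetical and is not how the tool applies the flags (``--since`` affects the regexes, as noted above); treating both bounds as inclusive is also an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// inWindow reports whether ts lies between since and until (both inclusive,
// which is an assumption). An empty bound means "no limit" on that side.
func inWindow(ts, since, until string) (bool, error) {
	t, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		return false, fmt.Errorf("parsing timestamp: %w", err)
	}
	if since != "" {
		s, err := time.Parse(time.RFC3339, since)
		if err != nil {
			return false, fmt.Errorf("parsing since: %w", err)
		}
		if t.Before(s) {
			return false, nil
		}
	}
	if until != "" {
		u, err := time.Parse(time.RFC3339, until)
		if err != nil {
			return false, fmt.Errorf("parsing until: %w", err)
		}
		if t.After(u) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Go accepts fractional seconds when parsing with the RFC3339 layout.
	ok, err := inWindow("2023-03-12T19:43:17.630191Z", "2023-03-12T19:41:28Z", "2023-03-12T19:44:59Z")
	fmt.Println(ok, err)
}
```

A value such as ``2023-03-12 19:45:00`` (space instead of ``T``, no zone) is rejected by this parser, which is why the RFC3339 form shown in the flag descriptions matters.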


Example outputs
===============

.. code-block:: bash

$ pt-galera-log-explainer list --all --no-color --since=2023-03-12T19:41:28.493046Z --until=2023-03-12T19:44:59.855491Z tests/logs/upgrade/*
identifier 172.17.0.2 node2 tests/logs/upgrade/node3.log
current path tests/logs/upgrade/node1.log tests/logs/upgrade/node2.log tests/logs/upgrade/node3.log
last known ip 172.17.0.2
last known name node2
mysql version 8.0.28

2023-03-12T19:41:28.493046Z starting(8.0.28) | |
2023-03-12T19:41:28.500789Z started(cluster) | |
2023-03-12T19:43:17.630191Z | node3 joined |
2023-03-12T19:43:17.630208Z node3 joined | |
2023-03-12T19:43:17.630221Z node2 joined | |
2023-03-12T19:43:17.630243Z | node1 joined |
2023-03-12T19:43:17.634138Z | | node2 joined
2023-03-12T19:43:17.634229Z | | node1 joined
2023-03-12T19:43:17.643210Z | PRIMARY(n=3) |
2023-03-12T19:43:17.648163Z | | PRIMARY(n=3)
2023-03-12T19:43:18.130088Z CLOSED -> OPEN | |
2023-03-12T19:43:18.130230Z PRIMARY(n=3) | |
2023-03-12T19:43:18.130916Z OPEN -> PRIMARY | |
2023-03-12T19:43:18.904410Z will receive IST(seqno:178226792) | |
2023-03-12T19:43:18.913328Z | | node1 cannot find donor
2023-03-12T19:43:18.913429Z node1 cannot find donor | |
2023-03-12T19:43:18.913565Z | node1 cannot find donor |
2023-03-12T19:43:19.914122Z | | node1 cannot find donor
2023-03-12T19:43:19.914259Z node1 cannot find donor | |
2023-03-12T19:43:19.914362Z | node1 cannot find donor |
2023-03-12T19:43:20.914957Z | | (repeated x97)node1 cannot find donor
2023-03-12T19:43:20.915143Z (repeated x97)node1 cannot find donor | |
2023-03-12T19:43:20.915262Z | (repeated x97)node1 cannot find donor |
2023-03-12T19:44:58.999603Z | | node1 cannot find donor
2023-03-12T19:44:58.999791Z node1 cannot find donor | |
2023-03-12T19:44:58.999891Z | node1 cannot find donor |
2023-03-12T19:44:59.817822Z timeout from donor in gtid/keyring stage | |
2023-03-12T19:44:59.839692Z SST error | |
2023-03-12T19:44:59.840669Z | | node2 joined
2023-03-12T19:44:59.840745Z | | node1 left
2023-03-12T19:44:59.840933Z | node3 joined |
2023-03-12T19:44:59.841034Z | node1 left |
2023-03-12T19:44:59.841189Z NON-PRIMARY(n=1) | |
2023-03-12T19:44:59.841292Z PRIMARY -> OPEN | |
2023-03-12T19:44:59.841352Z OPEN -> CLOSED | |
2023-03-12T19:44:59.841515Z terminated | |
2023-03-12T19:44:59.841529Z former SST cancelled | |
2023-03-12T19:44:59.848349Z | | node1 left
2023-03-12T19:44:59.848409Z | | PRIMARY(n=2)
2023-03-12T19:44:59.855443Z | node1 left |
2023-03-12T19:44:59.855491Z | PRIMARY(n=2) |

Requirements
============

grep, version 3.
On Darwin-based operating systems, grep is only version 2 due to license limitations. ``--grep-cmd`` can be used to point to the correct grep binary, usually ``ggrep``.


Compatibility
=============

* Percona XtraDB Cluster: 5.5 to 8.0
* MariaDB Galera Cluster: 10.0 to 10.6
* logs from PXC operator pods (error.log, recovery.log, post.processing.log)

Known issues
============

* Nodes sharing the same IP, or nodes with identical names, are not supported.
* Sparse files can be misidentified, resulting in many columns being displayed. ``--merge-by-directory`` can be used, but the files must already be organized in separate directories.
This mainly happens when the log file does not contain enough information.
* Some information may seem to be missing. Depending on the case, it may simply not be implemented yet, or it was disabled later because it was found to be unreliable (node index numbers, for example, are not reliable).
* Column widths are sometimes too large to be easily readable. This usually happens when printing SST events with long node names.
* Using ``list`` on PXC operator logs can silently lead to broken results; ``--pxc-operator`` should be used.
* When some display corner cases seem broken (events not deduplicated, ...), it is because of extra hidden internal events.
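
The directory layout that ``--merge-by-directory`` relies on can be illustrated with a minimal Go sketch. This is an assumption about the general idea (one directory per node), not the tool's code; ``groupByDir`` and the sample paths are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// groupByDir buckets log paths by their parent directory, so that every
// file found under the same directory is treated as the same node.
func groupByDir(paths []string) map[string][]string {
	groups := make(map[string][]string)
	for _, p := range paths {
		dir := filepath.Dir(p)
		groups[dir] = append(groups[dir], p)
	}
	return groups
}

func main() {
	// Hypothetical layout: one directory per node.
	groups := groupByDir([]string{
		"node1/mysqld.log",
		"node1/mysqld.log.1",
		"node2/mysqld.log",
	})

	// Sort directory names so the output order is stable.
	dirs := make([]string, 0, len(groups))
	for d := range groups {
		dirs = append(dirs, d)
	}
	sort.Strings(dirs)
	for _, d := range dirs {
		fmt.Println(d, groups[d])
	}
}
```

With logs from several nodes mixed in a single directory, this grouping would collapse them into one node, which is why the files need to be separated beforehand.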
11 changes: 8 additions & 3 deletions go.mod
@@ -4,11 +4,14 @@ go 1.21

require (
github.com/AlekSi/pointer v1.2.0
github.com/Ladicle/tabwriter v1.0.0
github.com/Masterminds/semver v1.5.0
github.com/alecthomas/kingpin v2.2.6+incompatible
github.com/alecthomas/kong v0.8.1
github.com/davecgh/go-spew v1.1.1
github.com/go-ini/ini v1.67.0
github.com/golang/mock v1.6.0
github.com/google/go-cmp v0.5.9
github.com/google/uuid v1.4.0
github.com/hashicorp/go-version v1.6.0
github.com/howeyc/gopass v0.0.0-20210920133722-c8aef6fb66ef
@@ -18,27 +21,31 @@ require (
github.com/pborman/getopt v1.1.0
github.com/percona/go-mysql v0.0.0-20210427141028-73d29c6da78c
github.com/pkg/errors v0.9.1
github.com/rs/zerolog v1.30.0
github.com/shirou/gopsutil v3.21.11+incompatible
github.com/sirupsen/logrus v1.9.3
github.com/stretchr/testify v1.8.4
go.mongodb.org/mongo-driver v1.13.0
golang.org/x/crypto v0.15.0
golang.org/x/exp v0.0.0-20230321023759-10a507213a29
gopkg.in/mgo.v2 v2.0.0-20190816093944-a6b53ec6cb22
gopkg.in/yaml.v2 v2.4.0
k8s.io/api v0.28.4
k8s.io/utils v0.0.0-20230406110748-d93618cff8a2
)

require (
github.com/alecthomas/template v0.0.0-20190718012654-fb15b899a751 // indirect
github.com/alecthomas/units v0.0.0-20211218093645-b94a6e3cc137 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/go-logr/logr v1.2.4 // indirect
github.com/go-ole/go-ole v1.2.6 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/snappy v0.0.4 // indirect
github.com/google/gofuzz v1.2.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/klauspost/compress v1.16.3 // indirect
github.com/mattn/go-colorable v0.1.12 // indirect
github.com/mattn/go-isatty v0.0.14 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
@@ -55,11 +62,9 @@ require (
golang.org/x/term v0.14.0 // indirect
golang.org/x/text v0.14.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/apimachinery v0.28.4 // indirect
k8s.io/klog/v2 v2.100.1 // indirect
k8s.io/utils v0.0.0-20230406110748-d93618cff8a2 // indirect
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.2.3 // indirect
)