-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update trace_log.md #106
Open
TheraniAA
wants to merge
1
commit into
main
Choose a base branch
from
ashwini-ahire7-patch-3
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Update trace_log.md #106
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,25 +8,25 @@ description: >- | |
|
||
## Collecting query execution flamegraph using system.trace_log | ||
|
||
ClickHouse® has embedded functionality to analyze the details of query performance. | ||
ClickHouse® provides an embedded functionality to analyze query performance in detail through the system.trace_log table. | ||
|
||
It's `system.trace_log` table. | ||
By default, this table collects information about queries that run longer than 1 second, capturing stack traces every second. However, you can customize the trace collection using the following settings: | ||
|
||
By default it collects information only about queries when runs longer than 1 sec (and collects stacktraces every second). | ||
query_profiler_real_time_period_ns: Adjusts the interval for collecting stack traces based on real time. | ||
query_profiler_cpu_time_period_ns: Adjusts the interval based on CPU time. | ||
Both settings work similarly by dumping stack traces of all threads running the query at the specified intervals. The real-time profiler helps in identifying situations where CPU usage is low but significant time is spent on I/O, while the CPU timer pinpoints hot spots in calculations more accurately by skipping I/O time. | ||
|
||
You can adjust that per query using settings `query_profiler_real_time_period_ns` & `query_profiler_cpu_time_period_ns`. | ||
Note: Attempting to collect stack traces at frequencies higher than a few KHz is typically not feasible. | ||
|
||
Both works very similar (with desired interval dump the stacktraces of all the threads which execute the query). | ||
real timer - allows to 'see' the situations when cpu was not working much, but time was spend for example on IO. | ||
cpu timer - allows to see the 'hot' points in calculations more accurately (skip the io time). | ||
|
||
Trying to collect stacktraces with a frequency higher than few KHz is usually not possible. | ||
|
||
To check where most of the RAM is used you can collect stacktraces during memory allocations / deallocation, by using the | ||
setting `memory_profiler_sample_probability`. | ||
#### Memory Profiling | ||
To examine RAM usage, stack traces can be collected during memory allocations and deallocations using the memory_profiler_sample_probability setting. | ||
|
||
## Tools for Analyzing Stack Traces | ||
|
||
### clickhouse-speedscope | ||
This tool processes the collected stack traces for analysis. | ||
|
||
Installation: | ||
|
||
```bash | ||
# install | ||
|
@@ -37,56 +37,78 @@ pip3 install -r requirements.txt | |
``` | ||
|
||
For debugging particular query: | ||
``` | ||
1. In the clickhouse-client, set your desired profiler settings: | ||
```sql | ||
clickhouse-client | ||
|
||
SET query_profiler_cpu_time_period_ns=1000000; -- 1000 times per 'cpu' sec | ||
-- or SET query_profiler_real_time_period_ns=2000000; -- 500 times per 'real' sec. | ||
-- or SET memory_profiler_sample_probability=0.1; -- to debug the memory allocations | ||
|
||
-- or | ||
SET query_profiler_real_time_period_ns=2000000; -- 500 times per 'real' sec. | ||
-- or | ||
SET memory_profiler_sample_probability=0.1; -- to debug the memory allocations | ||
``` | ||
2. Execute your query: | ||
```sql | ||
SELECT ... <your select> | ||
|
||
``` | ||
3. Flush the logs: | ||
```sql | ||
SYSTEM FLUSH LOGS; | ||
|
||
-- get the query_id from the clickhouse-client output or from system.query_log (also pay attention on query_id vs initial_query_id for distributed queries). | ||
``` | ||
4. Retrieve the query_id from either the clickhouse-client output or from system.query_log (note the difference between query_id and initial_query_id for distributed queries). | ||
5. Process the stack traces: | ||
|
||
Now let's process that: | ||
``` | ||
```bash | ||
python3 main.py & # start the proxy in background | ||
python3 main.py --query-id 908952ee-71a8-48a4-84d5-f4db92d45a5d # process the stacktraces | ||
fg # get the proxy from background | ||
Ctrl + C # stop it. | ||
``` | ||
|
||
To access ClickHouse with other username / password etc. - see the sources of https://github.com/laplab/clickhouse-speedscope/blob/master/main.py | ||
|
||
For additional configuration options, refer to the source of the tool: clickhouse-speedscope https://github.com/laplab/clickhouse-speedscope/blob/master/main.py | ||
|
||
### clickhouse-flamegraph | ||
|
||
Installation & usage instructions: https://github.com/Slach/clickhouse-flamegraph | ||
For more advanced visualizations, clickhouse-flamegraph can be used. Installation and usage instructions are available on its GitHub page https://github.com/Slach/clickhouse-flamegraph | ||
|
||
### pure flamegraph.pl examples | ||
Example for Generating a Flamegraph: | ||
|
||
``` | ||
1. Install FlameGraph: | ||
```bash | ||
git clone https://github.com/brendangregg/FlameGraph /opt/flamegraph | ||
|
||
``` | ||
2. Generate a flamegraph from the system.trace_log: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is not the second step with clickhouse-flamegraph, this is different approach |
||
```bash | ||
clickhouse-client -q "SELECT arrayStringConcat(arrayReverse(arrayMap(x -> concat( addressToLine(x), '#', demangle(addressToSymbol(x)) ), trace)), ';') AS stack, count() AS samples FROM system.trace_log WHERE event_time >= subtractMinutes(now(),10) GROUP BY trace FORMAT TabSeparated" | /opt/flamegraph/flamegraph.pl > flamegraph.svg | ||
|
||
clickhouse-client -q "SELECT arrayStringConcat((arrayMap(x -> concat(splitByChar('/', addressToLine(x))[-1], '#', demangle(addressToSymbol(x)) ), trace)), ';') AS stack, sum(abs(size)) AS samples FROM system.trace_log where trace_type = 'Memory' and event_date = today() group by trace order by samples desc FORMAT TabSeparated" | /opt/flamegraph/flamegraph.pl > allocs.svg | ||
``` | ||
3. For memory allocation analysis: | ||
```bash | ||
clickhouse-client -q "SELECT arrayStringConcat(arrayReverse(arrayMap(x -> concat(splitByChar('/', addressToLine(x))[-1], '#', demangle(addressToSymbol(x)) ), trace)), ';') AS stack, count() AS samples FROM system.trace_log where trace_type = 'Memory' group by trace FORMAT TabSeparated SETTINGS allow_introspection_functions=1" | /opt/flamegraph/flamegraph.pl > ~/mem1.svg | ||
``` | ||
|
||
### similar using perf | ||
|
||
``` | ||
|
||
### Using perf | ||
|
||
Using perf for System-Wide Profiling | ||
perf can be used for system-wide profiling, particularly for capturing CPU usage: | ||
|
||
1. Install necessary tools: | ||
```bash | ||
apt-get update -y | ||
apt-get install -y linux-tools-common linux-tools-generic linux-tools-`uname -r`git | ||
apt-get install -y clickhouse-common-static-dbg clickhouse-common-dbg | ||
mkdir -p /opt/flamegraph | ||
git clone https://github.com/brendangregg/FlameGraph /opt/flamegraph | ||
|
||
``` | ||
2. Record the performance data: | ||
```bash | ||
perf record -F 99 -p $(pidof clickhouse) -G | ||
``` | ||
3. Process the performance data: | ||
|
||
```bash | ||
perf script > /tmp/out.perf | ||
/opt/flamegraph/stackcollapse-perf.pl /tmp/out.perf | /opt/flamegraph/flamegraph.pl > /tmp/flamegraph.svg | ||
``` | ||
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is now also a global profiler, and obe more setting for memory - collecting peaks, and step