New Go version performance article (2024 edition)
benhoyt committed Apr 13, 2024
1 parent 35333d7 commit 1e47722
Showing 5 changed files with 89 additions and 1 deletion.
Binary file added images/goawk-speed-2024.png
79 changes: 79 additions & 0 deletions writings/go-version-performance-2024.md
@@ -0,0 +1,79 @@
---
layout: default
title: "Go performance from version 1.0 to 1.22"
permalink: /writings/go-version-performance-2024/
description: "Shows how much the performance of Go has improved from version 1.0 through to 1.22 (including PGO) -- in its compiler, runtime, and libraries."
---
<h1>{{ page.title }}</h1>
<p class="subtitle">April 2024</p>


Two years ago I [compared](/writings/go-version-performance/) two different benchmarks of my [GoAWK interpreter](https://github.com/benhoyt/goawk) on all the versions of Go from 1.2 through 1.18.

In this article I re-run those benchmarks, adding the missing Go versions (1.0 and 1.1) as well as the new versions (1.19 through 1.22). I also include results with profile-guided optimisation (PGO), which was added in Go 1.20. I'll quote a fair bit from my original article so you don't have to re-read the old one to understand the setup.

There are many ways programs written in Go have gotten faster: the Go team and external contributors have improved the compiler and have optimized the runtime, garbage collector, and standard library. Here we compare GoAWK's performance when compiled using each released version of Go from 1.0 through 1.22 -- the latest at the time of writing.

I tested this by running GoAWK on two AWK programs which represent different extremes of what you can do with AWK: I/O with string processing, and number crunching.

First we have `countwords`, a string processing task that counts the frequencies of words in the input and prints out the words with their counts. This is the kind of thing that's typical for an AWK script. The input is a 10x concatenated version of the King James Bible (which I've [used before](/writings/count-words/) for performance comparisons). Here's the code:

```awk
{
    for (i=1; i<=NF; i++)
        counts[tolower($i)]++
}
END {
    for (k in counts)
        print k, counts[k]
}
```
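
For readers who don't read AWK, roughly the same logic can be sketched in Python. This is an illustration only, not part of GoAWK or the benchmark; the names `count_words` and `format_counts` are made up here, and Python's `str.lower()` may treat non-ASCII text differently from an AWK `tolower`.

```python
from collections import Counter


def count_words(lines):
    """Count lowercased, whitespace-separated words across all lines."""
    counts = Counter()
    for line in lines:
        # str.split() with no arguments splits on runs of whitespace,
        # much like AWK's default splitting of a record into $1..$NF.
        for word in line.split():
            counts[word.lower()] += 1
    return counts


def format_counts(counts):
    """Mirror the END block: one "word count" string per distinct word."""
    # AWK leaves the order of `for (k in counts)` unspecified;
    # here the order is simply first-seen order.
    return [f"{word} {n}" for word, n in counts.items()]
```

Calling `format_counts(count_words(open(path)))` on a text file would give the same word/count pairs as the AWK program, possibly in a different order.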

The second program is `sumloop`, a tight loop that adds five copies of the loop counter to a running sum on each of 10 million iterations. This one's not really a typical use of AWK, but makes for a good test of the GoAWK bytecode interpreter loop:

```awk
BEGIN {
    for (i=0; i<10000000; i++)
        sum += i+i+i+i+i
}
```
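
The loop never prints `sum`, but its final value can be checked by hand: it is 5 times the sum of 0 through 9,999,999, which is 5 × n(n−1)/2 = 249,999,975,000,000 for n = 10,000,000. AWK numbers are double-precision floats, and this value is well under 2^53, so the result is exact. A small Python sketch of the same arithmetic (the function names are illustrative, not from GoAWK):

```python
def sumloop(n):
    """Direct translation of the AWK loop, using a float like AWK does."""
    total = 0.0
    for i in range(n):
        total += i + i + i + i + i
    return total


def sumloop_closed_form(n):
    """Closed form: sum of 5*i for i in 0..n-1 equals 5 * n * (n - 1) / 2."""
    return 5 * n * (n - 1) // 2


# The value the benchmark loop ends up with (n = 10 million).
EXPECTED = sumloop_closed_form(10_000_000)  # 249999975000000
```

The closed form is only a sanity check; the point of the benchmark is the ten million trips through the interpreter loop, not the answer.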

I had to tweak GoAWK's code slightly to get it to compile on older Go versions. This was especially true for Go 1.0, which doesn't have [`bufio.Scanner`](https://pkg.go.dev/bufio#Scanner), a type GoAWK uses heavily. For 1.0, I used the Go 1.1 implementation of `bufio.Scanner`.

The timing numbers in the chart are the time in seconds on my x86-64 Linux laptop (best of three runs). The blue line is `countwords` and the red line is `sumloop` (incidentally, I had mis-labelled the results last time). Note that **the Y axis is logarithmic** this time, in an effort to see the more subtle improvements in recent versions more clearly.

Also included on the chart are the GoAWK binary sizes for each Go version -- that's the light grey line.

Once again, I used a [Python script](https://gist.github.com/benhoyt/50eea688bc8de697218fe982488e2467) to run them all and measure the timings. Here is the chart (or [as a table](https://gist.github.com/benhoyt/c9e1db52103e88c725facce4361c8a26) if you prefer):

![GoAWK speed across Go versions](/images/goawk-speed-2024.png)
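
The "best of three runs" measurement described above can be sketched as follows. This is a minimal illustration and not the linked script; `best_of` and `time_command` are hypothetical helper names, and the binary and file names in the usage note are placeholders.

```python
import subprocess
import time


def best_of(run, runs=3):
    """Call run() `runs` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best


def time_command(argv, runs=3):
    """Time an external command, discarding its standard output."""
    return best_of(
        lambda: subprocess.run(argv, check=True, stdout=subprocess.DEVNULL),
        runs,
    )
```

For example, `time_command(["./goawk_1.22", "-f", "countwords.awk", "input.txt"])` would time one build of GoAWK on one program. Taking the minimum rather than the mean reduces the effect of other activity on the machine.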

**The biggest improvements come in versions 1.3, 1.5, 1.7, and 1.12.** After that, the speedups are very gradual -- all the low-hanging fruit has long since been picked.

This time there was a **strange bump for `countwords` in Go 1.2**: it went from 7.5s in 1.1 to 25.5s in 1.2 (!), and then down to 2.8s in 1.3. This is almost certainly caused by the [stack "hot split" issue](https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzWtrZFQ5suE8qr2sD8uWQ/pub), which was [fixed in 1.3](https://go.dev/doc/go1.3#stacks) when the Go team changed "the implementation of goroutine stacks away from the old, 'segmented' model to a contiguous model."
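
To put the size of that bump in perspective, here is the arithmetic on the three `countwords` timings quoted above (only those three numbers come from the measurements; the variable names are illustrative):

```python
# countwords timings in seconds, as quoted in the text
t_go11, t_go12, t_go13 = 7.5, 25.5, 2.8

slowdown_1_2 = t_go12 / t_go11  # how much slower 1.2 was than 1.1
speedup_1_3 = t_go12 / t_go13   # how much faster 1.3 was than 1.2
net_speedup = t_go11 / t_go13   # 1.1 to 1.3, ignoring the anomaly

print(f"1.1 -> 1.2: {slowdown_1_2:.1f}x slower")   # 3.4x slower
print(f"1.2 -> 1.3: {speedup_1_3:.1f}x faster")    # 9.1x faster
print(f"1.1 -> 1.3: {net_speedup:.1f}x faster")    # 2.7x faster
```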

I figured out the cause of the 1.2 anomaly by profiling, and noticing that runtime stack operations made up a huge percentage of the running time. Here are the first few lines of the `pprof` output:

```
$ go tool pprof --text ./goawk_1.2 go12.prof
Total: 1830 samples
332 18.1% 18.1% 332 18.1% runtime.newstack
296 16.2% 34.3% 296 16.2% runtime.memclr
281 15.4% 49.7% 281 15.4% runtime.oldstack
222 12.1% 61.8% 619 33.8% github.com/benhoyt/goawk/interp.(*interp).execute
91 5.0% 66.8% 91 5.0% runtime.lessstack
75 4.1% 70.9% 133 7.3% github.com/benhoyt/goawk/interp.(*interp).callBuiltin
57 3.1% 74.0% 57 3.1% runtime.stackfree
53 2.9% 76.9% 81 4.4% strings.FieldsFunc
...
```
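
Adding up the stack-related entries in that listing shows how large the share is. The sketch below parses the lines shown above and sums the flat sample counts; treating every `runtime.*` function with "stack" in its name as stack management is an assumption made for this illustration (`runtime.memclr` is left out, although some of its time may be stack-related too).

```python
PPROF_TEXT = """\
332 18.1% 18.1% 332 18.1% runtime.newstack
296 16.2% 34.3% 296 16.2% runtime.memclr
281 15.4% 49.7% 281 15.4% runtime.oldstack
222 12.1% 61.8% 619 33.8% github.com/benhoyt/goawk/interp.(*interp).execute
91 5.0% 66.8% 91 5.0% runtime.lessstack
75 4.1% 70.9% 133 7.3% github.com/benhoyt/goawk/interp.(*interp).callBuiltin
57 3.1% 74.0% 57 3.1% runtime.stackfree
53 2.9% 76.9% 81 4.4% strings.FieldsFunc
"""
TOTAL_SAMPLES = 1830  # from the "Total: 1830 samples" line


def flat_samples(text):
    """Map function name (last column) to flat sample count (first column)."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        result[fields[-1]] = int(fields[0])
    return result


def is_stack_func(name):
    # Assumption: runtime functions with "stack" in the name manage stacks.
    return name.startswith("runtime.") and "stack" in name


samples = flat_samples(PPROF_TEXT)
stack_samples = sum(n for name, n in samples.items() if is_stack_func(name))
stack_percent = 100 * stack_samples / TOTAL_SAMPLES

print(f"{stack_samples} of {TOTAL_SAMPLES} samples ({stack_percent:.1f}%)")
# prints: 761 of 1830 samples (41.6%)
```

So roughly two fifths of the samples land directly in `newstack`, `oldstack`, `lessstack`, and `stackfree`, which is consistent with the hot split explanation.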

**PGO improves the performance by only a few percent,** about 2% for `countwords` and 7% for `sumloop` using Go 1.22. I compile the released GoAWK binaries with PGO.

**Binary size has remained fairly stable over the years**, apart from the large bump in 1.2. Even with PGO enabled, binaries are only about 5% bigger, so I think it's normally worth it.

Overall, `countwords` is now about 8x as fast as it would have been with Go 1.0, and `sumloop` is 24x as fast. Thanks, Go team, for all your hard work over the years!


{% include sponsor.html %}
2 changes: 1 addition & 1 deletion writings/go-version-performance.md
@@ -16,7 +16,7 @@ There are many ways programs written in Go have gotten faster: the Go team and e

I tested by running GoAWK on two AWK programs which represent different extremes of what you can do with AWK: I/O and string processing, and number crunching.

- First, `countwords`, a string processing task that counts the frequencies of words in the input and prints out the words with their counts. This is the kind of thing that's typical for an AWK script. The input is a 10x concatenated version of the King James Bible (which I've [used before](/writings/count-words/) for performance comparisons) Here's the code:
+ First, `countwords`, a string processing task that counts the frequencies of words in the input and prints out the words with their counts. This is the kind of thing that's typical for an AWK script. The input is a 10x concatenated version of the King James Bible (which I've [used before](/writings/count-words/) for performance comparisons). Here's the code:

```awk
{
1 change: 1 addition & 0 deletions writings/index.md
@@ -15,6 +15,7 @@ Below are the programming-related articles I’ve written (see also my [**non-te

<h2 id="y2024">2024</h2>

* [Go version performance (2024 edition)](/writings/go-version-performance-2024/) uses GoAWK to show how the speed of Go programs has improved from version 1.0 through to 1.22. (April)
* [The One Billion Row Challenge in Go](/writings/go-1brc/) presents my nine Go solutions to this problem, which range in running time from 1m45s down to 3.4s. (March)
* [Switching from S3 to Tigris on Fly.io](/writings/flyio-and-tigris/) describes how I moved my side project from using Amazon S3 for file hosting to using Tigris. (February)
* [*Stray Pointers* interview](/writings/stray-pointers-interview/): Jim Lawless interviewed me on his podcast (this is a transcription). We discuss Forth, C, CGI, Python, Go, and AWK. (January)
8 changes: 8 additions & 0 deletions writings/rss.xml
@@ -7,6 +7,14 @@
<ttl>3600</ttl>
<atom:link href="https://benhoyt.com/writings/rss.xml" rel="self" type="application/rss+xml" />

<item>
<title>Go performance from version 1.0 to 1.22</title>
<description>Shows how much the performance of Go has improved from version 1.0 through to 1.22 (including PGO) -- in its compiler, runtime, and libraries.</description>
<link>https://benhoyt.com/writings/go-version-performance-2024/</link>
<guid>https://benhoyt.com/writings/go-version-performance-2024/</guid>
<pubDate>Sat, 13 Apr 2024 18:00:00 +1200</pubDate>
</item>

<item>
<title>The One Billion Row Challenge in Go: from 1m45s to 3.4s in nine solutions</title>
<description>How I solved the One Billion Row Challenge (1BRC) in Go nine times, from a simple unoptimised version that takes 1 minute 45 seconds, to an optimised and parallelised version that takes 3.4 seconds.</description>

5 comments on commit 1e47722

@3052 commented on 1e47722 Apr 13, 2024

Please don't use an unlabeled axis in the picture. "100 10 1" doesn't mean anything without units.

@benhoyt (Owner, Author)

@3052 the key/legend on the right has the units: seconds (s) for the time and MB for the binary size.

@benhoyt (Owner, Author)

@3052 the text of the article also mentions the times are in seconds.

@3052 commented on 1e47722 Apr 13, 2024


The numbers have multiple meanings? That's weird.

@benhoyt (Owner, Author)


A little. Then again, the specific values are less interesting than the trends/deltas. If you want the absolute numbers, you can click the link to the table.
