
move-results tar Jcf excessive? #207

Open
bengland2 opened this issue Mar 30, 2016 · 17 comments

@bengland2
Contributor

It isn't clear to me why we have to use "tar Jcf" as part of move-results. Wouldn't "tar zcf" be sufficient and a lot faster? You still get good compression on text files with it. It really slows me down because it takes a long time, and I can't start another test until move-results finishes, right?

@portante
Member

What is the time difference you are seeing in your case?

@bengland2
Contributor Author

So it's a 20x speedup for tar on this dataset. You do take about 3x as much space when using gzip compression, but most of the time the extra network copy time from the larger file isn't as great as the time lost to tar Jcf compression.

Another option here is to have move-results background the compression and copy, so that you can then proceed with new test results. For example, it could rename the directories to a holding area and then fork a process to compress and copy the results. If you use "nice" then the CPU usage isn't competing with the application under test.
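For illustration, a minimal sketch of what that backgrounding could look like; this is not actual pbench code, and the run directory, holding-area path, and the push_results helper are all hypothetical placeholders:

    #!/bin/bash
    # Hypothetical sketch of a backgrounded move-results; paths and the
    # push_results helper are placeholders, not pbench code.
    pbench_run=/var/lib/pbench-agent        # placeholder for wherever results land
    holding_area=/var/tmp/pbench-holding
    mkdir -p "$holding_area"

    # Rename finished result directories into the holding area so the next
    # test can start immediately.
    for dir in "$pbench_run"/*__*; do
        [ -d "$dir" ] || continue
        mv "$dir" "$holding_area"/
    done

    # Compress and copy in the background at low priority so the CPU usage
    # doesn't compete with the application under test.
    nohup nice -n 19 push_results "$holding_area" >"$holding_area/push.log" 2>&1 &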

This dataset isn't necessarily representative (it's pretty small), but try it:

    [root@gprfs041-10ge pbench-agent]# echo 3 > /proc/sys/vm/drop_caches
    [root@gprfs041-10ge pbench-agent]# time tar zcf /tmp/y user-benchmark__2016-03-29_18:14:11

    real 0m2.994s
    user 0m2.323s
    sys 0m0.146s

    [root@gprfs041-10ge pbench-agent]# du -s -m user-benchmark__2016-03-29_18:14:11
    107 user-benchmark__2016-03-29_18:14:11
    [root@gprfs041-10ge pbench-agent]# find !$ -type f | wc -l
    find user-benchmark__2016-03-29_18:14:11 -type f | wc -l
    442

    [root@gprfs041-10ge pbench-agent]# echo 3 > /proc/sys/vm/drop_caches
    [root@gprfs041-10ge pbench-agent]# time tar Jcf /tmp/x user-benchmark__2016-03-29_18:14:11

    real 0m43.782s
    user 0m43.578s
    sys 0m0.610s

    [root@gprfs041-10ge pbench-agent]# ls -l /tmp/{x,y}
    -rw-r--r-- 1 root root  4506160 Mar 30 16:12 /tmp/x
    -rw-r--r-- 1 root root 11623098 Mar 30 16:14 /tmp/y

@portante
Member

Classic trade-off: time vs. space. We chose to lean towards space, since some of the data collected compresses even better than 4:1 with xz.

So how are you using move-results?

    # pbench-user-benchmark -C test1 -- sleep 10
    # move-results
    # pbench-user-benchmark -C test2 -- sleep 5
    # move-results

-- OR --

    # pbench-user-benchmark -C test1 -- sleep 10
    # pbench-user-benchmark -C test2 -- sleep 5
    # move-results

@atheurer
Contributor

> Another option here is to have move-results background the compression and copy, so that you can then proceed with new test results. For example, it could rename the directories to a holding area and then fork a process to compress and copy the results. If you use "nice" then the CPU usage isn't competing with the application under test.

There will be rioting in the streets if this were to happen :)

If we are not already doing multi-threaded compression, we should. I doubt tar has support for it, but we should be able to pipe tar to a separate utility like xz --threads=0. I am curious how effective the multi-threading could be.
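As a rough example of that pipeline (assuming an xz new enough to support threading, i.e. 5.2 or later; the directory name is just the one from the transcript above):

    # stream the archive through multi-threaded xz; -T0 uses one thread per core
    tar cf - user-benchmark__2016-03-29_18:14:11 \
        | xz -T0 > user-benchmark__2016-03-29_18:14:11.tar.xz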

@atheurer
Contributor

I am open to using bzip2 if the only effect is that the data copied over the network is bigger (we end up uncompressing it anyway on the repo, right?). But some other things to consider:

- copy data over the network as we tar/compress, instead of doing it in two steps (a rough sketch follows below).
- use another way to move data, like rsync, IF it is much faster and easy for us to use.
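A minimal sketch of the one-step idea from the first bullet; remote.example.com, the staging path, and the result directory name are placeholders:

    # compress while streaming over the network instead of tar-then-copy
    result_dir=user-benchmark__2016-03-29_18:14:11
    tar zcf - "$result_dir" \
        | ssh remote.example.com "cat > /pbench/stage/${result_dir}.tar.gz"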

@portante
Member

@atheurer, we archive the tar balls, so that will grow the archived data size. The untarred data is expendable, but I would like to keep the size of the archives down as much as possible.

@atheurer
Contributor

@portante OK, I did not realize that, but it makes sense. We could always just defer the compression until the tar file is copied to the repository, so there's no time spent compressing on the test system. I would hope we could process multiple incoming data sets on the repository at once, so we can use all of the available CPUs.

I am still curious about rsync as a replacement for scp. Not sure if it is better, but it may include a built-in checksum.

@k-rister
Member

On 03/31/2016 09:29 AM, Andrew Theurer wrote:

> @portante https://github.com/portante OK, I did not realize that, but it makes sense. We could always just defer the compression until the tar file is copied to the repository, so there's no time spent compressing on the test system. I would hope we could process multiple incoming data sets on the repository at once, so we can use all of the available CPUs.

I would expect that compressing on the test system might be more important outside of a controlled lab environment than inside it, since there may be a lot more variability in network throughput and latency. This might be important to non-RH users or for cloud testing.

> I am still curious about rsync as a replacement for scp. Not sure if it is better, but it may include a built-in checksum.

I would assume that rsync would underperform scp, since you will probably get better transmit rates on a single stream of a large file than on multiple streams of variable-size files. I think rsync is an awesome tool, but I'm not sure its strengths are exploited in a one-time transfer.

If the upload is getting interrupted and needing to be restarted a lot, then rsync might be very interesting.

On a previous tool I used, we actually went the opposite direction: for some historic reason we transferred a file at a time using scp/sftp, but once we started doing testing in a public cloud we added support for compressing on the test system prior to transfer, and it improved performance and reliability considerably. That tool actually had multiple transfer methods that the user could choose from depending on their use case...

...so what about having a default behavior (like the current one), but allowing the user to change it with an option if the situation warrants? It seems like the worst-case scenario of such an operation would be an archive with mixed compression types; is that really so bad?

Karl Rister [email protected]

@k-rister
Member

On 03/30/2016 10:45 PM, Andrew Theurer wrote:

>> Another option here is to have move-results background the compression and copy, so that you can then proceed with new test results. For example, it could rename the directories to a holding area and then fork a process to compress and copy the results. If you use "nice" then the CPU usage isn't competing with the application under test.
>
> There will be rioting in the streets if this were to happen :)
>
> If we are not already doing multi-threaded compression, we should. I doubt tar has support for it, but we should be able to pipe tar to a separate utility like xz --threads=0. I am curious how effective the multi-threading could be.

I can't speak to xz, but I have extensively used pbzip2 as a replacement for bzip2 and consider it extremely effective at using available CPU resources to significantly reduce compression time. The only problem with it is that an archive must be compressed with it to get the benefits when uncompressing, and its usage is not widespread.

In a situation like pbench, where you control the client and server side, this really isn't a problem.

Karl Rister [email protected]
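For what it's worth, a pbzip2 pipeline might look roughly like this, assuming pbzip2 is installed on both ends; the directory name is a placeholder:

    # compress with all available cores via pbzip2
    result_dir=user-benchmark__2016-03-29_18:14:11
    tar cf - "$result_dir" | pbzip2 -c > "${result_dir}.tar.bz2"

    # decompression is also parallel when pbzip2 is used on the receiving side
    pbzip2 -dc "${result_dir}.tar.bz2" | tar xf -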

@atheurer
Contributor

FWIW, xz --threads got me about a 4x speed-up on a small-ish tar file.

@bengland2
Contributor Author

About a previous comment: "rsync -ravuz" will do gzip compression on the fly, saving network bandwidth, if that is what you are concerned about. With this approach, if the copy is interrupted, it can be resumed right where it left off, as long as the target directory is still there. Of course, the compression doesn't help much if there are a ton of small files; in that case, tar zcf - | wins, I think.

@atheurer
Contributor

So, could we use tar (no compression), then rsync -ravuz, then on the repo, compress the tar file with xz --threads?
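A rough sketch of that flow; hostnames and paths are placeholders, and it assumes an xz on the repo side new enough to support --threads:

    dir=user-benchmark__2016-03-29_18:14:11
    tar cf "${dir}.tar" "$dir"                                  # no compression on the client
    rsync -ravuz "${dir}.tar" repo.example.com:/pbench/stage/   # compress only on the wire
    ssh repo.example.com "xz -T0 /pbench/stage/${dir}.tar"      # compress at rest, in parallel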

@portante
Member

So doesn't scp also do compression? rsync is good at looking at two file system trees and determining what has to be transferred, avoiding re-sending data that is already there. In our current model we already know everything needs to be copied to the remote side, so just using scp with compression would be fine too.

So are we saying we would not compress a tar ball at all? So a move-results would:

  • For each directory
    • tar cf result.tar
    • md5sum result.tar
    • scp -C -o CompressionLevel=9 result.tar remote.host.example.com:/pbench/
    • ssh remote.host.example.com md5sum /pbench/result.tar
    • compare sums and declare success or failure

Then on the server side, we would have potentially huge tar balls from many clients lying around until they can be compressed and archived.

I think we could easily end up taxing CPU resources on the server, even with multiple threads, and it would require a very large staging area, which I think keeps this from scaling.

@bengland2, so how are you using move-results? One at a time, or all at once?

I think we should use xz --threads on the client, and consider compressing tool data as we post-process it so that we spread the time out over the pbench-run.

Thoughts?
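For reference, the per-directory sequence listed above might look roughly like this in shell form (remote.host.example.com and /pbench are the placeholders from the list):

    for dir in *__*; do
        tar cf "${dir}.tar" "$dir"
        local_sum=$(md5sum "${dir}.tar" | awk '{print $1}')
        scp -C -o CompressionLevel=9 "${dir}.tar" remote.host.example.com:/pbench/
        remote_sum=$(ssh remote.host.example.com "md5sum /pbench/${dir}.tar" | awk '{print $1}')
        if [ "$local_sum" = "$remote_sum" ]; then
            echo "$dir: transfer verified"
        else
            echo "$dir: checksum mismatch" >&2
        fi
    done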

@atheurer
Contributor

For now just using xz with threads seems like a good incremental improvement.

@bengland2
Contributor Author

I use move-results one at a time; I want to see the results right away.

I think xz --threads might be good enough; I'll have to experiment.

@portante
Member

@bengland2, you might want to consider @ndokos's new pbench-web-server RPM, which you can install on your client and immediately review your results before sending them remotely.

@bengland2
Contributor Author

@portante, what became of this? I thought rsync -avuz was not a bad idea, since it does lighter compression while copying over the network and defers at-rest compression to the other end.

I tried

    tar cf usr-share.tar /usr/share

I chose this example because it contains a lot of text files and therefore, I thought, should be similar to pbench data in terms of compressibility. I saw this on a 2-socket Sandy Bridge named pcloud13.perf.lab.eng.bos.redhat.com:

    threads  seconds  method
         12       75  xz -T 12 (all 12 cores around 95% CPU util)
          1      723  xz (default, 1 thread)
          1       84  gzip
          1      220  bzip2

compression ratios (original size was ~2.4 GB):

    method  compression ratio
    xz      3.5
    gzip    2.5
    bzip2   2.75

@atheurer Overall, gzip seems the most efficient: it uses the least CPU while getting most of the compression the other tools achieve.
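For anyone wanting to reproduce a comparison like this, the runs were presumably along these lines (the exact invocations are a guess, and compressing to stdout keeps the input tar intact between runs):

    tar cf usr-share.tar /usr/share
    for c in "xz -T 12" "xz" "gzip" "bzip2"; do
        echo 3 > /proc/sys/vm/drop_caches            # start each run from a cold page cache
        time $c -c usr-share.tar > usr-share.compressed
        ls -l usr-share.compressed                   # compressed size -> compression ratio
    done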
