move-results tar Jcf excessive? #207
What is the time difference you are seeing in your case? |
So it's a 20x speedup for tar on this dataset. You do take 3x as much space when using gzip compression, but most of the time the network copy time slowdown isn't as great as the slowdown from using tar Jcf compression. Another option here is to have move-results background the compression and copy, so that you can then proceed with new test results. For example, it could rename the directories to a holding area and then fork a process to compress and copy the results. If you use "nice" then the CPU usage isn't competing with the application under test. This isn't necessarily representative, pretty small. but try it! [root@gprfs041-10ge pbench-agent]# echo 3 > /proc/sys/vm/drop_caches real 0m2.994s [root@gprfs041-10ge pbench-agent]# echo 3 > /proc/sys/vm/drop_caches real 0m43.782s [root@gprfs041-10ge pbench-agent]# ls -l /tmp/{x,y} |
Classic trade-off: time vs. space. We chose to lean towards space, since some of the data collected can compress even better with xz. So how are you using move-results: one at a time,
-- OR --
all at once? |
There will be rioting in the streets if this were to happen :) If we are not already doing multi-threaded compression, we should. I doubt tar has support for it, but we should be able to pipe tar to a separate utility like xz --threads=0. I am curious how effective the multi-threading could be. |
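A quick illustration of piping tar through a separate, multi-threaded xz process as suggested above; the directory and output names are placeholders:

```sh
# tar does no compression itself; a separate xz process compresses the stream.
# --threads=0 tells xz to use one worker thread per available CPU.
tar cf - my-run-dir | xz --threads=0 -c > my-run-dir.tar.xz
```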
I am open to using bzip if the only effect is that the data copied over the network is bigger (we end up uncompressing it anyway on the repo, right?). But some other things to consider:

- copy data over the network as we tar/compress, instead of in two steps (see the sketch below). |
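A one-step variant of the "copy while we tar/compress" idea: stream the archive over ssh as it is created, so nothing is staged locally and the network transfer overlaps with compression. The host and path names here are made up, and this is not the existing move-results behavior:

```sh
# Tar and compress on the fly, streaming the result straight to the server.
tar cf - my-run-dir | xz --threads=0 -c |
    ssh user@results-host 'cat > /srv/pbench/incoming/my-run-dir.tar.xz'
```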
@atheurer, we archive the tar balls, so that will grow the archived data size. The untarred data is expendable, but I would like to keep the size of the archives down as much as possible. |
@portante OK, I did not realize that, but it makes sense. We could always just defer the compression until the tar file is copied to the repository, so there's no time spent on compressing on the test system. I would hope we could process multiple incoming data sets on the repository at once, so we can use all of the available CPUs. I am still curious about rsync as a replacement for scp. Not sure if it is better, but it may include built-in checksums. |
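To make the "defer compression to the repository" idea concrete, here is a hedged server-side sketch that compresses any uncompressed tar balls in an assumed incoming directory, running one xz job per available CPU. The directory layout is an assumption, not the pbench server's actual workflow:

```sh
# Compress incoming uncompressed tar balls in parallel, one xz job per CPU.
# xz replaces each foo.tar with foo.tar.xz in place.
find /srv/pbench/incoming -name '*.tar' -print0 | xargs -0 -P "$(nproc)" -n 1 xz
```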
On 03/31/2016 09:29 AM, Andrew Theurer wrote:
    I would expect that compressing on the test system might be more [...]

I would assume that rsync would underperform scp, since you will probably [...]. If the upload is getting interrupted and needing to be restarted a lot [...]. On a previous tool I used, we actually went the opposite direction, for [...]. ...so what about having a default behavior (like the current one), but [...]?

Karl Rister [email protected] |
On 03/30/2016 10:45 PM, Andrew Theurer wrote: [...]

I can't speak to xz, but I have extensively used pbzip2 as a replacement [...]. In a situation like pbench where you control the client and server side [...].

Karl Rister [email protected] |
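Since pbzip2 came up as a parallel replacement for bzip2, here is a hedged sketch of how it would slot into the same tar pipeline; the names are placeholders:

```sh
# pbzip2 is a parallel bzip2: it splits the input into blocks and compresses
# them on all available cores; the output is readable by plain bunzip2.
tar cf - my-run-dir | pbzip2 -c > my-run-dir.tar.bz2
```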
FWIW, an xz --threads run got me about a 4x speed-up on a small-ish tar file. |
About a previous comment: "rsync -ravuz" will do gzip compression on the fly, saving network bandwidth, if that is what you are concerned about. With this approach, if the copy is interrupted, it can be resumed right where it left off as long as the target directory is still there. Of course, the compression doesn't help much if there are a ton of small files. In that case, tar zcf - | wins, I think. |
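A hedged example of the rsync approach being described, with made-up source and destination paths; -z compresses the data in flight, and re-running the same command after an interruption picks up with the files that have not yet been transferred:

```sh
# Copy the raw (uncompressed) results tree, compressing on the wire only.
# Re-running the same command resumes by skipping files already transferred.
rsync -ravuz /var/lib/pbench-agent/my-run-dir/ \
    user@results-host:/srv/pbench/staging/my-run-dir/
```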
So, could we use tar (no compression), then rsync -ravuz, then on the repo, compress the tar file with xz --threads? |
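Putting that three-step proposal together as a sketch, with hypothetical paths on both ends; none of this is existing pbench behavior:

```sh
# On the test system: plain tar, no compression (cheap on CPU).
tar cf /tmp/my-run-dir.tar -C /var/lib/pbench-agent my-run-dir

# Transfer with on-the-wire compression; an interrupted copy can be re-run.
rsync -ravuz /tmp/my-run-dir.tar user@results-host:/srv/pbench/incoming/

# On the repository: compress at rest using all available cores.
ssh user@results-host 'xz --threads=0 /srv/pbench/incoming/my-run-dir.tar'
```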
So doesn't scp also do compression? rsync is good at looking at two file system trees and determining what has to be transferred, avoiding re-sending data that is already there. In our current model we already know we need to copy it remotely, so just using scp with compression would be fine too. So are we saying we would not compress a tar ball at all? So a [...] |
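For reference, scp can compress in transit by reusing ssh's -C compression; a minimal, hedged example with placeholder names:

```sh
# -C enables ssh-level compression for the transfer; it helps with an
# uncompressed tar ball but is mostly wasted effort on an already-xz'd one.
scp -C /tmp/my-run-dir.tar user@results-host:/srv/pbench/incoming/
```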
Then on the server side, we would have potentially huge tar balls from many clients lying around until they can be compressed and archived. I think we could easily end up taxing CPU resources on a server, even with multiple threads, and it would require a very large staging area, which I think keeps this from scaling. @bengland2, so how are you using move-results? One at a time, or all at once? I think we should use xz --threads on the client, and consider compressing tool data as we post-process it so that we spread the time out over the pbench run. Thoughts? |
For now just using xz with threads seems like a good incremental improvement. |
I use move-results one at a time; I want to see the results right away. I think xz --threads might be good enough; I'll have to experiment. |
@bengland2, you might want to consider @ndokos's new [...] |
@portante, what became of this? I thought the rsync -avuz approach was not a bad idea, since it does less heavy compression while copying over the network and defers at-rest compression until the other end.

I tried [...] I chose this example because it contained a lot of text files and therefore should be similar to pbench in terms of compressibility, I thought. I saw this on a 2-socket Sandy Bridge named pcloud13.perf.lab.eng.bos.redhat.com:

[table: threads vs. seconds]

compression ratios (original size was ~2.4 GB):

[table: method vs. compression ratio]

@atheurer Overall, gzip seems the most efficient in using the least CPU while getting most of the compression that the other tools get. |
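For anyone wanting to repeat that kind of comparison, a hedged sketch of the measurement loop being described; the test directory, the compressor list, and the output path are assumptions, and it needs root (for drop_caches) under bash:

```sh
# Time each compressor on the same tar stream and record the output size;
# results depend heavily on the data and the machine.
for c in "gzip -c" "bzip2 -c" "pbzip2 -c" "xz -c" "xz --threads=0 -c"; do
    sync; echo 3 > /proc/sys/vm/drop_caches   # drop page cache between runs
    echo "== $c =="
    time sh -c "tar cf - ./testdir | $c > /tmp/out.dat"
    ls -l /tmp/out.dat
done
```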
It isn't clear to me why we have to use "tar Jcf" as part of move-results. Wouldn't "tar zcf" be sufficient and a lot faster? You still get good compression with text files using this. It really slows me down because it takes a long time and I can't start another test until move-results finishes, right?
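For clarity, the two invocations being compared in this issue; the paths are illustrative:

```sh
# Current behavior: xz compression via tar (slow, best ratio).
time tar Jcf /tmp/result.tar.xz /var/lib/pbench-agent/my-run-dir

# Proposed alternative: gzip compression via tar (much faster, larger output).
time tar zcf /tmp/result.tar.gz /var/lib/pbench-agent/my-run-dir
```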