I regularly backup the Raspbian system of my several Raspberry Pi’s, for reasons that anyone owning and using a Raspberry Pi knows.
With time you end up always wanting more, and I want to upload backups on the cloud for that additional layer of extra safety; cloud space, though, is either free and very limited, or quite costly to mantain, hence the smaller the files you upload, the more practical is sending them online.
With this purpose in mind, I wanted to try several compression options, using the same source file (a 3.7GB image file produced by my latest version of RaspiBackup -the “bleeding edge” which right now is in its own branch), but changing some parameters from the default “Ultra settings” (the one you can find in 7z manpage).
All tests were done on a non overclocked Raspberry Pi 4 with 4GB of RAM.
First test goes with the “ultra settings” comandline found in 7z manpage:
time 7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on archive.7z source.img
7-Zip [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)
Scanning the drive:
1 file, 3981165056 bytes (3797 MiB)
Creating archive: archive.7z
Items to compress: 1
Files read from disk: 1
Archive size: 695921344 bytes (664 MiB)
Everything is Ok
real 50m33.638s
user 73m16.589s
sys 0m44.505s
Second test builds on this, and increases the dictionary size to 128MB (which is, alas, the maximum allowed for 32bit systems as per 7zip documentation, any value above this will throw an error on the raspberry):
time 7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=128m -ms=on archive.7z source.img
7-Zip [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)
Scanning the drive:
1 file, 3981165056 bytes (3797 MiB)
Creating archive: archive.7z
Items to compress: 1
Files read from disk: 1
Archive size: 625572636 bytes (597 MiB)
Everything is Ok
real 59m54.703s
user 80m50.340s
sys 0m55.886s
Third test puts another variable in the equation, by adding the -mmc=10000
parameter, which tells the algorithm to cycle ten thousand times to find matches in the dictionary, hence increasing the possibility of a better compression, from the default number of cycles which should be in this case less than 100.
time 7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=128m -mmc=10000 -ms=on archive.7z source.img
7-Zip [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)
Scanning the drive:
1 file, 3981165056 bytes (3797 MiB)
Creating archive: archive.7z
Items to compress: 1
Files read from disk: 1
Archive size: 625183257 bytes (597 MiB)
Everything is Ok
real 77m53.377s
user 99m48.431s
sys 0m39.215s
I then tried one last command line that I found on Stack Exchange network:
time 7z a -t7z -mx=9 -mfb=32 -ms -md=31 -myx=9 -mtm=- -mmt -mmtf -md=128m -mmf=bt3 -mpb=0 -mlc=0 archive.7z source.img
and I cannot find that answer anymore but it boasted the best compression rate ever (yeah, I imagine, everything was set to potential maximum). This commandline I had to tone down, because it implied increasing to the maximum possibile the dictionary size (which is 1536MB, but it’s not feasible on 32bit system which are limited to 128MB) and also the fast bytes to its maximum of 273.
I always got an error though:
ERROR: Can't allocate required memory!
even by gradually decreasing the -mfb (fast bytes) down to 32; even if I completely removed the fast bytes parameter. At this point I simply desisted.
So, onto the
Conclusions:
You should definitely pump up the dictionary size to its limit of 128MB, because it yields a discrete compression increase (down to 15.7% from 17.5%, so 10% smaller). According to this post the time increase must be measured as “user+sys”), so it’s 74 minutes of CPU time for first example, 81.75 minutes for the second, and 100.5 minutes for the third. The difference in CPU time between the first and second is still in the ballpark of 10%, so that additional time gets practically converted in better compression, I’ll take it.
Interestingly, increasing the matching cycles didn’t bring ANY increase in compression, at the expense of a whopping 25% increase in processing time (actually it did when I compared the exact file sizes, and it was negligible at just a few hundred kilobytes less).
Overall, this is is a great result, as the total free space in that image should be around 300MB, so the rest is all real data compression.