Monday, January 24, 2011

A Tar Gzip Split

The issue:

We had a 350 GB text file, gzip’ed down to 115 GB, that needed to be moved from one of our Linux servers to a client’s Linux server. As usual, they wanted it there yesterday. To speed up the transfer, I wanted to split the file so that it could be sftp’ed in parallel. Of course, unzipping, splitting, and re-zipping would take a lot of time, not to mention disk space that was not available. What to do…

The answer:

Here is what I did.

First I split the gzip’ed file using the following command:

split -b 100m /u1/directoryname/somefilename.gz

This split the zipped source file into 11 pieces, which split names xaa, xab, and so on in the current directory. That allowed them to be sftp’ed in parallel, greatly reducing the overall transfer time. Once the 11 files were on the target server, they were put back together and then uncompressed.
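To see how the piece size passed to -b relates to the number of pieces, here is a small sketch on a throwaway file. The name sample.gz and the sizes are made up for illustration; split treats its input as plain bytes, so the file does not need to be a real gzip archive for this demo.

```shell
# Small-scale demo: sample.gz and the sizes below are hypothetical stand-ins.
workdir=$(mktemp -d)
cd "$workdir"

# 1,050,000 bytes of filler data standing in for the real compressed file.
head -c 1050000 /dev/urandom > sample.gz

# Cut it into 100,000-byte pieces; split names them xaa, xab, ... in the current directory.
split -b 100000 sample.gz

# 1,050,000 / 100,000 = 10.5, so there are 10 full pieces plus one partial piece.
pieces=$(ls x* | wc -l | tr -d ' ')
echo "$pieces"
```

The number of pieces is the file size divided by the piece size, rounded up, so the value given to -b is what to adjust if you want more or fewer parallel transfers.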

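cat x* > somefilename.gz

Note: it is important to use x* here. The shell expands it in sorted order (xaa, xab, xac, …), which is the same order split wrote the pieces in. If you try to list the pieces by hand and get the order wrong, the resulting gzip’ed file will be corrupt.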
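One way to confirm that the pieces went back together correctly is to compare a checksum of the original file with one of the rebuilt file. This is only a sketch on a small generated file; the names original.gz and rebuilt.gz are assumptions for the example, and in the real transfer the first checksum would be taken on the source server and the second on the target.

```shell
# Round-trip demo in a scratch directory: file names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small gzip file to stand in for the real one.
seq 1 200000 | gzip > original.gz

# Split it, then reassemble the pieces in glob (sorted) order.
split -b 100000 original.gz
cat x* > rebuilt.gz

# cksum reading from stdin prints only the CRC and byte count, so the two values compare cleanly.
sum_before=$(cksum < original.gz)
sum_after=$(cksum < rebuilt.gz)

if [ "$sum_before" = "$sum_after" ]; then
    echo "checksums match"
else
    echo "checksums differ"
fi
```

Keep in mind that x* picks up every file in the directory whose name starts with x, so it is safest to reassemble in a directory that holds nothing but the pieces.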

Then this new file must be unzipped:

gunzip somefilename.gz

I got no errors, and a quick head and wc -l on the file showed everything to be fine.
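If disk space is tight, the same kind of check can be run before uncompressing anything to disk. The sketch below assumes a small generated file named sample.gz (a made-up name); gzip -t tests the archive's integrity, and gunzip -c streams the contents to wc -l without writing the uncompressed file.

```shell
# Integrity check demo: sample.gz is a hypothetical stand-in for the reassembled file.
workdir=$(mktemp -d)
cd "$workdir"

# A 1,000-line text file, compressed.
seq 1 1000 | gzip > sample.gz

# gzip -t exits non-zero if the compressed data is damaged.
if gzip -t sample.gz; then
    echo "archive OK"
fi

# Count lines by streaming the contents, leaving sample.gz in place.
lines=$(gunzip -c sample.gz | wc -l | tr -d ' ')
echo "$lines"
```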
Thx
