Quick start: splitting the file “myfile” with the default options produces three files: split000, split001 and split002. Combining the three files split000, split001 and split002 produces the file “recovered”. Comparing “recovered” with “myfile” will find no differences.
The utility gsecraif is a tool to split and recombine files in a secure manner. It is designed to run in a Unix-like environment and has been tested under Ubuntu Linux. Gsecraif addresses two aspects of security: protecting against data loss and guarding against unauthorised access. Security is discussed in more detail in the Security document.
Gsecraif will split a file into three or more component files and can recombine them into the original file even if one of the component files is lost; this is achieved through the same XOR parity technique used in RAID 5 redundant disk striping.
Gsecraif splits a file in such a way that it is impractical to infer any information about the original file from any one of the component files; reconstructing the original file requires either all of the component files or all but one of them. If a file has been split with gsecraif into N component files, it is impractical to infer any information about the original file from N-2 or fewer component files. Other file splitting utilities, such as the conventional Unix “split” command, create components from which it might be possible to infer information about the original file. For example, if a text file is split into four components using the Unix “split” command, being able to read a quarter of the document may give the reader a good impression of what the original document was about.
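The recovery property described above can be illustrated with a minimal Python sketch of XOR parity. This is not gsecraif's actual file format or algorithm: the function names are hypothetical, the data is cut into plain contiguous stripes, and there are no headers, no bit reordering and no distribution of parity across components. It shows only why any one missing component can be rebuilt from the others, and it does not provide gsecraif's confidentiality properties.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, n: int) -> List[bytes]:
    """Cut data into n-1 equal stripes and append one XOR parity stripe."""
    if n < 3:
        raise ValueError("at least three components are needed")
    k = n - 1
    stripe_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(stripe_len * k, b"\0")  # pad so all stripes match
    stripes = [padded[i * stripe_len:(i + 1) * stripe_len] for i in range(k)]
    return stripes + [reduce(xor_bytes, stripes)]


def rebuild_missing(components: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single component given as None by XOR-ing the others."""
    missing = [i for i, c in enumerate(components) if c is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can rebuild at most one component")
    rebuilt = list(components)
    if missing:
        present = [c for c in components if c is not None]
        rebuilt[missing[0]] = reduce(xor_bytes, present)
    return rebuilt


original = b"attack at dawn"
parts = split_with_parity(original, 3)
parts[1] = None                      # simulate a lost component
restored = rebuild_missing(parts)
recovered = b"".join(restored[:-1])[:len(original)]  # drop parity and padding
```

Because the parity stripe is the XOR of all the data stripes, XOR-ing every surviving component reproduces the missing one, whichever one it is. Losing two components leaves too little information, which is why only a single loss can be tolerated.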
There are many well documented uses for file splitting utilities. Several uses of gsecraif are envisaged, the two main ones being file transfer and the storage of files on third party storage systems. Gsecraif is expected to be used where security is a concern, in terms of both protecting against data loss and guarding against unauthorised access.
The capacity of both data storage technology and data transfer technology (e.g. network bandwidth) is constantly increasing, but the capacity of readily available removable media often seems to be smaller than the capacity of fixed media. It also seems that the capacity and use of fixed media grow fast enough to make network transfer difficult. For these reasons it is often convenient or necessary to split a file into a number of smaller components that can be fitted on to removable media or transferred within other limits (such as email attachment size limits).
Using gsecraif to split a file in to three or more components which are then held on three or more removable media ensures that the original file can be recovered even if one of the removable media is lost. Using gsecraif also guards against unauthorised access to the original data if one of the removable media is stolen or intercepted. Gsecraif provides similar protection for components sent using data transfer technology such as email attachments or transferring component files using FTP.
From the earliest days of computing, companies have been sending backup copies of data “off site”. Storing data on external, third party data storage facilities became significantly easier and faster with the development of the public internet and was popularised by the concept of cloud computing. There are two particular problems with entrusting data to third parties. The first is the issue of retrieving data if the third party is unable to provide access to the data (technical difficulties) or if the third party ceases trading. The second is the issue of confidentiality. Some internet companies are infamous for changing their privacy policies, and companies with good reputations may be acquired by, or broken up into, companies with poor reputations. Simply storing copies of data on multiple external facilities can address the first issue, but may be expensive and exacerbates the second issue (there are now more companies that have to be trusted).
Gsecraif could be used to split files in to three or more components which are then held on three or more external data facilities. The original data would still be recoverable in the event of one of the third party companies experiencing difficulties or ceasing trading and it would be impractical for any of the third parties to infer any information about the original files from the component files that they hold.
This version of gsecraif is installed from source files. The tarball should be downloaded, then unzipped and untarred into a suitable directory. Once unpacked, the gsecraif directory will contain a src directory. Change to the src directory and run:
$ ./configure
then
$ make
As root run:
# make install
The utility gsecraif will be placed in /usr/local/bin
The current version of gsecraif does not provide extensive online help; however, the -h option, an invalid command line parameter or incorrect parameters will cause gsecraif to output basic usage information. E.g.
$ gsecraif -h
The utility gsecraif is a tool to split and recombine files in a secure manner.
Input is expected from standard-in, output is to standard-out.
Default action is to split standard-in in to a number of component files.
Names may be given for the component files; if no names are given, a prefix is
used followed by a three digit number starting with 000.
Usage: gsecraif [OPTIONS]... [FILES]...
-h, --help Print help and exit
-V, --version Print version and exit
-c, --combine Combine files
-d, --debug=INT Debug level N (Default is 0 - no output)
-n, --nfiles=INT Number of component files (default is 3)
-p, --prefix=STRING Set the file prfix (Default is split)
-r, --rotate=INT Rotate N bits
-t, --transpose Transpose bits (rotate is ignored)
-#, --hash Print # characters (every 1024 bytes)
Gsecraif will attempt to split input supplied on Standard-in. The utility will usually be used by piping a file or the output of some other command in to gsecraif. e.g.
$ cat myfile | gsecraif
Where myfile is a file to be split. As an example of piping the output of a command, a tar file could be split without actually creating the tar file itself:
$ tar -cf - * | gsecraif
Gsecraif will split a file in to three component files by default, called “split000”, “split001” and “split002” (see below).
By default gsecraif will split an original file into three component files, which is the minimum that can be specified. Gsecraif can split an original file into between three and two hundred and fifty-five component files; if a file needs to be split into more component files, the component files themselves can be further split (see cascading below). The number of component files is specified with the -n option, e.g.
$ cat myfile | gsecraif -n 5
would split a file called “myfile” in to five component files.
If file names are specified for the component files, then the number of component files is derived from the number of file names provided.
By default gsecraif will use a prefix of “split” followed by a three digit number for the names of the component files, e.g. split000, split001 and split002.
An alternative prefix may be specified using the -p parameter e.g.
$ cat myfile | gsecraif -p myfile
would produce “myfile000”, “myfile001” and “myfile002”.
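For scripts that need to know which component files to expect, the default naming convention described above (a prefix followed by a three digit number starting at 000) can be sketched in Python. The function name `component_names` is hypothetical and is not part of gsecraif; the 3 to 255 range check reflects the limits stated in this guide.

```python
from typing import List


def component_names(prefix: str = "split", nfiles: int = 3) -> List[str]:
    """Return the default component file names for a prefix and file count."""
    if not 3 <= nfiles <= 255:
        raise ValueError("the number of component files must be 3 to 255")
    # Three digit, zero padded counter starting at 000.
    return [f"{prefix}{i:03d}" for i in range(nfiles)]


default_names = component_names()
myfile_names = component_names("myfile", 5)
```

Here `default_names` holds split000 to split002 and `myfile_names` holds myfile000 to myfile004.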
Individual file names may be specified for each component file by providing the file names after any other parameters. If file names are provided, a minimum of three file names must be given and a prefix must not be provided.
$ cat myfile | gsecraif alpha beta gamma
Would split “myfile” in to three files called “alpha” “beta” and “gamma”.
In the current version of gsecraif, the order of the file names is critical and should be recorded for use when recovering the original file.
The default action of gsecraif is to split a file read in on Standard input. To combine a file the -c parameter must be specified. The combined file is output to Standard out.
By default gsecraif will try to combine three component files called “split000” “split001” and “split002”
e.g.
$ gsecraif -c > recovered
Would try to combine component files called “split000” “split001” and “split002” into a file called “recovered”.
If one of the above files was missing, gsecraif would try to recover the original file using parity in the remaining two files.
The number of component files is specified with the -n option, e.g.
$ gsecraif -c -n 5 > recovered
would try to combine five component files with a default prefix of “split” (i.e. split000 split001 .. split004) into a file called “recovered”. If one of the files was missing, gsecraif would try to recover the original file using parity in the remaining files.
An alternative prefix may be specified using the -p parameter e.g.
$ gsecraif -c -p myfile > recovered
would try to combine “myfile000” “myfile001” “myfile002” in to a file called “recovered”. If one of the files was missing, gsecraif would try to recover the original file using parity in the remaining files.
Individual file names may be specified for each component file by providing the file names after any other parameters.
$ gsecraif -c alpha beta gamma > recovered
would try to combine “alpha” “beta” and “gamma” in to a file called “recovered”.
Gsecraif will derive the number of component files from the number of file names specified. In the current version of gsecraif, the order of the file names is critical when recovering the original file. The order of the file names should have been recorded when splitting the file. It is important that the correct file names are specified in the correct order, even if one of the component files is missing; if the name of the missing component file is not known, an arbitrary file name can be given.
As an example, suppose a file was split into three component files called “alpha”, “beta” and “gamma” and the file “beta” is missing. The original file can be recovered with:
$ gsecraif -c alpha missing gamma > recovered
Provided that there is no file called “missing”, gsecraif would try to combine “alpha” “missing” and “gamma” in to a file called “recovered” and since the file “missing” does not exist, parity in the remaining files would be used to recover the original file.
Gsecraif does not impose an arbitrary lower limit on the size of file that can be split. There may be occasions when it is desirable to split trivially small files with gsecraif either to take advantage of the security benefits or simply for consistency or simplicity, for example when splitting a number of files of varying sizes.
Gsecraif splits an original file into three or more component files, each of which has a four byte header and a single byte trailer. Each component file also contains parity information. For trivially small files, each component may be larger than the original file, depending on the number of component files the original is split into.
If the original file has 12 or more bytes, each component is always smaller than the original file.
The relationships between component file size, number of component files and the size of trivially small files are illustrated in the Appendix. See the section called “Appendix Component file sizes for trivially small original files”.
By default gsecraif produces no information when running. The level of debug information can be increased by using the -d parameter followed by an integer (0 to 2 in the current version). Each debug level produces all the information in the lower levels and adds more information. E.g.
$ gsecraif -d 2
Debug information is output to standard error, from which it can be redirected as required, such as to a log file.
With the default debug level of zero, only serious errors are reported such as a failure to open a file or a corrupt header in a component file.
Debug level 1 reports basic information.
Debug level 2 reports on the progress of the split or combine process in detail.
Gsecraif processes an original file in a series of slices, the size of which is determined by the number of component files (see the Security document for more information). By default gsecraif rotates the bits in each slice. The number of places the bits are moved is an integer between 0 and 7 and is specified with the -r parameter. E.g.
$ cat myfile | gsecraif -r 5
Would cause gsecraif to rotate the bits by moving them 5 places and would split the original file called “myfile” into the default three component files, “split000”, “split001” and “split002”.
Gsecraif moves the bits three places by default.
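To make the idea of rotating bits concrete, here is a minimal Python sketch. It rests on assumptions that this guide does not confirm: that a slice is treated as one continuous string of bits, and that the rotation is to the left. The function names are hypothetical and the Security document remains the reference for what gsecraif actually does.

```python
def rotate_slice(slice_bytes: bytes, places: int = 3) -> bytes:
    """Rotate all bits of a slice left by 'places' (0 to 7), wrapping round."""
    if not 0 <= places <= 7:
        raise ValueError("places must be between 0 and 7")
    nbits = len(slice_bytes) * 8
    if nbits == 0 or places == 0:
        return slice_bytes
    value = int.from_bytes(slice_bytes, "big")
    mask = (1 << nbits) - 1
    # Bits shifted off the top re-enter at the bottom.
    rotated = ((value << places) | (value >> (nbits - places))) & mask
    return rotated.to_bytes(len(slice_bytes), "big")


def unrotate_slice(slice_bytes: bytes, places: int = 3) -> bytes:
    """Reverse rotate_slice by rotating the bits right by 'places'."""
    if not 0 <= places <= 7:
        raise ValueError("places must be between 0 and 7")
    nbits = len(slice_bytes) * 8
    if nbits == 0 or places == 0:
        return slice_bytes
    value = int.from_bytes(slice_bytes, "big")
    mask = (1 << nbits) - 1
    rotated = ((value >> places) | (value << (nbits - places))) & mask
    return rotated.to_bytes(len(slice_bytes), "big")


scrambled = rotate_slice(b"\x81\x00", 1)
restored_slice = unrotate_slice(scrambled, 1)
```

A rotation never loses information, so the combine step can undo it exactly as long as it uses the same number of places that was used when splitting.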
Gsecraif processes an original file in a series of slices, the size of which is determined by the number of component files (see the Security document for more information). By default gsecraif rotates the bits in each slice. Gsecraif can instead reorder the bits by transposing them rather than rotating them.
Transposing the bits may provide greater security than rotating the bits at the expense of performance. Further information is provided in the security document.
To use transposition rather than rotation the -t parameter is used. E.g.
$ cat myfile | gsecraif -t
If the -t parameter is used, the rotation parameter is ignored.
If it is necessary to split an original file into more than two hundred and fifty-five files, this can be achieved by splitting the original file and then treating each component file as an original file to be split by gsecraif. Recovering the original file is the exact reverse of the cascading process.
It may also be desirable to cascade the splitting process for reasons other than exceeding the two hundred and fifty-five file limit (see the Security document).
If one of the component files becomes corrupt or is lost, it is important to recover the component, because gsecraif, by design, can only tolerate the loss of one component file.
A component file can be recovered without having to fully recover the original file and then split it again. Gsecraif can be used to recover the original file, but instead of directing the output to the recovered file, the output can be piped into another instance of gsecraif, which splits this output and regenerates the component file. This process is best explained by example.
As an example, an original file has been split in to three components, “alpha”, “beta” and “gamma” and the component file, “beta” has been lost. The lost component file can be recovered with the command below:
$ gsecraif -c alpha missing gamma | gsecraif /dev/null beta /dev/null
Gsecraif is not designed to detect or recover from corrupt component files. It is recommended that appropriate checksums (e.g. MD5) are produced for each component file and stored with the corresponding component file. If a component file is found to be corrupt, it can be deleted and gsecraif can recover the original file with one of the component files missing.
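One way to follow the checksum recommendation above is sketched below in Python, using the standard `hashlib` module. The helper names and the `.md5` sidecar file convention are assumptions made for illustration and are not part of gsecraif. MD5 is used because it is the example given in this guide; passing `"sha256"` as the algorithm works the same way.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm: str = "md5", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(path) -> str:
    """Store the digest next to the component in a '<name>.md5' file."""
    value = file_digest(path)
    Path(str(path) + ".md5").write_text(f"{value}  {Path(path).name}\n")
    return value


def is_intact(path) -> bool:
    """Compare the component's current digest with the recorded one."""
    recorded = Path(str(path) + ".md5").read_text().split()[0]
    return file_digest(path) == recorded
```

A component for which `is_intact` returns False can be deleted, after which the original file can be recovered from the remaining components as described above.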
Gsecraif is designed to create component files of exactly the same size (using padding data as necessary). Each component file begins with a defined header and ends with the same byte value. These facts can be used to provide some elementary checking, but check-sums should be used to protect against data corruption.
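The elementary checks mentioned above can be sketched in Python as follows. The sketch relies only on facts stated in this guide: all component files are the same size, each has a four byte header whose first byte is the file format version (1 for this version), and each ends with the same byte value. The function name is hypothetical, and these checks do not replace checksums.

```python
import os
from typing import Iterable, List

HEADER_LEN = 4       # four byte header, as described in this guide
FORMAT_VERSION = 1   # expected first byte for this version of gsecraif


def elementary_check(paths: Iterable[str]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    sizes = set()
    trailers = set()
    for path in paths:
        size = os.path.getsize(path)
        sizes.add(size)
        if size < HEADER_LEN + 1:
            problems.append(f"{path}: shorter than header plus trailer")
            continue
        with open(path, "rb") as handle:
            version = handle.read(1)[0]
            handle.seek(-1, os.SEEK_END)
            trailers.add(handle.read(1)[0])
        if version != FORMAT_VERSION:
            problems.append(f"{path}: unexpected format version {version}")
    if len(sizes) > 1:
        problems.append("component files differ in size")
    if len(trailers) > 1:
        problems.append("component files end with different byte values")
    return problems
```

Passing these checks shows only that the files look structurally plausible; corruption in the body of a component would go unnoticed.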
By design gsecraif can only tolerate the loss of a single component file; it uses XOR (RAID 5) technology.
In common with other utilities there is no facility to re-start an interrupted process. If a large split or re-combination operation is halted the process must be run again.
All messages from gsecraif are sent to standard-error. When the debug level is set to zero, which is the default, gsecraif produces no messages under normal circumstances. Errors may occur that prevent gsecraif from proceeding; if this happens, an error message is sent to standard-error (regardless of the debug level). The error messages that might be produced, indicating that gsecraif cannot proceed, are shown below.
Error: Cannot open standard in.
When splitting a file, gsecraif reads from standard-in. This error indicates that unusual conditions have prevented gsecraif from opening standard-in (e.g. the system has exceeded the maximum number of open files).
Error: Cannot open output file
Gsecraif cannot open one or more of the component files. The filename may be invalid (e.g. non-existent directory path) or the destination may not be writeable.
Error: Cannot open standard out.
When combining files, gsecraif sends the output to standard-out. This error indicates that unusual conditions have prevented gsecraif from opening standard-out (e.g. the system has exceeded the maximum number of open files).
Error: multiple files can not be opened.
Because gsecraif uses XOR parity (RAID 5 technology), it can recover from at most one missing file. The error is shown when gsecraif fails to open two or more component files.
Error: File format of one or more component files is incorrect.
Each component file has a defined structure; this error indicates that one or more component files are corrupt.
Error: One or more component files are corrupt.
This error occurs when gsecraif finishes reading one or more component files too soon; all component files should be the same size and end with a byte indicating the number of pad bytes used (if any).
Error: Bad pad byte, component file id: [compfileid] Pad value: [padbytes]
All component files should end with the same byte value indicating the number of pad bytes used. The error shows that a component file has an inconsistent number of pad bytes.
Error: Bad file format version.
The first byte of each component file is the file format version, which should be 1 for this version of gsecraif. The error indicates that the component file either was created with a later version of gsecraif or is corrupt.
Error: Inconsistent number of component files.
All component files should agree on the number of component files; the error indicates that this is not the case for one or more component files.
Error: duplicate file id zero.
Each component file should have a unique id; the error shows that two or more files share the id 0.
Error: Duplicate file id.
The error shows that two or more files share the same id.
Error: Incorrect order.
In this version of gsecraif the components must be in the correct order; the error indicates that the order is incorrect.
Error: Inconsistent rotation value.
All components should agree on the value of the rotation used; the error shows that one or more component files disagree on this value.
The table below shows the size of a single component file, in bytes, for increasing sizes of original file, in bytes, and increasing number of component files.
For an original null file of zero bytes each component file is 5 bytes regardless of the number of component files.
For an original file of 1 or 2 bytes each component file is 6 bytes regardless of the number of component files.
If the original file has 12 or more bytes each component is always smaller than the original file.
| Component files \ Original file size (bytes) | null | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 5 | 6 | 6 | 7 | 7 | 8 | 8 | 9 | 9 | 10 | 10 | 11 | 11 | 12 | 12 |
| 4 | 5 | 6 | 6 | 6 | 7 | 7 | 7 | 8 | 8 | 8 | 9 | 9 | 9 | 10 | 10 |
| 5 | 5 | 6 | 6 | 6 | 6 | 7 | 7 | 7 | 7 | 8 | 8 | 8 | 8 | 9 | 9 |
| 6 | 5 | 6 | 6 | 6 | 6 | 6 | 7 | 7 | 7 | 7 | 7 | 8 | 8 | 8 | 8 |
| 7 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 7 | 7 | 7 | 7 | 7 | 7 | 8 | 8 |
| 8 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| 9 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 7 | 7 | 7 | 7 | 7 | 7 |
| 10 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 7 | 7 | 7 | 7 | 7 |
| 11 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 7 | 7 | 7 | 7 |
| 12 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 7 | 7 | 7 |
| 13 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 7 | 7 |
| 14 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 7 |
| 15 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
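The figures in the table are consistent with a simple rule: a four byte header, a one byte trailer, and the original data shared across one fewer columns than there are component files (the remaining column's worth being parity), rounded up. The Python sketch below expresses that rule. It is inferred from the table and the header and trailer sizes given in this guide, not taken from the gsecraif source, and it has only been compared against the small sizes shown here; the function name is hypothetical.

```python
def component_size(original_size: int, nfiles: int = 3) -> int:
    """Estimate the size in bytes of each component file (inferred rule)."""
    if not 3 <= nfiles <= 255:
        raise ValueError("the number of component files must be 3 to 255")
    if original_size < 0:
        raise ValueError("original_size cannot be negative")
    data_columns = nfiles - 1                     # one column's worth is parity
    payload = -(-original_size // data_columns)   # ceiling division
    return 4 + payload + 1                        # header + payload + trailer


# Reproduce the first row of the table (three component files).
row_for_three = [component_size(size, 3) for size in range(15)]
```

Under this rule a 12 byte original split into three components gives 11 byte components, which agrees with the statement that components are always smaller than the original once it reaches 12 bytes.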