Gsecraif user guide

Timothy Evans

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".


Table of Contents

Quick start
Overview
gsecraif version
Envisaged use
Installation
Help
Splitting files
Specifying the number of component files
Specifying the prefix
Using file names
Combining files
Specifying the number of component files
Specifying the prefix
Using file names
Splitting trivially small files with gsecraif
Debug
Progress information
Advanced use
Rotated bits
Transposed bits
Cascading
Chaining to recover a component
Protecting against data corruption
Limitations
General Limitations
Limitations in this version
A. Error messages
Appendix Errors when splitting files
Appendix Errors when combining files
B. Component file sizes for trivially small original files
C. GNU Free Documentation License

Quick start

$ cat myfile | gsecraif

This will split the file “myfile” to produce three files: split000, split001 and split002.

$ gsecraif -c > recovered

This will combine the three files split000, split001 and split002 to produce the file “recovered”.

$ diff myfile recovered

No differences will be found.

Overview

The utility gsecraif is a tool to split and recombine files in a secure manner. It is designed to run in a Unix-like environment and has been tested under Debian and Knoppix GNU/Linux. Gsecraif addresses two aspects of security: protecting against data loss and guarding against unauthorised access. Security is discussed in more detail in the Security document.

Gsecraif splits a file into three or more component files and can recombine them into the original file even if one of the component files is lost. This is achieved with the same XOR parity technique used in RAID 5 redundant disk striping.
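The XOR parity idea can be illustrated with a short Python sketch. This is a minimal RAID 5 style example, not gsecraif's actual file format. It makes several hypothetical assumptions: each stripe holds N-1 data bytes plus one parity byte, the parity position rotates from stripe to stripe, and the original length is known when combining. The function names split_with_parity and combine are illustrative only. The sketch shows loss tolerance only. Because the data bytes are stored unchanged, it provides none of the confidentiality that gsecraif adds by reordering bits (see Rotated bits and Transposed bits).

```python
from functools import reduce


def split_with_parity(data: bytes, n: int) -> list:
    """Hypothetical RAID 5 style split: each stripe is n-1 data bytes plus one XOR parity byte."""
    if n < 3:
        raise ValueError("at least three components are required")
    width = n - 1                              # data bytes per stripe
    padded = data + bytes(-len(data) % width)  # zero-pad so every stripe is full
    comps = [bytearray() for _ in range(n)]
    for s, start in enumerate(range(0, len(padded), width)):
        stripe = padded[start:start + width]
        parity = reduce(lambda a, b: a ^ b, stripe, 0)
        p = s % n                              # parity position rotates, as in RAID 5
        data_bytes = iter(stripe)
        for i in range(n):
            comps[i].append(parity if i == p else next(data_bytes))
    return comps


def combine(comps: list, length: int) -> bytes:
    """Rebuild the original; at most one entry of comps may be None (a lost component)."""
    n = len(comps)
    missing = [i for i, c in enumerate(comps) if c is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can recover at most one lost component")
    if missing:
        size = len(next(c for c in comps if c is not None))
        rebuilt = bytearray(size)
        for c in comps:
            if c is not None:
                for k in range(size):
                    rebuilt[k] ^= c[k]  # XOR of all survivors regenerates the lost bytes
        comps = comps[:missing[0]] + [rebuilt] + comps[missing[0] + 1:]
    out = bytearray()
    for s in range(len(comps[0])):
        p = s % n
        out.extend(comps[i][s] for i in range(n) if i != p)  # skip the parity byte
    return bytes(out[:length])                               # drop the padding


original = b"attack at dawn"
parts = split_with_parity(original, 3)
parts[1] = None                          # simulate losing the second component
print(combine(parts, len(original)))     # prints b'attack at dawn'
```

With two components lost, each stripe has two unknown bytes but only one parity equation. That is why combine refuses in that case, mirroring the single-loss limitation described under Limitations.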

Gsecraif splits a file in such a way that it is impractical to infer any information about the original file from any one of the component files. Reconstructing the original file requires all of the component files, or all but one of them. If a file has been split with gsecraif into N component files, it is impractical to infer any information about the original file from N-2 or fewer component files. Other file splitting utilities, such as the conventional Unix “split” command, create components from which it may be possible to infer information about the original file. For example, if a text file is split into four components with the Unix “split” command, being able to read a quarter of the document may give the reader a good impression of what the whole document is about.

gsecraif version

This document applies to version Alpha 04 of gsecraif.

Envisaged use

There are many well documented uses for file splitting utilities. Several uses of gsecraif are envisaged, but the two main ones are file transfer and storing files on third-party storage systems. Gsecraif is expected to be used where security is a concern, both in terms of protecting against data loss and guarding against unauthorised access.

File Transfer

The capacity of both data storage technology and data transfer technology (e.g. network bandwidth) is constantly increasing. However, the capacity of readily available removable media often seems to be smaller than that of fixed media. Likewise, the capacity and use of fixed media tend to grow to the point where transferring their contents over a network is often difficult. For these reasons it is often convenient or necessary to split a file into a number of smaller components. These components can then fit on removable media or within other limits, such as email attachment size limits.

Using gsecraif to split a file into three or more components, which are then held on three or more removable media, ensures that the original file can be recovered even if one of the removable media is lost. Gsecraif also guards against unauthorised access to the original data if one of the removable media is stolen or intercepted. It provides similar protection for components sent over a network, for example as email attachments or by FTP.

File Storage

From the earliest days of computing, companies have been sending backup copies of data “off site”. Storing data on external, third-party data storage facilities became significantly easier and faster with the development of the public internet, and became popular with the concept of cloud computing. There are two particular problems with entrusting data to third parties. The first is retrieving the data if the third party is unable to provide access to it (because of technical difficulties) or ceases trading. The second is confidentiality. Some internet companies are infamous for changing their privacy policies, and companies with good reputations may be acquired by, or broken up into, companies with poor reputations. Simply storing copies of data with multiple external facilities can address the first issue. However, this may be expensive and exacerbates the second issue, because there are now more companies that have to be trusted.

Gsecraif could be used to split files into three or more components, which are then held with three or more external data storage facilities. The original data would still be recoverable if one of the third-party companies experienced difficulties or ceased trading. It would also be impractical for any of the third parties to infer any information about the original files from the component files they hold.

Other uses

There are other uses of gsecraif: it can be used wherever any other file splitting utility would split a file into three or more component files.

Installation

This version of gsecraif is installed from source files. Download the tarball and unpack it (gunzip and untar) into a suitable directory. Once unpacked, the gsecraif directory will contain a src directory. Change to the src directory and run:

$ ./configure

then

$ make

As root run:

# make install

The utility gsecraif will be placed in /usr/local/bin.

Help

The current version of gsecraif does not provide extensive online help. However, running it with -h, or with an invalid or incorrect command line parameter, causes gsecraif to print basic usage information, e.g.:

$ gsecraif -h

The utility gsecraif is a tool to split and recombine files in a secure manner.

Input is expected from standard-in, output is to standard-out.
Default action is to split standard-in in to a number of component files.
Names may be given for the component files; if no names are given, a prefix is
used followed by a three digit number starting with 000.

Usage: gsecraif [OPTIONS]... [FILES]...

-h, --help           Print help and exit
-V, --version        Print version and exit
-c, --combine        Combine files
-d, --debug=INT      Debug level N (Default is 0 - no output)
-n, --nfiles=INT     Number of component files (default is 3)
-p, --prefix=STRING  Set the file prfix (Default is split)
-r, --rotate=INT     Rotate N bits
-t, --transpose      Transpose bits (rotate is ignored)
-#, --hash           Print # characters (every 1024 bytes)

Splitting files

By default, gsecraif splits the input supplied on standard input. It will usually be used by piping a file, or the output of some other command, into gsecraif, e.g.:

$ cat myfile | gsecraif

where myfile is the file to be split. As an example of piping the output of a command, a tar archive can be split without the archive itself ever being written to disk:

$ tar -cf - * | gsecraif

By default, gsecraif splits its input into three component files called “split000”, “split001” and “split002” (see below).

Specifying the number of component files

By default gsecraif splits an original file into three component files, which is the minimum that can be specified. Gsecraif can split an original file into between three and two hundred and fifty five component files. If a file needs to be split into more component files, the component files themselves can be split further (see Cascading below). The number of component files is specified with the -n option, e.g.:

$ cat myfile | gsecraif -n 5

would split a file called “myfile” into five component files.

If file names are specified for the component files, then the number of component files is derived from the number of file names provided.

Specifying the prefix

By default gsecraif uses the prefix “split” followed by a three digit number for the names of the component files, e.g. split000, split001 and split002.

An alternative prefix may be specified using the -p parameter e.g.

$ cat myfile | gsecraif -p myfile

would produce “myfile000”, “myfile001” and “myfile002”.

Using file names

Individual file names may be specified for each component file by providing the file names after any other parameters. If file names are provided, at least three must be given and the -p option must not be used.

$ cat myfile | gsecraif alpha beta gamma

would split “myfile” into three files called “alpha”, “beta” and “gamma”.

In the current version of gsecraif the order of the file names is critical, and it should be recorded for use when recovering the original file.

Combining files

The default action of gsecraif is to split the data read from standard input. To combine component files, the -c parameter must be specified. The combined file is written to standard output.

By default gsecraif will try to combine three component files called “split000”, “split001” and “split002”.

e.g.

$ gsecraif -c > recovered

would try to combine component files called “split000”, “split001” and “split002” into a file called “recovered”.

If one of the above files was missing, gsecraif would try to recover the original file using parity in the remaining two files.

Specifying the number of component files

The number of component files is specified with the -n option, e.g.

$ gsecraif -c -n 5 > recovered

would try to combine five component files with the default prefix “split” (i.e. split000, split001 ... split004) into a file called “recovered”. If one of the files was missing, gsecraif would try to recover the original file using parity in the remaining files.

Specifying the prefix

An alternative prefix may be specified using the -p parameter e.g.

$ gsecraif -c -p myfile > recovered

would try to combine “myfile000”, “myfile001” and “myfile002” into a file called “recovered”. If one of the files was missing, gsecraif would try to recover the original file using parity in the remaining files.

Using file names

Individual file names may be specified for each component file by providing the file names after any other parameters.

$ gsecraif -c alpha beta gamma > recovered

would try to combine “alpha”, “beta” and “gamma” into a file called “recovered”.

Gsecraif derives the number of component files from the number of file names specified. In the current version of gsecraif the order of the file names is critical when recovering the original file, so the order should have been recorded when the file was split. The correct file names must be given in the correct order, even if one of the component files is missing. If the name of the missing component file is not known, an arbitrary file name can be given in its position.

For example, suppose a file was split into three component files called “alpha”, “beta” and “gamma”, and the file “beta” has been lost. The original file can be recovered with:

$ gsecraif -c alpha missing gamma > recovered

Provided that there is no file called “missing”, gsecraif will try to combine “alpha”, “missing” and “gamma” into a file called “recovered”. Since the file “missing” does not exist, parity in the remaining files is used to recover the original file.

Splitting trivially small files with gsecraif

Gsecraif does not impose an arbitrary lower limit on the size of file that can be split. There may be occasions when it is desirable to split trivially small files with gsecraif either to take advantage of the security benefits or simply for consistency or simplicity, for example when splitting a number of files of varying sizes.

Gsecraif splits an original file into three or more component files, each of which has a four byte header and a single byte trailer. Each component file also contains parity information. As a result, for trivially small files each component can be larger than the original file, depending on the size of the original and the number of component files it is split into.

If the original file has 12 or more bytes, each component is always smaller than the original file.

The relationships between component file size, number of component files and the size of trivially small original files are illustrated in Appendix B, Component file sizes for trivially small original files.
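The 12-byte threshold can be checked against a simple size model. This guide does not spell out the exact layout, so the Python sketch below uses a hypothetical model. It assumes each component holds one byte per stripe of N-1 original bytes, that is ceil(L/(N-1)) data or parity bytes, plus the 4-byte header and 1-byte trailer described above. The names component_size and first_shrinking_size are illustrative only, and Appendix B remains the authoritative reference.

```python
def component_size(original_len: int, n: int) -> int:
    """Hypothetical model: 4-byte header + ceil(L / (n-1)) stripe bytes + 1-byte trailer."""
    stripes = -(-original_len // (n - 1))   # ceiling division without floats
    return 4 + stripes + 1


def first_shrinking_size(n: int) -> int:
    """Smallest original length for which the model's components are smaller than the original."""
    length = 0
    while component_size(length, n) >= length:
        length += 1
    return length


for n in (3, 4, 5):
    print(n, first_shrinking_size(n))   # prints "3 12", then "4 9", then "5 8"
```

Under this model, three components give the 12-byte threshold stated above. Larger values of N lower the threshold, because each component then carries a smaller share of the original.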

Debug

By default gsecraif produces no information when running, apart from reporting serious errors. The level of debug information can be increased with the -d parameter followed by an integer (0 to 2 in the current version). Each debug level produces all the information of the lower levels and adds more, e.g.:

$ gsecraif -d 2

Debug information is written to standard error, from which it can be redirected as required, for example to a log file.

With the default debug level of zero, only serious errors are reported such as a failure to open a file or a corrupt header in a component file.

Debug level 1 reports basic information.

Debug level 2 reports on the progress of the split or combine process in detail.

Progress information

Specifying -# causes gsecraif to print one # character for every 1024 bytes (1K) of the original file processed.

Advanced use

Rotated bits

Gsecraif processes an original file as a series of slices, the size of which is determined by the number of component files (see the Security document for more information). By default gsecraif rotates the bits in each slice. The number of places the bits are moved is an integer between 0 and 7 and is specified with the -r parameter, e.g.:

$ cat myfile | gsecraif -r 5

would cause gsecraif to rotate the bits by 5 places and split the original file “myfile” into the default three component files, “split000”, “split001” and “split002”.

Gsecraif moves the bits three places by default.
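This guide does not say whether the rotation applies to each byte or to the slice as a whole, nor in which direction it moves the bits. The Python sketch below makes a hypothetical assumption: the whole slice is treated as one big-endian bit string and rotated left by r places. The names rotate_slice and unrotate_slice are illustrative, not part of gsecraif. When bits are rotated across byte boundaries in this way, each output byte mixes bits from two neighbouring input bytes.

```python
def rotate_slice(slice_: bytes, r: int) -> bytes:
    """Rotate a whole slice, viewed as one big-endian bit string, left by r places."""
    width = 8 * len(slice_)
    if width == 0:
        return slice_
    r %= width
    value = int.from_bytes(slice_, "big")
    mask = (1 << width) - 1
    rotated = ((value << r) | (value >> (width - r))) & mask
    return rotated.to_bytes(len(slice_), "big")


def unrotate_slice(slice_: bytes, r: int) -> bytes:
    """Undo rotate_slice by rotating left by the complementary amount."""
    width = 8 * len(slice_)
    if width == 0:
        return slice_
    return rotate_slice(slice_, width - r % width)


print(rotate_slice(b"\x80\x01", 1).hex())   # prints 0003
```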

Transposed bits

Gsecraif processes an original file as a series of slices, the size of which is determined by the number of component files (see the Security document for more information). By default gsecraif rotates the bits in each slice. Alternatively, gsecraif can reorder the bits by transposing them rather than rotating them.

Transposing the bits may provide greater security than rotating the bits at the expense of performance. Further information is provided in the security document.

To use transposition rather than rotation the -t parameter is used. E.g.

$ cat myfile | gsecraif -t

If the -t parameter is used, the rotation parameter is ignored.
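The exact transposition gsecraif uses is described in the Security document rather than here. As a hypothetical illustration only, the Python sketch below treats a slice of k bytes as a k-by-8 matrix of bits, one row per byte. It then reads the matrix out column by column, so the bits of each original byte end up spread across the whole slice. The names transpose_slice and untranspose_slice are illustrative, not part of gsecraif. Because every bit is moved individually, a permutation like this is typically slower than a rotation, which can be done with a few shifts.

```python
def _to_bits(data: bytes) -> list:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> (7 - j)) & 1 for byte in data for j in range(8)]


def _from_bits(bits: list) -> bytes:
    """Pack a list of bits (MSB first, length a multiple of 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def transpose_slice(slice_: bytes) -> bytes:
    """Hypothetical transposition: read the k x 8 bit matrix (one row per byte) column by column."""
    k = len(slice_)
    bits = _to_bits(slice_)
    # bit j of byte i (row i, column j) moves to position j*k + i
    return _from_bits([bits[i * 8 + j] for j in range(8) for i in range(k)])


def untranspose_slice(slice_: bytes) -> bytes:
    """Inverse of transpose_slice: position j*k + i goes back to row i, column j."""
    k = len(slice_)
    bits = _to_bits(slice_)
    return _from_bits([bits[j * k + i] for i in range(k) for j in range(8)])


print(transpose_slice(b"\xff\x00").hex())   # prints aaaa
```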

Cascading

If it is necessary to split an original file into more than two hundred and fifty five component files, this can be achieved by splitting the original file and then treating each component file as an original file to be split again by gsecraif. Recovering the original file is the exact reverse of the cascading process. The second-level components are combined to recover each first-level component, and the first-level components are then combined to recover the original file.

It may also be desirable to cascade the splitting process for reasons other than exceeding the two hundred and fifty five file limit (see the Security document).

Chaining to recover a component

If one of the component files becomes corrupt or is lost, it is important to regenerate that component, because gsecraif, by design, can only tolerate the loss of one component file.

A lost component file can be regenerated without writing the recovered original file to disk and then splitting it again. Instead of directing the output of gsecraif -c to a file, the output can be piped into another instance of gsecraif, which splits it again and re-creates the component file. This process is best explained by example.

Suppose an original file has been split into three components, “alpha”, “beta” and “gamma”, and the component file “beta” has been lost. The lost component file can be recovered with the command below:

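$ gsecraif -c alpha missing gamma | gsecraif /dev/null beta /dev/null

The first instance rebuilds the original data from “alpha” and “gamma”. The second instance splits that data again, writing the first and third components to /dev/null, where they are discarded. Only “beta” is re-created, and the surviving files are left untouched.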

Protecting against data corruption

Gsecraif is not designed to detect or recover from corrupt component files. It is recommended that appropriate checksums (e.g. MD5) are produced for each component file and stored with the corresponding component file. If a component file is found to be corrupt, it can be deleted, and gsecraif can then recover the original file with that component file missing.
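Standard tools such as md5sum can produce and check these checksums. For scripted use, the Python sketch below writes, for each component, a sidecar file with the component's name plus an .md5 suffix. This naming is a hypothetical convention assumed here, not something gsecraif does. The sidecar uses the "digest, two spaces, file name" layout that md5sum prints, so it can also be checked with md5sum -c from the directory holding the component. The helper names are illustrative.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large components need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_for(component: Path) -> Path:
    """Hypothetical naming convention: split000 -> split000.md5 in the same directory."""
    return component.with_name(component.name + ".md5")


def write_checksum(component: Path) -> Path:
    """Store the component's checksum next to it, in md5sum's 'digest  name' layout."""
    sidecar = sidecar_for(component)
    sidecar.write_text(f"{md5_of(component)}  {component.name}\n")
    return sidecar


def is_intact(component: Path) -> bool:
    """True only if the component and its sidecar exist and the digests match."""
    sidecar = sidecar_for(component)
    if not (component.exists() and sidecar.exists()):
        return False
    fields = sidecar.read_text().split()
    return bool(fields) and md5_of(component) == fields[0]
```

A component for which is_intact returns False can be deleted, and the original file can then be recovered from the remaining components as described above.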

Gsecraif is designed to create component files of exactly the same size (using padding data as necessary). Each component file begins with a defined header and ends with the same byte value. These facts can be used for some elementary checking, but checksums should be used to protect against data corruption.
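A minimal Python sketch of such elementary checking follows. It assumes only what this guide states: all components have the same size, they share the same final byte, and each carries a 4-byte header and a 1-byte trailer. It does not know the header's contents or the trailer's actual value. The helper name elementary_check is hypothetical. An empty result means only that nothing obvious is wrong, not that the data is intact.

```python
import os
from pathlib import Path

HEADER_LEN, TRAILER_LEN = 4, 1   # sizes stated in this guide


def elementary_check(names: list) -> list:
    """Report obvious damage; an empty list means only that nothing obvious was found."""
    problems = []
    sizes, trailers = {}, {}
    for name in names:
        path = Path(name)
        if not path.exists():
            problems.append(f"{name}: missing")
            continue
        size = path.stat().st_size
        sizes[name] = size
        if size < HEADER_LEN + TRAILER_LEN:
            problems.append(f"{name}: too short ({size} bytes)")
            continue
        with path.open("rb") as f:
            f.seek(-1, os.SEEK_END)       # position on the last byte of the file
            trailers[name] = f.read(1)
    if len(set(sizes.values())) > 1:
        problems.append(f"sizes differ: {sizes}")
    if len(set(trailers.values())) > 1:
        problems.append(f"final bytes differ: {trailers}")
    return problems
```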

Limitations

General Limitations

By design, gsecraif can only tolerate the loss of a single component file, because it uses XOR parity (RAID 5) technology.

In common with other utilities, there is no facility to restart an interrupted process. If a large split or combine operation is halted, it must be run again from the beginning.

Limitations in this version

This version is Alpha code.