Microsoft Cabinet Format
 

The Cabinet Software Development Kit provides developers with the components needed to utilize Microsoft's Cabinet File technology within other applications, or to build cabinet file management tools. Microsoft is committed to making cabinet files an open technology.

This release of the Cabinet Software Development Kit (formerly called the Cabinet Resource Kit) provides complete documentation of cabinet files, including the LZX data compression technology.

Another tool, CABView, can be found on the Microsoft Windows 95 Power Toys Web site. CABView allows you to treat CAB files like folders. Using this tool, you can explore CAB files and drag and drop files into and within them, as you would with a folder in Windows Explorer.

Download the Cabinet SDK from the following location:
http://download.microsoft.com/download/platformsdk/cab/2.0/w98nt42kmexp/en-us/Cabsdk.exe

In this Library Section

Cabarc User's Guide

Cabinet Format

FCI / FDI

LZX Format

MakeCAB User's Guide

MSZip Format

Microsoft Cabarc User's Guide

Copyright © 1997 Microsoft Corporation. All rights reserved.

Topics in this section

Introduction

Creating Cabinets

List Cabinet Contents

Extracting Cabinets

Introduction

The Cabinet Format

The cabinet format provides a way to efficiently package multiple files. The key features of the cabinet format are that multiple files may be stored in a single cabinet ("CAB file"), and that data compression is performed across file boundaries, significantly improving the compression ratio.

Depending upon the number of files to be compressed, and the expected access patterns (sequential or random access; whether most of the files will be requested at once or only a small portion), cabinets can be constructed in different ways. One key concept of the cabinet file is the folder. A folder is a collection of one or more files that are compressed together as a single entity. By compressing files in this way, the compression ratio is improved. The downside is that random access time suffers, since in order for any particular file in a folder to be decoded, all preceding files in the same folder must also be decoded.

Cabarc

Cabarc is a utility that creates, extracts, and lists the contents of cabinet files (CABs), using a command line interface similar to that of popular archiving tools. Cabarc supports wildcards and recursive directory searches.


Command Line Usage

Cabarc is used as follows:

Usage: CABARC [options] command cabfile [@list] [files] [dest_dir]

Currently, only three commands are supported: N (create a new cabinet), L (list the contents of an existing cabinet), and X (extract files from a cabinet). These commands are described in the following sections.

Options must appear before the command name, and cannot be combined (for example, to set the -r and -p options, use -r -p, and not -rp).


Creating Cabinets

Cabinets are created using the n command, followed by the name of the cabinet to create, followed by a filename list, as shown below:

cabarc n mycab.cab prog.c prog.h prog.exe readme.txt

The above command creates the cabinet mycab.cab containing the files "prog.c", "prog.h", "prog.exe", and "readme.txt", in a single folder, using the default compression mode, MSZIP.


Wildcards

Cabarc supports wildcards in the filename list, as shown in the example below:

cabarc n mycab.cab prog.* readme.txt


Folders

By default, all files are added to a single folder (compression history) in the cabinet. It is possible to tell cabarc to begin a new folder, by inserting the plus (+) symbol as a file to be added, as shown below:

cabarc n mycab.cab test.c main.c + test.exe *.obj

The above command creates the cabinet "mycab.cab" with one folder containing "test.c" and "main.c", and a second folder containing "test.exe" and all files matching "*.obj".


Path Name Preservation

By default, directory names are not preserved in the cabinet; only the filename component is stored. For example, the following command will result in the filename "prog.c" being stored in the cabinet:

cabarc n mycab.cab c:\source\myproj\prog.c

In order to preserve path names, the -p option should be used:

cabarc -p n mycab.cab c:\mysource\myproj\prog.c

This command will cause the file to be named "mysource\myproj\prog.c" in the cabinet. Note that the c:\ prefix is still stripped from the filename; cabarc will not allow absolute paths to be stored in the cabinet, nor will it extract such absolute paths.


Path Stripping

In many situations it may be desirable to preserve some of the path name, but not all of it. For example, one might wish to archive everything in the c:\mysource\myproj\ directory, but store only the myproj\ component of the path. This can be accomplished with the path stripping option, -P (capital P).

cabarc -p -P mysource\ n mycab.cab c:\mysource\myproj\prog.c

The -P option strips any strings which begin with the provided string (wildcards are not supported in this case; it is a simple text match). Any absolute path prefixes such as c:\ or \ are stripped before the comparison takes place, so these characters should not be included in the -P option.

The -P option may be used multiple times to strip out multiple paths; cabarc builds a list of all paths to be stripped, and applies only the first one which matches. For example:

cabarc -p -P mysrc\ -P yoursrc\ n mycab.cab c:\mysrc\myproj\*.* d:\yoursrc\yourproj\*.c

The trailing slash at the end of the path name is important; entering -P mysrc instead of -P mysrc\ would cause files to be added as "\myproj\<filename>".
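
The stripping rules described above can be illustrated with a short C sketch. This is not cabarc's source code; the function name cab_stored_name is hypothetical, and the case-sensitive comparison is an assumption (the text only says "a simple text match"). The sketch drops a drive prefix and leading backslashes, then removes the first matching strip string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the name under which `path` would be stored
 * when -p is in effect and `strip` holds the -P strings in command-line order.
 * The result points into `path`; nothing is allocated.
 */
const char *cab_stored_name(const char *path,
                            const char *const *strip, size_t nstrip)
{
    size_t i;

    assert(path != NULL);

    /* Absolute prefixes such as "c:" and leading "\" are removed first. */
    if (path[0] != '\0' && path[1] == ':')
        path += 2;
    while (*path == '\\')
        path++;

    /* Only the first strip string that matches is applied. */
    for (i = 0; i < nstrip; i++) {
        size_t n = strlen(strip[i]);
        if (strncmp(path, strip[i], n) == 0)   /* assumption: case-sensitive */
            return path + n;
    }
    return path;
}
```

With the strip string "mysource\" the path c:\mysource\myproj\prog.c yields "myproj\prog.c"; with "mysource" (no trailing backslash) it yields "\myproj\prog.c", which is the pitfall noted above.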


Recursive Directory Search

Cabarc can archive files in a directory and all of its subdirectories, by use of the -r option. For example, the command shown below will archive all files ending in .h that are in c:\msdev\include\, c:\msdev\include\sys, and c:\msdev\include\gl (assuming these directories exist on your system).

cabarc -r -p n mycab.cab c:\msdev\include\*.h

The -p option is used here to preserve the path information when the files are added to the cabinet. Without this option, only the filename components would be stored, which may occasionally be the desired behavior.


Reserve Space for Code Signature

Cabarc can reserve space in the cabinet for a code signature. This is done using the -s option, which reserves a specified amount of empty space in the cabinet. For code signing, 6144 bytes need to be reserved:

cabarc -s 6144 n mycab.cab test.exe

Note that the -s option does not actually write the code signature; it merely reserves space for it in the cabinet. The appropriate code signing utility must be used to fill out the code signature.


Set Cabinet ID

Cabinet files have a 16-bit cabinet ID field that is designed for application use. The default value of this field is zero; however, the -i option of cabarc can be used to set this field to any 16-bit value:

cabarc -i 12345 n mycab.cab test.exe


Set Compression Type

The default compression type for a cabinet is MSZIP. However, the compression type can be changed with the –m option. Currently only MSZIP compression (-m MSZIP) and no compression (-m NONE) are supported.

The following command stores files in the cabinet with no compression:

cabarc -m NONE n mycab.cab *.*

File List From a File

Cabarc can input its list of files from a text file, instead of from the command line, by using @files ("at files"). This is done by prefixing with the @ symbol the name of the file which contains the file list. For example:

cabarc n mycab.cab @filelist.txt

The text file must list the physical file names of the files to be added, one per line. As is the case when specifying filenames on the command line, the plus (+) symbol can be used as a filename to specify the beginning of a new folder. If a filename contains any embedded spaces, it must be enclosed in quotes, as shown below:

test.c

myapp.exe

"output file.exe"

The reason for requiring quotes is that each physical filename may be followed on the same line by an optional logical filename, which specifies the name under which the file will be stored in the cabinet:

test.c myapp.c

myapp.exe

"output file.exe" foobar.exe

If the logical filename contains spaces, then it must also be enclosed in quotes. Note that the logical filename overrides the -p (preserve path names) and -P (strip path name) options; the file will be added to the cabinet exactly as indicated. Wildcards are allowed in the physical filename, but in this situation a logical filename is not allowed.

The "@" feature may be used multiple times, to retrieve file lists from multiple files. Cabarc does not check for the presence of duplicate files, so if the same physical file appears in multiple file lists, it will be added to the cabinet multiple times.

The "@" feature may be combined with filenames on the command line. Files are added in the order in which they are parsed on the command line. Example:

cabarc n mycab.cab @filelist1.txt *.c @filelist2.txt *.h

Note: The "@" feature is available only when creating cabinets, not when extracting or listing cabinets.


List Cabinet Contents

It is possible to view the contents of a cabinet using the L (list) command, as shown below:

cabarc l mycab.cab

Cabarc will display the Set ID in the cabinet (see the -i option for cabinet creation), as well as the name of each file in the cabinet, along with its file size, file date, file time, and file attributes.


Extracting Cabinets

The X (extract) command extracts files from a cabinet. The simplest use of the X command is shown below, which causes all files to be extracted from the cabinet:

cabarc x mycab.cab

Alternatively, it is possible to selectively extract files, by providing a list of filenames and/or wildcards:

cabarc x mycab.cab readme.txt *.exe *.c

By default, full path names (if they are present in the cabinet) are not preserved upon extraction. For example, if a file named mysrc\myproj\test.c is present in the cabinet, then the command cabarc x mycab.cab will cause the file test.c to be extracted into the current directory. In order to preserve path names upon extraction, the -p option must be used. This option causes any required directories to be created.

Only the filename component is considered in the matching process; the pathname is discounted. For example, cabarc x mycab.cab test.c will cause the file mysrc\myproj\test.c to be extracted to the current directory as test.c, as will cabarc x mycab.cab *.c (which will also extract any other files matching *.c).

By default, the extracted files are stored in the current directory (and its subdirectories, if -p is used). However, it is possible to specify a destination directory for the extracted files. This is accomplished by appending a directory name to the command line. The directory name must end in a backslash ( \ ). Examples:

cabarc x mycab.cab c:\somedir\

cabarc x mycab.cab *.exe c:\somedir\


Microsoft Cabinet File Format

Copyright © 1997 Microsoft Corporation. All rights reserved.

Topics in this section

Introduction

Specification

Sample Cabinet File

Notes

Introduction

This specification defines the Microsoft cabinet file format. Cabinet files are compressed packages containing a number of related files. The format of a cabinet file is optimized for maximum compression. Cabinet files support a number of compression formats, including MSZIP, LZX, or uncompressed. This document does not define these internal compression formats. For data compression formats, refer to the documents titled Microsoft MSZIP Data Compression Format and Microsoft LZX Data Compression Format.


Specification

This segment of the documentation includes the following topics:

Conventions

Overview

Detailed Structure Specification


Conventions

The types u1, u2, and u4 are used to represent unsigned 8-, 16-, and 32-bit integer values, respectively. All multi-byte quantities are stored in little-endian order, where the least significant byte comes first.

The cabinet file format is described here using a C-like structure notation, where successive fields appear in the structure sequentially without padding or alignment. Header fields followed by (optional) may or may not be present, depending on the values in the CFHEADER flags field.
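
Because the fields are packed and little-endian, a portable reader usually assembles each value from individual bytes rather than casting a buffer to a C struct. The following is a minimal sketch of that approach; the helper names read_u2 and read_u4 are hypothetical and are not part of the Cabinet SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a u2 (unsigned 16-bit, least significant byte first) from p. */
static uint16_t read_u2(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a u4 (unsigned 32-bit, least significant byte first) from p. */
static uint32_t read_u4(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Reading byte by byte gives the same result on big-endian and little-endian hosts and avoids unaligned access, which matters because the format has no padding.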


Overview

Each file stored in a cabinet is stored completely within a single folder. A cabinet file may contain one or more folders, or portions of a folder. A folder can span across multiple cabinets. Such a series of cabinet files forms a set. Each cabinet file contains name information for the logically adjacent cabinet files. Each folder contains one or more files. Throughout this discussion, cabinets are said to contain "files". This is for semantic purposes only. Cabinet files actually store streams of bytes, each with a name and some other common attributes. Whether these byte streams are actually files or some other kind of data is application-defined.

A cabinet file contains a cabinet header (CFHEADER), followed by one or more cabinet folder (CFFOLDER) entries, a series of one or more cabinet file (CFFILE) entries, and the actual compressed file data in CFDATA entries. The compressed file data in the CFDATA entry is stored in one of several compression formats, as indicated in the corresponding CFFOLDER structure. The compression encoding formats used are detailed in separate documents.


Detailed Structure Specification

This segment of the documentation includes the following topics:

CFHEADER

CFFOLDER

CFFILE

CFDATA


CFHEADER

The CFHEADER structure provides information about this cabinet file.

struct CFHEADER
{
  u1  signature[4];     /* cabinet file signature */
  u4  reserved1;        /* reserved */
  u4  cbCabinet;        /* size of this cabinet file in bytes */
  u4  reserved2;        /* reserved */
  u4  coffFiles;        /* offset of the first CFFILE entry */
  u4  reserved3;        /* reserved */
  u1  versionMinor;     /* cabinet file format version, minor */
  u1  versionMajor;     /* cabinet file format version, major */
  u2  cFolders;         /* number of CFFOLDER entries in this */
                        /*    cabinet */
  u2  cFiles;           /* number of CFFILE entries in this cabinet */
  u2  flags;            /* cabinet file option indicators */
  u2  setID;            /* must be the same for all cabinets in a */
                        /*    set */
  u2  iCabinet;         /* number of this cabinet file in a set */
  u2  cbCFHeader;       /* (optional) size of per-cabinet reserved */
                        /*    area */
  u1  cbCFFolder;       /* (optional) size of per-folder reserved */
                        /*    area */
  u1  cbCFData;         /* (optional) size of per-datablock reserved */
                        /*    area */
  u1  abReserve[];      /* (optional) per-cabinet reserved area */
  u1  szCabinetPrev[];  /* (optional) name of previous cabinet file */
  u1  szDiskPrev[];     /* (optional) name of previous disk */
  u1  szCabinetNext[];  /* (optional) name of next cabinet file */
  u1  szDiskNext[];     /* (optional) name of next disk */
};
u1 signature[4]
Contains the characters 'M','S','C','F' (bytes 0x4D, 0x53, 0x43, 0x46). This field is used to assure that the file is a cabinet file.


u4 reserved1
Reserved field, set to zero.


u4 cbCabinet
Total size of this cabinet file in bytes.


u4 reserved2
Reserved field, set to zero.


u4 coffFiles
Absolute file offset of first CFFILE entry.


u4 reserved3
Reserved field, set to zero.


u1 versionMinor
u1 versionMajor
Cabinet file format version.
Currently, versionMajor = 1 and versionMinor = 3.


u2 cFolders
The number of CFFOLDER entries in this cabinet file.


u2 cFiles
The number of CFFILE entries in this cabinet file.


u2 flags
Bit-mapped values that indicate the presence of optional data:
#define cfhdrPREV_CABINET       0x0001
#define cfhdrNEXT_CABINET       0x0002
#define cfhdrRESERVE_PRESENT    0x0004

flags.cfhdrPREV_CABINET is set if this cabinet file is not the first in a set of cabinet files. When this bit is set, the szCabinetPrev and szDiskPrev fields are present in this CFHEADER.

flags.cfhdrNEXT_CABINET is set if this cabinet file is not the last in a set of cabinet files. When this bit is set, the szCabinetNext and szDiskNext fields are present in this CFHEADER.

flags.cfhdrRESERVE_PRESENT is set if this cabinet file contains any reserved fields. When this bit is set, the cbCFHeader, cbCFFolder, and cbCFData fields are present in this CFHEADER.

Other bit positions in the flags field are reserved.


u2 setID
An arbitrarily derived (random) value that binds a collection of linked cabinet files together. All cabinet files in a set will contain the same setID. This field is used by cabinet file extractors to assure that cabinet files are not inadvertently mixed. This value has no meaning in a cabinet file that is not in a set.


u2 iCabinet
Sequential number of this cabinet in a multi-cabinet set. The first cabinet has iCabinet=0. This field, along with setID, is used by cabinet file extractors to assure that this cabinet is the correct continuation cabinet when spanning cabinet files.


u2 cbCFHeader(optional)
If flags.cfhdrRESERVE_PRESENT is not set, this field is not present, and the value of cbCFHeader defaults to zero. Indicates the size in bytes of the abReserve field in this CFHEADER. Values for cbCFHeader range from 0 to 60,000.


u1 cbCFFolder(optional)
If flags.cfhdrRESERVE_PRESENT is not set, then this field is not present, and the value of cbCFFolder defaults to zero. Indicates the size in bytes of the abReserve field in each CFFOLDER entry. Values for cbCFFolder range from 0 to 255.


u1 cbCFData(optional)
If flags.cfhdrRESERVE_PRESENT is not set, then this field is not present, and the value for cbCFData defaults to zero. Indicates the size in bytes of the abReserve field in each CFDATA entry. Values for cbCFData range from 0 to 255.


u1 abReserve[cbCFHeader](optional)
If flags.cfhdrRESERVE_PRESENT is set and cbCFHeader is non-zero, then this field contains per-cabinet-file application information. This field is defined by the application and used for application-defined purposes.


u1 szCabinetPrev[](optional)
If flags.cfhdrPREV_CABINET is not set, then this field is not present. NUL-terminated ASCII string containing the file name of the logically previous cabinet file. May contain up to 255 bytes plus the NUL byte. Note that this gives the name of the most-recently-preceding cabinet file that contains the initial instance of a file entry. This might not be the immediately previous cabinet file, when the most recent file spans multiple cabinet files. If searching in reverse for a specific file entry, or trying to extract a file that is reported to begin in the "previous cabinet", szCabinetPrev would give the name of the cabinet to examine.


u1 szDiskPrev[](optional)
If flags.cfhdrPREV_CABINET is not set, then this field is not present. NUL-terminated ASCII string containing a descriptive name for the media containing the file named in szCabinetPrev, such as the text on the diskette label. This string can be used when prompting the user to insert a diskette. May contain up to 255 bytes plus the NUL byte.


u1 szCabinetNext[](optional)
If flags.cfhdrNEXT_CABINET is not set, then this field is not present. NUL-terminated ASCII string containing the file name of the next cabinet file in a set. May contain up to 255 bytes plus the NUL byte. Files extending beyond the end of the current cabinet file are continued in the named cabinet file.


u1 szDiskNext[](optional)
If flags.cfhdrNEXT_CABINET is not set, then this field is not present. NUL-terminated ASCII string containing a descriptive name for the media containing the file named in szCabinetNext, such as the text on the diskette label. May contain up to 255 bytes plus the NUL byte. This string can be used when prompting the user to insert a diskette.
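
The field descriptions above imply a parsing order: read the 36 fixed bytes (signature through iCabinet), then use flags to decide which optional fields follow. The sketch below is one way to do this over an in-memory buffer. The CabHeaderInfo type, its offFolders member, and the function names are hypothetical; only the field names, sizes, and flag values come from this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define cfhdrPREV_CABINET    0x0001
#define cfhdrNEXT_CABINET    0x0002
#define cfhdrRESERVE_PRESENT 0x0004
#define CFHEADER_FIXED_SIZE  36      /* signature through iCabinet */

typedef struct {                     /* hypothetical decoded view */
    uint32_t cbCabinet, coffFiles;
    uint16_t cFolders, cFiles, flags, setID, iCabinet;
    uint16_t cbCFHeader;
    uint8_t  cbCFFolder, cbCFData;
    size_t   offFolders;             /* offset of the first CFFOLDER */
} CabHeaderInfo;

static uint16_t rd_u2(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd_u4(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Advance *pos past one NUL-terminated string; -1 if no NUL is found. */
static int skip_sz(const uint8_t *buf, size_t len, size_t *pos)
{
    const uint8_t *nul;
    if (*pos >= len) return -1;
    nul = (const uint8_t *)memchr(buf + *pos, 0, len - *pos);
    if (nul == NULL) return -1;
    *pos = (size_t)(nul - buf) + 1;
    return 0;
}

/* Returns 0 on success, -1 if the buffer is too short or not a cabinet. */
int cab_parse_header(const uint8_t *buf, size_t len, CabHeaderInfo *out)
{
    size_t pos = CFHEADER_FIXED_SIZE;

    assert(buf != NULL && out != NULL);
    if (len < CFHEADER_FIXED_SIZE || memcmp(buf, "MSCF", 4) != 0) return -1;

    out->cbCabinet = rd_u4(buf + 8);
    out->coffFiles = rd_u4(buf + 16);
    out->cFolders  = rd_u2(buf + 26);
    out->cFiles    = rd_u2(buf + 28);
    out->flags     = rd_u2(buf + 30);
    out->setID     = rd_u2(buf + 32);
    out->iCabinet  = rd_u2(buf + 34);
    out->cbCFHeader = 0;             /* defaults when no reserve is present */
    out->cbCFFolder = 0;
    out->cbCFData   = 0;

    if (out->flags & cfhdrRESERVE_PRESENT) {
        if (len - pos < 4) return -1;
        out->cbCFHeader = rd_u2(buf + pos);
        out->cbCFFolder = buf[pos + 2];
        out->cbCFData   = buf[pos + 3];
        pos += 4;
        if (len - pos < out->cbCFHeader) return -1;
        pos += out->cbCFHeader;      /* skip abReserve */
    }
    if (out->flags & cfhdrPREV_CABINET)   /* szCabinetPrev, szDiskPrev */
        if (skip_sz(buf, len, &pos) || skip_sz(buf, len, &pos)) return -1;
    if (out->flags & cfhdrNEXT_CABINET)   /* szCabinetNext, szDiskNext */
        if (skip_sz(buf, len, &pos) || skip_sz(buf, len, &pos)) return -1;

    out->offFolders = pos;
    return 0;
}
```

The sketch does not check the version bytes or the 255-byte limit on the name strings; a stricter reader would likely validate both.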


CFFOLDER

Each CFFOLDER structure contains information about one of the folders or partial folders stored in this cabinet file. The first CFFOLDER entry immediately follows the CFHEADER entry. CFHEADER.cFolders indicates how many CFFOLDER entries are present.

Folders may start in one cabinet, and continue on to one or more succeeding cabinets. When the cabinet file creator detects that a folder has been continued into another cabinet, it will complete that folder as soon as the current file has been completely compressed. Any additional files will be placed in the next folder. Generally, this means that a folder would span at most two cabinets, but if the file is large enough, it could span more than two cabinets.

CFFOLDER entries actually refer to folder fragments, not necessarily complete folders. A CFFOLDER structure is the beginning of a folder if the iFolder value in the first file referencing the folder does not indicate the folder is continued from the previous cabinet file.

The typeCompress field may vary from one folder to the next, unless the folder is continued from a previous cabinet file.


struct CFFOLDER
{
  u4  coffCabStart;  /* offset of the first CFDATA block in this */
                     /*    folder */
  u2  cCFData;       /* number of CFDATA blocks in this folder */
  u2  typeCompress;  /* compression type indicator */
  u1  abReserve[];   /* (optional) per-folder reserved area */
};
u4 coffCabStart
Absolute file offset of first CFDATA block for this folder.


u2 cCFData
Number of CFDATA structures for this folder that are actually in this cabinet. A folder can continue into another cabinet and have more CFDATA blocks in that cabinet, and a folder may have started in a previous cabinet. This number represents only the CFDATA structures for this folder that are at least partially recorded in this cabinet.


u2 typeCompress
Indicates the compression method used for all CFDATA entries in this folder. The valid values are defined in each compression format's specification.


u1 abReserve[CFHEADER.cbCFFolder](optional)
If CFHEADER.flags.cfhdrRESERVE_PRESENT is set and cbCFFolder is non-zero, then this field contains per-folder application information. This field is defined by the application and used for application-defined purposes.


CFFILE

Each CFFILE entry contains information about one of the files stored (or at least partially stored) in this cabinet. The first CFFILE entry in each cabinet is found at absolute offset CFHEADER.coffFiles. CFHEADER.cFiles indicates how many of these entries are in the cabinet. The CFFILE entries in a cabinet are ordered by iFolder value, then by uoffFolderStart. Entries for files continued from the previous cabinet will be first, and entries for files continued to the next cabinet will be last.


struct CFFILE
{
  u4  cbFile;           /* uncompressed size of this file in bytes */
  u4  uoffFolderStart;  /* uncompressed offset of this file in the folder */
  u2  iFolder;          /* index into the CFFOLDER area */
  u2  date;             /* date stamp for this file */
  u2  time;             /* time stamp for this file */
  u2  attribs;          /* attribute flags for this file */
  u1  szName[];         /* name of this file */
};
u4 cbFile
Uncompressed size of this file in bytes.


u4 uoffFolderStart
Uncompressed byte offset of the start of this file's data. For the first file in each folder, this value will usually be zero. Subsequent files in the folder will have offsets that are typically the running sum of the cbFile values.


u2 iFolder
Index of the folder containing this file's data. A value of zero indicates this is the first folder in this cabinet file. The special iFolder values ifoldCONTINUED_FROM_PREV and ifoldCONTINUED_PREV_AND_NEXT indicate that the folder index is actually zero, but that extraction of this file would have to begin with the cabinet named in CFHEADER.szCabinetPrev. The special iFolder values ifoldCONTINUED_PREV_AND_NEXT and ifoldCONTINUED_TO_NEXT indicate that the folder index is actually one less than CFHEADER.cFolders, and that extraction of this file will require continuation to the cabinet named in CFHEADER.szCabinetNext.
#define ifoldCONTINUED_FROM_PREV      (0xFFFD)
#define ifoldCONTINUED_TO_NEXT        (0xFFFE)
#define ifoldCONTINUED_PREV_AND_NEXT  (0xFFFF)
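
The mapping from iFolder to an actual CFFOLDER index can be sketched as follows. The function name cab_resolve_folder and its output parameters are hypothetical. For ifoldCONTINUED_PREV_AND_NEXT the sketch returns zero; the text gives both zero and cFolders-1 for that value, which agree when the cabinet holds a single folder fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ifoldCONTINUED_FROM_PREV      (0xFFFD)
#define ifoldCONTINUED_TO_NEXT        (0xFFFE)
#define ifoldCONTINUED_PREV_AND_NEXT  (0xFFFF)

/*
 * Hypothetical helper: translate a CFFILE.iFolder value into an index into
 * this cabinet's CFFOLDER entries, and report whether extraction must start
 * in the previous cabinet and/or continue into the next one.
 */
uint16_t cab_resolve_folder(uint16_t iFolder, uint16_t cFolders,
                            int *needsPrev, int *needsNext)
{
    assert(needsPrev != NULL && needsNext != NULL && cFolders > 0);

    *needsPrev = (iFolder == ifoldCONTINUED_FROM_PREV ||
                  iFolder == ifoldCONTINUED_PREV_AND_NEXT);
    *needsNext = (iFolder == ifoldCONTINUED_TO_NEXT ||
                  iFolder == ifoldCONTINUED_PREV_AND_NEXT);

    if (*needsPrev)
        return 0;                          /* first folder in this cabinet */
    if (*needsNext)
        return (uint16_t)(cFolders - 1);   /* last folder in this cabinet */
    return iFolder;                        /* ordinary index */
}
```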


u2 date
Date of this file, in the format ((year-1980) << 9)+(month << 5)+(day), where month={1..12} and day={1..31}. This "date" is typically considered the "last modified" date in local time, but the actual definition is application-defined.


u2 time
Time of this file, in the format (hour << 11)+(minute << 5)+(seconds/2), where hour={0..23}. This "time" is typically considered the "last modified" time in local time, but the actual definition is application-defined.
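
The date and time packing can be unpacked with shifts and masks. The following sketch uses hypothetical names (CabDateTime, cab_decode_datetime, cab_encode_date); the bit layout is taken from the two format expressions in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {                 /* hypothetical unpacked form */
    int year, month, day;
    int hour, minute, second;
} CabDateTime;

/* Unpack the CFFILE date and time fields. Seconds have 2-second resolution. */
CabDateTime cab_decode_datetime(uint16_t date, uint16_t t)
{
    CabDateTime dt;
    dt.year   = 1980 + (date >> 9);      /* 7 bits */
    dt.month  = (date >> 5) & 0x0F;      /* 4 bits */
    dt.day    = date & 0x1F;             /* 5 bits */
    dt.hour   = t >> 11;                 /* 5 bits */
    dt.minute = (t >> 5) & 0x3F;         /* 6 bits */
    dt.second = (t & 0x1F) * 2;          /* 5 bits, stored as seconds/2 */
    return dt;
}

/* Pack a calendar date; the 7-bit year field covers 1980 through 2107. */
uint16_t cab_encode_date(int year, int month, int day)
{
    assert(year >= 1980 && year <= 2107);
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
    return (uint16_t)(((year - 1980) << 9) + (month << 5) + day);
}
```

For example, the values 0x226C and 0x59BA decode to March 12, 1997 at 11:13:52.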


u2 attribs
Attributes of this file; may be used in any combination:
#define  _A_RDONLY       (0x01)  /* file is read-only */
#define  _A_HIDDEN       (0x02)  /* file is hidden */
#define  _A_SYSTEM       (0x04)  /* file is a system file */
#define  _A_ARCH         (0x20)  /* file modified since last backup */
#define  _A_EXEC         (0x40)  /* run after extraction */
#define  _A_NAME_IS_UTF  (0x80)  /* szName[] contains UTF */

All other attribute bit values are reserved.


u1 szName[]
NUL-terminated name of this file. Note that this string may include path separator characters. When attribs._A_NAME_IS_UTF is set, this string can be converted directly to Unicode, avoiding locale-specific dependencies. See "UTF Encoding" for more information. When attribs._A_NAME_IS_UTF is not set, this string is subject to interpretation depending on locale.
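
Since szName is variable-length, CFFILE entries have to be walked one at a time: each entry is 16 fixed bytes followed by the name and its NUL terminator. The sketch below reads one entry from a buffer and returns the offset of the next. CabFileEntry and cab_read_cffile are hypothetical names, and the name is returned as raw bytes without any UTF or locale conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {                 /* hypothetical decoded view of a CFFILE */
    uint32_t cbFile, uoffFolderStart;
    uint16_t iFolder, date, time, attribs;
    const char *szName;          /* points into the caller's buffer */
} CabFileEntry;

static uint16_t rd_u2(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd_u4(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Decode the CFFILE entry at offset pos. Returns the offset just past the
 * entry (where the next CFFILE starts), or 0 if the entry is truncated.
 */
size_t cab_read_cffile(const uint8_t *buf, size_t len, size_t pos,
                       CabFileEntry *out)
{
    const uint8_t *nul;

    assert(buf != NULL && out != NULL);
    if (pos > len || len - pos < 17)     /* 16 fixed bytes + at least a NUL */
        return 0;
    nul = (const uint8_t *)memchr(buf + pos + 16, 0, len - pos - 16);
    if (nul == NULL)
        return 0;

    out->cbFile          = rd_u4(buf + pos);
    out->uoffFolderStart = rd_u4(buf + pos + 4);
    out->iFolder         = rd_u2(buf + pos + 8);
    out->date            = rd_u2(buf + pos + 10);
    out->time            = rd_u2(buf + pos + 12);
    out->attribs         = rd_u2(buf + pos + 14);
    out->szName          = (const char *)(buf + pos + 16);
    return (size_t)(nul - buf) + 1;
}
```

A caller would start at CFHEADER.coffFiles and call this CFHEADER.cFiles times, feeding each return value back in as the next pos.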


CFDATA

Each CFDATA record describes some amount of compressed data. The first CFDATA entry for each folder is located using CFFOLDER.coffCabStart. Subsequent CFDATA records for this folder are contiguous.


struct CFDATA
{
  u4  csum;         /* checksum of this CFDATA entry */
  u2  cbData;       /* number of compressed bytes in this block */
  u2  cbUncomp;     /* number of uncompressed bytes in this block */
  u1  abReserve[];  /* (optional) per-datablock reserved area */
  u1  ab[cbData];   /* compressed data bytes */
};
u4 csum
Checksum of this CFDATA structure, from CFDATA.cbData through CFDATA.ab[cbData-1]. See "Checksum Method" for more information. May be set to zero if the checksum is not supplied.


u2 cbData
Number of bytes of compressed data in this CFDATA record. When cbUncomp is zero, this field indicates only the number of bytes that fit into this cabinet file.


u2 cbUncomp
The uncompressed size of the data in this CFDATA entry. When this CFDATA entry is continued in the next cabinet file, cbUncomp will be zero, and cbUncomp in the first CFDATA entry in the next cabinet file will report the total uncompressed size of the data from both CFDATA blocks.


u1 abReserve[CFHEADER.cbCFData](optional)
If CFHEADER.flags.cfhdrRESERVE_PRESENT is set and cbCFData is non-zero, then this field contains per-datablock application information. This field is defined by the application and used for application-defined purposes.


u1 ab[cbData]
The compressed data bytes, compressed using the CFFOLDER.typeCompress method. When cbUncomp is zero, these data bytes must be combined with the data bytes from the next cabinet's first CFDATA entry before decompression.

When CFFOLDER.typeCompress indicates that the data is not compressed, this field contains the uncompressed data bytes. In this case, cbData and cbUncomp will be equal unless this CFDATA entry crosses a cabinet file boundary.
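
Because the CFDATA records of a folder are contiguous, they can be traversed by stepping over each record's 8 fixed bytes, its reserved area of CFHEADER.cbCFData bytes, and its cbData payload bytes. The sketch below walks a folder's blocks in an in-memory cabinet and totals cbUncomp. The function name cab_walk_cfdata is hypothetical, checksums are not verified, and the cross-cabinet case (cbUncomp of zero) is not handled specially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rd_u2(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/*
 * Walk cCFData contiguous CFDATA records starting at coffCabStart.
 * cbCFData is the per-datablock reserve size from CFHEADER (0 if absent).
 * Stores the sum of cbUncomp in *totalUncomp. Returns 0 on success, or -1
 * if a record would run past the end of the buffer.
 */
int cab_walk_cfdata(const uint8_t *buf, size_t len, uint32_t coffCabStart,
                    uint16_t cCFData, uint8_t cbCFData, uint32_t *totalUncomp)
{
    size_t pos = coffCabStart;
    uint32_t total = 0;
    uint16_t i;

    assert(buf != NULL && totalUncomp != NULL);

    for (i = 0; i < cCFData; i++) {
        uint16_t cbData, cbUncomp;

        /* csum (4) + cbData (2) + cbUncomp (2) + abReserve */
        if (pos > len || len - pos < 8u + cbCFData)
            return -1;
        cbData   = rd_u2(buf + pos + 4);
        cbUncomp = rd_u2(buf + pos + 6);
        pos += 8u + cbCFData;

        if (len - pos < cbData)          /* payload ab[cbData] */
            return -1;
        pos += cbData;
        total += cbUncomp;
    }
    *totalUncomp = total;
    return 0;
}
```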


A Sample Cabinet File

       0   1   2   3   4   5   6    7    8   9   A   B   C   D   E   F
000   4D   53  43  46  00  00  00 00-FD  00  00  00  00  00  00  00  MSCF
010   2C   00  00  00  00  00  00 00-03  01  01  00  02  00  00  00  
020   22   06  00  00  5E  00  00 00-01  00  00  00  4D  00  00  00  
030   00   00  00  00  00  00  6C 22-BA  59  20  00  68  65  6C  6C  hell
040   6F   2E  63  00  4A  00  00 00-4D  00  00  00  00  00  6C  22  o.c
050   E7   59  20  00  77  65  6C 63-6F  6D  65  2E  63  00  BD  5A  welcome.c
060   A6   30  97  00  97  00  23 69-6E  63  6C  75  64  65  20  3C  #include <
070   73   74  64  69  6F  2E  68 3E-0D  0A  0D  0A  76  6F  69  64  stdio.h>    void
080   20   6D  61  69  6E  28  76 6F-69  64  29  0D  0A  7B  0D  0A  main(void)  {
090   20   20  20  20  70  72  69 6E-74  66  28  22  48  65  6C  6C  printf("Hell
0A0   6F   2C  20  77  6F  72  6C 64-21  5C  6E  22  29  3B  0D  0A  o, world!\n");
0B0   7D   0D  0A  23  69  6E  63 6C-75  64  65  20  3C  73  74  64  }  #include <std
0C0   69   6F  2E  68  3E  0D  0A 0D-0A  76  6F  69  64  20  6D  61  io.h>    void ma
0D0   69   6E  28  76  6F  69  64 29-0D  0A  7B  0D  0A  20  20  20  in(void)  {
0E0   20   70  72  69  6E  74  66 28-22  57  65  6C  63  6F  6D  65  printf("Welcome
0F0   21   5C  6E  22  29  3B  0D 0A-7D  0D  0A  0D  0A              !\n");  }

This is a very simple example of a cabinet file which contains two small text files, stored uncompressed for clarity.


   Offset   Description
   00..23   CFHEADER
   00..03   signature = 0x4D, 0x53, 0x43, 0x46
   04..07   reserved1
   08..0B   cbCabinet = 0x000000FD (253)
   0C..0F   reserved2
   10..13   coffFiles = 0x0000002C
   14..17   reserved3
   18..19   versionMinor = 3, versionMajor = 1 (version 1.3)
   1A..1B   cFolders = 1
   1C..1D   cFiles = 2
   1E..1F   flags = 0 (no reserve, no previous or next cabinet)
   20..21   setID = 0x0622
   22..23   iCabinet = 0

   24..2B   CFFOLDER[0]
   24..27   coffCabStart = 0x0000005E
   28..29   cCFData = 1
   2A..2B   typeCompress = 0 (none)

   2C..43   CFFILE[0]
   2C..2F   cbFile = 0x0000004D (77 bytes)
   30..33   uoffFolderStart = 0x00000000
   34..35   iFolder = 0
   36..37   date = 0x226C = 0010001 0011 01100 = March 12, 1997
   38..39   time = 0x59BA = 01011 001101 11010 = 11:13:52 AM
   3A..3B   attribs = 0x0020 = _A_ARCHIVE
   3C..43   szName = "hello.c" + NUL

   44..5D   CFFILE[1]
   44..47   cbFile = 0x0000004A (74 bytes)
   48..4B   uoffFolderStart = 0x0000004D
   4C..4D   iFolder = 0
   4E..4F   date = 0x226C = 0010001 0011 01100 = March 12, 1997
   50..51   time = 0x59E7 = 01011 001111 00111 = 11:15:14 AM
   52..53   attribs = 0x0020 = _A_ARCHIVE
   54..5D   szName = "welcome.c" + NUL

   5E..FC   CFDATA[0]
   5E..61   csum = 0x30A65ABD
   62..63   cbData = 0x0097 (151 bytes)
   64..65   cbUncomp = 0x0097 (151 bytes)
   66..FC   ab[0x0097] = uncompressed file data
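
As a rough illustration of the layout above, the following C sketch reads the fixed CFHEADER fields from a byte buffer and unpacks the 16-bit date and time words of a CFFILE entry. The names cab_header_info, parse_cfheader, fat_date_decode, and fat_time_decode are hypothetical helpers, not part of the Cabinet SDK. The sketch assumes little-endian fields, as in the sample, and ignores any optional reserved areas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type; not part of the Cabinet SDK. */
typedef struct {
    uint32_t cbCabinet;                   /* offset 0x08 */
    uint32_t coffFiles;                   /* offset 0x10 */
    unsigned versionMinor, versionMajor;  /* offsets 0x18, 0x19 */
    uint16_t cFolders, cFiles, flags;     /* offsets 0x1A, 0x1C, 0x1E */
    uint16_t setID, iCabinet;             /* offsets 0x20, 0x22 */
} cab_header_info;

/* Read little-endian integers byte by byte (independent of host byte order). */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if buf starts with "MSCF" and holds the
   0x24 fixed header bytes; returns 0 otherwise. */
int parse_cfheader(const uint8_t *buf, size_t len, cab_header_info *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 0x24 || memcmp(buf, "MSCF", 4) != 0)
        return 0;
    out->cbCabinet    = rd32(buf + 0x08);
    out->coffFiles    = rd32(buf + 0x10);
    out->versionMinor = buf[0x18];
    out->versionMajor = buf[0x19];
    out->cFolders     = rd16(buf + 0x1A);
    out->cFiles       = rd16(buf + 0x1C);
    out->flags        = rd16(buf + 0x1E);
    out->setID        = rd16(buf + 0x20);
    out->iCabinet     = rd16(buf + 0x22);
    return 1;
}

/* Date word: bits 15..9 = year - 1980, bits 8..5 = month, bits 4..0 = day. */
void fat_date_decode(uint16_t d, int *year, int *month, int *day)
{
    *year  = 1980 + (d >> 9);
    *month = (d >> 5) & 0x0F;
    *day   = d & 0x1F;
}

/* Time word: bits 15..11 = hour, bits 10..5 = minute, bits 4..0 = seconds / 2. */
void fat_time_decode(uint16_t t, int *hour, int *minute, int *second)
{
    *hour   = t >> 11;
    *minute = (t >> 5) & 0x3F;
    *second = (t & 0x1F) * 2;
}
```

Applied to the sample, the date word 0x226C unpacks to 1997, 3, 12 and the time word 0x59BA to 11, 13, 52, matching the table.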

Notes

Checksum Method

The computation and verification of checksums found in CFDATA entries of cabinet files is done using a function named CSUMCompute. Its actual source code is provided for reference. When checksums are not supplied by the cabinet file creating application, the checksum field is set to zero. Cabinet extracting applications do not compute or verify the checksum if the field is set to zero.

CHECKSUM CSUMCompute(void *pv, UINT cb, CHECKSUM seed)
{
    int         cUlong;                 // Number of ULONGs in block
    CHECKSUM    csum;                   // Checksum accumulator
    BYTE       *pb;
    ULONG       ul;

    cUlong = cb / 4;                    // Number of ULONGs
    csum = seed;                        // Init checksum
    pb = pv;                            // Start at front of data block

    //** Checksum integral multiple of ULONGs
    while (cUlong-- > 0) {
        //** NOTE: Build ULONG in big/little-endian independent manner
        ul = *pb++;                     // Get low-order byte
        ul |= (((ULONG)(*pb++)) <<  8); // Add 2nd byte
        ul |= (((ULONG)(*pb++)) << 16); // Add 3rd byte
        ul |= (((ULONG)(*pb++)) << 24); // Add 4th byte

        csum ^= ul;                     // Update checksum
    }

    //** Checksum remainder bytes
    ul = 0;
    switch (cb % 4) {
        case 3:
            ul |= (((ULONG)(*pb++)) << 16); // 1st of 3 bytes -> bits 16-23
            // fall through
        case 2:
            ul |= (((ULONG)(*pb++)) <<  8); // Next byte -> bits 8-15
            // fall through
        case 1:
            ul |= *pb++;                    // Last byte -> bits 0-7
            // fall through
        default:
            break;
    }
    csum ^= ul;                         // Update checksum

    //** Return computed checksum
    return csum;
}

The checksums for non-split CFDATA blocks are computed first on the compressed data bytes, then on the CFDATA header area, starting at the CFDATA.cbData field:

CFDATA.cbData = cbCompressed;
CFDATA.cbUncomp = cbUncompressed;
csumPartial = CSUMCompute(&CFDATA.ab[0],CFDATA.cbData,0);
CFDATA.csum = CSUMCompute(&CFDATA.cbData,sizeof(CFDATA) -
                          sizeof(CFDATA.csum),csumPartial);

When blocks are split across cabinet file boundaries, the checksum for the partial block at the end of a cabinet file is computed first on the partial field of compressed data bytes, then on the header:

CFDATA.cbData = cbPartialData;
CFDATA.cbUncomp = 0;
csumPartial = CSUMCompute(&CFDATA.ab[0],cbPartialData,0);
CFDATA.csum = CSUMCompute(&CFDATA.cbData,sizeof(CFDATA) -
                          sizeof(CFDATA.csum),csumPartial);

The checksum for the residual block in the next cabinet file is computed first on the remainder of the field of compressed data bytes, then on the header:

CFDATA.cbData = cbResidualData;
CFDATA.cbUncomp = cbUncompressed;
csumPartial = CSUMCompute(&CFDATA.ab[cbPartialData],cbResidualData,0);
CFDATA.csum = CSUMCompute(&CFDATA.cbData,sizeof(CFDATA) -
                          sizeof(CFDATA.csum),csumPartial);
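
The two-stage computation for a non-split block can be sketched in self-contained C as follows. This is a restatement under assumptions, not the SDK source: csum_compute mirrors CSUMCompute using fixed-width types, and cfdata_checksum is a hypothetical helper that assumes the CFDATA header after csum consists only of the little-endian cbData and cbUncomp fields (no per-datablock reserved area).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR checksum over cb bytes, starting from seed (mirrors CSUMCompute). */
uint32_t csum_compute(const void *pv, size_t cb, uint32_t seed)
{
    const uint8_t *pb = pv;
    uint32_t csum = seed;
    size_t cUlong = cb / 4;          /* number of whole 32-bit words */
    uint32_t ul;

    assert(pv != NULL || cb == 0);

    /* Whole words are assembled little-endian, byte by byte. */
    while (cUlong-- > 0) {
        ul  = (uint32_t)pb[0];
        ul |= (uint32_t)pb[1] << 8;
        ul |= (uint32_t)pb[2] << 16;
        ul |= (uint32_t)pb[3] << 24;
        pb += 4;
        csum ^= ul;
    }

    /* Remaining 1 to 3 bytes: the first remaining byte lands in the
       highest position used, the last one in bits 0-7. */
    ul = 0;
    switch (cb % 4) {
    case 3: ul |= (uint32_t)*pb++ << 16; /* fall through */
    case 2: ul |= (uint32_t)*pb++ << 8;  /* fall through */
    case 1: ul |= (uint32_t)*pb++;       /* fall through */
    default: break;
    }
    return csum ^ ul;
}

/* Hypothetical helper: checksum the data bytes first, then use that
   result as the seed for the 4 header bytes (cbData, cbUncomp). */
uint32_t cfdata_checksum(const uint8_t *ab, uint16_t cbData, uint16_t cbUncomp)
{
    uint8_t hdr[4];
    hdr[0] = (uint8_t)(cbData & 0xFF);
    hdr[1] = (uint8_t)(cbData >> 8);
    hdr[2] = (uint8_t)(cbUncomp & 0xFF);
    hdr[3] = (uint8_t)(cbUncomp >> 8);
    return csum_compute(hdr, sizeof hdr, csum_compute(ab, cbData, 0));
}
```

Because the checksum is a plain XOR of 32-bit words, running it over the same whole-word data twice (seeding the second pass with the first result) yields zero.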

UTF Encoding Method

UTF (universal text format) is used to compactly represent a broad range of Unicode characters while favoring size for the most common characters. Unicode characters are translated to sequences of one, two, or three bytes per character.

When a string containing Unicode characters larger than 0x007F is encoded in the CFFILE.szName field, the _A_NAME_IS_UTF attribute should be included in the file's attributes. When no characters larger than 0x007F are in the name, the _A_NAME_IS_UTF attribute should not be set. If byte values larger than 0x7F are found in CFFILE.szName, but the _A_NAME_IS_UTF attribute is not set, the characters should be interpreted according to the current locale.

Unicode characters with values 0x0000 through 0x007F are represented by a single byte of the same value.

The first byte emitted for Unicode characters 0x0080 through 0x07FF is 0xC0+(unicodevalue >> 6), and the second byte is 0x80+(unicodevalue & 0x003F).

Unicode characters 0x0800 through 0xFFFF are represented by byte1 = 0xE0+(unicodevalue >> 12), byte2 = 0x80+((unicodevalue >> 6) & 0x3F), and byte3 = 0x80+(unicodevalue & 0x3F).
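
The three rules above translate directly into code. The following C sketch is illustrative only; utf_encode_char and utf_encode_name are hypothetical names, and the second function merely reports whether the _A_NAME_IS_UTF attribute would be needed rather than setting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one 16-bit Unicode value into out (room for 3 bytes);
   returns the number of bytes written (1, 2, or 3). */
size_t utf_encode_char(uint16_t ch, uint8_t out[3])
{
    assert(out != NULL);
    if (ch <= 0x007F) {
        out[0] = (uint8_t)ch;
        return 1;
    }
    if (ch <= 0x07FF) {
        out[0] = (uint8_t)(0xC0 + (ch >> 6));
        out[1] = (uint8_t)(0x80 + (ch & 0x3F));
        return 2;
    }
    out[0] = (uint8_t)(0xE0 + (ch >> 12));
    out[1] = (uint8_t)(0x80 + ((ch >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 + (ch & 0x3F));
    return 3;
}

/* Encodes a NUL-terminated 16-bit string into dst (cbDst bytes, including
   the terminating NUL). Returns the encoded length without the NUL, or -1
   if dst is too small. *needs_utf_attr is set to 1 when any character is
   larger than 0x007F, i.e. when _A_NAME_IS_UTF should accompany the name. */
int utf_encode_name(const uint16_t *name, uint8_t *dst, size_t cbDst,
                    int *needs_utf_attr)
{
    size_t n = 0;

    assert(name != NULL && dst != NULL && needs_utf_attr != NULL);
    *needs_utf_attr = 0;
    if (cbDst == 0)
        return -1;
    for (; *name != 0; name++) {
        uint8_t tmp[3];
        size_t k = utf_encode_char(*name, tmp);
        if (n + k + 1 > cbDst)       /* keep one byte for the NUL */
            return -1;
        for (size_t i = 0; i < k; i++)
            dst[n++] = tmp[i];
        if (*name > 0x007F)
            *needs_utf_attr = 1;
    }
    dst[n] = 0;
    return (int)n;
}
```

For example, 0x0041 stays the single byte 0x41, 0x00E9 becomes 0xC3 0xA9, and 0x20AC becomes 0xE2 0x82 0xAC.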

Microsoft FCI/FDI Library Description

Copyright © 1996-1997 Microsoft Corporation. All rights reserved.

Topics in this section

Introduction

FCI

FDI

Introduction

The FCI (File Compression Interface) and FDI (File Decompression Interface) libraries provide the ability to create and extract files from cabinets (also known as "CAB files"). In addition, the libraries provide compression and decompression capability to reduce the size of file data stored in cabinets.

The FCI and FDI libraries, FCI.LIB and FDI.LIB, are available in both 32-bit and 16-bit forms. However, the 16-bit version will run more slowly than the 32-bit version.

FCI and FDI support multiple simultaneous contexts, so it is possible to create or extract multiple cabinets simultaneously within the same application. If the application is multi-threaded, it is also possible to run a different context in each thread; however, it is not permitted for the application to use the same context simultaneously in multiple threads (e.g. one cannot call FCIAddFile from two different threads, using the same FCI context).

FCI and FDI operate using the technique of function callbacks; some of the parameters of the FCI and FDI APIs are pointers to functions in the client application. The parameters and purpose of these functions are explained fully in this document. The fci_int.h and fdi_int.h header files provide macros for declaring the callback functions, and use keywords such as HUGE, FAR, and DIAMONDAPI, which ensure that the functions are properly defined for both 32-bit and 16-bit operation. For example, in the case of the memory allocation and memory free functions, the following definitions exist in fci_int.h:

#define FNFCIALLOC(fn) void HUGE * FAR DIAMONDAPI fn(ULONG cb)
#define FNFCIFREE(fn) void FAR DIAMONDAPI fn(void HUGE *pv)

These declarations can be used as follows:

FNFCIALLOC(mem_alloc)
{
      return malloc(cb);
}

FNFCIFREE(mem_free)
{
      free(pv);
}

some_function()
{
      hfci = FCICreate(
            &erf, 
            filedest, 
            mem_alloc, 
            mem_free,
            etc.
      );
}

It should be noted that the FCI callback function names all begin with the string "FCI". In addition, the FCI and FDI i/o functions (open, close, read, write, seek) take different parameters, and cannot be used interchangeably.

The FDI i/o functions take parameters which are identical to those of the C run-time library routines _open, _close, _read, _write, and _lseek. The FCI i/o functions take similar parameters, with the addition of an error pointer in which to return an i/o error, and the client's context pointer originally passed in to the FCICreate API.

Two example applications are provided: testfci and testfdi. These applications demonstrate how all of the FCI and FDI APIs, respectively, may be used.

FCI

The five FCI (File Compression Interface) APIs are:

   API               Description
   FCICreate         Create an FCI context
   FCIAddFile        Add a file to the cabinet under construction
   FCIFlushCabinet   Complete the current cabinet
   FCIFlushFolder    Complete the current folder and start a new folder
   FCIDestroy        Destroy an FCI context

FCICreate

HFCI DIAMONDAPI FCICreate(
      PERF               perf, 
      PFNFCIFILEPLACED   pfnfiledest, 
      PFNFCIALLOC        pfnalloc, 
      PFNFCIFREE         pfnfree, 
      PFNFCIOPEN         pfnopen, 
      PFNFCIREAD         pfnread, 
      PFNFCIWRITE        pfnwrite, 
      PFNFCICLOSE        pfnclose, 
      PFNFCISEEK         pfnseek, 
      PFNFCIDELETE       pfndelete, 
      PFNFCIGETTEMPFILE  pfntemp, 
      PCCAB              pccab, 
      void FAR *         pv 
);

Parameters

perf

Pointer to an error structure

pfnfiledest

Function to call when a file is placed

pfnalloc

Memory allocation function

pfnfree

Memory free function

pfnopen

Function to open a file

pfnread

Function to read data from a file

pfnwrite

Function to write data to a file

pfnclose

Function to close a file

pfnseek

Function to seek to a new position in a file

pfndelete

Function to delete a file

pfntemp

Function to obtain a temporary file name

pccab

Parameters for creating cabinet

pv

Client context parameter

Description

The FCICreate API creates an FCI context that is passed to other FCI APIs.

The perf parameter should point to a global or allocated ERF structure. Any errors returned by FCICreate or subsequent FCI APIs using the same context will cause the ERF structure to be filled out.

The pfnalloc and pfnfree parameters should point to memory allocation and memory free functions which will be called by FCI to allocate and free memory. These two functions take parameters identical to the standard C malloc and free functions.

The pfnopen, pfnread, pfnwrite, pfnclose, pfnseek, and pfndelete parameters should point to functions which perform file open, file read, file write, file close, file seek, and file delete operations respectively. These functions must accept parameters similar to those for the standard _open, _read, _write, _close, _lseek, and remove functions, with two additional parameters appended to the list: err and pv. The err parameter is an int *, and upon entry into the function, *err will equal zero. However, if the function returns failure, *err should be set to an error code of the application's choosing, which will be returned via perf (the error code is not used by FCI, and is not required to conform to C run-time library errno conventions). The pv parameter will equal the client's context parameter passed in to FCICreate.
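
A minimal sketch of such i/o callbacks is shown below. It makes several assumptions: it uses the POSIX names open, read, write, close, and lseek so that it compiles outside Windows (a Windows build would use the underscore-prefixed run-time routines), it spells the signatures with plain C types instead of the declaration macros from the FCI header, and it stores errno in *err, which is only one possible choice of error code. The fci_ prefix on the function names is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Each wrapper forwards to the run-time call and, on failure only,
   stores errno in *err. The pv context pointer is unused here. */

intptr_t fci_open(char *pszFile, int oflag, int pmode, int *err, void *pv)
{
    int fd;
    (void)pv;
    assert(pszFile != NULL && err != NULL);
    fd = open(pszFile, oflag, pmode);
    if (fd == -1)
        *err = errno;
    return fd;
}

unsigned int fci_read(intptr_t hf, void *memory, unsigned int cb,
                      int *err, void *pv)
{
    ssize_t n;
    (void)pv;
    n = read((int)hf, memory, cb);
    if (n == -1) {
        *err = errno;
        return (unsigned int)-1;
    }
    return (unsigned int)n;
}

unsigned int fci_write(intptr_t hf, void *memory, unsigned int cb,
                       int *err, void *pv)
{
    ssize_t n;
    (void)pv;
    n = write((int)hf, memory, cb);
    if (n == -1) {
        *err = errno;
        return (unsigned int)-1;
    }
    return (unsigned int)n;
}

int fci_close(intptr_t hf, int *err, void *pv)
{
    int rc;
    (void)pv;
    rc = close((int)hf);
    if (rc == -1)
        *err = errno;
    return rc;
}

long fci_seek(intptr_t hf, long dist, int seektype, int *err, void *pv)
{
    off_t pos;
    (void)pv;
    pos = lseek((int)hf, (off_t)dist, seektype);
    if (pos == (off_t)-1)
        *err = errno;
    return (long)pos;
}

int fci_delete(char *pszFile, int *err, void *pv)
{
    (void)pv;
    if (remove(pszFile) != 0) {
        *err = errno;
        return -1;
    }
    return 0;
}
```

Leaving *err untouched on success relies on the statement above that *err equals zero on entry.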

The pfntemp parameter should point to a function which returns the name of a suitable temporary file. Three parameters will be passed to this function; pszTempName, an area of memory to store the filename, cbTempName, the size of the memory area, and pv, the client's context pointer. The filename returned by this function should not occupy more than cbTempName bytes. FCI may open several temporary files at once, so it is important to ensure that a different filename is returned each time, and that the file does not already exist. The function should return TRUE for success, or FALSE for failure.
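
One possible shape for such a function is sketched below, using plain C types in place of the declaration macro. The name get_temp_file, the "fci%05u.tmp" naming pattern, and the counter-based scheme are assumptions for illustration. A production implementation would more likely rely on a platform facility for unique names, and the existence probe here is not free of races.

```c
#include <assert.h>
#include <stdio.h>

/* Stores a not-yet-existing file name in pszTempName.
   Returns 1 (TRUE) on success, 0 (FALSE) on failure. */
int get_temp_file(char *pszTempName, int cbTempName, void *pv)
{
    static unsigned counter = 0;     /* shared state: not thread-safe */

    (void)pv;
    assert(pszTempName != NULL);
    if (cbTempName <= 0)
        return 0;

    for (int attempt = 0; attempt < 1000; attempt++) {
        FILE *f;
        int n = snprintf(pszTempName, (size_t)cbTempName,
                         "fci%05u.tmp", counter++);
        if (n < 0 || n >= cbTempName)
            return 0;                /* name does not fit in the buffer */

        f = fopen(pszTempName, "rb");
        if (f == NULL)
            return 1;                /* could not be opened: treated as free */
        fclose(f);                   /* exists already: try the next name */
    }
    return 0;
}
```

Advancing the counter on every call means successive calls return different names even before FCI has created the earlier files.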

The pfnfiledest parameter should point to a function which will be called whenever the location of a file or file segment on a particular cabinet has been finalized. This information is useful only when files are being stored across multiple cabinets. The parameters passed to this function are pccab, a pointer to the CCAB structure of the cabinet on which the file has been stored, pszFile, the filename of the file which has been placed, cbFile, the file size, and fContinuation, a Boolean which signifies whether the file is a later segment of a file which has been split across cabinets. In addition, the client context value, pv, is also passed as a parameter.

The pccab parameter should point to an initialized CCAB structure, which will provide FCI with details on how to build the cabinet. The CCAB fields are explained below:

The cb field, the media size, specifies the maximum size of a cabinet which will be created by FCI. If necessary, multiple cabinets will be created. To ensure that only one cabinet is created, a sufficiently large number should be used for this parameter.

The cbFolderThresh field specifies the maximum number of compressed bytes which may reside in a folder before a new folder is created. A higher folder threshold improves compression performance (since creating a new folder resets the compression history), but increases random access time to the folder.

The iCab field is used by FCI to count the number of cabinets that have been created so far. This value can also be read by the application to determine the name of a cabinet. See the GetNextCab parameter of the FCIAddFile API for details.

The iDisk field is used in a similar manner to iCab. See the GetNextCab parameter of the FCIAddFile API for details.

The setID field is for the use of the application, and can be initialized with any number. The set ID is stored in the cabinet.

The szDisk field should contain a disk-specific string (such as "Disk1", "Disk2", etc.) corresponding to the disk on which the cabinet is placed. Alternatively, if cabinets are not spanning multiple disks, the string can simply be a null string. This field is stored in the cabinet and is used upon extraction to prompt the user to insert the correct disk. See the FCIAddFile API for details.

The szCab field should contain a string which contains the name of the first cabinet to be created (e.g. "APP1.CAB"). In the event of multiple cabinets being created, the GetNextCab function called by the FCIAddFile API allows subsequent cabinet names to be specified.

The szCabPath field should contain the complete path of where to create the cabinet (e.g. "C:\MYFILES\").

The cbReserveCFHeader, cbReserveCFFolder, and cbReserveCFData fields can be set to create per-cabinet, per-folder, and per-datablock reserved sections in the cabinet. For example, setting cbReserveCFHeader to 6144 is commonly used to reserve a 6 KB space in the cabinet file as needed for code signing. The other reserved sections are not commonly used.

Returns

If successful, a non-NULL HFCI context pointer is returned. If unsuccessful, NULL is returned, and the error structure pointed to by perf is filled out.

FCIAddFile

BOOL DIAMONDAPI FCIAddFile(
      HFCI                  hfci, 
      char                 *pszSourceFile, 
      char                 *pszFileName, 
      BOOL                  fExecute, 
      PFNFCIGETNEXTCABINET  GetNextCab, 
      PFNFCISTATUS          pfnProgress, 
      PFNFCIGETOPENINFO     pfnOpenInfo, 
      TCOMP                 typeCompress 
);

Parameters

hfci

FCI Context pointer originally returned by FCICreate

pszSourceFile

Name of file to add (should include path information)

pszFileName

Name under which to store the file in the cabinet

fExecute

Boolean indicating whether the file should be executed when it is extracted

GetNextCab

Function called to obtain specifications on the next cabinet to create

pfnProgress

Progress function called to update the user

pfnOpenInfo

Function called to open a file and return file date, time and attributes

typeCompress

Compression type to use

Description

The FCIAddFile API adds a file to the cabinet under construction.

The hfci parameter must be the context pointer returned by a previous call to FCICreate.

The pszSourceFile parameter specifies the location of the file to be added to the cabinet, and should therefore include as much path information as possible (e.g. "C:\MYFILES\TEST.EXE").

The pszFileName parameter specifies the name of the file inside the cabinet, and should not include any path information (e.g. "TEST.EXE").

The fExecute parameter specifies whether the file should be executed automatically when the cabinet is extracted. When set, the _A_EXEC attribute will be added to the file entry in the CAB. This mechanism is used in some Microsoft self-extracting executables, and could be used for this purpose in any custom extract application.

The GetNextCab parameter should point to a function which is called whenever FCI wishes to create a new cabinet, which will happen whenever the size of the cabinet is about to exceed the media size as specified in the cb field of the CCAB structure passed to FCICreate. The GetNextCab function is called with three parameters which are explained below:

The first parameter, pccab, is a pointer to a copy of the CCAB structure of the cabinet which has just been completed. However, the iCab field will have been incremented by one. When this function returns, the next cabinet will be created using the fields in this structure, so these fields should be modified as is necessary. In particular, the szCab field (the cabinet name) should be changed. If creating multiple cabinets, typically the iCab field is used to create the name; for example, the GetNextCab function might include a line that does:

sprintf(pccab->szCab, "FOO%d.CAB", pccab->iCab);

Similarly, the disk name, media size, folder threshold, etc. parameters may also be modified.

The second parameter, cbPrevCab, is an estimate of the size of the cabinet which has just been completed.

The last parameter, pv, is the application-defined value originally passed to FCICreate.

The GetNextCab function should return TRUE for success, or FALSE to abort cabinet creation.
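
A GetNextCab function along these lines might look like the following sketch. The demo_ccab structure is a stand-in that carries only the two fields used here, with an assumed buffer size; real code would receive the CCAB type from the FCI header and would be declared with the corresponding macro. Using snprintf instead of sprintf guards against overrunning the name buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the CCAB fields used below (layout and size assumed). */
typedef struct {
    int  iCab;            /* already incremented by FCI on entry */
    char szCab[256];      /* receives the next cabinet name */
} demo_ccab;

/* Returns 1 (TRUE) to continue, 0 (FALSE) to abort cabinet creation. */
int get_next_cab(demo_ccab *pccab, unsigned long cbPrevCab, void *pv)
{
    int n;

    (void)cbPrevCab;      /* size estimate of the completed cabinet, unused */
    (void)pv;             /* client context, unused */
    assert(pccab != NULL);

    n = snprintf(pccab->szCab, sizeof pccab->szCab,
                 "FOO%d.CAB", pccab->iCab);
    return n > 0 && (size_t)n < sizeof pccab->szCab;
}
```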

The pfnProgress parameter should point to a function that is called periodically by FCI so that the application may send a progress report to the user. The progress function has four parameters; typeStatus, which specifies the type of status message, cb1 and cb2, which are numbers, the meaning of which is dependent upon typeStatus, and pv, the application-specific context pointer.

The typeStatus parameter may take on values of statusFile, statusFolder, or statusCabinet. If typeStatus equals statusFile then it means that FCI is compressing data blocks into a folder. In this case, cb1 is either zero, or the compressed size of the most recently compressed block, and cb2 is either zero, or the uncompressed size of the most recently read block (which is usually 32K, except for the last block in a folder, which may be smaller). There is no direct relation between cb1 and cb2; FCI may read several blocks of uncompressed data before emitting any compressed data; if this happens, some statusFile messages may contain, for example, cb1 = 0 and cb2 = 32K, followed later by other messages which contain cb1 = 20K and cb2 = 0.

If typeStatus equals statusFolder then it means that FCI is copying a folder to a cabinet, and cb1 is the amount copied so far, and cb2 is the total size of the folder. Finally, if typeStatus equals statusCabinet, then it means that FCI is writing out a completed cabinet, and cb1 is the estimated cabinet size that was previously passed to GetNextCab, and cb2 is the actual resulting cabinet size.

The progress function should return 0 for success, or -1 for failure, with an exception in the case of statusCabinet messages, where the function should return the desired cabinet size (cb2), or possibly a value rounded up to slightly higher than that.
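
The return-value rules above can be sketched as follows. The demoStatus constants and the progress_totals structure are stand-ins introduced for this example; real code would use the status constants and declaration macro from the FCI header. The sketch accumulates the statusFile byte counts separately, since cb1 and cb2 do not arrive in matched pairs.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in status codes (values assumed for illustration). */
enum { demoStatusFile, demoStatusFolder, demoStatusCabinet };

/* Running totals kept by the application and passed in through pv. */
typedef struct {
    unsigned long cbCompressed;
    unsigned long cbUncompressed;
} progress_totals;

long progress(unsigned typeStatus, unsigned long cb1, unsigned long cb2,
              void *pv)
{
    progress_totals *t = pv;

    assert(t != NULL);
    switch (typeStatus) {
    case demoStatusFile:
        /* Either count may be zero in any given message. */
        t->cbCompressed   += cb1;
        t->cbUncompressed += cb2;
        return 0;
    case demoStatusFolder:
        printf("copying folder: %lu of %lu bytes\n", cb1, cb2);
        return 0;
    case demoStatusCabinet:
        /* cb1 = estimated size, cb2 = actual size; accept the actual size. */
        return (long)cb2;
    default:
        return 0;
    }
}
```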

The pfnOpenInfo parameter should point to a function which opens a file and returns its datestamp, timestamp, and attributes. The function will receive five parameters; pszName, the complete pathname of the file to open; pdate, a memory location to return a FAT-style date code; ptime, a memory location to return a FAT-style time code; pattribs, a memory location to return FAT-style attributes; and pv, the application-specific context pointer originally passed to FCICreate. The function should open the file using a file open function compatible with those passed in to FCICreate, and return the resulting file handle, or -1 if unsuccessful.
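
The FAT-style date and time codes can be derived from a broken-down time as in the sketch below. It assumes the usual packing (year minus 1980 in the top 7 bits of the date, then month and day; hour, minute, and seconds divided by two in the time), and fat_stamp_from_tm is a hypothetical helper name. Obtaining the struct tm from the file's modification time is left to the application, and dates before 1980 are not representable.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Packs a broken-down local time into FAT-style date and time words. */
void fat_stamp_from_tm(const struct tm *tm, uint16_t *pdate, uint16_t *ptime)
{
    assert(tm != NULL && pdate != NULL && ptime != NULL);
    assert(tm->tm_year >= 80);       /* FAT dates start at 1980 */

    /* tm_year counts from 1900 and tm_mon from 0. */
    *pdate = (uint16_t)(((tm->tm_year - 80) << 9) |
                        ((tm->tm_mon + 1) << 5) |
                        tm->tm_mday);
    /* Seconds are stored with 2-second resolution. */
    *ptime = (uint16_t)((tm->tm_hour << 11) |
                        (tm->tm_min << 5) |
                        (tm->tm_sec / 2));
}
```

For March 12, 1997 at 11:13:52 this produces the date word 0x226C and the time word 0x59BA.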

The typeCompress parameter specifies the type of compression to use, which may be either tcompTYPE_NONE for no compression, or tcompTYPE_MSZIP for Microsoft ZIP compression. Other compression formats may be supported in the future.

Returns

If successful, TRUE is returned. If unsuccessful, FALSE is returned, and the error structure pointed to by perf (from FCICreate) is filled out.

FCIFlushCabinet

BOOL DIAMONDAPI FCIFlushCabinet(
      HFCI                  hfci, 
      BOOL                  fGetNextCab, 
      PFNFCIGETNEXTCABINET  GetNextCab, 
      PFNFCISTATUS          pfnProgress 
);

Parameters

hfci

FCI Context pointer originally returned by FCICreate

fGetNextCab

Boolean indicating whether GetNextCab should be called to obtain continuation information

GetNextCab

Function called to obtain specifications on the next cabinet to create

pfnProgress

Progress function called to update the user

Description

The FCIFlushCabinet API forces the current cabinet under construction to be completed immediately and written to disk. Further calls to FCIAddFile will cause files to be added to another cabinet. It is also possible that pending data in FCI's internal buffers will require spillover into another cabinet, if the current cabinet has reached the application-specified media size limit.

The hfci parameter must be the context pointer returned by a previous call to FCICreate.

The fGetNextCab flag determines whether the function pointed to by the supplied GetNextCab parameter will be called. If fGetNextCab is TRUE, then GetNextCab will be called to obtain continuation information. Otherwise, if fGetNextCab is FALSE, then GetNextCab will be called only if the cabinet overflows.

The pfnProgress parameter should point to a function which is called periodically by FCI so that the application may send a progress report to the user. This function works in an identical manner to the progress function passed to FCIAddFile.

Returns

If successful, TRUE is returned. If unsuccessful, FALSE is returned, and the error structure pointed to by perf (from FCICreate) is filled out.

FCIFlushFolder

BOOL DIAMONDAPI FCIFlushFolder(
      HFCI                  hfci, 
      PFNFCIGETNEXTCABINET  GetNextCab, 
      PFNFCISTATUS          pfnProgress 
);

Parameters

hfci

FCI Context pointer originally returned by FCICreate

GetNextCab

Function called to obtain specifications on the next cabinet to create

pfnProgress

Progress function called to update the user

Description

The FCIFlushFolder API forces the current folder under construction to be completed immediately, effectively resetting the compression history at this point (if compression is being used).

The hfci parameter must be the context pointer returned by a previous call to FCICreate.

The supplied GetNextCab function will be called if the cabinet overflows, which is a possibility if the pending data buffered inside FCI causes the application-specified cabinet media size to be exceeded.

The pfnProgress parameter should point to a function which is called periodically by FCI so that the application may send a progress report to the user. This function works in an identical manner to the progress function passed to FCIAddFile.

FCIDestroy

BOOL DIAMONDAPI FCIDestroy(
      HFCI  hfci
);

Parameters

hfci

FCI Context pointer originally returned by FCICreate

Description

The FCIDestroy API destroys an hfci context, freeing any memory and temporary files associated with the context.

Returns

If successful, TRUE is returned. If unsuccessful, FALSE is returned. The only reason for failure is that the hfci passed in was not a proper context handle.

FDI

The four FDI (File Decompression Interface) APIs are:

   API            Description
   FDICreate      Create an FDI context
   FDIIsCabinet   Determines whether a file is a cabinet, and returns information if so
   FDICopy        Extracts files from cabinets
   FDIDestroy     Destroy an FDI context

FDICreate

HFDI DIAMONDAPI FDICreate(
      PFNALLOC  pfnalloc, 
      PFNFREE   pfnfree, 
      PFNOPEN   pfnopen, 
      PFNREAD   pfnread, 
      PFNWRITE  pfnwrite, 
      PFNCLOSE  pfnclose, 
      PFNSEEK   pfnseek, 
      int       cpuType, 
      PERF      perf 
);

Parameters

pfnalloc

Memory allocation function

pfnfree

Memory free function

pfnopen

Function to open a file

pfnread

Function to read data from a file

pfnwrite

Function to write data to a file

pfnclose

Function to close a file

pfnseek

Function to seek to a new position in a file

cpuType

Type of CPU

perf

Pointer to an error structure

Description

The FDICreate API creates an FDI context that is passed to other FDI APIs.

The pfnalloc and pfnfree parameters should point to memory allocation and memory free functions which will be called by FDI to allocate and free memory. These two functions take parameters identical to the standard C malloc and free functions.

The pfnopen, pfnread, pfnwrite, pfnclose, and pfnseek parameters should point to functions which perform file open, file read, file write, file close, and file seek operations respectively. These functions should accept parameters identical to those for the standard _open, _read, _write, _close, and _lseek functions, and should likewise have identical return codes. Note that the FDI i/o functions do not take the same parameters as the FCI i/o functions.

It is not necessary for these functions to actually call _open etc.; these functions could instead call fopen, fread, fwrite, fclose, and fseek, or CreateFile, ReadFile, WriteFile, CloseHandle, and SetFilePointer, etc. However, the parameters and return codes will have to be translated appropriately (e.g. the file open mode passed in to pfnopen).
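
As one example of such a translation, the sketch below maps _open-style flags to an fopen mode string. The function name fopen_mode_from_oflag is hypothetical, the POSIX flag names from fcntl.h are used so the example builds outside Windows, and only a few common flag combinations are handled. Combinations with no close fopen equivalent yield NULL, which the caller should treat as an open failure.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* Some run-time libraries do not define O_ACCMODE. */
#ifndef O_ACCMODE
#define O_ACCMODE (O_RDONLY | O_WRONLY | O_RDWR)
#endif

/* Returns a binary-mode fopen string for the given open flags, or NULL
   when this sketch has no mapping for the combination. */
const char *fopen_mode_from_oflag(int oflag)
{
    int truncating = (oflag & O_CREAT) && (oflag & O_TRUNC);

    switch (oflag & O_ACCMODE) {
    case O_RDONLY:
        return "rb";
    case O_WRONLY:
        if (oflag & O_APPEND)
            return "ab";
        return truncating ? "wb" : NULL;
    case O_RDWR:
        /* Without create+truncate, "r+b" requires the file to exist. */
        return truncating ? "w+b" : "r+b";
    default:
        return NULL;
    }
}
```

A pfnopen built on this helper would pass the result to fopen and return the FILE pointer, converted to the handle type, or -1 when either step fails.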

The cpuType parameter should equal one of cpu80386 (indicating that 80386 instructions may be used), cpu80286 (indicating that only 80286 instructions may be used), or cpuUNKNOWN (indicating that FDI should determine the CPU type). The cpuType parameter is looked at only by the 16-bit version of FDI; it is ignored by the 32-bit version of FDI.

The perf parameter should point to a global or allocated ERF structure. Any errors returned by FDICreate or subsequent FDI APIs using the same context will cause the ERF structure to be filled out.

Returns

If successful, a non-NULL HFDI context pointer is returned. If unsuccessful, NULL is returned, and the error structure pointed to by perf is filled out.

FDIIsCabinet

BOOL DIAMONDAPI FDIIsCabinet(
      HFDI             hfdi, 
      int              hf, 
      PFDICABINETINFO  pfdici 
);

Parameters

hfdi

FDI Context pointer originally returned by FDICreate

hf

File handle returned by a call to the application's file open function

pfdici

Pointer to a cabinet info structure

Description

The FDIIsCabinet API determines whether a given file is a cabinet, and if so, returns information about the cabinet in the provided FDICABINETINFO structure.

The hfdi parameter is the context pointer returned by a previous call to FDICreate.

The hf parameter must be a file handle on the file being examined. The file handle must be of the same type as those used by the file i/o functions passed to FDICreate.

The pfdici parameter should point to an FDICABINETINFO structure, which will receive the cabinet details if the file is indeed a cabinet. The fields of this structure are as follows:

The cbCabinet field contains the length of the cabinet file, in bytes.

The cFolders field contains the number of folders in the cabinet.

The cFiles field contains the total number of files in the cabinet.

The setID field contains the set ID (an application-defined magic number) of the cabinet.

The iCabinet field contains the number of this cabinet in the set (0 for the first cabinet, 1 for the second, and so forth).

The fReserve field is a Boolean indicating whether there is a reserved area present in the cabinet.

The hasprev field is a Boolean indicating whether this cabinet is chained to the previous cabinet, by way of having a file continued from the previous cabinet into the current one.

The hasnext field is a Boolean indicating whether this cabinet is chained to the next cabinet, by way of having a file continued from this cabinet into the next one.

Returns

If the file is a cabinet, then TRUE is returned and the FDICABINETINFO structure is filled out. If the file is not a cabinet, or some other error occurred, then FALSE is returned. In either case, it is the responsibility of the application to close the file handle passed to this function.

FDICopy

BOOL FAR DIAMONDAPI FDICopy(
      HFDI           hfdi, 
      char FAR      *pszCabinet, 
      char FAR      *pszCabPath, 
      int            flags, 
      PFNFDINOTIFY   pfnfdin, 
      PFNFDIDECRYPT  pfnfdid, 
      void FAR      *pvUser 
);

Parameters

hfdi

FDI Context pointer originally returned by FDICreate

pszCabinet

Name of cabinet file, excluding path information

pszCabPath

File path to cabinet file

flags

Flags to control the extract operation

pfnfdin

Pointer to a notification (status update) function

pfnfdid

Pointer to a decryption function

pvUser

Application-specified value to pass to notification function

Description

The FDICopy API extracts one or more files from a cabinet. Information on each file in the cabinet is passed back to the supplied pfnfdin function, at which point the application may decide to extract or not extract the file.

The hfdi parameter is the context pointer returned by a previous call to FDICreate.

The pszCabinet parameter should be the name of the cabinet file, excluding any path information, from which to extract files. If a file is split over multiple cabinets, FDICopy allows subsequent cabinets to be opened.

The pszCabPath parameter should be the file path of the cabinet file (e.g. "C:\MYCABS\"). The contents of pszCabPath and pszCabinet will be strung together to create the full pathname of the cabinet.

The flags parameter is used to set flags for the decoder. At this time there are no flags defined, and the flags parameter should be set to zero.

The pfnfdin parameter should point to a file notification function, which will be called periodically to update the application on the status of the decoder. The pfnfdin function takes two parameters: fdint, an integral value indicating the type of notification message, and pfdin, a pointer to an FDINOTIFICATION structure.

The fdint parameter may equal one of the following values: fdintCABINET_INFO (general information about the cabinet), fdintPARTIAL_FILE (the first file in the cabinet is a continuation from a previous cabinet), fdintCOPY_FILE (asks the application if this file should be copied), fdintCLOSE_FILE_INFO (close the file and set file attributes, date, etc.), or fdintNEXT_CABINET (file continued on next cabinet).

The pfdin parameter will point to an FDINOTIFICATION structure with some or all of the fields filled out, depending on the value of the fdint parameter. Four of the fields are used for general data: cb (a long integer), and psz1, psz2, and psz3 (pointers to strings), the meanings of which are highly dependent on the fdint value. The pv field will be the value the application originally passed in as the pvUser parameter to FDICopy.

The pfnfdin function must return a value to FDI, which tells FDI whether to continue, abort, skip a file, or perform some other operation. The values that can be returned depend on fdint, and are explained below.

Note that it is possible that future versions of FDI will have additional notification messages. Therefore, the application should ignore values of fdint it does not understand, and return zero to continue (preferably), or -1 (negative one) to abort.
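
The overall shape of a notification function can be sketched as follows. The sketch is a listing-only callback: it counts files and never extracts them, and it returns zero for any fdint value it does not recognize. The enum, the Notification structure, and the ListingState type are reduced stand-ins declared here so the example compiles without fdi.h; a real application would include fdi.h and use FDINOTIFICATION and the callback signature defined there.

```c
/* Stand-in declarations (assumptions for this sketch, not the real
 * fdi.h definitions). */
enum {
    fdintCABINET_INFO,
    fdintPARTIAL_FILE,
    fdintCOPY_FILE,
    fdintCLOSE_FILE_INFO,
    fdintNEXT_CABINET
};

typedef struct {
    long        cb;    /* for fdintCOPY_FILE: uncompressed file size */
    const char *psz1;  /* for fdintCOPY_FILE: file name */
    void       *pv;    /* the pvUser value passed to FDICopy */
} Notification;

/* Hypothetical application state, reached through pv. */
typedef struct {
    int  file_count;
    long total_bytes;
} ListingState;

static int listing_notify(int fdint, Notification *pfdin)
{
    ListingState *state = (ListingState *)pfdin->pv;

    switch (fdint) {
    case fdintCOPY_FILE:
        state->file_count++;
        state->total_bytes += pfdin->cb;
        return 0;   /* zero here means "skip this file" */

    case fdintCABINET_INFO:
    case fdintPARTIAL_FILE:
    case fdintNEXT_CABINET:
        return 0;   /* zero means success for these messages */

    default:
        return 0;   /* unknown message: ignore it and continue */
    }
}
```

Because this callback never returns a file handle from fdintCOPY_FILE, it should never receive fdintCLOSE_FILE_INFO, so that case falls through to the default branch.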

If fdint equals fdintCABINET_INFO then the following fields will be filled out: psz1 will point to the name of the next cabinet (excluding path information); psz2 will point to the name of the next disk; psz3 will point to the cabinet path name; setID will equal the set ID of the current cabinet; and iCabinet will equal the cabinet number within the cabinet set (0 for the first cabinet, 1 for the second cabinet, etc.). The application should return 0 to indicate success, or -1 to indicate failure, which will abort FDICopy. An fdintCABINET_INFO notification will be provided exactly once for each cabinet opened by FDICopy, including continuation cabinets opened due to files spanning cabinet boundaries.

If fdint equals fdintCOPY_FILE then the following fields will be filled out: psz1 will point to the name of a file in the cabinet; cb will equal the uncompressed size of the file; date will equal the file's 16-bit FAT date; time will equal the file's 16-bit FAT time; and attribs will equal the file's 16-bit FAT attributes. The application may return one of three values: 0 (zero) to skip (i.e. not copy) the file; -1 (negative one) to abort FDICopy; or a non-zero (and non-negative-one) file handle for the destination to which to write the file. The file handle returned must be compatible with the PFNCLOSE function supplied to FDICreate. The fdintCOPY_FILE notification is called for each file that starts in the current cabinet, providing the opportunity for the application to request that the file be copied or skipped.
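
The date and time fields are packed 16-bit values. The text above does not spell out the bit layout; the sketch below assumes the standard MS-DOS/FAT packing (year offset from 1980, seconds stored in two-second units). FatTimestamp and decode_fat_datetime are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical unpacked form of the date and time fields. */
typedef struct {
    int year, month, day;
    int hour, minute, second;
} FatTimestamp;

/* Unpacks a 16-bit FAT date and time, assuming the standard layout:
 *   date: bits 15-9 = year - 1980, bits 8-5 = month, bits 4-0 = day
 *   time: bits 15-11 = hour, bits 10-5 = minute, bits 4-0 = seconds / 2 */
static FatTimestamp decode_fat_datetime(uint16_t date, uint16_t time)
{
    FatTimestamp ts;

    ts.year   = 1980 + ((date >> 9) & 0x7F);
    ts.month  = (date >> 5) & 0x0F;
    ts.day    = date & 0x1F;
    ts.hour   = (time >> 11) & 0x1F;
    ts.minute = (time >> 5) & 0x3F;
    ts.second = (time & 0x1F) * 2;
    return ts;
}
```

An application would typically call a helper like this while handling fdintCLOSE_FILE_INFO, when it has to set the date and time on the extracted file. Since seconds are stored in two-second units, odd second values cannot be represented.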

If fdint equals fdintCLOSE_FILE_INFO then the following fields will be filled out: psz1 will point to the name of a file in the cabinet; hf will be a file handle (which originated from fdintCOPY_FILE); date will equal the file's 16-bit FAT date; time will equal the file's 16-bit FAT time; attribs will equal the file's 16-bit FAT attributes (minus the _A_EXEC bit); and cb will equal either zero (0) or one (1), indicating whether the file should be executed after extract (one), or not (zero). It is the responsibility of the application to execute the file if cb equals one. The fdintCLOSE_FILE_INFO notification is called after all of the data has been written to a target file. The application must close the file (using the provided hf handle), and set the file date, time, and attributes. The application should return TRUE for success, or FALSE or -1 (negative one) to abort FDICopy. FDI assumes that the target file was closed, even if this callback returns failure; FDI will not attempt to use PFNCLOSE to close the file.

If fdint equals fdintPARTIAL_FILE then the following fields will be filled out: psz1 will point to the name of the file continued from a previous cabinet; psz2 will point to the name of the cabinet on which the first segment of the file exists; and psz3 will point to the name of the disk on which the first segment of the file exists. The fdintPARTIAL_FILE notification is called for files at the beginning of a cabinet which are continued from a previous cabinet. This notification will occur only when FDICopy is started on the second or subsequent cabinet in a series, which has files continued from a previous cabinet. The application should return zero (0) for success, or -1 (negative one) for failure, which will abort FDICopy.

If fdint equals fdintNEXT_CABINET then the following fields will be filled out: psz1 will point to the name of the next cabinet on which the current file is continued; psz2 will point to the name of the next disk on which the current file is continued; psz3 will point to the cabinet path information; and fdie will equal a success or error value. The fdintNEXT_CABINET notification is called only when fdintCOPY_FILE was instructed to copy a file in the current cabinet that is continued in a subsequent cabinet.

It is important that the cabinet path name, psz3, be validated before returning (psz3, which points to a 256 byte array, may be modified by the application; however, it is not permissible to modify psz1 or psz2). The application should ensure that the cabinet exists and is readable before returning; if necessary, the application should issue a disk change prompt and ensure that the cabinet file exists.

When this function returns to FDI, FDI will verify that the setID and iCabinet fields of the supplied cabinet match the expected values for that cabinet. If not, FDI will continue to send fdintNEXT_CABINET notification messages with the fdie field set to FDIERROR_WRONG_CABINET, until the correct cabinet file is specified, or until this function returns -1 (negative one) to abort the FDICopy call. If, after returning from this function, the cabinet file is not present and readable, or has been damaged, then the fdie field will equal one of the following values: FDIERROR_CABINET_NOT_FOUND, FDIERROR_NOT_A_CABINET, FDIERROR_UNKNOWN_CABINET_VERSION, FDIERROR_CORRUPT_CABINET, FDIERROR_BAD_COMPR_TYPE, FDIERROR_RESERVE_MISMATCH, or FDIERROR_WRONG_CABINET. If there was no error, fdie will equal FDIERROR_NONE. The application should return 0 (zero) to indicate success, or -1 (negative one) to indicate failure, which will abort FDICopy.
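
One possible way to structure the fdintNEXT_CABINET handling is sketched below. All names here (cabinet_is_readable, on_next_cabinet, MAX_ATTEMPTS) are hypothetical, and the retry limit is a design choice of this sketch rather than anything FDI requires. The error_reported argument stands for "fdie is something other than FDIERROR_NONE", so the sketch does not depend on the numeric values of the error codes.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ATTEMPTS 3  /* arbitrary limit chosen for this sketch */

/* Returns 1 if the file named by psz3 followed by psz1 can be opened
 * for reading, 0 otherwise. */
static int cabinet_is_readable(const char *psz3, const char *psz1)
{
    char  full[512];
    FILE *f;

    if (strlen(psz3) + strlen(psz1) + 1 > sizeof full)
        return 0;
    strcpy(full, psz3);
    strcat(full, psz1);

    f = fopen(full, "rb");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Hypothetical body of an fdintNEXT_CABINET handler.
 * error_reported: nonzero when fdie is not FDIERROR_NONE.
 * attempts:       failed attempts so far for this cabinet.
 * Returns 0 to let FDI continue, or -1 to abort FDICopy. */
static int on_next_cabinet(const char *psz1, const char *psz3,
                           int error_reported, int *attempts)
{
    if (error_reported && ++(*attempts) >= MAX_ATTEMPTS)
        return -1;  /* the previous answers were rejected too often */

    /* A real application would prompt for a disk change here and could
     * write a new path into the buffer that psz3 points to. */
    return cabinet_is_readable(psz3, psz1) ? 0 : -1;
}
```

Checking readability before returning avoids a round trip in which FDI reports FDIERROR_CABINET_NOT_FOUND and sends the same notification again.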

The pfnfdid parameter is reserved for encryption, and is currently not used by FDI. This parameter should be set to NULL.

The pvUser parameter should contain an application-defined value that will be passed back as a field in the FDINOTIFICATION structure of the notification function. If not required, this field may be safely set to NULL.

Returns

If successful, TRUE is returned. If unsuccessful, FALSE is returned, and the error structure pointed to by perr (from FDICreate) is filled out.

FDIDestroy

BOOL DIAMONDAPI FDIDestroy(
      HFDI  hfdi
);

Parameters

hfdi

FDI Context pointer originally returned by FDICreate

Description

The FDIDestroy API destroys an hfdi context, freeing any memory and temporary files associated with the context.

Returns

If successful, TRUE is returned. If unsuccessful, FALSE is returned. The only reason for failure is that the hfdi passed in was not a proper context handle.

Microsoft LZX Data Compression Format

Copyright © 1997 Microsoft Corporation. All rights reserved.

Topics in this section

Introduction

Concepts

LZ77
Bitstream
Window Size
Trees
Repeated Offsets
Constants

LZX Compressed Data Format

Cabinet Block Size
Header Structure
Encoder Preprocessing
Block Structure
Uncompressed Block Format
Verbatim Block
Aligned Offset Block
Encoding the Trees and Pre-Trees
Compressed Literals
Match Offset => Formatted Offset
Formatted Offset => Position Slot, Position Footer
Position Footer => Verbatim Bits, Aligned Offset Bits
Match Length => Length Header, Length Footer
Length Header, Position Slot => Length/Position Header
Encoding a Match
Decoding a Match or an Uncompressed Character

Introduction

This document is a design specification for the format of LZX compressed data used in the LZX compression mode of Microsoft's CAB file format. The purpose of this document is to allow anyone to encode or decode LZX compressed data. This document describes only the format of the output; it does not provide any specific algorithms for match location, tree generation, etc.

Before proceeding with the design specification itself, a few important concepts are described in the following pages.

Concepts

This section includes:

LZ77
Bitstream
Window Size