
About the Output Format

The default output of exifprobe is an approximate description of the structure of the data in the files it probes. The program shows not only what is in the file, but where it is, and attempts to display its relationship to other data in the file.

The most common structure found in the supported image file formats is the TIFF IFD. It is not universal...you won't find it in CIFF or X3F files, for example, but it is the format used to include EXIF information in camera image files, so it finds its way into a large proportion of them. The following paragraphs provide a brief description of the arrangement of a TIFF IFD as an aid to understanding the output of exifprobe. The description is followed by explanations of the exifprobe image summary and a few miscellaneous output conventions, and finally a few brief remarks about each of the supported file formats.

The Tagged Image File Format is described in an open specification published by Adobe. The specification describes the Image File Directory, or IFD.

A quick description of the display of an IFD by exifprobe follows. A TIFF FAQ, with a (possibly better) description of the format may be found at:
http://www.awaresystems.be/imaging/tiff/faq.html
The displays of other formats have a good deal in common with the display described for TIFF; if you understand the display for TIFF, you should be able to read the other displays, with the possible exception of CIFF.

The TIFF Image File Directory

The TIFF Header

A TIFF IFD must be preceded by a TIFF Header, signaling the start of the format. The function of the header is to identify the file type and provide some important information about how to read it. A TIFF image file will start with the header, but in other file formats which intend to include sections in IFD format, the TIFF header will be offset in the file. A JPEG image file which includes EXIF information will write the TIFF header just after a special "APP1" jpeg marker segment. Other file formats, such as MRW, may include a TIFF header and following IFD at some offset designated by the parent file format.

The TIFF header is an 8-byte section which begins with 2 bytes indicating the byteorder used to record integer data in the TIFF data to follow (including the rest of the header), followed by a two-byte "magic number" which unambiguously declares that this is a TIFF file. The final four bytes must be read as an unsigned four-byte integer which indicates the offset from the start of this header to the start of the first Image File Directory. The IFD will usually follow immediately after the header, but this is not required (and is not always the case).
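
As a rough illustration of the header layout just described, the following Python sketch reads the 8 bytes and returns the byte order and the file offset of the first IFD. The function name parse_tiff_header and its header_start parameter are hypothetical, introduced only for this example; this is not how exifprobe itself is written.

```python
import struct

def parse_tiff_header(buf, header_start=0):
    """Return (struct byte-order prefix, file offset of the first IFD)."""
    order = buf[header_start:header_start + 2]
    if order == b"II":
        endian = "<"   # little-endian (Intel)
    elif order == b"MM":
        endian = ">"   # big-endian (Motorola)
    else:
        raise ValueError("not a TIFF header: bad byte-order mark")
    # Two-byte magic number, then the four-byte offset of the first IFD.
    magic, ifd_offset = struct.unpack_from(endian + "HI", buf, header_start + 2)
    if magic != 42:    # 0x002a
        raise ValueError("not a TIFF header: bad magic number")
    # The stored offset is relative to the start of the header.
    return endian, header_start + ifd_offset

# A header placed at file offset 12, with the IFD 8 bytes after the header start.
data = b"\x00" * 12 + b"II" + struct.pack("<HI", 42, 8)
print(parse_tiff_header(data, 12))   # ('<', 20)
```

Passing header_start explicitly covers the case where the TIFF header is embedded at some offset inside another file format.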

The IFD

The IFD has four parts, beginning with a number of entries, followed by the entries (12 bytes each), followed by a number giving the offset to the next IFD, if there is one. The fourth part follows the next IFD offset and is used to write data which won't fit in an entry. The fourth part may be any size (including 0).

The number of entries is 2 bytes to be interpreted as an unsigned 2-byte integer in the prevailing byte order. The next IFD offset is a 4 byte integer giving the start offset of the next IFD in the chain. If there is no next IFD this will be four bytes of 0. Between these two numbers are the fixed-length directory entries which contain (or point to) the information of interest.

Each entry is 12 bytes, beginning with a 2-byte integer tag number which identifies the data described by the entry. The correspondence between tag number and the identity of the data is normally contained in a document, which may or may not be public. The TIFF, EXIF, and DNG specifications are all public documents which include standard tag number/tag name information, along with the type and intent of the associated data.

The type of the associated data (two-byte integer, four-byte integer, ascii string, etc.) is represented by the next two bytes (following the tag number) interpreted as a two-byte integer. There are twelve types defined in the TIFF specification, each of which describes a numerical type (e.g. byte, unsigned short integer, long integer, etc.) in which the entry data is expressed, except type 7 (UNDEFINED) which declares that the associated data must be interpreted in a manner specific to the entry (i.e. it is not a defined numerical type...it may be a special section with a structure of its own).

The next 4 bytes give a count of the number of items in the data. E.g. for BYTE type this will be the number of bytes, for USHORT type this will be the number of 2-byte integers (half the number of bytes), etc. The count for UNDEFINED type is the number of bytes in the data.
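
The relationship between type, count and byte length can be sketched in Python as follows. The per-item sizes are those of the twelve TIFF 6.0 field types; the names TYPE_SIZES and data_length are hypothetical, chosen for this illustration.

```python
# Size in bytes of one item of each TIFF 6.0 field type, keyed by type number.
TYPE_SIZES = {
    1: 1,   # BYTE
    2: 1,   # ASCII
    3: 2,   # SHORT
    4: 4,   # LONG
    5: 8,   # RATIONAL (two LONGs)
    6: 1,   # SBYTE
    7: 1,   # UNDEFINED
    8: 2,   # SSHORT
    9: 4,   # SLONG
    10: 8,  # SRATIONAL (two SLONGs)
    11: 4,  # FLOAT
    12: 8,  # DOUBLE
}

def data_length(type_number, count):
    """Total number of bytes occupied by an entry's data."""
    return TYPE_SIZES[type_number] * count

print(data_length(3, 1))   # SHORT x 1     -> 2
print(data_length(5, 6))   # RATIONAL x 6  -> 48
print(data_length(7, 4))   # UNDEFINED x 4 -> 4
```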

The final 4 bytes of the entry are the value of the data associated with the entry. If the type and count indicate that the data won't fit in 4 bytes (e.g. two LONG items won't fit, an ascii string longer than 4 bytes won't fit, etc.), then the value is written in that fourth section of the IFD (just after the next ifd offset) and an offset to the value is written in the value bytes of the directory entry. The offset is always written (except in some MakerNotes) relative to the start of the TIFF HEADER. E.g. if the TIFF header starts at offset 100 in the file, and the value is written at offset 569, the offset written in the directory entry will be 469.
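
The rule for finding an entry's data can be sketched as below, assuming the usual case in which offsets are relative to the TIFF header (the MakerNote exceptions mentioned above are not handled). The function locate_value and its parameters are hypothetical names, and the entry offset of 150 in the usage line is an arbitrary illustrative value.

```python
def locate_value(entry_offset, stored_value, data_length, header_start):
    """Return the file offset at which an entry's data begins."""
    if data_length <= 4:
        # The data is stored inline in the last 4 bytes of the 12-byte entry.
        return entry_offset + 8
    # Otherwise the value bytes hold an offset relative to the TIFF header.
    return header_start + stored_value

# The example from the text: header at 100, stored offset 469.
print(locate_value(entry_offset=150, stored_value=469,
                   data_length=8, header_start=100))   # 569
```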

The IFD ends when the last value is written. There is no specific indicator of the end of an IFD, although the next_ifd_offset, if there is one, represents an upper bound on the extent of the IFD.

We have:
[ 2 bytes] number of entries
[12 bytes] entry 1
[12 bytes] entry 2
...
[4 bytes] next ifd offset
...offset value
...offset value
...
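
A minimal Python sketch of reading that layout follows. It only splits the directory into its fixed-size parts and leaves the value bytes uninterpreted; read_ifd is a hypothetical name, and the one-entry IFD used to exercise it is synthetic.

```python
import struct

def read_ifd(buf, ifd_start, endian="<"):
    """Return (entries, next_ifd_offset) for the IFD at ifd_start.

    Each entry is a tuple (tag, type, count, raw_value_bytes).
    """
    (count,) = struct.unpack_from(endian + "H", buf, ifd_start)
    entries = []
    for i in range(count):
        pos = ifd_start + 2 + 12 * i
        tag, typ, n = struct.unpack_from(endian + "HHI", buf, pos)
        entries.append((tag, typ, n, buf[pos + 8:pos + 12]))
    # The next-IFD offset follows immediately after the last entry.
    (next_ifd,) = struct.unpack_from(endian + "I", buf, ifd_start + 2 + 12 * count)
    return entries, next_ifd

# Synthetic IFD: one SHORT entry (tag 0x0128, value 2), no next IFD.
data = (struct.pack("<H", 1)
        + struct.pack("<HHIHH", 0x0128, 3, 1, 2, 0)
        + struct.pack("<I", 0))
print(read_ifd(data, 0))
```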

Exifprobe Display of IFD

The following example illustrates how exifprobe displays a TIFF IFD:

@0x000c=12  : TIFF(II=0x4949) magic=0x002a="*0" ifd offset = 8 (+ 12 = 0x14/20)
@0x0014=20  : <IFD 0> 6 entries starting at file offset 0x16=22
@0x0016=22  :   <0x011a=  282> XResolution            [5 =RATIONAL    1]  = @0x6e=110
@0x0022=34  :   <0x011b=  283> YResolution            [5 =RATIONAL    1]  = @0x76=118
@0x002e=46  :   <0x0128=  296> ResolutionUnit         [3 =SHORT       1]  = 2 = 
@0x003a=58  :   <0x0213=  531> YCbCrPositioning       [3 =SHORT       1]  = 2 =  
@0x0046=70  :   <0x0214=  532> ReferenceBlackWhite    [5 =RATIONAL    6]  = @0x7e=126
@0x0052=82  :   <0x8769=34665> ExifIFDPointer         [4 =LONG        1]  = @0xae=174
@0x005e=94  :   **** next IFD offset 0 
@0x0062=98  :   ============= VALUES, IFD 0 ============ 
@0x006e=110 :   XResolution                     = 72
@0x0076=118 :   YResolution                     = 72
@0x007e=126 :   ReferenceBlackWhite             = 0,255,128 ... ,255 (6) -> @150
@0x00ae=174 :   <EXIF IFD> (in IFD 0) 6 entries starting at file offset 0xb0=176
@0x00b0=176 :     <0x9000=36864> Version                 [7 =UNDEFINED 4] = "0200"
@0x00bc=188 :     <0x9101=37121> ComponentsConfiguration [7 =UNDEFINED 4] = 1,2,3,0 = 
@0x00c8=200 :     <0xa000=40960> FlashPixVersion         [7 =UNDEFINED 4] = "0100"
@0x00d4=212 :     <0xa001=40961> ColorSpace              [3 =SHORT     1] = 1 = "sRGB"
@0x00e0=224 :     <0xa002=40962> PixelXDimension         [4 =LONG      1] = 1536
@0x00ec=236 :     <0xa003=40963> PixelYDimension         [4 =LONG      1] = 2048
@0x00f8=248 :     **** next IFD offset 0
-0x00fb=251 :   </EXIF IFD>
@0000fb=251 : </IFD 0>
The first column, before the colon, gives the offset of the item from the beginning of the file, in hexadecimal and decimal. File offsets are (almost) always preceded by an `@' symbol in exifprobe output. In this example the TIFF header begins at offset 12. The byteorder is displayed as "II" and its hexadecimal equivalent, followed by the "magic number" in hexadecimal, followed by an ascii string representation (some magic numbers are strings, some are not; exifprobe shows both representations where possible). Finally, the offset to the start of the IFD is 8, from the start of the TIFF header at offset 12, so the file offset of the IFD is 20.

The two bytes at 20 and 21 form the number of entries (6). The IFD starts there, and is shown bracketed by "start section" and "end section" markers in angle brackets, as for HTML. The first entry starts at offset 22.
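
The offsets in the left column of the example can be reproduced with a little arithmetic. The Python below is only a worked check of the numbers in the figure; the variable names are illustrative.

```python
header_start = 12              # file offset of the TIFF header
ifd_start = header_start + 8   # header stores an IFD offset of 8
count = 6                      # number of entries read at offsets 20 and 21

# Entries are 12 bytes each and begin right after the 2-byte count.
entry_offsets = [ifd_start + 2 + 12 * i for i in range(count)]
next_ifd_field = ifd_start + 2 + 12 * count   # 4-byte next IFD offset
values_start = next_ifd_field + 4             # start of the VALUES area

print(entry_offsets)    # [22, 34, 46, 58, 70, 82]
print(next_ifd_field)   # 94
print(values_start)     # 98
```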

Each of the six entries following begins with a tag number (the file offset isn't part of the entry) again shown in both hexadecimal and decimal, placed in angle brackets, followed by the name assigned to the tag. The type and count are shown inside square brackets, with the type number at the left followed by the type name.

The count is at the right, next to the closing square bracket. The value is the last item, following the `=' sign. The first two entries above are RATIONAL numbers, which are 8 bytes long. They won't fit in the four bytes of the value, so they are written at offsets 110 and 118 (again, offsets are preceded by `@' and shown in hexadecimal and decimal) and the offsets entered in the directory. If you look down past the VALUES marker, at file offsets 110 and 118, you will see the tag names repeated, followed by the actual value stored there.

The next two entries are SHORT values, two bytes each, so they fit in the directory proper. The values (2) are defined in the specification, and exifprobe will print their interpretation following the value. ResolutionUnit=2 means that resolution values in the data represent "pixels per inch", and YCbCrPositioning=2 should be interpreted as "co-sited". Those interpretations are written by exifprobe, but have been removed from the figure so that it will fit on the page.

There are two more offset entries, but you will notice that the last one (ExifIFDPointer) is a LONG type, 4 bytes, which ought to fit in the directory entry (and does). So why is it marked as an offset? It's defined that way in the spec, and you have to just know it's an offset. The Exif IFD is a "subifd" contained within IFD 0, and should have been declared as UNDEFINED, but apparently it was deemed easier to violate the TIFF spec and common sense than to precompute the size of the Exif section. There seems to be a considerable preoccupation with supporting write-once media.
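
For the RATIONAL entries discussed above, the 8 bytes hold two unsigned four-byte integers, a numerator followed by a denominator. The following Python sketch decodes one; read_rational is a hypothetical helper, and the stored pair 72/1 is an assumption about how the value 72 in the figure might be recorded.

```python
import struct
from fractions import Fraction

def read_rational(buf, offset, endian="<"):
    """Decode one TIFF RATIONAL (numerator, denominator) at offset."""
    num, den = struct.unpack_from(endian + "II", buf, offset)
    # A zero denominator would raise ZeroDivisionError here.
    return Fraction(num, den)

# Assumed encoding of a resolution of 72: numerator 72, denominator 1.
data = struct.pack("<II", 72, 1)
print(read_rational(data, 0))   # 72
```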

The end of the IFD is shown at offset 251. This is deduced after processing the Exif IFD, assuming that it has written nothing at random offsets later in the file, and noting that the Exif data is the last of the data in the IFD. Determining the end of an IFD in the TIFF sections of some camera files is difficult. The TIFF IFD is "open ended" (there is no definition of the end of the IFD), and the specification specifically allows writing offset data at random offsets in the file. Exifprobe attempts to maintain the fiction that an IFD has an end, forms a "container" for related data, and that the file has a structure. This is often akin to hunting snarks.

The Image Summary

The display of file structure and contents is followed, in all output formats, by a summary of all of the images found in the file. In fact, the image summary is displayed even when all file output is "zeroed out" using the -Z option.

It is not uncommon to find as many as 5 or 6 images in camera files based upon the TIFF format. The extra images are versions of the same capture as the primary image, in reduced size or resolution, and often in varying image formats. Many of these will be described as "thumbnails", although they may be nearly as large as the primary image when displayed. Most commonly, these will be JPEG format, although TIFF files often contain reduced-size RGB data. Exifprobe attempts to scout out all of the blobs of image data and describes what it finds at the end of output. The summary is reasonably self-explanatory. There will be two lines for each image found, the first showing the start offset in the file, the type of image data, the reported size of the image, the length in bytes of the image data, and an indication of the section of the file which describes the image for files which have distinct sections. A typical summary entry will look like this:
@0x000001a=26    :  Start of CRW uncompressed primary image [3072x2048] length 5341670
-0x5181ff=5341695:    End of CRW primary image data
@0x518200=5341696:  Start of JPEG baseline DCT compressed reduced-resolution image [1536x1024]...
-0x55ee5f=5631583:    End of JPEG reduced-resolution image data
 
If the display is in LIST mode, the same summary will instead look like this:


# Start of CRW uncompressed primary image [3072x2048] length 5341670 at offset 0x1a/26
#   End of CRW primary image data at offset 0x5181ff/5341695
# Start of JPEG baseline DCT compressed reduced-resolution image [1536x1024] length 289888...
#   End of JPEG reduced-resolution image data at offset 0x55ee5f/5631583
This is done to permit Unix shells to snarf up LIST mode output with minimum fuss. The displays above have been elided a bit to fit on the page. You will find that the default structural output of exifprobe requires a rather wide terminal window; an 80 column console will not be sufficient.
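
Judging from the numbers in the summary examples above, the "End of" offset appears to be the offset of the last byte of image data, so the reported length equals end minus start plus one. A short Python check of that reading (the function name is illustrative):

```python
def inclusive_length(start, end):
    """Byte count of a region whose end offset is its last byte."""
    return end - start + 1

print(inclusive_length(26, 5341695))        # 5341670 (CRW primary image)
print(inclusive_length(5341696, 5631583))   # 289888 (JPEG reduced-resolution)
```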

The image summary concludes with a count of the number of images found. If the data for an image claimed somewhere in the file could not be found or validated, a count of images not found will also be printed:
Number of images = 3
Images not found = 1 
In LIST mode, the spaces are omitted to form names acceptable to a Unix shell.

Other Output

There are several additional items of information printed for all files (even under '-Z'), including the current filename, the file size, the file type, and the file format.

The file type and file format differ in that the type is the basic file type as taken from the header, magic number or initial marker in the file...the information which an image display or processing program will use to identify the type. If there is a "byteorder" specifier in the header, it will be displayed as "II" for little-endian (Intel) byteorder, or "MM" for big-endian (Motorola) byteorder. The basic type from the file header may not be the whole story, however. JPEG files may include JFIF, EXIF, or other "APP" sections which affect the format of the file; many files marked with a TIFF header are "private" formats which use TIFF headers and the IFD format, but do not adhere to the TIFF specification. The file format, displayed at the end, is a slash-separated list of known sections encountered in the file, possibly followed by a square-bracketed modifier for some TIFF-derived types. E.g.
File Format = TIFF/EXIF [CR2] # with MakerNote (Canon [4])
As shown, files which include a MakerNote are identified; if a MakerNote contains a subifd, that will be mentioned as well. The number in square brackets after the Maker name is the noteversion used internally by exifprobe.

Miscellaneous Output Conventions

Identifying Offsets

Offsets written at the left of the structural display are normally marked with the symbol `@'. This symbol is sometimes replaced by '>' or even '+', indicating that the current item or section is written "out of place". The TIFF specification offers a mild expectation that data written at an offset for an IFD entry will be written within the "values" section of the IFD, and the Exif specification explicitly declares that thumbnails described in an IFD should be located within the IFD. These strictures are often ignored, particularly in MakerNote IFDs. This has implications for image editing software which may want to move or update the section. If data described within the section is written at random locations within the file, it must be moved "piecemeal" and the software must fully understand the structure of the section so that it can find and re-write all of the pieces. The section cannot be moved as a "blob". If exifprobe detects a section which appears out of place, the '@' symbol in its offset will be replaced by '>' to indicate that the section is located outside its IFD. If the out-of-place section contains a subsection with a similar affliction, the subsection's offsets may be marked with '+'.

Section tags in LIST mode

Tags which introduce a section or subsection and have an "UNDEFINED" TIFF type (e.g. MakerNotes, some thumbnails) are written in LIST mode in the form "@offset:length", followed by "# UNDEFINED", e.g.
JPEG.APP1.Ifd0.Exif.MakerNote    = @922:226    # UNDEFINED
The auxiliary program extract_section provided with the distribution will accept that format as an argument describing the section to be extracted.
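
To show what the "@offset:length" form conveys, here is a small Python sketch that parses such a string and slices the corresponding bytes out of a buffer. It is not the implementation of extract_section; parse_section_spec and slice_section are hypothetical names, and the sketch assumes both numbers are decimal, as they appear to be in the example above.

```python
def parse_section_spec(spec):
    """Split '@offset:length' into two integers (assumed decimal)."""
    if not spec.startswith("@"):
        raise ValueError("expected '@offset:length'")
    offset_text, length_text = spec[1:].split(":", 1)
    return int(offset_text), int(length_text)

def slice_section(data, spec):
    """Return the bytes of the section described by spec."""
    offset, length = parse_section_spec(spec)
    return data[offset:offset + length]

print(parse_section_spec("@922:226"))   # (922, 226)
```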

Short JPEG thumbnails

Most embedded thumbnails live in sections with a specified length. If exifprobe detects a JPEG_EOI (the marker which indicates the end of the data stream) before the end of the section, it will report that, and mark the summary entry, so that the length in the summary will be given as "<=" (less than or equal to) the section length. Most often, this means the section has been padded with 1-3 bytes to an even or four-byte boundary. In some cases, the padding may be several kilobytes (or more), possibly because space for the image was reserved before its size was known.

Alternatively, if the program fails to find a JPEG_EOI for a JPEG image, a warning will be issued.
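
The idea of a thumbnail that ends before its section does can be sketched in Python as below. Searching backwards for the last end-of-image marker (the bytes FF D9) is a simplification chosen for this example and is not necessarily how exifprobe locates JPEG_EOI; the function name jpeg_padding is hypothetical.

```python
def jpeg_padding(section):
    """Return (jpeg_length, padding_bytes) for a section holding one JPEG
    stream, or None if no end-of-image marker is found."""
    eoi = section.rfind(b"\xff\xd9")   # last FF D9 in the section
    if eoi < 0:
        return None
    jpeg_length = eoi + 2              # include the 2-byte marker itself
    return jpeg_length, len(section) - jpeg_length

# Synthetic section: SOI, 10 filler bytes, EOI, then 2 bytes of padding.
section = b"\xff\xd8" + b"\x00" * 10 + b"\xff\xd9" + b"\x00\x00"
print(jpeg_padding(section))   # (14, 2)
```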

Asterisks

Some tag names in MakerNote sections may appear followed by one or two asterisks. If one asterisk appears, I have concerns about the correctness of the tagname or its interpretation, possibly because conflicting data has been reported. If there are two asterisks, the reference information may be incomplete, confusing, or otherwise potentially unreliable. In either case, you should take extra care to validate the information for your images before using it in any important way.

It should be noted that the absence of asterisks does not guarantee the accuracy of information. It does mean that the information appears consistent and reasonable in images used to test exifprobe and I have no reason to suspect its accuracy.

