About the Output Format
The default output of exifprobe is an approximate description
of the structure of the data in the files it probes. The program
shows not only what is in the file, but where it is,
and attempts to display its relationship to other data in the file.
The most common structure in the supported image file
formats is the TIFF IFD. It is not universal (you won't find
it in CIFF or X3F files, for example), but it is the format used to
include EXIF information in camera image files, so it finds its way into
a large proportion of them. The following paragraphs
provide a brief description of the arrangement of a TIFF IFD as an
aid to understanding the output of exifprobe. The description
is followed by explanations of the exifprobe image summary
and a few miscellaneous output conventions, and finally a few brief
remarks about each of the supported file formats.
The Tagged Image File Format is described in an open specification
published by Adobe. The specification describes the Image File
Directory, or IFD.
A quick description of the display of an IFD by exifprobe
follows. A TIFF FAQ, with a (possibly better) description of the format
may be found at:
The displays of other formats have a good deal in common with the
display described for TIFF; if you understand the display for TIFF,
you should be able to read the other displays, with the possible exception
of CIFF.
The TIFF Image File Directory
The TIFF Header
A TIFF IFD must be preceded by a TIFF Header, signaling the
start of the format. The function of the header is to identify the
file type and provide some important information about how to read
it. A TIFF image file will start with the header, but in other file
formats which intend to include sections in IFD format, the TIFF header
will be offset in the file. A JPEG image file which includes EXIF
information will write the TIFF header just after a special "APP1"
JPEG marker segment. Other file formats, such as MRW, may include
a TIFF header and following IFD at some offset designated by the parent
file format.
The TIFF header is an 8-byte section which begins with 2 bytes indicating
the byteorder used to record integer data in the TIFF data
to follow (including the rest of the header), followed by a two-byte
"magic number" which unambiguously declares that this is
a TIFF file. The final four bytes must be read as an unsigned four-byte
integer which indicates the offset from the start of this header
to the start of the first Image File Directory. The IFD will usually
follow immediately after the header, but this is not required (and
is not always the case).
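The header layout just described can be sketched in a few lines of Python. This is a minimal illustration, not code taken from exifprobe; the function name parse_tiff_header and its header_offset parameter are assumptions made for the example.

```python
import struct

def parse_tiff_header(buf, header_offset=0):
    """Return (endian, magic, first_ifd_file_offset) for a TIFF header.

    buf is the whole file as bytes; header_offset is where the 8-byte
    TIFF header starts (0 for a plain TIFF file, later for JPEG/EXIF).
    """
    order = buf[header_offset:header_offset + 2]
    if order == b"II":
        endian = "<"    # little-endian (Intel)
    elif order == b"MM":
        endian = ">"    # big-endian (Motorola)
    else:
        raise ValueError("bad byteorder mark: not a TIFF header")
    # Two-byte magic number, then the four-byte offset of the first IFD.
    magic, ifd_offset = struct.unpack_from(endian + "HI", buf, header_offset + 2)
    if magic != 42:
        raise ValueError("magic number is not 42: not a TIFF header")
    # The stored offset is relative to the header, so add the header offset
    # to obtain a file offset.
    return endian, magic, header_offset + ifd_offset
```

For a header at file offset 12 holding an IFD offset of 8, as in the example display later in this document, the function returns a file offset of 20 for the first IFD.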
The IFD
The IFD has four parts, beginning with a count of the entries,
followed by the entries themselves (12 bytes each), followed by a number
giving the offset to the next IFD, if there is one. The fourth
part follows the next IFD offset and is used to write data
which won't fit in an entry. The fourth part may be any size (including
0).
The number of entries is 2 bytes to be interpreted as an unsigned
2-byte integer in the prevailing byte order. The next IFD offset
is a 4-byte integer giving the start offset of the next IFD in the
chain. If there is no next IFD this will be four bytes of 0. Between
these two numbers are the fixed-length directory entries which
contain (or point to) the information of interest.
Each entry is 12 bytes, beginning with a 2-byte integer tag
number which identifies the data described by the entry. The correspondence
between tag number and the identity of the data is normally contained
in a document, which may or may not be public. The TIFF, EXIF, and
DNG specifications are all public documents which include standard
tag number/tag name information, along with the type and intent of
the associated data.
The type of the associated data (two-byte integer, four-byte
integer, ASCII string, etc.) is represented by the next two bytes (following
the tag number) interpreted as a two-byte integer. There are twelve
types defined in the TIFF specification, each of which describes a
numerical type (e.g. byte, unsigned short integer, long
integer, etc.) in which the entry data is expressed, except type
7 (UNDEFINED), which declares that the associated data must be interpreted
in a manner specific to the entry (i.e. it is not a defined numerical
type; it may be a special section with a structure of its own).
The next 4 bytes give a count of the number of items in the
data. E.g. for BYTE type this will be the number of bytes, for USHORT
type this will be the number of 2-byte integers (half the number of
bytes), etc. The count for UNDEFINED type is the number of bytes in
the data.
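The relationship between type, count, and the number of bytes occupied by the data can be sketched as a small lookup table. The type numbers and item sizes are those of the twelve TIFF 6.0 field types; the names TYPE_SIZES and value_byte_length are illustrative only and are not taken from exifprobe.

```python
# Size in bytes of one item of each of the twelve TIFF 6.0 field types.
TYPE_SIZES = {
    1: 1,    # BYTE
    2: 1,    # ASCII
    3: 2,    # SHORT (unsigned 2-byte integer)
    4: 4,    # LONG (unsigned 4-byte integer)
    5: 8,    # RATIONAL (two LONGs: numerator, denominator)
    6: 1,    # SBYTE
    7: 1,    # UNDEFINED (the count is a byte count)
    8: 2,    # SSHORT
    9: 4,    # SLONG
    10: 8,   # SRATIONAL
    11: 4,   # FLOAT
    12: 8,   # DOUBLE
}

def value_byte_length(type_number, count):
    """Total number of bytes occupied by `count` items of the given type."""
    return TYPE_SIZES[type_number] * count
```

A single SHORT occupies 2 bytes, while six RATIONAL items occupy 48.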
The final 4 bytes of the entry are the value of the data associated
with the entry. If the type and count indicate that
the data won't fit in 4 bytes (e.g. two LONG items won't fit, an ascii
string longer than 4 bytes won't fit, etc.), then the value is written
in that fourth section of the IFD (just after the next ifd offset)
and an offset to the value is written in the value bytes of
the directory entry. The offset is always written (except in
some MakerNotes) relative to the start of the TIFF HEADER.
E.g. if the TIFF header starts at offset 100 in the
file, and the value is written at offset 569, the offset
written in the directory entry will be 469.
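The inline-or-offset rule and the offset arithmetic above can be written out as follows. This is a sketch under stated assumptions: value_location is a hypothetical helper, the caller is assumed to have already computed the byte length of the data from the type and count, and the MakerNote exceptions mentioned above are not handled.

```python
import struct

def value_location(entry_file_offset, byte_length, value_field,
                   tiff_header_offset, endian="<"):
    """Return the file offset at which an entry's data is stored.

    entry_file_offset   file offset of the 12-byte directory entry
    byte_length         size of the data implied by the type and count
    value_field         the raw final 4 bytes of the entry
    tiff_header_offset  file offset of the TIFF header
    """
    if byte_length <= 4:
        # The data fits in the entry: it occupies the value bytes,
        # which begin 8 bytes into the entry (after tag, type, count).
        return entry_file_offset + 8
    # Otherwise the value bytes hold an offset relative to the TIFF header.
    (relative,) = struct.unpack(endian + "I", value_field)
    return tiff_header_offset + relative
```

With the TIFF header at file offset 100 and a stored offset of 469, the data is found at file offset 569, matching the example in the text.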
The IFD ends when the last value is written. There is no specific
indicator of the end of an IFD, although the next_ifd_offset,
if there is one, represents an upper bound on the extent of the IFD.
We have:
[ 2 bytes] number of entries
[12 bytes] entry 1
[12 bytes] entry 2
...
[4 bytes] next ifd offset
...offset value
...offset value
...
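The layout above can be walked with a short routine. The sketch below is illustrative only (read_ifd is a made-up name, and exifprobe itself is not written this way); it returns the raw fields of each entry together with the file offset at which each entry starts.

```python
import struct

def read_ifd(buf, ifd_offset, endian="<"):
    """Return (entries, next_ifd_offset, values_start) for one IFD.

    Each entry is a tuple (entry_file_offset, tag, type, count, raw_value),
    where raw_value is the final 4 bytes of the entry, left undecoded.
    values_start is the file offset just past the next-IFD offset field.
    """
    (num_entries,) = struct.unpack_from(endian + "H", buf, ifd_offset)
    pos = ifd_offset + 2
    entries = []
    for _ in range(num_entries):
        # tag (2 bytes), type (2 bytes), count (4 bytes), value/offset (4 bytes)
        tag, typ, count, raw = struct.unpack_from(endian + "HHI4s", buf, pos)
        entries.append((pos, tag, typ, count, raw))
        pos += 12
    (next_ifd,) = struct.unpack_from(endian + "I", buf, pos)
    return entries, next_ifd, pos + 4
```

For an IFD of 6 entries starting at file offset 20, the entries begin at offsets 22, 34, 46, 58, 70 and 82, the next IFD offset is at 94, and the values section begins at 98. These are the offsets shown in the example display in the next section.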
Exifprobe Display of IFD
The following example illustrates how exifprobe displays
a TIFF IFD:
@0x000c=12 : TIFF(II=0x4949) magic=0x002a="*0" ifd offset = 8 (+ 12 = 0x14/20)
@0x0014=20 : <IFD 0> 6 entries starting at file offset 0x16=22
@0x0016=22 : <0x011a= 282> XResolution [5 =RATIONAL 1] = @0x6e=110
@0x0022=34 : <0x011b= 283> YResolution [5 =RATIONAL 1] = @0x76=118
@0x002e=46 : <0x0128= 296> ResolutionUnit [3 =SHORT 1] = 2 =
@0x003a=58 : <0x0213= 531> YCbCrPositioning [3 =SHORT 1] = 2 =
@0x0046=70 : <0x0214= 532> ReferenceBlackWhite [5 =RATIONAL 6] = @0x7e=126
@0x0052=82 : <0x8769=34665> ExifIFDPointer [4 =LONG 1] = @0xae=174
@0x005e=94 : **** next IFD offset 0
@0x0062=98 : ============= VALUES, IFD 0 ============
@0x006e=110 : XResolution = 72
@0x0076=118 : YResolution = 72
@0x007e=126 : ReferenceBlackWhite = 0,255,128 ... ,255 (6) -> @150
@0x00ae=174 : <EXIF IFD> (in IFD 0) 6 entries starting at file offset 0xb0=176
@0x00b0=176 : <0x9000=36864> Version [7 =UNDEFINED 4] = "0200"
@0x00bc=188 : <0x9101=37121> ComponentsConfiguration [7 =UNDEFINED 4] = 1,2,3,0 =
@0x00c8=200 : <0xa000=40960> FlashPixVersion [7 =UNDEFINED 4] = "0100"
@0x00d4=212 : <0xa001=40961> ColorSpace [3 =SHORT 1] = 1 = "sRGB"
@0x00e0=224 : <0xa002=40962> PixelXDimension [4 =LONG 1] = 1536
@0x00ec=236 : <0xa003=40963> PixelYDimension [4 =LONG 1] = 2048
@0x00f8=248 : **** next IFD offset 0
-0x00fb=251 : </EXIF IFD>
-0x00fb=251 : </IFD 0>
The first column, before the colon, gives the offset of the item from
the beginning of the file, in hexadecimal and decimal. File offsets
are (almost) always preceded by an '@' symbol
in exifprobe output. In
this example the TIFF header begins at offset 12. The byteorder is
displayed as "II" and its hexadecimal equivalent,
followed by the "magic number" in hexadecimal, followed
by an ASCII string representation (some magic numbers are strings,
some are not; exifprobe shows both representations where possible).
Finally, the offset to the start of the IFD is 8, from the start of
the TIFF header at offset 12, so the file offset of the IFD is 20.
The two bytes at 20 and 21 form the number of entries (6). The IFD
starts there, and is shown bracketed by "start section"
and "end section" markers in angle brackets, as
for HTML. The first entry starts at offset 22.
Each of the six entries following begins with a tag number (the file
offset isn't part of the entry) again shown in both hexadecimal and
decimal, placed in angle brackets, followed by the name assigned to
the tag. The type and count are shown inside square
brackets, with the type number at the left followed by the type name.
The count is at the right, next to the closing square bracket. The
value is the last item, following the '=' sign. The first
two entries above are RATIONAL numbers, which are 8 bytes
long. They won't fit in the four bytes of the value, so they are written
at offsets 110 and 118 (again, offsets are preceded by '@'
and shown in hexadecimal and decimal) and the offsets entered in the
directory. If you look down past the VALUES marker, at file
offsets 110 and 118, you will see the tag names repeated, followed
by the actual value stored there. The next two entries are SHORT
values, two bytes each, so they fit in the directory proper. The values
(2) are defined in the specification, and exifprobe will print their
interpretation following the value. ResolutionUnit=2 means that resolution
values in the data represent "pixels per inch", and YCbCrPositioning=2
should be interpreted as "co-sited". Those interpretations
are written by exifprobe, but have been removed from the
figure so that it will fit on the page. There are two more offset
entries, but you will notice that the last one (ExifIFDPointer) is
a LONG type, 4 bytes, which ought to fit in the directory
entry (and does). So why is it marked as an offset? It's defined that
way in the spec, and you have to just know it's an offset. The Exif
IFD is a "subifd" contained within IFD 0, and should have been
declared as UNDEFINED, but apparently it was deemed easier
to violate the TIFF spec and common sense than to precompute the size
of the Exif section. There seems to be a considerable preoccupation
with supporting write-once media.
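The fields of an entry line, as described above, can be pulled apart with a regular expression. The sketch below assumes the spacing shown in the example figure; real exifprobe output is wider and may be spaced differently, so the pattern may need adjusting. The names ENTRY_LINE and parse_entry_line are hypothetical.

```python
import re

# offset column, then <hex tag = decimal tag>, tag name,
# [type number =TYPENAME count], '=' and the value or value offset.
ENTRY_LINE = re.compile(
    r"^[@>+]0x[0-9a-fA-F]+=\s*(?P<offset>\d+)\s*:\s*"
    r"<0x[0-9a-fA-F]+=\s*(?P<tag>\d+)>\s+(?P<name>\S+)\s+"
    r"\[(?P<type>\d+)\s*=(?P<type_name>\w+)\s+(?P<count>\d+)\]"
    r"\s*=\s*(?P<value>.*)$"
)

def parse_entry_line(line):
    """Return a dict of the fields of one entry line, or None if no match."""
    m = ENTRY_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "offset": int(m.group("offset")),      # file offset of the entry
        "tag": int(m.group("tag")),            # tag number (decimal)
        "name": m.group("name"),               # tag name
        "type": int(m.group("type")),          # TIFF type number
        "type_name": m.group("type_name"),     # TIFF type name
        "count": int(m.group("count")),        # number of items
        "value": m.group("value").strip(),     # value, or '@' offset to it
    }
```

Lines which are not directory entries, such as the section start and end markers, do not match and yield None.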
The end of the IFD is shown at offset 251. This is deduced after processing
the Exif IFD, assuming that it has written nothing at random offsets
later in the file, and noting that the Exif data is the last of the
data in the IFD. Determining the end of an IFD in the TIFF sections
of some camera files is difficult. The TIFF IFD is "open
ended" (there is no definition of the end of the IFD, and
the specification specifically allows writing offset data at random
offsets in the file). Exifprobe attempts to maintain the fiction
that an IFD has an end, forms a "container" for
related data, and that the file has a structure. This is often akin
to hunting snarks.
The Image Summary
The display of file structure and contents is followed, in all output
formats, by a summary of all of the images found in the file. In fact,
the image summary is displayed even when all file output is "zeroed
out" using the -Z option.
It is not uncommon to find as many as 5 or 6 images in camera files
based upon the TIFF format. The extra images are versions of the
same capture as the primary image, in reduced size or resolution, and
often in varying image formats. Many of these will be described as
"thumbnails", although they may be nearly as large as the
primary image when displayed. Most commonly, these will be JPEG format,
although TIFF files often contain reduced-size RGB data. Exifprobe
attempts to scout out all of the blobs of image data and describes
what it finds at the end of output. The summary is reasonably self-explanatory.
There will be two lines for each image found, the first showing the
start offset in the file, the type of image data, the reported size
of the image, the length in bytes of the image data, and an indication
of the section of the file which describes the image for files which
have distinct sections. A typical summary entry will look like this:
@0x000001a=26 : Start of CRW uncompressed primary image [3072x2048] length 5341670
-0x5181ff=5341695: End of CRW primary image data
@0x518200=5341696: Start of JPEG baseline DCT compressed reduced-resolution image [1536x1024]...
-0x55ee5f=5631583: End of JPEG reduced-resolution image data
If the display is in LIST mode, the same summary will instead look
like this:
# Start of CRW uncompressed primary image [3072x2048] length 5341670 at offset 0x1a/26
# End of CRW primary image data at offset 0x5181ff/5341695
# Start of JPEG baseline DCT compressed reduced-resolution image [1536x1024] length 289888...
# End of JPEG reduced-resolution image data at offset 0x55ee5f/5631583
This is done to permit Unix shells to snarf up LIST mode output with
minimum fuss. The displays above have been elided a bit to fit on
the page. You will find that the default structural output of exifprobe
requires a rather wide terminal window; an 80 column console will
not be sufficient.
The image summary concludes with a count of the number of images found.
If the data for an image claimed somewhere in the file could not be
found or validated, a count of images not found will also be
printed:
Number of images = 3
Images not found = 1
In LIST mode, the spaces are omitted to form names acceptable to a
Unix shell.
Other Output
There are several additional items of information printed for all
files (even under '-Z'), including the current filename, the
file size, the file type, and the file format.
The file type and file format differ in that the type
is the basic file type as taken from the header, magic number or initial
marker in the file: the information which an image display
or processing program will use to identify the type. If there is a
"byteorder" specifier in the header, it will be
displayed as "II" for little-endian (Intel)
byteorder, or "MM" for big-endian (Motorola)
byteorder. The basic type from the file header may not be the whole
story, however. JPEG files may include JFIF, EXIF, or other
"APP" sections which affect the format of
the file; many files marked with a TIFF header are "private"
formats which use TIFF headers and the IFD format, but do not adhere
to the TIFF specification. The file format, displayed at the
end, is a slash-separated list of known sections encountered in the
file, possibly followed by a square-bracketed modifier for some TIFF-derived
types. E.g.
File Format = TIFF/EXIF [CR2] # with MakerNote (Canon [4])
As shown, files which include a MakerNote are identified; if a MakerNote
contains a subifd, that will be mentioned as well. The number in square
brackets after the Maker name is the noteversion used internally
by exifprobe.
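A script reading this line might split it as follows. This is a sketch which assumes the line has exactly the shape of the example above (the section list, an optional square-bracketed modifier, then an optional '#' comment); parse_file_format is a hypothetical helper, and other forms of the line may exist that it does not handle.

```python
import re

def parse_file_format(line):
    """Return (sections, modifier, comment) from a 'File Format = ...' line."""
    body = line.split("=", 1)[1]
    # Anything after '#' is the MakerNote remark.
    body, _, comment = body.partition("#")
    m = re.fullmatch(r"(\S+)(?:\s+\[(\w+)\])?", body.strip())
    if m is None:
        raise ValueError("unrecognized File Format line")
    sections = m.group(1).split("/")   # slash-separated list of sections
    modifier = m.group(2)              # e.g. 'CR2', or None if absent
    return sections, modifier, comment.strip()
```

Applied to the example line, this gives the sections TIFF and EXIF, the modifier CR2, and the remark about the Canon MakerNote.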
Miscellaneous Output Conventions
Identifying Offsets
Offsets written at the left of the structural display are normally
marked with the symbol '@'. This symbol is sometimes replaced by '>'
or even '+', indicating that the current item or section is written
"out of place". The TIFF specification offers a
mild expectation that data written at an offset for an IFD entry will
be written within the "values" section of the IFD,
and the Exif specification explicitly declares that thumbnails described
in an IFD should be located within the IFD. These strictures are often
ignored, particularly in MakerNote IFDs. This has implications for
image editing software which may want to move or update the section.
If data described within the section is written at random locations
within the file, it must be moved "piecemeal" and the software
must fully understand the structure of the section so that it can
find and re-write all of the pieces. The section cannot be moved as
a "blob". If exifprobe detects a section which
appears out of place, the '@' symbol in its offset will be replaced
by '>' to indicate that the section is located outside its IFD. If
the out-of-place section contains a subsection with a similar affliction,
the subsection's offsets may be marked with '+'.
Section tags in LIST mode
Tags which introduce a section or subsection and have an "UNDEFINED"
TIFF type (e.g. MakerNotes, some thumbnails) are written in LIST mode
in the form "@offset:length", followed by "#
UNDEFINED", e.g.
JPEG.APP1.Ifd0.Exif.MakerNote = @922:226 # UNDEFINED
The auxiliary program extract_section provided with the distribution
will accept that format as an argument describing the section to be
extracted.
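For scripts which prefer to handle the "@offset:length" form themselves, the idea can be sketched as below. This is not the extract_section program, and the helper names are made up for the example. It assumes that both numbers are decimal, as in the example line, and that the offset is a file offset.

```python
import re

def parse_section_spec(spec):
    """Turn '@offset:length' into an (offset, length) pair of integers."""
    m = re.fullmatch(r"@(\d+):(\d+)", spec.strip())
    if m is None:
        raise ValueError("not an @offset:length specification: %r" % spec)
    return int(m.group(1)), int(m.group(2))

def parse_list_line(line):
    """Split a LIST mode line 'name = @offset:length # UNDEFINED'."""
    name, _, rest = line.partition("=")
    spec = rest.split("#", 1)[0]       # drop the trailing '# UNDEFINED'
    return name.strip(), parse_section_spec(spec)

def extract_bytes(buf, spec):
    """Return the bytes of the section described by spec from file data buf."""
    offset, length = parse_section_spec(spec)
    return buf[offset:offset + length]
```

For the example line above, parse_list_line yields the name JPEG.APP1.Ifd0.Exif.MakerNote with offset 922 and length 226.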
Short JPEG thumbnails
Most embedded thumbnails live in sections with a specified length.
If exifprobe detects a JPEG_EOI (the marker which indicates
the end of the data stream) before the end of the section, it will
report that, and mark the summary entry, so that the length in the
summary will be given as "<=" (less than or equal
to) the section length. Most often, this means the section has been
padded with 1-3 bytes to an even or four-byte boundary. In some cases,
the padding may be several kilobytes (or more), possibly because space
for the image was reserved before its size was known.
Alternatively, if the program fails to find a JPEG_EOI for a JPEG
image, a warning will be issued.
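One way to locate the end of a JPEG stream inside a longer section is sketched below. It is not exifprobe's own algorithm: it walks the marker segments up to the first start-of-scan marker and then searches the compressed data for the EOI marker (0xFFD9). It ignores rarer cases such as fill bytes between markers, and in a multi-scan file it may stop early if the bytes 0xFF 0xD9 happen to occur inside a table segment between scans. The name find_jpeg_eoi is hypothetical.

```python
def find_jpeg_eoi(section):
    """Return the offset just past the JPEG EOI marker, or None if not found."""
    if section[:2] != b"\xff\xd8":
        raise ValueError("section does not begin with a JPEG SOI marker")
    pos = 2
    while pos + 2 <= len(section):
        if section[pos] != 0xFF:
            return None                 # lost marker synchronization
        marker = section[pos + 1]
        if marker == 0xD9:              # EOI before any scan data
            return pos + 2
        if pos + 4 > len(section):
            return None
        # The segment length counts its own two bytes but not the marker.
        seg_len = int.from_bytes(section[pos + 2:pos + 4], "big")
        pos += 2 + seg_len
        if marker == 0xDA:              # SOS: compressed data follows
            end = section.find(b"\xff\xd9", pos)
            return None if end < 0 else end + 2
    return None
```

The difference between the section length and the returned offset is the amount of padding; a result of None corresponds to the case in which a warning would be issued.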
Asterisks
Some tag names in MakerNote sections may appear followed by one or
two asterisks. If one asterisk appears, I have concerns about the
correctness of the tagname or its interpretation, possibly because
conflicting data has been reported. If there are two asterisks, the
reference information may be incomplete, confusing, or otherwise potentially
unreliable. In either case, you should take extra care to validate
the information for your images before using it in any important way.
It should be noted that the absence of asterisks does not guarantee
the accuracy of information. It does mean that the information appears
consistent and reasonable in images used to test exifprobe
and I have no reason to suspect its accuracy.