Using Bio-Formats Guide
by Melissa Linkert

Overview
----------

This document describes various things that are useful to know when working
with Bio-Formats. It is recommended that you obtain the Bio-Formats source by
following the directions at http://www.loci.wisc.edu/software, rather than
using an official release. It is also recommended that you have a copy of the
JavaDocs nearby (available online at
http://hudson.openmicroscopy.org.uk/job/LOCI/javadoc/); the notes that follow
will make more sense when you see the API.

For a complete list of supported formats, see the Bio-Formats home page:
http://www.loci.wisc.edu/ome/formats.html

Basic File Reading
--------------------

Bio-Formats provides several methods for retrieving data from files in an
arbitrary (supported) format. These methods fall into three categories: raw
pixels, core metadata, and format-specific metadata. All methods described
here are present and documented in loci.formats.IFormatReader - it is advised
that you take a look at the source and/or JavaDoc. In general, it is
recommended that you read files using an instance of ImageReader. While it is
possible to work with readers for a specific format, ImageReader contains
additional logic to automatically detect the format of a file and delegate
subsequent calls to the appropriate reader.

Prior to retrieving pixels or metadata, it is necessary to call setId(String)
on the reader instance, passing in the name of the file to read. Some formats
allow multiple series (5D image stacks) per file; in this case you may wish to
call setSeries(int) to change which series is being read.

Raw pixels are always retrieved one plane at a time. Planes can be returned
either in a byte array, or in a java.awt.image.BufferedImage (using
openBytes(int) and openImage(int) respectively). It is entirely up to you
which method to use, as the pixel values are always identical.
In general, BufferedImages are more convenient for viewer applications and
applications that don't need to perform computations on pixel data, while byte
arrays are better for applications that perform pixel manipulations.

Core metadata is the general term for anything that might be needed to work
with the planes in a file. A list of core metadata fields is given below, with
the appropriate accessor method in parentheses:

- image width (getSizeX())
- image height (getSizeY())
- number of series per file (getSeriesCount())
- total number of images per series (getImageCount())
- number of slices in the current series (getSizeZ())
- number of timepoints in the current series (getSizeT())
- number of actual channels in the current series (getSizeC())
- number of channels per image (getRGBChannelCount())
- the ordering of the images within the current series (getDimensionOrder())
- whether each image is RGB (isRGB())
- whether the pixel bytes are in little-endian order (isLittleEndian())
- whether the channels in an image are interleaved (isInterleaved())
- the type of pixel data in this file (getPixelType())

All file formats are guaranteed to accurately report core metadata.

Format-specific metadata refers to any other data specified in the file - this
includes acquisition and hardware parameters, among other things. This data is
stored internally in a java.util.Hashtable, and can be accessed in one of two
ways: individual values can be retrieved by calling getMetadataValue(String),
which gets the value of the specified key. Alternatively, getMetadata() will
return the entire Hashtable. Note that the keys in this Hashtable are
different for each format, hence the name "format-specific metadata".

See the Bio-Formats Metadata Guide for more information on the metadata
capabilities that Bio-Formats provides.

File Reading Extras
---------------------

The previous section described how to read pixels as they are stored in the
file.
However, the native format isn't necessarily convenient, so Bio-Formats
provides a few extras to make file reading more flexible.

- There are a few "wrapper" readers (that implement IFormatReader) that take a
  reader in the constructor, and manipulate the results somehow, for
  convenience. Using them is similar to the java.io InputStream/OutputStream
  model: just layer whichever functionality you need by nesting the wrappers.

  + FileStitcher implements IFormatReader, and uses advanced pattern matching
    heuristics to group files that belong to the same dataset.
  + ChannelSeparator implements IFormatReader, and makes sure that all planes
    are grayscale - RGB images are split into 3 separate grayscale images.
  + ChannelMerger implements IFormatReader, and merges grayscale images to RGB
    if the number of channels is greater than 1.
  + ChannelFiller implements IFormatReader, and converts indexed color images
    to RGB images.
  + MinMaxCalculator implements IFormatReader, and provides an API for
    retrieving the minimum and maximum pixel values for each channel.
  + DimensionSwapper implements IFormatReader, and provides an API for
    changing the dimension order of a file.

- ImageTools provides a number of methods for manipulating BufferedImages and
  primitive type arrays. In particular, there are methods to split and merge
  channels in a BufferedImage/array, and to convert to a specific data type
  (e.g. convert short data to byte data).

Writing Files
---------------

The following file formats can be written using Bio-Formats:

- TIFF (uncompressed or LZW)
- OME-TIFF (uncompressed or LZW)
- JPEG
- PNG
- AVI (uncompressed)
- QuickTime (uncompressed is supported natively; additional codecs use QTJava)
- Encapsulated PostScript (EPS)

We are planning support for OME-XML in the near future.

The writer API (see loci.formats.IFormatWriter) is very similar to the reader
API, in that files are written one plane at a time (rather than all at once).
All writers allow the output file to be changed before the last plane has been
written. This allows you to write to any number of output files using the
same writer and output settings (compression, frames per second, etc.), and is
especially useful for formats that do not support multiple images per file.

A word of warning: IFormatWriter.saveImage(Image, boolean) accepts generic
java.awt.Images, and converts them to a BufferedImage under the hood. The
problem is that not all formats support all types of data (e.g. JPEG does not
support 16-bit data). To prevent the possibility of corrupt or invalid files,
it is important to check that the Image you supply to saveImage() is
supported. This can be done using the isSupportedType and getPixelTypes
methods of IFormatWriter.

Please see the Movie Stitcher (loci.apps.stitcher) for an example of how to
write files using Bio-Formats.

Arcane Notes and Implementation Details
-----------------------------------------

The following is a list of known oddities.

o IFormatWriter accepts Image objects (not just BufferedImages); yet all
  writers convert the Image to a BufferedImage. You can still pass in a
  BufferedImage, but you are free to pass in any Image object.

o All readers have another openBytes method that takes a pre-allocated byte
  array, but there is no corresponding method for openImage. The rationale
  behind pre-allocated byte arrays is (1) array allocation takes a relatively
  long time; and (2) pre-allocation avoids memory spikes on the heap. The
  reason there isn't something similar for openImage (i.e., a method that
  takes a pre-allocated BufferedImage) is that it's kind of a pain to
  implement, and no one has asked for it so far. If you want this method, we
  can work towards adding it.

o Importing multi-file formats (Leica LEI, PerkinElmer, FV1000 OIF, ICS, and
  Prairie TIFF) can fail if any of the files are renamed. There are "best
  guess" heuristics in these readers, but they aren't guaranteed to work in
  general.
  So please don't rename files in these formats.

o If you are working on a Macintosh, make sure that the data and resource
  forks of your image files are stored together. Bio-Formats does not handle
  separated forks (the native QuickTime reader tries, but usually fails).

o Through specialized I/O classes, Bio-Formats is able to control the number
  of open file descriptors (in the current JVM). Currently, the maximum is
  200, which is lower than the default on most systems.

  Side note on I/O: the reasoning behind writing our own I/O stuff (see
  loci.common.RandomAccessInputStream) is 1) InputStreams are fast at reading
  data sequentially, but cannot do random access; 2) RandomAccessFiles are
  great for random access, but less efficient for sequential reading; 3) we
  needed RandomAccessFile-like functionality for byte arrays; 4) we wanted to
  be able to read from disk, over HTTP, and potentially other sources. The
  result is a hybrid class that extends InputStream and implements DataInput
  to meet all of our goals.

o RLE-compressed QuickTime movies will look funny if the planes are not read
  in sequential order, since proper decoding of a particular plane can depend
  on the previous plane.
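
To make the side note on I/O above more concrete, here is a minimal sketch of
the general idea: a class that extends InputStream and implements DataInput,
with an added seek method for random access. It is not the code of
loci.common.RandomAccessInputStream. The class name SeekableByteStream and the
methods seek and getFilePointer are hypothetical, it only reads from a byte
array, and delegating to a java.io.DataInputStream is a shortcut taken for this
example rather than a description of how Bio-Formats does it.

```java
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration; NOT loci.common.RandomAccessInputStream. */
final class SeekableByteStream extends InputStream implements DataInput {
  private final byte[] buf;
  private int pos = 0;
  // Decodes big-endian primitives by pulling bytes through our own read().
  // DataInputStream does not buffer, so 'pos' always stays in sync.
  private final DataInputStream data = new DataInputStream(this);

  SeekableByteStream(byte[] buf) { this.buf = buf; }

  /** Random access: jump to an absolute offset (assumed method name). */
  public void seek(long offset) throws IOException {
    if (offset < 0 || offset > buf.length) {
      throw new IOException("Offset out of range: " + offset);
    }
    pos = (int) offset;
  }

  /** Current absolute offset (assumed method name). */
  public long getFilePointer() { return pos; }

  // -- Sequential access (InputStream) --

  @Override public int read() {
    return pos < buf.length ? (buf[pos++] & 0xff) : -1;
  }

  @Override public int read(byte[] b, int off, int len) {
    if (len == 0) return 0;
    if (pos >= buf.length) return -1;
    int n = Math.min(len, buf.length - pos);
    System.arraycopy(buf, pos, b, off, n);
    pos += n;
    return n;
  }

  @Override public int available() { return buf.length - pos; }

  // -- Typed access (DataInput), mostly delegated to DataInputStream --

  @Override public void readFully(byte[] b) throws IOException { data.readFully(b); }
  @Override public void readFully(byte[] b, int off, int len) throws IOException {
    data.readFully(b, off, len);
  }
  @Override public int skipBytes(int n) {
    int k = Math.max(0, Math.min(n, buf.length - pos));
    pos += k;
    return k;
  }
  @Override public boolean readBoolean() throws IOException { return data.readBoolean(); }
  @Override public byte readByte() throws IOException { return data.readByte(); }
  @Override public int readUnsignedByte() throws IOException { return data.readUnsignedByte(); }
  @Override public short readShort() throws IOException { return data.readShort(); }
  @Override public int readUnsignedShort() throws IOException { return data.readUnsignedShort(); }
  @Override public char readChar() throws IOException { return data.readChar(); }
  @Override public int readInt() throws IOException { return data.readInt(); }
  @Override public long readLong() throws IOException { return data.readLong(); }
  @Override public float readFloat() throws IOException { return data.readFloat(); }
  @Override public double readDouble() throws IOException { return data.readDouble(); }
  @Override public String readUTF() throws IOException { return data.readUTF(); }

  /** Reads up to the next '\n', dropping any '\r'; returns null at end of data. */
  @Override public String readLine() {
    if (pos >= buf.length) return null;
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = read()) != -1 && c != '\n') {
      if (c != '\r') sb.append((char) c);
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    byte[] bytes = {0, 0, 1, 2, 0x7f, 'h', 'i', '\n'};
    SeekableByteStream in = new SeekableByteStream(bytes);
    System.out.println(in.readInt());          // sequential read: prints 258
    in.seek(5);                                // jump forward
    System.out.println(in.readLine());         // prints hi
    in.seek(4);                                // jump backward
    System.out.println(in.readUnsignedByte()); // prints 127
  }
}
```

The same object can be handed to code that expects a plain InputStream, to code
that expects a DataInput, or to code that needs to seek, which is the
combination the side note describes. Reading from disk or over HTTP would need
a different backing store than the byte array assumed here.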