JPEG 2000

The Joint Photographic Experts Group committee has created its own wavelet-based image compression standard, JPEG 2000, which is intended to eventually supersede its original discrete cosine transform (DCT)-based JPEG standard. Part 1 of JPEG 2000, the core coding system, has been published as an ISO standard, ISO/IEC 15444-1:2000.

JPEG 2000 can operate at higher compression ratios without producing the blocking artifacts characteristic of the original DCT-based JPEG standard.

JPEG 2000 is not yet widely supported in web browsers, and hence is not generally used in the World Wide Web.

The JPEG committee has stated:

It has always been a strong goal of the JPEG committee that its standards should be implementable in their baseline form without payment of royalty and license fees...

...The up and coming JPEG 2000 standard has been prepared along these lines, and agreement reached with over 20 large organisations holding many patents in this area to allow use of their intellectual property in connection with the standard without payment of license fees or royalties.

Technical Discussion

The aim of JPEG 2000 is not only to improve compression performance over JPEG but also to add (or improve) features such as scalability and editability. It should also perform well at both very low and very high compression ratios, including lossless compression.

Like JPEG, JPEG 2000 applies a form of transform coding, but it uses the wavelet transform instead of the DCT. First, images are transformed from the RGB color space into three components, using either the YCbCr color space (the irreversible component transform, ICT) or the reversible component transform (RCT). The chrominance components may optionally be downsampled. The image is then split into so-called tiles, whose purpose is to make memory limitations easier to cope with. Each tile is then wavelet transformed to a chosen number of decomposition levels.
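The RCT can be sketched with integer arithmetic only. This minimal example assumes the commonly cited form Y = floor((R + 2G + B) / 4), U = B - G, V = R - G; the function names are illustrative. The floor division is what makes the transform exactly invertible on integers, which is why it suits lossless coding, whereas the ICT uses real-valued coefficients and is not exactly reversible.

```python
def rct_forward(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Reversible component transform: integer RGB -> (Y, U, V)."""
    y = (r + 2 * g + b) // 4  # floor division keeps the result an integer
    u = b - g                 # blue-difference chroma
    v = r - g                 # red-difference chroma
    return y, u, v


def rct_inverse(y: int, u: int, v: int) -> tuple[int, int, int]:
    """Exact inverse of rct_forward (no rounding loss)."""
    g = y - (u + v) // 4      # since R + 2G + B == U + V + 4G
    r = v + g
    b = u + g
    return r, g, b


if __name__ == "__main__":
    pixel = (200, 120, 30)
    ycc = rct_forward(*pixel)
    assert rct_inverse(*ycc) == pixel
    print(pixel, "->", ycc)
```

Because R + 2G + B equals U + V + 4G, the luma can be split back into G plus a floor term that depends only on U and V, so no information is lost.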

The result is a collection of sub-bands that represent several approximation scales. A sub-band is a set of coefficients: real numbers that represent aspects of the image associated with a certain frequency range as well as a spatial area of the image. These coefficients are scalar-quantized, giving a set of integers that then have to be encoded bit by bit.
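The scalar quantization step can be sketched as a dead-zone uniform quantizer, q = sign(y) * floor(|y| / step), whose zero bin is twice as wide as the others. The reconstruction offset `r = 0.5` in the decoder-side helper is an assumption (a common choice, not mandated here), and the function names are illustrative. Note that with integer coefficients and a step size of 1 the quantizer is the identity, which is what lossless coding relies on.

```python
import math


def quantize(y: float, step: float) -> int:
    """Dead-zone uniform scalar quantizer: sign(y) * floor(|y| / step)."""
    q = math.floor(abs(y) / step)
    return -q if y < 0 else q


def dequantize(q: int, step: float, r: float = 0.5) -> float:
    """Reconstruct a coefficient; r in [0, 1) places it inside its bin (assumed 0.5)."""
    if q == 0:
        return 0.0
    magnitude = (abs(q) + r) * step
    return -magnitude if q < 0 else magnitude


if __name__ == "__main__":
    coeffs = [7.3, -7.3, 0.9, -1.9, 12.0]
    qs = [quantize(c, 2.0) for c in coeffs]
    print(qs)  # [3, -3, 0, 0, 6]
    print([dequantize(q, 2.0) for q in qs])
```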

The quantized sub-bands are split further into partitions (called precincts in the standard). A partition groups coefficients that correspond approximately to the same spatial block of the image: it is formed by one block from each sub-band at a given resolution level. The purpose of partitions is to enable spatially progressive bit-streams.
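How a partition gathers "one block from each sub-band" for the same image area can be illustrated with a simplified coordinate mapping. This sketch assumes the image origin is at 0 and that each decomposition level simply halves coordinates (rounding up); the real standard's sub-band coordinate rules and precinct grid are more detailed. `scale_rect` and `partition_blocks` are hypothetical helpers.

```python
Rect = tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def scale_rect(rect: Rect, level: int) -> Rect:
    """Map an image-space rectangle onto coefficient coordinates at `level`.

    Simplified: each wavelet level halves both dimensions, so coordinates
    are divided by 2**level and rounded up.
    """
    f = 2 ** level
    x0, y0, x1, y1 = rect
    return (ceil_div(x0, f), ceil_div(y0, f), ceil_div(x1, f), ceil_div(y1, f))


def partition_blocks(rect: Rect, level: int) -> dict[str, Rect]:
    """One block per detail sub-band at `level`, all covering the same image area."""
    block = scale_rect(rect, level)
    return {band: block for band in ("HL", "LH", "HH")}


if __name__ == "__main__":
    print(partition_blocks((32, 32, 96, 64), 1))
```

Selecting the partitions whose blocks cover a region of interest is what lets a decoder or server extract just that part of the image.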

Partitions are split further into code-blocks. Each code-block lies within a single sub-band, and all code-blocks have the same size (except where they are clipped at sub-band boundaries). The encoder has to encode the bits of all quantized coefficients of a code-block, starting with the most significant bits and progressing to less significant ones. The result is a bit-stream that is split into packets. Packets are the key to quality scalability: packets containing less significant bits can be discarded to achieve lower bit-rates at the cost of higher distortion.
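The most-significant-first ordering can be illustrated by splitting a code-block's coefficient magnitudes into bit-planes. This is a simplified sketch: the real coder entropy-codes each bit-plane in several passes and signals signs separately, while here the planes are plain bit lists and the helper names are hypothetical. Rebuilding from only the top planes shows why discarding the less significant bits lowers the bit-rate while increasing distortion.

```python
def bit_planes(block: list[int]) -> list[list[int]]:
    """Split coefficient magnitudes into bit-planes, most significant plane first."""
    magnitudes = [abs(c) for c in block]
    n_planes = max(magnitudes).bit_length()  # planes needed for the largest magnitude
    return [[(m >> p) & 1 for m in magnitudes] for p in range(n_planes - 1, -1, -1)]


def truncate(block: list[int], keep: int) -> list[int]:
    """Rebuild coefficients (signs restored) from only the `keep` most significant planes."""
    planes = bit_planes(block)
    n = len(planes)
    out = []
    for i, c in enumerate(block):
        m = sum(planes[k][i] << (n - 1 - k) for k in range(min(keep, n)))
        out.append(-m if c < 0 else m)
    return out


if __name__ == "__main__":
    block = [5, 3, 0, 6]
    print(bit_planes(block))   # 3 planes for max magnitude 6 (binary 110)
    print(truncate(block, 2))  # lowest plane dropped
```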

The bit-stream is produced by a binary arithmetic coder, the MQ-coder. Each bit-plane of a code-block is scanned in three coding passes. The significance propagation pass first encodes the bits (and signs) of insignificant coefficients that have significant neighbors (i.e. neighbors with 1-bits in higher bit-planes); the magnitude refinement pass then encodes the refinement bits of already significant coefficients; finally, the cleanup pass encodes the remaining coefficients, which have no significant neighbors. The MQ-coder is driven by contexts derived from this neighborhood-significance information.
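The assignment of coefficients to the three passes (significance propagation, magnitude refinement, cleanup) can be sketched from a code-block's significance map, using the eight surrounding coefficients as the neighborhood. This is deliberately simplified: membership is decided from the significance state at the start of the bit-plane, whereas the real coder updates significance while it scans (in stripes), so a coefficient that becomes significant can pull its neighbors into the propagation pass of the same bit-plane. `classify_passes` is a hypothetical name.

```python
def classify_passes(significant: list[list[bool]]) -> list[list[str]]:
    """Label each coefficient with the coding pass that would handle it.

    Simplified: uses the significance state from before the current bit-plane.
    """
    rows, cols = len(significant), len(significant[0])
    labels = []
    for r in range(rows):
        row = []
        for c in range(cols):
            if significant[r][c]:
                row.append("refinement")  # already significant: refine its magnitude
                continue
            # Check the 8-connected neighborhood, clipped at the block edges.
            has_sig_neighbor = any(
                significant[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            )
            row.append("propagation" if has_sig_neighbor else "cleanup")
        labels.append(row)
    return labels


if __name__ == "__main__":
    sig = [[False] * 4 for _ in range(4)]
    sig[0][0] = True
    for row in classify_passes(sig):
        print(row)
```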

The points at which these code-block bit-streams are cut into packets are the key to JPEG 2000's performance. Collections of packets are grouped into layers, each of which represents a certain image quality; each subsequent layer further increases the image quality. The main problem is then to find, for every code-block, the packet length that minimizes the overall distortion. A recommended method for solving this problem, which is not part of the standard itself, is rate-distortion optimization using a Lagrange multiplier λ. The problem can be viewed as a constrained optimization: the objective is the overall distortion, which has to be minimized; the variables are the packet lengths (making this a highly multidimensional problem); and the constraint is a target overall bit-rate. At the optimum, the distortion-rate slopes (the rate at which distortion decreases as the packet length grows) of all code-blocks are equal.
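The Lagrangian approach can be sketched as follows. Minimizing the sum over code-blocks of (D_i + λ R_i) decouples the problem into independent per-code-block choices, and a bisection on λ finds the value that meets the rate budget; at that λ every code-block operates where its distortion-rate slope matches λ. Each code-block is modeled here as a hypothetical list of (rate, distortion) candidate truncation points, including the empty choice (0, D0). The search bracket and iteration count are assumptions, and the result is only as precise as that search.

```python
Point = tuple[float, float]  # (rate, distortion) at one candidate truncation point


def select_truncations(blocks: list[list[Point]], lam: float) -> list[Point]:
    """For a fixed lambda, pick the point minimizing D + lam * R in each code-block."""
    return [min(points, key=lambda p: p[1] + lam * p[0]) for points in blocks]


def optimize(blocks: list[list[Point]], rate_budget: float, iters: int = 60) -> list[Point]:
    """Bisect on lambda to find the lowest-distortion choice within the rate budget."""
    lo, hi = 0.0, 1e6  # assumed bracket: hi is large enough to force minimal rates
    best = select_truncations(blocks, hi)
    for _ in range(iters):
        lam = (lo + hi) / 2
        choice = select_truncations(blocks, lam)
        if sum(rate for rate, _ in choice) <= rate_budget:
            best, hi = choice, lam  # feasible: try a smaller lambda (spend more bits)
        else:
            lo = lam                # over budget: penalize rate more heavily
    return best


if __name__ == "__main__":
    blocks = [
        [(0, 100.0), (10, 40.0), (20, 20.0)],  # steep curve: slopes 6, then 2
        [(0, 80.0), (10, 65.0), (20, 55.0)],   # flatter curve: slopes 1.5, then 1
    ]
    print(optimize(blocks, rate_budget=20))
```

With a budget of 20, all bits go to the first code-block: its second segment still reduces distortion by 2 per unit of rate, more than the 1.5 offered by the first segment of the other code-block.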

Packets can be reordered arbitrarily in the JPEG 2000 bit-stream, which gives encoders as well as image servers a high degree of freedom. Already encoded images can be sent over networks at arbitrary bit-rates by using a layer-progressive order. Alternatively, color components can be moved back in the bit-stream, or lower resolutions (corresponding to the low-frequency sub-bands) can be sent first for image previewing. Finally, spatial browsing of large images is possible by selecting the appropriate tiles and/or partitions. None of these operations require re-encoding; they need only byte-wise copy operations.
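Reordering can be pictured as sorting packet identifiers by different keys. The standard defines progression orders named by letters for layer (L), resolution (R), component (C) and position (P), such as LRCP (layer-progressive) and RLCP (resolution-progressive). In this sketch a packet is a hypothetical (layer, resolution, component, position) tuple with its payload omitted; a real server would also have to keep the codestream's header information consistent with the new order.

```python
from itertools import product

# Index of each progression letter within a (layer, resolution, component, position) tuple.
FIELD = {"L": 0, "R": 1, "C": 2, "P": 3}


def reorder(packets, order: str):
    """Sort packet identifiers into a progression order such as 'LRCP' or 'RLCP'."""
    return sorted(packets, key=lambda p: tuple(p[FIELD[ch]] for ch in order))


# 2 quality layers x 3 resolutions x 3 components x 1 position.
packets = list(product(range(2), range(3), range(3), range(1)))

quality_first = reorder(packets, "LRCP")     # whole image at layer 0, then layer 1
resolution_first = reorder(packets, "RLCP")  # lowest resolution first, for previews

low_rate = [p for p in quality_first if p[0] == 0]    # drop layer 1 to cut the bit-rate
preview = [p for p in resolution_first if p[1] == 0]  # thumbnail-sized image only
```

Each selection is a filter over existing packets, which mirrors the point that no re-encoding is needed.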

JPEG 2000 improves compression performance by about 20% over JPEG at medium compression ratios; at very low or very high compression ratios the improvement is much greater. It does, however, have higher computational and memory demands. Lossless compression is achieved by using a reversible integer wavelet filter (the LeGall 5/3 filter instead of the irreversible Cohen-Daubechies-Feauveau 9/7 filter) and a quantization step size of 1.
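One level of the reversible 5/3 filter can be sketched with integer lifting: a predict step computes detail (high-pass) coefficients from odd samples, and an update step computes smooth (low-pass) coefficients from even samples, both using floor division. This sketch is 1-D and assumes a non-empty, even-length signal with symmetric extension at the edges; the function names are illustrative. A 2-D transform would apply it to rows and then columns, and repeat on the low-pass band for further decomposition levels.

```python
def fwd_53(x: list[int]) -> tuple[list[int], list[int]]:
    """One level of the reversible LeGall 5/3 lifting transform (even-length input)."""
    n = len(x)
    assert n > 0 and n % 2 == 0, "sketch assumes a non-empty, even-length signal"
    half = n // 2
    # Predict: detail = odd sample minus the floor-average of its even neighbors.
    d = []
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]  # symmetric extension
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    # Update: smooth = even sample plus a rounded quarter of the neighboring details.
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]  # symmetric extension
        s.append(x[2 * i] + (left + d[i] + 2) // 4)
    return s, d


def inv_53(s: list[int], d: list[int]) -> list[int]:
    """Undo fwd_53 exactly by running the lifting steps backwards."""
    half = len(s)
    x = [0] * (2 * half)
    for i in range(half):  # undo the update step to recover the even samples
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (left + d[i] + 2) // 4
    for i in range(half):  # undo the predict step to recover the odd samples
        right = x[2 * i + 2] if i + 1 < half else x[2 * i]
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x


if __name__ == "__main__":
    signal = [3, 7, 1, 8, 2, 9, 4, 6]
    low, high = fwd_53(signal)
    assert inv_53(low, high) == signal  # perfect reconstruction: lossless
    print(low, high)
```

Because the inverse recomputes exactly the same floor terms as the forward pass, reconstruction is bit-exact; with a quantization step size of 1, no information is lost anywhere in the pipeline.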

External links: