Performance

NOTE: This report reflects the performance of libjpeg-turbo 1.5.x. Various enhancements in libjpeg-turbo 2.0.x and 2.1 have improved the performance of libjpeg-turbo significantly on modern x86 CPUs, and the performance of Intel® IPP has likely improved as well. The report will be re-factored for libjpeg-turbo 2.1, time permitting.

This article compares the performance of libjpeg-turbo with libjpeg and with the Intel® Integrated Performance Primitives (Intel® IPP), one of the most popular proprietary JPEG codecs on the market.

Test Images

vgl_5674_0098.ppm: A frame capture from the 3D Studio Max Viewperf test. This frame represents a movie set with significant wireframe and texture content (1240 x 960 pixels.)

vgl_6434_0018.ppm: A frame capture from the Pro/ENGINEER Viewperf test. This frame represents an exploded rendering of a race car with smooth shading and lighting (1240 x 960 pixels.)

vgl_6548_0026.ppm: A frame capture from the UGS NX Viewperf test. This frame represents a grayscale wireframe rendering of an engine block with partial transparency (1240 x 960 pixels.)

artificial.ppm: Ray-traced content from http://www.imagecompression.info/test_images/ (8-bit version, 3072 x 2048 pixels.)

nightshot_iso_100.ppm: Photographic content from http://www.imagecompression.info/test_images/ (8-bit version, 3136 x 2352 pixels.)

The Viewperf frame captures, which can be downloaded from here, are meant to represent workloads that might be encountered by VirtualGL and TurboVNC, two of the projects that drove the development of libjpeg-turbo in its early days. These images were chosen because, as a group, they contain a variety of frequencies and color counts, and quantitatively, they were among the most difficult Viewperf frame captures to compress. They all had below-average compression ratios, but none of them were corner cases.

The photograph was chosen because its performance was typical of other photographic images in the test image set. The ray-traced image was chosen for its high color count. In general, the ray-traced image performed similarly to the Pro/E frame capture, so future tests may omit the latter.

Test Methodology

For each test image, the tjbench program (available in the libjpeg-turbo distribution) was used to measure the compression and decompression performance and the compression ratio obtained when compressing the test image as a JPEG image and then decompressing the JPEG image back to its original, uncompressed form. tjbench measures these metrics for four chrominance subsampling settings: grayscale, 4:2:0, 4:2:2, and 4:4:4. The following tjbench command-line options were used:

tjbench {image} 95 -rgb -qq -nowrite -warmup 10
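To repeat this measurement across all five test images, the command above can be scripted. The following is a minimal sketch in Python, assuming that the tjbench executable is on the PATH and that the test images are in the current directory. The helper names tjbench_command and run_benchmarks are hypothetical and are not part of libjpeg-turbo.

```python
import shutil
import subprocess

# Test images listed in this article (assumed to be in the current directory).
TEST_IMAGES = [
    "vgl_5674_0098.ppm",
    "vgl_6434_0018.ppm",
    "vgl_6548_0026.ppm",
    "artificial.ppm",
    "nightshot_iso_100.ppm",
]


def tjbench_command(image, quality=95, warmup=10):
    """Build the argument list for the tjbench invocation quoted above."""
    # The options are copied verbatim from the command line in this report.
    return ["tjbench", image, str(quality),
            "-rgb", "-qq", "-nowrite", "-warmup", str(warmup)]


def run_benchmarks(images=TEST_IMAGES):
    """Run tjbench once per image and return its raw text output per image."""
    if shutil.which("tjbench") is None:
        raise FileNotFoundError("tjbench not found on PATH")
    results = {}
    for image in images:
        completed = subprocess.run(tjbench_command(image), check=True,
                                   capture_output=True, text=True)
        results[image] = completed.stdout
    return results
```

Calling run_benchmarks() returns a dictionary that maps each image name to the text printed by tjbench. Parsing that text is deliberately left out, since the output layout is not described in this report.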

For comparison, the benchmark was also built and run against the (now obsolete) TurboJPEG/IPP library, which implements the TurboJPEG API on top of the Intel® Integrated Performance Primitives. (An older version of tjbench, called jpgtest, was actually used for this purpose, since it still supports the legacy version of the TurboJPEG API used by TurboJPEG/IPP.)

Additionally, the TurboJPEG wrapper library from libjpeg-turbo was built against libjpeg v6b and v8d, and tjbench was thus used to test the performance of those libraries. A repository of the IJG code releases with TurboJPEG and tjbench added can be found here.

Results

The raw data can be found in the following spreadsheet: libjpegturbo-1.5.ods

Speedup Relative to Other Codecs, Non-Grayscale Compression/Decompression (1.0 = equal performance) [Linux]

                   x86-64                          x86
                   Compression    Decompression    Compression    Decompression
libjpeg v6b        3.47 - 6.00    2.08 - 3.43      3.41 - 5.68    2.35 - 4.08
libjpeg v8d        3.46 - 6.91    1.89 - 4.31      3.22 - 6.11    2.42 - 4.77
Intel® IPP v7.1    0.989 - 1.34   0.822 - 1.17     0.809 - 1.13   0.693 - 1.29

Speedup Relative to Other Codecs, Non-Grayscale Compression/Decompression (1.0 = equal performance) [Android & iOS]

                   Armv8                           Armv7
                   Compression    Decompression    Compression    Decompression
libjpeg v6b        2.91 - 4.11    1.76 - 2.59      3.04 - 3.96    1.70 - 3.57

In the most general terms, libjpeg-turbo is 2.1 - 6.0x as fast as libjpeg v6b and 1.9 - 6.9x as fast as libjpeg v8d when running under Linux on x86 desktop CPUs. libjpeg-turbo is also 82 - 134% as fast as Intel® IPP when using x86-64 code and 69 - 129% as fast as Intel® IPP when using x86 code on the same systems.
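The summary figures above are the lowest and highest speedups taken across the compression and decompression columns of the Linux table. The following minimal Python sketch reproduces that derivation. The table values are copied from this report, while the SPEEDUPS layout and the overall_range helper are hypothetical names introduced only for illustration.

```python
# Speedup ranges (low, high) copied from the Linux table in this report.
# Layout: codec -> architecture -> operation -> (low, high).
SPEEDUPS = {
    "libjpeg v6b": {
        "x86-64": {"compress": (3.47, 6.00), "decompress": (2.08, 3.43)},
        "x86": {"compress": (3.41, 5.68), "decompress": (2.35, 4.08)},
    },
    "libjpeg v8d": {
        "x86-64": {"compress": (3.46, 6.91), "decompress": (1.89, 4.31)},
        "x86": {"compress": (3.22, 6.11), "decompress": (2.42, 4.77)},
    },
    "Intel IPP v7.1": {
        "x86-64": {"compress": (0.989, 1.34), "decompress": (0.822, 1.17)},
        "x86": {"compress": (0.809, 1.13), "decompress": (0.693, 1.29)},
    },
}


def overall_range(codec, architectures):
    """Return (lowest, highest) speedup over the given architectures."""
    ranges = [r for arch in architectures
              for r in SPEEDUPS[codec][arch].values()]
    return min(low for low, _ in ranges), max(high for _, high in ranges)


# libjpeg figures are summarized across both x86-64 and x86.
for codec in ("libjpeg v6b", "libjpeg v8d"):
    print(codec, overall_range(codec, ["x86-64", "x86"]))

# Intel IPP figures are summarized separately per architecture.
for arch in ("x86-64", "x86"):
    print("Intel IPP v7.1", arch, overall_range("Intel IPP v7.1", [arch]))
```

Rounded to two significant figures, the extremes returned by overall_range match the ranges quoted in the text: 2.1 - 6.0x and 1.9 - 6.9x for the two libjpeg versions, and 82 - 134% and 69 - 129% for Intel® IPP.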

libjpeg-turbo's primary weakness relative to IPP is 32-bit decompression performance, particularly on Intel processors. This is largely due to the Huffman decoder running out of registers and having to swap some inner loop variables back and forth from memory. The Huffman codec optimizations in libjpeg-turbo reduced this effect somewhat, but it could not be eliminated entirely. IPP has some weaknesses, however. Perhaps the most notable is that the 64-bit version requires SSE3 code in order to perform optimally, so libjpeg-turbo will have a clear advantage on older Opteron and Athlon64 systems that lack the SSE3 instruction set.

Fast Upsampling and Fast IDCT

The TurboJPEG wrapper library, which was used when benchmarking libjpeg-turbo, is configured to use settings that duplicate, as closely as possible, the image quality of Intel® IPP. Thus, the fast integer forward DCT, the slow integer inverse DCT, and slow (AKA "fancy" or "smooth") chrominance upsampling are enabled in the underlying libjpeg API (NOTE: the TurboJPEG wrapper uses the slow integer forward DCT for JPEG qualities 96-100, but the above tests were all conducted with quality=95.)

Enabling fast (AKA "merged") chrominance upsampling in the decompressor improved decompression performance for chrominance-subsampled JPEGs by as much as 15-20% relative to the results above. Whether or not the quality loss from merged upsampling is noticeable depends on the image. As the name implies, "smooth" upsampling averages out the subsampling error in the chrominance planes, but this can also reduce the sharpness of lines and other high-frequency features in the image. Merged upsampling can sometimes produce a sharper output image, but one in which the subsampling error is more visible. Enabling fast chrominance upsampling can be accomplished by passing a flag of TJ_FASTUPSAMPLE to tjDecompress*() (TurboJPEG API) or by setting cinfo.do_fancy_upsampling=FALSE (libjpeg API.)
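The trade-off between the two upsampling methods can be illustrated on a single row of chrominance samples. The following Python sketch is a simplified, one-dimensional illustration and not the library's actual code. It assumes 2x horizontal upsampling (as in 4:2:2), models merged upsampling as plain sample replication, and models "fancy" upsampling as a triangle filter that weights the nearer sample by 3/4 and the farther sample by 1/4. The function names are hypothetical.

```python
def upsample_replicate(row):
    """Double the row width by repeating each sample (box filter)."""
    out = []
    for value in row:
        out.extend([value, value])
    return out


def upsample_triangle(row):
    """Double the row width using 3/4 nearer + 1/4 farther weighting."""
    out = []
    last = len(row) - 1
    for i, value in enumerate(row):
        # At the edges, the missing neighbor is replaced by the sample itself.
        left = row[i - 1] if i > 0 else value
        right = row[i + 1] if i < last else value
        # The rounding terms alternate between 1 and 2 to avoid a bias.
        out.append((3 * value + left + 1) >> 2)
        out.append((3 * value + right + 2) >> 2)
    return out


# A hard edge in the subsampled chrominance row.
chroma = [0, 0, 255, 255]
print(upsample_replicate(chroma))  # [0, 0, 0, 0, 255, 255, 255, 255]
print(upsample_triangle(chroma))   # [0, 0, 0, 64, 191, 255, 255, 255]
```

Replication preserves the hard edge exactly, which is why merged upsampling can look sharper, whereas the triangle filter spreads the transition over two intermediate values, which is the smoothing effect described above.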

Enabling the fast IDCT in the decompressor improved decompression performance across the board by 4-14%, relative to the results above, with very little overall loss in quality. Enabling the fast IDCT can be accomplished by passing a flag of TJ_FASTDCT to tjDecompress*() (TurboJPEG API) or by setting cinfo.dct_method=JDCT_FASTEST (libjpeg API.)

All content on this web site is licensed under the Creative Commons Attribution 2.5 License. Any works containing material derived from this web site must cite The libjpeg-turbo Project as the source of the material and list the current URL for the libjpeg-turbo web site.

Page last modified on February 02, 2021, at 10:15 AM