18-May-2013
7 ZIP Utilities
seven best zip archive utilities to choose from


The Canterbury Corpus


The Canterbury Corpus is a set of files named after the University of Canterbury in New Zealand, where it was developed. The files were designed to test lossless compression algorithms against a standardized set of data. Comparing algorithms on the same data makes it possible to see which performs better on criteria such as speed or compression ratio. The corpus is only useful, however, if algorithm designers do not use it during development and do not tune their algorithms to perform best against it.
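As a rough illustration of this kind of comparison, the sketch below measures compression ratio and wall-clock time for three codecs from the Python standard library. The codecs are only stand-ins for "algorithms under test", and the names benchmark, benchmark_directory and corpus_dir are hypothetical; corpus_dir is assumed to be a local folder holding the unpacked corpus files.

```python
import bz2
import lzma
import time
import zlib
from pathlib import Path

# Standard-library codecs, used here only as example algorithms to compare.
CODECS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def benchmark(data: bytes) -> dict:
    """Return {codec name: (compression ratio, seconds)} for one input."""
    results = {}
    for name, compress in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Ratio is original size / compressed size, so larger is better.
        ratio = len(data) / len(packed) if packed else 0.0
        results[name] = (ratio, elapsed)
    return results


def benchmark_directory(corpus_dir: str) -> dict:
    """Benchmark every regular file in corpus_dir (an assumed local path)."""
    return {
        path.name: benchmark(path.read_bytes())
        for path in sorted(Path(corpus_dir).iterdir())
        if path.is_file()
    }
```

Timings from a single run are noisy, so a fairer comparison would repeat each measurement several times and report the best or median value.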

 

There are five Canterbury corpus sets. Here is some information about each, along with a link for downloading the complete set:

 

Main Canterbury corpus

The main set of files, developed in 1997. The files were chosen because they produced typical results with the compression algorithms available at the time, on the expectation that future algorithms would also behave typically on them.

 

Download the full set of Canterbury corpus files

 

Artificial corpus

This file set was designed to test algorithms under extreme, worst-case conditions. Results on the artificial corpus say little about typical performance; its main purpose is to show how an algorithm behaves at the extremes.
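To give a feel for what "extreme conditions" can mean, the sketch below compresses a few hand-made edge cases with zlib. These inputs are illustrative assumptions, not the actual files of the artificial corpus, and the names extreme_inputs and compressed_sizes are hypothetical.

```python
import os
import zlib

SIZE = 100_000

# Hypothetical extreme inputs, not the real artificial corpus files.
extreme_inputs = {
    "empty": b"",
    "single byte": b"a",
    "one byte repeated": b"a" * SIZE,
    "random bytes": os.urandom(SIZE),
}


def compressed_sizes(inputs: dict) -> dict:
    """Map each input name to (original size, zlib-compressed size)."""
    return {
        name: (len(data), len(zlib.compress(data)))
        for name, data in inputs.items()
    }


for name, (before, after) in compressed_sizes(extreme_inputs).items():
    print(f"{name:>18}: {before:>7} -> {after:>7} bytes")
```

The repeated byte shrinks dramatically, while the empty, single-byte and random inputs typically come out no smaller than they went in, because the format's fixed overhead outweighs any saving. Cases like these check robustness rather than everyday performance.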

 

Download the artificial corpus files

 

Large corpus

A set of relatively large files intended for compression algorithms that use a very large dictionary or otherwise need a large amount of data before they reach their best compression ratio.

 

Download the large corpus files

 

Miscellaneous corpus

A set of files added over the years by algorithm creators and researchers.

 

Download the miscellaneous corpus files

 

Calgary corpus

An older set of files developed in the late 1980s that is still used as a de facto standard for comparing algorithms. The Calgary corpus has since been largely superseded by the Canterbury corpus.

 

Download the Calgary corpus files