ciFAIR

A duplicate-free variant of the CIFAR test set.

Download ciFAIR-10 | Download ciFAIR-100 | Paper

The test sets of the popular CIFAR-10 and CIFAR-100 datasets contain 3.25% and 10% duplicate images, respectively, i.e., images that can also be found in very similar form in the training set or the test set itself. ciFAIR-10 and ciFAIR-100 are variants of these datasets with modified test sets, where all these duplicates have been replaced with new images.

Details about how we found duplicates in CIFAR and created ciFAIR can be found in the following paper:

Do We Train on Test Data? Purging CIFAR of Near-Duplicates.
Björn Barz and Joachim Denzler.
Journal of Imaging, 6(6):41, 2020.

The training sets have remained unchanged and are identical to those of CIFAR.

We encourage everyone training models on CIFAR to evaluate them on the ciFAIR test sets for an unbiased comparison. Download links can be found at the top of the page. We provide code for loading both datasets with several deep learning frameworks below. If you use ciFAIR, please cite the paper mentioned above.

Both datasets have the same structure as CIFAR and are intended to be used as drop-in replacements. However, there are two compatibility issues:

  • The test set pickle files do not contain an item 'filenames'.
  • The test set pickle files cannot be loaded with Python 2. Due to this bug, we were not able to save them in a pickle format that Python 2 can read. If you find a way to achieve this, please create a pull request. (A Python 3 loading sketch is shown after this list.)
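For reference, here is a minimal sketch of reading a ciFAIR batch file directly under Python 3. The directory and file names are placeholders based on the standard CIFAR-10 batch layout; adjust the path to wherever you extracted the archive.

import pickle
import numpy as np

def load_cifair_batch(path, labels_key='labels'):
    # encoding='latin1' lets the same call read the original CIFAR training
    # batches (Python 2 pickles) as well as the new ciFAIR test batches.
    with open(path, 'rb') as f:
        batch = pickle.load(f, encoding='latin1')
    # Depending on how a batch was pickled, its keys may be str or bytes.
    data = batch['data'] if 'data' in batch else batch[b'data']
    labels = batch[labels_key] if labels_key in batch else batch[labels_key.encode()]
    # 'data' is an (N, 3072) uint8 array; reshape it into N RGB images of size 32x32.
    images = np.asarray(data, dtype=np.uint8).reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    return images, np.asarray(labels)

# Example call; the path is a placeholder for wherever you extracted ciFAIR-10.
# For ciFAIR-100, pass labels_key='fine_labels'.
images, labels = load_cifair_batch('cifair-10-batches-py/test_batch')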

If you are interested in the actual duplicate images we have found in the original CIFAR datasets, you can find lists of these here.

Leaderboard & Pre-Trained Models

We maintain a community-driven leaderboard of CNN architectures for image classification on ciFAIR. Methods are sorted by their error rate on the ciFAIR-100 test set, and the best error rate in each column is highlighted in bold.
Architectures are linked to the corresponding paper. Clicking on the name of the CNN framework used for a certain architecture will bring you to the source code used for training the model.

| Architecture | Code | Params | CIFAR-10 | ciFAIR-10 | CIFAR-100 | ciFAIR-100 | Pre-Trained Models |
|---|---|---|---|---|---|---|---|
| PyramidNet-272-200 | PyTorch | 26.6 M | 3.58% | 4.00% | **17.05%** | **19.38%** | CIFAR-10 / CIFAR-100 |
| ResNeXt-29 (8x64d) | PyTorch | 34.5 M | **3.56%** | **3.95%** | 18.38% | 20.84% | CIFAR-10 / CIFAR-100 |
| DenseNet-BC (L=190, k=40) | PyTorch | 25.2 M | 3.90% | 4.20% | 18.62% | 21.02% | CIFAR-10 / CIFAR-100 |
| WRN-28-10 | (Py)Torch | 36.5 M | 3.89% | 4.25% | 18.95% | 21.48% | CIFAR-10 / CIFAR-100 |
| ResNet-110 | Keras | 1.7 M | 5.26% | 5.77% | 26.05% | 29.25% | CIFAR-10 / CIFAR-100 |
| Plain-11 | Keras | 3.4 M | 5.91% | 6.43% | 27.82% | 31.34% | CIFAR-10 / CIFAR-100 |

If you think a certain architecture should be included in this leaderboard, your pull request is very welcome.

Data Loaders

PyTorch

Download PyTorch data loader

Requires torchvision.

Simply do `from cifair import ciFAIR10, ciFAIR100` and use these two classes just like the CIFAR data loaders from torchvision.
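A minimal usage sketch follows, assuming the ciFAIR classes accept the same constructor arguments as torchvision.datasets.CIFAR10 (root, train, transform, download), as the drop-in claim above suggests:

import torch
import torchvision.transforms as transforms
from cifair import ciFAIR10

transform = transforms.ToTensor()

# The training split is identical to CIFAR-10; only the test split differs.
train_set = ciFAIR10(root='./data', train=True, transform=transform, download=True)
test_set = ciFAIR10(root='./data', train=False, transform=transform, download=True)

test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)
for images, labels in test_loader:
    ...  # evaluate your model on the duplicate-free test set here

ciFAIR100 can be used in the same way.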

Keras / tf.keras

Download Keras data loader | Download tf.keras data loader

The Keras data loader requires keras >= 2.0.3; the tf.keras version requires tensorflow >= 1.9.

Usage:

import cifair
(X_train, y_train), (X_test, y_test) = cifair.load_cifair10()
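A slightly longer sketch of typical preprocessing follows; load_cifair100() is assumed to exist alongside load_cifair10(), mirroring the keras.datasets API, and the rest is ordinary NumPy code:

import numpy as np
import cifair

# ciFAIR-100: same training set as CIFAR-100, duplicate-free test set.
(X_train, y_train), (X_test, y_test) = cifair.load_cifair100()

# Scale pixel values to [0, 1] and one-hot encode the labels.
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
num_classes = int(np.max(y_train)) + 1
Y_train = np.eye(num_classes, dtype='float32')[np.asarray(y_train).reshape(-1)]
Y_test = np.eye(num_classes, dtype='float32')[np.asarray(y_test).reshape(-1)]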