The test sets of the popular CIFAR-10 and CIFAR-100 datasets contain 3.25% and 10% duplicate images, respectively, i.e., images that can also be found in very similar form in the training set or the test set itself. ciFAIR-10 and ciFAIR-100 are variants of these datasets with modified test sets, where all these duplicates have been replaced with new images.
Details about how we found duplicates in CIFAR and created ciFAIR can be found in the following paper:
Do We Train on Test Data? Purging CIFAR of Near-Duplicates.
Björn Barz and Joachim Denzler.
Journal of Imaging, 6(6):41, 2020.
The training sets have remained unchanged and are identical to those of CIFAR.
We encourage everyone training models on CIFAR to evaluate them on the ciFAIR test sets for an unbiased comparison. Download links can be found at the top of the page. We provide code for loading both datasets with several deep learning frameworks below. If you use ciFAIR, please cite the paper mentioned above.
Both datasets have the same structure as CIFAR and are intended to be used as drop-in replacements. However, there are two compatibility issues:
- The test set pickle files do not contain an item `'filenames'`.
- The test set pickle files cannot be loaded with Python 2. We were not able to save them in a compatible pickle format due to this bug. If you find a way to achieve this, please create a pull request.
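If you read the pickle files directly instead of using one of the data loaders, the missing `'filenames'` item is the only thing your code has to tolerate. The following is a minimal Python 3 sketch, not part of the ciFAIR code: the helper name `read_test_batch` is made up for illustration, and it assumes the usual CIFAR layout with a `'data'` item plus either `'labels'` (CIFAR-10 style) or `'fine_labels'` (CIFAR-100 style). Keys are accepted as both `bytes` and `str`, since we make no assumption here about which of the two a given file uses.

```python
import pickle


def read_test_batch(path):
    """Read a CIFAR-style pickle file.

    Returns (data, labels, filenames), where filenames is None if the
    file has no 'filenames' item (as in the ciFAIR test sets).
    """
    with open(path, "rb") as f:
        raw = pickle.load(f, encoding="bytes")

    # Normalise keys to str so that b'data' and 'data' are treated alike.
    batch = {
        (key.decode("ascii") if isinstance(key, bytes) else key): value
        for key, value in raw.items()
    }

    data = batch["data"]
    # CIFAR-10 style files use 'labels', CIFAR-100 style files 'fine_labels'.
    labels = batch["labels"] if "labels" in batch else batch["fine_labels"]
    # Do not index batch['filenames'] directly: it may be absent.
    filenames = batch.get("filenames")
    return data, labels, filenames
```

Code that needs a file name per image can fall back to something like the image index whenever `filenames` is `None`.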
If you are interested in the actual duplicate images we have found in the original CIFAR datasets, you can find lists of these here.
Leaderboard & Pre-Trained Models
We maintain a community-driven leaderboard of CNN architectures for image classification on ciFAIR.
Methods are sorted by their error rate on the ciFAIR-100 test set and the best value in each column is highlighted in bold font.
Architectures are linked to the corresponding paper.
Clicking on the name of the CNN framework used for a certain architecture will bring you to the source code used for training the model.
| Architecture | Code | Params | CIFAR-10 | ciFAIR-10 | CIFAR-100 | ciFAIR-100 | Pre-Trained Models |
|---|---|---|---|---|---|---|---|
| PyramidNet-272-200 | PyTorch | 26.6 M | 3.58% | 4.00% | 17.05% | 19.38% | CIFAR-10 / CIFAR-100 |
| ResNeXt-29 (8x64d) | PyTorch | 34.5 M | 3.56% | 3.95% | 18.38% | 20.84% | CIFAR-10 / CIFAR-100 |
| DenseNet-BC (L=190, k=40) | PyTorch | 25.2 M | 3.90% | 4.20% | 18.62% | 21.02% | CIFAR-10 / CIFAR-100 |
| WRN-28-10 | (Py)Torch | 36.5 M | 3.89% | 4.25% | 18.95% | 21.48% | CIFAR-10 / CIFAR-100 |
| ResNet-110 | Keras | 1.7 M | 5.26% | 5.77% | 26.05% | 29.25% | CIFAR-10 / CIFAR-100 |
| Plain-11 | Keras | 3.4 M | 5.91% | 6.43% | 27.82% | 31.34% | CIFAR-10 / CIFAR-100 |
If you think a certain architecture should be included in this leaderboard, your pull request is very welcome.
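To make the effect of the duplicates easier to see, the error rates from the leaderboard can be turned into per-architecture gaps between the original and the ciFAIR test sets. The snippet below is only a small illustration: the numbers are copied from the table, while the names `LEADERBOARD` and `error_gaps` are made up for this example.

```python
# Error rates in percent, copied from the leaderboard:
# (CIFAR-10, ciFAIR-10, CIFAR-100, ciFAIR-100)
LEADERBOARD = {
    "PyramidNet-272-200": (3.58, 4.00, 17.05, 19.38),
    "ResNeXt-29 (8x64d)": (3.56, 3.95, 18.38, 20.84),
    "DenseNet-BC (L=190, k=40)": (3.90, 4.20, 18.62, 21.02),
    "WRN-28-10": (3.89, 4.25, 18.95, 21.48),
    "ResNet-110": (5.26, 5.77, 26.05, 29.25),
    "Plain-11": (5.91, 6.43, 27.82, 31.34),
}


def error_gaps(board):
    """Return {architecture: (gap on the 10-class sets, gap on the 100-class sets)}.

    Each gap is the ciFAIR error minus the CIFAR error, in percentage points.
    """
    return {
        name: (round(fair10 - c10, 2), round(fair100 - c100, 2))
        for name, (c10, fair10, c100, fair100) in board.items()
    }


for name, (gap10, gap100) in error_gaps(LEADERBOARD).items():
    print(f"{name:28s} +{gap10:.2f} +{gap100:.2f}")
```

For the six architectures listed, the gap on the 100-class test sets ranges from 2.33 to 3.52 percentage points.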
Data Loaders
PyTorch
Requires `torchvision`.
Simply do `from cifair import ciFAIR10, ciFAIR100` and use these two classes just like the CIFAR data loaders from `torchvision`.
Keras / tf.keras
Download Keras data loader Download tf.keras data loader
The Keras data loader requires `keras >= 2.0.3`, the tf.keras version requires `tensorflow >= 1.9`.
Usage:
import cifair
(X_train, y_train), (X_test, y_test) = cifair.load_cifair10()
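
Once the test set is loaded, the error rate reported in the leaderboard is simply the fraction of misclassified test images. A minimal sketch with NumPy follows; the helper `error_rate` is not part of `cifair`. It flattens both arrays first, on the assumption that the labels may come with shape `(N, 1)` as they do in Keras' own CIFAR loaders, in which case a direct comparison with predictions of shape `(N,)` would broadcast to an `(N, N)` matrix.

```python
import numpy as np


def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted class differs from the true class."""
    # Flatten so that (N, 1) labels and (N,) predictions compare element-wise.
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must contain the same number of samples")
    return float(np.mean(y_true != y_pred))


# Hypothetical usage with a trained classifier called `model`:
#   y_pred = model.predict(X_test).argmax(axis=-1)
#   print("Test error: {:.2%}".format(error_rate(y_test, y_pred)))
```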