Zero-shot-Gnerative-Models

In this repository, we showcase a wide range of Generative Adversarial Networks (GANs) and Variational Auto-Encoder (VAEs) methods applied to the Generalized Zero-shot Learning task.

Here, we have omitted the creation of the datasets. Please refer to other Generalized Zero-shot Learning papers for dataset downloads and place them in the "datasets" folder. Our framework is suitable for APY, AWA1, AWA2, CUB (with 312-dim attributes or 1024-dim attributes), FLO and SUN. The relevant experiments have been conducted.

First, we will showcase the results produced by various generative models under the most common GZSL training process. The specific results are as follows:

Methods	Source	Unseen Accuracy	Seen Accuracy	Harmonic Mean
AE	-	48.77%	56.92%	52.53%
BIR VAE	Bounded Information Rate Variational AutoEncoder	53.00%	57.35%	55.09%
VAE	-	57.87%	68.86%	62.89%
BEGAN	BEGAN: Boundary Equilibrium GAN	48.74%	63.37%	55.10%
DRGAN	DRAGAN: Deep Regret Analytic GAN	56.66%	69.24%	62.32%
fGAN-total_variation	f-Divergence GANs	58.35%	67.73%	62.69%
fGAN-forward_kl	-	57.73%	72.00% 🔵	64.08%
fGAN-reverse_kl	-	60.11% 🔵	67.83%	63.73%
fGAN-pearson	-	57.74%	72.56% 🔴	64.18% 🔵
fGAN-hellinger	-	58.25%	69.64%	63.44%
fGAN-jensen_shannon	-	58.76%	69.59%	63.72%
FisherGAN	-	59.43%	67.13%	63.05%
InfoGAN	-	58.06%	70.34%	63.61%
LSGAN	LSGAN: Least Squares GAN	54.73%	65.37%	59.58%
MMGAN	Mini-max GAN	53.59%	62.50%	57.50%
NSGAN	Non-saturating GAN	54.87%	63.87%	59.03%
RaNSGAN	Relativistic GAN	46.73%	59.32%	52.28%
WGAN	Feature Generating Networks for Zero-Shot Learning	57.9%	61.4%	59.6%
CramerGAN	The Cramer distance as a solution to biased Wasserstein gradients	60.62% 🔴	70.36%	65.13% 🔴

Note:

All the results mentioned above are based on experiments conducted on AWA1, where generally generating 2,000 unseen samples yields the optimal results. However, on AWA2, it usually takes creating 4,000 unseen samples to achieve the best outcome. For CUB and SUN, typically 400-500 unseen samples are enough, and the same goes for FLO and AWA1.
Although CramerGAN exhibits excellent performance, it cannot escape the phenomenon of mode collapse. If a situation arises where the Unseen Accuracy is zero, please change the value of the seed, as it can play a crucial role!

Certainly, in addition to various generative models, we also demonstrate three very useful tricks:

Replacing the original WGAN training with the CramerGAN training.
Applying a certain degree of Gaussian noise separately on seen and unseen attributes.
Utilizing the generative prototypes classifier.

These three tricks will significantly enhance the final results!

Under the influence of these three tricks, the current performance of GZSL is primarily represented as follows (The reason we are not considering AWA1 is that we have also tested various GZSL results with different backbones, and the AWA1 image dataset has not been open-sourced):

Datasets	Unseen Accuracy	Seen Accuracy	Harmonic Mean
APY	37.1%	59.3%	45.6%
FLO	61.5%	74.8%	67.5%
AWA2	64.4%	78.7%	70.8%
CUB	49.2%	61.6%	55.5%
SUN	48.9%	37.9%	42.7%

Next, we will showcase the results of the GZSL task using pretrained models on ImageNet-1K and models trained from scratch on these datasets, employing backbones such as ConvNet, Vision Transformer, and Swin Transformer. The resulting outcomes are as follows:

Dataset	Pretrained Model	Unseen Accuracy	Seen Accuracy	Harmonic Mean
AWA2	convnext_tiny_1k_224_ema	48.78% 🔴	64.99% 🔵	55.73% 🔴
	convnext_tiny_1k_224_ema_from_scratch	37.77%	53.05%	44.13%
	convnext_small_1k_224_ema	45.83% 🔵	67.94% 🔴	54.73% 🔵
	convnext_small_1k_224_ema_from_scratch	37.54%	50.58%	43.09%
	convnext_base_1k_224_ema	44.61%	64.32%	52.68%
	convnext_base_1k_224_ema_from_scratch	34.98%	52.58%	42.01%
	convnext_large_1k_224_ema	45.23%	57.07%	50.46%
	convnext_large_1k_224_ema_from_scratch	33.90%	52.05%	41.06%
CUB	convnext_tiny_1k_224_ema	33.67% 🔴	28.98%	31.15% 🔵
	convnext_tiny_1k_224_ema_from_scratch	28.35%	30.79%	29.52%
	convnext_small_1k_224_ema	32.21% 🔵	30.62%	31.39% 🔴
	convnext_small_1k_224_ema_from_scratch	27.77%	34.64%	30.83%
	convnext_base_1k_224_ema	29.53%	30.53%	30.02%
	convnext_base_1k_224_ema_from_scratch	27.00%	36.03% 🔴	30.87%
	convnext_large_1k_224_ema	24.37%	35.56% 🔵	28.92%
	convnext_large_1k_224_ema_from_scratch	28.19%	34.41%	30.99%
CUB2	convnext_tiny_1k_224_ema	27.77%	31.25%	29.41%
	convnext_tiny_1k_224_ema_from_scratch	27.14%	31.81%	29.29%
	convnext_small_1k_224_ema	23.13%	32.46%	27.01%
	convnext_small_1k_224_ema_from_scratch	33.84% 🔴	33.25%	33.54% 🔴
	convnext_base_1k_224_ema	23.63%	23.32%	23.48%
	convnext_base_1k_224_ema_from_scratch	29.51% 🔵	35.46% 🔴	32.21% 🔵
	convnext_large_1k_224_ema	21.12%	21.37%	21.24%
	convnext_large_1k_224_ema_from_scratch	26.29%	35.36% 🔵	30.16%
SUN	convnext_tiny_1k_224_ema	37.99%	26.74%	31.39%
	convnext_tiny_1k_224_ema_from_scratch	20.00%	10.58%	13.84%
	convnext_small_1k_224_ema	43.40% 🔴	26.43%	32.86% 🔴
	convnext_small_1k_224_ema_from_scratch	20.07%	12.64%	15.51%
	convnext_base_1k_224_ema	36.67% 🔵	29.22% 🔴	32.53% 🔵
	convnext_base_1k_224_ema_from_scratch	21.39%	13.37%	16.46%
	convnext_large_1k_224_ema	35.07%	28.91% 🔵	31.70%
	convnext_large_1k_224_ema_from_scratch	17.22%	12.02%	14.16%
aPY	convnext_tiny_1k_224_ema	11.43%	34.61%	17.18%
	convnext_tiny_1k_224_ema_from_scratch	17.69%	36.96%	23.92% 🔵
	convnext_small_1k_224_ema	25.53% 🔵	15.41%	19.22%
	convnext_small_1k_224_ema_from_scratch	16.91%	17.11%	17.01%
	convnext_base_1k_224_ema	13.51%	54.68% 🔴	21.66%
	convnext_base_1k_224_ema_from_scratch	18.93%	19.03%	18.98%
	convnext_large_1k_224_ema	26.52% 🔴	37.23% 🔵	30.98% 🔴
	convnext_large_1k_224_ema_from_scratch	17.48%	22.78%	19.78%
FLO	convnext_tiny_1k_224_ema	27.28%	10.39%	15.05%
	convnext_tiny_1k_224_ema_from_scratch	50.20% 🔵	56.43%	53.13%
	convnext_small_1k_224_ema	21.17%	14.18%	16.99%
	convnext_small_1k_224_ema_from_scratch	49.96%	66.62%	57.10%
	convnext_base_1k_224_ema	21.59%	19.64%	20.57%
	convnext_base_1k_224_ema_from_scratch	53.90% 🔴	72.50% 🔵	61.83% 🔴
	convnext_large_1k_224_ema	26.01%	31.22%	28.38%
	convnext_large_1k_224_ema_from_scratch	48.35%	78.45% 🔴	59.83% 🔵

Datasets	Pretrained Model	Unseen Accuracy	Seen Accuracy	Harmonic Mean
AWA2	ViT_base_patch16_224	67.81% 🔴	76.95% 🔵	72.10% 🔵
	ViT_base_patch16_224_from_scratch	46.63%	61.48%	53.04%
	ViT_base_patch32_224	60.91%	72.74%	66.30%
	ViT_base_patch32_224_from_scratch	61.14%	71.71%	66.00%
	ViT_large_patch16_224	66.69% 🔵	84.15% 🔴	74.41% 🔴
	ViT_large_patch16_224_from_scratch	56.70%	73.21%	63.91%
CUB	ViT_base_patch16_224	55.84% 🔵	66.00% 🔴	60.50% 🔵
	ViT_base_patch16_224_from_scratch	46.15%	60.45%	52.34%
	ViT_base_patch32_224	55.53%	62.26%	58.70%
	ViT_base_patch32_224_from_scratch	51.76%	59.03%	55.15%
	ViT_large_patch16_224	52.54%	62.40%	57.05%
	ViT_large_patch16_224_from_scratch	57.67% 🔴	65.43% 🔵	61.31% 🔴
CUB2	ViT_base_patch16_224	62.86% 🔵	63.03%	62.94%
	ViT_base_patch16_224_from_scratch	58.26%	60.90%	59.55%
	ViT_base_patch32_224	59.10%	57.71%	58.40%
	ViT_base_patch32_224_from_scratch	62.53%	58.26%	60.32%
	ViT_large_patch16_224	62.56%	64.84% 🔵	63.68% 🔵
	ViT_large_patch16_224_from_scratch	69.45% 🔴	67.13% 🔴	68.27% 🔴
SUN	ViT_base_patch16_224	50.21%	49.11% 🔴	49.65% 🔵
	ViT_base_patch16_224_from_scratch	37.57%	35.00%	36.24%
	ViT_base_patch32_224	50.28%	40.97%	45.15%
	ViT_base_patch32_224_from_scratch	42.57%	41.12%	41.83%
	ViT_large_patch16_224	56.94% 🔴	48.68% 🔵	52.49% 🔴
	ViT_large_patch16_224_from_scratch	51.67% 🔵	45.04%	48.13%
aPY	ViT_base_patch16_224	18.80%	28.53%	22.67%
	ViT_base_patch16_224_from_scratch	21.93%	37.21% 🔴	27.59% 🔵
	ViT_base_patch32_224	20.75%	26.62%	23.32%
	ViT_base_patch32_224_from_scratch	25.72% 🔴	26.86%	26.28%
	ViT_large_patch16_224	21.00%	32.20%	25.42%
	ViT_large_patch16_224_from_scratch	24.01% 🔵	36.68% 🔵	29.03% 🔴
FLO	ViT_base_patch16_224	67.73% 🔵	56.07%	61.35%
	ViT_base_patch16_224_from_scratch	70.02% 🔴	72.82%	71.39% 🔴
	ViT_base_patch32_224	58.66%	52.35%	55.33%
	ViT_base_patch32_224_from_scratch	60.07%	75.80% 🔵	67.03%
	ViT_large_patch16_224	57.27%	75.59%	65.16%
	ViT_large_patch16_224_from_scratch	59.96%	88.54% 🔴	71.50% 🔵

Datasets	Pretrained Model	Unseen Accuracy	Seen Accuracy	Harmonic Mean
AWA2	swin_tiny_patch4_window7_224	52.04% 🔴	60.55%	55.97%
	swin_tiny_patch4_window7_224_from_scratch	32.68%	43.20%	37.21%
	swin_small_patch4_window7_224	50.46% 🔵	64.90%	56.78%
	swin_small_patch4_window7_224_from_scratch	36.27%	44.70%	40.05%
	swin_base_patch4_window7_224	48.85%	73.64% 🔴	58.74% 🔴
	swin_base_patch4_window7_224_from_scratch	33.90%	44.16%	38.35%
	swin_base_patch4_window12_384	48.08%	73.02% 🔵	57.98% 🔵
	swin_base_patch4_window12_384_from_scratch	34.98%	45.46%	39.54%
CUB1	swin_tiny_patch4_window7_224	33.04%	22.39%	26.69%
	swin_tiny_patch4_window7_224_from_scratch	23.47%	20.31%	21.77%
	swin_small_patch4_window7_224	36.56% 🔴	27.36%	31.30%
	swin_small_patch4_window7_224_from_scratch	22.89%	24.62%	23.73%
	swin_base_patch4_window7_224	33.86%	30.61% 🔵	32.16% 🔵
	swin_base_patch4_window7_224_from_scratch	22.25%	26.29%	24.10%
	swin_base_patch4_window12_384	34.60% 🔵	32.79% 🔴	33.67% 🔴
	swin_base_patch4_window12_384_from_scratch	1.65%	1.12%	1.33%
CUB2	swin_tiny_patch4_window7_224	39.14% 🔴	32.32%	35.40% 🔴
	swin_tiny_patch4_window7_224_from_scratch	14.19%	16.04%	15.06%
	swin_small_patch4_window7_224	34.60% 🔵	32.79% 🔵	33.67% 🔵
	swin_small_patch4_window7_224_from_scratch	18.70%	17.40%	18.03%
	swin_base_patch4_window7_224	32.07%	33.97% 🔴	33.00%
	swin_base_patch4_window7_224_from_scratch	20.38%	22.27%	21.28%
	swin_base_patch4_window12_384	33.31%	31.49%	32.38%
	swin_base_patch4_window12_384_from_scratch	1.70%	0.89%	1.17%
SUN	swin_tiny_patch4_window7_224	42.78%	16.28%	23.58%
	swin_tiny_patch4_window7_224_from_scratch	16.32%	14.65%	15.44%
	swin_small_patch4_window7_224	42.92% 🔵	23.45%	30.33% 🔵
	swin_small_patch4_window7_224_from_scratch	16.18%	13.29%	14.60%
	swin_base_patch4_window7_224	42.99% 🔴	26.98% 🔵	33.15% 🔴
	swin_base_patch4_window7_224_from_scratch	15.63%	8.49%	11.00%
	swin_base_patch4_window12_384	40.90%	27.87% 🔴	33.15% 🔴
	swin_base_patch4_window12_384_from_scratch	1.11%	0.58%	0.76%
aPY	swin_tiny_patch4_window7_224	10.52%	27.53% 🔵	15.23%
	swin_tiny_patch4_window7_224_from_scratch	14.47%	21.56%	17.32%
	swin_small_patch4_window7_224	17.30%	19.44%	18.31%
	swin_small_patch4_window7_224_from_scratch	14.48%	26.98%	18.84%
	swin_base_patch4_window7_224	21.52% 🔵	20.79%	21.15% 🔵
	swin_base_patch4_window7_224_from_scratch	17.51%	22.84%	19.82%
	swin_base_patch4_window12_384	25.60% 🔴	27.18%	26.36% 🔴
	swin_base_patch4_window12_384_from_scratch	15.40%	32.54% 🔴	20.90%

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
Encoder-Decoder		Encoder-Decoder
GAN		GAN
README.md		README.md
classifier.py		classifier.py
classifier_with_class_norm.py		classifier_with_class_norm.py
main.py		main.py
models.py		models.py
pre_classifier.py		pre_classifier.py
util.py		util.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Zero-shot-Gnerative-Models

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Zero-shot-Gnerative-Models

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages