◆ Revisiting: residual blocks in a GAN ◆
In September 2021 I wrote Aplicação de redes residuais em GAN - a class project where I bolted ResNet-style residual blocks onto a small GAN trained on the Stanford Dogs dataset. The output was not great. I had a model that mostly produced shapes that looked vaguely like a dog if you squinted, and a paragraph at the end honestly admitting that it had not converged.
Reading it now, the more interesting story is which half of that experiment aged well and which half aged badly.
What aged badly: the GAN
GANs lost. The generative landscape after 2022 was taken over by diffusion models - DDPM, Stable Diffusion, Imagen, SDXL - and by 2024 essentially nobody was training a new GAN for image generation as a default. The discriminator-vs-generator adversarial setup was elegant; it was also notoriously unstable, mode-collapse-prone, and unforgiving of hyperparameters. Diffusion sidestepped all of that by reframing generation as iterative denoising.
So a 2021 student doing what I did would not pick a GAN today. They would pick a small diffusion model and learn an architecture that the field is still actively building on.
What aged well: the residuals
The residual block won completely. It is now invisible because it is everywhere:
- The U-Net at the heart of every diffusion model is residual end to end.
- Every modern Transformer (which is to say, every LLM) is built around residual connections - the
x + Attention(x)andx + MLP(x)pattern is the entire skeleton. - ConvNeXt, EfficientNetV2, segmentation backbones - all residual.
The 2015 ResNet paper's claim - that skip connections let you train networks much deeper without vanishing gradients - turned out to be a structural truth about deep nets, not just a CNN trick. The little experiment in the 2021 post stumbled into a piece of plumbing that became universal.
The takeaway
The conclusion of the 2021 post said "residuals helped, even though the model didn't converge." That instinct turned out to be correct, just for completely different reasons than I expected at the time. The architecture I was testing it inside lost the field; the technique I was bolting on quietly became the field. Worth keeping in mind as a general pattern - the trick you pick up while doing something that doesn't work is sometimes more durable than whatever you were actually trying to build.