Diffusion models such as DALL-E 2, Imagen and Stable Diffusion memorize individual images from their training data and can emit them at generation time, researchers have found. A preprint of the paper has been published on arXiv. The authors show that diffusion models are significantly less private than earlier generative models such as GANs, and argue that new advances in privacy-preserving learning may be needed to address these vulnerabilities.

Diffusion models for image generation are gaining popularity by leaps and bounds; read more about them in @Nikuson's article. In a nutshell, during training the model takes an original image, distorts it beyond recognition by gradually adding noise, and then learns to reverse the process, reassembling the image from the unrecognizable pile of pixels by removing the noise step by step.
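To make that training idea concrete, here is a minimal, illustrative sketch of the forward noising process and the training target in a DDPM-style model. The noise schedule, step count and toy image are assumptions for illustration, not the actual setup of DALL-E 2, Imagen or Stable Diffusion.

```python
# Minimal sketch of the diffusion training idea described above (illustrative only).
import numpy as np

# Linear noise schedule: beta_t controls how much noise is added at each step.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # cumulative product, \bar{alpha}_t

def add_noise(x0, t, rng=np.random.default_rng()):
    """Forward process q(x_t | x_0): corrupt a clean image x0 up to step t."""
    eps = rng.standard_normal(x0.shape)  # Gaussian noise
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps                       # the model is trained to recover eps from xt

# Toy "image" (8x8 grayscale); at large t it becomes an unrecognizable pile of pixels.
x0 = np.ones((8, 8))
xt, eps = add_noise(x0, t=999)

# Training objective: a denoiser network eps_theta(xt, t) is fit to minimize
# || eps - eps_theta(xt, t) ||^2 over the training images. Generation then runs
# the chain in reverse, removing a little noise at each step.
```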
Compared to earlier approaches such as generative adversarial networks (GANs) and variational autoencoders (VAEs), diffusion models generate higher-quality samples and are easier to scale and control. But much of their popularity rests on the assumption that they generate genuinely new images rather than ones resembling the training dataset; that is, when prompted, a model is not supposed to reproduce the images it was trained on.
The researchers have shown that, given the right prompts, diffusion models do regurgitate images from their training data. In their experiments they extracted hundreds of images that were nearly identical to training examples, some of which make it possible to identify the person whose photo was in the training set.
This effect stems from a tradeoff between privacy and performance: as model quality increases, memorization increases and privacy decreases. An attacker who understands how a diffusion model works can, with suitable extraction attacks, gradually pull out images that were in the training data. Deduplicating the training set does not fully protect against the effect, but the authors still recommend it to minimize the risk as much as possible. They also suggest using the attacks they present to regularly audit models for privacy leakage.
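As a rough illustration of the auditing idea, the sketch below flags clusters of near-duplicate generations produced for a single prompt, which the paper treats as a signal of memorization. The pixel-space L2 distance, the threshold, the cluster size and the placeholder data are assumptions for illustration; the authors' actual attack and membership-inference pipeline is more involved.

```python
# Hedged sketch: flag generations that are near-identical to many other
# generations of the same prompt (a possible sign of memorization).
import numpy as np

def pairwise_l2(samples):
    """L2 distances between flattened generated images, shape (N, D)."""
    diffs = samples[:, None, :] - samples[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

def flag_memorized(samples, threshold=0.1, min_cluster=10):
    """Return indices whose generations have many near-duplicate neighbours."""
    d = pairwise_l2(samples)
    near_dupes = (d < threshold).sum(axis=1) - 1   # exclude the self-match
    return np.where(near_dupes >= min_cluster)[0]

# Placeholder "generations"; in practice these would come from sampling the
# diffusion model many times with the same prompt.
rng = np.random.default_rng(0)
generations = rng.standard_normal((100, 64))
# Simulate a memorized image: 15 generations that are near-copies of one sample.
generations[:15] = generations[0] + 0.001 * rng.standard_normal((15, 64))
print(flag_memorized(generations))   # indices 0..14 form a near-duplicate cluster
```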
The study shows that modern diffusion models memorize roughly twice as much as comparable generative models, and that stronger diffusion models memorize more than weaker ones. According to the authors, this suggests that the vulnerability of generative image models is likely to grow over time. For now, the share of generations that duplicate training images is relatively low, but the researchers believe it is only a matter of time before it rises.
The work raises broader questions about how diffusion models balance memorization and generalization. The authors strongly discourage training diffusion models on privacy-sensitive data.