Neural networks like DALL-E 2 memorize individual images from their training data and reproduce them during generation

Diffusion neural networks such as DALL-E 2, Imagen, and Stable Diffusion memorize individual images from their training data and reproduce them during generation, researchers have found. A preprint of the paper has been published on arXiv. The authors show that diffusion models are much less private than earlier generative models such as GANs, and that new advances in privacy-preserving training may be needed to address these vulnerabilities.

Diffusion models for image generation are rapidly gaining popularity. Read more about them in @Nikuson's article. In a nutshell, during training the model takes an original image, distorts it beyond recognition by adding noise, and then learns to reassemble the image from that unrecognizable pile of pixels by removing the noise.
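The noising half of that process can be sketched in a few lines of Python. This is a minimal illustration that assumes a DDPM-style formulation with a linear noise schedule. The article does not say which formulation the models above use, and the function names, schedule values, and four-pixel toy "image" are all hypothetical.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances growing linearly (assumed schedule and values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alpha(betas):
    """alpha_bar[t] is the product of (1 - beta) up to step t: the share of signal kept."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, alpha_bar, rng):
    """Return (noisy image, noise) with noisy = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(pixels, noise)
    ]
    return noisy, noise

rng = random.Random(0)
image = [0.0, 0.5, 1.0, -1.0]  # toy "image" of four pixel values
alpha_bars = cumulative_alpha(linear_beta_schedule(1000))

# Early step: the image is almost untouched. Last step: almost pure noise.
noisy_early, noise_early = add_noise(image, alpha_bars[0], rng)
noisy_late, noise_late = add_noise(image, alpha_bars[-1], rng)
```

In a setup like this, the network would be trained to predict the added noise from the noisy image, which is what lets it later turn random noise into a picture step by step.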

Compared with earlier models such as generative adversarial networks (GANs) or variational autoencoders, diffusion models generate higher-quality samples and are easier to scale and maintain. But much of their popularity rests on their ability to generate new images that are ostensibly unlike anything in the training dataset. In other words, the models are not supposed to reproduce training images when prompted.

The researchers have shown that diffusion models do reproduce training images when prompted. In their experiments, they extracted hundreds of images identical to those in the training data. Some of these images are enough to identify the person whose photograph was in the training set.

This effect stems from a tradeoff between privacy and performance: as model performance increases, memorization increases and privacy decreases. An attacker who understands how diffusion networks work can use targeted attacks to gradually extract images from the training data. Deduplicating the training data does not fully protect against this, but the authors still recommend it to reduce the risk as far as possible. They also suggest running regular privacy audits of models using the attacks they present.
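A very simple version of such an audit is to compare generated samples against the training set and flag near-copies. The sketch below is an illustration only, not the authors' pipeline: the normalized Euclidean distance, the `threshold` value, and the `find_memorized` helper are assumptions, and images are represented as flat lists of pixel values.

```python
import math

def l2_distance(a, b):
    """Root-mean-square pixel difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def find_memorized(generated, training, threshold=0.05):
    """Return (generated_index, training_index, distance) for each near-copy.

    The threshold is a hypothetical cutoff; a real audit would have to tune it.
    """
    if not training:
        return []
    hits = []
    for g_idx, sample in enumerate(generated):
        # Find the training image closest to this generated sample.
        t_idx, dist = min(
            ((i, l2_distance(sample, t)) for i, t in enumerate(training)),
            key=lambda pair: pair[1],
        )
        if dist <= threshold:
            hits.append((g_idx, t_idx, dist))
    return hits

training_set = [[0.1, 0.9, 0.4, 0.3], [0.8, 0.2, 0.5, 0.7]]
samples = [
    [0.1, 0.9, 0.41, 0.3],  # almost a copy of training image 0
    [0.5, 0.5, 0.5, 0.5],   # not close to any training image
]
flagged = find_memorized(samples, training_set)
```

Here only the first sample is flagged. The same distance check could in principle be run over the training set itself to find duplicates to remove before training, though a brute-force comparison like this would not scale to datasets of realistic size.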

According to the study, modern diffusion models memorize twice as much as comparable generative models, and stronger diffusion models memorize more than weaker ones. The authors say this suggests that generative image models may become more vulnerable over time. For now, the share of duplicates among generated images is relatively low, but the researchers believe it is only a matter of time before that changes.

The work raises questions about the memorization and generalization capabilities of diffusion models. The authors strongly discourage the use of diffusion models on privacy-sensitive data.
