
REBAR pytorch implementation


High-variance gradient estimates make learning difficult in models with discrete latent variables. Most methods have relied on control variates to reduce the variance of the REINFORCE estimator. More recent work uses a continuous relaxation of the discrete variables to obtain low-variance, but biased, gradient estimates (Jang et al., 2016; Maddison et al., 2016).


In this work, we combine the two approaches through a novel control variate that produces low-variance, unbiased gradient estimates. We then modify the continuous relaxation so that the tightness of the relaxation can be adjusted online, removing it as a hyperparameter. On several benchmark generative modeling tasks, we demonstrate state-of-the-art variance reduction, which generally leads to faster convergence and a better final log-likelihood.

Discrete latent variable models are ubiquitous in machine learning: mixture models, Markov decision processes in reinforcement learning (RL), generative models for structured prediction, and, most recently, hard attention models (Mnih et al., 2014) and memory networks (Zaremba & Sutskever, 2015).

However, when the individual latent variables cannot be marginalized analytically, maximizing the objective over these models with REINFORCE (Williams, 1992) is difficult because the gradient estimates obtained from samples have high variance. Most approaches to reducing this variance focus on developing clever control variates (Mnih & Gregor, 2014; Titsias & Lázaro-Gredilla, 2015; Gu et al., 2015; Mnih & Rezende, 2016).
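
As an illustration of the score-function approach, here is a minimal PyTorch sketch (not code from the paper or the repositories below) of REINFORCE for a factorized Bernoulli latent variable with a simple scalar baseline as the control variate; the names reinforce_grad, f, and baseline, and the toy objective, are assumptions made for the example.

import torch

def reinforce_grad(logits, f, baseline=0.0):
    # Estimate d/d logits of E_{b ~ Bernoulli(sigmoid(logits))}[f(b)].
    probs = torch.sigmoid(logits)
    b = torch.bernoulli(probs)                               # discrete sample, no gradient path
    log_prob = torch.distributions.Bernoulli(probs=probs).log_prob(b).sum()
    reward = f(b)                                            # treated as a constant w.r.t. logits
    # Score-function estimator: (f(b) - baseline) * d log p(b) / d logits.
    (grad,) = torch.autograd.grad((reward - baseline) * log_prob, logits)
    return grad

logits = torch.zeros(10, requires_grad=True)
f = lambda b: ((b - 0.499) ** 2).mean()                      # toy objective
grad = reinforce_grad(logits, f)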

Recently, Jang et al. (2016) and Maddison et al. (2016) introduced a novel distribution, the Gumbel-Softmax or Concrete distribution, that continuously relaxes discrete random variables. Replacing every discrete random variable in a model with a Concrete random variable yields a continuous model to which the reparameterization trick can be applied (Kingma & Welling, 2013; Rezende et al., 2014).

The resulting gradients are biased with respect to the discrete model, but they can be used effectively to optimize large models. The tightness of the relaxation is controlled by a temperature hyperparameter.
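
A minimal PyTorch sketch of this relaxation for a binary latent variable, assuming logistic (binary Concrete) noise; the temperature value and the toy objective f are illustrative choices, not values taken from the repositories below.

import torch

def relaxed_bernoulli(logits, temperature):
    # Binary Concrete / Gumbel-Softmax relaxation of a Bernoulli variable,
    # differentiable in `logits` via the reparameterization trick.
    u = torch.rand_like(logits)
    noise = torch.log(u) - torch.log1p(-u)                   # logistic noise
    return torch.sigmoid((logits + noise) / temperature)

logits = torch.zeros(4, requires_grad=True)
f = lambda z: ((z - 0.499) ** 2).mean()
loss = f(relaxed_bernoulli(logits, temperature=0.5))
loss.backward()    # low-variance but biased gradient (w.r.t. the discrete model) in logits.grad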

The REBAR estimator is expressed in terms of differences of reparameterization gradients, and it implicitly implements the recommendation of Roeder et al. (2017).

Optimizing the relaxation temperature requires differentiating the gradient estimator with respect to the temperature, i.e., computing a derivative of a derivative. Empirically, the temperature changes slowly relative to the model parameters, so the cost of this operation could be amortized over multiple parameter updates. We leave exploring these ideas to future work.
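
A minimal PyTorch sketch of such double differentiation, using create_graph=True to differentiate a single-sample gradient-variance surrogate with respect to a log-temperature parameter; the relaxation, objective, and variable names are illustrative assumptions, not the repository's code.

import torch

logits = torch.zeros(4, requires_grad=True)
log_temp = torch.zeros((), requires_grad=True)               # optimize temperature in log space

def relaxed_bernoulli(logits, temperature):
    u = torch.rand_like(logits)
    noise = torch.log(u) - torch.log1p(-u)
    return torch.sigmoid((logits + noise) / temperature)

f = lambda z: ((z - 0.499) ** 2).mean()
loss = f(relaxed_bernoulli(logits, log_temp.exp()))

# Gradient w.r.t. the model parameters, keeping the graph so it can be
# differentiated again with respect to the temperature.
(g,) = torch.autograd.grad(loss, logits, create_graph=True)
variance_surrogate = (g ** 2).sum()                           # single-sample proxy for gradient variance
variance_surrogate.backward()                                 # fills log_temp.grad (a derivative of a derivative)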

It would be natural to investigate extensions to the multi-sample case (e.g., VIMCO (Mnih & Rezende, 2016)), to exploit the hierarchical structure in our models using Q-functions, and to apply this approach to reinforcement learning.

In the limit of zero temperature, the gradient estimate becomes unbiased, but the variance of the gradient estimator diverges, so the temperature must be tuned to trade off bias and variance.

REBAR reference implementation in TensorFlow

Quick Start:

Requirements:

  • TensorFlow (see tensorflow.org for how to install)
  • MNIST dataset
  • Omniglot dataset

First, choose the URLs from which to download the datasets and fill them in at the top of the download_data.py script, like so:

MNIST_URL = 'http://yann.lecun.com/exdb/mnist'
MNIST_BINARIZED_URL = 'http://www.cs.toronto.edu/~larocheh/public/datasets/binarized_mnist'
OMNIGLOT_URL = 'https://github.com/yburda/iwae/raw/master/datasets/OMNIGLOT/chardata.mat'

Then execute the script to download the data:

python download_data.py

Then run the model training script:

python rebar_train.py --hparams="model=SBNDynamicRebar,learning_rate=0.0003,n_layer=2,task=sbn"

and you should get something like the following:

Step 2084: [-231.026474      0.3711713     1.            1.06934261    1.07023323
    1.02173257    1.02171052    1.            1.            1.            1.        ]
-3.6465678215
Step 4168: [-156.86795044    0.3097114     1.            1.03964758    1.03936625
    1.02627242    1.02629256    1.            1.            1.            1.        ]
-4.42727231979
Step 6252: [-143.4650116     0.26153237    1.            1.03633797    1.03600132
    1.02639604    1.02639794    1.            1.            1.            1.        ]
-4.85577583313
Step 8336: [-137.65275574    0.22313026    1.            1.03467286    1.03428006
    1.02336085    1.02335203    0.99999988    1.            0.99999988
    1.        ]
-4.95563364029

The first number in the list is the log likelihood lower bound and the number after the list is the log of the variance of the gradient estimator. The rest of the numbers are for debugging.

We can also compare the variance between methods:

python rebar_train.py \
  --hparams="model=SBNTrackGradVariances,learning_rate=0.0003,n_layer=2,task=omni"

and you should see something like:

Step 959: [ -2.60478699e+02   3.84281784e-01   6.31126612e-02   3.27319391e-02
   6.13379292e-03   1.98278503e-04   1.96425783e-04   8.83973844e-04
   8.70995224e-04             -inf]
('DynamicREBAR', -3.725339889526367)
('MuProp', -0.033569782972335815)
('NVIL', 2.7640280723571777)
('REBAR', -3.539274215698242)
('SimpleMuProp', -0.040744658559560776)
Step 1918: [ -2.06948471e+02   3.35904926e-01   5.20901568e-03   7.81541676e-05
   2.06885766e-03   1.08521657e-04   1.07351625e-04   2.30646547e-04
   2.26554010e-04  -8.22885323e+00]
('DynamicREBAR', -3.864381790161133)
('MuProp', -0.7183765172958374)
('NVIL', 2.266523599624634)
('REBAR', -3.662022113800049)
('SimpleMuProp', -0.7071359157562256)

where the tuples show the log of the variance of the gradient estimators.

REBAR pytorch implementation example

Source: https://github.com/pemami4911/REBAR-pytorch

I attempted to implement REBAR for the Sigmoid Belief Network model and the binarized MNIST benchmark. The performance of my implementation compared to the authors' TensorFlow implementation is shown below. In this run, both models use one nonlinear stochastic layer, a fixed temperature of 0.5, and a fixed eta of 1.0 (the parameter that multiplies the Gumbel control variate and is normally optimized with a variance objective).

[Figure: performance comparison of the PyTorch implementation and the authors' TensorFlow implementation]
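
For reference, below is a minimal PyTorch sketch of a single-sample REBAR gradient estimate for a factorized Bernoulli latent with fixed temperature and eta, mirroring the fixed-hyperparameter setup described above; it is an illustrative sketch rather than code taken from either repository, and names such as rebar_grad and the toy objective are assumptions.

import torch

def rebar_grad(logits, f, eta=1.0, temperature=0.5):
    # Single-sample REBAR gradient estimate w.r.t. `logits` for a factorized
    # Bernoulli latent b ~ Bernoulli(sigmoid(logits)) and objective E[f(b)].
    theta = torch.sigmoid(logits)
    u = torch.rand_like(logits)
    v = torch.rand_like(logits)

    # Reparameterized logistic sample z and the hard sample b = H(z).
    z = logits + torch.log(u) - torch.log1p(-u)
    b = (z > 0).float()                                       # no gradient path to logits

    # Conditional reparameterization z_tilde ~ p(z | b, theta).
    u_tilde = b * ((1 - theta) + v * theta) + (1 - b) * v * (1 - theta)
    z_tilde = logits + torch.log(u_tilde) - torch.log1p(-u_tilde)

    soft = lambda x: torch.sigmoid(x / temperature)           # Concrete relaxation

    log_prob = torch.distributions.Bernoulli(probs=theta).log_prob(b).sum()
    f_b, f_z, f_zt = f(b), f(soft(z)), f(soft(z_tilde))

    # Score-function term with the relaxed control variate, plus the two
    # reparameterization correction terms.
    surrogate = (f_b - eta * f_zt.detach()) * log_prob + eta * f_z - eta * f_zt
    (grad,) = torch.autograd.grad(surrogate, logits)
    return grad

logits = torch.zeros(10, requires_grad=True)
f = lambda x: ((x - 0.499) ** 2).mean()                       # toy objective
grad = rebar_grad(logits, f)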

Here are results for 1 nonlinear stochastic layer on binarized MNIST from the paper:

[Figure: results for one nonlinear stochastic layer on binarized MNIST, from the paper]
