StackOverflow

If I want to use the BatchNormalization function in Keras, then do I need to call it once only at the beginning?

I read this documentation for it: http://keras.io/layers/normalization/

I don't see where I'm supposed to call it. Below is my code attempting to use it:

```
import keras
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD

model = Sequential()
keras.layers.normalization.BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9, weights=None)
model.add(Dense(64, input_dim=14, init="uniform"))
model.add(Activation("tanh"))
model.add(Dropout(0.5))
model.add(Dense(64, init="uniform"))
model.add(Activation("tanh"))
model.add(Dropout(0.5))
model.add(Dense(2, init="uniform"))
model.add(Activation("softmax"))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss="binary_crossentropy", optimizer=sgd)
model.fit(X_train, y_train, nb_epoch=20, batch_size=16, show_accuracy=True, validation_split=0.2, verbose=2)
```

I ask because the code produces similar outputs whether or not I include the second line (the batch normalization call). So either I'm not calling the function in the right place, or I guess it doesn't make that much of a difference.

Just to answer this question in a little more detail: as Pavel said, Batch Normalization is just another layer, so you can use it as such to build your desired network architecture.

The general use case is to use BN between the linear and non-linear layers in your network, because it normalizes the input to your activation function, so that you're centered in the linear section of the activation function (such as the sigmoid). There's a small discussion of it here
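As a quick illustration of why that centering matters for a saturating non-linearity (a throwaway sketch in plain Python, not code from the answer): the sigmoid's gradient is largest around zero and vanishes once inputs drift into the saturated tails, which is where un-normalized pre-activations tend to end up.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # derivative of the sigmoid: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# near zero the sigmoid is roughly linear and its gradient is largest;
# far from zero it saturates and the gradient nearly vanishes
print(sigmoid_grad(0.0))   # 0.25, the maximum
print(sigmoid_grad(5.0))   # ~0.0066, nearly flat
```

Normalizing the inputs to the activation keeps them in the high-gradient region, so the layer keeps learning.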

In your case above, this might look like:

```
# imports
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.normalization import BatchNormalization
from keras.optimizers import SGD
# instantiate model
model = Sequential()
# we can think of this chunk as the input layer
model.add(Dense(64, input_dim=14, init="uniform"))
model.add(BatchNormalization())
model.add(Activation("tanh"))
model.add(Dropout(0.5))
# we can think of this chunk as the hidden layer
model.add(Dense(64, init="uniform"))
model.add(BatchNormalization())
model.add(Activation("tanh"))
model.add(Dropout(0.5))
# we can think of this chunk as the output layer
model.add(Dense(2, init="uniform"))
model.add(BatchNormalization())
model.add(Activation("softmax"))
# setting up the optimization of our weights
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss="binary_crossentropy", optimizer=sgd)
# running the fitting
model.fit(X_train, y_train, nb_epoch=20, batch_size=16, show_accuracy=True, validation_split=0.2, verbose = 2)
```

Hope this clarifies things a bit more.

This thread is misleading. I tried commenting on Lucas Ramadan's answer, but I don't have the right privileges yet, so I'll just put this here.

Batch normalization works best after the activation function, and here or here is why: it was developed to prevent internal covariate shift, which occurs when the distribution of the *activations* of a layer shifts significantly throughout training.

Batch normalization is used so that the distribution of the inputs to a specific layer (and these inputs are literally the result of an activation function) doesn't change over time due to parameter updates from each batch (or at least, so that it changes in an advantageous way). It normalizes using batch statistics, and then uses the batch normalization parameters (gamma and beta in the original paper) "to make sure that the transformation inserted in the network can represent the identity transform" (quote from the original paper).

But the point is that we're trying to normalize the inputs to a layer, so it should always go immediately before the next layer in the network. Whether or not that's after an activation function depends on the architecture in question.
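The core transform described above can be sketched in a few lines of plain Python (my own illustrative code; `batch_norm` is a made-up name, not a Keras API): normalize each value with the batch mean and variance, then scale and shift with the learnable gamma and beta.

```python
def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-6):
    """Normalize a batch of scalars with batch statistics, then scale/shift."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    # gamma and beta are the learnable parameters from the original paper;
    # with gamma = sqrt(var) and beta = mean, this recovers the identity transform
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in xs]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
# the normalized batch has (approximately) zero mean and unit variance
```

In a real layer, gamma and beta are learned per feature, and running averages of the batch statistics are kept for use at inference time.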
