Python | Offset zeros at the end of the list

Method # 1: Using List Comprehension + isinstance()

In this method, we perform the shift in two steps. In step 1, we collect all the values that should stay at the front; in step 2, we append the zeros at the end. isinstance() is used to keep the False entries, which are falsy but are not the integer zeros we want to move.

# Python3 demo code
# Shift zeros to the end of the list
# using list comprehension + isinstance()

# initializing list
test_list = [1, 4, None, "Manjeet", False, 0, False, 0, "Nikhil"]

# print the original list
print("The original list: " + str(test_list))

# using list comprehension + isinstance()
# keep truthy values, None, and booleans; whatever is left out is a zero
temp = [ele for ele in test_list
        if ele or ele is None or isinstance(ele, bool)]
res = temp + [0] * (len(test_list) - len(temp))

# print result
print("The list after shifting 0's to end: " + str(res))


The original list: [1, 4, None, 'Manjeet', False, 0, False, 0, 'Nikhil']
The list after shifting 0's to end: [1, 4, None, 'Manjeet', False, False, 'Nikhil', 0, 0]

Method # 2: Using List Comprehension + isinstance() + List Slicing

This method is the same as the one above. The only change is that, to reduce the number of steps, the zeros are appended and the result is cut back to the original length with a list slice, so the whole task is done in a single expression.

# Python3 demo code
# Shift zeros to the end of the list
# using list comprehension + isinstance() + list slicing

# initializing list
test_list = [1, 4, None, "Manjeet", False, 0, False, 0, "Nikhil"]

# print the original list
print("The original list: " + str(test_list))

# using list comprehension + isinstance() + list slicing
# append len(test_list) zeros, then cut back to the original length
res = ([ele for ele in test_list
        if not isinstance(ele, int) or ele or isinstance(ele, bool)]
       + [0] * len(test_list))[:len(test_list)]

# print result
print("The list after shifting 0's to end: " + str(res))


The original list: [1, 4, None, 'Manjeet', False, 0, False, 0, 'Nikhil']
The list after shifting 0's to end: [1, 4, None, 'Manjeet', False, False, 'Nikhil', 0, 0]
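Method 1 relies on truthiness, so it would also drop other falsy values such as an empty string or 0.0 and replace them with integer zeros. The following is a minimal sketch of a stricter variant that moves only integer zeros; the helper names shift_zeros_to_end and is_int_zero are hypothetical and are not part of the original examples.

```python
def shift_zeros_to_end(items):
    """Return a new list with integer zeros moved to the end, keeping the order of the rest."""

    def is_int_zero(value):
        # bool is a subclass of int and False == 0, so booleans are excluded explicitly.
        return isinstance(value, int) and not isinstance(value, bool) and value == 0

    kept = [value for value in items if not is_int_zero(value)]
    # Every element that was filtered out is an integer zero, so pad with that many zeros.
    return kept + [0] * (len(items) - len(kept))


test_list = [1, 4, None, "Manjeet", False, 0, False, 0, "Nikhil"]
print(shift_zeros_to_end(test_list))
# [1, 4, None, 'Manjeet', False, False, 'Nikhil', 0, 0]
```

Because the test is on the type and value rather than on truthiness, empty strings, empty containers, and 0.0 stay where they are.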

Python | Offset zeros at the end of the list: StackOverflow Questions

Display number with leading zeros

Question by jeff


a = 1
b = 10
c = 100

How do I display a leading zero for all numbers with less than two digits?

This is the output I'm expecting:


Best way to format integer as string with leading zeros?

I need to add leading zeros to an integer to make a string with a defined number of digits ($cnt). What is the best way to translate this simple function from PHP to Python:

function add_nulls($int, $cnt=2) {
    $int = intval($int);
    for($i=0; $i<($cnt-strlen($int)); $i++)
        $nulls .= "0";
    return $nulls.$int;
}

Is there a function that can do this?

List of zeros in python

How can I create a list which contains only zeros? I want to be able to create a zeros list for each int in range(10)

For example, if the int in the range was 4 I will get:


and for 7:


How to declare array of zeros in python (or an array of a certain size)

I am trying to build a histogram of counts... so I create buckets. I know I could just go through and append a bunch of zeros, i.e. something along these lines:

buckets = []
for i in xrange(0,100):
    buckets.append(0)

Is there a more elegant way to do it? I feel like there should be a way to just declare an array of a certain size.

I know numpy has numpy.zeros but I want the more general solution

Formatting floats without trailing zeros

How can I format a float so that it doesn't contain trailing zeros? In other words, I want the resulting string to be as short as possible.

For example:

3 -> "3"
3. -> "3"
3.0 -> "3"
3.1 -> "3.1"
3.14 -> "3.14"
3.140 -> "3.14"

Convert to binary and keep leading zeros

I'm trying to convert an integer to binary using the bin() function in Python. However, it always removes the leading zeros, which I actually need, such that the result is always 8-bit:


bin(1) -> 0b1

# What I would like:
bin(1) -> 0b00000001

Is there a way of doing this?

How to remove leading and trailing zeros in a string? Python

I have several alphanumeric strings like these

listOfNum = ["000231512-n","1209123100000-n00000","alphanumeric0000", "000alphanumeric"]

The desired output for removing trailing zeros would be:

listOfNum = ["000231512-n","1209123100000-n","alphanumeric", "000alphanumeric"]

The desired output for removing leading zeros would be:

listOfNum = ["231512-n","1209123100000-n00000","alphanumeric0000", "alphanumeric"]

The desired output for removing both leading and trailing zeros would be:

listOfNum = ["231512-n","1209123100000-n", "alphanumeric", "alphanumeric"]

For now I've been doing it the following way; please suggest a better way if there is one:

listOfNum = ["000231512-n","1209123100000-n00000","alphanumeric0000", "000alphanumeric"]
trailingremoved = []
leadingremoved = []
bothremoved = []

# Remove trailing
for i in listOfNum:
  while i[-1] == "0":
    i = i[:-1]
  trailingremoved.append(i)

# Remove leading
for i in listOfNum:
  while i[0] == "0":
    i = i[1:]
  leadingremoved.append(i)

# Remove both
for i in listOfNum:
  while i[0] == "0":
    i = i[1:]
  while i[-1] == "0":
    i = i[:-1]
  bothremoved.append(i)

Drop rows with all zeros in pandas data frame

I can use pandas dropna() functionality to remove rows with some or all columns set as NA's. Is there an equivalent function for dropping rows with all columns having value 0?

P   kt  b   tt  mky depth
1   0   0   0   0   0
2   0   0   0   0   0
3   0   0   0   0   0
4   0   0   0   0   0
5   1.1 3   4.5 2.3 9.0

In this example, we would like to drop the first 4 rows from the data frame.


python how to pad numpy array with zeros

I want to know how I can pad a 2D numpy array with zeros using python 2.6.6 with numpy version 1.5.0. These versions are my constraints, so I cannot use np.pad. For example, I want to pad a with zeros such that its shape matches b. The reason I want to do this is so that I can do:


such that

>>> a
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])
>>> b
array([[ 3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.]])
>>> c
array([[1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0]])

The only way I can think of doing this is appending; however, this seems pretty ugly. Is there a cleaner solution, possibly using b.shape?

Edit: Thank you to MSeifert's answer. I had to clean it up a bit, and this is what I got:

def pad(array, reference_shape, offsets):
    """
    array: Array to be padded
    reference_shape: tuple of size of ndarray to create
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    will throw a ValueError if offsets is too big and the reference_shape cannot handle the offsets
    """

    # Create an array of zeros with the reference shape
    result = np.zeros(reference_shape)
    # Create a list of slices from offset to offset + shape in each dimension
    insertHere = [slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim)]
    # Insert the array in the result at the specified offsets
    # (as a tuple, since newer NumPy versions no longer accept a list of slices as an index)
    result[tuple(insertHere)] = array
    return result

Test if numpy array contains only zeros

We initialize a numpy array with zeros as below:


But how do we check whether all elements in a given n*n numpy array matrix are zero?
The method just needs to return True if all the values are indeed zero.

Answer #1

If you like ascii art:

  • "VALID" = without padding:

       inputs:         1  2  3  4  5  6  7  8  9  10 11 (12 13)
                      |________________|                dropped
  • "SAME" = with zero padding:

                   pad|                                      |pad
       inputs:      0 |1  2  3  4  5  6  7  8  9  10 11 12 13|0  0

In this example:

  • Input width = 13
  • Filter width = 6
  • Stride = 5


  • "VALID" only ever drops the right-most columns (or bottom-most rows).
  • "SAME" tries to pad evenly left and right, but if the number of columns to be added is odd, it will add the extra column to the right, as is the case in this example (the same logic applies vertically: there may be an extra row of zeros at the bottom).


About the name:

  • With "SAME" padding, if you use a stride of 1, the layer's outputs will have the same spatial dimensions as its inputs.
  • With "VALID" padding, there are no "made-up" padding inputs. The layer only uses valid input data.
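The numbers in the example above (input width 13, filter width 6, stride 5) can be checked with a little arithmetic. The sketch below assumes the padding rule that TensorFlow documents for "SAME" (output width is the input width divided by the stride, rounded up, with any odd padding column going to the right); the function name conv_output_and_padding is hypothetical.

```python
import math


def conv_output_and_padding(input_width, filter_width, stride, padding):
    """Return (output_width, pad_left, pad_right) along one dimension."""
    if padding == "VALID":
        # Only windows that fit entirely inside the input are used;
        # leftover columns on the right are dropped.
        output_width = (input_width - filter_width) // stride + 1
        return output_width, 0, 0
    if padding == "SAME":
        output_width = math.ceil(input_width / stride)
        total_pad = max((output_width - 1) * stride + filter_width - input_width, 0)
        pad_left = total_pad // 2            # the smaller half goes to the left
        pad_right = total_pad - pad_left     # an odd extra column goes to the right
        return output_width, pad_left, pad_right
    raise ValueError("padding must be 'VALID' or 'SAME'")


print(conv_output_and_padding(13, 6, 5, "VALID"))  # (2, 0, 0)
print(conv_output_and_padding(13, 6, 5, "SAME"))   # (3, 1, 2)
```

The "SAME" result matches the drawing: one zero on the left and two on the right. With a stride of 1 the same function returns an output width equal to the input width, which is where the name comes from.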

Answer #2

Your array a defines the columns of the nonzero elements in the output array. You need to also define the rows and then use fancy indexing:

>>> a = np.array([1, 0, 3])
>>> b = np.zeros((a.size, a.max()+1))
>>> b[np.arange(a.size),a] = 1
>>> b
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])

Answer #3

If your main goal is to visualize the correlation matrix, rather than creating a plot per se, the convenient pandas styling options are a viable built-in solution:

import pandas as pd
import numpy as np

rs = np.random.RandomState(0)
df = pd.DataFrame(rs.rand(10, 10))
corr = df.corr()
corr.style.background_gradient(cmap="coolwarm")
# "RdBu_r", "BrBG_r", & PuOr_r are other good diverging colormaps

Note that this needs to be in a backend that supports rendering HTML, such as the JupyterLab Notebook.


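You can easily limit the digit precision:

corr.style.background_gradient(cmap="coolwarm").set_precision(2)

(set_precision() was removed in pandas 2.0; on recent versions, .format(precision=2) does the same job.)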

Or get rid of the digits altogether if you prefer the matrix without annotations:

corr.style.background_gradient(cmap="coolwarm").set_properties(**{"font-size": "0pt"})

The styling documentation also includes instructions for more advanced styles, such as how to change the display of the cell the mouse pointer is hovering over.

Time comparison

In my testing, style.background_gradient() was 4x faster than plt.matshow() and 120x faster than sns.heatmap() with a 10x10 matrix. Unfortunately it doesn't scale as well as plt.matshow(): the two take about the same time for a 100x100 matrix, and plt.matshow() is 10x faster for a 1000x1000 matrix.


There are a few possible ways to save the stylized dataframe:

  • Return the HTML by appending the render() method and then write the output to a file.
  • Save as an .xlsx file with conditional formatting by appending the to_excel() method.
  • Combine with imgkit to save a bitmap
  • Take a screenshot (like I have done here).

Normalize colors across the entire matrix (pandas >= 0.24)

By setting axis=None, it is now possible to compute the colors based on the entire matrix rather than per column or per row:

corr.style.background_gradient(cmap="coolwarm", axis=None)

Single corner heatmap

Since many people are reading this answer I thought I would add a tip for how to only show one corner of the correlation matrix. I find this easier to read myself, since it removes the redundant information.

# Fill diagonal and upper half with NaNs
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True
corr[mask] = np.nan
(corr.style
 .background_gradient(cmap="coolwarm", axis=None, vmin=-1, vmax=1)
 .highlight_null(null_color="#f1f1f1"))  # Color NaNs grey

Answer #4

In numpy v1.7+, you can take advantage of the "where" option for ufuncs. You can do things in one line and you don't have to deal with the errstate context manager.

>>> a = np.array([-1, 0, 1, 2, 3], dtype=float)
>>> b = np.array([ 0, 0, 0, 2, 2], dtype=float)

# If you don't pass `out` the indices where (b == 0) will be uninitialized!
>>> c = np.divide(a, b, out=np.zeros_like(a), where=b!=0)
>>> print(c)
[ 0.   0.   0.   1.   1.5]

In this case, it does the divide calculation anywhere "where" b does not equal zero. When b does equal zero, then it remains unchanged from whatever value you originally gave it in the "out" argument.
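As a minimal sketch, the same call can be wrapped in a small helper so that the value used for the "divide by zero" positions is explicit. The name safe_divide and the fill_value parameter are hypothetical, and the sketch assumes both inputs have the same shape.

```python
import numpy as np


def safe_divide(numerator, denominator, fill_value=0.0):
    """Element-wise division that leaves fill_value wherever the denominator is zero."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    # Pre-fill the output; np.divide only overwrites positions selected by `where`.
    out = np.full_like(numerator, fill_value)
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out


a = [-1, 0, 1, 2, 3]
b = [0, 0, 0, 2, 2]
print(safe_divide(a, b))                     # zeros where b == 0, then 1.0 and 1.5
print(safe_divide(a, b, fill_value=np.nan))  # NaN where b == 0, then 1.0 and 1.5
```

Choosing NaN as the fill value can be useful when a zero result would be indistinguishable from a genuine quotient of zero.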

Answer #5

Short Answer

You need to push a bytes-like object (bytes, bytearray, etc) to the base64.b64encode() method. Here are two ways:

>>> import base64
>>> data = base64.b64encode(b"data to be encoded")
>>> print(data)
b'ZGF0YSB0byBiZSBlbmNvZGVk'

Or with a variable:

>>> import base64
>>> string = "data to be encoded"
>>> data = base64.b64encode(string.encode())
>>> print(data)
b'ZGF0YSB0byBiZSBlbmNvZGVk'


In Python 3, str objects are not C-style character arrays (so they are not byte arrays); rather, they are data structures that do not have any inherent encoding. You can encode that string (or interpret it) in a variety of ways. The most common (and the default in Python 3) is UTF-8, especially since it is backwards compatible with ASCII (as are most widely used encodings). That is what happens when you take a string and call the .encode() method on it: Python encodes the string as UTF-8 (the default encoding) and gives you the array of bytes that it corresponds to.
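A short illustration of that point: the same str produces different bytes depending on the encoding chosen, and pure ASCII text produces identical bytes under UTF-8 and ASCII. The sample string is only an example and does not come from the original answer.

```python
text = "caf\u00e9"  # four characters, the last one is non-ASCII

utf8_bytes = text.encode()             # UTF-8 is the default encoding
latin1_bytes = text.encode("latin-1")  # same text, different byte representation

print(len(text), len(utf8_bytes), len(latin1_bytes))  # 4 5 4
print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'

# Decoding with the matching encoding gives the original string back.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text

# For pure ASCII text, UTF-8 and ASCII produce exactly the same bytes.
assert "data to be encoded".encode("utf-8") == "data to be encoded".encode("ascii")
```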

Base-64 Encoding in Python 3

Originally the question title asked about Base-64 encoding. Read on for Base-64 stuff.

base64 encoding takes 6-bit binary chunks and encodes them using the characters A-Z, a-z, 0-9, "+", "/", and "=" (some encodings use different characters in place of "+" and "/"). This is a character encoding that is based on the mathematical construct of the radix-64 or base-64 number system, but the two are very different. Base-64 in math is a number system like binary or decimal, and you do this change of radix on the entire number, or (if the radix you're converting from is a power of 2 less than 64) in chunks from right to left.

In base64 encoding, the translation is done from left to right; those first 64 characters are why it is called base64 encoding. The 65th symbol, "=", is used for padding, since the encoding pulls 6-bit chunks but the data it is usually meant to encode are 8-bit bytes, so sometimes there are only two or four bits in the last chunk.


>>> data = b"test"
>>> for byte in data:
...     print(format(byte, "08b"), end=" ")
01110100 01100101 01110011 01110100

If you interpret that binary data as a single integer, then this is how you would convert it to base-10 and base-64 (table for base-64):

base-2:  01 110100 011001 010111 001101 110100 (base-64 grouping shown)
base-10:                            1952805748
base-64:  B      0      Z      X      N      0

base64 encoding, however, will re-group this data thusly:

base-2:  011101  000110  010101 110011 011101 00(0000) <- pad w/zeros to make a clean 6-bit chunk
base-10:     29       6      21     51     29      0
base-64:      d       G       V      z      d      A

So, "B0ZXN0" is the base-64 version of our binary, mathematically speaking. However, base64 encoding has to do the encoding in the opposite direction (so the raw data is converted to "dGVzdA") and also has a rule to tell other applications how much space is left off at the end. This is done by padding the end with "=" symbols. So, the base64 encoding of this data is "dGVzdA==", with two "=" symbols to signify two pairs of bits will need to be removed from the end when this data gets decoded to make it match the original data.
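Both groupings can be reproduced by hand. The sketch below is only an illustration of the two tables above, not how the base64 module is implemented; the helper names radix64_digits and base64_by_hand are hypothetical.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def radix64_digits(number):
    """Write a non-negative integer in base 64, most significant digit first."""
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 64)
        digits = ALPHABET[remainder] + digits
    return digits or ALPHABET[0]


def base64_by_hand(data):
    """Encode bytes left to right in 6-bit chunks, then pad with '='."""
    bits = "".join(format(byte, "08b") for byte in data)
    bits += "0" * (-len(bits) % 6)  # zero-fill the last chunk to a full 6 bits
    chunks = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    encoded = "".join(ALPHABET[int(chunk, 2)] for chunk in chunks)
    return encoded + "=" * (-len(encoded) % 4)  # pad the output to a multiple of 4


data = b"test"
print(radix64_digits(int.from_bytes(data, "big")))  # B0ZXN0
print(base64_by_hand(data))                         # dGVzdA==
assert base64_by_hand(data) == base64.b64encode(data).decode("ascii")
```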

Let's test this to see if I am being dishonest:

>>> encoded = base64.b64encode(data)
>>> print(encoded)
b'dGVzdA=='

Why use base64 encoding?

Let's say I have to send some data to someone via email, like this data:

>>> data = b"\x04\x6d\x73\x67\x08\x08\x08\x20\x20\x20"
>>> print(data.decode())
>>> print(data)
b'\x04msg\x08\x08\x08   '

There are two problems I planted:

  1. If I tried to send that email in Unix, the email would send as soon as the \x04 character was read, because that is ASCII for END-OF-TRANSMISSION (Ctrl-D), so the remaining data would be left out of the transmission.
  2. Also, while Python is smart enough to escape all of my evil control characters when I print the data directly, when that string is decoded as ASCII, you can see that the "msg" is not there. That is because I used three BACKSPACE characters and three SPACE characters to erase the "msg". Thus, even if I didn't have the EOF character there, the end user wouldn't be able to translate from the text on screen to the real, raw data.

This is just a demo to show you how hard it can be to simply send raw data. Encoding the data into base64 format gives you the exact same data but in a format that ensures it is safe for sending over electronic media such as email.
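A minimal sketch of that round trip, using the same bytes as the demo above: the encoded form contains only printable ASCII characters, and decoding it returns the raw data unchanged.

```python
import base64

# The same "evil" payload as above: EOT, "msg", three backspaces, three spaces.
data = b"\x04\x6d\x73\x67\x08\x08\x08\x20\x20\x20"

encoded = base64.b64encode(data)   # bytes made up of base64 alphabet characters only
text = encoded.decode("ascii")     # safe to place in an email body or a JSON field

assert text.isprintable()          # no control characters survive in the encoded form
decoded = base64.b64decode(text)
assert decoded == data             # the raw bytes come back exactly

print(text)
```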

Answer #6

I think all of the answers here cover the core of what the lambda function does in the context of sorted() quite nicely, however I still feel like a description that leads to an intuitive understanding is lacking, so here is my two cents.

For the sake of completeness, I'll state the obvious up front: sorted() returns a list of sorted elements and if we want to sort in a particular way or if we want to sort a complex list of elements (e.g. nested lists or a list of tuples) we can invoke the key argument.

For me, the intuitive understanding of the key argument, why it has to be callable, and the use of lambda as the (anonymous) callable function to accomplish this comes in two parts.

  1. Using lambda ultimately means you don't have to write (define) an entire function, like the one sblom provided an example of. Lambda functions are created, used, and immediately destroyed - so they don't funk up your code with more code that will only ever be used once. This, as I understand it, is the core utility of the lambda function and its application for such a role is broad. Its syntax is purely a convention, which is in essence the nature of programmatic syntax in general. Learn the syntax and be done with it.

Lambda syntax is as follows:

lambda input_variable(s): tasty one liner

where lambda is a python keyword.


In [1]: f00 = lambda x: x/2

In [2]: f00(10)
Out[2]: 5.0

In [3]: (lambda x: x/2)(10)
Out[3]: 5.0

In [4]: (lambda x, y: x / y)(10, 2)
Out[4]: 5.0

In [5]: (lambda: "amazing lambda")() # func with no args!
Out[5]: "amazing lambda"
  2. The idea behind the key argument is that it should take in a set of instructions that will essentially point the sorted() function at those list elements which should be used to sort by. When it says key=, what it really means is: as I iterate through the list, one element at a time (i.e. for e in some_list), I'm going to pass the current element to the function specified by the key argument and use that to create a transformed list which will inform me on the order of the final sorted list.

Check it out:

In [6]: mylist = [3, 6, 3, 2, 4, 8, 23]  # an example list
# sorted(mylist, key=HowToSort)  # what we will be doing

Base example:

# mylist = [3, 6, 3, 2, 4, 8, 23]
In [7]: sorted(mylist)
Out[7]: [2, 3, 3, 4, 6, 8, 23]  
# all numbers are in ascending order (i.e.from low to high).

Example 1:

# mylist = [3, 6, 3, 2, 4, 8, 23]
In [8]: sorted(mylist, key=lambda x: x % 2 == 0)

# Quick Tip: The % operator returns the *remainder* of a division
# operation. So the key lambda function here is saying "return True 
# if x divided by 2 leaves a remainder of 0, else False". This is a 
# typical way to check if a number is even or odd.

Out[8]: [3, 3, 23, 6, 2, 4, 8]  
# Does this sorted result make intuitive sense to you?

Notice that my lambda function told sorted to check if each element e was even or odd before sorting.

BUT WAIT! You may (or perhaps should) be wondering two things.

First, why are the odd numbers coming before the even numbers? After all, the key value seems to be telling the sorted function to prioritize evens by using the mod operator in x % 2 == 0.

Second, why are the even numbers still out of order? 2 comes before 6, right?

By analyzing this result, we'll learn something deeper about how the "key" argument really works, especially in conjunction with the anonymous lambda function.

Firstly, you'll notice that while the odds come before the evens, the evens themselves are not sorted. Why is this? Let's read the docs:

Key Functions Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons.

We have to do a little bit of reading between the lines here, but what this tells us is that the sort function is only called once, and if we specify the key argument, then we sort by the value that key function points us to.

So what does the example using a modulo return? A boolean value: True == 1, False == 0. So how does sorted deal with this key? It basically transforms the original list to a sequence of 1s and 0s.

[3, 6, 3, 2, 4, 8, 23] becomes [0, 1, 0, 1, 1, 1, 0]

Now we're getting somewhere. What do you get when you sort the transformed list?

[0, 0, 0, 1, 1, 1, 1]

Okay, so now we know why the odds come before the evens. But the next question is: Why does the 6 still come before the 2 in my final list? Well that's easy - it is because sorting only happens once! Those 1s still represent the original list values, which are in their original positions relative to each other. Since sorting only happens once, and we don't call any kind of sort function to order the original even numbers from low to high, those values remain in their original order relative to one another.

The final question is then this: How do I think conceptually about how the order of my boolean values get transformed back in to the original values when I print out the final sorted list?

sorted() is a built-in function that (fun fact) uses a hybrid sorting algorithm called Timsort, which combines aspects of merge sort and insertion sort. It seems clear to me that when you call it, there is a mechanism that holds these values in memory and bundles them with their boolean identity (mask) determined by (...!) the lambda function. The order is determined by the boolean identity calculated by the lambda function, but keep in mind that these sublists (of ones and zeros) are not themselves sorted by their original values. Hence, the final list, while organized by odds and evens, is not sorted within each sublist (the evens in this case are out of order). The odds are in order only because they happened to be in order already in the original list. The takeaway from all this is that when lambda does that transformation, the original order within each sublist is retained.
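One way to picture the bundling described above is to do it by hand: pair every value with its key and its original position, sort the pairs, and then throw the extra fields away. This is only a mental model under the assumption that ties keep their original order (which Python's sort guarantees); it is not a claim about how sorted() is implemented internally.

```python
mylist = [3, 6, 3, 2, 4, 8, 23]
is_even = lambda x: x % 2 == 0

# Bundle each value with its key and its original position.
decorated = [(is_even(value), position, value) for position, value in enumerate(mylist)]
decorated.sort()  # False sorts before True; ties fall back to the original position
by_hand = [value for _, _, value in decorated]

print(by_hand)  # [3, 3, 23, 6, 2, 4, 8]
assert by_hand == sorted(mylist, key=is_even)

# To also order the values inside each group, make the value part of the key.
print(sorted(mylist, key=lambda x: (is_even(x), x)))  # [3, 3, 23, 2, 4, 6, 8]
```

The second call shows the usual fix when the groups themselves should be ordered: return a tuple from the key function so that the value breaks ties instead of the original position.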

So how does this all relate back to the original question, and more importantly, our intuition on how we should implement sorted() with its key argument and lambda?

That lambda function can be thought of as a pointer that points to the values we need to sort by, whether it's a pointer mapping a value to its boolean transformed by the lambda function, or a particular element in a nested list, tuple, dict, etc., again determined by the lambda function.

Let's try and predict what happens when I run the following code.

In [9]: mylist = [(3, 5, 8), (6, 2, 8), (2, 9, 4), (6, 8, 5)]
In[10]: sorted(mylist, key=lambda x: x[1])

My sorted call obviously says, "Please sort this list". The key argument makes that a little more specific by saying, "For each element x in mylist, return the second element of x, then sort all of the elements of the original list mylist by the sorted order of the list calculated by the lambda function." Since we have a list of tuples, we can return an indexed element from each tuple using the lambda function.

The pointer that will be used to sort would be:

[5, 2, 9, 8] # the second element of each tuple

Sorting this pointer list returns:

[2, 5, 8, 9]

Applying this to mylist, we get:

Out[10]: [(6, 2, 8), (3, 5, 8), (6, 8, 5), (2, 9, 4)]
# Notice the sorted pointer list is the same as the second index of each tuple in this final list

Run that code, and you'll find that this is the order. Try sorting a list of integers using this key function and you'll find that the code breaks (why? Because you cannot index an integer of course).
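Both claims are easy to check. The sketch below also uses operator.itemgetter from the standard library, which is not mentioned in the answer above but builds the same "take element 1" key function without a lambda.

```python
from operator import itemgetter

mylist = [(3, 5, 8), (6, 2, 8), (2, 9, 4), (6, 8, 5)]

by_lambda = sorted(mylist, key=lambda x: x[1])
by_itemgetter = sorted(mylist, key=itemgetter(1))  # equivalent key function
assert by_lambda == by_itemgetter
print(by_lambda)  # [(6, 2, 8), (3, 5, 8), (6, 8, 5), (2, 9, 4)]

# The same key function fails on plain integers, because an int cannot be indexed.
try:
    sorted([3, 1, 2], key=lambda x: x[1])
except TypeError as error:
    print("TypeError:", error)
```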

This was a long-winded explanation, but I hope this helps to sort your intuition on the use of lambda functions - as the key argument in sorted(), and beyond.

Answer #7

Very simple, you create an array containing zeros using the reference shape:

result = np.zeros(b.shape)
# actually you can also use result = np.zeros_like(b) 
# but that also copies the dtype not only the shape

and then insert the array where you need it:

result[:a.shape[0],:a.shape[1]] = a

and voila you have padded it:

array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

You can also make it a bit more general if you define where your upper left element should be inserted

result = np.zeros_like(b)
x_offset = 1  # 0 would be what you wanted
y_offset = 1  # 0 in your case
result[x_offset:a.shape[0]+x_offset,y_offset:a.shape[1]+y_offset] = a

array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.]])

but then be careful that you don't have offsets bigger than allowed. For x_offset = 2, for example, this will fail.

If you have an arbitrary number of dimensions you can define a list of slices to insert the original array. I've found it interesting to play around a bit and created a padding function that can pad (with offset) an arbitrarily shaped array as long as the array and reference have the same number of dimensions and the offsets are not too big.

def pad(array, reference, offsets):
    """
    array: Array to be padded
    reference: Reference array with the desired shape
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    """
    # Create an array of zeros with the reference shape
    result = np.zeros(reference.shape)
    # Create a list of slices from offset to offset + shape in each dimension
    insertHere = [slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim)]
    # Insert the array in the result at the specified offsets
    # (as a tuple, since newer NumPy versions no longer accept a list of slices as an index)
    result[tuple(insertHere)] = array
    return result

And some test cases:

import numpy as np

# 1 Dimension
a = np.ones(2)
b = np.ones(5)
offset = [3]
pad(a, b, offset)

# 3 Dimensions

a = np.ones((3,3,3))
b = np.ones((5,4,3))
offset = [1,0,0]
pad(a, b, offset)
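The warning about offsets that are too big can be turned into an explicit check. The following is a sketch of one possible variant, not the answer's own function: the name pad_checked, the use of a shape tuple instead of a reference array, and the error messages are all assumptions made for illustration.

```python
import numpy as np


def pad_checked(array, reference_shape, offsets):
    """Place `array` inside a zero array of `reference_shape`, starting at `offsets`."""
    if len(reference_shape) != array.ndim or len(offsets) != array.ndim:
        raise ValueError("array, reference_shape and offsets must agree on the number of dimensions")
    for dim in range(array.ndim):
        if offsets[dim] < 0 or offsets[dim] + array.shape[dim] > reference_shape[dim]:
            raise ValueError("offset %d does not fit in dimension %d" % (offsets[dim], dim))
    result = np.zeros(reference_shape, dtype=array.dtype)
    # One slice per dimension, passed as a tuple so NumPy treats it as a multi-dimensional index.
    region = tuple(slice(offsets[dim], offsets[dim] + array.shape[dim])
                   for dim in range(array.ndim))
    result[region] = array
    return result


print(pad_checked(np.ones(2), (5,), [3]))                         # ones at index 3 and 4
print(pad_checked(np.ones((3, 3, 3)), (5, 4, 3), [1, 0, 0]).shape)  # (5, 4, 3)
```

Checking the bounds up front gives a clear error message instead of the shape-mismatch error NumPy raises when the slice is silently clipped.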

Answer #8

Personally, I'd go for: (y == 0).sum() and (y == 1).sum()


import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
num_zeros = (y == 0).sum()
num_ones = (y == 1).sum()

Answer #9

Tensorflow 2 Docs

Saving Checkpoints

Adapted from the docs

# -------------------------
# -----  Toy Context  -----
# -------------------------
import tensorflow as tf

class Net(tf.keras.Model):
    """A simple linear model."""

    def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)

    def call(self, x):
        return self.l1(x)

def toy_dataset():
    inputs = tf.range(10.0)[:, None]
    labels = inputs * 5.0 + tf.range(5.0)[None, :]
    return tf.data.Dataset.from_tensor_slices(dict(x=inputs, y=labels)).repeat().batch(2)

def train_step(net, example, optimizer):
    """Trains `net` on `example` using `optimizer`."""
    with tf.GradientTape() as tape:
        output = net(example["x"])
        loss = tf.reduce_mean(tf.abs(output - example["y"]))
    variables = net.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss

# ----------------------------
# -----  Create Objects  -----
# ----------------------------

net = Net()
opt = tf.keras.optimizers.Adam(0.1)
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)

# ----------------------------
# -----  Train and Save  -----
# ----------------------------

ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")

for _ in range(50):
    example = next(iterator)
    loss = train_step(net, example, opt)
    ckpt.step.assign_add(1)
    if int(ckpt.step) % 10 == 0:
        save_path = manager.save()
        print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
        print("loss {:1.2f}".format(loss.numpy()))

# ---------------------
# -----  Restore  -----
# ---------------------

# In another script, re-initialize objects
opt = tf.keras.optimizers.Adam(0.1)
net = Net()
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)

# Re-use the manager code above ^

ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")

for _ in range(50):
    example = next(iterator)
    # Continue training or evaluate etc.

More links

Checkpoints capture the exact value of all parameters (tf.Variable objects) used by a model. Checkpoints do not contain any description of the computation defined by the model and thus are typically only useful when source code that will use the saved parameter values is available.

The SavedModel format on the other hand includes a serialized description of the computation defined by the model in addition to the parameter values (checkpoint). Models in this format are independent of the source code that created the model. They are thus suitable for deployment via TensorFlow Serving, TensorFlow Lite, TensorFlow.js, or programs in other programming languages (the C, C++, Java, Go, Rust, C# etc. TensorFlow APIs).

(Highlights are my own)

Tensorflow < 2

From the docs:


# Create some variables.
v1 = tf.get_variable("v1", shape=[3], initializer = tf.zeros_initializer)
v2 = tf.get_variable("v2", shape=[5], initializer = tf.zeros_initializer)

inc_v1 = v1.assign(v1+1)
dec_v2 = v2.assign(v2-1)

# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, and save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  # Do some work with the model.
  inc_v1.op.run()
  dec_v2.op.run()
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print("Model saved in path: %s" % save_path)



# Create some variables.
v1 = tf.get_variable("v1", shape=[3])
v2 = tf.get_variable("v2", shape=[5])

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Check the values of the variables
  print("v1 : %s" % v1.eval())
  print("v2 : %s" % v2.eval())


Many good answers; for completeness I'll add my 2 cents: simple_save. Also a standalone code example using the API.

Python 3 ; Tensorflow 1.14

import tensorflow as tf
from tensorflow.saved_model import tag_constants

with tf.Graph().as_default():
    with tf.Session() as sess:

        # Saving
        inputs = {
            "batch_size_placeholder": batch_size_placeholder,
            "features_placeholder": features_placeholder,
            "labels_placeholder": labels_placeholder,
        }
        outputs = {"prediction": model_output}
        tf.saved_model.simple_save(
            sess, "path/to/your/location/", inputs, outputs
        )


restored_graph = tf.Graph()
with restored_graph.as_default():
    with tf.Session() as sess:
        tf.saved_model.loader.load(
            sess, [tag_constants.SERVING], "path/to/your/location/"
        )
        batch_size_placeholder = restored_graph.get_tensor_by_name("batch_size_placeholder:0")
        features_placeholder = restored_graph.get_tensor_by_name("features_placeholder:0")
        labels_placeholder = restored_graph.get_tensor_by_name("labels_placeholder:0")
        prediction = restored_graph.get_tensor_by_name("dense/BiasAdd:0")

        sess.run(prediction, feed_dict={
            batch_size_placeholder: some_value,
            features_placeholder: some_other_value,
            labels_placeholder: another_value
        })

Standalone example

Original blog post

The following code generates random data for the sake of the demonstration.

  1. We start by creating the placeholders. They will hold the data at runtime. From them, we create the Dataset and then its Iterator. We get the iterator's generated tensor, called input_tensor, which will serve as input to our model.
  2. The model itself is built from input_tensor: a GRU-based bidirectional RNN followed by a dense classifier. Because why not.
  3. The loss is a softmax_cross_entropy_with_logits, optimized with Adam. After 2 epochs (of 2 batches each), we save the "trained" model with tf.saved_model.simple_save. If you run the code as is, then the model will be saved in a folder called simple/ in your current working directory.
  4. In a new graph, we then restore the saved model with tf.saved_model.loader.load. We grab the placeholders and logits with graph.get_tensor_by_name and the Iterator initializing operation with graph.get_operation_by_name.
  5. Lastly we run an inference for both batches in the dataset, and check that the saved and restored model both yield the same values. They do!


import os
import shutil
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants

def model(graph, input_tensor):
    """Create the model which consists of
    a bidirectional rnn (GRU(10)) followed by a dense classifier

    Args:
        graph (tf.Graph): Tensors' graph
        input_tensor (tf.Tensor): Tensor fed as input to the model

    Returns:
        tf.Tensor: the model's output layer Tensor
    """
    cell = tf.nn.rnn_cell.GRUCell(10)
    with graph.as_default():
        ((fw_outputs, bw_outputs), (fw_state, bw_state)) = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=cell,
            cell_bw=cell,
            inputs=input_tensor,
            sequence_length=[10] * 32,
            dtype=tf.float32,
        )
        # Concatenate forward and backward outputs along the feature axis
        outputs = tf.concat((fw_outputs, bw_outputs), 2)
        mean = tf.reduce_mean(outputs, axis=1)
        dense = tf.layers.dense(mean, 5, activation=None)

        return dense

def get_opt_op(graph, logits, labels_tensor):
    """Create optimization operation from model's logits and labels

    Args:
        graph (tf.Graph): Tensors' graph
        logits (tf.Tensor): The model's output without activation
        labels_tensor (tf.Tensor): Target labels

    Returns:
        tf.Operation: the operation performing a step of the Adam optimizer
    """
    with graph.as_default():
        with tf.variable_scope("loss"):
            loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                    logits=logits, labels=labels_tensor, name="xent"))
        with tf.variable_scope("optimizer"):
            opt_op = tf.train.AdamOptimizer(1e-2).minimize(loss)
        return opt_op

if __name__ == "__main__":
    # Set random seed for reproducibility
    # and create synthetic data
    np.random.seed(0)
    features = np.random.randn(64, 10, 30)
    labels = np.eye(5)[np.random.randint(0, 5, (64,))]

    graph1 = tf.Graph()
    with graph1.as_default():
        # Random seed for reproducibility
        tf.set_random_seed(0)
        # Placeholders
        batch_size_ph = tf.placeholder(tf.int64, name="batch_size_ph")
        features_data_ph = tf.placeholder(tf.float32, [None, None, 30], "features_data_ph")
        labels_data_ph = tf.placeholder(tf.int32, [None, 5], "labels_data_ph")
        # Dataset
        dataset = tf.data.Dataset.from_tensor_slices((features_data_ph, labels_data_ph))
        dataset = dataset.batch(batch_size_ph)
        iterator = tf.data.Iterator.from_structure(dataset.output_types, dataset.output_shapes)
        dataset_init_op = iterator.make_initializer(dataset, name="dataset_init")
        input_tensor, labels_tensor = iterator.get_next()

        # Model
        logits = model(graph1, input_tensor)
        # Optimization
        opt_op = get_opt_op(graph1, logits, labels_tensor)

        with tf.Session(graph=graph1) as sess:
            # Initialize variables
            sess.run(tf.global_variables_initializer())
            for epoch in range(3):
                batch = 0
                # Initialize dataset (could feed epochs in Dataset.repeat(epochs))
                sess.run(
                    dataset_init_op,
                    feed_dict={
                        features_data_ph: features,
                        labels_data_ph: labels,
                        batch_size_ph: 32
                    })
                values = []
                while True:
                    try:
                        if epoch < 2:
                            # Training
                            _, value = sess.run([opt_op, logits])
                            print("Epoch {}, batch {} | Sample value: {}".format(epoch, batch, value[0]))
                            batch += 1
                        else:
                            # Final inference
                            values.append(sess.run(logits))
                            print("Epoch {}, batch {} | Final inference | Sample value: {}".format(epoch, batch, values[-1][0]))
                            batch += 1
                    except tf.errors.OutOfRangeError:
                        # The iterator is exhausted: this epoch is done
                        break
            # Save model state
            cwd = os.getcwd()
            path = os.path.join(cwd, "simple")
            shutil.rmtree(path, ignore_errors=True)
            inputs_dict = {
                "batch_size_ph": batch_size_ph,
                "features_data_ph": features_data_ph,
                "labels_data_ph": labels_data_ph
            }
            outputs_dict = {
                "logits": logits
            }
            tf.saved_model.simple_save(
                sess, path, inputs_dict, outputs_dict
            )
    # Restoring
    graph2 = tf.Graph()
    with graph2.as_default():
        with tf.Session(graph=graph2) as sess:
            # Restore saved values
            tf.saved_model.loader.load(
                sess,
                [tag_constants.SERVING],
                path
            )
            # Get restored placeholders
            labels_data_ph = graph2.get_tensor_by_name("labels_data_ph:0")
            features_data_ph = graph2.get_tensor_by_name("features_data_ph:0")
            batch_size_ph = graph2.get_tensor_by_name("batch_size_ph:0")
            # Get restored model output
            restored_logits = graph2.get_tensor_by_name("dense/BiasAdd:0")
            # Get dataset initializing operation
            dataset_init_op = graph2.get_operation_by_name("dataset_init")

            # Initialize restored dataset
            sess.run(
                dataset_init_op,
                feed_dict={
                    features_data_ph: features,
                    labels_data_ph: labels,
                    batch_size_ph: 32
                })

            # Compute inference for both batches in dataset
            restored_values = []
            for i in range(2):
                restored_values.append(sess.run(restored_logits))
                print("Restored values: ", restored_values[i][0])

    # Check if original inference and restored inference are equal
    valid = all((v == rv).all() for v, rv in zip(values, restored_values))
    print("\nInferences match: ", valid)

This will print:

$ python3

Epoch 0, batch 0 | Sample value: [-0.13851789 -0.3087595   0.12804556  0.20013677 -0.08229901]
Epoch 0, batch 1 | Sample value: [-0.00555491 -0.04339041 -0.05111827 -0.2480045  -0.00107776]
Epoch 1, batch 0 | Sample value: [-0.19321944 -0.2104792  -0.00602257  0.07465433  0.11674127]
Epoch 1, batch 1 | Sample value: [-0.05275984  0.05981954 -0.15913513 -0.3244143   0.10673307]
Epoch 2, batch 0 | Final inference | Sample value: [-0.26331693 -0.13013336 -0.12553    -0.04276478  0.2933622 ]
Epoch 2, batch 1 | Final inference | Sample value: [-0.07730117  0.11119192 -0.20817074 -0.35660955  0.16990358]

INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b'/some/path/simple/saved_model.pb'

INFO:tensorflow:Restoring parameters from b'/some/path/simple/variables/variables'
Restored values:  [-0.26331693 -0.13013336 -0.12553    -0.04276478  0.2933622 ]
Restored values:  [-0.07730117  0.11119192 -0.20817074 -0.35660955  0.16990358]

Inferences match:  True

Answer #10

What is RMSE? Also known as MSE, RMD, or RMS. What problem does it solve?

If you understand RMSE (root mean squared error), MSE (mean squared error), RMD (root mean squared deviation) and RMS (root mean squared), then asking for a library to calculate this for you is unnecessary over-engineering. All these metrics are a single line of Python code, at most 2 inches long. The four metrics RMSE, MSE, RMD, and RMS are at their core conceptually identical; MSE only skips the final square root.
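To back up the "single line" claim, here is a minimal sketch of MSE and RMSE using only the standard library. The function names `mse` and `rmse` are chosen for illustration, and both assume two equally sized, non-empty sequences of numbers.

```python
import math

def mse(a, b):
    """Mean of the squared element-wise differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def rmse(a, b):
    """Square root of the MSE, in the same units as the inputs."""
    return math.sqrt(mse(a, b))

print(mse([1, 2], [1, 4]), rmse([1, 2], [1, 4]))
```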

RMSE answers the question: "How similar, on average, are the numbers in list1 to list2?". The two lists must be the same size. I want to "wash out the noise between any two given elements, wash out the size of the data collected, and get a single number feel for change over time".

Intuition and ELI5 for RMSE:

Imagine you are learning to throw darts at a dart board. Every day you practice for one hour. You want to figure out if you are getting better or getting worse. So every day you make 10 throws and measure the distance between the bullseye and where your dart hit.

You make a list of those numbers, list1. Use the root mean squared error between the distances at day 1 and a list2 containing all zeros. Do the same on the 2nd and nth days. What you will get is a single number that hopefully decreases over time. When your RMSE number is zero, you hit bullseyes every time. If the RMSE number goes up, you are getting worse.

Example of calculating root mean squared error in Python:

import numpy as np
d = [0.000, 0.166, 0.333]   #ideal target distances, these can be all zeros.
p = [0.000, 0.254, 0.998]   #your performance goes here

print("d is: " + str(["%.8f" % elem for elem in d]))
print("p is: " + str(["%.8f" % elem for elem in p]))

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

rmse_val = rmse(np.array(d), np.array(p))
print("rms error between lists d and p is: " + str(rmse_val))

Which prints:

d is: ['0.00000000', '0.16600000', '0.33300000']
p is: ['0.00000000', '0.25400000', '0.99800000']
rms error between lists d and p is: 0.387284994115
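The dart-board routine described earlier can be sketched in the same way: one RMSE per practice day, each computed against a target list of zeros. The throw distances below are made-up numbers and `rmse_to_bullseye` is a hypothetical helper name; plain Python is used so the sketch runs without NumPy.

```python
import math

def rmse_to_bullseye(distances):
    """RMSE between the measured distances and an all-zero target list."""
    targets = [0.0] * len(distances)
    squared = [(d - t) ** 2 for d, t in zip(distances, targets)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical distances (in cm) for 10 throws on three practice days.
throws_by_day = {
    1: [12.0, 9.5, 15.0, 11.0, 8.0, 14.5, 10.0, 13.0, 9.0, 12.5],
    2: [9.0, 7.5, 11.0, 8.0, 6.5, 10.0, 7.0, 9.5, 6.0, 8.5],
    3: [5.0, 4.0, 6.5, 3.5, 5.5, 4.5, 6.0, 3.0, 5.0, 4.0],
}

daily_rmse = {day: rmse_to_bullseye(d) for day, d in throws_by_day.items()}
for day, value in daily_rmse.items():
    print("day %d: RMSE = %.3f" % (day, value))
```

With these numbers the value shrinks from day to day, which is the "getting better" signal the analogy is after.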

The mathematical notation:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (d_i - p_i)^2}

Glyph Legend: n is a whole positive integer representing the number of throws. i is a whole positive integer counter that enumerates the sum. d stands for the ideal distances, the list2 containing all zeros in the above example. p stands for performance, the list1 in the above example. Superscript 2 stands for numeric squared. d_i is the i'th element of d. p_i is the i'th element of p.
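As a worked instance of the notation, plugging the example lists d and p from the code above into the formula gives the following (the intermediate values are rounded):

```latex
\begin{aligned}
\mathrm{RMSE}
  &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}(d_i - p_i)^2} \\
  &= \sqrt{\frac{(0.000-0.000)^2 + (0.166-0.254)^2 + (0.333-0.998)^2}{3}} \\
  &= \sqrt{\frac{0 + 0.007744 + 0.442225}{3}} \\
  &= \sqrt{0.149990} \approx 0.3873
\end{aligned}
```

This agrees with the value printed by the example.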

The rmse done in small steps so it can be understood:

def rmse(predictions, targets):

    differences = predictions - targets                       #the DIFFERENCEs.

    differences_squared = differences ** 2                    #the SQUAREs of ^

    mean_of_differences_squared = differences_squared.mean()  #the MEAN of ^

    rmse_val = np.sqrt(mean_of_differences_squared)           #ROOT of ^

    return rmse_val                                           #get the ^

How does every step of RMSE work:

Subtracting one number from another gives you the distance between them.

8 - 5 = 3         #absolute distance between 8 and 5 is +3
-20 - 10 = -30    #absolute distance between -20 and 10 is +30

If you multiply any number times itself, the result is never negative because negative times negative is positive:

3*3     = 9   = positive
-30*-30 = 900 = positive

Add them all up, but wait, then an array with many elements would have a larger error than a small array, so average them by the number of elements.

But wait, we squared them all earlier to force them positive. Undo the damage with a square root!

That leaves you with a single number that represents, on average, the distance between every value of list1 and its corresponding element value of list2.
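To make the steps concrete, here is a small sketch that runs them on the two pairs used above (8 vs 5 and -20 vs 10) and keeps every intermediate value. It uses only the standard library, and the variable names are illustrative.

```python
import math

list1 = [8, -20]
list2 = [5, 10]

differences = [a - b for a, b in zip(list1, list2)]   # [3, -30]
squares = [diff ** 2 for diff in differences]         # [9, 900]
mean_square = sum(squares) / len(squares)             # 454.5
rmse_val = math.sqrt(mean_square)                     # about 21.32

print(differences, squares, mean_square, round(rmse_val, 2))
```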

If the RMSE value goes down over time we are happy because variance is decreasing.

RMSE isn't the most accurate line fitting strategy, total least squares is:

Root mean squared error measures the vertical distance between the point and the line, so if your data is shaped like a banana, flat near the bottom and steep near the top, then the RMSE will report greater distances to points high, but short distances to points low when in fact the distances are equivalent. This causes a skew where the line prefers to be closer to points high than low.

If this is a problem, the total least squares method fixes this.
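For a straight-line fit, the difference between the two approaches can be sketched as follows. Ordinary least squares minimizes vertical distances, while total least squares (orthogonal regression) minimizes perpendicular distances. The helper names `ols_line` and `tls_line` and the sample points are hypothetical, and the closed-form slope used for the orthogonal fit assumes equal noise in x and y and a non-zero covariance between them.

```python
import math

def _moments(xs, ys):
    """Means and centered sums of squares / cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy

def ols_line(xs, ys):
    """Slope and intercept minimizing squared vertical distances."""
    mx, my, sxx, _, sxy = _moments(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx

def tls_line(xs, ys):
    """Slope and intercept minimizing squared perpendicular distances."""
    mx, my, sxx, syy, sxy = _moments(xs, ys)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx

# Made-up, slightly noisy points around a steep line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.2, 1.9, 4.4, 5.8, 8.1]
print("OLS:", ols_line(xs, ys))
print("TLS:", tls_line(xs, ys))
```

On points that lie exactly on a line both fits agree; they drift apart as the scatter grows.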

Gotchas that can break this RMSE function:

If there are nulls or infinities in either input list, then the output RMSE value is not going to make sense. There are three strategies to deal with nulls / missing values / infinities in either list: ignore that component, zero it out, or add a best guess or uniform random noise to all timesteps. Each remedy has its pros and cons depending on what your data means. In general ignoring any component with a missing value is preferred, but this biases the RMSE toward zero, making you think performance has improved when it really hasn't. Adding random noise on a best guess could be preferred if there are lots of missing values.

In order to guarantee relative correctness of the RMSE output, you must eliminate all nulls/infinities from the input.
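The first strategy, ignoring any component with a missing or infinite value, could look like the sketch below. `rmse_ignoring_invalid` is a hypothetical helper, not part of any library; it treats `None`, NaN and infinities as invalid, and dropping pairs carries the bias described above.

```python
import math

def _is_valid(value):
    """True for real, finite numbers; False for None, NaN and infinities."""
    return value is not None and math.isfinite(value)

def rmse_ignoring_invalid(predictions, targets):
    """RMSE over the pairs where both values are finite numbers."""
    if len(predictions) != len(targets):
        raise ValueError("both lists must be the same size")
    pairs = [(p, t) for p, t in zip(predictions, targets)
             if _is_valid(p) and _is_valid(t)]
    if not pairs:
        raise ValueError("no valid pairs left to compare")
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))

d = [0.000, 0.166, 0.333, None, 0.500]
p = [0.000, 0.254, 0.998, 0.700, float("inf")]
print(rmse_ignoring_invalid(p, d))  # only the first three pairs are used
```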

RMSE has zero tolerance for outlier data points which don't belong

Root mean squared error relies on all data being right, and all points are counted as equal. That means one stray point that's way out in left field is going to totally ruin the whole calculation. To handle outlier data points and dismiss their tremendous influence after a certain threshold, see robust estimators that build in a threshold for dismissal of outliers.
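The effect of a single stray point is easy to demonstrate. The sketch below compares RMSE with and without one outlier and, as one possible robust alternative (an assumption for illustration, not necessarily the estimator the answer has in mind), the median absolute error, which does not move at all here. The data and helper names are made up.

```python
import math
import statistics

def rmse(predictions, targets):
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))

def median_abs_error(predictions, targets):
    """Median of the absolute differences; insensitive to a few extremes."""
    return statistics.median(abs(p - t) for p, t in zip(predictions, targets))

targets = [0.0] * 10
clean = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.0, 1.0]
with_outlier = clean[:-1] + [50.0]   # one throw lands far off the board

print("RMSE clean:        %.3f" % rmse(clean, targets))
print("RMSE with outlier: %.3f" % rmse(with_outlier, targets))
print("median abs error:  %.3f -> %.3f" % (
    median_abs_error(clean, targets), median_abs_error(with_outlier, targets)))
```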