Python | Pandas dataframe.to_clipboard ()

clip | Python Methods and Functions

The Pandas function dataframe.to_clipboard() copies the object to the system clipboard. This function writes the textual representation of the object to the system clipboard. This can be easily pasted into Excel or Notepad ++.

Syntax: DataFrame.to_clipboard (excel = True, sep = None, ** kwargs)

excel: True, use the provided separator, writing in a csv format for allowing easy pasting into excel. False, write a string representation of the object to the clipboard.
sep: Field delimiter.
** kwargs: These parameters will be passed to DataFrame.to_csv.

Note: to link to the CSV file used in the code, click here

Example # 1: Use to_clipboard () to copy the object to the clipboard in the format, to_clipboard () Excel.

# import pandas as pd

import pandas as pd

# Create data frame

df = pd.read_csv ( 'nba.csv' )

# Print the data frame


We will now copy this object to the clipboard in a non-Excel format.

# copy to clipboard

df.to_clipboard (excel = False , sep = ',' )


We just pasted what was copied to the clipboard after the previous command. Notepad ++ software was used.

Example # 2: Use to_clipboard () to copy the object to the clipboard in Excel format.

# import pandas as pd

import pandas as pd

# Create a data frame

df = pd.read_csv ( 'nba.csv' )

# Print the data frame


Now we will copy this object to the clipboard in Excel format.

# copy to clipboard

df.to_clipboard (excel = True )


Python | Pandas dataframe.to_clipboard (): StackOverflow Questions

How do I copy a string to the clipboard?

Question by Dancrew32

I"m trying to make a basic Windows application that builds a string out of user input and then adds it to the clipboard. How do I copy a string to the clipboard using Python?

Python script to copy text to clipboard

I just need a python script that copies text to the clipboard.

After the script gets executed i need the output of the text to be pasted to another source. Is it possible to write a python script that does this job?

How do I read text from the clipboard?

How do I read text from the (windows) clipboard with python?

Unresolved Import Issues with PyDev and Eclipse

I am very new to PyDev and Python, though I have used Eclipse for Java plenty. I am trying to work through some of the Dive Into Python examples and this feels like an extremely trivial problem that"s just becoming exceedingly annoying. I am using Ubuntu Linux 10.04.

I want to be able to use the file, which is located in the directory /Desktop/Python_Tutorials/diveintopython/py

Here is my file that I"m working on in my PyDev/Eclipse project:

import sys

This works fine, but then I want the next line of my code to be:

import odbchelper

and this causes an unresolved import error every time. I have added files to just about every directory possible and it doesn"t help anything. I"ve tried adding files one at a time to the various levels of directories between the project location and the file, and I"ve also tried adding the files to all of the directories in between simultaneously. Neither works.

All I want to do is have a project somewhere in some other directory, say /Desktop/MyStuff/Project, in which I have ... and then from I want to import from /Desktop/Python_Tutorials/diveintopython/py/

Every message board response I can find just saying to use the sys.path.append() function to add this directory to my path, and then import it ... but that is precisely what I am doing in my code and it"s not working.

I have also tried the Ctrl-1 trick to suppress the error message, but the program is still not functioning correctly. I get an error, ImportError: No module named odbchelper. So it"s clearly not getting the path added, or there is some problem that all of my many permutations of adding files has missed.

It"s very frustrating that something this simple... calling things from some file that exists somewhere else on my machine... requires this much effort.

How to apply gradient clipping in TensorFlow?

Considering the example code.

I would like to know How to apply gradient clipping on this network on the RNN where there is a possibility of exploding gradients.

tf.clip_by_value(t, clip_value_min, clip_value_max, name=None)

This is an example that could be used but where do I introduce this ? In the def of RNN

    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X) # n_steps
tf.clip_by_value(_X, -1, 1, name=None)

But this doesn"t make sense as the tensor _X is the input and not the grad what is to be clipped?

Do I have to define my own Optimizer for this or is there a simpler option?

Answer #1

This post aims to give readers a primer on SQL-flavored merging with Pandas, how to use it, and when not to use it.

In particular, here"s what this post will go through:

  • The basics - types of joins (LEFT, RIGHT, OUTER, INNER)

    • merging with different column names
    • merging with multiple columns
    • avoiding duplicate merge key column in output

What this post (and other posts by me on this thread) will not go through:

  • Performance-related discussions and timings (for now). Mostly notable mentions of better alternatives, wherever appropriate.
  • Handling suffixes, removing extra columns, renaming outputs, and other specific use cases. There are other (read: better) posts that deal with that, so figure it out!

Note Most examples default to INNER JOIN operations while demonstrating various features, unless otherwise specified.

Furthermore, all the DataFrames here can be copied and replicated so you can play with them. Also, see this post on how to read DataFrames from your clipboard.

Lastly, all visual representation of JOIN operations have been hand-drawn using Google Drawings. Inspiration from here.

Enough talk - just show me how to use merge!

Setup & Basics

left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": np.random.randn(4)})


  key     value
0   A  1.764052
1   B  0.400157
2   C  0.978738
3   D  2.240893


  key     value
0   B  1.867558
1   D -0.977278
2   E  0.950088
3   F -0.151357

For the sake of simplicity, the key column has the same name (for now).

An INNER JOIN is represented by

Note This, along with the forthcoming figures all follow this convention:

  • blue indicates rows that are present in the merge result
  • red indicates rows that are excluded from the result (i.e., removed)
  • green indicates missing values that are replaced with NaNs in the result

To perform an INNER JOIN, call merge on the left DataFrame, specifying the right DataFrame and the join key (at the very least) as arguments.

left.merge(right, on="key")
# Or, if you want to be explicit
# left.merge(right, on="key", how="inner")

  key   value_x   value_y
0   B  0.400157  1.867558
1   D  2.240893 -0.977278

This returns only rows from left and right which share a common key (in this example, "B" and "D).

A LEFT OUTER JOIN, or LEFT JOIN is represented by

This can be performed by specifying how="left".

left.merge(right, on="key", how="left")

  key   value_x   value_y
0   A  1.764052       NaN
1   B  0.400157  1.867558
2   C  0.978738       NaN
3   D  2.240893 -0.977278

Carefully note the placement of NaNs here. If you specify how="left", then only keys from left are used, and missing data from right is replaced by NaN.

And similarly, for a RIGHT OUTER JOIN, or RIGHT JOIN which is...

...specify how="right":

left.merge(right, on="key", how="right")

  key   value_x   value_y
0   B  0.400157  1.867558
1   D  2.240893 -0.977278
2   E       NaN  0.950088
3   F       NaN -0.151357

Here, keys from right are used, and missing data from left is replaced by NaN.

Finally, for the FULL OUTER JOIN, given by

specify how="outer".

left.merge(right, on="key", how="outer")

  key   value_x   value_y
0   A  1.764052       NaN
1   B  0.400157  1.867558
2   C  0.978738       NaN
3   D  2.240893 -0.977278
4   E       NaN  0.950088
5   F       NaN -0.151357

This uses the keys from both frames, and NaNs are inserted for missing rows in both.

The documentation summarizes these various merges nicely:

Enter image description here

Other JOINs - LEFT-Excluding, RIGHT-Excluding, and FULL-Excluding/ANTI JOINs

If you need LEFT-Excluding JOINs and RIGHT-Excluding JOINs in two steps.

For LEFT-Excluding JOIN, represented as

Start by performing a LEFT OUTER JOIN and then filtering (excluding!) rows coming from left only,

(left.merge(right, on="key", how="left", indicator=True)
     .query("_merge == "left_only"")
     .drop("_merge", 1))

  key   value_x  value_y
0   A  1.764052      NaN
2   C  0.978738      NaN


left.merge(right, on="key", how="left", indicator=True)

  key   value_x   value_y     _merge
0   A  1.764052       NaN  left_only
1   B  0.400157  1.867558       both
2   C  0.978738       NaN  left_only
3   D  2.240893 -0.977278       both

And similarly, for a RIGHT-Excluding JOIN,

(left.merge(right, on="key", how="right", indicator=True)
     .query("_merge == "right_only"")
     .drop("_merge", 1))

  key  value_x   value_y
2   E      NaN  0.950088
3   F      NaN -0.151357

Lastly, if you are required to do a merge that only retains keys from the left or right, but not both (IOW, performing an ANTI-JOIN),

You can do this in similar fashion—

(left.merge(right, on="key", how="outer", indicator=True)
     .query("_merge != "both"")
     .drop("_merge", 1))

  key   value_x   value_y
0   A  1.764052       NaN
2   C  0.978738       NaN
4   E       NaN  0.950088
5   F       NaN -0.151357

Different names for key columns

If the key columns are named differently—for example, left has keyLeft, and right has keyRight instead of key—then you will have to specify left_on and right_on as arguments instead of on:

left2 = left.rename({"key":"keyLeft"}, axis=1)
right2 = right.rename({"key":"keyRight"}, axis=1)


  keyLeft     value
0       A  1.764052
1       B  0.400157
2       C  0.978738
3       D  2.240893


  keyRight     value
0        B  1.867558
1        D -0.977278
2        E  0.950088
3        F -0.151357
left2.merge(right2, left_on="keyLeft", right_on="keyRight", how="inner")

  keyLeft   value_x keyRight   value_y
0       B  0.400157        B  1.867558
1       D  2.240893        D -0.977278

Avoiding duplicate key column in output

When merging on keyLeft from left and keyRight from right, if you only want either of the keyLeft or keyRight (but not both) in the output, you can start by setting the index as a preliminary step.

left3 = left2.set_index("keyLeft")
left3.merge(right2, left_index=True, right_on="keyRight")

    value_x keyRight   value_y
0  0.400157        B  1.867558
1  2.240893        D -0.977278

Contrast this with the output of the command just before (that is, the output of left2.merge(right2, left_on="keyLeft", right_on="keyRight", how="inner")), you"ll notice keyLeft is missing. You can figure out what column to keep based on which frame"s index is set as the key. This may matter when, say, performing some OUTER JOIN operation.

Merging only a single column from one of the DataFrames

For example, consider

right3 = right.assign(newcol=np.arange(len(right)))
  key     value  newcol
0   B  1.867558       0
1   D -0.977278       1
2   E  0.950088       2
3   F -0.151357       3

If you are required to merge only "new_val" (without any of the other columns), you can usually just subset columns before merging:

left.merge(right3[["key", "newcol"]], on="key")

  key     value  newcol
0   B  0.400157       0
1   D  2.240893       1

If you"re doing a LEFT OUTER JOIN, a more performant solution would involve map:

# left["newcol"] = left["key"].map(right3.set_index("key")["newcol"]))

  key     value  newcol
0   A  1.764052     NaN
1   B  0.400157     0.0
2   C  0.978738     NaN
3   D  2.240893     1.0

As mentioned, this is similar to, but faster than

left.merge(right3[["key", "newcol"]], on="key", how="left")

  key     value  newcol
0   A  1.764052     NaN
1   B  0.400157     0.0
2   C  0.978738     NaN
3   D  2.240893     1.0

Merging on multiple columns

To join on more than one column, specify a list for on (or left_on and right_on, as appropriate).

left.merge(right, on=["key1", "key2"] ...)

Or, in the event the names are different,

left.merge(right, left_on=["lkey1", "lkey2"], right_on=["rkey1", "rkey2"])

Other useful merge* operations and functions

This section only covers the very basics, and is designed to only whet your appetite. For more examples and cases, see the documentation on merge, join, and concat as well as the links to the function specifications.

Continue Reading

Jump to other topics in Pandas Merging 101 to continue learning:

*You are here.

Answer #2

You can adjust the subplot geometry in the very tight_layout call as follows:

fig.tight_layout(rect=[0, 0.03, 1, 0.95])

As it"s stated in the documentation (

tight_layout() only considers ticklabels, axis labels, and titles. Thus, other artists may be clipped and also may overlap.

Answer #3

Disclaimer: I"m mostly writing this post with syntactical considerations and general behaviour in mind. I"m not familiar with the memory and CPU aspect of the methods described, and I aim this answer at those who have reasonably small sets of data, such that the quality of the interpolation can be the main aspect to consider. I am aware that when working with very large data sets, the better-performing methods (namely griddata and RBFInterpolator without a neighbors keyword argument) might not be feasible.

Note that this answer uses the new RBFInterpolator class introduced in SciPy 1.7.0. For the legacy Rbf class see the previous version of this answer.

I"m going to compare three kinds of multi-dimensional interpolation methods (interp2d/splines, griddata and RBFInterpolator). I will subject them to two kinds of interpolation tasks and two kinds of underlying functions (points from which are to be interpolated). The specific examples will demonstrate two-dimensional interpolation, but the viable methods are applicable in arbitrary dimensions. Each method provides various kinds of interpolation; in all cases I will use cubic interpolation (or something close1). It"s important to note that whenever you use interpolation you introduce bias compared to your raw data, and the specific methods used affect the artifacts that you will end up with. Always be aware of this, and interpolate responsibly.

The two interpolation tasks will be

  1. upsampling (input data is on a rectangular grid, output data is on a denser grid)
  2. interpolation of scattered data onto a regular grid

The two functions (over the domain [x, y] in [-1, 1]x[-1, 1]) will be

  1. a smooth and friendly function: cos(pi*x)*sin(pi*y); range in [-1, 1]
  2. an evil (and in particular, non-continuous) function: x*y / (x^2 + y^2) with a value of 0.5 near the origin; range in [-0.5, 0.5]

Here"s how they look:

fig1: test functions

I will first demonstrate how the three methods behave under these four tests, then I"ll detail the syntax of all three. If you know what you should expect from a method, you might not want to waste your time learning its syntax (looking at you, interp2d).

Test data

For the sake of explicitness, here is the code with which I generated the input data. While in this specific case I"m obviously aware of the function underlying the data, I will only use this to generate input for the interpolation methods. I use numpy for convenience (and mostly for generating the data), but scipy alone would suffice too.

import numpy as np
import scipy.interpolate as interp

# auxiliary function for mesh generation
def gimme_mesh(n):
    minval = -1
    maxval =  1
    # produce an asymmetric shape in order to catch issues with transpositions
    return np.meshgrid(np.linspace(minval, maxval, n),
                       np.linspace(minval, maxval, n + 1))

# set up underlying test functions, vectorized
def fun_smooth(x, y):
    return np.cos(np.pi*x) * np.sin(np.pi*y)

def fun_evil(x, y):
    # watch out for singular origin; function has no unique limit there
    return np.where(x**2 + y**2 > 1e-10, x*y/(x**2+y**2), 0.5)

# sparse input mesh, 6x7 in shape
N_sparse = 6
x_sparse, y_sparse = gimme_mesh(N_sparse)
z_sparse_smooth = fun_smooth(x_sparse, y_sparse)
z_sparse_evil = fun_evil(x_sparse, y_sparse)

# scattered input points, 10^2 altogether (shape (100,))
N_scattered = 10
rng = np.random.default_rng()
x_scattered, y_scattered = rng.random((2, N_scattered**2))*2 - 1
z_scattered_smooth = fun_smooth(x_scattered, y_scattered)
z_scattered_evil = fun_evil(x_scattered, y_scattered)

# dense output mesh, 20x21 in shape
N_dense = 20
x_dense, y_dense = gimme_mesh(N_dense)

Smooth function and upsampling

Let"s start with the easiest task. Here"s how an upsampling from a mesh of shape [6, 7] to one of [20, 21] works out for the smooth test function:

fig2: smooth upsampling

Even though this is a simple task, there are already subtle differences between the outputs. At a first glance all three outputs are reasonable. There are two features to note, based on our prior knowledge of the underlying function: the middle case of griddata distorts the data most. Note the y == -1 boundary of the plot (nearest the x label): the function should be strictly zero (since y == -1 is a nodal line for the smooth function), yet this is not the case for griddata. Also note the x == -1 boundary of the plots (behind, to the left): the underlying function has a local maximum (implying zero gradient near the boundary) at [-1, -0.5], yet the griddata output shows clearly non-zero gradient in this region. The effect is subtle, but it"s a bias none the less.

Evil function and upsampling

A bit harder task is to perform upsampling on our evil function:

fig3: evil upsampling

Clear differences are starting to show among the three methods. Looking at the surface plots, there are clear spurious extrema appearing in the output from interp2d (note the two humps on the right side of the plotted surface). While griddata and RBFInterpolator seem to produce similar results at first glance, producing local minima near [0.4, -0.4] that is absent from the underlying function.

However, there is one crucial aspect in which RBFInterpolator is far superior: it respects the symmetry of the underlying function (which is of course also made possible by the symmetry of the sample mesh). The output from griddata breaks the symmetry of the sample points, which is already weakly visible in the smooth case.

Smooth function and scattered data

Most often one wants to perform interpolation on scattered data. For this reason I expect these tests to be more important. As shown above, the sample points were chosen pseudo-uniformly in the domain of interest. In realistic scenarios you might have additional noise with each measurement, and you should consider whether it makes sense to interpolate your raw data to begin with.

Output for the smooth function:

fig4: smooth scattered interpolation

Now there"s already a bit of a horror show going on. I clipped the output from interp2d to between [-1, 1] exclusively for plotting, in order to preserve at least a minimal amount of information. It"s clear that while some of the underlying shape is present, there are huge noisy regions where the method completely breaks down. The second case of griddata reproduces the shape fairly nicely, but note the white regions at the border of the contour plot. This is due to the fact that griddata only works inside the convex hull of the input data points (in other words, it doesn"t perform any extrapolation). I kept the default NaN value for output points lying outside the convex hull.2 Considering these features, RBFInterpolator seems to perform best.

Evil function and scattered data

And the moment we"ve all been waiting for:

fig5: evil scattered interpolation

It"s no huge surprise that interp2d gives up. In fact, during the call to interp2d you should expect some friendly RuntimeWarnings complaining about the impossibility of the spline to be constructed. As for the other two methods, RBFInterpolator seems to produce the best output, even near the borders of the domain where the result is extrapolated.

So let me say a few words about the three methods, in decreasing order of preference (so that the worst is the least likely to be read by anybody).


The RBF in the name of the RBFInterpolator class stands for "radial basis functions". To be honest I"ve never considered this approach until I started researching for this post, but I"m pretty sure I"ll be using these in the future.

Just like the spline-based methods (see later), usage comes in two steps: first one creates a callable RBFInterpolator class instance based on the input data, and then calls this object for a given output mesh to obtain the interpolated result. Example from the smooth upsampling test:

import scipy.interpolate as interp

sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1)  # shape (N, 2) in 2d
dense_points = np.stack([x_dense.ravel(), y_dense.ravel()], -1)  # shape (N, 2) in 2d

zfun_smooth_rbf = interp.RBFInterpolator(sparse_points, z_sparse_smooth.ravel(),
                                         smoothing=0, kernel="cubic")  # explicit default smoothing=0 for interpolation
z_dense_smooth_rbf = zfun_smooth_rbf(dense_points).reshape(x_dense.shape)  # not really a function, but a callable class instance

zfun_evil_rbf = interp.RBFInterpolator(sparse_points, z_sparse_evil.ravel(),
                                       smoothing=0, kernel="cubic")  # explicit default smoothing=0 for interpolation
z_dense_evil_rbf = zfun_evil_rbf(dense_points).reshape(x_dense.shape)  # not really a function, but a callable class instance

Note that we had to do some array building gymnastics to make the API of RBFInterpolator happy. Since we have to pass the 2d points as arrays of shape (N, 2), we have to flatten the input grid and stack the two flattened arrays. The constructed interpolator also expects query points in this format, and the result will be a 1d array of shape (N,) which we have to reshape back to match our 2d grid for plotting. Since RBFInterpolator makes no assumptions about the number of dimensions of the input points, it supports arbitrary dimensions for interpolation.

So, scipy.interpolate.RBFInterpolator

  • produces well-behaved output even for crazy input data
  • supports interpolation in higher dimensions
  • extrapolates outside the convex hull of the input points (of course extrapolation is always a gamble, and you should generally not rely on it at all)
  • creates an interpolator as a first step, so evaluating it in various output points is less additional effort
  • can have output point arrays of arbitrary shape (as opposed to being constrained to rectangular meshes, see later)
  • more likely to preserving the symmetry of the input data
  • supports multiple kinds of radial functions for keyword kernel: multiquadric, inverse_multiquadric, inverse_quadratic, gaussian, linear, cubic, quintic, thin_plate_spline (the default). As of SciPy 1.7.0 the class doesn"t allow passing a custom callable due to technical reasons, but this is likely to be added in a future version.
  • can give inexact interpolations by increasing the smoothing parameter

One drawback of RBF interpolation is that interpolating N data points involves inverting an N x N matrix. This quadratic complexity very quickly blows up memory need for a large number of data points. However, the new RBFInterpolator class also supports a neighbors keyword parameter that restricts computation of each radial basis function to k nearest neighbours, thereby reducing memory need.


My former favourite, griddata, is a general workhorse for interpolation in arbitrary dimensions. It doesn"t perform extrapolation beyond setting a single preset value for points outside the convex hull of the nodal points, but since extrapolation is a very fickle and dangerous thing, this is not necessarily a con. Usage example:

sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1)  # shape (N, 2) in 2d
z_dense_smooth_griddata = interp.griddata(sparse_points, z_sparse_smooth.ravel(),
                                          (x_dense, y_dense), method="cubic")  # default method is linear

Note that the same array transformations were necessary for the input arrays as for RBFInterpolator. The input points have to be specified in an array of shape [N, D] in D dimensions, or alternatively as a tuple of 1d arrays:

z_dense_smooth_griddata = interp.griddata((x_sparse.ravel(), y_sparse.ravel()),
                                          z_sparse_smooth.ravel(), (x_dense, y_dense), method="cubic")

The output point arrays can be specified as a tuple of arrays of arbitrary dimensions (as in both above snippets), which gives us some more flexibility.

In a nutshell, scipy.interpolate.griddata

  • produces well-behaved output even for crazy input data
  • supports interpolation in higher dimensions
  • does not perform extrapolation, a single value can be set for the output outside the convex hull of the input points (see fill_value)
  • computes the interpolated values in a single call, so probing multiple sets of output points starts from scratch
  • can have output points of arbitrary shape
  • supports nearest-neighbour and linear interpolation in arbitrary dimensions, cubic in 1d and 2d. Nearest-neighbour and linear interpolation use NearestNDInterpolator and LinearNDInterpolator under the hood, respectively. 1d cubic interpolation uses a spline, 2d cubic interpolation uses CloughTocher2DInterpolator to construct a continuously differentiable piecewise-cubic interpolator.
  • might violate the symmetry of the input data


The only reason I"m discussing interp2d and its relatives is that it has a deceptive name, and people are likely to try using it. Spoiler alert: don"t use it (as of scipy version 1.7.0). It"s already more special than the previous subjects in that it"s specifically used for two-dimensional interpolation, but I suspect this is by far the most common case for multivariate interpolation.

As far as syntax goes, interp2d is similar to RBFInterpolator in that it first needs constructing an interpolation instance, which can be called to provide the actual interpolated values. There"s a catch, however: the output points have to be located on a rectangular mesh, so inputs going into the call to the interpolator have to be 1d vectors which span the output grid, as if from numpy.meshgrid:

# reminder: x_sparse and y_sparse are of shape [6, 7] from numpy.meshgrid
zfun_smooth_interp2d = interp.interp2d(x_sparse, y_sparse, z_sparse_smooth, kind="cubic")   # default kind is "linear"
# reminder: x_dense and y_dense are of shape (20, 21) from numpy.meshgrid
xvec = x_dense[0,:] # 1d array of unique x values, 20 elements
yvec = y_dense[:,0] # 1d array of unique y values, 21 elements
z_dense_smooth_interp2d = zfun_smooth_interp2d(xvec, yvec)   # output is (20, 21)-shaped array

One of the most common mistakes when using interp2d is putting your full 2d meshes into the interpolation call, which leads to explosive memory consumption, and hopefully to a hasty MemoryError.

Now, the greatest problem with interp2d is that it often doesn"t work. In order to understand this, we have to look under the hood. It turns out that interp2d is a wrapper for the lower-level functions bisplrep + bisplev, which are in turn wrappers for FITPACK routines (written in Fortran). The equivalent call to the previous example would be

kind = "cubic"
if kind == "linear":
    kx = ky = 1
elif kind == "cubic":
    kx = ky = 3
elif kind == "quintic":
    kx = ky = 5
# bisplrep constructs a spline representation, bisplev evaluates the spline at given points
bisp_smooth = interp.bisplrep(x_sparse.ravel(), y_sparse.ravel(),
                              z_sparse_smooth.ravel(), kx=kx, ky=ky, s=0)
z_dense_smooth_bisplrep = interp.bisplev(xvec, yvec, bisp_smooth).T  # note the transpose

Now, here"s the thing about interp2d: (in scipy version 1.7.0) there is a nice comment in interpolate/ for interp2d:

if not rectangular_grid:
    # TODO: surfit is really not meant for interpolation!
    self.tck = fitpack.bisplrep(x, y, z, kx=kx, ky=ky, s=0.0)

and indeed in interpolate/, in bisplrep there"s some setup and ultimately

tx, ty, c, o = _fitpack._surfit(x, y, z, w, xb, xe, yb, ye, kx, ky,
                                task, s, eps, tx, ty, nxest, nyest,
                                wrk, lwrk1, lwrk2)                 

And that"s it. The routines underlying interp2d are not really meant to perform interpolation. They might suffice for sufficiently well-behaved data, but under realistic circumstances you will probably want to use something else.

Just to conclude, interpolate.interp2d

  • can lead to artifacts even with well-tempered data
  • is specifically for bivariate problems (although there"s the limited interpn for input points defined on a grid)
  • performs extrapolation
  • creates an interpolator as a first step, so evaluating it in various output points is less additional effort
  • can only produce output over a rectangular grid, for scattered output you would have to call the interpolator in a loop
  • supports linear, cubic and quintic interpolation
  • might violate the symmetry of the input data

1I"m fairly certain that the cubic and linear kind of basis functions of RBFInterpolator do not exactly correspond to the other interpolators of the same name.
2These NaNs are also the reason for why the surface plot seems so odd: matplotlib historically has difficulties with plotting complex 3d objects with proper depth information. The NaN values in the data confuse the renderer, so parts of the surface that should be in the back are plotted to be in the front. This is an issue with visualization, and not interpolation.

Answer #4

Gradient clipping needs to happen after computing the gradients, but before applying them to update the model"s parameters. In your example, both of those things are handled by the AdamOptimizer.minimize() method.

In order to clip your gradients you"ll need to explicitly compute, clip, and apply them as described in this section in TensorFlow"s API documentation. Specifically you"ll need to substitute the call to the minimize() method with something like the following:

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
gvs = optimizer.compute_gradients(cost)
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
train_op = optimizer.apply_gradients(capped_gvs)

Answer #5

Despite what seems to be popular, you probably want to clip the whole gradient by its global norm:

optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimize = optimizer.apply_gradients(zip(gradients, variables))

Clipping each gradient matrix individually changes their relative scale but is also possible:

optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients = [
    None if gradient is None else tf.clip_by_norm(gradient, 5.0)
    for gradient in gradients]
optimize = optimizer.apply_gradients(zip(gradients, variables))

In TensorFlow 2, a tape computes the gradients, the optimizers come from Keras, and we don"t need to store the update op because it runs automatically without passing it to a session:

optimizer = tf.keras.optimizers.Adam(1e-3)
# ...
with tf.GradientTape() as tape:
  loss = ...
variables = ...
gradients = tape.gradient(loss, variables)
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimizer.apply_gradients(zip(gradients, variables))

Answer #6

Update: User cphyc has kindly created a Github repository for the code in this answer (see here), and bundled the code into a package which may be installed using pip install matplotlib-label-lines.

Pretty Picture:

semi-automatic plot-labeling

In matplotlib it"s pretty easy to label contour plots (either automatically or by manually placing labels with mouse clicks). There does not (yet) appear to be any equivalent capability to label data series in this fashion! There may be some semantic reason for not including this feature which I am missing.

Regardless, I have written the following module which takes any allows for semi-automatic plot labelling. It requires only numpy and a couple of functions from the standard math library.


The default behaviour of the labelLines function is to space the labels evenly along the x axis (automatically placing at the correct y-value of course). If you want you can just pass an array of the x co-ordinates of each of the labels. You can even tweak the location of one label (as shown in the bottom right plot) and space the rest evenly if you like.

In addition, the label_lines function does not account for the lines which have not had a label assigned in the plot command (or more accurately if the label contains "_line").

Keyword arguments passed to labelLines or labelLine are passed on to the text function call (some keyword arguments are set if the calling code chooses not to specify).


  • Annotation bounding boxes sometimes interfere undesirably with other curves. As shown by the 1 and 10 annotations in the top left plot. I"m not even sure this can be avoided.
  • It would be nice to specify a y position instead sometimes.
  • It"s still an iterative process to get annotations in the right location
  • It only works when the x-axis values are floats


  • By default, the labelLines function assumes that all data series span the range specified by the axis limits. Take a look at the blue curve in the top left plot of the pretty picture. If there were only data available for the x range 0.5-1 then then we couldn"t possibly place a label at the desired location (which is a little less than 0.2). See this question for a particularly nasty example. Right now, the code does not intelligently identify this scenario and re-arrange the labels, however there is a reasonable workaround. The labelLines function takes the xvals argument; a list of x-values specified by the user instead of the default linear distribution across the width. So the user can decide which x-values to use for the label placement of each data series.

Also, I believe this is the first answer to complete the bonus objective of aligning the labels with the curve they"re on. :)

from math import atan2,degrees
import numpy as np

#Label line with line2D label data
def labelLine(line,x,label=None,align=True,**kwargs):

    ax = line.axes
    xdata = line.get_xdata()
    ydata = line.get_ydata()

    if (x < xdata[0]) or (x > xdata[-1]):
        print("x label location is outside data range!")

    #Find corresponding y co-ordinate and angle of the line
    ip = 1
    for i in range(len(xdata)):
        if x < xdata[i]:
            ip = i

    y = ydata[ip-1] + (ydata[ip]-ydata[ip-1])*(x-xdata[ip-1])/(xdata[ip]-xdata[ip-1])

    if not label:
        label = line.get_label()

    if align:
        #Compute the slope
        dx = xdata[ip] - xdata[ip-1]
        dy = ydata[ip] - ydata[ip-1]
        ang = degrees(atan2(dy,dx))

        #Transform to screen co-ordinates
        pt = np.array([x,y]).reshape((1,2))
        trans_angle = ax.transData.transform_angles(np.array((ang,)),pt)[0]

        trans_angle = 0

    #Set a bunch of keyword arguments
    if "color" not in kwargs:
        kwargs["color"] = line.get_color()

    if ("horizontalalignment" not in kwargs) and ("ha" not in kwargs):
        kwargs["ha"] = "center"

    if ("verticalalignment" not in kwargs) and ("va" not in kwargs):
        kwargs["va"] = "center"

    if "backgroundcolor" not in kwargs:
        kwargs["backgroundcolor"] = ax.get_facecolor()

    if "clip_on" not in kwargs:
        kwargs["clip_on"] = True

    if "zorder" not in kwargs:
        kwargs["zorder"] = 2.5


def labelLines(lines,align=True,xvals=None,**kwargs):

    ax = lines[0].axes
    labLines = []
    labels = []

    #Take only the lines which have labels other than the default ones
    for line in lines:
        label = line.get_label()
        if "_line" not in label:

    if xvals is None:
        xmin,xmax = ax.get_xlim()
        xvals = np.linspace(xmin,xmax,len(labLines)+2)[1:-1]

    for line,x,label in zip(labLines,xvals,labels):

Test code to generate the pretty picture above:

from matplotlib import pyplot as plt
from scipy.stats import loglaplace,chi2

from labellines import *

X = np.linspace(0,1,500)
A = [1,2,5,10,20]
funcs = [np.arctan,np.sin,loglaplace(4).pdf,chi2(5).pdf]

for a in A:


for a in A:


for a in A:

xvals = [0.8,0.55,0.22,0.104,0.045]

for a in A:

lines = plt.gca().get_lines()
labelLine(l1,0.6,label=r"$Re=${}".format(l1.get_label()),ha="left",va="bottom",align = False)

Answer #7

To get this to work with jupyter (version 4.0.6) I created ~/.jupyter/custom/custom.css containing:

/* Make the notebook cells take almost all available width */
.container {
    width: 99% !important;

/* Prevent the edit cell highlight box from getting clipped;
 * important so that it also works when cell is in edit mode*/
div.cell.selected {
    border-left-width: 1px !important;

Answer #8


Spreadsheet version

spreadsheet screenshot

Alternatively, in plain text: (also available as a a screenshot)

                         Bracket Matching -.  .- Line Numbering
                          Smart Indent -.  |  |  .- UML Editing / Viewing
         Source Control Integration -.  |  |  |  |  .- Code Folding
                    Error Markup -.  |  |  |  |  |  |  .- Code Templates
  Integrated Python Debugging -.  |  |  |  |  |  |  |  |  .- Unit Testing
    Multi-Language Support -.  |  |  |  |  |  |  |  |  |  |  .- GUI Designer (Qt, Eric, etc)
   Auto Code Completion -.  |  |  |  |  |  |  |  |  |  |  |  |  .- Integrated DB Support
     Commercial/Free -.  |  |  |  |  |  |  |  |  |  |  |  |  |  |  .- Refactoring
   Cross Platform -.  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |     
Atom              |Y |F |Y |Y*|Y |Y |Y |Y |Y |Y |  |Y |Y |  |  |  |  |*many plugins
Editra            |Y |F |Y |Y |  |  |Y |Y |Y |Y |  |Y |  |  |  |  |  |
Emacs             |Y |F |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |  |  |  |
Eric Ide          |Y |F |Y |  |Y |Y |  |Y |  |Y |  |Y |  |Y |  |  |  |
Geany             |Y |F |Y*|Y |  |  |  |Y |Y |Y |  |Y |  |  |  |  |  |*very limited
Gedit             |Y |F |Y¹|Y |  |  |  |Y |Y |Y |  |  |Y²|  |  |  |  |¹with plugin; ²sort of
Idle              |Y |F |Y |  |Y |  |  |Y |Y |  |  |  |  |  |  |  |  |
IntelliJ          |Y |CF|Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |
JEdit             |Y |F |  |Y |  |  |  |  |Y |Y |  |Y |  |  |  |  |  |
KDevelop          |Y |F |Y*|Y |  |  |Y |Y |Y |Y |  |Y |  |  |  |  |  |*no type inference
Komodo            |Y |CF|Y |Y |Y |Y |Y |Y |Y |Y |  |Y |Y |Y |  |Y |  |
NetBeans*         |Y |F |Y |Y |Y |  |Y |Y |Y |Y |Y |Y |Y |Y |  |  |Y |*pre-v7.0
Notepad++         |W |F |Y |Y |  |Y*|Y*|Y*|Y |Y |  |Y |Y*|  |  |  |  |*with plugin
Pfaide            |W |C |Y |Y |  |  |  |Y |Y |Y |  |Y |Y |  |  |  |  |
PIDA              |LW|F |Y |Y |  |  |  |Y |Y |Y |  |Y |  |  |  |  |  |VIM based
PTVS              |W |F |Y |Y |Y |Y |Y |Y |Y |Y |  |Y |  |  |Y*|  |Y |*WPF bsed
PyCharm           |Y |CF|Y |Y*|Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |*JavaScript
PyDev (Eclipse)   |Y |F |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |  |  |  |
PyScripter        |W |F |Y |  |Y |Y |  |Y |Y |Y |  |Y |Y |Y |  |  |  |
PythonWin         |W |F |Y |  |Y |  |  |Y |Y |  |  |Y |  |  |  |  |  |
SciTE             |Y |F¬π|  |Y |  |Y |  |Y |Y |Y |  |Y |Y |  |  |  |  |¬πMac version is
ScriptDev         |W |C |Y |Y |Y |Y |  |Y |Y |Y |  |Y |Y |  |  |  |  |    commercial
Spyder            |Y |F |Y |  |Y |Y |  |Y |Y |Y |  |  |  |  |  |  |  |
Sublime Text      |Y |CF|Y |Y |  |Y |Y |Y |Y |Y |  |Y |Y |Y*|  |  |  |extensible w/Python,
TextMate          |M |F |  |Y |  |  |Y |Y |Y |Y |  |Y |Y |  |  |  |  |    *PythonTestRunner
UliPad            |Y |F |Y |Y |Y |  |  |Y |Y |  |  |  |Y |Y |  |  |  |
Vim               |Y |F |Y |Y |Y |Y |Y |Y |Y |Y |  |Y |Y |Y |  |  |  |
Visual Studio     |W |CF|Y |Y |Y |Y |Y |Y |Y |Y |? |Y |? |? |Y |? |Y |
Visual Studio Code|Y |F |Y |Y |Y |Y |Y |Y |Y |Y |? |Y |? |? |? |? |Y |uses plugins
WingIde           |Y |C |Y |Y*|Y |Y |Y |Y |Y |Y |  |Y |Y |Y |  |  |  |*support for C
Zeus              |W |C |  |  |  |  |Y |Y |Y |Y |  |Y |Y |  |  |  |  |
   Cross Platform -"  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |     
     Commercial/Free -"  |  |  |  |  |  |  |  |  |  |  |  |  |  |  "- Refactoring
   Auto Code Completion -"  |  |  |  |  |  |  |  |  |  |  |  |  "- Integrated DB Support
    Multi-Language Support -"  |  |  |  |  |  |  |  |  |  |  "- GUI Designer (Qt, Eric, etc)
  Integrated Python Debugging -"  |  |  |  |  |  |  |  |  "- Unit Testing
                    Error Markup -"  |  |  |  |  |  |  "- Code Templates
         Source Control Integration -"  |  |  |  |  "- Code Folding
                          Smart Indent -"  |  |  "- UML Editing / Viewing
                         Bracket Matching -"  "- Line Numbering

Acronyms used:

 L  - Linux
 W  - Windows
 M  - Mac
 C  - Commercial
 F  - Free
 CF - Commercial with Free limited edition
 ?  - To be confirmed

I don"t mention basics like syntax highlighting as I expect these by default.

This is a just dry list reflecting your feedback and comments, I am not advocating any of these tools. I will keep updating this list as you keep posting your answers.

PS. Can you help me to add features of the above editors to the list (like auto-complete, debugging, etc.)?

We have a comprehensive wiki page for this question

Submit edits to the spreadsheet

Answer #9

Note: The ideas here are pretty generic for Stack Overflow, indeed questions.

Disclaimer: Writing a good question is hard.

The Good:

  • do include small* example DataFrame, either as runnable code:

      In [1]: df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=["A", "B"])

    or make it "copy and pasteable" using pd.read_clipboard(sep="ss+"), you can format the text for Stack Overflow highlight and use Ctrl+K (or prepend four spaces to each line), or place three tildes above and below your code with your code unindented:

      In [2]: df
         A  B
      0  1  2
      1  1  3
      2  4  6

    test pd.read_clipboard(sep="ss+") yourself.

    * I really do mean small, the vast majority of example DataFrames could be fewer than 6 rowscitation needed, and I bet I can do it in 5 rows. Can you reproduce the error with df = df.head(), if not fiddle around to see if you can make up a small DataFrame which exhibits the issue you are facing.

    * Every rule has an exception, the obvious one is for performance issues (in which case definitely use %timeit and possibly %prun), where you should generate (consider using np.random.seed so we have the exact same frame): df = pd.DataFrame(np.random.randn(100000000, 10)). Saying that, "make this code fast for me" is not strictly on topic for the site...

  • write out the outcome you desire (similarly to above)

      In [3]: iwantthis
         A  B
      0  1  5
      1  4  6

    Explain what the numbers come from: the 5 is sum of the B column for the rows where A is 1.

  • do show the code you"ve tried:

      In [4]: df.groupby("A").sum()
      1  5
      4  6

    But say what"s incorrect: the A column is in the index rather than a column.

  • do show you"ve done some research (search the documentation, search Stack¬†Overflow), and give a summary:

    The docstring for sum simply states "Compute sum of group values"

    The groupby documentation doesn"t give any examples for this.

    Aside: the answer here is to use df.groupby("A", as_index=False).sum().

  • if it"s relevant that you have Timestamp columns, e.g. you"re resampling or something, then be explicit and apply pd.to_datetime to them for good measure**.

      df["date"] = pd.to_datetime(df["date"]) # this column ought to be date..

    ** Sometimes this is the issue itself: they were strings.

The Bad:

  • don"t include a MultiIndex, which we can"t copy and paste (see above). This is kind of a grievance with Pandas" default display, but nonetheless annoying:

      In [11]: df
      A B
      1 2  3
        2  6

    The correct way is to include an ordinary DataFrame with a set_index call:

      In [12]: df = pd.DataFrame([[1, 2, 3], [1, 2, 6]], columns=["A", "B", "C"]).set_index(["A", "B"])
      In [13]: df
      A B
      1 2  3
        2  6
  • do provide insight to what it is when giving the outcome you want:

      1  1
      5  0

    Be specific about how you got the numbers (what are they)... double check they"re correct.

  • If your code throws an error, do include the entire stack trace (this can be edited out later if it"s too noisy). Show the line number (and the corresponding line of your code which it"s raising against).

The Ugly:

  • don"t link to a CSV file we don"t have access to (ideally don"t link to an external source at all...)

      df = pd.read_csv("my_secret_file.csv")  # ideally with lots of parsing options

    Most data is proprietary we get that: Make up similar data and see if you can reproduce the problem (something small).

  • don"t explain the situation vaguely in words, like you have a DataFrame which is "large", mention some of the column names in passing (be sure not to mention their dtypes). Try and go into lots of detail about something which is completely meaningless without seeing the actual context. Presumably no one is even going to read to the end of this paragraph.

    Essays are bad, it"s easier with small examples.

  • don"t include 10+ (100+??) lines of data munging before getting to your actual question.

    Please, we see enough of this in our day jobs. We want to help, but not like this.... Cut the intro, and just show the relevant DataFrames (or small versions of them) in the step which is causing you trouble.

Anyway, have fun learning Python, NumPy and Pandas!

Answer #10

Actually, pywin32 and ctypes seem to be an overkill for this simple task. Tkinter is a cross-platform GUI framework, which ships with Python by default and has clipboard accessing methods along with other cool stuff.

If all you need is to put some text to system clipboard, this will do it:

from Tkinter import Tk
r = Tk()
r.clipboard_append("i can has clipboardz?")
r.update() # now it stays on the clipboard after the window is closed

And that"s all, no need to mess around with platform-specific third-party libraries.

If you are using Python 3, replace TKinter with tkinter.