Python script to open google map location on clipboard


Here's a step-by-step process:

  1. Create the address string from the command line: command line arguments can be read through the sys module. The sys.argv array holds the filename as its first element and the command line arguments as the remaining elements, split on spaces much like raw_input().split(). Therefore, if the length of sys.argv is greater than 1, we can be sure that command line arguments were passed. 
    Since sys.argv is a list of strings, it can be passed to the join() method, which returns a single string value. Since the first element is the filename, which is not needed, we can slice the list and join from the second element onwards.

    # File name - Map.py

    import sys

    print(' '.join(sys.argv[1:]))

     If we run >>> python Map.py New Delhi, the output of the program will be New Delhi. 
  2. Open the web browser: we will use the webbrowser module to open the browser. The webbrowser module has an open() method that can launch a web browser at the specified URL. For example, the script below will open a web browser at the Python.Engineering home page.

    import webbrowser

    webbrowser.open('https://www.python.engineering/')

  3. Search URL: now, when we go to Google Maps and search for GeeksforGeeks, the resulting URL looks intimidating and lacks any obvious pattern, as shown below. 
    https://www.google.co.in/maps/place/GeeksforGeeks/@28.5011226,77.4077907,17z/data=!3m1!4b1!4m5!3m4!1s0x390ce626851f7009:0x621185133cfd1ad1!8m2!3d28.5011226!4d77.4099794?gl=en

    Websites often add extra text to the URL for additional tasks such as customization and tracking. However, you may notice that the initial portion of the URL is https://www.google.co.in/maps/place/GeeksforGeeks, where GeeksforGeeks is our search keyword. 
    Also, when searching for a multi-word place such as New Delhi, we can write New Delhi instead of New+Delhi; the + signs are inserted in the right places automatically, which makes our task even easier. 
    Therefore, the final URL can be taken as https://www.google.co.in/maps/place/Address_String/.

  4. Combining the two and finishing the script: a Python script to open the address given on the command line is shown below. Two modules are imported: webbrowser, to open the browser at the specified URL, and sys, to work with command line arguments.
    • Step one: check whether any command line argument was passed, which is done with len(sys.argv).
    • We then use the join() method to form the address string of the place to look for on Google Maps.
    • Finally, once we have the address, we open the browser at the URL using the open() method of the webbrowser module.

    The program is launched via CMD (Windows) or a terminal (Linux) in the following format:

     >>> python [File Name] [Address to be searched]    For example: >>> python Map.py GeeksforGeeks 

    # File name - Map.py

    import sys, webbrowser

    if len(sys.argv) > 1:  # Argument passed

        map_string = ' '.join(sys.argv[1:])

        webbrowser.open('https://www.google.com/maps/place/' + map_string)

    else:

        print("Pass the string as a command line argument. Try again!")

      

     >>> python Map.py GeeksforGeeks    The above command will open the map of GeeksforGeeks in the web browser. 
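
    Raw spaces in a URL are technically invalid, even though browsers and Google Maps tolerate them. As a slightly more defensive variant (a sketch; it assumes the standard urllib.parse module is acceptable), the address can be percent-encoded before it is handed to the browser:

    # File name - Map.py (hypothetical variant with explicit URL encoding)

    import sys, webbrowser
    from urllib.parse import quote_plus

    if len(sys.argv) > 1:  # Argument passed
        # quote_plus turns "New Delhi" into "New+Delhi", matching the URLs
        # that Google Maps itself produces
        webbrowser.open('https://www.google.com/maps/place/' + quote_plus(' '.join(sys.argv[1:])))
    else:
        print("Pass the string as a command line argument. Try again!")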

This article is courtesy of Harshit Agrawal. If you like Python.Engineering and would like to contribute, you can write an article using contribute.python.engineering or mail your article to contribute@python.engineering. See your article appearing on the Python.Engineering homepage and help other geeks.

Please post comments if you find anything wrong or if you'd like to share more information on the topic discussed above.





Python script to open google map location on clipboard: StackOverflow Questions

How do I copy a string to the clipboard?

Question by Dancrew32

I'm trying to make a basic Windows application that builds a string out of user input and then adds it to the clipboard. How do I copy a string to the clipboard using Python?

Python script to copy text to clipboard

I just need a python script that copies text to the clipboard.

After the script gets executed, I need the output text to be pasted to another source. Is it possible to write a Python script that does this job?

How do I read text from the clipboard?

How do I read text from the (windows) clipboard with python?

Unresolved Import Issues with PyDev and Eclipse

I am very new to PyDev and Python, though I have used Eclipse for Java plenty. I am trying to work through some of the Dive Into Python examples and this feels like an extremely trivial problem that's just becoming exceedingly annoying. I am using Ubuntu Linux 10.04.

I want to be able to use the file odbchelper.py, which is located in the directory /Desktop/Python_Tutorials/diveintopython/py

Here is my example.py file that I'm working on in my PyDev/Eclipse project:

import sys
sys.path.append("~/Desktop/Python_Tutorials/diveintopython/py")

This works fine, but then I want the next line of my code to be:

import odbchelper

and this causes an unresolved import error every time. I have added __init__.py files to just about every directory possible and it doesn't help anything. I've tried adding __init__.py files one at a time to the various levels of directories between the project location and the odbchelper.py file, and I've also tried adding the __init__.py files to all of the directories in between simultaneously. Neither works.

All I want to do is have a project somewhere in some other directory, say /Desktop/MyStuff/Project, in which I have example.py ... and then from example.py I want to import odbchelper.py from /Desktop/Python_Tutorials/diveintopython/py/

Every message board response I can find just says to use the sys.path.append() function to add this directory to my path, and then import it ... but that is precisely what I am doing in my code and it's not working.

I have also tried the Ctrl-1 trick to suppress the error message, but the program is still not functioning correctly. I get an error, ImportError: No module named odbchelper. So it's clearly not getting the path added, or there is some problem that all of my many permutations of adding __init__.py files has missed.

It's very frustrating that something this simple... calling things from some file that exists somewhere else on my machine... requires this much effort.

How to apply gradient clipping in TensorFlow?

Considering the example code.

I would like to know how to apply gradient clipping on this network, on the RNN, where there is a possibility of exploding gradients.

tf.clip_by_value(t, clip_value_min, clip_value_max, name=None)

This is an example that could be used, but where do I introduce this? In the def of RNN?

    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X) # n_steps
tf.clip_by_value(_X, -1, 1, name=None)

But this doesn't make sense, as the tensor _X is the input and not the gradient that is to be clipped.

Do I have to define my own Optimizer for this or is there a simpler option?

Answer #1

This post aims to give readers a primer on SQL-flavored merging with Pandas, how to use it, and when not to use it.

In particular, here's what this post will go through:

  • The basics - types of joins (LEFT, RIGHT, OUTER, INNER)

    • merging with different column names
    • merging with multiple columns
    • avoiding duplicate merge key column in output

What this post (and other posts by me on this thread) will not go through:

  • Performance-related discussions and timings (for now). Mostly notable mentions of better alternatives, wherever appropriate.
  • Handling suffixes, removing extra columns, renaming outputs, and other specific use cases. There are other (read: better) posts that deal with that, so figure it out!

Note Most examples default to INNER JOIN operations while demonstrating various features, unless otherwise specified.

Furthermore, all the DataFrames here can be copied and replicated so you can play with them. Also, see this post on how to read DataFrames from your clipboard.

Lastly, all visual representation of JOIN operations have been hand-drawn using Google Drawings. Inspiration from here.



Enough talk - just show me how to use merge!

Setup & Basics

import numpy as np
import pandas as pd

np.random.seed(0)
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": np.random.randn(4)})

left

  key     value
0   A  1.764052
1   B  0.400157
2   C  0.978738
3   D  2.240893

right

  key     value
0   B  1.867558
1   D -0.977278
2   E  0.950088
3   F -0.151357

For the sake of simplicity, the key column has the same name (for now).

An INNER JOIN is represented by the following figure (omitted here).

Note This, along with the forthcoming figures all follow this convention:

  • blue indicates rows that are present in the merge result
  • red indicates rows that are excluded from the result (i.e., removed)
  • green indicates missing values that are replaced with NaNs in the result

To perform an INNER JOIN, call merge on the left DataFrame, specifying the right DataFrame and the join key (at the very least) as arguments.

left.merge(right, on="key")
# Or, if you want to be explicit
# left.merge(right, on="key", how="inner")

  key   value_x   value_y
0   B  0.400157  1.867558
1   D  2.240893 -0.977278

This returns only rows from left and right which share a common key (in this example, "B" and "D").

A LEFT OUTER JOIN, or LEFT JOIN, is represented by the following figure (omitted here).

This can be performed by specifying how="left".

left.merge(right, on="key", how="left")

  key   value_x   value_y
0   A  1.764052       NaN
1   B  0.400157  1.867558
2   C  0.978738       NaN
3   D  2.240893 -0.977278

Carefully note the placement of NaNs here. If you specify how="left", then only keys from left are used, and missing data from right is replaced by NaN.

And similarly, for a RIGHT OUTER JOIN, or RIGHT JOIN (figure omitted), specify how="right":

left.merge(right, on="key", how="right")

  key   value_x   value_y
0   B  0.400157  1.867558
1   D  2.240893 -0.977278
2   E       NaN  0.950088
3   F       NaN -0.151357

Here, keys from right are used, and missing data from left is replaced by NaN.

Finally, for the FULL OUTER JOIN (figure omitted), specify how="outer".

left.merge(right, on="key", how="outer")

  key   value_x   value_y
0   A  1.764052       NaN
1   B  0.400157  1.867558
2   C  0.978738       NaN
3   D  2.240893 -0.977278
4   E       NaN  0.950088
5   F       NaN -0.151357

This uses the keys from both frames, and NaNs are inserted for missing rows in both.

The documentation summarizes these various merges nicely:

(figure omitted: the documentation's summary table of join types)


Other JOINs - LEFT-Excluding, RIGHT-Excluding, and FULL-Excluding/ANTI JOINs

LEFT-Excluding JOINs and RIGHT-Excluding JOINs need to be done in two steps.

For a LEFT-Excluding JOIN (figure omitted), start by performing a LEFT OUTER JOIN and then filter to the rows coming from left only (excluding everything from the right):

(left.merge(right, on="key", how="left", indicator=True)
     .query('_merge == "left_only"')
     .drop("_merge", 1))

  key   value_x  value_y
0   A  1.764052      NaN
2   C  0.978738      NaN

Where,

left.merge(right, on="key", how="left", indicator=True)

  key   value_x   value_y     _merge
0   A  1.764052       NaN  left_only
1   B  0.400157  1.867558       both
2   C  0.978738       NaN  left_only
3   D  2.240893 -0.977278       both

And similarly, for a RIGHT-Excluding JOIN,

(left.merge(right, on="key", how="right", indicator=True)
     .query('_merge == "right_only"')
     .drop("_merge", 1))

  key  value_x   value_y
2   E      NaN  0.950088
3   F      NaN -0.151357

Lastly, if you are required to do a merge that only retains keys from the left or right, but not both (IOW, performing an ANTI-JOIN, figure omitted), you can do this in similar fashion:

(left.merge(right, on="key", how="outer", indicator=True)
     .query('_merge != "both"')
     .drop("_merge", 1))

  key   value_x   value_y
0   A  1.764052       NaN
2   C  0.978738       NaN
4   E       NaN  0.950088
5   F       NaN -0.151357

Different names for key columns

If the key columns are named differently—for example, left has keyLeft, and right has keyRight instead of key—then you will have to specify left_on and right_on as arguments instead of on:

left2 = left.rename({"key":"keyLeft"}, axis=1)
right2 = right.rename({"key":"keyRight"}, axis=1)

left2

  keyLeft     value
0       A  1.764052
1       B  0.400157
2       C  0.978738
3       D  2.240893

right2

  keyRight     value
0        B  1.867558
1        D -0.977278
2        E  0.950088
3        F -0.151357

left2.merge(right2, left_on="keyLeft", right_on="keyRight", how="inner")

  keyLeft   value_x keyRight   value_y
0       B  0.400157        B  1.867558
1       D  2.240893        D -0.977278

Avoiding duplicate key column in output

When merging on keyLeft from left and keyRight from right, if you only want either of the keyLeft or keyRight (but not both) in the output, you can start by setting the index as a preliminary step.

left3 = left2.set_index("keyLeft")
left3.merge(right2, left_index=True, right_on="keyRight")

    value_x keyRight   value_y
0  0.400157        B  1.867558
1  2.240893        D -0.977278

Contrast this with the output of the command just before (that is, the output of left2.merge(right2, left_on="keyLeft", right_on="keyRight", how="inner")), and you'll notice keyLeft is missing. You can figure out what column to keep based on which frame's index is set as the key. This may matter when, say, performing some OUTER JOIN operation.


Merging only a single column from one of the DataFrames

For example, consider

right3 = right.assign(newcol=np.arange(len(right)))
right3
  key     value  newcol
0   B  1.867558       0
1   D -0.977278       1
2   E  0.950088       2
3   F -0.151357       3

If you are required to merge only "newcol" (without any of the other columns), you can usually just subset columns before merging:

left.merge(right3[["key", "newcol"]], on="key")

  key     value  newcol
0   B  0.400157       0
1   D  2.240893       1

If you're doing a LEFT OUTER JOIN, a more performant solution would involve map:

# left["newcol"] = left["key"].map(right3.set_index("key")["newcol"])
left.assign(newcol=left["key"].map(right3.set_index("key")["newcol"]))

  key     value  newcol
0   A  1.764052     NaN
1   B  0.400157     0.0
2   C  0.978738     NaN
3   D  2.240893     1.0

As mentioned, this is similar to, but faster than

left.merge(right3[["key", "newcol"]], on="key", how="left")

  key     value  newcol
0   A  1.764052     NaN
1   B  0.400157     0.0
2   C  0.978738     NaN
3   D  2.240893     1.0

Merging on multiple columns

To join on more than one column, specify a list for on (or left_on and right_on, as appropriate).

left.merge(right, on=["key1", "key2"], ...)

Or, in the event the names are different,

left.merge(right, left_on=["lkey1", "lkey2"], right_on=["rkey1", "rkey2"])
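
For instance, a minimal runnable sketch (the frames, key names, and values here are made up for illustration):

left_m = pd.DataFrame({"key1": ["A", "B"], "key2": [1, 2], "val": [10, 20]})
right_m = pd.DataFrame({"key1": ["B", "B"], "key2": [2, 3], "val": [30, 40]})

left_m.merge(right_m, on=["key1", "key2"])  # inner join on both keys

  key1  key2  val_x  val_y
0    B     2     20     30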

Other useful merge* operations and functions

This section only covers the very basics, and is designed to only whet your appetite. For more examples and cases, see the documentation on merge, join, and concat as well as the links to the function specifications.



Continue Reading

Jump to other topics in Pandas Merging 101 to continue learning (the navigation list of topics is omitted here).

Answer #2

You can adjust the subplot geometry in the very tight_layout call as follows:

fig.tight_layout(rect=[0, 0.03, 1, 0.95])

As it's stated in the documentation (https://matplotlib.org/users/tight_layout_guide.html):

tight_layout() only considers ticklabels, axis labels, and titles. Thus, other artists may be clipped and also may overlap.
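
The rect argument is typically used to reserve room for exactly such an artist, e.g. a figure-level suptitle. A minimal sketch (the figure contents here are made up):

import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2)
fig.suptitle("Overall title")               # not accounted for by tight_layout
fig.tight_layout(rect=[0, 0.03, 1, 0.95])   # keep the top 5% free for the title
plt.show()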

Answer #3

Disclaimer: I'm mostly writing this post with syntactical considerations and general behaviour in mind. I'm not familiar with the memory and CPU aspect of the methods described, and I aim this answer at those who have reasonably small sets of data, such that the quality of the interpolation can be the main aspect to consider. I am aware that when working with very large data sets, the better-performing methods (namely griddata and RBFInterpolator without a neighbors keyword argument) might not be feasible.

Note that this answer uses the new RBFInterpolator class introduced in SciPy 1.7.0. For the legacy Rbf class see the previous version of this answer.

I'm going to compare three kinds of multi-dimensional interpolation methods (interp2d/splines, griddata and RBFInterpolator). I will subject them to two kinds of interpolation tasks and two kinds of underlying functions (points from which are to be interpolated). The specific examples will demonstrate two-dimensional interpolation, but the viable methods are applicable in arbitrary dimensions. Each method provides various kinds of interpolation; in all cases I will use cubic interpolation (or something close¹). It's important to note that whenever you use interpolation you introduce bias compared to your raw data, and the specific methods used affect the artifacts that you will end up with. Always be aware of this, and interpolate responsibly.

The two interpolation tasks will be

  1. upsampling (input data is on a rectangular grid, output data is on a denser grid)
  2. interpolation of scattered data onto a regular grid

The two functions (over the domain [x, y] in [-1, 1]x[-1, 1]) will be

  1. a smooth and friendly function: cos(pi*x)*sin(pi*y); range in [-1, 1]
  2. an evil (and in particular, non-continuous) function: x*y / (x^2 + y^2) with a value of 0.5 near the origin; range in [-0.5, 0.5]

Here's how they look:

fig1: test functions

I will first demonstrate how the three methods behave under these four tests, then I'll detail the syntax of all three. If you know what you should expect from a method, you might not want to waste your time learning its syntax (looking at you, interp2d).

Test data

For the sake of explicitness, here is the code with which I generated the input data. While in this specific case I'm obviously aware of the function underlying the data, I will only use this to generate input for the interpolation methods. I use numpy for convenience (and mostly for generating the data), but scipy alone would suffice too.

import numpy as np
import scipy.interpolate as interp

# auxiliary function for mesh generation
def gimme_mesh(n):
    minval = -1
    maxval =  1
    # produce an asymmetric shape in order to catch issues with transpositions
    return np.meshgrid(np.linspace(minval, maxval, n),
                       np.linspace(minval, maxval, n + 1))

# set up underlying test functions, vectorized
def fun_smooth(x, y):
    return np.cos(np.pi*x) * np.sin(np.pi*y)

def fun_evil(x, y):
    # watch out for singular origin; function has no unique limit there
    return np.where(x**2 + y**2 > 1e-10, x*y/(x**2+y**2), 0.5)

# sparse input mesh, 6x7 in shape
N_sparse = 6
x_sparse, y_sparse = gimme_mesh(N_sparse)
z_sparse_smooth = fun_smooth(x_sparse, y_sparse)
z_sparse_evil = fun_evil(x_sparse, y_sparse)

# scattered input points, 10^2 altogether (shape (100,))
N_scattered = 10
rng = np.random.default_rng()
x_scattered, y_scattered = rng.random((2, N_scattered**2))*2 - 1
z_scattered_smooth = fun_smooth(x_scattered, y_scattered)
z_scattered_evil = fun_evil(x_scattered, y_scattered)

# dense output mesh, 20x21 in shape
N_dense = 20
x_dense, y_dense = gimme_mesh(N_dense)

Smooth function and upsampling

Let's start with the easiest task. Here's how an upsampling from a mesh of shape [6, 7] to one of [20, 21] works out for the smooth test function:

fig2: smooth upsampling

Even though this is a simple task, there are already subtle differences between the outputs. At first glance all three outputs are reasonable. There are two features to note, based on our prior knowledge of the underlying function: the middle case of griddata distorts the data most. Note the y == -1 boundary of the plot (nearest the x label): the function should be strictly zero (since y == -1 is a nodal line for the smooth function), yet this is not the case for griddata. Also note the x == -1 boundary of the plots (behind, to the left): the underlying function has a local maximum (implying zero gradient near the boundary) at [-1, -0.5], yet the griddata output shows clearly non-zero gradient in this region. The effect is subtle, but it's a bias nonetheless.

Evil function and upsampling

A bit harder task is to perform upsampling on our evil function:

fig3: evil upsampling

Clear differences are starting to show among the three methods. Looking at the surface plots, there are clear spurious extrema appearing in the output from interp2d (note the two humps on the right side of the plotted surface). griddata and RBFInterpolator seem to produce similar results at first glance, both producing local minima near [0.4, -0.4] that are absent from the underlying function.

However, there is one crucial aspect in which RBFInterpolator is far superior: it respects the symmetry of the underlying function (which is of course also made possible by the symmetry of the sample mesh). The output from griddata breaks the symmetry of the sample points, which is already weakly visible in the smooth case.

Smooth function and scattered data

Most often one wants to perform interpolation on scattered data. For this reason I expect these tests to be more important. As shown above, the sample points were chosen pseudo-uniformly in the domain of interest. In realistic scenarios you might have additional noise with each measurement, and you should consider whether it makes sense to interpolate your raw data to begin with.

Output for the smooth function:

fig4: smooth scattered interpolation

Now there's already a bit of a horror show going on. I clipped the output from interp2d to between [-1, 1] exclusively for plotting, in order to preserve at least a minimal amount of information. It's clear that while some of the underlying shape is present, there are huge noisy regions where the method completely breaks down. The second case of griddata reproduces the shape fairly nicely, but note the white regions at the border of the contour plot. This is due to the fact that griddata only works inside the convex hull of the input data points (in other words, it doesn't perform any extrapolation). I kept the default NaN value for output points lying outside the convex hull.² Considering these features, RBFInterpolator seems to perform best.

Evil function and scattered data

And the moment we've all been waiting for:

fig5: evil scattered interpolation

It's no huge surprise that interp2d gives up. In fact, during the call to interp2d you should expect some friendly RuntimeWarnings complaining about the impossibility of the spline to be constructed. As for the other two methods, RBFInterpolator seems to produce the best output, even near the borders of the domain where the result is extrapolated.


So let me say a few words about the three methods, in decreasing order of preference (so that the worst is the least likely to be read by anybody).

scipy.interpolate.RBFInterpolator

The RBF in the name of the RBFInterpolator class stands for "radial basis functions". To be honest I've never considered this approach until I started researching for this post, but I'm pretty sure I'll be using these in the future.

Just like the spline-based methods (see later), usage comes in two steps: first one creates a callable RBFInterpolator class instance based on the input data, and then calls this object for a given output mesh to obtain the interpolated result. Example from the smooth upsampling test:

import scipy.interpolate as interp

sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1)  # shape (N, 2) in 2d
dense_points = np.stack([x_dense.ravel(), y_dense.ravel()], -1)  # shape (N, 2) in 2d

zfun_smooth_rbf = interp.RBFInterpolator(sparse_points, z_sparse_smooth.ravel(),
                                         smoothing=0, kernel="cubic")  # explicit default smoothing=0 for interpolation
z_dense_smooth_rbf = zfun_smooth_rbf(dense_points).reshape(x_dense.shape)  # not really a function, but a callable class instance

zfun_evil_rbf = interp.RBFInterpolator(sparse_points, z_sparse_evil.ravel(),
                                       smoothing=0, kernel="cubic")  # explicit default smoothing=0 for interpolation
z_dense_evil_rbf = zfun_evil_rbf(dense_points).reshape(x_dense.shape)  # not really a function, but a callable class instance

Note that we had to do some array building gymnastics to make the API of RBFInterpolator happy. Since we have to pass the 2d points as arrays of shape (N, 2), we have to flatten the input grid and stack the two flattened arrays. The constructed interpolator also expects query points in this format, and the result will be a 1d array of shape (N,) which we have to reshape back to match our 2d grid for plotting. Since RBFInterpolator makes no assumptions about the number of dimensions of the input points, it supports arbitrary dimensions for interpolation.

So, scipy.interpolate.RBFInterpolator

  • produces well-behaved output even for crazy input data
  • supports interpolation in higher dimensions
  • extrapolates outside the convex hull of the input points (of course extrapolation is always a gamble, and you should generally not rely on it at all)
  • creates an interpolator as a first step, so evaluating it in various output points is less additional effort
  • can have output point arrays of arbitrary shape (as opposed to being constrained to rectangular meshes, see later)
  • more likely to preserve the symmetry of the input data
  • supports multiple kinds of radial functions for keyword kernel: multiquadric, inverse_multiquadric, inverse_quadratic, gaussian, linear, cubic, quintic, thin_plate_spline (the default). As of SciPy 1.7.0 the class doesn't allow passing a custom callable due to technical reasons, but this is likely to be added in a future version.
  • can give inexact interpolations by increasing the smoothing parameter

One drawback of RBF interpolation is that interpolating N data points involves inverting an N x N matrix. This quadratic complexity very quickly blows up memory need for a large number of data points. However, the new RBFInterpolator class also supports a neighbors keyword parameter that restricts computation of each radial basis function to k nearest neighbours, thereby reducing memory need.
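
As a sketch of that keyword (reusing the scattered-data arrays built above; the value 50 is an arbitrary choice):

scattered_points = np.stack([x_scattered, y_scattered], -1)  # shape (100, 2)
rbf_local = interp.RBFInterpolator(scattered_points, z_scattered_smooth,
                                   neighbors=50, kernel="cubic", smoothing=0)
z_dense_local = rbf_local(dense_points).reshape(x_dense.shape)  # each query uses only the 50 nearest data points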

scipy.interpolate.griddata

My former favourite, griddata, is a general workhorse for interpolation in arbitrary dimensions. It doesn't perform extrapolation beyond setting a single preset value for points outside the convex hull of the nodal points, but since extrapolation is a very fickle and dangerous thing, this is not necessarily a con. Usage example:

sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1)  # shape (N, 2) in 2d
z_dense_smooth_griddata = interp.griddata(sparse_points, z_sparse_smooth.ravel(),
                                          (x_dense, y_dense), method="cubic")  # default method is linear

Note that the same array transformations were necessary for the input arrays as for RBFInterpolator. The input points have to be specified in an array of shape [N, D] in D dimensions, or alternatively as a tuple of 1d arrays:

z_dense_smooth_griddata = interp.griddata((x_sparse.ravel(), y_sparse.ravel()),
                                          z_sparse_smooth.ravel(), (x_dense, y_dense), method="cubic")

The output point arrays can be specified as a tuple of arrays of arbitrary dimensions (as in both above snippets), which gives us some more flexibility.

In a nutshell, scipy.interpolate.griddata

  • produces well-behaved output even for crazy input data
  • supports interpolation in higher dimensions
  • does not perform extrapolation, a single value can be set for the output outside the convex hull of the input points (see fill_value, and the sketch after this list)
  • computes the interpolated values in a single call, so probing multiple sets of output points starts from scratch
  • can have output points of arbitrary shape
  • supports nearest-neighbour and linear interpolation in arbitrary dimensions, cubic in 1d and 2d. Nearest-neighbour and linear interpolation use NearestNDInterpolator and LinearNDInterpolator under the hood, respectively. 1d cubic interpolation uses a spline, 2d cubic interpolation uses CloughTocher2DInterpolator to construct a continuously differentiable piecewise-cubic interpolator.
  • might violate the symmetry of the input data
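
As a sketch of the fill_value keyword mentioned above (reusing the arrays from the test-data section; 0.0 is an arbitrary placeholder value):

z_dense_filled = interp.griddata((x_scattered, y_scattered), z_scattered_smooth,
                                 (x_dense, y_dense), method="cubic", fill_value=0.0)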

scipy.interpolate.interp2d/scipy.interpolate.bisplrep

The only reason I'm discussing interp2d and its relatives is that it has a deceptive name, and people are likely to try using it. Spoiler alert: don't use it (as of scipy version 1.7.0). It's already more special than the previous subjects in that it's specifically used for two-dimensional interpolation, but I suspect this is by far the most common case for multivariate interpolation.

As far as syntax goes, interp2d is similar to RBFInterpolator in that it first needs constructing an interpolation instance, which can be called to provide the actual interpolated values. There's a catch, however: the output points have to be located on a rectangular mesh, so inputs going into the call to the interpolator have to be 1d vectors which span the output grid, as if from numpy.meshgrid:

# reminder: x_sparse and y_sparse are of shape [6, 7] from numpy.meshgrid
zfun_smooth_interp2d = interp.interp2d(x_sparse, y_sparse, z_sparse_smooth, kind="cubic")   # default kind is "linear"
# reminder: x_dense and y_dense are of shape (20, 21) from numpy.meshgrid
xvec = x_dense[0,:] # 1d array of unique x values, 20 elements
yvec = y_dense[:,0] # 1d array of unique y values, 21 elements
z_dense_smooth_interp2d = zfun_smooth_interp2d(xvec, yvec)   # output is (20, 21)-shaped array

One of the most common mistakes when using interp2d is putting your full 2d meshes into the interpolation call, which leads to explosive memory consumption, and hopefully to a hasty MemoryError.

Now, the greatest problem with interp2d is that it often doesn't work. In order to understand this, we have to look under the hood. It turns out that interp2d is a wrapper for the lower-level functions bisplrep + bisplev, which are in turn wrappers for FITPACK routines (written in Fortran). The equivalent call to the previous example would be

kind = "cubic"
if kind == "linear":
    kx = ky = 1
elif kind == "cubic":
    kx = ky = 3
elif kind == "quintic":
    kx = ky = 5
# bisplrep constructs a spline representation, bisplev evaluates the spline at given points
bisp_smooth = interp.bisplrep(x_sparse.ravel(), y_sparse.ravel(),
                              z_sparse_smooth.ravel(), kx=kx, ky=ky, s=0)
z_dense_smooth_bisplrep = interp.bisplev(xvec, yvec, bisp_smooth).T  # note the transpose

Now, here's the thing about interp2d: (in scipy version 1.7.0) there is a nice comment in interpolate/interpolate.py for interp2d:

if not rectangular_grid:
    # TODO: surfit is really not meant for interpolation!
    self.tck = fitpack.bisplrep(x, y, z, kx=kx, ky=ky, s=0.0)

and indeed in interpolate/fitpack.py, in bisplrep there's some setup and ultimately

tx, ty, c, o = _fitpack._surfit(x, y, z, w, xb, xe, yb, ye, kx, ky,
                                task, s, eps, tx, ty, nxest, nyest,
                                wrk, lwrk1, lwrk2)                 

And that's it. The routines underlying interp2d are not really meant to perform interpolation. They might suffice for sufficiently well-behaved data, but under realistic circumstances you will probably want to use something else.

Just to conclude, interpolate.interp2d

  • can lead to artifacts even with well-tempered data
  • is specifically for bivariate problems (although there's the limited interpn for input points defined on a grid)
  • performs extrapolation
  • creates an interpolator as a first step, so evaluating it in various output points is less additional effort
  • can only produce output over a rectangular grid, for scattered output you would have to call the interpolator in a loop
  • supports linear, cubic and quintic interpolation
  • might violate the symmetry of the input data

¹I'm fairly certain that the cubic and linear kind of basis functions of RBFInterpolator do not exactly correspond to the other interpolators of the same name.
²These NaNs are also the reason for why the surface plot seems so odd: matplotlib historically has difficulties with plotting complex 3d objects with proper depth information. The NaN values in the data confuse the renderer, so parts of the surface that should be in the back are plotted to be in the front. This is an issue with visualization, and not interpolation.

Answer #4

Gradient clipping needs to happen after computing the gradients, but before applying them to update the model's parameters. In your example, both of those things are handled by the AdamOptimizer.minimize() method.

In order to clip your gradients you'll need to explicitly compute, clip, and apply them as described in this section in TensorFlow's API documentation. Specifically you'll need to substitute the call to the minimize() method with something like the following:

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
gvs = optimizer.compute_gradients(cost)
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
train_op = optimizer.apply_gradients(capped_gvs)

Answer #5

Despite what seems to be popular, you probably want to clip the whole gradient by its global norm:

optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimize = optimizer.apply_gradients(zip(gradients, variables))

Clipping each gradient matrix individually changes their relative scale but is also possible:

optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients = [
    None if gradient is None else tf.clip_by_norm(gradient, 5.0)
    for gradient in gradients]
optimize = optimizer.apply_gradients(zip(gradients, variables))

In TensorFlow 2, a tape computes the gradients, the optimizers come from Keras, and we don't need to store the update op because it runs automatically without passing it to a session:

optimizer = tf.keras.optimizers.Adam(1e-3)
# ...
with tf.GradientTape() as tape:
  loss = ...
variables = ...
gradients = tape.gradient(loss, variables)
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimizer.apply_gradients(zip(gradients, variables))
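
As a side note (an alternative, not what the answer above describes): the Keras optimizers also accept clipnorm and clipvalue constructor arguments, so a simple per-gradient clip can be requested without touching the training loop at all:

optimizer = tf.keras.optimizers.Adam(1e-3, clipnorm=5.0)  # clip each gradient to norm <= 5 before applying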

Answer #6

Update: User cphyc has kindly created a Github repository for the code in this answer (see here), and bundled the code into a package which may be installed using pip install matplotlib-label-lines.


Pretty Picture:

semi-automatic plot-labeling

In matplotlib it's pretty easy to label contour plots (either automatically or by manually placing labels with mouse clicks). There does not (yet) appear to be any equivalent capability to label data series in this fashion! There may be some semantic reason for not including this feature which I am missing.

Regardless, I have written the following module, which allows for semi-automatic plot labelling. It requires only numpy and a couple of functions from the standard math library.

Description

The default behaviour of the labelLines function is to space the labels evenly along the x axis (automatically placing at the correct y-value of course). If you want you can just pass an array of the x co-ordinates of each of the labels. You can even tweak the location of one label (as shown in the bottom right plot) and space the rest evenly if you like.

In addition, the labelLines function does not account for the lines which have not had a label assigned in the plot command (or more accurately if the label contains "_line").

Keyword arguments passed to labelLines or labelLine are passed on to the text function call (some keyword arguments are set if the calling code chooses not to specify).

Issues

  • Annotation bounding boxes sometimes interfere undesirably with other curves. As shown by the 1 and 10 annotations in the top left plot. I'm not even sure this can be avoided.
  • It would be nice to specify a y position instead sometimes.
  • It's still an iterative process to get annotations in the right location
  • It only works when the x-axis values are floats

Gotchas

  • By default, the labelLines function assumes that all data series span the range specified by the axis limits. Take a look at the blue curve in the top left plot of the pretty picture. If there were only data available for the x range 0.5-1 then we couldn't possibly place a label at the desired location (which is a little less than 0.2). See this question for a particularly nasty example. Right now, the code does not intelligently identify this scenario and re-arrange the labels, however there is a reasonable workaround. The labelLines function takes the xvals argument; a list of x-values specified by the user instead of the default linear distribution across the width. So the user can decide which x-values to use for the label placement of each data series.

Also, I believe this is the first answer to complete the bonus objective of aligning the labels with the curve they're on. :)

label_lines.py:

from math import atan2,degrees
import numpy as np

#Label line with line2D label data
def labelLine(line,x,label=None,align=True,**kwargs):

    ax = line.axes
    xdata = line.get_xdata()
    ydata = line.get_ydata()

    if (x < xdata[0]) or (x > xdata[-1]):
        print("x label location is outside data range!")
        return

    #Find corresponding y co-ordinate and angle of the line
    ip = 1
    for i in range(len(xdata)):
        if x < xdata[i]:
            ip = i
            break

    y = ydata[ip-1] + (ydata[ip]-ydata[ip-1])*(x-xdata[ip-1])/(xdata[ip]-xdata[ip-1])

    if not label:
        label = line.get_label()

    if align:
        #Compute the slope
        dx = xdata[ip] - xdata[ip-1]
        dy = ydata[ip] - ydata[ip-1]
        ang = degrees(atan2(dy,dx))

        #Transform to screen co-ordinates
        pt = np.array([x,y]).reshape((1,2))
        trans_angle = ax.transData.transform_angles(np.array((ang,)),pt)[0]

    else:
        trans_angle = 0

    #Set a bunch of keyword arguments
    if "color" not in kwargs:
        kwargs["color"] = line.get_color()

    if ("horizontalalignment" not in kwargs) and ("ha" not in kwargs):
        kwargs["ha"] = "center"

    if ("verticalalignment" not in kwargs) and ("va" not in kwargs):
        kwargs["va"] = "center"

    if "backgroundcolor" not in kwargs:
        kwargs["backgroundcolor"] = ax.get_facecolor()

    if "clip_on" not in kwargs:
        kwargs["clip_on"] = True

    if "zorder" not in kwargs:
        kwargs["zorder"] = 2.5

    ax.text(x,y,label,rotation=trans_angle,**kwargs)

def labelLines(lines,align=True,xvals=None,**kwargs):

    ax = lines[0].axes
    labLines = []
    labels = []

    #Take only the lines which have labels other than the default ones
    for line in lines:
        label = line.get_label()
        if "_line" not in label:
            labLines.append(line)
            labels.append(label)

    if xvals is None:
        xmin,xmax = ax.get_xlim()
        xvals = np.linspace(xmin,xmax,len(labLines)+2)[1:-1]

    for line,x,label in zip(labLines,xvals,labels):
        labelLine(line,x,label,align,**kwargs)

Test code to generate the pretty picture above:

from matplotlib import pyplot as plt
from scipy.stats import loglaplace,chi2

from label_lines import *

X = np.linspace(0,1,500)
A = [1,2,5,10,20]
funcs = [np.arctan,np.sin,loglaplace(4).pdf,chi2(5).pdf]

plt.subplot(221)
for a in A:
    plt.plot(X,np.arctan(a*X),label=str(a))

labelLines(plt.gca().get_lines(),zorder=2.5)

plt.subplot(222)
for a in A:
    plt.plot(X,np.sin(a*X),label=str(a))

labelLines(plt.gca().get_lines(),align=False,fontsize=14)

plt.subplot(223)
for a in A:
    plt.plot(X,loglaplace(4).pdf(a*X),label=str(a))

xvals = [0.8,0.55,0.22,0.104,0.045]
labelLines(plt.gca().get_lines(),align=False,xvals=xvals,color="k")

plt.subplot(224)
for a in A:
    plt.plot(X,chi2(5).pdf(a*X),label=str(a))

lines = plt.gca().get_lines()
l1=lines[-1]
labelLine(l1,0.6,label=r"$Re=${}".format(l1.get_label()),ha="left",va="bottom",align = False)
labelLines(lines[:-1],align=False)

plt.show()

Answer #7

To get this to work with jupyter (version 4.0.6) I created ~/.jupyter/custom/custom.css containing:

/* Make the notebook cells take almost all available width */
.container {
    width: 99% !important;
}   

/* Prevent the edit cell highlight box from getting clipped;
 * important so that it also works when cell is in edit mode*/
div.cell.selected {
    border-left-width: 1px !important;
}

Answer #8

Results

Spreadsheet version

spreadsheet screenshot

Alternatively, in plain text: (also available as a screenshot)

                         Bracket Matching -.  .- Line Numbering
                          Smart Indent -.  |  |  .- UML Editing / Viewing
         Source Control Integration -.  |  |  |  |  .- Code Folding
                    Error Markup -.  |  |  |  |  |  |  .- Code Templates
  Integrated Python Debugging -.  |  |  |  |  |  |  |  |  .- Unit Testing
    Multi-Language Support -.  |  |  |  |  |  |  |  |  |  |  .- GUI Designer (Qt, Eric, etc)
   Auto Code Completion -.  |  |  |  |  |  |  |  |  |  |  |  |  .- Integrated DB Support
     Commercial/Free -.  |  |  |  |  |  |  |  |  |  |  |  |  |  |  .- Refactoring
   Cross Platform -.  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |     
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Atom              |Y |F |Y |Y*|Y |Y |Y |Y |Y |Y |  |Y |Y |  |  |  |  |*many plugins
Editra            |Y |F |Y |Y |  |  |Y |Y |Y |Y |  |Y |  |  |  |  |  |
Emacs             |Y |F |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |  |  |  |
Eric Ide          |Y |F |Y |  |Y |Y |  |Y |  |Y |  |Y |  |Y |  |  |  |
Geany             |Y |F |Y*|Y |  |  |  |Y |Y |Y |  |Y |  |  |  |  |  |*very limited
Gedit             |Y |F |Y¹|Y |  |  |  |Y |Y |Y |  |  |Y²|  |  |  |  |¹with plugin; ²sort of
Idle              |Y |F |Y |  |Y |  |  |Y |Y |  |  |  |  |  |  |  |  |
IntelliJ          |Y |CF|Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |
JEdit             |Y |F |  |Y |  |  |  |  |Y |Y |  |Y |  |  |  |  |  |
KDevelop          |Y |F |Y*|Y |  |  |Y |Y |Y |Y |  |Y |  |  |  |  |  |*no type inference
Komodo            |Y |CF|Y |Y |Y |Y |Y |Y |Y |Y |  |Y |Y |Y |  |Y |  |
NetBeans*         |Y |F |Y |Y |Y |  |Y |Y |Y |Y |Y |Y |Y |Y |  |  |Y |*pre-v7.0
Notepad++         |W |F |Y |Y |  |Y*|Y*|Y*|Y |Y |  |Y |Y*|  |  |  |  |*with plugin
Pfaide            |W |C |Y |Y |  |  |  |Y |Y |Y |  |Y |Y |  |  |  |  |
PIDA              |LW|F |Y |Y |  |  |  |Y |Y |Y |  |Y |  |  |  |  |  |VIM based
PTVS              |W |F |Y |Y |Y |Y |Y |Y |Y |Y |  |Y |  |  |Y*|  |Y |*WPF based
PyCharm           |Y |CF|Y |Y*|Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |*JavaScript
PyDev (Eclipse)   |Y |F |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |Y |  |  |  |
PyScripter        |W |F |Y |  |Y |Y |  |Y |Y |Y |  |Y |Y |Y |  |  |  |
PythonWin         |W |F |Y |  |Y |  |  |Y |Y |  |  |Y |  |  |  |  |  |
SciTE             |Y |F¹|  |Y |  |Y |  |Y |Y |Y |  |Y |Y |  |  |  |  |¹Mac version is
ScriptDev         |W |C |Y |Y |Y |Y |  |Y |Y |Y |  |Y |Y |  |  |  |  |    commercial
Spyder            |Y |F |Y |  |Y |Y |  |Y |Y |Y |  |  |  |  |  |  |  |
Sublime Text      |Y |CF|Y |Y |  |Y |Y |Y |Y |Y |  |Y |Y |Y*|  |  |  |extensible w/Python,
TextMate          |M |F |  |Y |  |  |Y |Y |Y |Y |  |Y |Y |  |  |  |  |    *PythonTestRunner
UliPad            |Y |F |Y |Y |Y |  |  |Y |Y |  |  |  |Y |Y |  |  |  |
Vim               |Y |F |Y |Y |Y |Y |Y |Y |Y |Y |  |Y |Y |Y |  |  |  |
Visual Studio     |W |CF|Y |Y |Y |Y |Y |Y |Y |Y |? |Y |? |? |Y |? |Y |
Visual Studio Code|Y |F |Y |Y |Y |Y |Y |Y |Y |Y |? |Y |? |? |? |? |Y |uses plugins
WingIde           |Y |C |Y |Y*|Y |Y |Y |Y |Y |Y |  |Y |Y |Y |  |  |  |*support for C
Zeus              |W |C |  |  |  |  |Y |Y |Y |Y |  |Y |Y |  |  |  |  |
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   Cross Platform -'  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |     
     Commercial/Free -'  |  |  |  |  |  |  |  |  |  |  |  |  |  |  '- Refactoring
   Auto Code Completion -'  |  |  |  |  |  |  |  |  |  |  |  |  '- Integrated DB Support
    Multi-Language Support -'  |  |  |  |  |  |  |  |  |  |  '- GUI Designer (Qt, Eric, etc)
  Integrated Python Debugging -'  |  |  |  |  |  |  |  |  '- Unit Testing
                    Error Markup -'  |  |  |  |  |  |  '- Code Templates
         Source Control Integration -'  |  |  |  |  '- Code Folding
                          Smart Indent -'  |  |  '- UML Editing / Viewing
                         Bracket Matching -'  '- Line Numbering

Acronyms used:

 L  - Linux
 W  - Windows
 M  - Mac
 C  - Commercial
 F  - Free
 CF - Commercial with Free limited edition
 ?  - To be confirmed

I don't mention basics like syntax highlighting as I expect these by default.


This is just a dry list reflecting your feedback and comments; I am not advocating any of these tools. I will keep updating this list as you keep posting your answers.

PS. Can you help me to add features of the above editors to the list (like auto-complete, debugging, etc.)?

We have a comprehensive wiki page for this question https://wiki.python.org/moin/IntegratedDevelopmentEnvironments

Submit edits to the spreadsheet

Answer #9

Note: The ideas here are pretty generic for Stack Overflow, indeed questions.

Disclaimer: Writing a good question is hard.

The Good:

  • do include small* example DataFrame, either as runnable code:

      In [1]: df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=["A", "B"])
    

    or make it "copy and pasteable" using pd.read_clipboard(sep=r"\s\s+"), you can format the text for Stack Overflow highlight and use Ctrl+K (or prepend four spaces to each line), or place three tildes above and below your code with your code unindented:

      In [2]: df
      Out[2]:
         A  B
      0  1  2
      1  1  3
      2  4  6
    

    test pd.read_clipboard(sep=r"\s\s+") yourself.

    * I really do mean small, the vast majority of example DataFrames could be fewer than 6 rows (citation needed), and I bet I can do it in 5 rows. Can you reproduce the error with df = df.head()? If not, fiddle around to see if you can make up a small DataFrame which exhibits the issue you are facing.

    * Every rule has an exception, the obvious one is for performance issues (in which case definitely use %timeit and possibly %prun), where you should generate (consider using np.random.seed so we have the exact same frame): df = pd.DataFrame(np.random.randn(100000000, 10)). Saying that, "make this code fast for me" is not strictly on topic for the site...

  • write out the outcome you desire (similarly to above)

      In [3]: iwantthis
      Out[3]:
         A  B
      0  1  5
      1  4  6
    

    Explain where the numbers come from: the 5 is the sum of the B column for the rows where A is 1.

  • do show the code you've tried:

      In [4]: df.groupby("A").sum()
      Out[4]:
         B
      A
      1  5
      4  6
    

    But say what's incorrect: the A column is in the index rather than a column.

  • do show you've done some research (search the documentation, search Stack Overflow), and give a summary:

    The docstring for sum simply states "Compute sum of group values"

    The groupby documentation doesn"t give any examples for this.

    Aside: the answer here is to use df.groupby("A", as_index=False).sum().

  • if it's relevant that you have Timestamp columns, e.g. you're resampling or something, then be explicit and apply pd.to_datetime to them for good measure**.

      df["date"] = pd.to_datetime(df["date"]) # this column ought to be date..
    

    ** Sometimes this is the issue itself: they were strings.

The Bad:

  • don't include a MultiIndex, which we can't copy and paste (see above). This is kind of a grievance with Pandas' default display, but nonetheless annoying:

      In [11]: df
      Out[11]:
           C
      A B
      1 2  3
        2  6
    

    The correct way is to include an ordinary DataFrame with a set_index call:

      In [12]: df = pd.DataFrame([[1, 2, 3], [1, 2, 6]], columns=["A", "B", "C"]).set_index(["A", "B"])
    
      In [13]: df
      Out[13]:
           C
      A B
      1 2  3
        2  6
    
  • do provide insight to what it is when giving the outcome you want:

         B
      A
      1  1
      5  0
    

    Be specific about how you got the numbers (what are they)... double check they're correct.

  • If your code throws an error, do include the entire stack trace (this can be edited out later if it's too noisy). Show the line number (and the corresponding line of your code which it's raising against).

The Ugly:

  • don't link to a CSV file we don't have access to (ideally don't link to an external source at all...)

      df = pd.read_csv("my_secret_file.csv")  # ideally with lots of parsing options
    

    Most data is proprietary, we get that. Make up similar data and see if you can reproduce the problem (something small).

  • don't explain the situation vaguely in words, like you have a DataFrame which is "large", mention some of the column names in passing (be sure not to mention their dtypes). Try and go into lots of detail about something which is completely meaningless without seeing the actual context. Presumably no one is even going to read to the end of this paragraph.

    Essays are bad, it's easier with small examples.

  • don't include 10+ (100+??) lines of data munging before getting to your actual question.

    Please, we see enough of this in our day jobs. We want to help, but not like this.... Cut the intro, and just show the relevant DataFrames (or small versions of them) in the step which is causing you trouble.

Anyway, have fun learning Python, NumPy and Pandas!

Answer #10

Actually, pywin32 and ctypes seem to be overkill for this simple task. Tkinter is a cross-platform GUI framework, which ships with Python by default and has clipboard accessing methods along with other cool stuff.

If all you need is to put some text to system clipboard, this will do it:

from Tkinter import Tk
r = Tk()
r.withdraw()
r.clipboard_clear()
r.clipboard_append("i can has clipboardz?")
r.update() # now it stays on the clipboard after the window is closed
r.destroy()

And that's all, no need to mess around with platform-specific third-party libraries.

If you are using Python 3, replace Tkinter with tkinter, as in the sketch below.
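
For reference, a minimal Python 3 sketch of the same idea (clipboard_get additionally reads the clipboard back; it raises TclError if the clipboard is empty):

from tkinter import Tk, TclError
r = Tk()
r.withdraw()
r.clipboard_clear()
r.clipboard_append("i can has clipboardz?")
try:
    print(r.clipboard_get())  # read it back
except TclError:
    print("clipboard is empty")
r.update()  # now it stays on the clipboard after the window is closed
r.destroy()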

Python script to open google map location on clipboard: StackOverflow Questions

How can I open multiple files using "with open" in Python?

I want to change a couple of files at one time, iff I can write to all of them. I'm wondering if I somehow can combine the multiple open calls with the with statement:

try:
  with open("a", "w") as a and open("b", "w") as b:
    do_something()
except IOError as e:
  print "Operation failed: %s" % e.strerror

If that's not possible, what would an elegant solution to this problem look like?

open() in Python does not create a file if it doesn't exist

What is the best way to open a file as read/write if it exists, or if it does not, then create it and open it as read/write? From what I read, file = open("myfile.dat", "rw") should do this, right?

It is not working for me (Python 2.6.2) and I'm wondering if it is a version problem, or not supposed to work like that or what.

The bottom line is, I just need a solution for the problem. I am curious about the other stuff, but all I need is a nice way to do the opening part.

The enclosing directory was writeable by user and group, not other (I'm on a Linux system... so permissions 775 in other words), and the exact error was:

IOError: no such file or directory.

Difference between modes a, a+, w, w+, and r+ in built-in open function?

In the python built-in open function, what is the exact difference between the modes w, a, w+, a+, and r+?

In particular, the documentation implies that all of these will allow writing to the file, and says that it opens the files for "appending", "writing", and "updating" specifically, but does not define what these terms mean.

Simple Digit Recognition OCR in OpenCV-Python

I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV.

I have 100 samples (i.e. images) of each digit. I would like to train with them.

There is a sample letter_recog.py that comes with OpenCV sample. But I still couldn't figure out on how to use it. I don't understand what are the samples, responses etc. Also, it loads a txt file at first, which I didn't understand first.

Later on searching a little bit, I could find a letter_recognition.data in cpp samples. I used it and made a code for cv2.KNearest in the model of letter_recog.py (just for testing):

import numpy as np
import cv2

fn = "letter-recognition.data"
a = np.loadtxt(fn, np.float32, delimiter=",", converters={ 0 : lambda ch : ord(ch)-ord("A") })
samples, responses = a[:,1:], a[:,0]

model = cv2.KNearest()
retval = model.train(samples,responses)
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print results.ravel()

It gave me an array of size 20000, I don't understand what it is.

Questions:

1) What is letter_recognition.data file? How to build that file from my own data set?

2) What does results.ravel() denote?

3) How we can write a simple digit recognition tool using letter_recognition.data file (either KNearest or SVM)?

Does reading an entire file leave the file handle open?

If you read an entire file with content = open("Path/to/file", "r").read() is the file handle left open until the script exits? Is there a more concise method to read a whole file?

Store output of subprocess.Popen call in a string

I'm trying to make a system call in Python and store the output to a string that I can manipulate in the Python program.

#!/usr/bin/python
import subprocess
p2 = subprocess.Popen("ntpq -p")

I've tried a few things including some of the suggestions here:

Retrieving the output of subprocess.call()

but without any luck.

Unicode Error "unicodeescape" codec can't decode bytes... Cannot open text files in Python 3

I am using Python 3.1 on a Windows 7 machine. Russian is the default system language, and utf-8 is the default encoding.

Looking at the answer to a previous question, I have attempted using the "codecs" module with a little luck. Here's a few examples:

>>> g = codecs.open("C:\Users\Eric\Desktop\beeline.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) "unicodeescape" codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape (<pyshell#39>, line 1)
>>> g = codecs.open("C:\Users\Eric\Desktop\Site.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) "unicodeescape" codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape (<pyshell#40>, line 1)
>>> g = codecs.open("C:\Python31\Notes.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) "unicodeescape" codec can't decode bytes in position 11-12: malformed \N character escape (<pyshell#41>, line 1)
>>> g = codecs.open("C:\Users\Eric\Desktop\Site.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) "unicodeescape" codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape (<pyshell#44>, line 1)

My last idea was, I thought it might have been the fact that Windows "translates" a few folders, such as the "users" folder, into Russian (though typing "users" is still the correct path), so I tried it in the Python31 folder. Still, no luck. Any ideas?

Python subprocess/Popen with a modified environment

I believe that running an external command with a slightly modified environment is a very common case. That's how I tend to do it:

import subprocess, os
my_env = os.environ
my_env["PATH"] = "/usr/sbin:/sbin:" + my_env["PATH"]
subprocess.Popen(my_command, env=my_env)

I've got a gut feeling that there's a better way; does it look alright?

Cannot find module cv2 when using OpenCV

I have installed OpenCV on the Occidentalis operating system (a variant of Raspbian) on a Raspberry Pi, using jayrambhia's script found here. It installed version 2.4.5.

When I try import cv2 in a Python program, I get the following message:

pi@raspberrypi ~ $ python cam.py
Traceback (most recent call last):
File "cam.py", line 1, in <module>
    import cv2
ImportError: No module named cv2

The file cv2.so is stored in /usr/local/lib/python2.7/site-packages/...

There are also folders in /usr/local/lib called python3.2 and python2.6, which could be a problem but I'm not sure.

Is this a path error perhaps? Any help is appreciated, I am new to Linux.
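
One hedged way to diagnose this kind of ImportError is to compare where cv2.so actually lives with the directories Python searches (the site-packages path below is the one from the question):

import sys
print(sys.path)  # is /usr/local/lib/python2.7/site-packages listed?

# Stop-gap: add the directory before importing
sys.path.append("/usr/local/lib/python2.7/site-packages")
import cv2

A more durable fix is usually to export PYTHONPATH=/usr/local/lib/python2.7/site-packages, or to move/link cv2.so into a directory the interpreter already searches (on Debian-based systems this is often dist-packages rather than site-packages).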

How to crop an image in OpenCV using Python

How can I crop images in OpenCV, like I've done before in PIL?

A working example in PIL:

im = Image.open("0.png").convert("L")
im = im.crop((1, 1, 98, 33))
im.save("_0.png")

But how can I do it in OpenCV?

This is what I tried:

im = cv.imread("0.png", cv.CV_LOAD_IMAGE_GRAYSCALE)
(thresh, im_bw) = cv.threshold(im, 128, 255, cv.THRESH_OTSU)
im = cv.getRectSubPix(im_bw, (98, 33), (1, 1))
cv.imshow("Img", im)
cv.waitKey(0)

But it doesn"t work.

I think I incorrectly used getRectSubPix. If this is the case, please explain how I can correctly use this function.
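
For reference, a hedged sketch of the standard approach: in the cv2 Python bindings an image is a NumPy array, so cropping is plain slicing with [y1:y2, x1:x2], mirroring the coordinates of the PIL example above:

import cv2

im = cv2.imread("0.png", cv2.IMREAD_GRAYSCALE)
crop = im[1:33, 1:98]  # rows (y) first, then columns (x)
cv2.imwrite("_0.png", crop)

Note the axis order: PIL's crop((left, upper, right, lower)) becomes im[upper:lower, left:right] in NumPy.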

Answer #1

Since this question was asked in 2010, there has been real simplification in how to do simple multithreading with Python with map and pool.

The code below comes from an article/blog post that you should definitely check out (no affiliation) - Parallelism in one line: A Better Model for Day to Day Threading Tasks. I'll summarize below; it ends up being just a few lines of code:

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
results = pool.map(my_function, my_array)

Which is the multithreaded version of:

results = []
for item in my_array:
    results.append(my_function(item))

Description

Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.

Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.


Implementation

Parallel versions of the map function are provided by two libraries: multiprocessing, and its little-known but equally fantastic stepchild, multiprocessing.dummy.

multiprocessing.dummy is exactly the same as the multiprocessing module, but uses threads instead (an important distinction: use multiple processes for CPU-intensive tasks, and threads for (and during) I/O):

multiprocessing.dummy replicates the API of multiprocessing, but is no more than a wrapper around the threading module.

import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
  "http://www.python.org",
  "http://www.python.org/about/",
  "http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html",
  "http://www.python.org/doc/",
  "http://www.python.org/download/",
  "http://www.python.org/getit/",
  "http://www.python.org/community/",
  "https://wiki.python.org/moin/",
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()

And the timing results:

Single thread:   14.4 seconds
       4 Pool:   3.1 seconds
       8 Pool:   1.4 seconds
      13 Pool:   1.3 seconds

Passing multiple arguments (works like this only in Python 3.3 and later):

To pass multiple arrays:

results = pool.starmap(function, zip(list_a, list_b))

Or to pass a constant and an array:

results = pool.starmap(function, zip(itertools.repeat(constant), list_a))

If you are using an earlier version of Python, you can pass multiple arguments via this workaround.

(Thanks to user136036 for the helpful comment.)

Answer #2

os.listdir() - list in the current directory

With listdir in the os module you get both the files and the folders in the current dir:

 import os
 arr = os.listdir()
 print(arr)
 
 >>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]

Looking in a directory

arr = os.listdir("c:\files")

glob from glob

With glob you can specify a type of file to list, like this:

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)

glob in a list comprehension

mylist = [f for f in glob.glob("*.txt")]

get the full path of only files in the current directory

import os
from os import listdir
from os.path import isfile, join

cwd = os.getcwd()
onlyfiles = [os.path.join(cwd, f) for f in os.listdir(cwd) if 
os.path.isfile(os.path.join(cwd, f))]
print(onlyfiles) 

["G:\getfilesname\getfilesname.py", "G:\getfilesname\example.txt"]

Getting the full path name with os.path.abspath

You get the full path in return

 import os
 files_path = [os.path.abspath(x) for x in os.listdir()]
 print(files_path)
 
 ["F:\documentiapplications.txt", "F:\documenticollections.txt"]

Walk: going through subdirectories

os.walk returns the root, the directories list and the files list; that is why I unpacked them into r, d, f in the for loop. It then looks for other files and directories in the subfolders of the root, and so on, until there are no more subfolders.

import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            print(os.path.join(r, file))

os.listdir(): get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to give the argument as "." or os.getcwd() to the os.listdir method.

 import os
 arr = os.listdir(".")
 print(arr)
 
 >>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]

To go up in the directory tree

# Method 1
x = os.listdir("..")

# Method 2
x= os.listdir("/")

Get files: os.listdir() in a particular directory (Python 2 and 3)

 import os
 arr = os.listdir("F:\python")
 print(arr)
 
 >>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]

Get files of a particular subdirectory with os.listdir()

import os

x = os.listdir("./content")

os.walk(".") - current directory

 import os
 arr = next(os.walk("."))[2]
 print(arr)
 
 >>> ["5bs_Turismo1.pdf", "5bs_Turismo1.pptx", "esperienza.txt"]

next(os.walk(".")) and os.path.join("dir", "file")

 import os
 arr = []
 root, dirs, files = next(os.walk(r"F:\_python"))
 for file in files:
     arr.append(os.path.join(root, file))

 for f in arr:
     print(f)

>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt

next(os.walk("F:\") - get the full path - list comprehension

 [os.path.join(r, file) for r, d, f in [next(os.walk(r"F:\_python"))] for file in f]
 
 >>> ["F:\_python\dict_class.py", "F:\_python\programmi.txt"]

os.walk - get full path - all files in sub dirs

x = [os.path.join(r, file) for r, d, f in os.walk(r"F:\_python") for file in f]
print(x)

>>> ["F:\_python\dict.py", "F:\_python\progr.txt", "F:\_python\readl.py"]

os.listdir() - get only txt files

 arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
 print(arr_txt)
 
 >>> ["work.txt", "3ebooks.txt"]

Using glob to get the full path of the files

If I need the absolute path of the files:

from path import path
from glob import glob
x = [path(f).abspath() for f in glob(r"F:\*.txt")]
for f in x:
    print(f)

>>> F:\acquistionline.txt
>>> F:\acquisti_2018.txt
>>> F:\bootstrap_jquery_ecc.txt

Using os.path.isfile to avoid directories in the list

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ["a simple game.py", "data.txt", "decorator.py"]

Using pathlib from Python 3.4

import pathlib

flist = []
for p in pathlib.Path(".").iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

 >>> error.PNG
 >>> exemaker.bat
 >>> guiprova.mp3
 >>> setup.py
 >>> speak_gui2.py
 >>> thumb.PNG

With list comprehension:

flist = [p for p in pathlib.Path(".").iterdir() if p.is_file()]

Alternatively, use pathlib.Path() instead of pathlib.Path(".")

Use glob method in pathlib.Path()

import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py

Get all and only files with os.walk

import os
x = [i[2] for i in os.walk(".")]
y = []
for t in x:
    for f in t:
        y.append(f)
print(y)

>>> ["append_to_list.py", "data.txt", "data1.txt", "data2.txt", "data_180617", "os_walk.py", "READ2.py", "read_data.py", "somma_defaltdic.py", "substitute_words.py", "sum_data.py", "data.txt", "data1.txt", "data_180617"]

Get only files with next and walk in a directory

 import os
 x = next(os.walk("F://python"))[2]
 print(x)
 
 >>> ["calculator.bat","calculator.py"]

Get only directories with next and walk in a directory

 import os
 next(os.walk("F://python"))[1] # for the current dir use (".")
 
 >>> ["python3","others"]

Get all the subdir names with walk

for r,d,f in os.walk("F:\_python"):
    for dirs in d:
        print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints

os.scandir() from Python 3.5 and greater

import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ["calculator.bat","calculator.py"]

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG

Examples:

Ex. 1: How many files are there in the subdirectories?

In this example, we count the files contained in a directory and all of its subdirectories.

import os

def count(dir, counter=0):
    "returns the number of files in dir and its subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + " files"

print(count(r"F:\python"))

>>> F:\python : 12057 files

Ex. 2: How to copy all files from a directory to another

A script to bring order to your computer by finding all files of a type (default: pptx) and copying them into a new folder.

import os
import shutil

destination = r"F:\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype="pptx", counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = os.path.join(pack[0], f)
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print("-" * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")

# searches for folders whose names start with `_`
for dir in os.listdir():
    if dir[0] == "_":
        # copyfile(dir, filetype="pdf")
        copyfile(dir, filetype="txt")


>>> _compiti18\Compito Contabilità 1\conti.txt
>>> _compiti18\Compito Contabilità 1\modula4.txt
>>> _compiti18\Compito Contabilità 1\moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files

Ex. 3: How to list all the files in a txt file

In case you want to create a txt file with all the file names:

import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)

Example: txt with all the files of a hard drive

"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding="utf-8") as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(os.path.join(root, file))
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")

All the files of C: in one text file

This is a shorter version of the previous code. Change the starting folder if you need to begin from another position. On my computer this code generates a text file of about 50 MB, with a little under 500,000 lines containing files with their complete path.

import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\"):
        for file in f:
            filewrite.write(f"{r + file}
")

How to write a file with all the paths of files of a given type in a folder

With this function you can create a txt file named after the type of file you look for (e.g. pngfile.txt), containing the full paths of all the files of that type. It can be useful sometimes, I think.

import os

def searchfiles(extension=".ttf", folder="H:\"):
    "Create a txt file with all the file of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(f"{r + file}
")

# looking for png file (fonts) in the hard disk H:
searchfiles(".png", "H:\")

>>> H:\4bs_18\Dolphins5.png
>>> H:\4bs_18\Dolphins6.png
>>> H:\4bs_18\Dolphins7.png
>>> H:\5_18\marketing html\assets\images\logo2.png
>>> H:\7z\001.png
>>> H:\7z\002.png

(New) Find all files and open them with tkinter GUI

I just wanted to add, in 2019, a little app to search for all files in a dir and open them by double-clicking on the name of the file in the list.

import tkinter as tk
import os

def searchfiles(extension=".txt", folder="H:\"):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, r + "\" + file)

def open_file():
    os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles(".png", "H:\"))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()

Answer #3

I just used the following, which was quite simple. First open a console, then cd to where you've downloaded your file, like some-package.whl, and use

pip install some-package.whl

Note: if pip.exe is not recognized, you may find it in the "Scripts" directory of your Python installation. If pip is not installed, this page can help: How do I install pip on Windows?

Note, for clarification: if you copy the *.whl file to your local drive (e.g. C:\some-dir\some-file.whl), use the full path as the command line parameter:

pip install C:/some-dir/some-file.whl

Answer #4

This is the behaviour to adopt when the referenced object is deleted. It is not specific to Django; this is an SQL standard, although Django has its own implementation on top of SQL. (1)

There are seven possible actions to take when such an event occurs:

  • CASCADE: When the referenced object is deleted, also delete the objects that have references to it (when you remove a blog post for instance, you might want to delete comments as well). SQL equivalent: CASCADE.
  • PROTECT: Forbid the deletion of the referenced object. To delete it you will have to delete all objects that reference it manually. SQL equivalent: RESTRICT.
  • RESTRICT: (introduced in Django 3.1) Similar behaviour to PROTECT that matches SQL's RESTRICT more accurately. (See the Django documentation example)
  • SET_NULL: Set the reference to NULL (requires the field to be nullable). For instance, when you delete a User, you might want to keep the comments he posted on blog posts, but say it was posted by an anonymous (or deleted) user. SQL equivalent: SET NULL.
  • SET_DEFAULT: Set the default value. SQL equivalent: SET DEFAULT.
  • SET(...): Set a given value. This one is not part of the SQL standard and is entirely handled by Django.
  • DO_NOTHING: Probably a very bad idea, since this would create integrity issues in your database (referencing an object that actually doesn't exist). SQL equivalent: NO ACTION. (2)

Source: Django documentation

See also the documentation of PostgreSQL for instance.

In most cases, CASCADE is the expected behaviour, but for every ForeignKey you should always ask yourself what the expected behaviour is in this situation. PROTECT and SET_NULL are often useful. Setting CASCADE where it should not be can potentially delete your entire database in cascade, simply by deleting a single user. A minimal sketch of how these options are declared follows.
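
A minimal models.py sketch of how these options are wired up (the model names are illustrative, echoing the blog-post example above):

from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=100)

class Comment(models.Model):
    # Deleting an Article also deletes its Comments
    article = models.ForeignKey(Article, on_delete=models.CASCADE)
    # Deleting a User keeps the Comment but anonymizes it
    # (SET_NULL requires null=True)
    author = models.ForeignKey("auth.User", null=True, on_delete=models.SET_NULL)
    body = models.TextField()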


Additional note to clarify cascade direction

It"s funny to notice that the direction of the CASCADE action is not clear to many people. Actually, it"s funny to notice that only the CASCADE action is not clear. I understand the cascade behavior might be confusing, however you must think that it is the same direction as any other action. Thus, if you feel that CASCADE direction is not clear to you, it actually means that on_delete behavior is not clear to you.

In your database, a foreign key is basically represented by an integer field whose value is the primary key of the foreign object. Let's say you have an entry comment_A, which has a foreign key to an entry article_B. If you delete the entry comment_A, everything is fine; article_B used to live without comment_A and isn't bothered by its deletion. However, if you delete article_B, then comment_A panics! It never lived without article_B and needs it; it's part of its attributes (article=article_B, but what is article_B???). This is where on_delete steps in, to determine how to resolve this integrity error, either by saying:

  • "No! Please! Don"t! I can"t live without you!" (which is said PROTECT or RESTRICT in Django/SQL)
  • "All right, if I"m not yours, then I"m nobody"s" (which is said SET_NULL)
  • "Good bye world, I can"t live without article_B" and commit suicide (this is the CASCADE behavior).
  • "It"s OK, I"ve got spare lover, and I"ll reference article_C from now" (SET_DEFAULT, or even SET(...)).
  • "I can"t face reality, and I"ll keep calling your name even if that"s the only thing left to me!" (DO_NOTHING)

I hope it makes cascade direction clearer. :)


Footnotes

(1) Django has its own implementation on top of SQL. And, as mentioned by @JoeMjr2 in the comments below, Django will not create the SQL constraints. If you want the constraints to be ensured by your database (for instance, if your database is used by another application, or if you hang around in the database console from time to time), you might want to set the related constraints manually yourself. There is an open ticket to add support for database-level on-delete constraints in Django.

(2) Actually, there is one case where DO_NOTHING can be useful: if you want to skip Django's implementation and implement the constraint yourself at the database level.

Answer #5

Running brew reinstall python@2 didn't work for my existing Python 2.7 virtual environments. Inside them there were still ERROR:root:code for hash sha1 was not found errors.

I encountered this problem after I ran brew upgrade openssl. And here's the fix:

$ ls /usr/local/Cellar/openssl

...which shows

1.0.2t

Then, for the existing version shown, run:

$ brew switch openssl 1.0.2t

...which shows

Cleaning /usr/local/Cellar/openssl/1.0.2t
Opt link created for /usr/local/Cellar/openssl/1.0.2t

After that, run the following command in a Python 2.7 virtualenv:

(my-venv) $ python -c "import hashlib;m=hashlib.md5();print(m.hexdigest())"

...which shows

d41d8cd98f00b204e9800998ecf8427e

No more errors.

Answer #6

You opened the file in binary mode:

with open(fname, "rb") as f:

This means that all data read from the file is returned as bytes objects, not str. You cannot then use a string in a containment test:

if "some-pattern" in tmp: continue

You"d have to use a bytes object to test against tmp instead:

if b"some-pattern" in tmp: continue

or open the file as a text file instead, by replacing the "rb" mode with "r".
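
A small self-contained sketch of both options (the file names and the pattern are placeholders):

# Option 1: stay in binary mode and compare bytes with bytes
with open("dump.bin", "rb") as f:
    for tmp in f:
        if b"some-pattern" in tmp:
            continue

# Option 2: open as text and compare str with str
with open("dump.txt", "r") as f:
    for tmp in f:
        if "some-pattern" in tmp:
            continue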

Answer #7

⚡️ TL;DR — One line solution.

All you have to do is:

sudo easy_install pip

2019: ⚠️easy_install has been deprecated. Check Method #2 below for preferred installation!

Details:

⚡️ OK, I read the solutions given above, but here's an EASY solution to install pip.

macOS comes with Python installed. But to make sure that you have Python installed, open the terminal and run the following command.

python --version

If this command returns a version number, that means Python exists, which also means that you already have access to easy_install, considering you are using macOS/OS X.

ℹ️ Now, all you have to do is run the following command.

sudo easy_install pip

After that, pip will be installed and you'll be able to use it for installing other packages.

Let me know if you have any problems installing pip this way.

Cheers!

P.S. I ended up blogging a post about it. QuickTip: How Do I Install pip on macOS or OS X?


✅ UPDATE (Jan 2019): METHOD #2: Two line solution —

easy_install has been deprecated. Please use get-pip.py instead.

First of all download the get-pip file

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

Now run this file to install pip

python get-pip.py

That should do it.


Answer #8

I noticed that every now and then I need to Google fopen all over again, just to build a mental image of what the primary differences between the modes are. So I thought a diagram would be faster to read next time. Maybe someone else will find it helpful too.
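
Since the image itself does not survive in this copy, here is a plain-text sketch of the summary it conveyed, for the modes of open()/fopen:

                  | r | r+ | w | w+ | a | a+
------------------+---+----+---+----+---+---
read              | + | +  |   | +  |   | +
write             |   | +  | + | +  | + | +
create            |   |    | + | +  | + | +
truncate          |   |    | + | +  |   |
position at start | + | +  | + | +  |   |
position at end   |   |    |   |    | + | +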

Answer #9

setup.py helps you install a Python package foo on your machine (possibly inside a virtualenv) so that you can import foo from other projects and also from [I]Python prompts.

It does a similar job to pip, easy_install, etc.


Using setup.py

Let"s start with some definitions:

Package - A folder/directory that contains an __init__.py file.
Module - A valid Python file with a .py extension.
Distribution - How one package relates to other packages and modules.

Let"s say you want to install a package named foo. Then you do,

$ git clone https://github.com/user/foo  
$ cd foo
$ python setup.py install

Instead, if you don"t want to actually install it but still would like to use it. Then do,

$ python setup.py develop  

This command will create symlinks to the source directory within site-packages instead of copying things. Because of this, it is quite fast (particularly for large packages).


Creating setup.py

If you have your package tree like,

foo
├── foo
│   ├── data_struct.py
│   ├── __init__.py
│   └── internals.py
├── README
├── requirements.txt
└── setup.py

Then you write the following setup.py script so that the package can be installed on any machine:

from setuptools import setup

setup(
   name="foo",
   version="1.0",
   description="A useful module",
   author="Man Foo",
   author_email="author@example.com",
   packages=["foo"],  #same as name
   install_requires=["wheel", "bar", "greek"], #external packages as dependencies
)

Instead, if your package tree is more complex like the one below:

foo
├── foo
│   ├── data_struct.py
│   ├── __init__.py
│   └── internals.py
├── README
├── requirements.txt
├── scripts
│   ├── cool
│   └── skype
└── setup.py

Then, your setup.py in this case would look like:

from setuptools import setup

setup(
   name="foo",
   version="1.0",
   description="A useful module",
   author="Man Foo",
   author_email="author@example.com",
   packages=["foo"],  #same as name
   install_requires=["wheel", "bar", "greek"], #external packages as dependencies
   scripts=[
            "scripts/cool",
            "scripts/skype",
           ]
)

Add more stuff to setup.py to make it decent:

from setuptools import setup

with open("README", "r") as f:
    long_description = f.read()

setup(
   name="foo",
   version="1.0",
   description="A useful module",
   license="MIT",
   long_description=long_description,
   author="Man Foo",
   author_email="author@example.com",
   url="http://www.foopackage.com/",
   packages=["foo"],  #same as name
   install_requires=["wheel", "bar", "greek"], #external packages as dependencies
   scripts=[
            "scripts/cool",
            "scripts/skype",
           ]
)

The long_description is used on pypi.org as the README description of your package.


And finally, you"re now ready to upload your package to PyPi.org so that others can install your package using pip install yourpackage.

At this point there are two options.

  • publish on the temporary test.pypi.org server to familiarize yourself with the procedure, then publish on the permanent pypi.org server for the public to use your package.
  • publish straight away on the permanent pypi.org server, if you are already familiar with the procedure and have your user credentials (e.g., username, password, package name).

Once your package name is registered on pypi.org, nobody else can claim it. Python packaging suggests the twine package for uploading your package to PyPI. Thus,

(1) The first step is to locally build the distributions using:

# prereq: wheel (pip install wheel)  
$ python setup.py sdist bdist_wheel   

(2) Then use twine to upload, either to test.pypi.org or pypi.org:

$ twine upload --repository testpypi dist/*  
username: ***  
password: ***  

It will take a few minutes for the package to appear on test.pypi.org. Once you're satisfied with it, you can then upload your package to the real & permanent index of pypi.org simply with:

$ twine upload dist/*  

Optionally, you can also sign the files in your package with GPG:

$ twine upload dist/* --sign 


Answer #10

tl;dr / quick fix

  • Don"t decode/encode willy nilly
  • Don"t assume your strings are UTF-8 encoded
  • Try to convert strings to Unicode strings as soon as possible in your code
  • Fix your locale: How to solve UnicodeDecodeError in Python 3.6?
  • Don"t be tempted to use quick reload hacks

Unicode Zen in Python 2.x - The Long Version

Without seeing the source it's difficult to know the root cause, so I'll have to speak generally.

UnicodeDecodeError: 'ascii' codec can't decode byte generally happens when you try to convert a Python 2.x str that contains non-ASCII to a Unicode string without specifying the encoding of the original string.

In brief, Unicode strings are an entirely separate type of Python string that does not contain any encoding. They hold only Unicode code points and therefore can hold any character from across the entire spectrum. Strings contain encoded text, be it UTF-8, UTF-16, ISO-8859-1, GBK, Big5, etc. Strings are decoded to Unicode, and Unicode is encoded to strings. Files and text data are always transferred in encoded strings.

The Markdown module authors probably use unicode() (where the exception is thrown) as a quality gate to the rest of the code; it will convert ASCII or re-wrap existing Unicode strings into a new Unicode string. The Markdown authors can't know the encoding of the incoming string, so they rely on you to decode strings to Unicode before passing them to Markdown.

Unicode strings can be declared in your code using the u prefix to strings. E.g.

>>> my_u = u"my ünicôdé strįng"
>>> type(my_u)
<type "unicode">

Unicode strings may also come from files, databases and network modules. When this happens, you don't need to worry about the encoding.

Gotchas

Conversion from str to Unicode can happen even when you don't explicitly call unicode().

The following scenarios cause UnicodeDecodeError exceptions:

# Explicit conversion without encoding
unicode("€")

# New style format string into Unicode string
# Python will try to convert value string to Unicode first
u"The currency is: {}".format("€")

# Old style format string into Unicode string
# Python will try to convert value string to Unicode first
u"The currency is: %s" % "€"

# Append string to Unicode
# Python will try to convert string to Unicode first
u"The currency is: " + "€"         

Examples

In the following diagram, you can see how the word café has been encoded in either "UTF-8" or "Cp1252" encoding, depending on the terminal type. In both examples, caf is just regular ASCII. In UTF-8, é is encoded using two bytes. In "Cp1252", é is 0xE9 (which also happens to be the Unicode code point value; it's no coincidence). The correct decode() is invoked and conversion to a Python Unicode string is successful: [Diagram: a byte string decoded with the correct encoding into a Python Unicode string]

In this diagram, decode() is called with ascii (which is the same as calling unicode() without an encoding given). As ASCII can't contain bytes greater than 0x7F, this will throw a UnicodeDecodeError exception:

[Diagram: a byte string decoded with the wrong encoding]

The Unicode Sandwich

It"s good practice to form a Unicode sandwich in your code, where you decode all incoming data to Unicode strings, work with Unicodes, then encode to strs on the way out. This saves you from worrying about the encoding of strings in the middle of your code.

Input / Decode

Source code

If you need to bake non-ASCII into your source code, just create Unicode strings by prefixing the string with a u. E.g.

u"Zürich"

To allow Python to decode your source code, you will need to add an encoding header to match the actual encoding of your file. For example, if your file was encoded as "UTF-8", you would use:

# encoding: utf-8

This is only necessary when you have non-ASCII in your source code.

Files

Usually non-ASCII data is received from a file. The io module provides io.TextIOWrapper, which decodes your file on the fly using a given encoding. You must use the correct encoding for the file; it can't easily be guessed. For example, for a UTF-8 file:

import io
with io.open("my_utf8_file.txt", "r", encoding="utf-8") as my_file:
     my_unicode_string = my_file.read() 

my_unicode_string would then be suitable for passing to Markdown. If you get a UnicodeDecodeError from the read() line, then you've probably used the wrong encoding value.

CSV Files

The Python 2.7 CSV module does not support non-ASCII characters. Help is at hand, however, with https://pypi.python.org/pypi/backports.csv.

Use it as above, but pass it the opened file:

from backports import csv
import io
with io.open("my_utf8_file.txt", "r", encoding="utf-8") as my_file:
    for row in csv.reader(my_file):
        yield row

Databases

Most Python database drivers can return data in Unicode, but usually require a little configuration. Always use Unicode strings for SQL queries.

MySQL

In the connection string add:

charset="utf8",
use_unicode=True

E.g.

>>> db = MySQLdb.connect(host="localhost", user="root", passwd="passwd", db="sandbox", use_unicode=True, charset="utf8")

PostgreSQL

Add:

psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)

HTTP

Web pages can be encoded in just about any encoding. The Content-type header should contain a charset field to hint at the encoding. The content can then be decoded manually against this value. Alternatively, Python-Requests returns Unicode in response.text, as sketched below.
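
A hedged sketch with Python-Requests (the URL is a placeholder; response.text is decoded using the charset from the Content-Type header, falling back to a guess):

import requests

response = requests.get("http://example.com/page")
print(response.encoding)   # the charset requests detected
page_text = response.text  # a Unicode string, already decoded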

Manually

If you must decode strings manually, you can simply do my_string.decode(encoding), where encoding is the appropriate encoding. Python 2.x supported codecs are listed here: Standard Encodings. Again, if you get UnicodeDecodeError then you've probably got the wrong encoding.

The meat of the sandwich

Work with Unicode as you would with normal strs.

Output

stdout / printing

print writes through the stdout stream. Python tries to configure an encoder on stdout so that Unicode is encoded to the console's encoding. For example, if a Linux shell's locale is en_GB.UTF-8, the output will be encoded to UTF-8. On Windows, you will be limited to an 8-bit code page.

An incorrectly configured console, such as a corrupt locale, can lead to unexpected print errors. The PYTHONIOENCODING environment variable can force the encoding for stdout.

Files

Just like input, io.open can be used to transparently convert Unicode to encoded byte strings.

Database

The same configuration used for reading will allow Unicode to be written directly.

Python 3

Python 3 is no more Unicode-capable than Python 2.x, but it is slightly less confused on the topic. E.g., the regular str is now a Unicode string and the old str is now bytes.

The default encoding is UTF-8, so if you .decode() a byte string without giving an encoding, Python 3 uses UTF-8 encoding. This probably fixes 50% of people's Unicode problems.

Further, open() operates in text mode by default, so it returns decoded str (Unicode strings). The encoding is derived from your locale, which tends to be UTF-8 on Un*x systems, or an 8-bit code page, such as windows-1251, on Windows boxes.

Why you shouldn"t use sys.setdefaultencoding("utf8")

It"s a nasty hack (there"s a reason you have to use reload) that will only mask problems and hinder your migration to Python 3.x. Understand the problem, fix the root cause and enjoy Unicode zen. See Why should we NOT use sys.setdefaultencoding("utf-8") in a py script? for further details