# numpy.arctan2 () in Python

| | | | | | | | | | | | | | | | | | | |

## Syntax

numpy.arctan2(x1, x2, /, out=None, *, where=True, casting=’same_kind’, order=’K’, dtype=None, subok=True[, signature, extobj]) = ufunc ’arctan2’
Parameters
x1 array_like, real-valued
y-coordinates.
x2 array_like, real-valued
x-coordinates. If x1.shape != x2.shape , they must be broadcastable to a common shape (which becomes the shape of the output).
out ndarray, None, or tuple of ndarray and None, optional
A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned. A tuple (possible only as a keyword argument) must have length equal to the number of outputs.
where array_like, optional
This condition is broadcast over the input. At locations where the condition is True, the out array will be set to the ufunc result. Elsewhere, the out array will retain its original value. Note that if an uninitialized out array is created via the default out=None, locations within it where the condition is False will remain uninitialized.
**kwargs
For other keyword-only arguments, see the ufunc docs.
Returns
angle ndarray
Array of angles in radians, in the range [-pi, pi]. This is a scalar if both x1 and x2 are scalars.
We will cover NumPy arctan2. Along with that, for a better general understanding, we will also look at its syntax and parameter. Then, we will see the application of the whole theoretical part through some examples. But first, let’s try to analyze the function through its definition. First, arctan means the inverse of a tan function. Now the NumPy arctan2 function helps us to calculate the arc tan value between X1 and X2 in radians. Here, X1 and 2 are parameters that we will discuss later. As we progress through this article, things will become clearer for you. Next, let’s look at the syntax associated with it.
The method numpy.arctan2() calculates the element-wise arctangent of arr1 / arr2 and selects the quadrant correctly. The quadrant is chosen so that arctan2(x1, x2) is the signed angle in radians between the ray ending at the origin and passing through point (1, 0) and the ray ending at the origin and passing through point (x2, x1).
Arctan2 is a 4-quadrant inverse function. Taking this into account, it gives a value between 0 and 2pi. The range of this function is -180 to 180 degrees. These are the 2 key points that distinguish Arctan2 from arctan features.

## Difference Between Arctan and Arctan2

In this section we will discuss the difference between 2 Numpy functions.
 NumPy arctan NumPy arctan2 arctan is a 2 quadrant inverse function. arctan2 is a 4 quadrant inverse function. The range of the arctan function is from -90 to 90 degree. The range for arctan2 is -180 to 180 degree. This function accepts a single array. This function as discussed take 2 input arrays.
Now we are finished with the theoretical part for NumPy arctan2. This section explores how this feature works and how it helps us get the output we want. We’ll start with an elementary example and gradually move on to a more complicated example.

## NumPy Arctan2 Example

import numpy as ppool
y=[1,1]
x=[1,1.732]
print(ppool.arctan2(y,x))

[0.78539816 0.52361148]
Above we see a simple example of our arctan2 function. Now let’s walk line by line and understand how we got the result. First, we imported the NumPy function. Then we defined our 2 sets of the array. By using the syntax of our function and the print statement, we get the result we want. Here both values ‚Äã‚Äãare given in radians. Now if you want to check the result to some extent. To do this, we need to use this particular method:
Angle in degree = angle in radian * 180/pi
If we make calculations on our results, we get a 45 and 30-degree answer. Here we considered pi to 3.14. The responses match and therefore the result is verified.

## Numpy Arctan() Example #2

Now suppose that we also want to obtain the values ‚Äã‚Äãin degrees. It is a simple process and can be done with the help of the for loop and the formula discussed above. Let’s see how:
import numpy as ppool
degree=0
y=[-1,1.732]
x=[2,1]
b=ppool.arctan2(y,x)
print(b)
for vals in b:
degree=vals*(180/3.14)
print(degree)

### Output:

[-0.46364761  1.04718485]
-26.578525356734104
60.02970472117416
See how we get the values ‚Äã‚Äãin radians and degrees. All steps are similar to the first example. The only difference we used a "for loop". If you want something simpler we can also use another method
import numpy as ppool
y=[-1,1.732]
x=[2,1]
b=ppool.arctan2(y,x)*(180/3.14)
print(b)
Here all you need to do is multiply the value (180 / 3.14) or (180 / ppool.pi) by the array. You can definitely use this method on the for loop method. But either way, you will get the desired output which is a degree value.

### Output:

[-26.57852536  60.02970472]

## NumPy Arctan2 Example #3

’’’ Computes the direction of arrival wrt a source and receiver ’’’

s_ind = self.key2ind(source)

# vector from receiver to source
v = self.X[:,s_ind] - self.X[:,r_ind]

azimuth = np.arctan2(v[1], v[0]) elevation = np.arctan2(v[2], la.norm(v[:2])) azimuth = azimuth + 2*np.pi if azimuth < 0. else azimuth elevation = elevation + 2*np.pi if elevation < 0. else elevation return np.array([azimuth, elevation])

## NumPy Arctan2 Example #4

def mtx_freq2visi(M, p_mic_x, p_mic_y):
"""
build the matrix that maps the Fourier series to the visibility
:param M: the Fourier series expansion is limited from -M to M
:param p_mic_x: a vector that constains microphones x coordinates
:param p_mic_y: a vector that constains microphones y coordinates
:return:
"""
num_mic = p_mic_x.size
ms = np.reshape(np.arange(-M, M + 1, step=1), (1, -1), order=’F’)
G = np.zeros((num_mic * (num_mic - 1), 2 * M + 1), dtype=complex, order=’C’)
count_G = 0
for q in range(num_mic):
p_x_outer = p_mic_x[q]
p_y_outer = p_mic_y[q]
for qp in range(num_mic):
if not q == qp:
p_x_qqp = p_x_outer - p_mic_x[qp]
p_y_qqp = p_y_outer - p_mic_y[qp]
norm_p_qqp = np.sqrt(p_x_qqp ** 2 + p_y_qqp ** 2)
phi_qqp = np.arctan2(p_y_qqp, p_x_qqp) G[count_G, :] = (-1j) ** ms * sp.special.jv(ms, norm_p_qqp) * np.exp(1j * ms * phi_qqp) count_G += 1 return G

## NumPy Arctan2 Example #5

def vector_angle(u, v, direction=None):
’’’
vector_angle(u, v) yields the angle between the two vectors u and v. The optional argument
direction is by default None, which specifies that the smallest possible angle between the
vectors be reported; if the vectors u and v are 2D vectors and direction parameters True and
False specify the clockwise or counter-clockwise directions, respectively; if the vectors are
3D vectors, then direction may be a 3D point that is not in the plane containing u, v, and the
origin, and it specifies around which direction (u x v or v x u) the the counter-clockwise angle
from u to v should be reported (the cross product vector that has a positive dot product with
the direction argument is used as the rotation axis).
’’’
if direction is None:
return np.arccos(vector_angle_cos(u, v))
elif direction is True:
return np.arctan2(v[1], v[0]) - np.arctan2(u[1], u[0]) elif direction is False: return np.arctan2(u[1], u[0]) - np.arctan2(v[1], v[0]) else: axis1 = normalize(u) axis2 = normalize(np.cross(u, v)) if np.dot(axis2, direction) < 0: axis2 = -axis2 return np.arctan2(np.dot(axis2, v), np.dot(axis1, v))

## NumPy Arctan2 Example #6

def __init__(self, line):
data = line.split(’ ’)
data[1:] = [float(x) for x in data[1:]]
self.classname = data[0]
self.xmin = data[1]
self.ymin = data[2]
self.xmax = data[1]+data[3]
self.ymax = data[2]+data[4]
self.box2d = np.array([self.xmin,self.ymin,self.xmax,self.ymax])
self.centroid = np.array([data[5],data[6],data[7]])
self.unused_dimension = np.array([data[8],data[9],data[10]])
self.w = data[8]
self.l = data[9]
self.h = data[10]
self.orientation = np.zeros((3,))
self.orientation[0] = data[11]
self.orientation[1] = data[12]
self.heading_angle = -1 * np.arctan2(self.orientation[1], self.orientation[0])

## np.arctan2 Example #7

def stanleyControl(state, cx, cy, cyaw, last_target_idx):
"""
:param state: (State object)
:param cx: ([float])
:param cy: ([float])
:param cyaw: ([float])
:param last_target_idx: (int)
:return: (float, int, float)
"""
# Cross track error
current_target_idx, error_front_axle = calcTargetIndex(state, cx, cy)

if last_target_idx >= current_target_idx:
current_target_idx = last_target_idx

# theta_e corrects the heading error
theta_e = normalizeAngle(cyaw[current_target_idx] - state.yaw)
# theta_d corrects the cross track error
theta_d = np.arctan2(K_STANLEY_CONTROL * error_front_axle, state.v) # Steering control delta = theta_e + theta_d return delta, current_target_idx, error_front_axle

## np arctan2 Example #8

def calcTargetIndex(state, cx, cy):
"""
:param state: (State object)
:param cx: [float]
:param cy: [float]
:return: (int, float)
"""
# Calc front axle position
fx = state.x + CAR_LENGTH * np.cos(state.yaw)
fy = state.y + CAR_LENGTH * np.sin(state.yaw)

# Search nearest point index
dx = [fx - icx for icx in cx]
dy = [fy - icy for icy in cy]
d = [np.sqrt(idx ** 2 + idy ** 2) for (idx, idy) in zip(dx, dy)]
error_front_axle = min(d)
target_idx = d.index(error_front_axle)

target_yaw = normalizeAngle(np.arctan2(fy - cy[target_idx], fx - cx[target_idx]) - state.yaw) if target_yaw > 0.0: error_front_axle = - error_front_axle return target_idx, error_front_axle

## NumPy Arctan2 Example #9

def vehicle_flat_reverse(zflag, params={}):
# Get the parameter values
b = params.get(’wheelbase’, 3.)

# Create a vector to store the state and inputs
x = np.zeros(3)
u = np.zeros(2)

# Given the flat variables, solve for the state
x[0] = zflag[0][0]  # x position
x[1] = zflag[1][0]  # y position
x[2] = np.arctan2(zflag[1][1], zflag[0][1]) # tan(theta) = ydot/xdot # And next solve for the inputs u[0] = zflag[0][1] * np.cos(x[2]) + zflag[1][1] * np.sin(x[2]) thdot_v = zflag[1][2] * np.cos(x[2]) - zflag[0][2] * np.sin(x[2]) u[1] = np.arctan2(thdot_v, u[0]**2 / b) return x, u # Function to compute the RHS of the system dynamics

## np.arctan2 Example #10

def GetNeighborCells(self, p, nr, dp = None):
’’’
Returns all cells no more than a given distance in any direction
from a specified cell
p:      The cell of which to get the neighbors
dp:     Direction preference
’’’
pi, pj, pk = p
tqm = self.qm * self.qp
nc = [(pi - i * tqm, pj - j * tqm, pk) for i in range(-nr, nr + 1) for j in range(-nr, nr + 1)]
if dp is not None:                      #Sort points based on direction preference
dpa = np.arctan2(dp[1], dp[0]) #Get angle of direction prefered #Prefer directions in the direction of dp; sort based on magnitude of angle from last direction nc = sorted(nc, key = lambda t : np.abs(np.arctan2(t[1], t[0]) - dpa)) return nc #Gets the current 3d position of the player

## Python atan or atan2, what should I use?

### StackOverflow question

My formula f=arctan(ImZ/ReZ) There are two options: Option 1 (atan):
ImZ=-4.593172163003
ImR=-4.297336384845

>>> z=y/x
>>> f1=math.atan(z)
>>> f1
0.8186613519278327
Option 2 (atan2)
>>> f=math.atan2(y,x)
>>> f
-2.3229313016619604
Why are these two results different?

Atan takes single argument and Atan2 takes two arguments.The purpose of using two arguments instead of one is to gather information on the signs of the inputs in order to return the appropriate quadrant of the computed angle, which is not possible for the single-argument Atan Atan2 result is always between -pi and pi. Reference: https://en.wikipedia.org/wiki/Atan2

## Archived version

numpy.arctan2 (arr1, arr2, casting = & # 39; same_kind & # 39 ;, order = & # 39; K & # 39 ;, dtype = None, ufunc & # 39; arctan & # 39;) : Calculates the element-wise arctangent of arr1 / arr2 by choosing the correct quadrant.¬†The quadrant is chosen so that arctan2 (x1, x2) is the angle sign in radians between a ray ending at the origin and passing through point (1, 0) and a ray ending at the origin and passing through through point (x2) x1).
Parameters: arr1: [array_like] real valued;¬†y-coordinates arr2: [array_like] real valued;¬†x-coordinates.¬†It must match shape of y-cordinates. out: [ndarray, array_like [ OPTIONAL ]] array of same shape as x . where: [array_like, optional] True value means to calculate the universal functions (ufunc) at that position, False value means to leave the value in the output alone. Note: 2pi Radians = 360 degrees The convention is to return the angle z whose real part lies in [-pi / 2 , pi / 2]. Return: Element-wise arc tangent of arr1 / arr2.¬†The values ‚Äã‚Äãare in the closed interval [-pi / 2, pi / 2].
Code # 1: Work
 # Python3 program explaining # arctan2 () function ¬† import numpy as np ¬† arr1 = [ - 1 , + 1 , + 1 , - 1 ] arr2 = [ - 1 , - 1 , + 1 , + 1 ] ¬†¬† ans = np.arctan2 (arr2, arr1) * 180 / np.pi ¬† print ( "x-coordinates:" , arr1) print ( "y-coordin¬†ates: " , arr2) ¬†¬† print ( "arctan2 values:" , ans)
Output:
x-coordinates: [-1, 1, 1, -1] y-coordinates: [-1, -1, 1, 1] arctan2 values: [-135.¬†-45.¬†45. 135.]
Code # 2: Work
 # Python3 program showing # arctan2 () functions ¬† import numpy as np ¬† a = np.arctan2 ([ 0. , 0. , np.inf ], [ + 0. , - 0. , np.inf]) ¬† b = np.arctan2 ([ 1. , - 1. ], [ 0. , 0. ]) ¬† print ( " a : " , a) ¬†¬† print ( "b:" , b )
Output:
a: [0. 3.14159265 0.78539816] b: [1.57079633 -1.57079633]
Links: arctan2.html#numpy.arctan2>https://docs.scipy.org/doc/numpy-1.13.0/reference/g¬†enerated / numpy.arctan2.html # numpy.arctan2 ,

## How can I make a time delay in Python?

I would like to know how to put a time delay in a Python script.

import time
time.sleep(5)   # Delays for 5 seconds. You can also use a float value.

Here is another example where something is run approximately once a minute:

import time
while True:
print("This prints once a minute.")
time.sleep(60) # Delay for 1 minute (60 seconds).

You can use the sleep() function in the time module. It can take a float argument for sub-second resolution.

from time import sleep
sleep(0.1) # Time in seconds

# How can I make a time delay in Python?

In a single thread I suggest the sleep function:

>>> from time import sleep

>>> sleep(4)

This function actually suspends the processing of the thread in which it is called by the operating system, allowing other threads and processes to execute while it sleeps.

Use it for that purpose, or simply to delay a function from executing. For example:

>>> def party_time():
...     print("hooray!")
...
>>> sleep(3); party_time()
hooray!

"hooray!" is printed 3 seconds after I hit Enter.

### Example using sleep with multiple threads and processes

Again, sleep suspends your thread - it uses next to zero processing power.

To demonstrate, create a script like this (I first attempted this in an interactive Python 3.5 shell, but sub-processes can"t find the party_later function for some reason):

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed
from time import sleep, time

def party_later(kind="", n=""):
sleep(3)
return kind + n + " party time!: " + __name__

def main():
with ProcessPoolExecutor() as proc_executor:
start_time = time()
proc_future1 = proc_executor.submit(party_later, kind="proc", n="1")
proc_future2 = proc_executor.submit(party_later, kind="proc", n="2")
for f in as_completed([
print(f.result())
end_time = time()
print("total time to execute four 3-sec functions:", end_time - start_time)

if __name__ == "__main__":
main()

Example output from this script:

proc1 party time!: __mp_main__
proc2 party time!: __mp_main__
total time to execute four 3-sec functions: 3.4519670009613037

You can trigger a function to be called at a later time in a separate thread with the Timer threading object:

>>> t = Timer(3, party_time, args=None, kwargs=None)
>>> t.start()
>>>
>>> hooray!

>>>

The blank line illustrates that the function printed to my standard output, and I had to hit Enter to ensure I was on a prompt.

The upside of this method is that while the Timer thread was waiting, I was able to do other things, in this case, hitting Enter one time - before the function executed (see the first empty prompt).

There isn"t a respective object in the multiprocessing library. You can create one, but it probably doesn"t exist for a reason. A sub-thread makes a lot more sense for a simple timer than a whole new subprocess.

Delays can be also implemented by using the following methods.

The first method:

import time
time.sleep(5) # Delay for 5 seconds.

The second method to delay would be using the implicit wait method:

driver.implicitly_wait(5)

The third method is more useful when you have to wait until a particular action is completed or until an element is found:

## How to delete a file or folder in Python?

How do I delete a file or folder in Python?

Path objects from the Python 3.4+ pathlib module also expose these instance methods:

## Removing white space around a saved image in matplotlib

I need to take an image and save it after some process. The figure looks fine when I display it, but after saving the figure, I got some white space around the saved image. I have tried the "tight" option for savefig method, did not work either. The code:

import matplotlib.image as mpimg
import matplotlib.pyplot as plt

fig = plt.figure(1)
plt.imshow(img)

extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
plt.savefig("1.png", bbox_inches=extent)

plt.axis("off")
plt.show()

I am trying to draw a basic graph by using NetworkX on a figure and save it. I realized that without a graph it works, but when added a graph I get white space around the saved image;

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import networkx as nx

G = nx.Graph()
pos = {1:[100,120], 2:[200,300], 3:[50,75]}

fig = plt.figure(1)
plt.imshow(img)

nx.draw(G, pos=pos)

extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
plt.savefig("1.png", bbox_inches = extent)

plt.axis("off")
plt.show()

You can remove the white space padding by setting bbox_inches="tight" in savefig:

plt.savefig("test.png",bbox_inches="tight")

You"ll have to put the argument to bbox_inches as a string, perhaps this is why it didn"t work earlier for you.

Possible duplicates:

Matplotlib plots: removing axis, legends and white spaces

How to set the margins for a matplotlib figure?

Reduce left and right margins in matplotlib plot

I cannot claim I know exactly why or how my “solution” works, but this is what I had to do when I wanted to plot the outline of a couple of aerofoil sections — without white margins — to a PDF file. (Note that I used matplotlib inside an IPython notebook, with the -pylab flag.)

plt.gca().set_axis_off()
plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0,
hspace = 0, wspace = 0)
plt.margins(0,0)
plt.gca().xaxis.set_major_locator(plt.NullLocator())
plt.gca().yaxis.set_major_locator(plt.NullLocator())
plt.savefig("filename.pdf", bbox_inches = "tight",

I have tried to deactivate different parts of this, but this always lead to a white margin somewhere. You may even have modify this to keep fat lines near the limits of the figure from being shaved by the lack of margins.

## How do I install pip on macOS or OS X?

I spent most of the day yesterday searching for a clear answer for installing pip (package manager for Python). I can"t find a good solution.

How do I install it?

UPDATE (Jan 2019):

easy_install pip

If you need admin privileges to run this, try:

sudo easy_install pip

‚ö°Ô∏è TL;DR ‚Äî One line solution.

All you have to do is:

sudo easy_install pip

2019: ‚ö†Ô∏èeasy_install has been deprecated. Check Method #2 below for preferred installation!

Details:

‚ö°Ô∏è OK, I read the solutions given above, but here"s an EASY solution to install pip.

MacOS comes with Python installed. But to make sure that you have Python installed open the terminal and run the following command.

python --version

If this command returns a version number that means Python exists. Which also means that you already have access to easy_install considering you are using macOS/OSX.

‚ÑπÔ∏è Now, all you have to do is run the following command.

sudo easy_install pip

After that, pip will be installed and you"ll be able to use it for installing other packages.

Let me know if you have any problems installing pip this way.

Cheers!

P.S. I ended up blogging a post about it. QuickTip: How Do I Install pip on macOS or OS X?

‚úÖ UPDATE (Jan 2019): METHOD #2: Two line solution ‚Äî

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

Now run this file to install pip

python get-pip.py

That should do it.

Another gif you said? Here ya go!

You can install it through Homebrew on OS X. Why would you install Python with Homebrew?

The version of Python that ships with OS X is great for learning but it‚Äôs not good for development. The version shipped with OS X may be out of date from the official current Python release, which is considered the stable production version. (source)

Homebrew is something of a package manager for OS X. Find more details on the Homebrew page. Once Homebrew is installed, run the following to install the latest Python, Pip & Setuptools:

brew install python

I"m surprised no-one has mentioned this - since 2013, python itself is capable of installing pip, no external commands (and no internet connection) required.

sudo -H python -m ensurepip

This will create a similar install to what easy_install would.

On Mac:

1. Install easy_install

curl https://bootstrap.pypa.io/ez_setup.py -o - | sudo python

2. Install pip

sudo easy_install pip

3. Now, you could install external modules. For example

pip install regex   # This is only an example for installing other modules

## How do I merge two dictionaries in a single expression (taking union of dictionaries)?

### Question by Carl Meyer

I have two Python dictionaries, and I want to write a single expression that returns these two dictionaries, merged (i.e. taking the union). The update() method would be what I need, if it returned its result instead of modifying a dictionary in-place.

>>> x = {"a": 1, "b": 2}
>>> y = {"b": 10, "c": 11}
>>> z = x.update(y)
>>> print(z)
None
>>> x
{"a": 1, "b": 10, "c": 11}

How can I get that final merged dictionary in z, not x?

(To be extra-clear, the last-one-wins conflict-handling of dict.update() is what I"m looking for as well.)

## How can I merge two Python dictionaries in a single expression?

For dictionaries x and y, z becomes a shallowly-merged dictionary with values from y replacing those from x.

• In Python 3.9.0 or greater (released 17 October 2020): PEP-584, discussed here, was implemented and provides the simplest method:

z = x | y          # NOTE: 3.9+ ONLY

• In Python 3.5 or greater:

z = {**x, **y}

• In Python 2, (or 3.4 or lower) write a function:

def merge_two_dicts(x, y):
z.update(y)    # modifies z with keys and values of y
return z

and now:

z = merge_two_dicts(x, y)

### Explanation

Say you have two dictionaries and you want to merge them into a new dictionary without altering the original dictionaries:

x = {"a": 1, "b": 2}
y = {"b": 3, "c": 4}

The desired result is to get a new dictionary (z) with the values merged, and the second dictionary"s values overwriting those from the first.

>>> z
{"a": 1, "b": 3, "c": 4}

A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is

z = {**x, **y}

And it is indeed a single expression.

Note that we can merge in with literal notation as well:

z = {**x, "foo": 1, "bar": 2, **y}

and now:

>>> z
{"a": 1, "b": 3, "foo": 1, "bar": 2, "c": 4}

It is now showing as implemented in the release schedule for 3.5, PEP 478, and it has now made its way into the What"s New in Python 3.5 document.

However, since many organizations are still on Python 2, you may wish to do this in a backward-compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:

z = x.copy()
z.update(y) # which returns None since it mutates z

In both approaches, y will come second and its values will replace x"s values, thus b will point to 3 in our final result.

## Not yet on Python 3.5, but want a single expression

If you are not yet on Python 3.5 or need to write backward-compatible code, and you want this in a single expression, the most performant while the correct approach is to put it in a function:

def merge_two_dicts(x, y):
"""Given two dictionaries, merge them into a new dict as a shallow copy."""
z = x.copy()
z.update(y)
return z

and then you have a single expression:

z = merge_two_dicts(x, y)

You can also make a function to merge an arbitrary number of dictionaries, from zero to a very large number:

def merge_dicts(*dict_args):
"""
Given any number of dictionaries, shallow copy and merge into a new dict,
precedence goes to key-value pairs in latter dictionaries.
"""
result = {}
for dictionary in dict_args:
result.update(dictionary)
return result

This function will work in Python 2 and 3 for all dictionaries. e.g. given dictionaries a to g:

z = merge_dicts(a, b, c, d, e, f, g)

and key-value pairs in g will take precedence over dictionaries a to f, and so on.

Don"t use what you see in the formerly accepted answer:

z = dict(x.items() + y.items())

In Python 2, you create two lists in memory for each dict, create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you"re adding two dict_items objects together, not two lists -

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: "dict_items" and "dict_items"

and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.

Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don"t do this:

>>> c = dict(a.items() | b.items())

This example demonstrates what happens when values are unhashable:

>>> x = {"a": []}
>>> y = {"b": []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: "list"

Here"s an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

>>> x = {"a": 2}
>>> y = {"a": 1}
>>> dict(x.items() | y.items())
{"a": 2}

Another hack you should not use:

z = dict(x, **y)

This uses the dict constructor and is very fast and memory-efficient (even slightly more so than our two-step process) but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it"s difficult to read, it"s not the intended usage, and so it is not Pythonic.

Here"s an example of the usage being remediated in django.

Dictionaries are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.

>>> c = dict(a, **b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

From the mailing list, Guido van Rossum, the creator of the language, wrote:

I am fine with declaring dict({}, **{1:3}) illegal, since after all it is abuse of the ** mechanism.

and

Apparently dict(x, **y) is going around as "cool hack" for "call x.update(y) and return x". Personally, I find it more despicable than cool.

It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dictionaries for readability purposes, e.g.:

dict(a=1, b=10, c=11)

{"a": 1, "b": 10, "c": 11}

Despite what Guido says, dict(x, **y) is in line with the dict specification, which btw. works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a short-coming of dict. Nor is using the ** operator in this place an abuse of the mechanism, in fact, ** was designed precisely to pass dictionaries as keywords.

Again, it doesn"t work for 3 when keys are not strings. The implicit calling contract is that namespaces take ordinary dictionaries, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:

>>> foo(**{("a", "b"): None})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{("a", "b"): None})
{("a", "b"): None}

This inconsistency was bad given other implementations of Python (PyPy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.

I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.

dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.

My response: merge_two_dicts(x, y) actually seems much clearer to me, if we"re actually concerned about readability. And it is not forward compatible, as Python 2 is increasingly deprecated.

{**x, **y} does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged [...] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word "merging" these answers describe "updating one dict with another", and not merging.

Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first"s values being overwritten by the second"s - in a single expression.

Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dictionaries from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:

from copy import deepcopy

def dict_of_dicts_merge(x, y):
z = {}
overlapping_keys = x.keys() & y.keys()
for key in overlapping_keys:
z[key] = dict_of_dicts_merge(x[key], y[key])
for key in x.keys() - overlapping_keys:
z[key] = deepcopy(x[key])
for key in y.keys() - overlapping_keys:
z[key] = deepcopy(y[key])
return z

Usage:

>>> x = {"a":{1:{}}, "b": {2:{}}}
>>> y = {"b":{10:{}}, "c": {11:{}}}
>>> dict_of_dicts_merge(x, y)
{"b": {2: {}, 10: {}}, "a": {1: {}}, "c": {11: {}}}

Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a "Dictionaries of dictionaries merge".

## Less Performant But Correct Ad-hocs

These approaches are less performant, but they will provide correct behavior. They will be much less performant than copy and update or the new unpacking because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (latter dictionaries have precedence)

You can also chain the dictionaries manually inside a dict comprehension:

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

or in Python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):

dict((k, v) for d in dicts for k, v in d.items()) # iteritems in Python 2

itertools.chain will chain the iterators over the key-value pairs in the correct order:

from itertools import chain
z = dict(chain(x.items(), y.items())) # iteritems in Python 2

## Performance Analysis

I"m only going to do the performance analysis of the usages known to behave correctly. (Self-contained so you can copy and paste yourself.)

from timeit import repeat
from itertools import chain

x = dict.fromkeys("abcdefg")
y = dict.fromkeys("efghijk")

def merge_two_dicts(x, y):
z = x.copy()
z.update(y)
return z

min(repeat(lambda: {**x, **y}))
min(repeat(lambda: merge_two_dicts(x, y)))
min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
min(repeat(lambda: dict(chain(x.items(), y.items()))))
min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))

In Python 3.8.1, NixOS:

>>> min(repeat(lambda: {**x, **y}))
1.0804965235292912
>>> min(repeat(lambda: merge_two_dicts(x, y)))
1.636518670246005
>>> min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
3.1779992282390594
>>> min(repeat(lambda: dict(chain(x.items(), y.items()))))
2.740647904574871
>>> min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
4.266070580109954
\$ uname -a
Linux nixos 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux

## Resources on Dictionaries

In your case, what you can do is:

z = dict(list(x.items()) + list(y.items()))

This will, as you want it, put the final dict in z, and make the value for key b be properly overridden by the second (y) dict"s value:

>>> x = {"a":1, "b": 2}
>>> y = {"b":10, "c": 11}
>>> z = dict(list(x.items()) + list(y.items()))
>>> z
{"a": 1, "c": 11, "b": 10}

If you use Python 2, you can even remove the list() calls. To create z:

>>> z = dict(x.items() + y.items())
>>> z
{"a": 1, "c": 11, "b": 10}

If you use Python version 3.9.0a4 or greater, then you can directly use:

x = {"a":1, "b": 2}
y = {"b":10, "c": 11}
z = x | y
print(z)
{"a": 1, "c": 11, "b": 10}

An alternative:

z = x.copy()
z.update(y)

Another, more concise, option:

z = dict(x, **y)

Note: this has become a popular answer, but it is important to point out that if y has any non-string keys, the fact that this works at all is an abuse of a CPython implementation detail, and it does not work in Python 3, or in PyPy, IronPython, or Jython. Also, Guido is not a fan. So I can"t recommend this technique for forward-compatible or cross-implementation portable code, which really means it should be avoided entirely.

This probably won"t be a popular answer, but you almost certainly do not want to do this. If you want a copy that"s a merge, then use copy (or deepcopy, depending on what you want) and then update. The two lines of code are much more readable - more Pythonic - than the single line creation with .items() + .items(). Explicit is better than implicit.

In addition, when you use .items() (pre Python 3.0), you"re creating a new list that contains the items from the dict. If your dictionaries are large, then that is quite a lot of overhead (two large lists that will be thrown away as soon as the merged dict is created). update() can work more efficiently, because it can run through the second dict item-by-item.

In terms of time:

>>> timeit.Timer("dict(x, **y)", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.52571702003479
>>> timeit.Timer("temp = x.copy()
temp.update(y)", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.694622993469238
>>> timeit.Timer("dict(x.items() + y.items())", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
41.484580039978027

IMO the tiny slowdown between the first two is worth it for the readability. In addition, keyword arguments for dictionary creation was only added in Python 2.3, whereas copy() and update() will work in older versions.

## Finding the index of an item in a list

Given a list ["foo", "bar", "baz"] and an item in the list "bar", how do I get its index (1) in Python?

>>> ["foo", "bar", "baz"].index("bar")
1

Reference: Data Structures > More on Lists

# Caveats follow

Note that while this is perhaps the cleanest way to answer the question as asked, index is a rather weak component of the list API, and I can"t remember the last time I used it in anger. It"s been pointed out to me in the comments that because this answer is heavily referenced, it should be made more complete. Some caveats about list.index follow. It is probably worth initially taking a look at the documentation for it:

list.index(x[, start[, end]])

Return zero-based index in the list of the first item whose value is equal to x. Raises a ValueError if there is no such item.

The optional arguments start and end are interpreted as in the slice notation and are used to limit the search to a particular subsequence of the list. The returned index is computed relative to the beginning of the full sequence rather than the start argument.

## Linear time-complexity in list length

An index call checks every element of the list in order, until it finds a match. If your list is long, and you don"t know roughly where in the list it occurs, this search could become a bottleneck. In that case, you should consider a different data structure. Note that if you know roughly where to find the match, you can give index a hint. For instance, in this snippet, l.index(999_999, 999_990, 1_000_000) is roughly five orders of magnitude faster than straight l.index(999_999), because the former only has to search 10 entries, while the latter searches a million:

>>> import timeit
>>> timeit.timeit("l.index(999_999)", setup="l = list(range(0, 1_000_000))", number=1000)
9.356267921015387
>>> timeit.timeit("l.index(999_999, 999_990, 1_000_000)", setup="l = list(range(0, 1_000_000))", number=1000)
0.0004404920036904514

## Only returns the index of the first match to its argument

A call to index searches through the list in order until it finds a match, and stops there. If you expect to need indices of more matches, you should use a list comprehension, or generator expression.

>>> [1, 1].index(1)
0
>>> [i for i, e in enumerate([1, 2, 1]) if e == 1]
[0, 2]
>>> g = (i for i, e in enumerate([1, 2, 1]) if e == 1)
>>> next(g)
0
>>> next(g)
2

Most places where I once would have used index, I now use a list comprehension or generator expression because they"re more generalizable. So if you"re considering reaching for index, take a look at these excellent Python features.

## Throws if element not present in list

A call to index results in a ValueError if the item"s not present.

>>> [1, 1].index(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 2 is not in list

If the item might not be present in the list, you should either

1. Check for it first with item in my_list (clean, readable approach), or
2. Wrap the index call in a try/except block which catches ValueError (probably faster, at least when the list to search is long, and the item is usually present.)

One thing that is really helpful in learning Python is to use the interactive help function:

>>> help(["foo", "bar", "baz"])
Help on list object:

class list(object)
...

|
|  index(...)
|      L.index(value, [start, [stop]]) -> integer -- return first index of value
|

which will often lead you to the method you are looking for.

The majority of answers explain how to find a single index, but their methods do not return multiple indexes if the item is in the list multiple times. Use enumerate():

for i, j in enumerate(["foo", "bar", "baz"]):
if j == "bar":
print(i)

The index() function only returns the first occurrence, while enumerate() returns all occurrences.

As a list comprehension:

[i for i, j in enumerate(["foo", "bar", "baz"]) if j == "bar"]

Here"s also another small solution with itertools.count() (which is pretty much the same approach as enumerate):

from itertools import izip as zip, count # izip for maximum efficiency
[i for i, j in zip(count(), ["foo", "bar", "baz"]) if j == "bar"]

This is more efficient for larger lists than using enumerate():

\$ python -m timeit -s "from itertools import izip as zip, count" "[i for i, j in zip(count(), ["foo", "bar", "baz"]*500) if j == "bar"]"
10000 loops, best of 3: 174 usec per loop
\$ python -m timeit "[i for i, j in enumerate(["foo", "bar", "baz"]*500) if j == "bar"]"
10000 loops, best of 3: 196 usec per loop

To get all indexes:

indexes = [i for i,x in enumerate(xs) if x == "foo"]

index() returns the first index of value!

| index(...)
| L.index(value, [start, [stop]]) -> integer -- return first index of value

def all_indices(value, qlist):
indices = []
idx = -1
while True:
try:
idx = qlist.index(value, idx+1)
indices.append(idx)
except ValueError:
break
return indices

all_indices("foo", ["foo";"bar";"baz";"foo"])

## InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately

Tried to perform REST GET through python requests with the following code and I got error.

Code snip:

import requests
url = az_base_url + az_subscription_id + "/resourcegroups/Default-Networking/resources?" + az_api_version

Error:

/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:79:
InsecurePlatformWarning: A true SSLContext object is not available.
This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail.
InsecurePlatformWarning

My python version is 2.7.3. I tried to install urllib3 and requests[security] as some other thread suggests, I still got the same error.

Wonder if anyone can provide some tips?

The docs give a fair indicator of what"s required., however requests allow us to skip a few steps:

You only need to install the security package extras (thanks @admdrew for pointing it out)

\$ pip install requests[security]

or, install them directly:

\$ pip install pyopenssl ndg-httpsclient pyasn1

Requests will then automatically inject pyopenssl into urllib3

If you"re on ubuntu, you may run into trouble installing pyopenssl, you"ll need these dependencies:

\$ apt-get install libffi-dev libssl-dev

If you are not able to upgrade your Python version to 2.7.9, and want to suppress warnings,

pip install requests==2.5.3

## Dynamic instantiation from string name of a class in dynamically imported module?

In python, I have to instantiate certain class, knowing its name in a string, but this class "lives" in a dynamically imported module. An example follows:

import sys
def __init__(self, module_name, class_name): # both args are strings
try:
__import__(module_name)
modul = sys.modules[module_name]
instance = modul.class_name() # obviously this doesn"t works, here is my main problem!
except ImportError:
# manage import error

class myName:
# etc...

I use this arrangement to make any dynamically-loaded-module to be used by the loader-class following certain predefined behaviours in the dyn-loaded-modules...

You can use getattr

getattr(module, class_name)

to access the class. More complete code:

module = __import__(module_name)
class_ = getattr(module, class_name)
instance = class_()

As mentioned below, we may use importlib

import importlib
module = importlib.import_module(module_name)
class_ = getattr(module, class_name)
instance = class_()

# tl;dr

Import the root module with importlib.import_module and load the class by its name using getattr function:

# Standard import
import importlib
MyClass = getattr(importlib.import_module("module.submodule"), "MyClass")
# Instantiate the class (pass arguments to the constructor, if needed)
instance = MyClass()

# explanations

You probably don"t want to use __import__ to dynamically import a module by name, as it does not allow you to import submodules:

>>> mod = __import__("os.path")
>>> mod.join
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: "module" object has no attribute "join"

Here is what the python doc says about __import__:

Note: This is an advanced function that is not needed in everyday Python programming, unlike importlib.import_module().

Instead, use the standard importlib module to dynamically import a module by name. With getattr you can then instantiate a class by its name:

import importlib
my_module = importlib.import_module("module.submodule")
MyClass = getattr(my_module, "MyClass")
instance = MyClass()

You could also write:

import importlib
module_name, class_name = "module.submodule.MyClass".rsplit(".", 1)
MyClass = getattr(importlib.import_module(module_name), class_name)
instance = MyClass()

This code is valid in python ‚â• 2.7 (including python 3).

## pandas loc vs. iloc vs. at vs. iat?

Recently began branching out from my safe place (R) into Python and and am a bit confused by the cell localization/selection in Pandas. I"ve read the documentation but I"m struggling to understand the practical implications of the various localization/selection options.

Is there a reason why I should ever use .loc or .iloc over at, and iat or vice versa? In what situations should I use which method?

Note: future readers be aware that this question is old and was written before pandas v0.20 when there used to exist a function called .ix. This method was later split into two - loc and iloc - to make the explicit distinction between positional and label based indexing. Please beware that ix was discontinued due to inconsistent behavior and being hard to grok, and no longer exists in current versions of pandas (>= 1.0).

loc: only work on index
iloc: work on position
at: get scalar values. It"s a very fast loc
iat: Get scalar values. It"s a very fast iloc

Also,

at and iat are meant to access a scalar, that is, a single element in the dataframe, while loc and iloc are ments to access several elements at the same time, potentially to perform vectorized operations.

http://pyciencia.blogspot.com/2015/05/obtener-y-filtrar-datos-de-un-dataframe.html

## Meaning of @classmethod and @staticmethod for beginner?

### Question by user1632861

Could someone explain to me the meaning of @classmethod and @staticmethod in python? I need to know the difference and the meaning.

As far as I understand, @classmethod tells a class that it"s a method which should be inherited into subclasses, or... something. However, what"s the point of that? Why not just define the class method without adding @classmethod or @staticmethod or any @ definitions?

tl;dr: when should I use them, why should I use them, and how should I use them?

Though classmethod and staticmethod are quite similar, there"s a slight difference in usage for both entities: classmethod must have a reference to a class object as the first parameter, whereas staticmethod can have no parameters at all.

## Example

class Date(object):

def __init__(self, day=0, month=0, year=0):
self.day = day
self.month = month
self.year = year

@classmethod
def from_string(cls, date_as_string):
day, month, year = map(int, date_as_string.split("-"))
date1 = cls(day, month, year)
return date1

@staticmethod
def is_date_valid(date_as_string):
day, month, year = map(int, date_as_string.split("-"))
return day <= 31 and month <= 12 and year <= 3999

date2 = Date.from_string("11-09-2012")
is_date = Date.is_date_valid("11-09-2012")

## Explanation

Let"s assume an example of a class, dealing with date information (this will be our boilerplate):

class Date(object):

def __init__(self, day=0, month=0, year=0):
self.day = day
self.month = month
self.year = year

This class obviously could be used to store information about certain dates (without timezone information; let"s assume all dates are presented in UTC).

Here we have __init__, a typical initializer of Python class instances, which receives arguments as a typical instancemethod, having the first non-optional argument (self) that holds a reference to a newly created instance.

Class Method

We have some tasks that can be nicely done using classmethods.

Let"s assume that we want to create a lot of Date class instances having date information coming from an outer source encoded as a string with format "dd-mm-yyyy". Suppose we have to do this in different places in the source code of our project.

So what we must do here is:

1. Parse a string to receive day, month and year as three integer variables or a 3-item tuple consisting of that variable.
2. Instantiate Date by passing those values to the initialization call.

This will look like:

day, month, year = map(int, string_date.split("-"))
date1 = Date(day, month, year)

@classmethod
def from_string(cls, date_as_string):
day, month, year = map(int, date_as_string.split("-"))
date1 = cls(day, month, year)
return date1

date2 = Date.from_string("11-09-2012")

Let"s look more carefully at the above implementation, and review what advantages we have here:

1. We"ve implemented date string parsing in one place and it"s reusable now.
2. Encapsulation works fine here (if you think that you could implement string parsing as a single function elsewhere, this solution fits the OOP paradigm far better).
3. cls is an object that holds the class itself, not an instance of the class. It"s pretty cool because if we inherit our Date class, all children will have from_string defined also.

Static method

What about staticmethod? It"s pretty similar to classmethod but doesn"t take any obligatory parameters (like a class method or instance method does).

Let"s look at the next use case.

We have a date string that we want to validate somehow. This task is also logically bound to the Date class we"ve used so far, but doesn"t require instantiation of it.

Here is where staticmethod can be useful. Let"s look at the next piece of code:

@staticmethod
def is_date_valid(date_as_string):
day, month, year = map(int, date_as_string.split("-"))
return day <= 31 and month <= 12 and year <= 3999

# usage:
is_date = Date.is_date_valid("11-09-2012")

So, as we can see from usage of staticmethod, we don"t have any access to what the class is---it"s basically just a function, called syntactically like a method, but without access to the object and its internals (fields and another methods), while classmethod does.

Rostyslav Dzinko"s answer is very appropriate. I thought I could highlight one other reason you should choose @classmethod over @staticmethod when you are creating an additional constructor.

In the example above, Rostyslav used the @classmethod from_string as a Factory to create Date objects from otherwise unacceptable parameters. The same can be done with @staticmethod as is shown in the code below:

class Date:
def __init__(self, month, day, year):
self.month = month
self.day   = day
self.year  = year

def display(self):
return "{0}-{1}-{2}".format(self.month, self.day, self.year)

@staticmethod
def millenium(month, day):
return Date(month, day, 2000)

new_year = Date(1, 1, 2013)               # Creates a new Date object
millenium_new_year = Date.millenium(1, 1) # also creates a Date object.

# Proof:
new_year.display()           # "1-1-2013"
millenium_new_year.display() # "1-1-2000"

isinstance(new_year, Date) # True
isinstance(millenium_new_year, Date) # True

Thus both new_year and millenium_new_year are instances of the Date class.

But, if you observe closely, the Factory process is hard-coded to create Date objects no matter what. What this means is that even if the Date class is subclassed, the subclasses will still create plain Date objects (without any properties of the subclass). See that in the example below:

class DateTime(Date):
def display(self):
return "{0}-{1}-{2} - 00:00:00PM".format(self.month, self.day, self.year)

datetime1 = DateTime(10, 10, 1990)
datetime2 = DateTime.millenium(10, 10)

isinstance(datetime1, DateTime) # True
isinstance(datetime2, DateTime) # False

datetime1.display() # returns "10-10-1990 - 00:00:00PM"
datetime2.display() # returns "10-10-2000" because it"s not a DateTime object but a Date object. Check the implementation of the millenium method on the Date class for more details.

datetime2 is not an instance of DateTime? WTF? Well, that"s because of the @staticmethod decorator used.

In most cases, this is undesired. If what you want is a Factory method that is aware of the class that called it, then @classmethod is what you need.

Rewriting Date.millenium as (that"s the only part of the above code that changes):

@classmethod
def millenium(cls, month, day):
return cls(month, day, 2000)

ensures that the class is not hard-coded but rather learnt. cls can be any subclass. The resulting object will rightly be an instance of cls.
Let"s test that out:

datetime1 = DateTime(10, 10, 1990)
datetime2 = DateTime.millenium(10, 10)

isinstance(datetime1, DateTime) # True
isinstance(datetime2, DateTime) # True

datetime1.display() # "10-10-1990 - 00:00:00PM"
datetime2.display() # "10-10-2000 - 00:00:00PM"

The reason is, as you know by now, that @classmethod was used instead of @staticmethod

@classmethod means: when this method is called, we pass the class as the first argument instead of the instance of that class (as we normally do with methods). This means you can use the class and its properties inside that method rather than a particular instance.

@staticmethod means: when this method is called, we don"t pass an instance of the class to it (as we normally do with methods). This means you can put a function inside a class but you can"t access the instance of that class (this is useful when your method does not use the instance).

## What is the meaning of single and double underscore before an object name?

Can someone please explain the exact meaning of having single and double leading underscores before an object"s name in Python, and the difference between both?

Also, does that meaning stay the same regardless of whether the object in question is a variable, a function, a method, etc.?

## Single Underscore

Names, in a class, with a leading underscore are simply to indicate to other programmers that the attribute or method is intended to be private. However, nothing special is done with the name itself.

To quote PEP-8:

_single_leading_underscore: weak "internal use" indicator. E.g. from M import * does not import objects whose name starts with an underscore.

## Double Underscore (Name Mangling)

From the Python docs:

Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the identifier, so it can be used to define class-private instance and class variables, methods, variables stored in globals, and even variables stored in instances. private to this class on instances of other classes.

And a warning from the same page:

Name mangling is intended to give classes an easy way to define ‚Äúprivate‚Äù instance variables and methods, without having to worry about instance variables defined by derived classes, or mucking with instance variables by code outside the class. Note that the mangling rules are designed mostly to avoid accidents; it still is possible for a determined soul to access or modify a variable that is considered private.

## Example

>>> class MyClass():
...     def __init__(self):
...             self.__superprivate = "Hello"
...             self._semiprivate = ", world!"
...
>>> mc = MyClass()
>>> print mc.__superprivate
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: myClass instance has no attribute "__superprivate"
>>> print mc._semiprivate
, world!
>>> print mc.__dict__
{"_MyClass__superprivate": "Hello", "_semiprivate": ", world!"}

__foo__: this is just a convention, a way for the Python system to use names that won"t conflict with user names.

_foo: this is just a convention, a way for the programmer to indicate that the variable is private (whatever that means in Python).

__foo: this has real meaning: the interpreter replaces this name with _classname__foo as a way to ensure that the name will not overlap with a similar name in another class.

No other form of underscores have meaning in the Python world.

There"s no difference between class, variable, global, etc in these conventions.

## Is there a list of Pytz Timezones?

I would like to know what are all the possible values for the timezone argument in the Python library pytz. How to do it?

You can list all the available timezones with pytz.all_timezones:

In [40]: import pytz
In [41]: pytz.all_timezones
Out[42]:
["Africa/Abidjan",
"Africa/Accra",
...]

There is also pytz.common_timezones:

In [45]: len(pytz.common_timezones)
Out[45]: 403

In [46]: len(pytz.all_timezones)
Out[46]: 563

Don"t create your own list - pytz has a built-in set:

import pytz
set(pytz.all_timezones_set)
>>> {"Europe/Vienna", "America/New_York", "America/Argentina/Salta",..}

You can then apply a timezone:

import datetime
tz = pytz.timezone("Pacific/Johnston")
ct = datetime.datetime.now(tz=tz)
>>> ct.isoformat()
2017-01-13T11:29:22.601991-05:00

Or if you already have a datetime object that is TZ aware (not naive):

# This timestamp is in UTC
my_ct = datetime.datetime.now(tz=pytz.UTC)

# Now convert it to another timezone
new_ct = my_ct.astimezone(tz)
>>> new_ct.isoformat()
2017-01-13T11:29:22.601991-05:00

## Python strptime() and timezones?

I have a CSV dumpfile from a Blackberry IPD backup, created using IPDDump. The date/time strings in here look something like this (where EST is an Australian time-zone):

Tue Jun 22 07:46:22 EST 2010

I need to be able to parse this date in Python. At first, I tried to use the strptime() function from datettime.

>>> datetime.datetime.strptime("Tue Jun 22 12:10:20 2010 EST", "%a %b %d %H:%M:%S %Y %Z")

However, for some reason, the datetime object that comes back doesn"t seem to have any tzinfo associated with it.

I did read on this page that apparently datetime.strptime silently discards tzinfo, however, I checked the documentation, and I can"t find anything to that effect documented here.

I have been able to get the date parsed using a third-party Python library, dateutil, however I"m still curious as to how I was using the in-built strptime() incorrectly? Is there any way to get strptime() to play nicely with timezones?

I recommend using python-dateutil. Its parser has been able to parse every date format I"ve thrown at it so far.

>>> from dateutil import parser
>>> parser.parse("Tue Jun 22 07:46:22 EST 2010")
datetime.datetime(2010, 6, 22, 7, 46, 22, tzinfo=tzlocal())
>>> parser.parse("Fri, 11 Nov 2011 03:18:09 -0400")
datetime.datetime(2011, 11, 11, 3, 18, 9, tzinfo=tzoffset(None, -14400))
>>> parser.parse("Sun")
datetime.datetime(2011, 12, 18, 0, 0)
>>> parser.parse("10-11-08")
datetime.datetime(2008, 10, 11, 0, 0)

and so on. No dealing with strptime() format nonsense... just throw a date at it and it Does The Right Thing.

Update: Oops. I missed in your original question that you mentioned that you used dateutil, sorry about that. But I hope this answer is still useful to other people who stumble across this question when they have date parsing questions and see the utility of that module.

## Fitting empirical distribution to theoretical ones with Scipy (Python)?

INTRODUCTION: I have a list of more than 30,000 integer values ranging from 0 to 47, inclusive, e.g.[0,0,0,0,..,1,1,1,1,...,2,2,2,2,...,47,47,47,...] sampled from some continuous distribution. The values in the list are not necessarily in order, but order doesn"t matter for this problem.

PROBLEM: Based on my distribution I would like to calculate p-value (the probability of seeing greater values) for any given value. For example, as you can see p-value for 0 would be approaching 1 and p-value for higher numbers would be tending to 0.

I don"t know if I am right, but to determine probabilities I think I need to fit my data to a theoretical distribution that is the most suitable to describe my data. I assume that some kind of goodness of fit test is needed to determine the best model.

Is there a way to implement such an analysis in Python (Scipy or Numpy)? Could you present any examples?

Thank you!

# Distribution Fitting with Sum of Square Error (SSE)

This is an update and modification to Saullo"s answer, that uses the full list of the current scipy.stats distributions and returns the distribution with the least SSE between the distribution"s histogram and the data"s histogram.

## Example Fitting

Using the El Ni√±o dataset from statsmodels, the distributions are fit and error is determined. The distribution with the least error is returned.

### Example Code

%matplotlib inline

import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
from scipy.stats._continuous_distns import _distn_names
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rcParams["figure.figsize"] = (16.0, 12.0)
matplotlib.style.use("ggplot")

# Create models from data
def best_fit_distribution(data, bins=200, ax=None):
"""Model data by finding best fit distribution to data"""
# Get histogram of original data
y, x = np.histogram(data, bins=bins, density=True)
x = (x + np.roll(x, -1))[:-1] / 2.0

# Best holders
best_distributions = []

# Estimate distribution parameters from data
for ii, distribution in enumerate([d for d in _distn_names if not d in ["levy_stable", "studentized_range"]]):

print("{:>3} / {:<3}: {}".format( ii+1, len(_distn_names), distribution ))

distribution = getattr(st, distribution)

# Try to fit the distribution
try:
# Ignore warnings from data that can"t be fit
with warnings.catch_warnings():
warnings.filterwarnings("ignore")

# fit dist to data
params = distribution.fit(data)

# Separate parts of parameters
arg = params[:-2]
loc = params[-2]
scale = params[-1]

# Calculate fitted PDF and error with fit in distribution
pdf = distribution.pdf(x, loc=loc, scale=scale, *arg)
sse = np.sum(np.power(y - pdf, 2.0))

# if axis pass in add to plot
try:
if ax:
pd.Series(pdf, x).plot(ax=ax)
end
except Exception:
pass

# identify if this distribution is better
best_distributions.append((distribution, params, sse))

except Exception:
pass

return sorted(best_distributions, key=lambda x:x[2])

def make_pdf(dist, params, size=10000):
"""Generate distributions"s Probability Distribution Function """

# Separate parts of parameters
arg = params[:-2]
loc = params[-2]
scale = params[-1]

# Get sane start and end points of distribution
start = dist.ppf(0.01, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.01, loc=loc, scale=scale)
end = dist.ppf(0.99, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.99, loc=loc, scale=scale)

# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = dist.pdf(x, loc=loc, scale=scale, *arg)
pdf = pd.Series(y, x)

return pdf

# Load data from statsmodels datasets

# Plot for comparison
plt.figure(figsize=(12,8))
ax = data.plot(kind="hist", bins=50, density=True, alpha=0.5, color=list(matplotlib.rcParams["axes.prop_cycle"])[1]["color"])

# Save plot limits
dataYLim = ax.get_ylim()

# Find best fit distribution
best_distibutions = best_fit_distribution(data, 200, ax)
best_dist = best_distibutions[0]

# Update plots
ax.set_ylim(dataYLim)
ax.set_title(u"El Ni√±o sea temp.
All Fitted Distributions")
ax.set_xlabel(u"Temp (¬∞C)")
ax.set_ylabel("Frequency")

# Make PDF with best params
pdf = make_pdf(best_dist[0], best_dist[1])

# Display
plt.figure(figsize=(12,8))
ax = pdf.plot(lw=2, label="PDF", legend=True)
data.plot(kind="hist", bins=50, density=True, alpha=0.5, label="Data", legend=True, ax=ax)

param_names = (best_dist[0].shapes + ", loc, scale").split(", ") if best_dist[0].shapes else ["loc", "scale"]
param_str = ", ".join(["{}={:0.2f}".format(k,v) for k,v in zip(param_names, best_dist[1])])
dist_str = "{}({})".format(best_dist[0].name, param_str)

ax.set_title(u"El Ni√±o sea temp. with best fit distribution
" + dist_str)
ax.set_xlabel(u"Temp. (¬∞C)")
ax.set_ylabel("Frequency")

There are more than 90 implemented distribution functions in SciPy v1.6.0. You can test how some of them fit to your data using their fit() method. Check the code below for more details:

import matplotlib.pyplot as plt
import numpy as np
import scipy
import scipy.stats
size = 30000
x = np.arange(size)
y = scipy.int_(np.round_(scipy.stats.vonmises.rvs(5,size=size)*47))
h = plt.hist(y, bins=range(48))

dist_names = ["gamma", "beta", "rayleigh", "norm", "pareto"]

for dist_name in dist_names:
dist = getattr(scipy.stats, dist_name)
params = dist.fit(y)
arg = params[:-2]
loc = params[-2]
scale = params[-1]
if arg:
pdf_fitted = dist.pdf(x, *arg, loc=loc, scale=scale) * size
else:
pdf_fitted = dist.pdf(x, loc=loc, scale=scale) * size
plt.plot(pdf_fitted, label=dist_name)
plt.xlim(0,47)
plt.legend(loc="upper right")
plt.show()

References:

- Fitting distributions, goodness of fit, p-value. Is it possible to do this with Scipy (Python)?

- Distribution fitting with Scipy

And here a list with the names of all distribution functions available in Scipy 0.12.0 (VI):

dist_names = [ "alpha", "anglit", "arcsine", "beta", "betaprime", "bradford", "burr", "cauchy", "chi", "chi2", "cosine", "dgamma", "dweibull", "erlang", "expon", "exponweib", "exponpow", "f", "fatiguelife", "fisk", "foldcauchy", "foldnorm", "frechet_r", "frechet_l", "genlogistic", "genpareto", "genexpon", "genextreme", "gausshyper", "gamma", "gengamma", "genhalflogistic", "gilbrat", "gompertz", "gumbel_r", "gumbel_l", "halfcauchy", "halflogistic", "halfnorm", "hypsecant", "invgamma", "invgauss", "invweibull", "johnsonsb", "johnsonsu", "ksone", "kstwobign", "laplace", "logistic", "loggamma", "loglaplace", "lognorm", "lomax", "maxwell", "mielke", "nakagami", "ncx2", "ncf", "nct", "norm", "pareto", "pearson3", "powerlaw", "powerlognorm", "powernorm", "rdist", "reciprocal", "rayleigh", "rice", "recipinvgauss", "semicircular", "t", "triang", "truncexpon", "truncnorm", "tukeylambda", "uniform", "vonmises", "wald", "weibull_min", "weibull_max", "wrapcauchy"]

## How do I merge two dictionaries in a single expression (taking union of dictionaries)?

### Question by Carl Meyer

I have two Python dictionaries, and I want to write a single expression that returns these two dictionaries, merged (i.e. taking the union). The update() method would be what I need, if it returned its result instead of modifying a dictionary in-place.

>>> x = {"a": 1, "b": 2}
>>> y = {"b": 10, "c": 11}
>>> z = x.update(y)
>>> print(z)
None
>>> x
{"a": 1, "b": 10, "c": 11}

How can I get that final merged dictionary in z, not x?

(To be extra-clear, the last-one-wins conflict-handling of dict.update() is what I"m looking for as well.)

## How can I merge two Python dictionaries in a single expression?

For dictionaries x and y, z becomes a shallowly-merged dictionary with values from y replacing those from x.

• In Python 3.9.0 or greater (released 17 October 2020): PEP-584, discussed here, was implemented and provides the simplest method:

z = x | y          # NOTE: 3.9+ ONLY

• In Python 3.5 or greater:

z = {**x, **y}

• In Python 2, (or 3.4 or lower) write a function:

def merge_two_dicts(x, y):
z.update(y)    # modifies z with keys and values of y
return z

and now:

z = merge_two_dicts(x, y)

### Explanation

Say you have two dictionaries and you want to merge them into a new dictionary without altering the original dictionaries:

x = {"a": 1, "b": 2}
y = {"b": 3, "c": 4}

The desired result is to get a new dictionary (z) with the values merged, and the second dictionary"s values overwriting those from the first.

>>> z
{"a": 1, "b": 3, "c": 4}

A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is

z = {**x, **y}

And it is indeed a single expression.

Note that we can merge in with literal notation as well:

z = {**x, "foo": 1, "bar": 2, **y}

and now:

>>> z
{"a": 1, "b": 3, "foo": 1, "bar": 2, "c": 4}

It is now showing as implemented in the release schedule for 3.5, PEP 478, and it has now made its way into the What"s New in Python 3.5 document.

However, since many organizations are still on Python 2, you may wish to do this in a backward-compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:

z = x.copy()
z.update(y) # which returns None since it mutates z

In both approaches, y will come second and its values will replace x"s values, thus b will point to 3 in our final result.

## Not yet on Python 3.5, but want a single expression

If you are not yet on Python 3.5 or need to write backward-compatible code, and you want this in a single expression, the most performant while the correct approach is to put it in a function:

def merge_two_dicts(x, y):
"""Given two dictionaries, merge them into a new dict as a shallow copy."""
z = x.copy()
z.update(y)
return z

and then you have a single expression:

z = merge_two_dicts(x, y)

You can also make a function to merge an arbitrary number of dictionaries, from zero to a very large number:

def merge_dicts(*dict_args):
"""
Given any number of dictionaries, shallow copy and merge into a new dict,
precedence goes to key-value pairs in latter dictionaries.
"""
result = {}
for dictionary in dict_args:
result.update(dictionary)
return result

This function will work in Python 2 and 3 for all dictionaries. e.g. given dictionaries a to g:

z = merge_dicts(a, b, c, d, e, f, g)

and key-value pairs in g will take precedence over dictionaries a to f, and so on.

Don"t use what you see in the formerly accepted answer:

z = dict(x.items() + y.items())

In Python 2, you create two lists in memory for each dict, create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you"re adding two dict_items objects together, not two lists -

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: "dict_items" and "dict_items"

and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.

Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don"t do this:

>>> c = dict(a.items() | b.items())

This example demonstrates what happens when values are unhashable:

>>> x = {"a": []}
>>> y = {"b": []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: "list"

Here"s an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

>>> x = {"a": 2}
>>> y = {"a": 1}
>>> dict(x.items() | y.items())
{"a": 2}

Another hack you should not use:

z = dict(x, **y)

This uses the dict constructor and is very fast and memory-efficient (even slightly more so than our two-step process) but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it"s difficult to read, it"s not the intended usage, and so it is not Pythonic.

Here"s an example of the usage being remediated in django.

Dictionaries are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.

>>> c = dict(a, **b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

From the mailing list, Guido van Rossum, the creator of the language, wrote:

I am fine with declaring dict({}, **{1:3}) illegal, since after all it is abuse of the ** mechanism.

and

Apparently dict(x, **y) is going around as "cool hack" for "call x.update(y) and return x". Personally, I find it more despicable than cool.

It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dictionaries for readability purposes, e.g.:

dict(a=1, b=10, c=11)

{"a": 1, "b": 10, "c": 11}

Despite what Guido says, dict(x, **y) is in line with the dict specification, which btw. works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a short-coming of dict. Nor is using the ** operator in this place an abuse of the mechanism, in fact, ** was designed precisely to pass dictionaries as keywords.

Again, it doesn"t work for 3 when keys are not strings. The implicit calling contract is that namespaces take ordinary dictionaries, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:

>>> foo(**{("a", "b"): None})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{("a", "b"): None})
{("a", "b"): None}

This inconsistency was bad given other implementations of Python (PyPy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.

I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.

dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.

My response: merge_two_dicts(x, y) actually seems much clearer to me, if we"re actually concerned about readability. And it is not forward compatible, as Python 2 is increasingly deprecated.

{**x, **y} does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged [...] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word "merging" these answers describe "updating one dict with another", and not merging.

Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first"s values being overwritten by the second"s - in a single expression.

Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dictionaries from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:

from copy import deepcopy

def dict_of_dicts_merge(x, y):
z = {}
overlapping_keys = x.keys() & y.keys()
for key in overlapping_keys:
z[key] = dict_of_dicts_merge(x[key], y[key])
for key in x.keys() - overlapping_keys:
z[key] = deepcopy(x[key])
for key in y.keys() - overlapping_keys:
z[key] = deepcopy(y[key])
return z

Usage:

>>> x = {"a":{1:{}}, "b": {2:{}}}
>>> y = {"b":{10:{}}, "c": {11:{}}}
>>> dict_of_dicts_merge(x, y)
{"b": {2: {}, 10: {}}, "a": {1: {}}, "c": {11: {}}}

Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a "Dictionaries of dictionaries merge".

## Less Performant But Correct Ad-hocs

These approaches are less performant, but they will provide correct behavior. They will be much less performant than copy and update or the new unpacking because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (latter dictionaries have precedence)

You can also chain the dictionaries manually inside a dict comprehension:

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

or in Python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):

dict((k, v) for d in dicts for k, v in d.items()) # iteritems in Python 2

itertools.chain will chain the iterators over the key-value pairs in the correct order:

from itertools import chain
z = dict(chain(x.items(), y.items())) # iteritems in Python 2

## Performance Analysis

I"m only going to do the performance analysis of the usages known to behave correctly. (Self-contained so you can copy and paste yourself.)

from timeit import repeat
from itertools import chain

x = dict.fromkeys("abcdefg")
y = dict.fromkeys("efghijk")

def merge_two_dicts(x, y):
z = x.copy()
z.update(y)
return z

min(repeat(lambda: {**x, **y}))
min(repeat(lambda: merge_two_dicts(x, y)))
min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
min(repeat(lambda: dict(chain(x.items(), y.items()))))
min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))

In Python 3.8.1, NixOS:

>>> min(repeat(lambda: {**x, **y}))
1.0804965235292912
>>> min(repeat(lambda: merge_two_dicts(x, y)))
1.636518670246005
>>> min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
3.1779992282390594
>>> min(repeat(lambda: dict(chain(x.items(), y.items()))))
2.740647904574871
>>> min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
4.266070580109954
\$ uname -a
Linux nixos 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux

## Resources on Dictionaries

In your case, what you can do is:

z = dict(list(x.items()) + list(y.items()))

This will, as you want it, put the final dict in z, and make the value for key b be properly overridden by the second (y) dict"s value:

>>> x = {"a":1, "b": 2}
>>> y = {"b":10, "c": 11}
>>> z = dict(list(x.items()) + list(y.items()))
>>> z
{"a": 1, "c": 11, "b": 10}

If you use Python 2, you can even remove the list() calls. To create z:

>>> z = dict(x.items() + y.items())
>>> z
{"a": 1, "c": 11, "b": 10}

If you use Python version 3.9.0a4 or greater, then you can directly use:

x = {"a":1, "b": 2}
y = {"b":10, "c": 11}
z = x | y
print(z)
{"a": 1, "c": 11, "b": 10}

An alternative:

z = x.copy()
z.update(y)

Another, more concise, option:

z = dict(x, **y)

Note: this has become a popular answer, but it is important to point out that if y has any non-string keys, the fact that this works at all is an abuse of a CPython implementation detail, and it does not work in Python 3, or in PyPy, IronPython, or Jython. Also, Guido is not a fan. So I can"t recommend this technique for forward-compatible or cross-implementation portable code, which really means it should be avoided entirely.

This probably won"t be a popular answer, but you almost certainly do not want to do this. If you want a copy that"s a merge, then use copy (or deepcopy, depending on what you want) and then update. The two lines of code are much more readable - more Pythonic - than the single line creation with .items() + .items(). Explicit is better than implicit.

In addition, when you use .items() (pre Python 3.0), you"re creating a new list that contains the items from the dict. If your dictionaries are large, then that is quite a lot of overhead (two large lists that will be thrown away as soon as the merged dict is created). update() can work more efficiently, because it can run through the second dict item-by-item.

In terms of time:

>>> timeit.Timer("dict(x, **y)", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.52571702003479
>>> timeit.Timer("temp = x.copy()
temp.update(y)", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.694622993469238
>>> timeit.Timer("dict(x.items() + y.items())", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
41.484580039978027

IMO the tiny slowdown between the first two is worth it for the readability. In addition, keyword arguments for dictionary creation was only added in Python 2.3, whereas copy() and update() will work in older versions.

## How do you split a list into evenly sized chunks?

### Question by jespern

I have a list of arbitrary length, and I need to split it up into equal size chunks and operate on it. There are some obvious ways to do this, like keeping a counter and two lists, and when the second list fills up, add it to the first list and empty the second list for the next round of data, but this is potentially extremely expensive.

I was wondering if anyone had a good solution to this for lists of any length, e.g. using generators.

I was looking for something useful in itertools but I couldn"t find anything obviously useful. Might"ve missed it, though.

Here"s a generator that yields the chunks you want:

def chunks(lst, n):
"""Yield successive n-sized chunks from lst."""
for i in range(0, len(lst), n):
yield lst[i:i + n]

import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

If you"re using Python 2, you should use xrange() instead of range():

def chunks(lst, n):
"""Yield successive n-sized chunks from lst."""
for i in xrange(0, len(lst), n):
yield lst[i:i + n]

Also you can simply use list comprehension instead of writing a function, though it"s a good idea to encapsulate operations like this in named functions so that your code is easier to understand. Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

Python 2 version:

[lst[i:i + n] for i in xrange(0, len(lst), n)]

If you want something super simple:

def chunks(l, n):
n = max(1, n)
return (l[i:i+n] for i in range(0, len(l), n))

Use xrange() instead of range() in the case of Python 2.x

Directly from the (old) Python documentation (recipes for itertools):

from itertools import izip, chain, repeat

"grouper(3, "abcdefg", "x") --> ("a","b","c"), ("d","e","f"), ("g","x","x")"

The current version, as suggested by J.F.Sebastian:

#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

"grouper(3, "abcdefg", "x") --> ("a","b","c"), ("d","e","f"), ("g","x","x")"

I guess Guido"s time machine works‚Äîworked‚Äîwill work‚Äîwill have worked‚Äîwas working again.

These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of "each" iterator; because this is the same iterator, it is advanced by each such call, resulting in each such zip-roundrobin generating one tuple of n items.

I know this is kind of old but nobody yet mentioned numpy.array_split:

import numpy as np

lst = range(50)
np.array_split(lst, 5)
# [array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
#  array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
#  array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
#  array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
#  array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]

I"m surprised nobody has thought of using iter"s two-argument form:

from itertools import islice

def chunk(it, size):
it = iter(it)
return iter(lambda: tuple(islice(it, size)), ())

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]

This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn"t pad; if you want padding, a simple variation on the above will suffice:

from itertools import islice, chain, repeat

return iter(lambda: tuple(islice(it, size)), (padval,) * size)

Demo:

[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, "a")]

Like the izip_longest-based solutions, the above always pads. As far as I know, there"s no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:

it = iter(it)
sentinel = ()
else:
return iter(lambda: tuple(islice(it, size)), sentinel)

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, "a"))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, "a")]

I believe this is the shortest chunker proposed that offers optional padding.

As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here"s a final variation that works around that problem in a reasonable way:

it = iter(it)
chunker = iter(lambda: tuple(islice(it, size)), ())
yield from chunker
else:
for ch in chunker:
yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))

Demo:

>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]

## How to execute a program or call a system command?

### Question by alan lai

How do you call an external command (as if I"d typed it at the Unix shell or Windows command prompt) from within a Python script?

Use the subprocess module in the standard library:

import subprocess
subprocess.run(["ls", "-l"])

The advantage of subprocess.run over os.system is that it is more flexible (you can get the stdout, stderr, the "real" status code, better error handling, etc...).

Even the documentation for os.system recommends using subprocess instead:

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes.

On Python 3.4 and earlier, use subprocess.call instead of .run:

subprocess.call(["ls", "-l"])

Here"s a summary of the ways to call external programs and the advantages and disadvantages of each:

1. os.system("some_command with args") passes the command and arguments to your system"s shell. This is nice because you can actually run multiple commands at once in this manner and set up pipes and input/output redirection. For example:

os.system("some_command < input_file | another_command > output_file")

However, while this is convenient, you have to manually handle the escaping of shell characters such as spaces, et cetera. On the other hand, this also lets you run commands which are simply shell commands and not actually external programs. See the documentation.

2. stream = os.popen("some_command with args") will do the same thing as os.system except that it gives you a file-like object that you can use to access standard input/output for that process. There are 3 other variants of popen that all handle the i/o slightly differently. If you pass everything as a string, then your command is passed to the shell; if you pass them as a list then you don"t need to worry about escaping anything. See the documentation.

3. The Popen class of the subprocess module. This is intended as a replacement for os.popen, but has the downside of being slightly more complicated by virtue of being so comprehensive. For example, you"d say:

print subprocess.Popen("echo Hello World", shell=True, stdout=subprocess.PIPE).stdout.read()

but it is nice to have all of the options there in one unified class instead of 4 different popen functions. See the documentation.

4. The call function from the subprocess module. This is basically just like the Popen class and takes all of the same arguments, but it simply waits until the command completes and gives you the return code. For example:

return_code = subprocess.call("echo Hello World", shell=True)

5. If you"re on Python 3.5 or later, you can use the new subprocess.run function, which is a lot like the above but even more flexible and returns a CompletedProcess object when the command finishes executing.

6. The os module also has all of the fork/exec/spawn functions that you"d have in a C program, but I don"t recommend using them directly.

The subprocess module should probably be what you use.

Finally, please be aware that for all methods where you pass the final command to be executed by the shell as a string and you are responsible for escaping it. There are serious security implications if any part of the string that you pass can not be fully trusted. For example, if a user is entering some/any part of the string. If you are unsure, only use these methods with constants. To give you a hint of the implications consider this code:

print subprocess.Popen("echo %s " % user_input, stdout=PIPE).stdout.read()

and imagine that the user enters something "my mama didnt love me && rm -rf /" which could erase the whole filesystem.

Typical implementation:

import subprocess

p = subprocess.Popen("ls", shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
print line,
retval = p.wait()

You are free to do what you want with the stdout data in the pipe. In fact, you can simply omit those parameters (stdout= and stderr=) and it"ll behave like os.system().

Some hints on detaching the child process from the calling one (starting the child process in background).

Suppose you want to start a long task from a CGI script. That is, the child process should live longer than the CGI script execution process.

The classical example from the subprocess module documentation is:

import subprocess
import sys

# Some code here

pid = subprocess.Popen([sys.executable, "longtask.py"]) # Call subprocess

# Some more code here

The idea here is that you do not want to wait in the line "call subprocess" until the longtask.py is finished. But it is not clear what happens after the line "some more code here" from the example.

My target platform was FreeBSD, but the development was on Windows, so I faced the problem on Windows first.

On Windows (Windows XP), the parent process will not finish until the longtask.py has finished its work. It is not what you want in a CGI script. The problem is not specific to Python; in the PHP community the problems are the same.

The solution is to pass DETACHED_PROCESS Process Creation Flag to the underlying CreateProcess function in Windows API. If you happen to have installed pywin32, you can import the flag from the win32process module, otherwise you should define it yourself:

DETACHED_PROCESS = 0x00000008

creationflags=DETACHED_PROCESS).pid

/* UPD 2015.10.27 @eryksun in a comment below notes, that the semantically correct flag is CREATE_NEW_CONSOLE (0x00000010) */

On FreeBSD we have another problem: when the parent process is finished, it finishes the child processes as well. And that is not what you want in a CGI script either. Some experiments showed that the problem seemed to be in sharing sys.stdout. And the working solution was the following:

pid = subprocess.Popen([sys.executable, "longtask.py"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)

I have not checked the code on other platforms and do not know the reasons of the behaviour on FreeBSD. If anyone knows, please share your ideas. Googling on starting background processes in Python does not shed any light yet.

import os

Note that this is dangerous, since the command isn"t cleaned. I leave it up to you to google for the relevant documentation on the "os" and "sys" modules. There are a bunch of functions (exec* and spawn*) that will do similar things.

## Display number with leading zeros

### Question by jeff

Given:

a = 1
b = 10
c = 100

How do I display a leading zero for all numbers with less than two digits?

This is the output I"m expecting:

01
10
100

In Python 2 (and Python 3) you can do:

print "%02d" % (1,)

Basically % is like printf or sprintf (see docs).

For Python 3.+, the same behavior can also be achieved with format:

print("{:02d}".format(1))

For Python 3.6+ the same behavior can be achieved with f-strings:

print(f"{1:02d}")

You can use str.zfill:

print(str(1).zfill(2))
print(str(10).zfill(2))
print(str(100).zfill(2))

prints:

01
10
100

In Python 2.6+ and 3.0+, you would use the format() string method:

for i in (1, 10, 100):
print("{num:02d}".format(num=i))

or using the built-in (for a single number):

print(format(i, "02d"))

See the PEP-3101 documentation for the new formatting functions.

In Python >= 3.6, you can do this succinctly with the new f-strings that were introduced by using:

f"{val:02}"

which prints the variable with name val with a fill value of 0 and a width of 2.

For your specific example you can do this nicely in a loop:

a, b, c = 1, 10, 100
for val in [a, b, c]:
print(f"{val:02}")

which prints:

01
10
100

For more information on f-strings, take a look at PEP 498 where they were introduced.

## Best way to format integer as string with leading zeros?

I need to add leading zeros to integer to make a string with defined quantity of digits (\$cnt). What the best way to translate this simple function from PHP to Python:

\$int = intval(\$int);
for(\$i=0; \$i<(\$cnt-strlen(\$int)); \$i++)
\$nulls .= "0";
return \$nulls.\$int;
}

Is there a function that can do this?

You can use the zfill() method to pad a string with zeros:

In [3]: str(1).zfill(2)
Out[3]: "01"

## Shop

Best laptop for Sims 4

\$

Best laptop for Zoom

\$499

Best laptop for Minecraft

\$590

Best laptop for engineering student

\$

Best laptop for development

\$

Best laptop for Cricut Maker

\$

Best laptop for hacking

\$890

Best laptop for Machine Learning

\$950

Latest questions

NUMPYNUMPY

psycopg2: insert multiple rows with one query

NUMPYNUMPY

How to convert Nonetype to int or string?

NUMPYNUMPY

How to specify multiple return types using type-hints

NUMPYNUMPY

Javascript Error: IPython is not defined in JupyterLab

## Wiki

Python OpenCV | cv2.putText () method

numpy.arctan2 () in Python

Python | os.path.realpath () method

Python OpenCV | cv2.circle () method

Python OpenCV cv2.cvtColor () method

Python - Move item to the end of the list

time.perf_counter () function in Python

Check if one list is a subset of another in Python

Python os.path.join () method