# Python | NumPy dstack() method


With the `numpy.dstack()` method, we can combine arrays index by index and store the result as a depth-wise stack (along the third axis).

Syntax: `numpy.dstack((array1, array2))`

Return: the combined array, stacked index by index.

Example #1: In this example, we can see that with `numpy.dstack()` we get the merged arrays as a stack, index by index.

``````
# import numpy
import numpy as np

gfg1 = np.array([1, 2, 3])
gfg2 = np.array([4, 5, 6])

# using the numpy.dstack() method
print(np.dstack((gfg1, gfg2)))
``````

Output:

[[[1 4]
  [2 5]
  [3 6]]]

Example #2:

``````
# import numpy
import numpy as np

gfg1 = np.array([[10], [2], [13]])
gfg2 = np.array([[41], [55], [6]])

# using the numpy.dstack() method
print(np.dstack((gfg1, gfg2)))
``````

Output:

[[[10 41]]

 [[ 2 55]]

 [[13  6]]]
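As a sketch of the shape rule at work in both examples above, `np.dstack` stacks along a third axis: 1-D inputs of length N are treated as shape (1, N), while same-shaped 2-D inputs of shape (M, N) produce a result of shape (M, N, 2).

``````
import numpy as np

# 1-D inputs of length 3 behave like shape (1, 3),
# so the stacked result has shape (1, 3, 2)
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
stacked_1d = np.dstack((a, b))
print(stacked_1d.shape)  # (1, 3, 2)

# (3, 1) column vectors, as in Example #2, stack to shape (3, 1, 2)
m1 = np.array([[10], [2], [13]])
m2 = np.array([[41], [55], [6]])
stacked_2d = np.dstack((m1, m2))
print(stacked_2d.shape)  # (3, 1, 2)
``````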

## Python | NumPy dstack() method: StackOverflow Questions

I know `object`-dtype columns make the data hard to convert with `pandas` functions. When I receive data like this, the first thing that comes to mind is to "flatten" or unnest the columns.

I am using `pandas` and `python` functions for this type of question. If you are worried about the speed of the above solutions, check user3483203's answer, since it's using `numpy` and most of the time `numpy` is faster. I recommend `Cython` and `numba` if speed matters.

Method 0 [pandas >= 0.25]
Starting from pandas 0.25, if you only need to explode one column, you can use the `pandas.DataFrame.explode` function:

``````
df.explode("B")

A  B
0  1  1
1  1  2
0  2  1
1  2  2
``````

Given a dataframe with an empty `list` or a `NaN` in the column: an empty list will not cause an issue, but a `NaN` will need to be filled with a `list`.

``````
df = pd.DataFrame({"A": [1, 2, 3, 4],"B": [[1, 2], [1, 2], [], np.nan]})
df.B = df.B.fillna({i: [] for i in df.index})  # replace NaN with []
df.explode("B")

A    B
0  1    1
0  1    2
1  2    1
1  2    2
2  3  NaN
3  4  NaN
``````
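As a side note (worth checking against your own pandas version), from pandas 1.1 onward `explode` also accepts `ignore_index=True`, which renumbers the result 0..n-1 instead of repeating the original index:

``````
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [[1, 2], [1, 2]]})

# ignore_index=True (pandas >= 1.1) gives a fresh RangeIndex
out = df.explode("B", ignore_index=True)
print(out)
``````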

Method 1
`apply + pd.Series` (easy to understand, but not recommended in terms of performance)

``````
df.set_index("A").B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:"B"})
Out:
A  B
0  1  1
1  1  2
0  2  1
1  2  2
``````

Method 2
Using `repeat` with the `DataFrame` constructor to re-create your dataframe (good performance, but not good with multiple columns)

``````
df=pd.DataFrame({"A":df.A.repeat(df.B.str.len()),"B":np.concatenate(df.B.values)})
df
Out:
A  B
0  1  1
0  1  2
1  2  1
1  2  2
``````

Method 2.1
For example, besides A we have A.1 ... A.n. If we still use the method above (Method 2), it is hard to re-create the columns one by one.

Solution: `join` or `merge` with the `index` after "unnesting" the single column.

``````
s=pd.DataFrame({"B":np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))
s.join(df.drop("B",1),how="left")
Out:
B  A
0  1  1
0  2  1
1  1  2
1  2  2
``````

If you need the column order exactly the same as before, add `reindex` at the end.

``````
s.join(df.drop("B",1),how="left").reindex(columns=df.columns)
``````

Method 3
recreate the `list`

``````
pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)
Out:
A  B
0  1  1
1  1  2
2  2  1
3  2  2
``````

If more than two columns, use

``````
s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])
s.merge(df,left_on=0,right_index=True)
Out:
0  1  A       B
0  0  1  1  [1, 2]
1  0  2  1  [1, 2]
2  1  1  2  [1, 2]
3  1  2  2  [1, 2]
``````

Method 4
using `reindex` or `loc`

``````
df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
Out:
A  B
0  1  1
0  1  2
1  2  1
1  2  2

#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))
``````

Method 5
when the list only contains unique values:

``````
df=pd.DataFrame({"A":[1,2],"B":[[1,2],[3,4]]})
from collections import ChainMap
d = dict(ChainMap(*map(dict.fromkeys, df["B"], df["A"])))
pd.DataFrame(list(d.items()),columns=df.columns[::-1])
Out:
B  A
0  1  1
1  2  1
2  3  2
3  4  2
``````
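To see why Method 5 works only with unique values, note that `dict.fromkeys(lst, value)` maps every element of `lst` to that row's `A` value, and `ChainMap` then merges the per-row dicts; a duplicated element collapses into a single key. A minimal sketch:

``````
from collections import ChainMap

# each B-list becomes a dict mapping its elements to the row's A value
d1 = dict.fromkeys([1, 2], 1)   # {1: 1, 2: 1}
d2 = dict.fromkeys([3, 4], 2)   # {3: 2, 4: 2}

merged = dict(ChainMap(d1, d2))
assert merged == {1: 1, 2: 1, 3: 2, 4: 2}

# a value appearing in both rows keeps only one entry (first map wins),
# which is why the method requires unique list values
clash = dict(ChainMap(dict.fromkeys([1], 1), dict.fromkeys([1], 2)))
assert clash == {1: 1}
``````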

Method 6
using `numpy` for high performance:

``````
newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))
pd.DataFrame(data=newvalues[0],columns=df.columns)
A  B
0  1  1
1  1  2
2  2  1
3  2  2
``````

Method 7
using the base functions `itertools.cycle` and `itertools.chain`: a pure Python solution, just for fun

``````
from itertools import cycle,chain
l=df.values.tolist()
l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)
A  B
0  1  1
1  1  2
2  2  1
3  2  2
``````

Generalizing to multiple columns

``````
df=pd.DataFrame({"A":[1,2],"B":[[1,2],[3,4]],"C":[[1,2],[3,4]]})
df
Out:
A       B       C
0  1  [1, 2]  [1, 2]
1  2  [3, 4]  [3, 4]
``````

Self-defined function:

``````
def unnesting(df, explode):
    idx = df.index.repeat(df[explode].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(explode, 1), how="left")

unnesting(df,["B","C"])
Out:
B  C  A
0  1  1  1
0  2  2  1
1  3  3  2
1  4  4  2
``````

### Column-wise Unnesting

All the methods above deal with vertical unnesting (explode). If you do need to expand the list horizontally, use the `pd.DataFrame` constructor:

``````
df.join(pd.DataFrame(df.B.tolist(),index=df.index).add_prefix("B_"))
Out:
A       B       C  B_0  B_1
0  1  [1, 2]  [1, 2]    1    2
1  2  [3, 4]  [3, 4]    3    4
``````

Updated function

``````
def unnesting(df, explode, axis):
    if axis == 1:
        idx = df.index.repeat(df[explode].str.len())
        df1 = pd.concat([
            pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
        df1.index = idx
        return df1.join(df.drop(explode, 1), how="left")
    else:
        df1 = pd.concat([
            pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(explode, 1), how="left")
``````

Test Output

``````
unnesting(df, ["B","C"], axis=0)
Out:
B0  B1  C0  C1  A
0   1   2   1   2  1
1   3   4   3   4  2
``````

Update 2021-02-17 with original explode function

``````
def unnesting(df, explode, axis):
    if axis == 1:
        df1 = pd.concat([df[x].explode() for x in explode], axis=1)
        return df1.join(df.drop(explode, 1), how="left")
    else:
        df1 = pd.concat([
            pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(explode, 1), how="left")
``````

If you're just wanting (semi-)contiguous regions, there's already an easy implementation in Python: SciPy's ndimage.morphology module. This is a fairly common image morphology operation.

Basically, you have 5 steps:

``````
def find_paws(data, smooth_radius=5, threshold=0.0001):
    data = sp.ndimage.uniform_filter(data, smooth_radius)
    thresh = data > threshold
    filled = sp.ndimage.morphology.binary_fill_holes(thresh)
    coded_paws, num_paws = sp.ndimage.label(filled)
    data_slices = sp.ndimage.find_objects(coded_paws)
    return data_slices
``````
1. Blur the input data a bit to make sure the paws have a continuous footprint. (It would be more efficient to just use a larger kernel (the `structure` kwarg to the various `scipy.ndimage.morphology` functions), but this isn't quite working properly for some reason...)

2. Threshold the array so that you have a boolean array of places where the pressure is over some threshold value (i.e. `thresh = data > value`)

3. Fill any internal holes, so that you have cleaner regions (`filled = sp.ndimage.morphology.binary_fill_holes(thresh)`)

4. Find the separate contiguous regions (`coded_paws, num_paws = sp.ndimage.label(filled)`). This returns an array with the regions coded by number (each region is a contiguous area of a unique integer, from 1 up to the number of paws, with zeros everywhere else).

5. Isolate the contiguous regions using `data_slices = sp.ndimage.find_objects(coded_paws)`. This returns a list of tuples of `slice` objects, so you could get the region of the data for each paw with `[data[x] for x in data_slices]`. Instead, we'll draw a rectangle based on these slices, which takes slightly more work.
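The threshold/fill/label/isolate steps above can be sketched on a small synthetic array (the blob data here is invented for illustration; on modern SciPy these functions are reachable directly under `scipy.ndimage`):

``````
import numpy as np
import scipy.ndimage as ndi

# Synthetic "pressure" frame with two separate blobs standing in for paws
data = np.zeros((10, 10))
data[1:4, 1:4] = 1.0   # first blob
data[6:9, 5:9] = 2.0   # second blob

thresh = data > 0.5                      # step 2: boolean mask
filled = ndi.binary_fill_holes(thresh)   # step 3: fill interior holes
coded, num = ndi.label(filled)           # step 4: number each region
slices = ndi.find_objects(coded)         # step 5: bounding slices per region

print(num)        # 2
print(slices[0])  # (slice(1, 4, None), slice(1, 4, None))
``````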

The two animations below show your "Overlapping Paws" and "Grouped Paws" example data. This method seems to be working perfectly. (And for whatever it's worth, this runs much more smoothly than the GIF images below on my machine, so the paw detection algorithm is fairly fast...) Here's a full example (now with much more detailed explanations). The vast majority of this is reading the input and making an animation. The actual paw detection is only 5 lines of code.

``````
import numpy as np
import scipy as sp
import scipy.ndimage

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def animate(input_filename):
    """Detects paws and animates the position and raw data of each frame
    in the input file"""
    # With matplotlib, it's much, much faster to just update the properties
    # of a display object than it is to create a new one, so we'll just update
    # the data and position of the same objects throughout this animation...

    infile = paw_file(input_filename)

    # Since we're making an animation with matplotlib, we need
    # ion() instead of show()...
    plt.ion()
    fig = plt.figure()
    ax = fig.add_subplot(111)
    fig.suptitle(input_filename)

    # Make an image based on the first frame that we'll update later
    # (The first frame is never actually displayed)
    im = ax.imshow(infile.next()[1])

    # Make 4 rectangles that we can later move to the position of each paw
    rects = [Rectangle((0,0), 1,1, fc="none", ec="red") for i in range(4)]
    [ax.add_patch(rect) for rect in rects]

    title = ax.set_title("Time 0.0 ms")

    # Process and display each frame
    for time, frame in infile:
        paw_slices = find_paws(frame)

        # Hide any rectangles that might be visible
        [rect.set_visible(False) for rect in rects]

        # Set the position and size of a rectangle for each paw and display it
        for slice, rect in zip(paw_slices, rects):
            dy, dx = slice
            rect.set_xy((dx.start, dy.start))
            rect.set_width(dx.stop - dx.start + 1)
            rect.set_height(dy.stop - dy.start + 1)
            rect.set_visible(True)

        # Update the image data and title of the plot
        title.set_text("Time %0.2f ms" % time)
        im.set_data(frame)
        im.set_clim([frame.min(), frame.max()])
        fig.canvas.draw()

def find_paws(data, smooth_radius=5, threshold=0.0001):
    """Detects and isolates contiguous regions in the input array"""
    # Blur the input data a bit so the paws have a continuous footprint
    data = sp.ndimage.uniform_filter(data, smooth_radius)
    # Threshold the blurred data (this needs to be a bit > 0 due to the blur)
    thresh = data > threshold
    # Fill any interior holes in the paws to get cleaner regions...
    filled = sp.ndimage.morphology.binary_fill_holes(thresh)
    # Label each contiguous paw
    coded_paws, num_paws = sp.ndimage.label(filled)
    # Isolate the extent of each paw
    data_slices = sp.ndimage.find_objects(coded_paws)
    return data_slices

def paw_file(filename):
    """Returns an iterator that yields the time and data in each frame
    The infile is an ascii file of timesteps formatted similar to this:

    Frame 0 (0.00 ms)
    0.0 0.0 0.0
    0.0 0.0 0.0

    Frame 1 (0.53 ms)
    0.0 0.0 0.0
    0.0 0.0 0.0
    ...
    """
    with open(filename) as infile:
        while True:
            try:
                time, data = read_frame(infile)
                yield time, data
            except StopIteration:
                break

def read_frame(infile):
    """Reads a frame from the infile."""
    frame_header = infile.next().strip().split()
    time = float(frame_header[-2][1:])
    data = []
    while True:
        line = infile.next().strip().split()
        if line == []:
            break
        data.append(line)
    return time, np.array(data, dtype=np.float)

if __name__ == "__main__":
    animate("Overlapping paws.bin")
    animate("Grouped up paws.bin")
    animate("Normal measurement.bin")
``````

Update: As far as identifying which paw is in contact with the sensor at what times, the simplest solution is to just do the same analysis, but use all of the data at once. (i.e. stack the input into a 3D array, and work with it, instead of the individual time frames.) Because SciPy's ndimage functions are meant to work with n-dimensional arrays, we don't have to modify the original paw-finding function at all.

``````
# This uses functions (and imports) in the previous code example!!
def paw_regions(infile):
    # Read in and stack all data together into a 3D array
    data, time = [], []
    for t, frame in paw_file(infile):
        time.append(t)
        data.append(frame)
    data = np.dstack(data)
    time = np.asarray(time)

    # Find and label the paw impacts
    data_slices = find_paws(data)

    # Sort by time of initial paw impact... This way we can determine which
    # paws are which relative to the first paw with a simple modulo 4.
    # (Assuming a 4-legged dog, where all 4 paws contacted the sensor)
    data_slices.sort(key=lambda dat_slice: dat_slice[2].start)

    # Plot up a simple analysis
    fig = plt.figure()
    ax1 = fig.add_subplot(2, 1, 1)
    annotate_paw_prints(time, data, data_slices, ax=ax1)
    ax2 = fig.add_subplot(2, 1, 2)
    plot_paw_impacts(time, data_slices, ax=ax2)
    fig.suptitle(infile)

def plot_paw_impacts(time, data_slices, ax=None):
    if ax is None:
        ax = plt.gca()

    # Group impacts by paw...
    for i, dat_slice in enumerate(data_slices):
        dx, dy, dt = dat_slice
        paw = i % 4 + 1
        # Draw a bar over the time interval where each paw is in contact
        ax.barh(bottom=paw, width=time[dt].ptp(), height=0.2,
                left=time[dt].min(), align="center", color="red")
    ax.set_yticks(range(1, 5))
    ax.set_yticklabels(["Paw 1", "Paw 2", "Paw 3", "Paw 4"])
    ax.set_xlabel("Time (ms) Since Beginning of Experiment")
    ax.yaxis.grid(True)
    ax.set_title("Periods of Paw Contact")

def annotate_paw_prints(time, data, data_slices, ax=None):
    if ax is None:
        ax = plt.gca()

    # Display all paw impacts (sum over time)
    ax.imshow(data.sum(axis=2).T)

    # Annotate each impact with which paw it is
    # (Relative to the first paw to hit the sensor)
    x, y = [], []
    for i, region in enumerate(data_slices):
        dx, dy, dz = region
        # Get x,y center of slice...
        x0 = 0.5 * (dx.start + dx.stop)
        y0 = 0.5 * (dy.start + dy.stop)
        x.append(x0); y.append(y0)

        # Annotate the paw impacts
        ax.annotate("Paw %i" % (i % 4 + 1), (x0, y0),
                    color="red", ha="center", va="bottom")

    # Plot line connecting paw impacts
    ax.plot(x, y, "-wo")
    ax.axis("image")
    ax.set_title("Order of Steps")
``````

Get comfortable with `zip`. It comes in handy when dealing with column data.

``````
df["new_col"] = list(zip(df.lat, df.long))
``````

It's less complicated and faster than using `apply` or `map`. Something like `np.dstack` is twice as fast as `zip`, but wouldn't give you tuples.
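A minimal sketch of that trade-off (the `lat`/`long` column names are just this answer's example data): `zip` yields row tuples, while `np.dstack` yields a numeric array of pairs.

``````
import numpy as np
import pandas as pd

df = pd.DataFrame({"lat": [1.0, 2.0], "long": [10.0, 20.0]})

# zip: a list of tuples, one per row
as_tuples = list(zip(df.lat, df.long))
print(as_tuples)  # [(1.0, 10.0), (2.0, 20.0)]

# dstack: a (1, n, 2) array; take [0] to get one row per pair
as_array = np.dstack((df.lat.values, df.long.values))[0]
print(as_array.shape)  # (2, 2)
``````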

## A canonical `cartesian_product` (almost)

There are many approaches to this problem with different properties. Some are faster than others, and some are more general-purpose. After a lot of testing and tweaking, I've found that the following function, which calculates an n-dimensional `cartesian_product`, is faster than most others for many inputs. For a pair of approaches that are slightly more complex, but are even a bit faster in many cases, see the answer by Paul Panzer.

Given that answer, this is no longer the fastest implementation of the cartesian product in `numpy` that I'm aware of. However, I think its simplicity will continue to make it a useful benchmark for future improvement:

``````
def cartesian_product(*arrays):
    la = len(arrays)
    dtype = numpy.result_type(*arrays)
    arr = numpy.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(numpy.ix_(*arrays)):
        arr[..., i] = a
    return arr.reshape(-1, la)
``````
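For a concrete check of the row order (the function is repeated here so the snippet stands alone), the rows come out in C order, with the last array varying fastest:

``````
import numpy as np

def cartesian_product(*arrays):
    # same function as above, using numpy's ix_ open grids
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(np.ix_(*arrays)):
        arr[..., i] = a
    return arr.reshape(-1, la)

out = cartesian_product(np.array([1, 2]), np.array([3, 4]))
print(out.tolist())  # [[1, 3], [1, 4], [2, 3], [2, 4]]
``````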

It's worth mentioning that this function uses `ix_` in an unusual way; whereas the documented use of `ix_` is to generate indices into an array, it just so happens that arrays with the same shape can be used for broadcasted assignment. Many thanks to mgilson, who inspired me to try using `ix_` this way, and to unutbu, who provided some extremely helpful feedback on this answer, including the suggestion to use `numpy.result_type`.
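A quick look at what `ix_` actually returns makes the trick concrete: it produces "open" grids with singleton dimensions, which broadcast against each other during the assignment:

``````
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5])

gx, gy = np.ix_(x, y)
print(gx.shape, gy.shape)  # (3, 1) (1, 2)

# broadcasting the open grids fills full (3, 2) grids
grid_x, grid_y = np.broadcast_arrays(gx, gy)
print(grid_x.tolist())  # [[1, 1], [2, 2], [3, 3]]
print(grid_y.tolist())  # [[4, 5], [4, 5], [4, 5]]
``````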

## Notable alternatives

It's sometimes faster to write contiguous blocks of memory in Fortran order. That's the basis of this alternative, `cartesian_product_transpose`, which has proven faster on some hardware than `cartesian_product` (see below). However, Paul Panzer's answer, which uses the same principle, is even faster. Still, I include this here for interested readers:

``````
def cartesian_product_transpose(*arrays):
    broadcastable = numpy.ix_(*arrays)
    broadcasted = numpy.broadcast_arrays(*broadcastable)
    rows, cols = numpy.prod(broadcasted[0].shape), len(broadcasted)
    dtype = numpy.result_type(*arrays)

    out = numpy.empty(rows * cols, dtype=dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    return out.reshape(cols, rows).T
``````

After coming to understand Panzer's approach, I wrote a new version that's almost as fast as his, and is almost as simple as `cartesian_product`:

``````
def cartesian_product_simple_transpose(arrays):
    la = len(arrays)
    dtype = numpy.result_type(*arrays)
    arr = numpy.empty([la] + [len(a) for a in arrays], dtype=dtype)
    for i, a in enumerate(numpy.ix_(*arrays)):
        arr[i, ...] = a
    return arr.reshape(la, -1).T
``````

This appears to have some constant-time overhead that makes it run slower than Panzer's for small inputs. But for larger inputs, in all the tests I ran, it performs just as well as his fastest implementation (`cartesian_product_transpose_pp`).

In the following sections, I include some tests of other alternatives. These are now somewhat out of date, but rather than duplicate effort, I've decided to leave them here out of historical interest. For up-to-date tests, see Panzer's answer, as well as Nico Schlömer's.

## Tests against alternatives

Here is a battery of tests that show the performance boost that some of these functions provide relative to a number of alternatives. All the tests shown here were performed on a quad-core machine, running Mac OS 10.12.5, Python 3.6.1, and `numpy` 1.12.1. Variations on hardware and software are known to produce different results, so YMMV. Run these tests for yourself to be sure!

Definitions:

``````
import numpy
import itertools
from functools import reduce

### Two-dimensional products ###

def repeat_product(x, y):
    return numpy.transpose([numpy.tile(x, len(y)),
                            numpy.repeat(y, len(x))])

def dstack_product(x, y):
    return numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2)

### Generalized N-dimensional products ###

def cartesian_product(*arrays):
    la = len(arrays)
    dtype = numpy.result_type(*arrays)
    arr = numpy.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(numpy.ix_(*arrays)):
        arr[..., i] = a
    return arr.reshape(-1, la)

def cartesian_product_transpose(*arrays):
    broadcastable = numpy.ix_(*arrays)
    broadcasted = numpy.broadcast_arrays(*broadcastable)
    rows, cols = numpy.prod(broadcasted[0].shape), len(broadcasted)
    dtype = numpy.result_type(*arrays)

    out = numpy.empty(rows * cols, dtype=dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    return out.reshape(cols, rows).T

# from https://stackoverflow.com/a/1235363/577088

def cartesian_product_recursive(*arrays, out=None):
    arrays = [numpy.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = numpy.prod([x.size for x in arrays])
    if out is None:
        out = numpy.zeros([n, len(arrays)], dtype=dtype)

    m = n // arrays[0].size
    out[:, 0] = numpy.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian_product_recursive(*arrays[1:], out=out[0:m, 1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out

def cartesian_product_itertools(*arrays):
    return numpy.array(list(itertools.product(*arrays)))

### Test code ###

name_func = [("repeat_product", repeat_product),
             ("dstack_product", dstack_product),
             ("cartesian_product", cartesian_product),
             ("cartesian_product_transpose", cartesian_product_transpose),
             ("cartesian_product_recursive", cartesian_product_recursive),
             ("cartesian_product_itertools", cartesian_product_itertools)]

def test(in_arrays, test_funcs):
    global func
    global arrays
    arrays = in_arrays
    for name, func in test_funcs:
        print("{}:".format(name))
        %timeit func(*arrays)

def test_all(*in_arrays):
    test(in_arrays, name_func)

# `cartesian_product_recursive` throws an
# unexpected error when used on more than
# two input arrays, so for now I've removed
# it from these tests.

def test_cartesian(*in_arrays):
    test(in_arrays, name_func[2:4] + name_func[-1:])

x10 = [numpy.arange(10)]
x50 = [numpy.arange(50)]
x100 = [numpy.arange(100)]
x500 = [numpy.arange(500)]
x1000 = [numpy.arange(1000)]
``````

Test results:

``````
In : test_all(*(x100 * 2))
repeat_product:
67.5 µs ± 633 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
dstack_product:
67.7 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
cartesian_product:
33.4 µs ± 558 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
cartesian_product_transpose:
67.7 µs ± 932 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
cartesian_product_recursive:
215 µs ± 6.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
cartesian_product_itertools:
3.65 ms ± 38.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In : test_all(*(x500 * 2))
repeat_product:
1.31 ms ± 9.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
dstack_product:
1.27 ms ± 7.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
cartesian_product:
375 µs ± 4.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
cartesian_product_transpose:
488 µs ± 8.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
cartesian_product_recursive:
2.21 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_itertools:
105 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In : test_all(*(x1000 * 2))
repeat_product:
10.2 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
dstack_product:
12 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product:
4.75 ms ± 57.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_transpose:
7.76 ms ± 52.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_recursive:
13 ms ± 209 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_itertools:
422 ms ± 7.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
``````

In all cases, `cartesian_product` as defined at the beginning of this answer is fastest.

For those functions that accept an arbitrary number of input arrays, it's worth checking performance when `len(arrays) > 2` as well. (Until I can determine why `cartesian_product_recursive` throws an error in this case, I've removed it from these tests.)

``````
In : test_cartesian(*(x100 * 3))
cartesian_product:
8.8 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_transpose:
7.87 ms ± 91.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_itertools:
518 ms ± 5.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In : test_cartesian(*(x50 * 4))
cartesian_product:
169 ms ± 5.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
cartesian_product_transpose:
184 ms ± 4.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
cartesian_product_itertools:
3.69 s ± 73.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In : test_cartesian(*(x10 * 6))
cartesian_product:
26.5 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
cartesian_product_transpose:
16 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_itertools:
728 ms ± 16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In : test_cartesian(*(x10 * 7))
cartesian_product:
650 ms ± 8.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
cartesian_product_transpose:
518 ms ± 7.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
cartesian_product_itertools:
8.13 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
``````

As these tests show, `cartesian_product` remains competitive until the number of input arrays rises above (roughly) four. After that, `cartesian_product_transpose` does have a slight edge.

It's worth reiterating that users with other hardware and operating systems may see different results. For example, unutbu reports seeing the following results for these tests using Ubuntu 14.04, Python 3.4.3, and `numpy` 1.14.0.dev0+b7050a9:

``````
>>> %timeit cartesian_product_transpose(x500, y500)
1000 loops, best of 3: 682 µs per loop
>>> %timeit cartesian_product(x500, y500)
1000 loops, best of 3: 1.55 ms per loop
``````

Below, I go into a few details about earlier tests I've run along these lines. The relative performance of these approaches has changed over time, for different hardware and different versions of Python and `numpy`. While it's not immediately useful for people using up-to-date versions of `numpy`, it illustrates how things have changed since the first version of this answer.

## A simple alternative: `meshgrid` + `dstack`

The currently accepted answer uses `tile` and `repeat` to broadcast two arrays together. But the `meshgrid` function does practically the same thing. Here's the output of `tile` and `repeat` before being passed to `transpose`:

``````
In : import numpy
In : x = numpy.array([1,2,3])
...: y = numpy.array([4,5])
...:

In : [numpy.tile(x, len(y)), numpy.repeat(y, len(x))]
Out: [array([1, 2, 3, 1, 2, 3]), array([4, 4, 4, 5, 5, 5])]
``````

And here's the output of `meshgrid`:

``````
In : numpy.meshgrid(x, y)
Out:
[array([[1, 2, 3],
        [1, 2, 3]]),
 array([[4, 4, 4],
        [5, 5, 5]])]
``````

As you can see, it's almost identical. We need only reshape it to get exactly the same result.

``````
In : xt, xr = numpy.meshgrid(x, y)
...: [xt.ravel(), xr.ravel()]
Out: [array([1, 2, 3, 1, 2, 3]), array([4, 4, 4, 5, 5, 5])]
``````

Rather than reshaping at this point, though, we could pass the output of `meshgrid` to `dstack` and reshape afterwards, which saves some work:

``````
In : numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2)
Out:
array([[1, 4],
       [2, 4],
       [3, 4],
       [1, 5],
       [2, 5],
       [3, 5]])
``````

Contrary to the claim in this comment, I've seen no evidence that different inputs will produce differently shaped outputs, and as the above demonstrates, they do very similar things, so it would be quite strange if they did. Please let me know if you find a counterexample.

## Testing `meshgrid` + `dstack` vs. `repeat` + `transpose`

The relative performance of these two approaches has changed over time. In an earlier version of Python (2.7), the result using `meshgrid` + `dstack` was noticeably faster for small inputs. (Note that these tests are from an old version of this answer.) Definitions:

``````
>>> def repeat_product(x, y):
...     return numpy.transpose([numpy.tile(x, len(y)),
...                             numpy.repeat(y, len(x))])
...
>>> def dstack_product(x, y):
...     return numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2)
...
``````

For moderately-sized input, I saw a significant speedup. But I retried these tests with more recent versions of Python (3.6.1) and `numpy` (1.12.1), on a newer machine. The two approaches are almost identical now.

Old Test

``````
>>> x, y = numpy.arange(500), numpy.arange(500)
>>> %timeit repeat_product(x, y)
10 loops, best of 3: 62 ms per loop
>>> %timeit dstack_product(x, y)
100 loops, best of 3: 12.2 ms per loop
``````

New Test

``````
In : x, y = numpy.arange(500), numpy.arange(500)
In : %timeit repeat_product(x, y)
1.32 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In : %timeit dstack_product(x, y)
1.26 ms ± 8.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
``````

As always, YMMV, but this suggests that in recent versions of Python and numpy, these are interchangeable.

## Generalized product functions

In general, we might expect that using built-in functions will be faster for small inputs, while for large inputs, a purpose-built function might be faster. Furthermore, for a generalized n-dimensional product, `tile` and `repeat` won't help, because they don't have clear higher-dimensional analogues. So it's worth investigating the behavior of purpose-built functions as well.

Most of the relevant tests appear at the beginning of this answer, but here are a few of the tests performed on earlier versions of Python and `numpy` for comparison.

The `cartesian` function defined in another answer used to perform pretty well for larger inputs. (It's the same as the function called `cartesian_product_recursive` above.) In order to compare `cartesian` to `dstack_product`, we use just two dimensions.

Here again, the old test showed a significant difference, while the new test shows almost none.

Old Test

``````
>>> x, y = numpy.arange(1000), numpy.arange(1000)
>>> %timeit cartesian([x, y])
10 loops, best of 3: 25.4 ms per loop
>>> %timeit dstack_product(x, y)
10 loops, best of 3: 66.6 ms per loop
``````

New Test

``````
In : x, y = numpy.arange(1000), numpy.arange(1000)
In : %timeit cartesian([x, y])
12.1 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In : %timeit dstack_product(x, y)
12.7 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
``````

As before, `dstack_product` still beats `cartesian` at smaller scales.

New Test (redundant old test not shown)

``````
In : x, y = numpy.arange(100), numpy.arange(100)
In : %timeit cartesian([x, y])
215 µs ± 4.75 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In : %timeit dstack_product(x, y)
65.7 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
``````

These distinctions are, I think, interesting and worth recording; but they are academic in the end. As the tests at the beginning of this answer showed, all of these versions are almost always slower than `cartesian_product`, defined at the very beginning of this answer -- which is itself a bit slower than the fastest implementations among the answers to this question.