# numpy.bincount() in Python


Syntax:
`numpy.bincount(arr, weights=None, minlength=0)`: counts the number of occurrences of each value in an array of non-negative integers. Each bin's value is the count of its index value in the array. You can also set a minimum number of bins for the output.
Parameters:

`arr` : [array_like, 1D] input array of non-negative integers
`weights` : [array_like, optional] weights, same shape as `arr`
`minlength` : [int, optional] minimum number of bins in the output array

Return:

Output array containing the number of occurrences of each index value of the bins in `arr`. By default, the output array has length `max(arr) + 1`.

Code 1: Working of `bincount()` in NumPy


``````# Python program explaining
# working of numpy.bincount()

import numpy as geek

# 1D array with non-negative integers
array1 = [1, 6, 1, 1, 1, 2, 2]

bin = geek.bincount(array1)
print("Bincount output:", bin)
print("size of bin:", len(bin), "\n")

array2 = [1, 5, 5, 5, 4, 5, 5, 2, 2, 2]

bin = geek.bincount(array2)
print("Bincount output:", bin)
print("size of bin:", len(bin), "\n")

# using the minlength argument
length = 10
bin1 = geek.bincount(array2, None, length)
print("Bincount output:", bin1)
print("size of bin:", len(bin1), "\n")
``````

Output:

``````Bincount output: [0 4 2 0 0 0 1]
size of bin: 7

Bincount output: [0 1 3 0 1 5]
size of bin: 6

Bincount output: [0 1 3 0 1 5 0 0 0 0]
size of bin: 10
``````

Code 2: Per-index summation using `bincount()` with weights


``````# Python program explaining
# working of numpy.bincount() with weights

import numpy as geek

# the weights
array2 = [10, 11, 4, 6, 2, 1, 9]
# 1D array with non-negative integers
array1 = [1, 3, 1, 3, 1, 2, 2]

# array2 acts as the weights
bin = geek.bincount(array1, array2)
print("Summation element-wise:", bin)

# Index 0: 0
# Index 1: 10 + 4 + 2 = 16
# Index 2: 1 + 9 = 10
# Index 3: 11 + 6 = 17
``````


Output:

` Summation element-wise: [0. 16. 10. 17.] `
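The weighted call above is just per-index summation: each weight is added into the bin selected by its index. A minimal sketch of that equivalence, reusing the arrays from the example:

```python
import numpy as np

indices = [1, 3, 1, 3, 1, 2, 2]    # the bins, as in array1 above
weights = [10, 11, 4, 6, 2, 1, 9]  # the weights, as in array2 above

# bincount with weights: sums the weights falling into each bin
result = np.bincount(indices, weights=weights)

# manual equivalent: accumulate each weight into its index's bin
manual = np.zeros(max(indices) + 1)
for idx, w in zip(indices, weights):
    manual[idx] += w

print(result)                          # [ 0. 16. 10. 17.]
print(np.array_equal(result, manual))  # True
```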

https://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html#numpy.bincount

## numpy.bincount() in Python: StackOverflow Questions

We start by answering the first question:

## Question 1

Why do I get `ValueError: Index contains duplicate entries, cannot reshape`?

This occurs because pandas is attempting to reindex either a `columns` or `index` object with duplicate entries. Several methods can perform a pivot, and some of them are not well suited to cases where the keys being pivoted on contain duplicates. For example, consider `pd.DataFrame.pivot`. I know there are duplicate entries that share the `row` and `col` values:

``````df.duplicated(["row", "col"]).any()

True
``````

So when I `pivot` using

``````df.pivot(index="row", columns="col", values="val0")
``````

I get the error mentioned above. In fact, I get the same error when I try to perform the same task with:

``````df.set_index(["row", "col"])["val0"].unstack()
``````
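The question's `df` is not reproduced in this excerpt; a minimal sketch with a hypothetical frame containing a duplicated `(row, col)` pair reproduces the error and shows that `pivot_table` aggregates the duplicates instead:

```python
import pandas as pd

# hypothetical frame with a duplicated (row, col) pair
df = pd.DataFrame({
    "row": ["row0", "row0", "row1"],
    "col": ["col0", "col0", "col1"],
    "val0": [1.0, 2.0, 3.0],
})
assert df.duplicated(["row", "col"]).any()

try:
    df.pivot(index="row", columns="col", values="val0")
except ValueError as err:
    print(err)  # Index contains duplicate entries, cannot reshape

# pivot_table resolves the duplicates by aggregating (mean by default)
print(df.pivot_table(values="val0", index="row", columns="col"))
```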

Here is a list of idioms we can use to pivot:

1. `pd.DataFrame.groupby`

• A good general approach for doing just about any type of pivot.
• You specify all columns that will constitute the pivoted row levels and column levels in one group by. You follow that by selecting the remaining columns you want to aggregate and the function(s) you want to perform the aggregation. Finally, you `unstack` the levels that you want to be in the column index.

2. `pd.DataFrame.pivot_table`

• A glorified version of `groupby` with a more intuitive API. For many people, this is the preferred approach, and it is the approach intended by the developers.
• Specify row levels, column levels, values to be aggregated, and function(s) to perform the aggregation.

3. `pd.DataFrame.set_index` followed by `unstack`

• Convenient and intuitive for some (myself included). Cannot handle duplicate grouped keys.
• Similar to the `groupby` paradigm: we set all columns that will eventually be either row or column levels as the index, then `unstack` the levels we want in the columns. If either the remaining index levels or column levels are not unique, this method will fail.

4. `pd.DataFrame.pivot`

• Very similar to `set_index` in that it shares the duplicate-key limitation. The API is very limited as well: it only takes scalar values for `index`, `columns`, and `values`.
• Similar to `pivot_table` in that we select rows, columns, and values on which to pivot. However, we cannot aggregate, and if either rows or columns are not unique, this method will fail.

5. `pd.crosstab`

• A specialized version of `pivot_table`, and in its purest form the most intuitive way to perform several tasks.

6. `pd.factorize` + `np.bincount`

• A highly advanced technique that is very obscure but very fast. It cannot be used in all circumstances, but when it can be used and you are comfortable using it, you will reap the performance rewards.

7. `pd.get_dummies`

• I use this for cleverly performing cross tabulation.

# Examples

What I"m going to do for each subsequent answer and question is to answer it using `pd.DataFrame.pivot_table`. Then I"ll provide alternatives to perform the same task.

## Question 3

How do I pivot `df` such that the `col` values are columns, `row` values are the index, mean of `val0` are the values, and missing values are `0`?

• `pd.DataFrame.pivot_table`

• `fill_value` is not set by default. I tend to set it appropriately; in this case I set it to `0`. Notice I skipped Question 2, as it's the same as this answer without the `fill_value`.

• `aggfunc="mean"` is the default and I didn't have to set it. I included it to be explicit.

``````  df.pivot_table(
      values="val0", index="row", columns="col",
      fill_value=0, aggfunc="mean")

col   col0   col1   col2   col3  col4
row
row0  0.77  0.605  0.000  0.860  0.65
row2  0.13  0.000  0.395  0.500  0.25
row3  0.00  0.310  0.000  0.545  0.00
row4  0.00  0.100  0.395  0.760  0.24
``````
• `pd.DataFrame.groupby`

``````  df.groupby(["row", "col"])["val0"].mean().unstack(fill_value=0)
``````
• `pd.crosstab`

``````  pd.crosstab(
      index=df["row"], columns=df["col"],
      values=df["val0"], aggfunc="mean").fillna(0)
``````
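These answers all assume the original question's `df`, which is not reproduced here. With a small hypothetical frame you can check that the three idioms produce the same table:

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for the question's df
df = pd.DataFrame({
    "row": ["row0", "row0", "row1", "row0"],
    "col": ["col0", "col1", "col0", "col0"],
    "val0": [0.1, 0.2, 0.3, 0.5],
})

a = df.pivot_table(values="val0", index="row", columns="col",
                   fill_value=0, aggfunc="mean")
b = df.groupby(["row", "col"])["val0"].mean().unstack(fill_value=0)
c = pd.crosstab(index=df["row"], columns=df["col"],
                values=df["val0"], aggfunc="mean").fillna(0)

print(np.allclose(a, b) and np.allclose(a, c))  # True
```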

## Question 4

Can I get something other than `mean`, like maybe `sum`?

• `pd.DataFrame.pivot_table`

``````  df.pivot_table(
      values="val0", index="row", columns="col",
      fill_value=0, aggfunc="sum")

col   col0  col1  col2  col3  col4
row
row0  0.77  1.21  0.00  0.86  0.65
row2  0.13  0.00  0.79  0.50  0.50
row3  0.00  0.31  0.00  1.09  0.00
row4  0.00  0.10  0.79  1.52  0.24
``````
• `pd.DataFrame.groupby`

``````  df.groupby(["row", "col"])["val0"].sum().unstack(fill_value=0)
``````
• `pd.crosstab`

``````  pd.crosstab(
      index=df["row"], columns=df["col"],
      values=df["val0"], aggfunc="sum").fillna(0)
``````

## Question 5

Can I do more than one aggregation at a time?

Notice that for `pivot_table` and `crosstab` I needed to pass a list of callables. On the other hand, `groupby.agg` can take strings for a limited number of special functions. `groupby.agg` would also have taken the same callables we passed to the others, but it is often more efficient to leverage the string function names, as there are efficiencies to be gained.

• `pd.DataFrame.pivot_table`

``````  df.pivot_table(
      values="val0", index="row", columns="col",
      fill_value=0, aggfunc=[np.size, np.mean])

size                      mean
col  col0 col1 col2 col3 col4  col0   col1   col2   col3  col4
row
row0    1    2    0    1    1  0.77  0.605  0.000  0.860  0.65
row2    1    0    2    1    2  0.13  0.000  0.395  0.500  0.25
row3    0    1    0    2    0  0.00  0.310  0.000  0.545  0.00
row4    0    1    2    2    1  0.00  0.100  0.395  0.760  0.24
``````
• `pd.DataFrame.groupby`

``````  df.groupby(["row", "col"])["val0"].agg(["size", "mean"]).unstack(fill_value=0)
``````
• `pd.crosstab`

``````  pd.crosstab(
      index=df["row"], columns=df["col"],
      values=df["val0"], aggfunc=[np.size, np.mean]).fillna(0, downcast="infer")
``````
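In recent pandas versions, string aliases such as `aggfunc=["size", "mean"]` also work directly in `pivot_table`, which sidesteps the deprecation warnings newer pandas emits for `np.size`/`np.mean`. A sketch on a hypothetical frame:

```python
import pandas as pd

# hypothetical frame
df = pd.DataFrame({
    "row": ["row0", "row0", "row1"],
    "col": ["col0", "col0", "col1"],
    "val0": [1.0, 3.0, 5.0],
})

# string aliases instead of np.size / np.mean
out = df.pivot_table(values="val0", index="row", columns="col",
                     fill_value=0, aggfunc=["size", "mean"])
print(out)
```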

## Question 6

Can I aggregate over multiple value columns?

• `pd.DataFrame.pivot_table` — we pass `values=["val0", "val1"]`, but we could've left that off completely

``````  df.pivot_table(
      values=["val0", "val1"], index="row", columns="col",
      fill_value=0, aggfunc="mean")

val0                             val1
col   col0   col1   col2   col3  col4  col0   col1  col2   col3  col4
row
row0  0.77  0.605  0.000  0.860  0.65  0.01  0.745  0.00  0.010  0.02
row2  0.13  0.000  0.395  0.500  0.25  0.45  0.000  0.34  0.440  0.79
row3  0.00  0.310  0.000  0.545  0.00  0.00  0.230  0.00  0.075  0.00
row4  0.00  0.100  0.395  0.760  0.24  0.00  0.070  0.42  0.300  0.46
``````
• `pd.DataFrame.groupby`

``````  df.groupby(["row", "col"])[["val0", "val1"]].mean().unstack(fill_value=0)
``````

## Question 7

Can I subdivide by multiple columns?

• `pd.DataFrame.pivot_table`

``````  df.pivot_table(
      values="val0", index="row", columns=["item", "col"],
      fill_value=0, aggfunc="mean")

item item0             item1                         item2
col   col2  col3  col4  col0  col1  col2  col3  col4  col0   col1  col3  col4
row
row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.605  0.86  0.65
row2  0.35  0.00  0.37  0.00  0.00  0.44  0.00  0.00  0.13  0.000  0.50  0.13
row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.000  0.28  0.00
row4  0.15  0.64  0.00  0.00  0.10  0.64  0.88  0.24  0.00  0.000  0.00  0.00
``````
• `pd.DataFrame.groupby`

``````  df.groupby(
      ["row", "item", "col"]
  )["val0"].mean().unstack(["item", "col"]).fillna(0).sort_index(axis=1)
``````

## Question 8

Can I subdivide by multiple columns on both axes?

• `pd.DataFrame.pivot_table`

``````  df.pivot_table(
      values="val0", index=["key", "row"], columns=["item", "col"],
      fill_value=0, aggfunc="mean")

item      item0             item1                         item2
col        col2  col3  col4  col0  col1  col2  col3  col4  col0  col1  col3  col4
key  row
key0 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.86  0.00
     row2  0.00  0.00  0.37  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.50  0.00
     row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.00  0.00  0.00
     row4  0.15  0.64  0.00  0.00  0.00  0.00  0.00  0.24  0.00  0.00  0.00  0.00
key1 row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.81  0.00  0.65
     row2  0.35  0.00  0.00  0.00  0.00  0.44  0.00  0.00  0.00  0.00  0.00  0.13
     row3  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.28  0.00
     row4  0.00  0.00  0.00  0.00  0.10  0.00  0.00  0.00  0.00  0.00  0.00  0.00
key2 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.40  0.00  0.00
     row2  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.13  0.00  0.00  0.00
     row4  0.00  0.00  0.00  0.00  0.00  0.64  0.88  0.00  0.00  0.00  0.00  0.00
``````
• `pd.DataFrame.groupby`

``````  df.groupby(
      ["key", "row", "item", "col"]
  )["val0"].mean().unstack(["item", "col"]).fillna(0).sort_index(axis=1)
``````
• `pd.DataFrame.set_index` because the set of keys are unique for both rows and columns

``````  df.set_index(
      ["key", "row", "item", "col"]
  )["val0"].unstack(["item", "col"]).fillna(0).sort_index(axis=1)
``````

## Question 9

Can I aggregate the frequency in which the column and rows occur together, aka "cross tabulation"?

• `pd.DataFrame.pivot_table`

``````  df.pivot_table(index="row", columns="col", fill_value=0, aggfunc="size")

col   col0  col1  col2  col3  col4
row
row0     1     2     0     1     1
row2     1     0     2     1     2
row3     0     1     0     2     0
row4     0     1     2     2     1
``````
• `pd.DataFrame.groupby`

``````  df.groupby(["row", "col"])["val0"].size().unstack(fill_value=0)
``````
• `pd.crosstab`

``````  pd.crosstab(df["row"], df["col"])
``````
• ``````  # get integer factorization `i` and unique values `r`
# for column `"row"`
i, r = pd.factorize(df["row"].values)
# get integer factorization `j` and unique values `c`
# for column `"col"`
j, c = pd.factorize(df["col"].values)
# `n` will be the number of rows
# `m` will be the number of columns
n, m = r.size, c.size
# `i * m + j` is a clever way of counting the
# factorization bins assuming a flat array of length
# `n * m`.  Which is why we subsequently reshape as `(n, m)`
b = np.bincount(i * m + j, minlength=n * m).reshape(n, m)
# BTW, whenever I read this, I think "Bean, Rice, and Cheese"
pd.DataFrame(b, r, c)

col3  col2  col0  col1  col4
row3     2     0     0     1     0
row2     1     2     1     0     2
row0     1     0     1     2     1
row4     2     2     0     1     1
``````
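The factorization trick can be sanity-checked against `pd.crosstab` on a small hypothetical frame (note that `factorize` returns labels in order of first appearance, so the result may need sorting to align with crosstab's sorted axes):

```python
import numpy as np
import pandas as pd

# hypothetical frame
df = pd.DataFrame({
    "row": ["row1", "row0", "row0", "row1", "row0"],
    "col": ["col0", "col0", "col1", "col1", "col0"],
})

i, r = pd.factorize(df["row"].values)
j, c = pd.factorize(df["col"].values)
n, m = r.size, c.size
b = np.bincount(i * m + j, minlength=n * m).reshape(n, m)
fast = pd.DataFrame(b, r, c)

# align the first-appearance ordering with crosstab's sorted axes
aligned = fast.sort_index().sort_index(axis=1)
slow = pd.crosstab(df["row"], df["col"])
print(np.array_equal(aligned.values, slow.values))  # True
```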
• `pd.get_dummies`

``````  pd.get_dummies(df["row"]).T.dot(pd.get_dummies(df["col"]))

col0  col1  col2  col3  col4
row0     1     2     0     1     1
row2     1     0     2     1     2
row3     0     1     0     2     0
row4     0     1     2     2     1
``````

## Question 10

How do I convert a DataFrame from long to wide by pivoting on ONLY two columns?

• `DataFrame.pivot`

The first step is to assign a number to each row - this number will be the row index of that value in the pivoted result. This is done using `GroupBy.cumcount`:

``````  df2.insert(0, "count", df2.groupby("A").cumcount())
df2

   count  A   B
0      0  a   0
1      1  a  11
2      2  a   2
3      3  a  11
4      0  b  10
5      1  b  10
6      2  b  14
7      0  c   7
``````

The second step is to use the newly created column as the index to call `DataFrame.pivot`.

``````  df2.pivot(*df2)
# df2.pivot(index="count", columns="A", values="B")

A         a     b    c
count
0       0.0  10.0  7.0
1      11.0  10.0  NaN
2       2.0  14.0  NaN
3      11.0   NaN  NaN
``````
• `DataFrame.pivot_table`

Whereas `DataFrame.pivot` only accepts columns, `DataFrame.pivot_table` also accepts arrays, so the `GroupBy.cumcount` can be passed directly as the `index` without creating an explicit column.

``````  df2.pivot_table(index=df2.groupby("A").cumcount(), columns="A", values="B")

A         a     b    c
0       0.0  10.0  7.0
1      11.0  10.0  NaN
2       2.0  14.0  NaN
3      11.0   NaN  NaN
``````
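Since `df2` itself is not reproduced in this excerpt, here is a self-contained sketch of the cumcount-then-pivot pattern on a hypothetical long-format frame:

```python
import pandas as pd

# hypothetical long-format frame
df2 = pd.DataFrame({
    "A": ["a", "a", "a", "a", "b", "b", "b", "c"],
    "B": [0, 11, 2, 11, 10, 10, 14, 7],
})

# number the rows within each group; that numbering becomes the row index
wide = df2.pivot_table(index=df2.groupby("A").cumcount(),
                       columns="A", values="B")
print(wide)
```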

## Question 11

How do I flatten the MultiIndex to a single index after `pivot`?

If the `columns` levels are all strings (type `object`), join them:

``````df.columns = df.columns.map("|".join)
``````

otherwise, use `format`:

``````df.columns = df.columns.map("{0[0]}|{0[1]}".format)
``````
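A minimal sketch of both flattening idioms on a hypothetical two-level column index:

```python
import pandas as pd

# hypothetical frame with MultiIndex columns, as produced by a pivot
df = pd.DataFrame([[1, 2]],
                  columns=pd.MultiIndex.from_tuples(
                      [("val0", "col0"), ("val0", "col1")]))

# join works when every level is already a string
joined = df.columns.map("|".join)
print(list(joined))     # ['val0|col0', 'val0|col1']

# format also handles non-string levels
formatted = df.columns.map("{0[0]}|{0[1]}".format)
print(list(formatted))  # ['val0|col0', 'val0|col1']
```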

I was also interested in the frequency counts of unique values, so I did a little performance comparison (using perfplot, a pet project of mine). Result:

``````y = np.bincount(a)
ii = np.nonzero(y)[0]
out = np.vstack((ii, y[ii])).T
``````

is by far the fastest. (Note the log-scaling.)

Code to generate the plot:

``````import numpy as np
import pandas as pd
import perfplot
from scipy.stats import itemfreq

def bincount(a):
    y = np.bincount(a)
    ii = np.nonzero(y)[0]
    return np.vstack((ii, y[ii])).T

def unique(a):
    unique, counts = np.unique(a, return_counts=True)
    return np.asarray((unique, counts)).T

def unique_count(a):
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), dtype=int)
    np.add.at(count, inverse, 1)
    return np.vstack((unique, count)).T

def pandas_value_counts(a):
    out = pd.value_counts(pd.Series(a))
    out.sort_index(inplace=True)
    out = np.stack([out.keys().values, out.values]).T
    return out

perfplot.show(
    setup=lambda n: np.random.randint(0, 1000, n),
    kernels=[bincount, unique, itemfreq, unique_count, pandas_value_counts],
    n_range=[2 ** k for k in range(26)],
    xlabel="len(a)",
)
``````

If your list contains only non-negative ints, you should take a look at `numpy.bincount`:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html

and then probably use np.argmax:

``````a = np.array([1,2,3,1,2,1,1,1,3,2,2,1])
counts = np.bincount(a)
print(np.argmax(counts))
``````

For a more complicated list (one that perhaps contains negative numbers or non-integer values), you can use `np.histogram` in a similar way. Alternatively, if you just want to work in Python without using numpy, `collections.Counter` is a good way of handling this sort of data.

``````from collections import Counter
a = [1,2,3,1,2,1,1,1,3,2,2,1]
b = Counter(a)
print(b.most_common(1))
``````

Take a look at `np.bincount`:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html

``````import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]
``````

And then:

``````list(zip(ii, y[ii]))
# [(1, 5), (2, 3), (5, 1), (25, 1)]
``````

or:

``````np.vstack((ii, y[ii])).T
# array([[ 1,  5],
#        [ 2,  3],
#        [ 5,  1],
#        [25,  1]])
``````

or however you want to combine the counts and the unique values.
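For completeness, the same value/count pairs can be obtained in one call with `np.unique` and `return_counts=True` (available since NumPy 1.9):

```python
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])

# unique values and their counts in one call
vals, counts = np.unique(x, return_counts=True)
pairs = np.vstack((vals, counts)).T
print(pairs.tolist())  # [[1, 5], [2, 3], [5, 1], [25, 1]]
```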