Python | Pandas Series.argsort ()

argsort | NumPy | Python Methods and Functions

Series.argsort() Pandas Series.argsort() returns indexes that will sort the underlying data this series object.

Syntax: Series.argsort (axis = 0, kind = `quicksort`, order = None)

Parameter:
axis: Has no effect but is accepted for compatibility with numpy.
kind: {`mergesort`, ` quicksort `,` heapsort `}, default` quicksort `
order: Has no effect but is accepted for compatibility with numpy.

Returns: argsorted: Series, with -1 indicated where nan values ​​are present

Example # 1: Use Series.argsort () to return an index sequence that will sort the underlying data of the given series object.

# import pandas as pd

import pandas as pd

 
# Create a series

sr = pd.Series ([ 34 , 5 , 13 , 32 , 4 , 15 ])

 
# Create index

index_ = [ `Coca Cola` , `Sprite` , ` Coke` , `Fanta` , `Dew` , `ThumbsUp` ]

  
# set index

sr.index = index_

  
# Print series

print (sr)

Output:

 Coca Cola 34 Sprite 5 Coke 13 Fanta 32 Dew 4 ThumbsUp 15 dtype: int64  

We will now use Series.argsort () to return a sequence of indices that will sort the underlying data for this series object.

 

# return indices that will
# sort the series

result = sr.argsort ()

 
# Print the result

print ( result)

 
# Let`s sort the series by result

print (sr [result])

Output:

 Coca Cola 4 Sprite 1 Coke 2 Fanta 5 Dew 3 ThumbsUp 0 dtype : int64 Dew 4 Sprite 5 Coke 13 ThumbsUp 15 Fanta 32 Coca Cola 34 dtype: int64 

As we can see from the output, Series.argsort () mustache Has successfully returned a series object containing the indexes that will sort the given series object.

Example # 2: Use Series.argsort () to return the index sequence which will sort the underlying data of this series object.

# import pandas as pd

import pandas as pd

  
# Create series

sr = pd.Series ([ 11 , 21 , 8 , 18 , 65 , 18 , 32 , 10 , 5 , 32 , None ])

 
# Create index
# apply annual rate

index_ = pd.date_range ( ` 2010-10-09 08:45` , periods = 11 , freq = `Y` )

 
# install index

sr.index = index_

 
# Print series

print (sr)

Output:

 2010-12-31 08:45:00 11.0 2011-12-31 08:45:00 21.0 2012 -12-31 08:45:00 8.0 2013-12-31 08:45:00 18.0 2014-12-31 08:45:00 65.0 2015-12-31 08:45:00 18.0 2016-12-31 08: 45:00 32.0 2017-12-31 08:45:00 10.0 2018-12-31 08:45:00 5.0 2019-12-31 08:45:00 32.0 2020-12-31 08:45:00 NaN Freq: A-DEC, dtype: float64 

We will now use Series.argsort () to return a sequence of indices that will sort the underlying data of a given series object.

# return index che who will
# sort the series

result = sr.argsort ()

 
# Print result

print (result)

 
# Let`s sort the series by result

print (sr [result])

Output:

 2010-12-31 08:45:00 8 2011-12-31 08:45 : 00 2 2012-12-31 08:45:00 7 2013-12-31 08:45:00 0 2014-12-31 08:45:00 3 2015-12-31 08:45:00 5 2016-12 -31 08:45:00 1 2017-12-31 08:45:00 6 2018-12-31 08:45:00 9 2019-12-31 08:45:00 4 2020-12-31 08:45: 00 -1 Freq: A-DEC, dtype: int64 2018-12-31 08:45:00 5.0 2012-12-31 08:45:00 8.0 2017-12-31 08:45 : 00 10.0 2010-12-31 08:45:00 11.0 2013-12-31 08:45:00 18.0 2015-12-31 08:45:00 18.0 2011-12-31 08:45:00 21.0 2016-12 -31 08:45:00 32.0 2019-12-31 08:45:00 32.0 2014-12-31 08:45:00 65.0 2020-12-31 08:45:00 NaN dtype: float64 

As we can see from the output, Series.argsort () has successfully returned a series object containing the indices that will sort this series object. Note that the function returned -1 as the index position for missing values.





Python | Pandas Series.argsort (): StackOverflow Questions

Is it possible to use argsort in descending order?

Consider the following code:

avgDists = np.array([1, 8, 6, 9, 4])
ids = avgDists.argsort()[:n]

This gives me indices of the n smallest elements. Is it possible to use this same argsort in descending order to get the indices of n highest elements?

Numpy argsort - what is it doing?

Why is numpy giving this result:

x = numpy.array([1.48,1.41,0.0,0.1])
print x.argsort()

>[2 3 1 0]

when I"d expect it to do this:

[3 2 0 1]

Clearly my understanding of the function is lacking.

Answer #1

To sort by the second column of a:

a[a[:, 1].argsort()]

Answer #2

Newer NumPy versions (1.8 and up) have a function called argpartition for this. To get the indices of the four largest elements, do

>>> a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])
>>> a[ind]
array([4, 9, 6, 9])

Unlike argsort, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating a[ind]. If you need that too, sort them afterwards:

>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])

To get the top-k elements in sorted order in this way takes O(n + k log k) time.

Answer #3

The simplest I"ve been able to come up with is:

In [1]: import numpy as np

In [2]: arr = np.array([1, 3, 2, 4, 5])

In [3]: arr.argsort()[-3:][::-1]
Out[3]: array([4, 3, 1])

This involves a complete sort of the array. I wonder if numpy provides a built-in way to do a partial sort; so far I haven"t been able to find one.

If this solution turns out to be too slow (especially for small n), it may be worth looking at coding something up in Cython.

Answer #4

If you negate an array, the lowest elements become the highest elements and vice-versa. Therefore, the indices of the n highest elements are:

(-avgDists).argsort()[:n]

Another way to reason about this, as mentioned in the comments, is to observe that the big elements are coming last in the argsort. So, you can read from the tail of the argsort to find the n highest elements:

avgDists.argsort()[::-1][:n]

Both methods are O(n log n) in time complexity, because the argsort call is the dominant term here. But the second approach has a nice advantage: it replaces an O(n) negation of the array with an O(1) slice. If you"re working with small arrays inside loops then you may get some performance gains from avoiding that negation, and if you"re working with huge arrays then you can save on memory usage because the negation creates a copy of the entire array.

Note that these methods do not always give equivalent results: if a stable sort implementation is requested to argsort, e.g. by passing the keyword argument kind="mergesort", then the first strategy will preserve the sorting stability, but the second strategy will break stability (i.e. the positions of equal items will get reversed).

Example timings:

Using a small array of 100 floats and a length 30 tail, the view method was about 15% faster

>>> avgDists = np.random.rand(100)
>>> n = 30
>>> timeit (-avgDists).argsort()[:n]
1.93 µs ± 6.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
1.64 µs ± 3.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
1.64 µs ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For larger arrays, the argsort is dominant and there is no significant timing difference

>>> avgDists = np.random.rand(1000)
>>> n = 300
>>> timeit (-avgDists).argsort()[:n]
21.9 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
21.7 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
21.9 µs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Please note that the comment from nedim below is incorrect. Whether to truncate before or after reversing makes no difference in efficiency, since both of these operations are only striding a view of the array differently and not actually copying data.

Answer #5

If you are using numpy, you have the argsort() function available:

>>> import numpy
>>> numpy.argsort(myList)
array([0, 1, 2, 4, 3])

http://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html

This returns the arguments that would sort the array or list.

Answer #6

Nikie"s answer solved my problem, but his answer was in Mathematica. So I thought I should give its OpenCV adaptation here. But after implementing I could see that OpenCV code is much bigger than nikie"s mathematica code. And also, I couldn"t find interpolation method done by nikie in OpenCV ( although it can be done using scipy, i will tell it when time comes.)

1. Image PreProcessing ( closing operation )

import cv2
import numpy as np

img = cv2.imread("dave.jpg")
img = cv2.GaussianBlur(img,(5,5),0)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
mask = np.zeros((gray.shape),np.uint8)
kernel1 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(11,11))

close = cv2.morphologyEx(gray,cv2.MORPH_CLOSE,kernel1)
div = np.float32(gray)/(close)
res = np.uint8(cv2.normalize(div,div,0,255,cv2.NORM_MINMAX))
res2 = cv2.cvtColor(res,cv2.COLOR_GRAY2BGR)

Result :

Result of closing

2. Finding Sudoku Square and Creating Mask Image

thresh = cv2.adaptiveThreshold(res,255,0,1,19,2)
contour,hier = cv2.findContours(thresh,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)

max_area = 0
best_cnt = None
for cnt in contour:
    area = cv2.contourArea(cnt)
    if area > 1000:
        if area > max_area:
            max_area = area
            best_cnt = cnt

cv2.drawContours(mask,[best_cnt],0,255,-1)
cv2.drawContours(mask,[best_cnt],0,0,2)

res = cv2.bitwise_and(res,mask)

Result :

enter image description here

3. Finding Vertical lines

kernelx = cv2.getStructuringElement(cv2.MORPH_RECT,(2,10))

dx = cv2.Sobel(res,cv2.CV_16S,1,0)
dx = cv2.convertScaleAbs(dx)
cv2.normalize(dx,dx,0,255,cv2.NORM_MINMAX)
ret,close = cv2.threshold(dx,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
close = cv2.morphologyEx(close,cv2.MORPH_DILATE,kernelx,iterations = 1)

contour, hier = cv2.findContours(close,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
for cnt in contour:
    x,y,w,h = cv2.boundingRect(cnt)
    if h/w > 5:
        cv2.drawContours(close,[cnt],0,255,-1)
    else:
        cv2.drawContours(close,[cnt],0,0,-1)
close = cv2.morphologyEx(close,cv2.MORPH_CLOSE,None,iterations = 2)
closex = close.copy()

Result :

enter image description here

4. Finding Horizontal Lines

kernely = cv2.getStructuringElement(cv2.MORPH_RECT,(10,2))
dy = cv2.Sobel(res,cv2.CV_16S,0,2)
dy = cv2.convertScaleAbs(dy)
cv2.normalize(dy,dy,0,255,cv2.NORM_MINMAX)
ret,close = cv2.threshold(dy,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
close = cv2.morphologyEx(close,cv2.MORPH_DILATE,kernely)

contour, hier = cv2.findContours(close,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
for cnt in contour:
    x,y,w,h = cv2.boundingRect(cnt)
    if w/h > 5:
        cv2.drawContours(close,[cnt],0,255,-1)
    else:
        cv2.drawContours(close,[cnt],0,0,-1)

close = cv2.morphologyEx(close,cv2.MORPH_DILATE,None,iterations = 2)
closey = close.copy()

Result :

enter image description here

Of course, this one is not so good.

5. Finding Grid Points

res = cv2.bitwise_and(closex,closey)

Result :

enter image description here

6. Correcting the defects

Here, nikie does some kind of interpolation, about which I don"t have much knowledge. And i couldn"t find any corresponding function for this OpenCV. (may be it is there, i don"t know).

Check out this SOF which explains how to do this using SciPy, which I don"t want to use : Image transformation in OpenCV

So, here I took 4 corners of each sub-square and applied warp Perspective to each.

For that, first we find the centroids.

contour, hier = cv2.findContours(res,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
centroids = []
for cnt in contour:
    mom = cv2.moments(cnt)
    (x,y) = int(mom["m10"]/mom["m00"]), int(mom["m01"]/mom["m00"])
    cv2.circle(img,(x,y),4,(0,255,0),-1)
    centroids.append((x,y))

But resulting centroids won"t be sorted. Check out below image to see their order:

enter image description here

So we sort them from left to right, top to bottom.

centroids = np.array(centroids,dtype = np.float32)
c = centroids.reshape((100,2))
c2 = c[np.argsort(c[:,1])]

b = np.vstack([c2[i*10:(i+1)*10][np.argsort(c2[i*10:(i+1)*10,0])] for i in xrange(10)])
bm = b.reshape((10,10,2))

Now see below their order :

enter image description here

Finally we apply the transformation and create a new image of size 450x450.

output = np.zeros((450,450,3),np.uint8)
for i,j in enumerate(b):
    ri = i/10
    ci = i%10
    if ci != 9 and ri!=9:
        src = bm[ri:ri+2, ci:ci+2 , :].reshape((4,2))
        dst = np.array( [ [ci*50,ri*50],[(ci+1)*50-1,ri*50],[ci*50,(ri+1)*50-1],[(ci+1)*50-1,(ri+1)*50-1] ], np.float32)
        retval = cv2.getPerspectiveTransform(src,dst)
        warp = cv2.warpPerspective(res2,retval,(450,450))
        output[ri*50:(ri+1)*50-1 , ci*50:(ci+1)*50-1] = warp[ri*50:(ri+1)*50-1 , ci*50:(ci+1)*50-1].copy()

Result :

enter image description here

The result is almost same as nikie"s, but code length is large. May be, better methods are available out there, but until then, this works OK.

Regards ARK.

Answer #7

Use numpy.argsort. It returns the indices one would use to sort the array.

import numpy as np
import numpy.linalg as linalg

A = np.random.random((3,3))
eigenValues, eigenVectors = linalg.eig(A)

idx = eigenValues.argsort()[::-1]   
eigenValues = eigenValues[idx]
eigenVectors = eigenVectors[:,idx]

If the eigenvalues are complex, the sort order is lexicographic (that is, complex numbers are sorted according to their real part first, with ties broken by their imaginary part).

Answer #8

According to the documentation

Returns the indices that would sort an array.

  • 2 is the index of 0.0.
  • 3 is the index of 0.1.
  • 1 is the index of 1.41.
  • 0 is the index of 1.48.

Get Solution for free from DataCamp guru