Change language

All about sorting in Python: a comprehensive guide

| | | |

Sorting in Python is done by sorted() if iterated objects, and by the list.sort() method if it is a list. Let's take a closer look at how it worked in older versions and how it works now.

1. The basics of sorting

To sort in ascending order, simply call the Python sorted() function, which will return a new sorted list:

>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]

You can also use the list.sort() method, which changes the original list (and returns None to avoid confusion). This is usually not as convenient as using sorted(), but if you don't need the original list, it's a bit more efficient:

>>> a = [5, 2, 3, 1, 4]
>>> a.sort()
>>> a
[1, 2, 3, 4, 5]

In Python, returning None and not returning anything are the same thing.

Another difference is that the list.sort() method is only defined for lists, whereas sorted() works on all iterated objects:

>>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
[1, 2, 3, 4, 5]

When iterating over a dictionary, Python returns its keys. If you want their values or key-value pairs, use dict.values() and dict.items() methods respectively.

Let's take a look at Python's basic sorting functions.

2. Key functions

Since Python 2.4, list.sort() and sorted() now have a key parameter to specify the function to be called on each item before comparison. Here is a case-independent string comparison:

>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']

The key value should be a function that takes a single argument and returns a key to sort. It works quickly because the key function is called once for each item.

You will often find code where a complex object is sorted by one of its indexes. For example:

>>> student_tuples = [
        ('john', 'A', 15),
        ('jane', 'B', 12),
        ('dave', 'B', 10),
    ]
>>> sorted(student_tuples, key=lambda student: student[2])   # сортируем по возрасту
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The same method works for objects with named attributes:

>>> class Student:
        def __init__(self, name, grade, age):
            self.name = name
            self.grade = grade
            self.age = age
        def __repr__(self):
            return repr((self.name, self.grade, self.age))
        def weighted_grade(self):
            return 'CBA'.index(self.grade) / self.age

>>> student_objects = [
        Student('john', 'A', 15),
        Student('jane', 'B', 12),
        Student('dave', 'B', 10),
    ]
>>> sorted(student_objects, key=lambda student: student.age)   # сортируем по возрасту
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]​

3. Operator module functions

The key function examples shown above are so common that Python offers handy functions to make things easier and faster. The operator module contains the itemgetter(), attrgetter(), and, starting with Python 2.6, methodcaller() functions. These are even easier to use:

>>> from operator import itemgetter, attrgetter, methodcaller

>>> sorted(student_tuples, key=itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

>>> sorted(student_objects, key=attrgetter('age'))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

Operator functions give you the ability to use multiple sort levels in Python. Sort students first by grade and then by age:

>>> sorted(student_tuples, key=itemgetter(1, 2))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

>>> sorted(student_objects, key=attrgetter('grade', 'age'))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

Let's use the methodcaller() function to sort students by weighted grade:

>>> [(student.name, student.weighted_grade()) for student in student_objects]
[('john', 0.13333333333333333), ('jane', 0.08333333333333333), ('dave', 0.1)]
>>> sorted(student_objects, key=methodcaller('weighted_grade'))
[('jane', 'B', 12), ('dave', 'B', 10), ('john', 'A', 15)]

4. Sorting in ascending order and sorting in descending order in Python

List.sort() and sorted() have a reverse parameter that takes a boolean value. It is used to indicate descending sorting. Let's sort students in descending order of age:

>>> sorted(student_tuples, key=itemgetter(2), reverse=True)
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

>>> sorted(student_objects, key=attrgetter('age'), reverse=True)
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

5. Stable sorting and complex sorting in Python

As of Python 2.2, sorts are guaranteed to be stable: if multiple records have the same keys, their order will remain the same. Example:

>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
>>> sorted(data, key=itemgetter(0))
[('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]

Note that the two records with 'blue' retain their initial order. This property allows us to do complex sorting by incremental sorting. Next we sort the student data first by age in ascending order, and then by grades in descending order, to get the data sorted first by grade and second by age:

>>> s = sorted(student_objects, key=attrgetter('age'))     # сортируем по вторичному ключу
>>> sorted(s, key=attrgetter('grade'), reverse=True)       # по первичному
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

Python sorting algorithms like Timsort perform multiple sorts so efficiently because it can benefit from any order already present in the data set.

6. Decorate-sort-decorate

  • First, the original list is populated with new values that control the sort order.
  • The new list is then sorted.
  • The added values are then removed, leaving a sorted list containing only the original items.

This is how you can sort student data by grade:

>>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
>>> decorated.sort()
>>> [student for grade, i, student in decorated]               # раздекорируем
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

This works because the tuples are compared lexicographically, the first elements are compared, and if they match, the second elements are compared, and so on.

It is not always necessary to include an index in the decorated list, but it does have advantages:

  • Sorting is stable - if two elements have the same key, their order will not change.
  • The source elements need not have a comparison capability, as the order of the decorated tuples will be determined at most by the first two elements. For example, a source list may contain complex numbers which cannot be compared directly.

This idiom is also called the Schwartz transformation, after Randall Schwartz who popularized it among Perl programmers.

For large lists and versions of Python below 2.4, "decorate-sort-decorate" will be the optimal way to sort. For versions 2.4+, the same functionality is provided by key functions.

7. Using the cmp parameter

All versions of Python 2.x supported the cmp parameter to handle custom compare functions. Python 3.0 got rid of this parameter completely. In Python 2.x you could pass a function to sort() which would be used to compare items. It should take two arguments and return a negative value for "less than", a positive value for "greater than" and zero if they are equal:

>>> def numeric_compare(x, y):
        return x - y
>>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare)
[1, 2, 3, 4, 5]

It is possible to compare in reverse order:

>>> def reverse_numeric(x, y):
        return y - x
>>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric)
[5, 4, 3, 2, 1]

When porting code from version 2.x to 3.x, a situation may arise where you need to convert a custom comparison function into a key function. The following wrapper simplifies this task:

def cmp_to_key(mycmp):
    'Перевести cmp=функция в key=функция'
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        def __ne__(self, other):
            return mycmp(self.obj, other.obj) != 0
    return K

To perform the conversion, wrap the old function:

>>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
[5, 4, 3, 2, 1]

In Python 2.7, cmp_to_key() has been added to the functools module.

8. Maintaining sort order

There are no modules in the standard Python library similar to C++ data types like set and map. Python delegates this task to third party libraries available in Python Package Index: they use different methods to keep the list, dict and set types in sorted order. Maintaining order by using a special data structure can help avoid very slow behaviour (quadratic runtime) in a naive approach with editing and constant re-sorting of data. Here are some of the modules that implement these data types:

  • SortedContainers is a pure Python implementation of the sorted types list, dict and set, and is as fast as the C implementation. Testing includes 100% code coverage and many hours of stress testing. The documentation includes a full API reference, performance comparisons, and guidelines for contributing.
  • rbtree is a fast C implementation for the dict and set types. The implementation uses a data structure known as a red-black tree.
  • treap is a sorted dict. The implementation uses a Cartesian tree and improves performance with Cython.
  • bintrees - several tree-based implementations of dict and set types in C. The fastest are based on AVL and red-black trees. It extends the common API for providing set operations for dictionaries.
  • banyan is a fast C implementation of dict and set.
  • skiplistcollections - a pure Python implementation based on skip lists, offers a limited API for dict and set types.
  • blist - provides sorted list, dict and set types based on the "blist" data type, a B-tree implementation. Written in Python and C.

9. Other

For language-specific sorting, use locale.strxfrm() as the key function or locale.strcoll() as the compare function. The reverse parameter still preserves sorting stability. This effect can be simulated without the parameter by using the built-in reversed() twice:

>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
>>> assert sorted(data, reverse=True) == list(reversed(sorted(reversed(data))))

To create a standard sort order for a class, simply add an implementation of the appropriate comparison methods:

>>> Student.__eq__ = lambda self, other: self.age == other.age
>>> Student.__ne__ = lambda self, other: self.age != other.age
>>> Student.__lt__ = lambda self, other: self.age < other.age
>>> Student.__le__ = lambda self, other: self.age <= other.age
>>> Student.__gt__ = lambda self, other: self.age > other.age
>>> Student.__ge__ = lambda self, other: self.age >= other.age
>>> sorted(student_objects)
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

For types whose comparison works in the usual way, it is recommended to define all 6 operators. The functools.total_ordering class decorator simplifies their implementation. Key functions don't need to access the internal data of the objects being sorted. They can also access external resources. For example, if a student's grades are stored in a dictionary, they can be used to sort a separate list of students' names:

>>> students = ['dave', 'john', 'jane']
>>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
>>> sorted(students, key=newgrades.__getitem__)
['jane', 'dave', 'john']