# Python | Sort the list by the second item in the sublist


In this article, we will learn how to sort a list by the second element of each sublist it contains. We will explore three ways of doing this: one using bubble sort, one using the sort() method, and one using the sorted() function. In each program, the list is sorted in ascending order.
Examples:

Input: `[['rishav', 10], ['akash', 5], ['ram', 20], ['gaurav', 15]]`

Output: `[['akash', 5], ['rishav', 10], ['gaurav', 15], ['ram', 20]]`

Input: `[['452', 10], ['256', 5], ['100', 20], ['135', 15]]`

Output: `[['256', 5], ['452', 10], ['135', 15], ['100', 20]]`

Method 1: Using the Bubble Sort technique
Here we use bubble sort. Nested loops compare the second item of adjacent sublists and swap the sublists when they are out of order. The list is sorted in place, and the time complexity is that of bubble sort, i.e. O(n^2).

``````
# Python code to sort a list of sublists by the second element of each sublist
# In-place sorting; a temporary variable is used for the swap
def Sort(sub_li):
    l = len(sub_li)
    for i in range(0, l):
        for j in range(0, l - i - 1):
            if sub_li[j][1] > sub_li[j + 1][1]:
                tempo = sub_li[j]
                sub_li[j] = sub_li[j + 1]
                sub_li[j + 1] = tempo
    return sub_li

# Driver code
sub_li = [['rishav', 10], ['akash', 5], ['ram', 20], ['gaurav', 15]]
print(Sort(sub_li))
``````

Output:

`[['akash', 5], ['rishav', 10], ['gaurav', 15], ['ram', 20]]`

Method 2: Sorting using the sort() method
When sorting with this method, the contents of the original list are changed: as in the previous method, the sort is done in place.

``````
# Python code to sort a list of sublists by the second element of each sublist
# In-place sorting using sort()
def Sort(sub_li):

    # reverse is not passed (ascending sort)
    # key is set to sort by the second element of each sublist
    # a lambda function is used as the key
    sub_li.sort(key=lambda x: x[1])
    return sub_li

# Driver code
sub_li = [['rishav', 10], ['akash', 5], ['ram', 20], ['gaurav', 15]]
print(Sort(sub_li))
``````

Output:

`[['akash', 5], ['rishav', 10], ['gaurav', 15], ['ram', 20]]`

Method 3: Sorting using the sorted() method
sorted() returns a new list with the elements in sorted order and leaves the original sequence unchanged. It takes three parameters, two of which are optional:

1. Iterable: a sequence (list, tuple, string), a collection (dictionary, set, frozenset), or any other iterable to be sorted.
2. Key (optional): a function that is applied to each element to produce the value used for sorting comparisons.
3. Reverse (optional): if set to True, the iterable is sorted in reverse (descending) order. The default is False, so to sort in ascending order, as in this program, the parameter can simply be left out.

``````
# Python code to sort a list of sublists by the second element of each sublist
# Function to sort using sorted()
def Sort(sub_li):

    # reverse is not passed (ascending sort)
    # key is set to sort by the second element of each sublist
    # a lambda function is used as the key
    return sorted(sub_li, key=lambda x: x[1])

# Driver code
sub_li = [['rishav', 10], ['akash', 5], ['ram', 20], ['gaurav', 15]]
print(Sort(sub_li))
``````

Output:

`[['akash', 5], ['rishav', 10], ['gaurav', 15], ['ram', 20]]`
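The difference between the in-place method and the built-in function can be checked directly. The sketch below is illustrative only: the variable names are made up for this example, and `operator.itemgetter` from the standard library is used as an alternative to the lambda, which is an assumption and not part of the programs above.

```python
from operator import itemgetter

pairs = [['rishav', 10], ['akash', 5], ['ram', 20], ['gaurav', 15]]

# sorted() builds a new list and leaves the original untouched
by_second = sorted(pairs, key=itemgetter(1))

# reverse=True gives descending order
descending = sorted(pairs, key=itemgetter(1), reverse=True)

# list.sort() reorders the list in place and returns None
in_place = list(pairs)  # shallow copy so that `pairs` keeps its order
result = in_place.sort(key=itemgetter(1))

print(by_second)
print(descending)
print(result)
```

Because `list.sort()` returns `None`, writing `my_list = my_list.sort(...)` would discard the list; the wrapper functions above avoid this by returning `sub_li` explicitly.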

## List of lists changes reflected across sublists unexpectedly

### Question by Charles Anderson

I needed to create a list of lists in Python, so I typed the following:

``````my_list = [[1] * 4] * 3
``````

The list looked like this:

``````[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
``````

Then I changed one of the innermost values:

``````my_list[0][0] = 5
``````

Now my list looks like this:

``````[[5, 1, 1, 1], [5, 1, 1, 1], [5, 1, 1, 1]]
``````

which is not what I wanted or expected. Can someone please explain what's going on, and how to get around it?

## Split a python list into other "sublists" i.e smaller lists

I have a Python list which runs into the 1000's. Something like:

``````data=["I","am","a","python","programmer".....]
``````

where `len(data)` is, say, 1003.

I would now like to create subsets of this list (data) by splitting the original list into chunks of 100. So, at the end, I'd like to have something like:

``````data_chunk1=[.....] #first 100 items of list data
data_chunk2=[.....] #second 100 items of list data
.
.
.
data_chunk11=[.....] # remainder of the entries; its length is <= 100, here len(data_chunk11) = 3
``````

Is there a pythonic way to achieve this task? Obviously I can use data[0:100] and so on, but I am assuming that is terribly non-pythonic and very inefficient.

Many thanks.
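A minimal sketch of one common approach, assuming the chunks can live in a list of lists rather than in separately named variables. The helper name `chunks` and the stand-in data are hypothetical; slicing itself is ordinary, idiomatic Python, and the comprehension simply steps through the start indices.

```python
def chunks(items, size):
    """Return consecutive slices of `items`, each at most `size` long."""
    return [items[i:i + size] for i in range(0, len(items), size)]

data = [f"word{i}" for i in range(1003)]  # stand-in for the real list
data_chunks = chunks(data, 100)

print(len(data_chunks))      # number of chunks
print(len(data_chunks[-1]))  # size of the final, shorter chunk
```

Each chunk is then available by index, e.g. `data_chunks[0]` for the first 100 items.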

## Extract first item of each sublist

I am wondering what is the best way to extract the first item of each sublist in a list of lists and append it to a new list. So if I have:

``````lst = [[a,b,c], [1,2,3], [x,y,z]]
``````

and I want to pull out `a`, `1` and `x` and create a separate list from those.

I tried:

``````lst2.append(x[0] for x in lst)
``````
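`list.append` adds its argument as a single element, so the attempt above appends one generator object rather than the individual items. A minimal sketch of two common alternatives, assuming the placeholder names in the question stand for strings (the names `lst2` and `lst3` are only for illustration):

```python
lst = [["a", "b", "c"], [1, 2, 3], ["x", "y", "z"]]

# Build the new list directly with a list comprehension
lst2 = [sub[0] for sub in lst]

# Or add to an existing list with extend(), which consumes the generator
lst3 = []
lst3.extend(sub[0] for sub in lst)

print(lst2)
print(lst3)
```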

I tested most suggested solutions with perfplot (a pet project of mine, essentially a wrapper around `timeit`), and found

``````import functools
import operator
functools.reduce(operator.iconcat, a, [])
``````

to be the fastest solution, both when many small lists and few long lists are concatenated. (`operator.iadd` is equally fast.)

Code to reproduce the plot:

``````import functools
import itertools
import numpy
import operator
import perfplot

def forfor(a):
    return [item for sublist in a for item in sublist]

def sum_brackets(a):
    return sum(a, [])

def functools_reduce(a):
    return functools.reduce(operator.concat, a)

def functools_reduce_iconcat(a):
    return functools.reduce(operator.iconcat, a, [])

def itertools_chain(a):
    return list(itertools.chain.from_iterable(a))

def numpy_flat(a):
    return list(numpy.array(a).flat)

def numpy_concatenate(a):
    return list(numpy.concatenate(a))

perfplot.show(
    setup=lambda n: [list(range(10))] * n,
    # setup=lambda n: [list(range(n))] * 10,
    kernels=[
        forfor,
        sum_brackets,
        functools_reduce,
        functools_reduce_iconcat,
        itertools_chain,
        numpy_flat,
        numpy_concatenate,
    ],
    n_range=[2 ** k for k in range(16)],
    xlabel="num lists (of length 10)",
    # xlabel="len lists (10 lists total)"
)
``````

There is a clean, one-line way of doing this in Pandas:

``````df["col_3"] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)
``````

This allows `f` to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns.

Example with data (based on original question):

``````import pandas as pd

df = pd.DataFrame({"ID":["1", "2", "3"], "col_1": [0, 2, 3], "col_2":[1, 4, 5]})
mylist = ["a", "b", "c", "d", "e", "f"]

def get_sublist(sta,end):
    return mylist[sta:end+1]

df["col_3"] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)
``````

Output of `print(df)`:

``````  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]
``````

If your column names contain spaces or share a name with an existing dataframe attribute, you can index with square brackets:

``````df["col_3"] = df.apply(lambda x: f(x["col 1"], x["col 2"]), axis=1)
``````

I think all of the answers here cover the core of what the lambda function does in the context of sorted() quite nicely, however I still feel like a description that leads to an intuitive understanding is lacking, so here is my two cents.

For the sake of completeness, I'll state the obvious up front: sorted() returns a list of sorted elements, and if we want to sort in a particular way or sort a complex list of elements (e.g. nested lists or a list of tuples), we can use the key argument.

For me, the intuitive understanding of the key argument, why it has to be callable, and the use of lambda as the (anonymous) callable function to accomplish this comes in two parts.

1. Using lambda ultimately means you don't have to write (define) an entire function, like the one sblom provided an example of. Lambda functions are created, used, and immediately destroyed - so they don't funk up your code with more code that will only ever be used once. This, as I understand it, is the core utility of the lambda function, and its application for such a role is broad. Its syntax is purely a convention, which is in essence the nature of programmatic syntax in general. Learn the syntax and be done with it.

Lambda syntax is as follows:

``````lambda input_variable(s): tasty one liner
``````

where `lambda` is a python keyword.

e.g.

``````In [1]: f00 = lambda x: x/2

In [2]: f00(10)
Out[2]: 5.0

In [3]: (lambda x: x/2)(10)
Out[3]: 5.0

In [4]: (lambda x, y: x / y)(10, 2)
Out[4]: 5.0

In [5]: (lambda: "amazing lambda")() # func with no args!
Out[5]: "amazing lambda"
``````
2. The idea behind the `key` argument is that it should take in a set of instructions that will essentially point the `sorted()` function at those list elements which should be used to sort by. When it says `key=`, what it really means is: as I iterate through the list, one element at a time (i.e. `for e in some_list`), I'm going to pass the current element to the function specified by the key argument and use that to create a transformed list which will inform me on the order of the final sorted list.

Check it out:

``````In [6]: mylist = [3, 6, 3, 2, 4, 8, 23]  # an example list
# sorted(mylist, key=HowToSort)  # what we will be doing
``````

Base example:

``````# mylist = [3, 6, 3, 2, 4, 8, 23]
In [7]: sorted(mylist)
Out[7]: [2, 3, 3, 4, 6, 8, 23]
# all numbers are in ascending order (i.e.from low to high).
``````

Example 1:

``````# mylist = [3, 6, 3, 2, 4, 8, 23]
In [8]: sorted(mylist, key=lambda x: x % 2 == 0)

# Quick Tip: The % operator returns the *remainder* of a division
# operation. So the key lambda function here is saying "return True
# if x divided by 2 leaves a remainder of 0, else False". This is a
# typical way to check if a number is even or odd.

Out[8]: [3, 3, 23, 6, 2, 4, 8]
# Does this sorted result make intuitive sense to you?
``````

Notice that my lambda function told `sorted` to check if each element `e` was even or odd before sorting.

BUT WAIT! You may (or perhaps should) be wondering two things.

First, why are the odd numbers coming before the even numbers? After all, the key value seems to be telling the `sorted` function to prioritize evens by using the `mod` operator in `x % 2 == 0`.

Second, why are the even numbers still out of order? 2 comes before 6, right?

By analyzing this result, we'll learn something deeper about how the `key` argument really works, especially in conjunction with the anonymous lambda function.

Firstly, you'll notice that while the odds come before the evens, the evens themselves are not sorted. Why is this? Let's read the docs:

Key Functions: "Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons."

We have to do a little bit of reading between the lines here, but what this tells us is that the key function is called on each element before any comparison is made, that the sort itself only happens once, and that if we specify the key argument, we sort by the values the key function returns.

So what does the example using a modulo return? A boolean value: `True == 1`, `False == 0`. So how does sorted deal with this key? It basically transforms the original list to a sequence of 1s and 0s.

`[3, 6, 3, 2, 4, 8, 23]` becomes `[0, 1, 0, 1, 1, 1, 0]`

Now we're getting somewhere. What do you get when you sort the transformed list?

`[0, 0, 0, 1, 1, 1, 1]`

Okay, so now we know why the odds come before the evens. But the next question is: why does the 6 still come before the 2 in my final list? Well, that's easy - it is because sorting only happens once, and Python's sort is stable: elements whose keys compare equal keep their original relative order. Those 1s still represent the original list values, which are in their original positions relative to each other. Since we don't call any kind of sort function to order the original even numbers from low to high, those values remain in their original order relative to one another.

The final question is then this: how do I think conceptually about how the order of my boolean values gets transformed back into the original values when I print out the final sorted list?

`sorted()` is a built-in function that (fun fact) uses a hybrid sorting algorithm called Timsort, which combines aspects of merge sort and insertion sort. It seems clear to me that when you call it, there is a mechanism that holds these values in memory and bundles them with their boolean identity (mask) determined by (...!) the lambda function. The order is determined by that boolean identity, but keep in mind that the resulting groups (the elements mapped to ones and the elements mapped to zeros) are not themselves sorted by their original values. Hence, the final list, while organized by odds and evens, is not sorted within each group (the evens in this case are out of order). The odds appear ordered only because they were already in order by coincidence in the original list. The takeaway from all this is that when the key does that transformation, the original order within each group is retained.
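One way to make this concrete is to compute the key values by hand. The sketch below only illustrates the idea (CPython's actual implementation differs, and the names `is_even`, `keys`, `pairs` and `by_hand` are made up for this example); it relies on the documented guarantee that Python's sort is stable.

```python
mylist = [3, 6, 3, 2, 4, 8, 23]
is_even = lambda x: x % 2 == 0

# The key function is called once per element
keys = [is_even(x) for x in mylist]

# Pair each key with its element, then order by the key only;
# ties keep their original relative order because the sort is stable
pairs = list(zip(keys, mylist))
ordered = sorted(pairs, key=lambda pair: pair[0])
by_hand = [value for _, value in ordered]

print(keys)
print(by_hand)
print(by_hand == sorted(mylist, key=is_even))
```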

So how does this all relate back to the original question, and more importantly, our intuition on how we should implement sorted() with its key argument and lambda?

That lambda function can be thought of as a pointer that points to the values we need to sort by, whether it's a pointer mapping a value to its boolean transformed by the lambda function, or a particular element in a nested list, tuple, dict, etc., again determined by the lambda function.

Let's try to predict what happens when I run the following code.

``````In [9]: mylist = [(3, 5, 8), (6, 2, 8), (2, 9, 4), (6, 8, 5)]
In[10]: sorted(mylist, key=lambda x: x[1])
``````

My `sorted` call obviously says, "Please sort this list". The key argument makes that a little more specific by saying, "for each element `x` in `mylist`, return the element at index 1 (the second element), then sort all of the elements of the original list `mylist` by the sorted order of the list calculated by the lambda function". Since we have a list of tuples, we can return an indexed element from each tuple using the lambda function.

The pointer that will be used to sort would be:

``````[5, 2, 9, 8] # the second element of each tuple
``````

Sorting this pointer list returns:

``````[2, 5, 8, 9]
``````

Applying this to `mylist`, we get:

``````Out[10]: [(6, 2, 8), (3, 5, 8), (6, 8, 5), (2, 9, 4)]
# Notice the sorted pointer list is the same as the second index of each tuple in this final list
``````

Run that code, and you'll find that this is the order. Try sorting a list of integers using this key function and you'll find that the code breaks (why? Because you cannot index an integer, of course).
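A short sketch of that failure, assuming a plain list of integers (the names `numbers` and `error_message` are only for illustration):

```python
numbers = [3, 6, 3, 2]

try:
    # x is an int here, so x[1] cannot be evaluated
    sorted(numbers, key=lambda x: x[1])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```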

This was a long winded explanation, but I hope this helps to `sort` your intuition on the use of `lambda` functions - as the key argument in sorted(), and beyond.

If you want to flatten a data structure where you don't know how deeply it's nested, you could use `iteration_utilities.deepflatten` [1]

``````>>> from iteration_utilities import deepflatten

>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(deepflatten(l, depth=1))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> l = [[1, 2, 3], [4, [5, 6]], 7, [8, 9]]
>>> list(deepflatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
``````

It's a generator, so you need to cast the result to a `list` or explicitly iterate over it.

To flatten only one level and if each of the items is itself iterable you can also use `iteration_utilities.flatten` which itself is just a thin wrapper around `itertools.chain.from_iterable`:

``````>>> from iteration_utilities import flatten
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(flatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
``````

Just to add some timings (based on Nico Schlömer's answer, which didn't include the function presented in this answer):

It's a log-log plot to accommodate the huge range of values spanned. For qualitative reasoning: lower is better.

The results show that if the iterable contains only a few inner iterables then `sum` will be fastest; however, for long iterables only `itertools.chain.from_iterable`, `iteration_utilities.deepflatten` or the nested comprehension have reasonable performance, with `itertools.chain.from_iterable` being the fastest (as already noticed by Nico Schlömer).

``````from itertools import chain
from functools import reduce
from collections.abc import Iterable  # `from collections import Iterable` fails on Python 3.10+
import operator
from iteration_utilities import deepflatten

def nested_list_comprehension(lsts):
    return [item for sublist in lsts for item in sublist]

def itertools_chain_from_iterable(lsts):
    return list(chain.from_iterable(lsts))

def pythons_sum(lsts):
    return sum(lsts, [])

def reduce_add(lsts):
    return reduce(lambda x, y: x + y, lsts)

def pylangs_flatten(lsts):
    return list(flatten(lsts))

def flatten(items):
    """Yield items from any nested iterable; see REF."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

def reduce_concat(lsts):
    return reduce(operator.concat, lsts)

def iteration_utilities_deepflatten(lsts):
    return list(deepflatten(lsts, depth=1))

from simple_benchmark import benchmark

b = benchmark(
    [nested_list_comprehension, itertools_chain_from_iterable, pythons_sum, reduce_add,
     pylangs_flatten, reduce_concat, iteration_utilities_deepflatten],
    arguments={2**i: [[0]*5]*(2**i) for i in range(1, 13)},
    argument_name="number of inner lists"
)

b.plot()
``````

[1] Disclaimer: I'm the author of that library.

Given a list of lists `t`,

``````flat_list = [item for sublist in t for item in sublist]
``````

which means:

``````flat_list = []
for sublist in t:
    for item in sublist:
        flat_list.append(item)
``````

is faster than the shortcuts posted so far. (`t` is the list to flatten.)

Here is the corresponding function:

``````def flatten(t):
    return [item for sublist in t for item in sublist]
``````

As evidence, you can use the `timeit` module in the standard library:

``````$ python -mtimeit -s"t=[[1,2,3],[4,5,6], [7], [8,9]]*99" "[item for sublist in t for item in sublist]"
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s"t=[[1,2,3],[4,5,6], [7], [8,9]]*99" "sum(t, [])"
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s"t=[[1,2,3],[4,5,6], [7], [8,9]]*99" "reduce(lambda x,y: x+y,t)"
1000 loops, best of 3: 1.1 msec per loop
``````

Explanation: the shortcuts based on `+` (including the implied use in `sum`) are, of necessity, `O(T**2)` when there are T sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have T sublists of k items each: the first k items are copied back and forth T-1 times, the second k items T-2 times, and so on; total number of copies is k times the sum of x for x from 1 to T excluded, i.e., `k * (T**2)/2`.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
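The copy counts in that argument can be tallied with a small model. This is only a sketch of the arithmetic above, not a measurement: the function names are hypothetical, and it counts only the items re-copied from earlier intermediate results, as the explanation does.

```python
def copies_with_plus(T, k):
    """Items re-copied from earlier intermediate results when joining with +."""
    # Before the i-th sublist is added (i = 0 .. T-1), the intermediate
    # result already holds i * k items, all of which are copied again.
    return sum(i * k for i in range(T))

def copies_with_comprehension(T, k):
    """Each of the T * k items is copied exactly once."""
    return T * k

print(copies_with_plus(100, 4))           # grows like k * T**2 / 2
print(copies_with_comprehension(100, 4))  # grows like k * T
```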

When you write `[x]*3` you get, essentially, the list `[x, x, x]`. That is, a list with 3 references to the same `x`. When you then modify this single `x` it is visible via all three references to it:

``````x = [1] * 4
l = [x] * 3
print(f"id(x): {id(x)}")
# id(x): 140560897920048
print(
    f"id(l[0]): {id(l[0])}\n"
    f"id(l[1]): {id(l[1])}\n"
    f"id(l[2]): {id(l[2])}"
)
# id(l[0]): 140560897920048
# id(l[1]): 140560897920048
# id(l[2]): 140560897920048

x[0] = 42
print(f"x: {x}")
# x: [42, 1, 1, 1]
print(f"l: {l}")
# l: [[42, 1, 1, 1], [42, 1, 1, 1], [42, 1, 1, 1]]
``````

To fix it, you need to make sure that you create a new list at each position. One way to do it is

``````[[1]*4 for _ in range(3)]
``````

which will reevaluate `[1]*4` each time instead of evaluating it once and making 3 references to 1 list.

You might wonder why `*` can't make independent objects the way the list comprehension does. That's because the multiplication operator `*` operates on objects, without seeing expressions. When you use `*` to multiply `[[1] * 4]` by 3, `*` only sees the 1-element list `[[1] * 4]` evaluates to, not the `[[1] * 4]` expression text. `*` has no idea how to make copies of that element, no idea how to reevaluate `[[1] * 4]`, and no idea you even want copies, and in general, there might not even be a way to copy the element.

The only option `*` has is to make new references to the existing sublist instead of trying to make new sublists. Anything else would be inconsistent or require major redesigning of fundamental language design decisions.

In contrast, a list comprehension reevaluates the element expression on every iteration. `[[1] * 4 for n in range(3)]` reevaluates `[1] * 4` every time for the same reason `[x**2 for x in range(3)]` reevaluates `x**2` every time. Every evaluation of `[1] * 4` generates a new list, so the list comprehension does what you wanted.

Incidentally, `[1] * 4` also doesn't copy the elements of `[1]`, but that doesn't matter, since integers are immutable. You can't do something like `1.value = 2` and turn a 1 into a 2.
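The difference can be checked with the `is` operator, which tests whether two names refer to the same object. A minimal sketch (the names `shared` and `independent` are only for illustration):

```python
shared = [[1] * 4] * 3                     # three references to one inner list
independent = [[1] * 4 for _ in range(3)]  # three separate inner lists

print(shared[0] is shared[1])
print(independent[0] is independent[1])

shared[0][0] = 5
independent[0][0] = 5
print(shared)       # the change shows up in every row
print(independent)  # only the first row changes
```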

``````buckets = [0] * 100
``````

Careful - this technique doesn't generalize to multidimensional arrays or lists of lists, which leads to the "List of lists changes reflected across sublists unexpectedly" problem.

If I understood the question correctly, you can use the slicing notation to keep everything except the last item:

``````record = record[:-1]
``````

But a better way is to delete the item directly:

``````del record[-1]
``````

Note 1: Using record = record[:-1] does not really remove the last element; it assigns a new, shorter list to record. This makes a difference if you run it inside a function and record is a parameter. With record = record[:-1] the original list (outside the function) is unchanged; with del record[-1] or record.pop() the list is changed. (as stated by @pltrdy in the comments)
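A small sketch of the behaviour described in Note 1; the two function names are hypothetical and exist only to show the contrast:

```python
def drop_last_by_slice(record):
    # Rebinds the local name only; the caller's list is untouched
    record = record[:-1]
    return record

def drop_last_in_place(record):
    # Mutates the very list object the caller passed in
    del record[-1]

a = [1, 2, 3]
b = [1, 2, 3]
sliced = drop_last_by_slice(a)
drop_last_in_place(b)

print(a)       # unchanged
print(sliced)  # new, shorter list
print(b)       # changed in place
```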

Note 2: The code could use some Python idioms. I highly recommend reading this:
Code Like a Pythonista: Idiomatic Python (via wayback machine archive).

Flatten the list to "remove the brackets" using a nested list comprehension. This will un-nest each list stored in your list of lists!

``````list_of_lists = [[180.0], [173.8], [164.2], [156.5], [147.2], [138.2]]
flattened = [val for sublist in list_of_lists for val in sublist]
``````

Nested list comprehensions evaluate in the same manner that they unwrap (i.e. add a newline and a tab for each new loop). So in this case:

``````flattened = [val for sublist in list_of_lists for val in sublist]
``````

is equivalent to:

``````flattened = []
for sublist in list_of_lists:
    for val in sublist:
        flattened.append(val)
``````

The big difference is that the list comprehension generally evaluates faster than the unraveled loop, since it eliminates the repeated `append` calls!

If you have multiple items in a sublist, the list comprehension will flatten that too, i.e.

``````>>> list_of_lists = [[180.0, 1, 2, 3], [173.8], [164.2], [156.5], [147.2], [138.2]]
>>> flattened  = [val for sublist in list_of_lists for val in sublist]
>>> flattened
[180.0, 1, 2, 3, 173.8, 164.2, 156.5, 147.2, 138.2]
``````

If you want:

``````c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
c3 = [[13, 32], [7, 13, 28], [1,6]]
``````

Then here is your solution for Python 2:

``````c3 = [filter(lambda x: x in c1, sublist) for sublist in c2]
``````

In Python 3, `filter` returns an iterator instead of a `list`, so you need to wrap `filter` calls with `list()`:

``````c3 = [list(filter(lambda x: x in c1, sublist)) for sublist in c2]
``````

Explanation:

The filter part takes each sublist's items and checks whether each one is in the source list c1. The list comprehension is executed for each sublist in c2.
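The same result can also be written without `filter`. The sketch below is a variation, not the solution given above: it uses a nested list comprehension and converts `c1` to a set first (the name `c1_set` is an assumption for illustration), so that each membership test does not have to scan the whole list.

```python
c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

c1_set = set(c1)  # set membership tests avoid scanning c1 for every item
c3 = [[x for x in sublist if x in c1_set] for sublist in c2]

print(c3)
```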