👻 See our latest reviews to choose the best laptop for Machine Learning and Deep learning tasks!
I happened to find myself having a basic filtering need: I have a list and I have to filter it by an attribute of the items.
My code looked like this:
my_list = [x for x in my_list if x.attribute == value]
But then I thought, wouldn"t it be better to write it like this?
my_list = filter(lambda x: x.attribute == value, my_list)
It"s more readable, and if needed for performance the lambda could be taken out to gain something.
Question is: are there any caveats in using the second way? Any performance difference? Am I missing the Pythonic Way‚Ñ¢ entirely and should do it in yet another way (such as using itemgetter instead of the lambda)?
👻 Read also: what is the best laptop for engineering students in 2022?
List comprehension vs. lambda + filter filter: Questions
List comprehension vs. lambda + filter
5 answers
I happened to find myself having a basic filtering need: I have a list and I have to filter it by an attribute of the items.
My code looked like this:
my_list = [x for x in my_list if x.attribute == value]
But then I thought, wouldn"t it be better to write it like this?
my_list = filter(lambda x: x.attribute == value, my_list)
It"s more readable, and if needed for performance the lambda could be taken out to gain something.
Question is: are there any caveats in using the second way? Any performance difference? Am I missing the Pythonic Way‚Ñ¢ entirely and should do it in yet another way (such as using itemgetter instead of the lambda)?
Answer #1
It is strange how much beauty varies for different people. I find the list comprehension much clearer than filter
+lambda
, but use whichever you find easier.
There are two things that may slow down your use of filter
.
The first is the function call overhead: as soon as you use a Python function (whether created by def
or lambda
) it is likely that filter will be slower than the list comprehension. It almost certainly is not enough to matter, and you shouldn"t think much about performance until you"ve timed your code and found it to be a bottleneck, but the difference will be there.
The other overhead that might apply is that the lambda is being forced to access a scoped variable (value
). That is slower than accessing a local variable and in Python 2.x the list comprehension only accesses local variables. If you are using Python 3.x the list comprehension runs in a separate function so it will also be accessing value
through a closure and this difference won"t apply.
The other option to consider is to use a generator instead of a list comprehension:
def filterbyvalue(seq, value):
for el in seq:
if el.attribute==value: yield el
Then in your main code (which is where readability really matters) you"ve replaced both list comprehension and filter with a hopefully meaningful function name.
Answer #2
This is a somewhat religious issue in Python. Even though Guido considered removing map
, filter
and reduce
from Python 3, there was enough of a backlash that in the end only reduce
was moved from built-ins to functools.reduce.
Personally I find list comprehensions easier to read. It is more explicit what is happening from the expression [i for i in list if i.attribute == value]
as all the behaviour is on the surface not inside the filter function.
I would not worry too much about the performance difference between the two approaches as it is marginal. I would really only optimise this if it proved to be the bottleneck in your application which is unlikely.
Also since the BDFL wanted filter
gone from the language then surely that automatically makes list comprehensions more Pythonic ;-)
List comprehension vs. lambda + filter filter: Questions
How do I do a not equal in Django queryset filtering?
5 answers
In Django model QuerySets, I see that there is a __gt
and __lt
for comparative values, but is there a __ne
or !=
(not equals)? I want to filter out using a not equals. For example, for
Model:
bool a;
int x;
I want to do
results = Model.objects.exclude(a=True, x!=5)
The !=
is not correct syntax. I also tried __ne
.
I ended up using:
results = Model.objects.exclude(a=True, x__lt=5).exclude(a=True, x__gt=5)
Answer #1
You can use Q objects for this. They can be negated with the ~
operator and combined much like normal Python expressions:
from myapp.models import Entry
from django.db.models import Q
Entry.objects.filter(~Q(id=3))
will return all entries except the one(s) with 3
as their ID:
[<Entry: Entry object>, <Entry: Entry object>, <Entry: Entry object>, ...]
Finding the index of an item in a list
5 answers
Given a list ["foo", "bar", "baz"]
and an item in the list "bar"
, how do I get its index (1
) in Python?
Answer #1
>>> ["foo", "bar", "baz"].index("bar")
1
Reference: Data Structures > More on Lists
Caveats follow
Note that while this is perhaps the cleanest way to answer the question as asked, index
is a rather weak component of the list
API, and I can"t remember the last time I used it in anger. It"s been pointed out to me in the comments that because this answer is heavily referenced, it should be made more complete. Some caveats about list.index
follow. It is probably worth initially taking a look at the documentation for it:
list.index(x[, start[, end]])
Return zero-based index in the list of the first item whose value is equal to x. Raises a
ValueError
if there is no such item.The optional arguments start and end are interpreted as in the slice notation and are used to limit the search to a particular subsequence of the list. The returned index is computed relative to the beginning of the full sequence rather than the start argument.
Linear time-complexity in list length
An index
call checks every element of the list in order, until it finds a match. If your list is long, and you don"t know roughly where in the list it occurs, this search could become a bottleneck. In that case, you should consider a different data structure. Note that if you know roughly where to find the match, you can give index
a hint. For instance, in this snippet, l.index(999_999, 999_990, 1_000_000)
is roughly five orders of magnitude faster than straight l.index(999_999)
, because the former only has to search 10 entries, while the latter searches a million:
>>> import timeit
>>> timeit.timeit("l.index(999_999)", setup="l = list(range(0, 1_000_000))", number=1000)
9.356267921015387
>>> timeit.timeit("l.index(999_999, 999_990, 1_000_000)", setup="l = list(range(0, 1_000_000))", number=1000)
0.0004404920036904514
Only returns the index of the first match to its argument
A call to index
searches through the list in order until it finds a match, and stops there. If you expect to need indices of more matches, you should use a list comprehension, or generator expression.
>>> [1, 1].index(1)
0
>>> [i for i, e in enumerate([1, 2, 1]) if e == 1]
[0, 2]
>>> g = (i for i, e in enumerate([1, 2, 1]) if e == 1)
>>> next(g)
0
>>> next(g)
2
Most places where I once would have used index
, I now use a list comprehension or generator expression because they"re more generalizable. So if you"re considering reaching for index
, take a look at these excellent Python features.
Throws if element not present in list
A call to index
results in a ValueError
if the item"s not present.
>>> [1, 1].index(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 2 is not in list
If the item might not be present in the list, you should either
- Check for it first with
item in my_list
(clean, readable approach), or - Wrap the
index
call in atry/except
block which catchesValueError
(probably faster, at least when the list to search is long, and the item is usually present.)
Answer #2
One thing that is really helpful in learning Python is to use the interactive help function:
>>> help(["foo", "bar", "baz"])
Help on list object:
class list(object)
...
|
| index(...)
| L.index(value, [start, [stop]]) -> integer -- return first index of value
|
which will often lead you to the method you are looking for.
Answer #3
The majority of answers explain how to find a single index, but their methods do not return multiple indexes if the item is in the list multiple times. Use enumerate()
:
for i, j in enumerate(["foo", "bar", "baz"]):
if j == "bar":
print(i)
The index()
function only returns the first occurrence, while enumerate()
returns all occurrences.
As a list comprehension:
[i for i, j in enumerate(["foo", "bar", "baz"]) if j == "bar"]
Here"s also another small solution with itertools.count()
(which is pretty much the same approach as enumerate):
from itertools import izip as zip, count # izip for maximum efficiency
[i for i, j in zip(count(), ["foo", "bar", "baz"]) if j == "bar"]
This is more efficient for larger lists than using enumerate()
:
$ python -m timeit -s "from itertools import izip as zip, count" "[i for i, j in zip(count(), ["foo", "bar", "baz"]*500) if j == "bar"]"
10000 loops, best of 3: 174 usec per loop
$ python -m timeit "[i for i, j in enumerate(["foo", "bar", "baz"]*500) if j == "bar"]"
10000 loops, best of 3: 196 usec per loop