Without using groupby, how would I filter out data without NaN?
Let's say I have a matrix where customers will fill in "N/A", "n/a" or any of its variations, and others leave it blank:
import pandas as pd
import numpy as np
df = pd.DataFrame({"movie": ["thg", "thg", "mol", "mol", "lob", "lob"],
                   "rating": [3., 4., 5., np.nan, np.nan, np.nan],
                   "name": ["John", np.nan, "N/A", "Graham", np.nan, np.nan]})
nbs = df["name"].str.extract("^(N/A|NA|na|n/a)")
nms=df[(df["name"] != nbs) ]
output:
>>> nms
  movie    name  rating
0   thg    John       3
1   thg     NaN       4
3   mol  Graham     NaN
4   lob     NaN     NaN
5   lob     NaN     NaN
How would I filter out NaN values so I can get results to work with, like this:
  movie    name  rating
0   thg    John       3
3   mol  Graham     NaN
I am guessing I need something like ~np.isnan, but the tilde does not work with strings.
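The usual pandas approach (a minimal sketch, not taken from the original thread, and assuming the same df as above) is to normalise the textual placeholders to NaN with replace() and then keep only the rows where name is present, using notna() or dropna():

import numpy as np
import pandas as pd

df = pd.DataFrame({"movie": ["thg", "thg", "mol", "mol", "lob", "lob"],
                   "rating": [3., 4., 5., np.nan, np.nan, np.nan],
                   "name": ["John", np.nan, "N/A", "Graham", np.nan, np.nan]})

# Turn the "N/A"-style placeholder strings into real NaN values first.
cleaned = df.replace(["N/A", "NA", "na", "n/a"], np.nan)

# Keep only the rows where name is present.
nms = cleaned[cleaned["name"].notna()]
# Equivalent: nms = cleaned.dropna(subset=["name"])

Either form leaves only rows 0 and 3, matching the desired output above.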
Python pandas: Filtering out NaN from a data selection of a column of strings (related questions)
List comprehension vs. lambda + filter
5 answers
I happened to find myself having a basic filtering need: I have a list and I have to filter it by an attribute of the items.
My code looked like this:
my_list = [x for x in my_list if x.attribute == value]
But then I thought, wouldn't it be better to write it like this?
my_list = filter(lambda x: x.attribute == value, my_list)
It"s more readable, and if needed for performance the lambda could be taken out to gain something.
Question is: are there any caveats in using the second way? Any performance difference? Am I missing the Pythonic Way™ entirely and should do it in yet another way (such as using itemgetter instead of the lambda)?
Answer #1
It is strange how much beauty varies for different people. I find the list comprehension much clearer than filter + lambda, but use whichever you find easier.
There are two things that may slow down your use of filter.
The first is the function call overhead: as soon as you use a Python function (whether created by def or lambda) it is likely that filter will be slower than the list comprehension. It almost certainly is not enough to matter, and you shouldn't think much about performance until you've timed your code and found it to be a bottleneck, but the difference will be there.
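If you do want to time it, here is a quick sketch with timeit; the Item class, the attribute and the value below are made-up placeholders standing in for your own objects:

import timeit

setup = """
class Item:
    def __init__(self, attribute):
        self.attribute = attribute

my_list = [Item(i % 3) for i in range(1000)]
value = 1
"""

# Compare the list comprehension against filter + lambda on the same data.
print(timeit.timeit("[x for x in my_list if x.attribute == value]",
                    setup=setup, number=1000))
print(timeit.timeit("list(filter(lambda x: x.attribute == value, my_list))",
                    setup=setup, number=1000))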
The other overhead that might apply is that the lambda is being forced to access a scoped variable (value). That is slower than accessing a local variable, and in Python 2.x the list comprehension only accesses local variables. If you are using Python 3.x the list comprehension runs in a separate function, so it will also be accessing value through a closure and this difference won't apply.
The other option to consider is to use a generator instead of a list comprehension:
def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el
Then in your main code (which is where readability really matters) you've replaced both list comprehension and filter with a hopefully meaningful function name.
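For example, reusing the hypothetical my_list and value from the question:

# Build a list eagerly...
interesting = list(filterbyvalue(my_list, value))

# ...or iterate lazily, without materialising a list at all.
for item in filterbyvalue(my_list, value):
    print(item.attribute)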
Answer #2
This is a somewhat religious issue in Python. Even though Guido considered removing map, filter and reduce from Python 3, there was enough of a backlash that in the end only reduce was moved from built-ins to functools.reduce.
Personally I find list comprehensions easier to read. It is more explicit what is happening from the expression [i for i in list if i.attribute == value], as all the behaviour is on the surface, not inside the filter function.
I would not worry too much about the performance difference between the two approaches, as it is marginal. I would really only optimise this if it proved to be the bottleneck in your application, which is unlikely.
Also, since the BDFL wanted filter gone from the language, surely that automatically makes list comprehensions more Pythonic ;-)
How do I do a not equal in Django queryset filtering?
5 answers
In Django model QuerySets, I see that there is a __gt and __lt for comparative values, but is there a __ne or != (not equals)? I want to filter out using a not equals. For example, for
Model:
bool a;
int x;
I want to do
results = Model.objects.exclude(a=True, x!=5)
The != is not correct syntax. I also tried __ne.
I ended up using:
results = Model.objects.exclude(a=True, x__lt=5).exclude(a=True, x__gt=5)
Answer #1
You can use Q objects for this. They can be negated with the ~ operator and combined much like normal Python expressions:
from myapp.models import Entry
from django.db.models import Q
Entry.objects.filter(~Q(id=3))
will return all entries except the one(s) with 3 as their ID:
[<Entry: Entry object>, <Entry: Entry object>, <Entry: Entry object>, ...]
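Applied to the model from the question (a sketch using the a and x fields from the example, with Model standing in for the real model class), the asker's "exclude rows where a is True and x != 5" becomes a single negated Q:

from django.db.models import Q

# Exclude rows where a is True and x is not equal to 5.
results = Model.objects.exclude(Q(a=True) & ~Q(x=5))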
InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately
3 answers
I tried to perform a REST GET through Python requests with the following code, and got an error.
Code snip:
import requests
header = {"Authorization": "Bearer..."}
url = az_base_url + az_subscription_id + "/resourcegroups/Default-Networking/resources?" + az_api_version
r = requests.get(url, headers=header)
Error:
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:79:
InsecurePlatformWarning: A true SSLContext object is not available.
This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail.
For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
My Python version is 2.7.3. I tried to install urllib3 and requests[security] as some other threads suggest, but I still got the same error.
Wonder if anyone can provide some tips?
Answer #1
The docs give a fair indicator of what's required; however, requests allows us to skip a few steps:
You only need to install the security package extras (thanks @admdrew for pointing it out):
$ pip install requests[security]
or, install them directly:
$ pip install pyopenssl ndg-httpsclient pyasn1
Requests will then automatically inject pyopenssl into urllib3.
If you're on Ubuntu, you may run into trouble installing pyopenssl; you'll need these dependencies:
$ apt-get install libffi-dev libssl-dev
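On older requests versions that do not perform the injection automatically, it can also be done by hand; this is a sketch, assuming pyopenssl and the packages above are installed:

# Swap urllib3's SSL backend for pyOpenSSL manually.
import urllib3.contrib.pyopenssl
urllib3.contrib.pyopenssl.inject_into_urllib3()

import requests
r = requests.get("https://example.com")  # should no longer emit InsecurePlatformWarning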
Dynamic instantiation from string name of a class in dynamically imported module?
3 answers
In Python, I have to instantiate a certain class, knowing its name as a string, but this class "lives" in a dynamically imported module. An example follows:
loader-class script:
import sys

class loader:
    def __init__(self, module_name, class_name):  # both args are strings
        try:
            __import__(module_name)
            modul = sys.modules[module_name]
            instance = modul.class_name()  # obviously this doesn't work, here is my main problem!
        except ImportError:
            # manage import error
            pass
some-dynamically-loaded-module script:
class myName:
    pass  # etc...
I use this arrangement so that any dynamically loaded module can be used by the loader class, following certain predefined behaviours in the dynamically loaded modules...
Answer #1
You can use getattr(module, class_name) to access the class. More complete code:
module = __import__(module_name)
class_ = getattr(module, class_name)
instance = class_()
As mentioned below, we may use importlib
import importlib
module = importlib.import_module(module_name)
class_ = getattr(module, class_name)
instance = class_()
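Putting that back into the asker's loader class, a minimal sketch with the error handling kept deliberately small:

import importlib

class loader:
    def __init__(self, module_name, class_name):  # both args are strings
        try:
            module = importlib.import_module(module_name)
            self.instance = getattr(module, class_name)()
        except (ImportError, AttributeError):
            # manage import / lookup error
            raise

# e.g. loader("mymodule", "myName").instance   (the module name here is hypothetical)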
How to get all of the immediate subdirectories in Python
3 answers
I"m trying to write a simple Python script that will copy a index.tpl to index.html in all of the subdirectories (with a few exceptions).
I"m getting bogged down by trying to get the list of subdirectories.
Answer #1
import os

def get_immediate_subdirectories(a_dir):
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]
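To tie it back to the original task, copying index.tpl to index.html in each immediate subdirectory, one possible sketch (skip_dirs is a hypothetical set of exceptions):

import os
import shutil

skip_dirs = {"static", "media"}  # hypothetical directories to skip

for sub in get_immediate_subdirectories("."):
    if sub in skip_dirs:
        continue
    src = os.path.join(".", sub, "index.tpl")
    if os.path.isfile(src):
        shutil.copy(src, os.path.join(".", sub, "index.html"))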