Detect and exclude outliers in Pandas data frame

| | | |

👻 Check our latest review to choose the best laptop for Machine Learning engineers and Deep learning tasks!

I have a pandas data frame with few columns.

Now I know that certain rows are outliers based on a certain column value.

For instance

column "Vol" has all values around 12xx and one value is 4000 (outlier).

Now I would like to exclude those rows that have Vol column like this.

So, essentially I need to put a filter on the data frame such that we select all rows where the values of a certain column are within, say, 3 standard deviations from mean.

What is an elegant way to achieve this?

👻 Read also: what is the best laptop for engineering students?

Detect and exclude outliers in Pandas data frame around: Questions

Removing white space around a saved image in matplotlib

2 answers

I need to take an image and save it after some process. The figure looks fine when I display it, but after saving the figure, I got some white space around the saved image. I have tried the "tight" option for savefig method, did not work either. The code:

  import matplotlib.image as mpimg
  import matplotlib.pyplot as plt

  fig = plt.figure(1)
  img = mpimg.imread(path)
  plt.imshow(img)
  ax=fig.add_subplot(1,1,1)

  extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
  plt.savefig("1.png", bbox_inches=extent)

  plt.axis("off") 
  plt.show()

I am trying to draw a basic graph by using NetworkX on a figure and save it. I realized that without a graph it works, but when added a graph I get white space around the saved image;

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import networkx as nx

G = nx.Graph()
G.add_node(1)
G.add_node(2)
G.add_node(3)
G.add_edge(1,3)
G.add_edge(1,2)
pos = {1:[100,120], 2:[200,300], 3:[50,75]}

fig = plt.figure(1)
img = mpimg.imread("image.jpg")
plt.imshow(img)
ax=fig.add_subplot(1,1,1)

nx.draw(G, pos=pos)

extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
plt.savefig("1.png", bbox_inches = extent)

plt.axis("off") 
plt.show()
228

Answer #1

You can remove the white space padding by setting bbox_inches="tight" in savefig:

plt.savefig("test.png",bbox_inches="tight")

You"ll have to put the argument to bbox_inches as a string, perhaps this is why it didn"t work earlier for you.


Possible duplicates:

Matplotlib plots: removing axis, legends and white spaces

How to set the margins for a matplotlib figure?

Reduce left and right margins in matplotlib plot

228

Answer #2

I cannot claim I know exactly why or how my “solution” works, but this is what I had to do when I wanted to plot the outline of a couple of aerofoil sections — without white margins — to a PDF file. (Note that I used matplotlib inside an IPython notebook, with the -pylab flag.)

plt.gca().set_axis_off()
plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, 
            hspace = 0, wspace = 0)
plt.margins(0,0)
plt.gca().xaxis.set_major_locator(plt.NullLocator())
plt.gca().yaxis.set_major_locator(plt.NullLocator())
plt.savefig("filename.pdf", bbox_inches = "tight",
    pad_inches = 0)

I have tried to deactivate different parts of this, but this always lead to a white margin somewhere. You may even have modify this to keep fat lines near the limits of the figure from being shaved by the lack of margins.

List comprehension vs. lambda + filter

5 answers

I happened to find myself having a basic filtering need: I have a list and I have to filter it by an attribute of the items.

My code looked like this:

my_list = [x for x in my_list if x.attribute == value]

But then I thought, wouldn"t it be better to write it like this?

my_list = filter(lambda x: x.attribute == value, my_list)

It"s more readable, and if needed for performance the lambda could be taken out to gain something.

Question is: are there any caveats in using the second way? Any performance difference? Am I missing the Pythonic Way‚Ñ¢ entirely and should do it in yet another way (such as using itemgetter instead of the lambda)?

957

Answer #1

It is strange how much beauty varies for different people. I find the list comprehension much clearer than filter+lambda, but use whichever you find easier.

There are two things that may slow down your use of filter.

The first is the function call overhead: as soon as you use a Python function (whether created by def or lambda) it is likely that filter will be slower than the list comprehension. It almost certainly is not enough to matter, and you shouldn"t think much about performance until you"ve timed your code and found it to be a bottleneck, but the difference will be there.

The other overhead that might apply is that the lambda is being forced to access a scoped variable (value). That is slower than accessing a local variable and in Python 2.x the list comprehension only accesses local variables. If you are using Python 3.x the list comprehension runs in a separate function so it will also be accessing value through a closure and this difference won"t apply.

The other option to consider is to use a generator instead of a list comprehension:

def filterbyvalue(seq, value):
   for el in seq:
       if el.attribute==value: yield el

Then in your main code (which is where readability really matters) you"ve replaced both list comprehension and filter with a hopefully meaningful function name.

957

Answer #2

This is a somewhat religious issue in Python. Even though Guido considered removing map, filter and reduce from Python 3, there was enough of a backlash that in the end only reduce was moved from built-ins to functools.reduce.

Personally I find list comprehensions easier to read. It is more explicit what is happening from the expression [i for i in list if i.attribute == value] as all the behaviour is on the surface not inside the filter function.

I would not worry too much about the performance difference between the two approaches as it is marginal. I would really only optimise this if it proved to be the bottleneck in your application which is unlikely.

Also since the BDFL wanted filter gone from the language then surely that automatically makes list comprehensions more Pythonic ;-)

How do I do a not equal in Django queryset filtering?

5 answers

MikeN By MikeN

In Django model QuerySets, I see that there is a __gt and __lt for comparative values, but is there a __ne or != (not equals)? I want to filter out using a not equals. For example, for

Model:
    bool a;
    int x;

I want to do

results = Model.objects.exclude(a=True, x!=5)

The != is not correct syntax. I also tried __ne.

I ended up using:

results = Model.objects.exclude(a=True, x__lt=5).exclude(a=True, x__gt=5)
784

Answer #1

You can use Q objects for this. They can be negated with the ~ operator and combined much like normal Python expressions:

from myapp.models import Entry
from django.db.models import Q

Entry.objects.filter(~Q(id=3))

will return all entries except the one(s) with 3 as their ID:

[<Entry: Entry object>, <Entry: Entry object>, <Entry: Entry object>, ...]

We hope this article has helped you to resolve the problem. Apart from Detect and exclude outliers in Pandas data frame, check other around-related topics.

Want to excel in Python? See our review of the best Python online courses 2022. If you are interested in Data Science, check also how to learn programming in R.

By the way, this material is also available in other languages:



Marie OConnell

Singapore | 2022-12-07

Maybe there are another answers? What Detect and exclude outliers in Pandas data frame exactly means?. I just hope that will not emerge anymore

Boris Sikorski

New York | 2022-12-07

I was preparing for my coding interview, thanks for clarifying this - Detect and exclude outliers in Pandas data frame in Python is not the simplest one. Will use it in my bachelor thesis

Xu Schteiner

Massachussetts | 2022-12-07

Thanks for explaining! I was stuck with Detect and exclude outliers in Pandas data frame for some hours, finally got it done 🤗. Will get back tomorrow with feedback

Shop

Learn programming in R: courses

$

Best Python online courses for 2022

$

Best laptop for Fortnite

$

Best laptop for Excel

$

Best laptop for Solidworks

$

Best laptop for Roblox

$

Best computer for crypto mining

$

Best laptop for Sims 4

$

Latest questions

NUMPYNUMPY

Common xlabel/ylabel for matplotlib subplots

12 answers

NUMPYNUMPY

How to specify multiple return types using type-hints

12 answers

NUMPYNUMPY

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

12 answers

NUMPYNUMPY

Flake8: Ignore specific warning for entire file

12 answers

NUMPYNUMPY

glob exclude pattern

12 answers

NUMPYNUMPY

How to avoid HTTP error 429 (Too Many Requests) python

12 answers

NUMPYNUMPY

Python CSV error: line contains NULL byte

12 answers

NUMPYNUMPY

csv.Error: iterator should return strings, not bytes

12 answers

News


Wiki

Python | How to copy data from one Excel sheet to another

Common xlabel/ylabel for matplotlib subplots

Check if one list is a subset of another in Python

sin

How to specify multiple return types using type-hints

exp

Printing words vertically in Python

exp

Python Extract words from a given string

Cyclic redundancy check in Python

Finding mean, median, mode in Python without libraries

cos

Python add suffix / add prefix to strings in a list

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

Python - Move item to the end of the list

Python - Print list vertically