Replacing blank values (white space) with NaN in pandas

| | |

👻 Check our latest review to choose the best laptop for Machine Learning engineers and Deep learning tasks!

I want to find all values in a Pandas dataframe that contain whitespace (any arbitrary amount) and replace those values with NaNs.

Any ideas how this can be improved?

Basically I want to turn this:

                   A    B    C
2000-01-01 -0.532681  foo    0
2000-01-02  1.490752  bar    1
2000-01-03 -1.387326  foo    2
2000-01-04  0.814772  baz     
2000-01-05 -0.222552         4
2000-01-06 -1.176781  qux     

Into this:

                   A     B     C
2000-01-01 -0.532681   foo     0
2000-01-02  1.490752   bar     1
2000-01-03 -1.387326   foo     2
2000-01-04  0.814772   baz   NaN
2000-01-05 -0.222552   NaN     4
2000-01-06 -1.176781   qux   NaN

I"ve managed to do it with the code below, but man is it ugly. It"s not Pythonic and I"m sure it"s not the most efficient use of pandas either. I loop through each column and do boolean replacement against a column mask generated by applying a function that does a regex search of each value, matching on whitespace.

for i in df.columns:
    df[i][df[i].apply(lambda i: True if re.search("^s*$", str(i)) else False)]=None

It could be optimized a bit by only iterating through fields that could contain empty strings:

if df[i].dtype == np.dtype("object")

But that"s not much of an improvement

And finally, this code sets the target strings to None, which works with Pandas" functions like fillna(), but it would be nice for completeness if I could actually insert a NaN directly instead of None.

👻 Read also: what is the best laptop for engineering students?

Replacing blank values (white space) with NaN in pandas find: Questions

Finding the index of an item in a list

5 answers

Given a list ["foo", "bar", "baz"] and an item in the list "bar", how do I get its index (1) in Python?

3740

Answer #1

>>> ["foo", "bar", "baz"].index("bar")
1

Reference: Data Structures > More on Lists

Caveats follow

Note that while this is perhaps the cleanest way to answer the question as asked, index is a rather weak component of the list API, and I can"t remember the last time I used it in anger. It"s been pointed out to me in the comments that because this answer is heavily referenced, it should be made more complete. Some caveats about list.index follow. It is probably worth initially taking a look at the documentation for it:

list.index(x[, start[, end]])

Return zero-based index in the list of the first item whose value is equal to x. Raises a ValueError if there is no such item.

The optional arguments start and end are interpreted as in the slice notation and are used to limit the search to a particular subsequence of the list. The returned index is computed relative to the beginning of the full sequence rather than the start argument.

Linear time-complexity in list length

An index call checks every element of the list in order, until it finds a match. If your list is long, and you don"t know roughly where in the list it occurs, this search could become a bottleneck. In that case, you should consider a different data structure. Note that if you know roughly where to find the match, you can give index a hint. For instance, in this snippet, l.index(999_999, 999_990, 1_000_000) is roughly five orders of magnitude faster than straight l.index(999_999), because the former only has to search 10 entries, while the latter searches a million:

>>> import timeit
>>> timeit.timeit("l.index(999_999)", setup="l = list(range(0, 1_000_000))", number=1000)
9.356267921015387
>>> timeit.timeit("l.index(999_999, 999_990, 1_000_000)", setup="l = list(range(0, 1_000_000))", number=1000)
0.0004404920036904514
 

Only returns the index of the first match to its argument

A call to index searches through the list in order until it finds a match, and stops there. If you expect to need indices of more matches, you should use a list comprehension, or generator expression.

>>> [1, 1].index(1)
0
>>> [i for i, e in enumerate([1, 2, 1]) if e == 1]
[0, 2]
>>> g = (i for i, e in enumerate([1, 2, 1]) if e == 1)
>>> next(g)
0
>>> next(g)
2

Most places where I once would have used index, I now use a list comprehension or generator expression because they"re more generalizable. So if you"re considering reaching for index, take a look at these excellent Python features.

Throws if element not present in list

A call to index results in a ValueError if the item"s not present.

>>> [1, 1].index(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 2 is not in list

If the item might not be present in the list, you should either

  1. Check for it first with item in my_list (clean, readable approach), or
  2. Wrap the index call in a try/except block which catches ValueError (probably faster, at least when the list to search is long, and the item is usually present.)

3740

Answer #2

One thing that is really helpful in learning Python is to use the interactive help function:

>>> help(["foo", "bar", "baz"])
Help on list object:

class list(object)
 ...

 |
 |  index(...)
 |      L.index(value, [start, [stop]]) -> integer -- return first index of value
 |

which will often lead you to the method you are looking for.

3740

Answer #3

The majority of answers explain how to find a single index, but their methods do not return multiple indexes if the item is in the list multiple times. Use enumerate():

for i, j in enumerate(["foo", "bar", "baz"]):
    if j == "bar":
        print(i)

The index() function only returns the first occurrence, while enumerate() returns all occurrences.

As a list comprehension:

[i for i, j in enumerate(["foo", "bar", "baz"]) if j == "bar"]

Here"s also another small solution with itertools.count() (which is pretty much the same approach as enumerate):

from itertools import izip as zip, count # izip for maximum efficiency
[i for i, j in zip(count(), ["foo", "bar", "baz"]) if j == "bar"]

This is more efficient for larger lists than using enumerate():

$ python -m timeit -s "from itertools import izip as zip, count" "[i for i, j in zip(count(), ["foo", "bar", "baz"]*500) if j == "bar"]"
10000 loops, best of 3: 174 usec per loop
$ python -m timeit "[i for i, j in enumerate(["foo", "bar", "baz"]*500) if j == "bar"]"
10000 loops, best of 3: 196 usec per loop

How to insert newlines on argparse help text?

5 answers

I"m using argparse in Python 2.7 for parsing input options. One of my options is a multiple choice. I want to make a list in its help text, e.g.

from argparse import ArgumentParser

parser = ArgumentParser(description="test")

parser.add_argument("-g", choices=["a", "b", "g", "d", "e"], default="a",
    help="Some option, where
"
         " a = alpha
"
         " b = beta
"
         " g = gamma
"
         " d = delta
"
         " e = epsilon")

parser.parse_args()

However, argparse strips all newlines and consecutive spaces. The result looks like

~/Downloads:52$ python2.7 x.py -h
usage: x.py [-h] [-g {a,b,g,d,e}]

test

optional arguments:
  -h, --help      show this help message and exit
  -g {a,b,g,d,e}  Some option, where a = alpha b = beta g = gamma d = delta e
                  = epsilon

How to insert newlines in the help text?

406

Answer #1

Try using RawTextHelpFormatter:

from argparse import RawTextHelpFormatter
parser = ArgumentParser(description="test", formatter_class=RawTextHelpFormatter)

Is a Python list guaranteed to have its elements stay in the order they are inserted in?

5 answers

If I have the following Python code

>>> x = []
>>> x = x + [1]
>>> x = x + [2]
>>> x = x + [3]
>>> x
[1, 2, 3]

Will x be guaranteed to always be [1,2,3], or are other orderings of the interim elements possible?

366

Answer #1

Yes, the order of elements in a python list is persistent.

Inserting image into IPython notebook markdown

5 answers

I am starting to depend heavily on the IPython notebook app to develop and document algorithms. It is awesome; but there is something that seems like it should be possible, but I can"t figure out how to do it:

I would like to insert a local image into my (local) IPython notebook markdown to aid in documenting an algorithm. I know enough to add something like <img src="image.png"> to the markdown, but that is about as far as my knowledge goes. I assume I could put the image in the directory represented by 127.0.0.1:8888 (or some subdirectory) to be able to access it, but I can"t figure out where that directory is. (I"m working on a mac.) So, is it possible to do what I"m trying to do without too much trouble?

277

Answer #1

Most of the answers given so far go in the wrong direction, suggesting to load additional libraries and use the code instead of markup. In Ipython/Jupyter Notebooks it is very simple. Make sure the cell is indeed in markup and to display a image use:

![alt text](imagename.png "Title")

Further advantage compared to the other methods proposed is that you can display all common file formats including jpg, png, and gif (animations).

Shop

Learn programming in R: courses

$

Best Python online courses for 2022

$

Best laptop for Fortnite

$

Best laptop for Excel

$

Best laptop for Solidworks

$

Best laptop for Roblox

$

Best computer for crypto mining

$

Best laptop for Sims 4

$

Latest questions

NUMPYNUMPY

psycopg2: insert multiple rows with one query

12 answers

NUMPYNUMPY

How to convert Nonetype to int or string?

12 answers

NUMPYNUMPY

How to specify multiple return types using type-hints

12 answers

NUMPYNUMPY

Javascript Error: IPython is not defined in JupyterLab

12 answers

News


Wiki

Python OpenCV | cv2.putText () method

numpy.arctan2 () in Python

Python | os.path.realpath () method

Python OpenCV | cv2.circle () method

Python OpenCV cv2.cvtColor () method

Python - Move item to the end of the list

time.perf_counter () function in Python

Check if one list is a subset of another in Python

Python os.path.join () method