How do I remove whitespace from the end of a string in Python?


I need to remove the whitespace after the word in the string. Can this be done in one line of code?

Example:

string = "    xyz     "

desired result : "    xyz" 

Answer rating: 204

>>> "    xyz     ".rstrip()
"    xyz"

There is more about rstrip in the documentation.
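
As a small illustration of the answer above (the variable names are only for this sketch), rstrip() touches the end of the string only, while lstrip() and strip() cover the other cases:

```python
s = "    xyz     "

# rstrip() removes trailing whitespace only; leading spaces are kept.
trailing_removed = s.rstrip()

# lstrip() removes leading whitespace only; strip() removes both ends.
leading_removed = s.lstrip()
both_removed = s.strip()

# An optional argument restricts which characters are stripped.
only_newline_removed = "xyz \n".rstrip("\n")

print(repr(trailing_removed))      # '    xyz'
print(repr(leading_removed))       # 'xyz     '
print(repr(both_removed))          # 'xyz'
print(repr(only_newline_removed))  # 'xyz '
```

Strings are immutable, so each call returns a new string and leaves s unchanged.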





How do I remove whitespace from the end of a string in Python?: StackOverflow Questions

How do I trim whitespace from a string?

Question by robert

How do I remove leading and trailing whitespace from a string in Python?

For example:

" Hello " --> "Hello"
" Hello"  --> "Hello"
"Hello "  --> "Hello"
"Bob has a cat" --> "Bob has a cat"

How do I trim whitespace?

Is there a Python function that will trim whitespace (spaces and tabs) from a string?

Example: example string → example string

Remove all whitespace in a string

I want to eliminate all the whitespace from a string, on both ends, and in between words.

I have this Python code:

def my_handle(self):
    sentence = " hello  apple  "
    sentence.strip()

But that only eliminates the whitespace on both sides of the string. How do I remove all whitespace?

Split string on whitespace in Python

I'm looking for the Python equivalent of

String str = "many   fancy word \nhello    \thi";
String whiteSpaceRegex = "\\s";
String[] words = str.split(whiteSpaceRegex);

["many", "fancy", "word", "hello", "hi"]

Split by comma and strip whitespace in Python

I have some Python code that splits on comma, but doesn't strip the whitespace:

>>> string = "blah, lots  ,  of ,  spaces, here "
>>> mylist = string.split(",")
>>> print mylist
["blah", " lots  ", "  of ", "  spaces", " here "]

I would rather end up with whitespace removed like this:

["blah", "lots", "of", "spaces", "here"]

I am aware that I could loop through the list and strip() each item but, as this is Python, I'm guessing there's a quicker, easier and more elegant way of doing it.

Substitute multiple whitespace with single whitespace in Python

I have this string:

mystring = "Here is  some   text   I      wrote   "

How can I substitute the double, triple (...) whitespace characters with a single space, so that I get:

mystring = "Here is some text I wrote"

Check if string contains only whitespace

How can I test if a string contains only whitespace?

Example strings:

  • "   " (space, space, space)

  • " \t \n " (space, tab, space, newline, space)

  • "\n\n\n\t\n" (newline, newline, newline, tab, newline)

How do I replace whitespaces with underscore?

I want to replace whitespace with underscore in a string to create nice URLs. So that for example:

"This should be connected" becomes "This_should_be_connected" 

I am using Python with Django. Can this be solved using regular expressions?

How to strip all whitespace from string

How do I strip all the spaces in a python string? For example, I want a string like strip my spaces to be turned into stripmyspaces, but I cannot seem to accomplish that with strip():

>>> "strip my spaces".strip()
"strip my spaces"

How do I remove leading whitespace in Python?

I have a text string that starts with a number of spaces, varying between 2 & 4.

What is the simplest way to remove the leading whitespace? (ie. remove everything before a certain character?)

"  Example"   -> "Example"
"  Example  " -> "Example  "
"    Example" -> "Example"

Answer #1

TL;DR version:

For the simple case of:

  • I have a text column with a delimiter and I want two columns

The simplest solution is:

df[["A", "B"]] = df["AB"].str.split(" ", n=1, expand=True)

You must use expand=True if your strings have a non-uniform number of splits and you want None to replace the missing values.

Notice how, in either case, the .tolist() method is not necessary. Neither is zip().

In detail:

Andy Hayden's solution is most excellent in demonstrating the power of the str.extract() method.

But for a simple split over a known separator (like, splitting by dashes, or splitting by whitespace), the .str.split() method is enough [1]. It operates on a column (Series) of strings, and returns a column (Series) of lists:

>>> import pandas as pd
>>> df = pd.DataFrame({"AB": ["A1-B1", "A2-B2"]})
>>> df

      AB
0  A1-B1
1  A2-B2
>>> df["AB_split"] = df["AB"].str.split("-")
>>> df

      AB  AB_split
0  A1-B1  [A1, B1]
1  A2-B2  [A2, B2]

[1] If you're unsure what the first two parameters of .str.split() do, I recommend the docs for the plain Python version of the method.
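
For reference, here is a short sketch of the plain Python str.split() that the footnote points to (the sample strings are made up for this illustration). The first parameter is the separator, the second is the maximum number of splits:

```python
record = "A3-B3-C3"

# sep: the delimiter to split on.
all_parts = record.split("-")
print(all_parts)   # ['A3', 'B3', 'C3']

# maxsplit: at most this many splits, so at most maxsplit + 1 pieces.
two_parts = record.split("-", 1)
print(two_parts)   # ['A3', 'B3-C3']

# With no sep, runs of whitespace count as one separator and
# leading/trailing whitespace is ignored.
words = "  many   fancy word ".split()
print(words)       # ['many', 'fancy', 'word']
```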

But how do you go from:

  • a column containing two-element lists

to:

  • two columns, each containing the respective element of the lists?

Well, we need to take a closer look at the .str attribute of a column.

It's a magical object that is used to collect methods that treat each element in a column as a string, and then apply the respective method to each element as efficiently as possible:

>>> upper_lower_df = pd.DataFrame({"U": ["A", "B", "C"]})
>>> upper_lower_df

   U
0  A
1  B
2  C
>>> upper_lower_df["L"] = upper_lower_df["U"].str.lower()
>>> upper_lower_df

   U  L
0  A  a
1  B  b
2  C  c

But it also has an "indexing" interface for getting each element of a string by its index:

>>> df["AB"].str[0]

0    A
1    A
Name: AB, dtype: object

>>> df["AB"].str[1]

0    1
1    2
Name: AB, dtype: object

Of course, this indexing interface of .str doesn't really care if each element it's indexing is actually a string, as long as it can be indexed, so:

>>> df["AB"].str.split("-", n=1).str[0]

0    A1
1    A2
Name: AB, dtype: object

>>> df["AB"].str.split("-", n=1).str[1]

0    B1
1    B2
Name: AB, dtype: object

Then, it's a simple matter of taking advantage of the Python tuple unpacking of iterables to do

>>> df["A"], df["B"] = df["AB"].str.split("-", n=1).str
>>> df

      AB  AB_split   A   B
0  A1-B1  [A1, B1]  A1  B1
1  A2-B2  [A2, B2]  A2  B2

Of course, getting a DataFrame out of splitting a column of strings is so useful that the .str.split() method can do it for you with the expand=True parameter:

>>> df["AB"].str.split("-", n=1, expand=True)

    0   1
0  A1  B1
1  A2  B2

So, another way of accomplishing what we wanted is to do:

>>> df = df[["AB"]]
>>> df

      AB
0  A1-B1
1  A2-B2

>>> df.join(df["AB"].str.split("-", n=1, expand=True).rename(columns={0:"A", 1:"B"}))

      AB   A   B
0  A1-B1  A1  B1
1  A2-B2  A2  B2

The expand=True version, although longer, has a distinct advantage over the tuple unpacking method. Tuple unpacking doesn't deal well with splits of different lengths:

>>> df = pd.DataFrame({"AB": ["A1-B1", "A2-B2", "A3-B3-C3"]})
>>> df
         AB
0     A1-B1
1     A2-B2
2  A3-B3-C3
>>> df["A"], df["B"], df["C"] = df["AB"].str.split("-")
Traceback (most recent call last):
  [...]    
ValueError: Length of values does not match length of index
>>> 

But expand=True handles it nicely by placing None in the columns for which there aren't enough "splits":

>>> df.join(
...     df["AB"].str.split("-", expand=True).rename(
...         columns={0:"A", 1:"B", 2:"C"}
...     )
... )
         AB   A   B     C
0     A1-B1  A1  B1  None
1     A2-B2  A2  B2  None
2  A3-B3-C3  A3  B3    C3

Answer #2

To somewhat expand on the earlier answers here, there are a number of details which are commonly overlooked.

  • Prefer subprocess.run() over subprocess.check_call() and friends over subprocess.call() over subprocess.Popen() over os.system() over os.popen()
  • Understand and probably use text=True, aka universal_newlines=True.
  • Understand the meaning of shell=True or shell=False and how it changes quoting and the availability of shell conveniences.
  • Understand differences between sh and Bash
  • Understand how a subprocess is separate from its parent, and generally cannot change the parent.
  • Avoid running the Python interpreter as a subprocess of Python.

These topics are covered in some more detail below.

Prefer subprocess.run() or subprocess.check_call()

The subprocess.Popen() function is a low-level workhorse but it is tricky to use correctly and you end up copy/pasting multiple lines of code ... which conveniently already exist in the standard library as a set of higher-level wrapper functions for various purposes, which are presented in more detail in the following.

Here's a paragraph from the documentation:

The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.

Unfortunately, the availability of these wrapper functions differs between Python versions.

  • subprocess.run() was officially introduced in Python 3.5. It is meant to replace all of the following.
  • subprocess.check_output() was introduced in Python 2.7 / 3.1. It is basically equivalent to subprocess.run(..., check=True, stdout=subprocess.PIPE).stdout
  • subprocess.check_call() was introduced in Python 2.5. It is basically equivalent to subprocess.run(..., check=True)
  • subprocess.call() was introduced in Python 2.4 in the original subprocess module (PEP-324). It is basically equivalent to subprocess.run(...).returncode

High-level API vs subprocess.Popen()

The refactored and extended subprocess.run() is more logical and more versatile than the older legacy functions it replaces. It returns a CompletedProcess object whose attributes let you retrieve the exit status, the standard output, and a few other results and status indicators from the finished subprocess.

subprocess.run() is the way to go if you simply need a program to run and return control to Python. For more involved scenarios (background processes, perhaps with interactive I/O with the Python parent program) you still need to use subprocess.Popen() and take care of all the plumbing yourself. This requires a fairly intricate understanding of all the moving parts and should not be undertaken lightly. The simpler Popen object represents the (possibly still-running) process which needs to be managed from your code for the remainder of the lifetime of the subprocess.

It should perhaps be emphasized that subprocess.Popen() by itself merely creates a process. If you leave it at that, you have a subprocess running concurrently alongside Python, that is, a "background" process. If it doesn't need to do input or output or otherwise coordinate with you, it can do useful work in parallel with your Python program.
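
A minimal sketch of that background pattern follows. The child command is a stand-in chosen only so the example runs anywhere (it starts another Python interpreter through sys.executable); any long-running external program would take its place in real code.

```python
import subprocess
import sys

# Popen() returns immediately; the child runs concurrently with us.
child = subprocess.Popen(
    [sys.executable, "-c", "print('child done')"],
    stdout=subprocess.PIPE,
    text=True,
)

# The parent is free to do other work while the child runs.
parent_result = sum(range(1000))

# communicate() collects the output and waits for the child to exit.
child_output, _ = child.communicate()
print(parent_result, child_output.strip(), child.returncode)
```

Until communicate() or wait() is called, the parent has no guarantee that the child has finished, which is exactly the plumbing subprocess.run() handles for you.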

Avoid os.system() and os.popen()

Since time eternal (well, since Python 2.5) the os module documentation has contained the recommendation to prefer subprocess over os.system():

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.

The problems with system() are that it's obviously system-dependent and doesn't offer ways to interact with the subprocess. It simply runs, with standard output and standard error outside of Python's reach. The only information Python receives back is the exit status of the command (zero means success, though the meaning of non-zero values is also somewhat system-dependent).

PEP-324 (which was already mentioned above) contains a more detailed rationale for why os.system is problematic and how subprocess attempts to solve those issues.

os.popen() used to be even more strongly discouraged:

Deprecated since version 2.6: This function is obsolete. Use the subprocess module.

However, since sometime in Python 3, it has been reimplemented to simply use subprocess, and redirects to the subprocess.Popen() documentation for details.

Understand and usually use check=True

You'll also notice that subprocess.call() has many of the same limitations as os.system(). In regular use, you should generally check whether the process finished successfully, which subprocess.check_call() and subprocess.check_output() do (where the latter also returns the standard output of the finished subprocess). Similarly, you should usually use check=True with subprocess.run() unless you specifically need to allow the subprocess to return an error status.

In practice, with check=True or subprocess.check_*, Python will raise a CalledProcessError exception if the subprocess returns a nonzero exit status.

A common error with subprocess.run() is to omit check=True and be surprised when downstream code fails if the subprocess failed.

On the other hand, a common problem with check_call() and check_output() was that users who blindly used these functions were surprised when the exception was raised e.g. when grep did not find a match. (You should probably replace grep with native Python code anyway, as outlined below.)

All things counted, you need to understand how shell commands return an exit code, and under what conditions they will return a non-zero (error) exit code, and make a conscious decision how exactly it should be handled.
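
To make the difference concrete, here is a small sketch. The failing command is a stand-in (a Python one-liner that exits with status 3, launched through sys.executable purely for portability); with a real tool you would have to look up which exit codes it uses.

```python
import subprocess
import sys

# A stand-in for any external command that fails with exit status 3.
failing_command = [sys.executable, "-c", "import sys; sys.exit(3)"]

# Without check=True, the failure is only visible in returncode.
unchecked = subprocess.run(failing_command)
print(unchecked.returncode)  # 3

# With check=True, the same failure raises CalledProcessError.
try:
    subprocess.run(failing_command, check=True)
except subprocess.CalledProcessError as error:
    caught_status = error.returncode
    print("command failed with exit status", caught_status)
```

Catching the exception is the place to make the conscious decision mentioned above, for example treating grep's "no match" status differently from a real error.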

Understand and probably use text=True aka universal_newlines=True

Since Python 3, strings internal to Python are Unicode strings. But there is no guarantee that a subprocess generates Unicode output, or strings at all.

(If the differences are not immediately obvious, Ned Batchelder's Pragmatic Unicode is recommended, if not outright obligatory, reading. There is a 36-minute video presentation behind the link if you prefer, though reading the page yourself will probably take significantly less time.)

Deep down, Python has to fetch a bytes buffer and interpret it somehow. If it contains a blob of binary data, it shouldn't be decoded into a Unicode string, because that's error-prone and bug-inducing behavior - precisely the sort of pesky behavior which riddled many Python 2 scripts, before there was a way to properly distinguish between encoded text and binary data.

With text=True, you tell Python that you, in fact, expect back textual data in the system's default encoding, and that it should be decoded into a Python (Unicode) string to the best of Python's ability (usually UTF-8 on any moderately up to date system, except perhaps Windows?).

If that's not what you request, Python will just give you bytes objects in the stdout and stderr attributes. Maybe at some later point you do know that they were text strings after all, and you know their encoding. Then, you can decode them.

normal = subprocess.run([external, arg],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    check=True,
    text=True)
print(normal.stdout)

convoluted = subprocess.run([external, arg],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    check=True)
# You have to know (or guess) the encoding
print(convoluted.stdout.decode("utf-8"))

Python 3.7 introduced the shorter and more descriptive and understandable alias text for the keyword argument which was previously somewhat misleadingly called universal_newlines.

Understand shell=True vs shell=False

With shell=True you pass a single string to your shell, and the shell takes it from there.

With shell=False you pass a list of arguments to the OS, bypassing the shell.

When you don't have a shell, you save a process and get rid of a fairly substantial amount of hidden complexity, which may or may not harbor bugs or even security problems.

On the other hand, when you don't have a shell, you don't have redirection, wildcard expansion, job control, and a large number of other shell features.

A common mistake is to use shell=True and then still pass Python a list of tokens, or vice versa. This happens to work in some cases, but is really ill-defined and could break in interesting ways.

# XXX AVOID THIS BUG
buggy = subprocess.run("dig +short stackoverflow.com")

# XXX AVOID THIS BUG TOO
broken = subprocess.run(["dig", "+short", "stackoverflow.com"],
    shell=True)

# XXX DEFINITELY AVOID THIS
pathological = subprocess.run(["dig +short stackoverflow.com"],
    shell=True)

correct = subprocess.run(["dig", "+short", "stackoverflow.com"],
    # Probably don't forget these, too
    check=True, text=True)

# XXX Probably better avoid shell=True
# but this is nominally correct
fixed_but_fugly = subprocess.run("dig +short stackoverflow.com",
    shell=True,
    # Probably don't forget these, too
    check=True, text=True)

The common retort "but it works for me" is not a useful rebuttal unless you understand exactly under what circumstances it could stop working.
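
If you start from a command line written as a single string, the standard library's shlex module can help you move between the two forms. This is only a sketch; the command line and the host value are invented for the illustration.

```python
import shlex

command_line = 'grep "round-trip min/avg/max" ping.log'

# shlex.split() tokenizes the string roughly the way a POSIX shell
# would, giving a list suitable for shell=False.
argv = shlex.split(command_line)
print(argv)  # ['grep', 'round-trip min/avg/max', 'ping.log']

# shlex.quote() goes the other way for a single argument that has to
# be embedded in a shell=True command string.
host = "example.com; echo oops"
quoted_host = shlex.quote(host)
print("ping -c 3 " + quoted_host)  # ping -c 3 'example.com; echo oops'
```

Note that shlex.split() only tokenizes; it does not perform redirection, wildcard expansion or variable substitution, so it is no substitute for a shell when the command relies on those.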

Refactoring Example

Very often, the features of the shell can be replaced with native Python code. Simple Awk or sed scripts should probably simply be translated to Python instead.

To partially illustrate this, here is a typical but slightly silly example which involves many shell features.

cmd = """while read -r x;
   do ping -c 3 "$x" | grep "round-trip min/avg/max"
   done <hosts.txt"""

# Trivial but horrible
results = subprocess.run(
    cmd, shell=True, universal_newlines=True, check=True,
    stdout=subprocess.PIPE)  # capture output; otherwise results.stdout is None
print(results.stdout)

# Reimplement with shell=False
with open("hosts.txt") as hosts:
    for host in hosts:
        host = host.rstrip("\n")  # drop newline
        ping = subprocess.run(
             ["ping", "-c", "3", host],
             text=True,
             stdout=subprocess.PIPE,
             check=True)
        for line in ping.stdout.split("\n"):
             if "round-trip min/avg/max" in line:
                 print("{}: {}".format(host, line))

Some things to note here:

  • With shell=False you don't need the quoting that the shell requires around strings. Adding quotes anyway is probably an error.
  • It often makes sense to run as little code as possible in a subprocess. This gives you more control over execution from within your Python code.
  • Having said that, complex shell pipelines are tedious and sometimes challenging to reimplement in Python.

The refactored code also illustrates just how much the shell really does for you with a very terse syntax -- for better or for worse. Python says explicit is better than implicit but the Python code is rather verbose and arguably looks more complex than this really is. On the other hand, it offers a number of points where you can grab control in the middle of something else, as trivially exemplified by the enhancement that we can easily include the host name along with the shell command output. (This is by no means challenging to do in the shell, either, but at the expense of yet another diversion and perhaps another process.)

Common Shell Constructs

For completeness, here are brief explanations of some of these shell features, and some notes on how they can perhaps be replaced with native Python facilities.

  • Globbing aka wildcard expansion can be replaced with glob.glob() or very often with simple Python string comparisons like for file in os.listdir("."): if not file.endswith(".png"): continue. Bash has various other expansion facilities like .{png,jpg} brace expansion and {1..100} as well as tilde expansion (~ expands to your home directory, and more generally ~account to the home directory of another user)
  • Shell variables like $SHELL or $my_exported_var can sometimes simply be replaced with Python variables. Exported shell variables are available as e.g. os.environ["SHELL"] (the meaning of export is to make the variable available to subprocesses -- a variable which is not available to subprocesses will obviously not be available to Python running as a subprocess of the shell, or vice versa. The env= keyword argument to subprocess methods allows you to define the environment of the subprocess as a dictionary, so that's one way to make a Python variable visible to a subprocess). With shell=False you will need to understand how to remove any quotes; for example, cd "$HOME" is equivalent to os.chdir(os.environ["HOME"]) without quotes around the directory name. (Very often cd is not useful or necessary anyway, and many beginners omit the double quotes around the variable and get away with it until one day ...)
  • Redirection allows you to read from a file as your standard input, and write your standard output to a file. grep "foo" <inputfile >outputfile opens outputfile for writing and inputfile for reading, and passes its contents as standard input to grep, whose standard output then lands in outputfile. This is not generally hard to replace with native Python code.
  • Pipelines are a form of redirection. echo foo | nl runs two subprocesses, where the standard output of echo is the standard input of nl (on the OS level, in Unix-like systems, this is a single file handle). If you cannot replace one or both ends of the pipeline with native Python code, perhaps think about using a shell after all, especially if the pipeline has more than two or three processes (though look at the pipes module in the Python standard library, which was deprecated in Python 3.11 and removed in 3.13, or a number of more modern and versatile third-party competitors).
  • Job control lets you interrupt jobs, run them in the background, return them to the foreground, etc. The basic Unix signals to stop and continue a process are of course available from Python, too. But jobs are a higher-level abstraction in the shell which involve process groups etc which you have to understand if you want to do something like this from Python.
  • Quoting in the shell is potentially confusing until you understand that everything is basically a string. So ls -l / is equivalent to "ls" "-l" "/" but the quoting around literals is completely optional. Unquoted strings which contain shell metacharacters undergo parameter expansion, whitespace tokenization and wildcard expansion; double quotes prevent whitespace tokenization and wildcard expansion but allow parameter expansions (variable substitution, command substitution, and backslash processing). This is simple in theory but can get bewildering, especially when there are several layers of interpretation (a remote shell command, for example).
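
As one example of replacing a shell construct, the redirection grep "foo" <inputfile >outputfile from the list above could be sketched in plain Python like this. The helper name grep_file and the sample file contents are made up for the illustration, and the match is a plain substring test rather than a regular expression.

```python
import os
import tempfile

def grep_file(pattern, input_path, output_path):
    """Copy lines containing pattern from input_path to output_path."""
    with open(input_path) as source, open(output_path, "w") as target:
        for line in source:
            if pattern in line:  # plain substring match, not a regex
                target.write(line)

# Demonstration with throwaway files.
workdir = tempfile.mkdtemp()
inputfile = os.path.join(workdir, "inputfile")
outputfile = os.path.join(workdir, "outputfile")
with open(inputfile, "w") as handle:
    handle.write("foo one\nbar two\nfoo three\n")

grep_file("foo", inputfile, outputfile)
with open(outputfile) as handle:
    matched = handle.read()
print(matched)
```

If the external program cannot be replaced, subprocess.run() also accepts open file objects for its stdin and stdout arguments, which gives you the same redirection without a shell.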

Understand differences between sh and Bash

subprocess runs your shell commands with /bin/sh unless you specifically request otherwise (except of course on Windows, where it uses the value of the COMSPEC variable). This means that various Bash-only features like arrays, [[ etc are not available.

If you need to use Bash-only syntax, you can pass in the path to the shell as executable="/bin/bash" (where of course if your Bash is installed somewhere else, you need to adjust the path).

subprocess.run("""
    # This for loop syntax is Bash only
    for((i=1;i<=$#;i++)); do
        # Arrays are Bash-only
        array[i]+=123
    done""",
    shell=True, check=True,
    executable="/bin/bash")

A subprocess is separate from its parent, and cannot change it

A somewhat common mistake is doing something like

subprocess.run("cd /tmp", shell=True)
subprocess.run("pwd", shell=True)  # Oops, doesn't print /tmp

The same thing will happen if the first subprocess tries to set an environment variable, which of course will have disappeared when you run another subprocess, etc.

A child process runs completely separate from Python, and when it finishes, Python has no idea what it did (apart from the vague indicators that it can infer from the exit status and output from the child process). A child generally cannot change the parent's environment; it cannot set a variable, change the working directory, or, in so many words, communicate with its parent without cooperation from the parent.

The immediate fix in this particular case is to run both commands in a single subprocess:

subprocess.run("cd /tmp; pwd", shell=True)

though obviously this particular use case isn't very useful; instead, use the cwd keyword argument, or simply os.chdir() before running the subprocess. Similarly, for setting a variable, you can manipulate the environment of the current process (and thus also its children) via

os.environ["foo"] = "bar"

or pass an environment setting to a child process with

subprocess.run('echo "$foo"', shell=True, env={"foo": "bar"})

(not to mention the obvious refactoring subprocess.run(["echo", "bar"]); but echo is a poor example of something to run in a subprocess in the first place, of course).
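
Here is a sketch of the cwd and env keyword arguments without a shell. The child is again a Python one-liner started through sys.executable, used only so the example is portable; the variable name foo comes from the example above, everything else is an assumption of this sketch.

```python
import os
import subprocess
import sys
import tempfile

target_dir = tempfile.mkdtemp()

# The child prints its working directory and the value of one variable.
child_code = "import os; print(os.getcwd()); print(os.environ['foo'])"

# Copy the parent's environment and add to it, rather than replacing it.
child_env = dict(os.environ, foo="bar")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    cwd=target_dir,
    env=child_env,
    stdout=subprocess.PIPE,
    text=True,
    check=True,
)
child_cwd, child_foo = result.stdout.splitlines()
print(child_cwd, child_foo)
```

Passing a dictionary that contains only the new variable, as in env={"foo": "bar"}, replaces the whole environment, so the child loses PATH and everything else; copying os.environ first avoids that surprise. The parent's own working directory and environment are left untouched either way.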

Don't run Python from Python

This is slightly dubious advice; there are certainly situations where it does make sense or is even an absolute requirement to run the Python interpreter as a subprocess from a Python script. But very frequently, the correct approach is simply to import the other Python module into your calling script and call its functions directly.

If the other Python script is under your control, and it isn't a module, consider turning it into one. (This answer is too long already so I will not delve into details here.)

If you need parallelism, you can run Python functions in subprocesses with the multiprocessing module. There is also threading which runs multiple tasks in a single process (which is more lightweight and gives you more control, but also more constrained in that threads within a process are tightly coupled, and bound to a single GIL.)

Answer #3

There is already a function in Python's standard library for this.

>>> from textwrap import wrap
>>> s = "1234567890"
>>> wrap(s, 2)
["12", "34", "56", "78", "90"]
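
One caveat worth sketching: wrap() treats whitespace as a place to break lines and drops it there, so it only gives fixed-size pieces when the input has no spaces. The helper below, named chunks for this illustration, is an alternative based on plain slicing and is not part of textwrap.

```python
from textwrap import wrap

def chunks(text, size):
    """Return consecutive slices of text, each at most size characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

print(chunks("1234567890", 2))  # ['12', '34', '56', '78', '90']

# With spaces in the input, the two approaches give different results.
print(wrap("ab cd ef", 3))      # ['ab', 'cd', 'ef']
print(chunks("ab cd ef", 3))    # ['ab ', 'cd ', 'ef']
```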

This is what the docstring for wrap says:

>>> help(wrap)
"""
Help on function wrap in module textwrap:

wrap(text, width=70, **kwargs)
    Wrap a single paragraph of text, returning a list of wrapped lines.

    Reformat the single paragraph in "text" so it fits in lines of no
    more than "width" columns, and return a list of wrapped lines.  By
    default, tabs in "text" are expanded with string.expandtabs(), and
    all other whitespace characters (including newline) are converted to
    space.  See TextWrapper class for available keyword args to customize
    wrapping behaviour.
"""

Answer #4

Python's pass mainly exists because in Python whitespace matters within a block. In JavaScript, the equivalent would be putting nothing within the block, i.e. {}.
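
A tiny sketch of where that matters (the class and function names are invented for the illustration): a block cannot be left empty, so pass stands in as a statement that does nothing.

```python
class NotReadyError(Exception):
    # The class needs a body, but there is nothing to add.
    pass

def placeholder():
    # An empty block would be a syntax error; pass fills it.
    pass

print(placeholder())  # None
```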

Answer #5

An alternative is to use regular expressions and match these strange white-space characters too. Here are some examples:

Remove ALL spaces in a string, even between words:

import re
sentence = re.sub(r"\s+", "", sentence, flags=re.UNICODE)

Remove spaces in the BEGINNING of a string:

import re
sentence = re.sub(r"^\s+", "", sentence, flags=re.UNICODE)

Remove spaces in the END of a string:

import re
sentence = re.sub(r"\s+$", "", sentence, flags=re.UNICODE)

Remove spaces both in the BEGINNING and in the END of a string:

import re
sentence = re.sub(r"^\s+|\s+$", "", sentence, flags=re.UNICODE)

Remove ONLY DUPLICATE spaces:

import re
sentence = " ".join(re.split(r"\s+", sentence, flags=re.UNICODE))

(All examples work in both Python 2 and Python 3)
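
For comparison, here is a sketch of the same operations using only string methods, with a made-up sample sentence. One difference from the regex version: joining the result of split() also drops leading and trailing whitespace, whereas re.split() keeps an empty string at each end and so leaves a single space there.

```python
sentence = "  hello   apple \t world\n"

# split() with no arguments splits on runs of whitespace and
# discards leading and trailing whitespace.
no_whitespace = "".join(sentence.split())
single_spaced = " ".join(sentence.split())

print(repr(no_whitespace))      # 'helloappleworld'
print(repr(single_spaced))      # 'hello apple world'
print(repr(sentence.lstrip()))  # 'hello   apple \t world\n'
print(repr(sentence.rstrip()))  # '  hello   apple \t world'
```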

Answer #6

Read and write JSON files with Python 2+3; works with unicode

# -*- coding: utf-8 -*-
import json

# Make it work for Python 2+3 and with Unicode
import io
try:
    to_unicode = unicode
except NameError:
    to_unicode = str

# Define data
data = {"a list": [1, 42, 3.141, 1337, "help", u"€"],
        "a string": "bla",
        "another dict": {"foo": "bar",
                         "key": "value",
                         "the answer": 42}}

# Write JSON file
with io.open("data.json", "w", encoding="utf8") as outfile:
    str_ = json.dumps(data,
                      indent=4, sort_keys=True,
                      separators=(",", ": "), ensure_ascii=False)
    outfile.write(to_unicode(str_))

# Read JSON file
with io.open("data.json", encoding="utf8") as data_file:
    data_loaded = json.load(data_file)

print(data == data_loaded)

Explanation of the parameters of json.dumps:

  • indent: Use 4 spaces to indent each entry, e.g. when a new dict is started (otherwise all will be in one line),
  • sort_keys: sort the keys of dictionaries. This is useful if you want to compare json files with a diff tool / put them under version control.
  • separators: To prevent Python from adding trailing whitespaces
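
The effect of these parameters is easy to see on a small dictionary (the sample data here is made up). As far as the separators go, Python 3.4 and later already default to (",", ": ") whenever indent is set, so passing them explicitly mainly matters on older versions.

```python
import json

data = {"b": [1, 2], "a": "x"}

# Most compact form: no whitespace after either separator.
compact = json.dumps(data, sort_keys=True, separators=(",", ":"))
print(compact)  # {"a":"x","b":[1,2]}

# Pretty-printed form, with the same arguments as the answer's call.
pretty = json.dumps(data, indent=4, sort_keys=True, separators=(",", ": "))
print(pretty)
```

With sort_keys=True the key "a" comes before "b" regardless of insertion order, which is what keeps diffs between two dumps of similar data small.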

With a package

Have a look at my utility package mpu for a super simple and easy to remember one:

import mpu.io
data = mpu.io.read("example.json")
mpu.io.write("example.json", data)

Created JSON file

{
    "a list": [
        1,
        42,
        3.141,
        1337,
        "help",
        "€"
    ],
    "a string": "bla",
    "another dict": {
        "foo": "bar",
        "key": "value",
        "the answer": 42
    }
}

Common file endings

.json

Alternatives

For your application, the following might be important:

  • Support by other programming languages
  • Reading / writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python

Answer #7

If you want indentation in terms of nesting level rather than spaces and tabs, things get tricky. For example, in the following code:

if True:
    print(
get_nesting_level())

the call to get_nesting_level is actually nested one level deep, despite the fact that there is no leading whitespace on the line of the get_nesting_level call. Meanwhile, in the following code:

print(1,
      2,
      get_nesting_level())

the call to get_nesting_level is nested zero levels deep, despite the presence of leading whitespace on its line.

In the following code:

if True:
  if True:
    print(get_nesting_level())

if True:
    print(get_nesting_level())

the two calls to get_nesting_level are at different nesting levels, despite the fact that the leading whitespace is identical.

In the following code:

if True: print(get_nesting_level())

is that nested zero levels, or one? In terms of INDENT and DEDENT tokens in the formal grammar, it's zero levels deep, but you might not feel the same way.

If you want to do this, you're going to have to tokenize the whole file up to the point of the call and count INDENT and DEDENT tokens. The tokenize module would be very useful for such a function:

import inspect
import tokenize

def get_nesting_level():
    caller_frame = inspect.currentframe().f_back
    filename, caller_lineno, _, _, _ = inspect.getframeinfo(caller_frame)
    with open(filename) as f:
        indentation_level = 0
        for token_record in tokenize.generate_tokens(f.readline):
            token_type, _, (token_lineno, _), _, _ = token_record
            if token_lineno > caller_lineno:
                break
            elif token_type == tokenize.INDENT:
                indentation_level += 1
            elif token_type == tokenize.DEDENT:
                indentation_level -= 1
        return indentation_level

Answer #8

If you are using Python 3.4+, you can use textwrap.shorten from the standard library:

Collapse and truncate the given text to fit in the given width.

First the whitespace in text is collapsed (all whitespace is replaced by single spaces). If the result fits in the width, it is returned. Otherwise, enough words are dropped from the end so that the remaining words plus the placeholder fit within width:

>>> textwrap.shorten("Hello  world!", width=12)
"Hello world!"
>>> textwrap.shorten("Hello  world!", width=11)
"Hello [...]"
>>> textwrap.shorten("Hello world", width=10, placeholder="...")
"Hello..."

Answer #9

There are several ways to disable warnings & errors from Pylint. Which one to use has to do with how globally or locally you want to apply the disablement -- an important design decision.

Multiple Approaches

  1. In one or more pylintrc files.

This involves more than the ~/.pylintrc file (in your $HOME directory) as described by Chris Morgan. Pylint will search for rc files, with a precedence that values "closer" files more highly:

  • A pylintrc file in the current working directory; or

  • If the current working directory is in a Python module (i.e. it contains an __init__.py file), searching up the hierarchy of Python modules until a pylintrc file is found; or

  • The file named by the environment variable PYLINTRC; or

  • If you have a home directory that isn't /root:

    • ~/.pylintrc; or

    • ~/.config/pylintrc; or

  • /etc/pylintrc

Note that most of these files are named pylintrc -- only the file in ~ has a leading dot.
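The search order listed above can be sketched in a few lines of Python. This is only an approximation of the described precedence, not Pylint's actual implementation: the function name find_pylintrc_sketch and its environ and home parameters are hypothetical, and treating /etc/pylintrc as a system-wide fallback that does not depend on the home directory is an assumption.

```python
import os
from pathlib import Path


def find_pylintrc_sketch(cwd, environ=None, home=None):
    """Return the first rc file found in the order described above, else None."""
    environ = os.environ if environ is None else environ
    cwd = Path(cwd)

    # 1. A pylintrc file in the current working directory.
    candidate = cwd / "pylintrc"
    if candidate.is_file():
        return candidate

    # 2. While the current directory is a package (has __init__.py),
    #    move up one level and look for a pylintrc there.
    current = cwd
    while (current / "__init__.py").is_file() and current.parent != current:
        current = current.parent
        candidate = current / "pylintrc"
        if candidate.is_file():
            return candidate

    # 3. The file named by the PYLINTRC environment variable.
    env_path = environ.get("PYLINTRC")
    if env_path and Path(env_path).is_file():
        return Path(env_path)

    # 4. Files in the home directory, unless that directory is /root.
    home = Path.home() if home is None else Path(home)
    if str(home) != "/root":
        for candidate in (home / ".pylintrc", home / ".config" / "pylintrc"):
            if candidate.is_file():
                return candidate

    # 5. Assumed system-wide fallback.
    candidate = Path("/etc/pylintrc")
    return candidate if candidate.is_file() else None
```

Passing environ and home explicitly makes it easy to try the sketch against a temporary directory tree without touching your real configuration.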

To your pylintrc file, add lines to disable specific pylint messages. For example:

[MESSAGES CONTROL]
disable=locally-disabled

  2. Further disables from the pylint command line, as described by Aboo and Cairnarvon. This looks like pylint --disable=bad-builtin. Repeat --disable to suppress additional items.

  3. Further disables from individual Python code lines, as described by Imolit. These look like some statement # pylint: disable=broad-except (extra comment on the end of the original source line) and apply only to the current line. My approach is to always put these on the end of other lines of code so they won't be confused with the block style, see below.

  4. Further disables defined for larger blocks of Python code, up to complete source files.

    • These look like # pragma pylint: disable=bad-whitespace (note the pragma key word).

    • These apply to every line after the pragma. Putting a block of these at the top of a file makes the suppressions apply to the whole file. Putting the same block lower in the file makes them apply only to lines following the block. My approach is to always put these on a line of their own so they won't be confused with the single-line style, see above.

    • When a suppression should only apply within a span of code, use # pragma pylint: enable=bad-whitespace (now using enable not disable) to stop suppressing.

Note that disabling for a single line uses the # pylint syntax while disabling for this line onward uses the # pragma pylint syntax. These are easy to confuse especially when copying & pasting.
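To make the two comment styles concrete, here is a small illustrative module. The names ParseError and parse_int are hypothetical, and the choice of too-few-public-methods for the block-level example is an assumption made for illustration. As far as Pylint's documentation describes it, a comment on its own line applies from that point onward, while a trailing comment applies only to its own line.

```python
# pragma pylint: disable=too-few-public-methods
# Block style: on a line of its own, applies to the lines that follow.


class ParseError(Exception):
    """Hypothetical custom exception with no public methods of its own."""


# pragma pylint: enable=too-few-public-methods
# From here on the message is reported again.


def parse_int(text):
    """Return int(text), or None when the conversion fails for any reason."""
    try:
        return int(text)
    except Exception:  # pylint: disable=broad-except
        # Single-line style: the trailing comment above covers that line only.
        return None


print(parse_int("42"), parse_int("not a number"))
```

Message names vary between Pylint releases, so check that the names you disable exist in the version you run.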

Putting It All Together

I usually use a mix of these approaches.

  • I use ~/.pylintrc for absolutely global standards -- very few of these.

  • I use project-level pylintrc at different levels within Python modules when there are project-specific standards. Especially when you're taking in code from another person or team, you may find they use conventions that you don't prefer, but you don't want to rework the code. Keeping the settings at this level helps not spread those practices to other projects.

  • I use the block style pragmas at the top of single source files. In the heat of development I like to turn the pragmas off (stop suppressing messages), even for Pylint standards I don't agree with (like "too few public methods" -- I always get that warning on custom Exception classes), because it's helpful to see more / maybe all Pylint messages while you're developing. That way you can find the cases you want to address with single-line pragmas (see below), or just add comments for the next developer to explain why that warning is OK in this case.

  • I leave some of the block-style pragmas enabled even when the code is ready to check in. I try to use few of those, but when it makes sense for the module, it's OK to keep them as documentation. However I try to leave as few on as possible, preferably none.

  • I use the single-line-comment style to address especially potent errors. For example, if there's a place where it actually makes sense to do except Exception as exc, I put the # pylint: disable=broad-except on that line instead of a more global approach because this is a strange exception and needs to be called out, basically as a form of documentation.


Like everything else in Python, you can act at different levels of indirection. My advice is to think about what belongs at what level so you don't end up with a too-lenient approach to Pylint.

Answer #10

Pythonic + Pandorable: df[df["col"].astype(bool)]

Empty strings are falsy, which means you can filter on bool values like this:

df = pd.DataFrame({
    "A": range(5),
    "B": ["foo", "", "bar", "", "xyz"]
})
df
   A    B
0  0  foo
1  1     
2  2  bar
3  3     
4  4  xyz
df["B"].astype(bool)
0     True
1    False
2     True
3    False
4     True
Name: B, dtype: bool

df[df["B"].astype(bool)]
   A    B
0  0  foo
2  2  bar
4  4  xyz

If your goal is to remove not only empty strings, but also strings only containing whitespace, use str.strip beforehand:

df[df["B"].str.strip().astype(bool)]
   A    B
0  0  foo
2  2  bar
4  4  xyz
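The sample frame above contains no whitespace-only strings, so the effect of str.strip is not visible there. The following sketch uses a hypothetical frame, df_ws, to show the difference. The data and variable names are assumptions made for this example, and the column is created with dtype=object explicitly so that astype(bool) reflects plain Python truthiness.

```python
import pandas as pd

# Hypothetical data: column "B" mixes real text, empty strings,
# and strings that contain only whitespace.
df_ws = pd.DataFrame({
    "A": range(5),
    "B": pd.Series(["foo", "", "   ", "\t", "xyz"], dtype=object),
})

# Without strip, whitespace-only strings are truthy and survive the filter.
kept_plain = df_ws[df_ws["B"].astype(bool)]

# With strip, they become empty strings first and are filtered out.
kept_stripped = df_ws[df_ws["B"].str.strip().astype(bool)]

print(kept_plain["A"].tolist())
print(kept_stripped["A"].tolist())
```

The filter only selects rows. The values that remain in column B are left unstripped.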

Faster than you Think

.astype is a vectorised operation, so it is faster than every other option presented thus far, at least in my tests. YMMV.

Here is a timing comparison; I've thrown in some other methods I could think of.

[Image: timing comparison plot]

Benchmarking code, for reference:

import numpy as np  # needed for np.nan in the kernels below
import pandas as pd
import perfplot

df1 = pd.DataFrame({
    "A": range(5),
    "B": ["foo", "", "bar", "", "xyz"]
})

perfplot.show(
    setup=lambda n: pd.concat([df1] * n, ignore_index=True),
    kernels=[
        lambda df: df[df["B"].astype(bool)],
        lambda df: df[df["B"] != ""],
        lambda df: df[df["B"].replace("", np.nan).notna()],  # optimized 1-col
        lambda df: df.replace({"B": {"": np.nan}}).dropna(subset=["B"]),
    ],
    labels=["astype", '!= ""', "replace + notna", "replace + dropna"],
    n_range=[2**k for k in range(1, 15)],
    xlabel="N",
    logx=True,
    logy=True,
    equality_check=pd.DataFrame.equals)
