# NumPy MaskedArray.std() Function | Python


`numpy.ma.MaskedArray.std()` calculates the standard deviation along the specified axis, ignoring masked entries. By default the standard deviation is computed over the flattened array; otherwise it is computed along the specified axis.

Syntax: `numpy.ma.std(arr, axis=None, dtype=None, out=None, ddof=0, keepdims=False)`

Parameters:

axis: [int, optional] Axis along which the standard deviation is computed. By default the flattened array is used.
dtype: [dtype, optional] Type used when computing the standard deviation.
out: [ndarray, optional] A location into which the result is stored.
-> If provided, it must have the same shape as the expected output.
-> If not provided or None, a freshly allocated array is returned.
ddof: [int, optional] "Delta Degrees of Freedom": the divisor used in the calculation is N - ddof, where N is the number of (unmasked) elements. By default ddof is zero.
keepdims: [bool, optional] If this is set to True, the reduced axes are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

Return: [standard_deviation_along_axis, ndarray] A new array holding the result is returned unless out is specified, in which case a reference to out is returned.
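To make the role of the mask and of `ddof` concrete, here is a minimal plain-Python sketch of the formula described above. It is not NumPy's implementation; `masked_std` is a hypothetical helper written only for illustration, and it works on flat lists rather than arrays.

```python
import math

def masked_std(values, mask, ddof=0):
    """Standard deviation of the unmasked values (hypothetical helper, not NumPy)."""
    # Keep only the entries whose mask flag is false (0).
    kept = [v for v, m in zip(values, mask) if not m]
    n = len(kept)
    if n - ddof <= 0:
        raise ValueError("need more unmasked values than ddof")
    mean = sum(kept) / n
    squared_dev = sum((v - mean) ** 2 for v in kept)
    # The divisor is N - ddof, where N counts only unmasked values.
    return math.sqrt(squared_dev / (n - ddof))

# Same data and mask as Code # 1 below, flattened row by row.
values = [1, 2, 3, -1, 5, -3]
mask = [1, 0, 1, 0, 0, 0]

population = masked_std(values, mask)      # ddof=0, divisor is 4
sample = masked_std(values, mask, ddof=1)  # ddof=1, divisor is 3
print(population, sample)
```

With `ddof=0` this reproduces the value printed by `ma.std` in Code # 1; with `ddof=1` the divisor shrinks from 4 to 3, so the result is larger.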

Code # 1:

``````
# Python program explaining
# numpy.MaskedArray.std() method

# import numpy as geek
# and the numpy.ma module as ma
import numpy as geek
import numpy.ma as ma

# create the input array
in_arr = geek.array([[1, 2], [3, -1], [5, -3]])
print("Input array: ", in_arr)

# Now we create a masked array,
# invalidating two entries.
mask_arr = ma.masked_array(in_arr, mask=[[1, 0], [1, 0], [0, 0]])
print("Masked array: ", mask_arr)

# apply the std function
# to the masked array
out_arr = ma.std(mask_arr)
print("standard deviation of masked array along default axis: ", out_arr)
``````

Output:

``````
Input array: [[1 2] [3 -1] [5 -3]]
Masked array: [[-- 2] [-- -1] [5 -3]]
standard deviation of masked array along default axis: 3.031088913245535
``````

Code # 2:

``````
# Python program explaining
# numpy.MaskedArray.std() method

# import numpy as geek
# and the numpy.ma module as ma
import numpy as geek
import numpy.ma as ma

# create the input array
in_arr = geek.array([[1, 0, 3], [4, 1, 6]])
print("Input array: ", in_arr)

# Now we create a masked array,
# invalidating one entry.
mask_arr = ma.masked_array(in_arr, mask=[[0, 0, 0], [0, 0, 1]])
print("Masked array: ", mask_arr)

# apply the std function
# to the masked array along each axis
out_arr1 = ma.std(mask_arr, axis=0)
print("standard deviation of masked array along 0 axis: ", out_arr1)

out_arr2 = ma.std(mask_arr, axis=1)
print("standard deviation of masked array along 1 axis: ", out_arr2)
``````

Output:

``````
Input array: [[1 0 3] [4 1 6]]
Masked array: [[1 0 3] [4 1 --]]
standard deviation of masked array along 0 axis: [1.5 0.5 0.0]
standard deviation of masked array along 1 axis: [1.247219128924647 1.5]
``````

## Why is reading lines from stdin much slower in C++ than Python?

I wanted to compare reading lines of string input from stdin using Python and C++ and was shocked to see my C++ code run an order of magnitude slower than the equivalent Python code. Since my C++ is rusty and I'm not yet an expert Pythonista, please tell me if I'm doing something wrong or if I'm misunderstanding something.

(TLDR answer: include the statement: `cin.sync_with_stdio(false)` or just use `fgets` instead.

TLDR results: scroll all the way down to the bottom of my question and look at the table.)

C++ code:

``````#include <iostream>
#include <string>
#include <time.h>

using namespace std;

int main() {
    string input_line;
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

    while (cin) {
        getline(cin, input_line);
        if (!cin.eof())
            line_count++;
    };

    sec = (int) time(NULL) - start;
    cerr << "Read " << line_count << " lines in " << sec << " seconds.";
    if (sec > 0) {
        lps = line_count / sec;
        cerr << " LPS: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

// Compiled with:
// g++ -O3 -o readline_test_cpp foo.cpp
``````

Python Equivalent:

``````#!/usr/bin/env python
import time
import sys

count = 0
start_time = time.time()

for line in sys.stdin:
    count += 1

delta_sec = int(time.time() - start_time)
if delta_sec > 0:  # skip the rate when the run took under a second (avoids division by zero)
    lines_per_sec = int(round(count / delta_sec))
    print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec,
                                                          lines_per_sec))
``````

Here are my results:

``````
$ cat test_lines | ./readline_test_cpp
Read 5570000 lines in 9 seconds. LPS: 618889

$ cat test_lines | ./readline_test.py
Read 5570000 lines in 1 seconds. LPS: 5570000
``````

I should note that I tried this both under Mac OS X v10.6.8 (Snow Leopard) and Linux 2.6.32 (Red Hat Linux 6.2). The former is a MacBook Pro, and the latter is a very beefy server, not that this is too pertinent.

``````
$ for i in {1..5}; do echo "Test run $i at `date`"; echo -n "CPP:"; cat test_lines | ./readline_test_cpp ; echo -n "Python:"; cat test_lines | ./readline_test.py ; done
``````
``````Test run 1 at Mon Feb 20 21:29:28 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 2 at Mon Feb 20 21:29:39 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 3 at Mon Feb 20 21:29:50 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 4 at Mon Feb 20 21:30:01 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 5 at Mon Feb 20 21:30:11 EST 2012
CPP:   Read 5570001 lines in 10 seconds. LPS: 557000
Python:Read 5570000 lines in  1 seconds. LPS: 5570000
``````

For completeness, I thought I'd update the read speed for the same file on the same box with the original (synced) C++ code. Again, this is for a 100M line file on a fast disk. Here's the comparison, with several solutions/approaches:

| Implementation | Lines per second |
| --- | --- |
| python (default) | 3,571,428 |
| cin (default/naive) | 819,672 |
| cin (no sync) | 12,500,000 |
| fgets | 14,285,714 |
| wc (not fair comparison) | 54,644,808 |
## How do you read from stdin?

I'm trying to do some of the code golf challenges, but they all require the input to be taken from `stdin`. How do I get that in Python?

## How to print to stderr in Python?

There are several ways to write to stderr:

``````# Note: this first one does not work in Python 3
print >> sys.stderr, "spam"

sys.stderr.write("spam\n")

os.write(2, b"spam\n")

from __future__ import print_function
print("spam", file=sys.stderr)
``````

That seems to contradict zen of Python #13 †, so what's the difference here and are there any advantages or disadvantages to one way or the other? Which way should be used?

† There should be one-- and preferably only one --obvious way to do it.

## Finding local IP addresses using Python's stdlib

### Question by Unkwntech

How can I find local IP addresses (i.e. 192.168.x.x or 10.0.x.x) in Python platform independently and using only the standard library?

## Making Python loggers output all messages to stdout in addition to log file

### Question by user248237

Is there a way to make Python logging using the `logging` module automatically output things to stdout in addition to the log file where they are supposed to go? For example, I'd like all calls to `logger.warning`, `logger.critical`, `logger.error` to go to their intended places but in addition always be copied to `stdout`. This is to avoid duplicating messages like:

``````mylogger.critical("something failed")
print "something failed"
``````

## logger configuration to log to file and print to stdout

I'm using Python's logging module to log some debug strings to a file, which works pretty well. Now, in addition, I'd like to use this module to also print the strings out to stdout. How do I do this? In order to log my strings to a file I use the following code:

``````import logging
import logging.handlers
logger = logging.getLogger("")
logger.setLevel(logging.DEBUG)
handler = logging.handlers.RotatingFileHandler(
    LOGFILE, maxBytes=(1048576*5), backupCount=7
)
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)
``````

and then call a logger function like

``````logger.debug("I am written to the file")
``````

Thank you for some help here!

## The difference between sys.stdout.write and print?

Are there situations in which `sys.stdout.write()` is preferable to `print`?

(Examples: better performance; code that makes more sense)

## Redirect stdout to a file in Python?

### Question by user234932

How do I redirect stdout to an arbitrary file in Python?

When a long-running Python script (e.g., a web application) is started from within an ssh session and backgrounded, and the ssh session is closed, the application will raise IOError and fail the moment it tries to write to stdout. I needed to find a way to make the application and modules output to a file rather than stdout to prevent failure due to IOError. Currently, I employ nohup to redirect output to a file, and that gets the job done, but I was wondering if there was a way to do it without using nohup, out of curiosity.

I have already tried `sys.stdout = open("somefile", "w")`, but this does not seem to prevent some external modules from still outputting to the terminal (or maybe the `sys.stdout = ...` line did not fire at all). I know it should work from simpler scripts I've tested on, but I haven't had time yet to test it on a web application.

## Setting the correct encoding when piping stdout in Python

### Question by cortex

When piping the output of a Python program, the Python interpreter gets confused about encoding and sets it to None. This means a program like this:

``````# -*- coding: utf-8 -*-
print u"åäö"
``````

will work fine when run normally, but fail with:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)

when used in a pipe sequence.

What is the best way to make this work when piping? Can I just tell it to use whatever encoding the shell/filesystem/whatever is using?

The suggestions I have seen thus far are to modify your site.py directly, or to hardcode the default encoding using this hack:

``````# -*- coding: utf-8 -*-
import sys
sys.setdefaultencoding("utf-8")
print u"åäö"
``````

Is there a better way to make piping work?

## How do I pass a string into subprocess.Popen (using the stdin argument)?

### Question by Daryl Spitzer

If I do the following:

``````import subprocess
from cStringIO import StringIO
subprocess.Popen(["grep","f"],stdout=subprocess.PIPE,stdin=StringIO("one\ntwo\nthree\nfour\nfive\nsix\n")).communicate()[0]
``````

I get:

``````Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/build/toolchain/mac32/python-2.4.3/lib/python2.4/subprocess.py", line 533, in __init__
File "/build/toolchain/mac32/python-2.4.3/lib/python2.4/subprocess.py", line 830, in _get_handles
AttributeError: "cStringIO.StringI" object has no attribute "fileno"
``````

Apparently a cStringIO.StringIO object doesn't quack close enough to a file duck to suit subprocess.Popen. How do I work around this?

The Python 3 `range()` object doesn't produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

The object also implements the `object.__contains__` hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.

The advantage of the `range` type over a regular `list` or `tuple` is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the `start`, `stop` and `step` values, calculating individual items and subranges as needed).
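A quick way to see both properties for yourself is the short sketch below. The variable names are only illustrative, and `sys.getsizeof` reports the size of the range object itself, which is exactly the point being made.

```python
import sys

small = range(10)
huge = range(0, 10**15, 3)

# Both objects store only start, stop and step, so they are the same size.
same_size = sys.getsizeof(small) == sys.getsizeof(huge)
print(same_size)

# Membership is computed arithmetically; no scan over the values happens.
print(999_999_999_999_999 in huge)   # a multiple of 3 below the stop value
print(5 in huge)                     # not a multiple of 3

# Indexing is computed from start and step as well.
print(huge[-1])
```

Building the equivalent `list` would need memory for hundreds of trillions of integers, while the range object stays the same small size.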

So at a minimum, your `range()` object would do:

``````class my_range:
    def __init__(self, start, stop=None, step=1, /):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError("my_range object index out of range")

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0
``````

This is still missing several things that a real `range()` supports (such as the `.index()` or `.count()` methods, hashing, equality testing, or slicing), but should give you an idea.

I also simplified the `__contains__` implementation to only focus on integer tests; if you give a real `range()` object a non-integer value (including subclasses of `int`), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.

* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this an O(log N) operation. Since it's all executed in optimised C code and Python stores integer values in 30-bit chunks, you'd run out of memory before you saw any performance impact due to the size of the integers involved here.

# In Python, what is the purpose of `__slots__` and what are the cases one should avoid this?

## TLDR:

The special attribute `__slots__` allows you to explicitly state which instance attributes you expect your object instances to have, with the expected results:

1. faster attribute access.
2. space savings in memory.

The space savings is from

1. Storing value references in slots instead of `__dict__`.
2. Denying `__dict__` and `__weakref__` creation if parent classes deny them and you declare `__slots__`.
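As a rough illustration of the space claim, the sketch below compares one ordinary instance with one slotted instance. The class names are made up for this example, and `sys.getsizeof` only gives a shallow size, so treat the numbers as indicative rather than exact.

```python
import sys

class WithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class WithSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

plain = WithDict(1, 2)
slotted = WithSlots(1, 2)

# An ordinary instance pays for the object plus its per-instance __dict__.
dict_total = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
# A slotted instance stores its references inline and has no __dict__ at all.
slotted_size = sys.getsizeof(slotted)

print(dict_total, slotted_size)
print(hasattr(slotted, "__dict__"))
```

The exact byte counts vary between Python versions and platforms, which is why none are quoted here.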

### Quick Caveats

Small caveat, you should only declare a particular slot one time in an inheritance tree. For example:

``````class Base:
    __slots__ = "foo", "bar"

class Right(Base):
    __slots__ = "baz",

class Wrong(Base):
    __slots__ = "foo", "bar", "baz"        # redundant foo and bar
``````

Python doesn't object when you get this wrong (it probably should), and problems might not otherwise manifest, but your objects will take up more space than they otherwise should. Python 3.8:

``````>>> from sys import getsizeof
>>> getsizeof(Right()), getsizeof(Wrong())
(56, 72)
``````

This is because Base's slot descriptor has a slot separate from Wrong's. This shouldn't usually come up, but it could:

``````>>> w = Wrong()
>>> w.foo = "foo"
>>> Base.foo.__get__(w)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: foo
>>> Wrong.foo.__get__(w)
"foo"
``````

The biggest caveat is for multiple inheritance - multiple "parent classes with nonempty slots" cannot be combined.

To accommodate this restriction, follow best practices: factor out an abstraction for all but one (or all) of the parents, which both their concrete class and your new concrete class will inherit from, and give the abstraction(s) empty slots (just like abstract base classes in the standard library).

See section on multiple inheritance below for an example.

### Requirements:

• To have attributes named in `__slots__` to actually be stored in slots instead of a `__dict__`, a class must inherit from `object` (automatic in Python 3, but must be explicit in Python 2).

• To prevent the creation of a `__dict__`, you must inherit from `object` and all classes in the inheritance must declare `__slots__` and none of them can have a `"__dict__"` entry.

There are a lot of details if you wish to keep reading.

## Why use `__slots__`: Faster attribute access.

The creator of Python, Guido van Rossum, states that he actually created `__slots__` for faster attribute access.

It is trivial to demonstrate measurably significant faster access:

``````import timeit

class Foo(object): __slots__ = "foo",

class Bar(object): pass

slotted = Foo()
not_slotted = Bar()

def get_set_delete_fn(obj):
    def get_set_delete():
        obj.foo = "foo"
        obj.foo
        del obj.foo
    return get_set_delete
``````

and

``````>>> min(timeit.repeat(get_set_delete_fn(slotted)))
0.2846834529991611
>>> min(timeit.repeat(get_set_delete_fn(not_slotted)))
0.3664822799983085
``````

The slotted access is almost 30% faster in Python 3.5 on Ubuntu.

``````>>> 0.3664822799983085 / 0.2846834529991611
1.2873325658284342
``````

In Python 2 on Windows I have measured it about 15% faster.

## Why use `__slots__`: Memory Savings

Another purpose of `__slots__` is to reduce the space in memory that each object instance takes up.

The space saved over using `__dict__` can be significant.

SQLAlchemy attributes a lot of memory savings to `__slots__`.

To verify this, using the Anaconda distribution of Python 2.7 on Ubuntu Linux, with `guppy.hpy` (aka heapy) and `sys.getsizeof`, the size of a class instance without `__slots__` declared, and nothing else, is 64 bytes. That does not include the `__dict__`. Thank you Python for lazy evaluation again, the `__dict__` is apparently not called into existence until it is referenced, but classes without data are usually useless. When called into existence, the `__dict__` attribute is a minimum of 280 bytes additionally.

In contrast, a class instance with `__slots__` declared to be `()` (no data) is only 16 bytes, and 56 total bytes with one item in slots, 64 with two.

For 64 bit Python, I illustrate the memory consumption in bytes in Python 2.7 and 3.6, for `__slots__` and `__dict__` (no slots defined) for each point where the dict grows in 3.6 (except for 0, 1, and 2 attributes):

``````       Python 2.7             Python 3.6
attrs  __slots__  __dict__*   __slots__  __dict__* | *(no slots defined)
none   16         56 + 272†   16         56 + 112† | †if __dict__ referenced
one    48         56 + 272    48         56 + 112
two    56         56 + 272    56         56 + 112
six    88         56 + 1040   88         56 + 152
11     128        56 + 1040   128        56 + 240
22     216        56 + 3344   216        56 + 408
43     384        56 + 3344   384        56 + 752
``````

So, in spite of smaller dicts in Python 3, we see how nicely `__slots__` scale for instances to save us memory, and that is a major reason you would want to use `__slots__`.

Just for completeness of my notes, note that there is a one-time cost per slot in the class's namespace of 64 bytes in Python 2, and 72 bytes in Python 3, because slots use data descriptors like properties, called "members".

``````>>> Foo.foo
<member "foo" of "Foo" objects>
>>> type(Foo.foo)
<class "member_descriptor">
>>> getsizeof(Foo.foo)
72
``````

## Demonstration of `__slots__`:

To deny the creation of a `__dict__`, you must subclass `object`. Everything subclasses `object` in Python 3, but in Python 2 you had to be explicit:

``````class Base(object):
    __slots__ = ()
``````

now:

``````>>> b = Base()
>>> b.a = "a"
Traceback (most recent call last):
File "<pyshell#38>", line 1, in <module>
b.a = "a"
AttributeError: "Base" object has no attribute "a"
``````

Or subclass another class that defines `__slots__`

``````class Child(Base):
    __slots__ = ("a",)
``````

and now:

``````c = Child()
c.a = "a"
``````

but:

``````>>> c.b = "b"
Traceback (most recent call last):
File "<pyshell#42>", line 1, in <module>
c.b = "b"
AttributeError: "Child" object has no attribute "b"
``````

To allow `__dict__` creation while subclassing slotted objects, just add `"__dict__"` to the `__slots__` (note that slots are ordered, and you shouldn't repeat slots that are already in parent classes):

``````class SlottedWithDict(Child):
    __slots__ = ("__dict__", "b")

swd = SlottedWithDict()
swd.a = "a"
swd.b = "b"
swd.c = "c"
``````

and

``````>>> swd.__dict__
{"c": "c"}
``````

Or you don't even need to declare `__slots__` in your subclass, and you will still use slots from the parents, but not restrict the creation of a `__dict__`:

``````class NoSlots(Child): pass
ns = NoSlots()
ns.a = "a"
ns.b = "b"
``````

And:

``````>>> ns.__dict__
{"b": "b"}
``````

However, `__slots__` may cause problems for multiple inheritance:

``````class BaseA(object):
    __slots__ = ("a",)

class BaseB(object):
    __slots__ = ("b",)
``````

Because creating a child class from parents with both non-empty slots fails:

``````>>> class Child(BaseA, BaseB): __slots__ = ()
Traceback (most recent call last):
File "<pyshell#68>", line 1, in <module>
class Child(BaseA, BaseB): __slots__ = ()
TypeError: Error when calling the metaclass bases
multiple bases have instance lay-out conflict
``````

If you run into this problem, you could just remove `__slots__` from the parents, or if you have control of the parents, give them empty slots, or refactor to abstractions:

``````from abc import ABC

class AbstractA(ABC):
    __slots__ = ()

class BaseA(AbstractA):
    __slots__ = ("a",)

class AbstractB(ABC):
    __slots__ = ()

class BaseB(AbstractB):
    __slots__ = ("b",)

class Child(AbstractA, AbstractB):
    __slots__ = ("a", "b")

c = Child() # no problem!
``````

### Add `"__dict__"` to `__slots__` to get dynamic assignment:

``````class Foo(object):
    __slots__ = "bar", "baz", "__dict__"
``````

and now:

``````>>> foo = Foo()
>>> foo.boink = "boink"
``````

So with `"__dict__"` in slots we lose some of the size benefits with the upside of having dynamic assignment and still having slots for the names we do expect.

When you inherit from an object that isn't slotted, you get the same sort of semantics when you use `__slots__` - names that are in `__slots__` point to slotted values, while any other values are put in the instance's `__dict__`.

Avoiding `__slots__` because you want to be able to add attributes on the fly is actually not a good reason - just add `"__dict__"` to your `__slots__` if this is required.

You can similarly add `__weakref__` to `__slots__` explicitly if you need that feature.
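For the `__weakref__` case, a small sketch (with made-up class names) shows the difference: a slotted class cannot be weakly referenced unless `"__weakref__"` is listed in its `__slots__`.

```python
import weakref

class NoWeakref:
    __slots__ = ("value",)

class WithWeakref:
    __slots__ = ("value", "__weakref__")

# Without a __weakref__ slot, creating a weak reference raises TypeError.
try:
    weakref.ref(NoWeakref())
    weakref_allowed = True
except TypeError:
    weakref_allowed = False

# With the slot declared, weak references work as usual.
target = WithWeakref()
ref = weakref.ref(target)

print(weakref_allowed)
print(ref() is target)
```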

### Set to empty tuple when subclassing a namedtuple:

`namedtuple` (from `collections`) makes immutable instances that are very lightweight (essentially, the size of tuples), but to keep those benefits when you subclass one, you need to declare empty `__slots__` yourself:

``````from collections import namedtuple
class MyNT(namedtuple("MyNT", "bar baz")):
    """MyNT is an immutable and lightweight object"""
    __slots__ = ()
``````

usage:

``````>>> nt = MyNT("bar", "baz")
>>> nt.bar
"bar"
>>> nt.baz
"baz"
``````

And trying to assign an unexpected attribute raises an `AttributeError` because we have prevented the creation of `__dict__`:

``````>>> nt.quux = "quux"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: "MyNT" object has no attribute "quux"
``````

You can allow `__dict__` creation by leaving off `__slots__ = ()`, but you can't use non-empty `__slots__` with subtypes of tuple.
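The tuple restriction is easy to check. In this sketch, `Point` and `Pair` are hypothetical classes used only to trigger and then avoid the error.

```python
# Non-empty __slots__ on a tuple subclass is rejected when the class is created.
try:
    class Point(tuple):
        __slots__ = ("label",)
except TypeError as exc:
    error = str(exc)
else:
    error = None

print(error)

# Empty __slots__ is fine and still prevents a per-instance __dict__.
class Pair(tuple):
    __slots__ = ()

pair = Pair((1, 2))
print(pair, hasattr(pair, "__dict__"))
```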

## Biggest Caveat: Multiple inheritance

Even when non-empty slots are the same for multiple parents, they cannot be used together:

``````class Foo(object):
    __slots__ = "foo", "bar"
class Bar(object):
    __slots__ = "foo", "bar" # alas, would work if empty, i.e. ()

>>> class Baz(Foo, Bar): pass
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
multiple bases have instance lay-out conflict
``````

Using an empty `__slots__` in the parent seems to provide the most flexibility, allowing the child to choose to prevent or allow (by adding `"__dict__"` to get dynamic assignment, see section above) the creation of a `__dict__`:

``````class Foo(object): __slots__ = ()
class Bar(object): __slots__ = ()
class Baz(Foo, Bar): __slots__ = ("foo", "bar")
b = Baz()
b.foo, b.bar = "foo", "bar"
``````

You don't have to have slots - so if you add them, and remove them later, it shouldn't cause any problems.

Going out on a limb here: If you're composing mixins or using abstract base classes, which aren't intended to be instantiated, an empty `__slots__` in those parents seems to be the best way to go in terms of flexibility for subclassers.

To demonstrate, first, let's create a class with code we'd like to use under multiple inheritance:

``````class AbstractBase:
    __slots__ = ()
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return f"{type(self).__name__}({repr(self.a)}, {repr(self.b)})"
``````

We could use the above directly by inheriting and declaring the expected slots:

``````class Foo(AbstractBase):
    __slots__ = "a", "b"
``````

But we don't care about that, that's trivial single inheritance, we need another class we might also inherit from, maybe with a noisy attribute:

``````class AbstractBaseC:
    __slots__ = ()
    @property
    def c(self):
        print("getting c!")
        return self._c
    @c.setter
    def c(self, arg):
        print("setting c!")
        self._c = arg
``````

Now if both bases had nonempty slots, we couldn't do the below. (In fact, if we wanted, we could have given `AbstractBase` nonempty slots a and b, and left them out of the below declaration - leaving them in would be wrong):

``````class Concretion(AbstractBase, AbstractBaseC):
    __slots__ = "a b _c".split()
``````

And now we have functionality from both via multiple inheritance, and can still deny `__dict__` and `__weakref__` instantiation:

``````>>> c = Concretion("a", "b")
>>> c.c = c
setting c!
>>> c.c
getting c!
Concretion("a", "b")
>>> c.d = "d"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: "Concretion" object has no attribute "d"
``````

## Other cases to avoid slots:

• Avoid them when you want to perform `__class__` assignment with another class that doesn't have them (and you can't add them) unless the slot layouts are identical. (I am very interested in learning who is doing this and why.)
• Avoid them if you want to subclass variable length builtins like long, tuple, or str, and you want to add attributes to them.
• Avoid them if you insist on providing default values via class attributes for instance variables.
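The last point can be seen directly: a name cannot be both a slot and a class attribute. The sketch below uses hypothetical `Config` classes; setting the default in `__init__` is one common workaround, not the only one.

```python
# A class-level default with the same name as a slot is rejected at class creation.
try:
    class Config:
        __slots__ = ("retries",)
        retries = 3
except ValueError as exc:
    conflict = str(exc)
else:
    conflict = None

print(conflict)

# One workaround: assign the default per instance instead.
class ConfigWithInit:
    __slots__ = ("retries",)

    def __init__(self, retries=3):
        self.retries = retries

print(ConfigWithInit().retries)
```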

You may be able to tease out further caveats from the rest of the `__slots__` documentation (the 3.7 dev docs are the most current), which I have made significant recent contributions to.

The current top answers cite outdated information and are quite hand-wavy and miss the mark in some important ways.

### Do not "only use `__slots__` when instantiating lots of objects"

I quote:

"You would want to use `__slots__` if you are going to instantiate a lot (hundreds, thousands) of objects of the same class."

Abstract Base Classes, for example, from the `collections` module, are not instantiated, yet `__slots__` are declared for them.

Why?

If a user wishes to deny `__dict__` or `__weakref__` creation, those things must not be available in the parent classes.

`__slots__` contributes to reusability when creating interfaces or mixins.

It is true that many Python users aren't writing for reusability, but when you are, having the option to deny unnecessary space usage is valuable.
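To illustrate why this matters for mixins, compare a mixin that omits `__slots__` with one that declares it empty. All class names here are invented for the example.

```python
class PlainMixin:
    # No __slots__ here, so every subclass instance gets a __dict__.
    def describe(self):
        return type(self).__name__

class SlottedMixin:
    __slots__ = ()  # empty slots leave the choice to the subclass

    def describe(self):
        return type(self).__name__

class UsesPlain(PlainMixin):
    __slots__ = ("x",)

class UsesSlotted(SlottedMixin):
    __slots__ = ("x",)

a = UsesPlain()
b = UsesSlotted()

# The subclass declared __slots__ in both cases, but only one avoided __dict__.
print(hasattr(a, "__dict__"))
print(hasattr(b, "__dict__"))
```

A user of `PlainMixin` has no way to opt out of the per-instance `__dict__`, which is the reusability cost described above.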

### `__slots__` doesn't break pickling

When pickling a slotted object, you may find it complains with a misleading `TypeError`:

``````>>> pickle.loads(pickle.dumps(f))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
``````

This is actually incorrect. This message comes from the oldest protocol, which is the default. You can select the latest protocol with the `-1` argument. In Python 2.7 this would be `2` (which was introduced in 2.3), and in 3.6 it is `4`.

``````>>> pickle.loads(pickle.dumps(f, -1))
<__main__.Foo object at 0x1129C770>
``````

in Python 2.7:

``````>>> pickle.loads(pickle.dumps(f, 2))
<__main__.Foo object at 0x1129C770>
``````

in Python 3.6

``````>>> pickle.loads(pickle.dumps(f, 4))
<__main__.Foo object at 0x1129C770>
``````

So I would keep this in mind, as it is a solved problem.

## Critique of the (until Oct 2, 2016) accepted answer

The first paragraph is half short explanation, half predictive. Here's the only part that actually answers the question:

> The proper use of `__slots__` is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation. This saves the overhead of one dict for every object that uses slots

The second half is wishful thinking, and off the mark:

> While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.

Python actually does something similar to this, only creating the `__dict__` when it is accessed, but creating lots of objects with no data is fairly ridiculous.

The second paragraph oversimplifies and misses actual reasons to avoid `__slots__`. The below is not a real reason to avoid slots (for actual reasons, see the rest of my answer above.):

> They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies.

It then goes on to discuss other ways of accomplishing that perverse goal with Python, not discussing anything to do with `__slots__`.

The third paragraph is more wishful thinking. Together it is mostly off-the-mark content that the answerer didn't even author and contributes to ammunition for critics of the site.

# Memory usage evidence

Create some normal objects and slotted objects:

``````>>> class Foo(object): pass
>>> class Bar(object): __slots__ = ()
``````

Instantiate a million of them:

``````>>> foos = [Foo() for f in xrange(1000000)]
>>> bars = [Bar() for b in xrange(1000000)]
``````

Inspect with `guppy.hpy().heap()`:

``````>>> guppy.hpy().heap()
Partition of a set of 2028259 objects. Total size = 99763360 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0 1000000  49 64000000  64  64000000  64 __main__.Foo
1     169   0 16281480  16  80281480  80 list
2 1000000  49 16000000  16  96281480  97 __main__.Bar
3   12284   1   987472   1  97268952  97 str
...
``````

Access the regular objects and their `__dict__` and inspect again:

``````>>> for f in foos:
...     f.__dict__
>>> guppy.hpy().heap()
Partition of a set of 3028258 objects. Total size = 379763480 bytes.
Index  Count   %      Size    % Cumulative  % Kind (class / dict of class)
0 1000000  33 280000000  74 280000000  74 dict of __main__.Foo
1 1000000  33  64000000  17 344000000  91 __main__.Foo
2     169   0  16281480   4 360281480  95 list
3 1000000  33  16000000   4 376281480  99 __main__.Bar
4   12284   0    987472   0 377268952  99 str
...
``````

This is consistent with the history of Python, from Unifying types and classes in Python 2.2:

If you subclass a built-in type, extra space is automatically added to the instances to accomodate `__dict__` and `__weakrefs__`. (The `__dict__` is not initialized until you use it though, so you shouldn't worry about the space occupied by an empty dictionary for each instance you create.) If you don't need this extra space, you can add the phrase "`__slots__ = []`" to your class.
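The guppy session above is Python 2. As a rough cross-check under Python 3 (an assumption, not part of the original measurements), the sketch below compares a plain and a slotted instance. `Plain` and `Slotted` are illustrative names, and the byte counts depend on the CPython version and platform, so only the relationship between them is meaningful.

```python
import sys

class Plain:
    """Ordinary class: each instance can grow a per-instance __dict__."""

class Slotted:
    """Empty __slots__: instances get no __dict__ and no __weakref__."""
    __slots__ = ()

plain, slotted = Plain(), Slotted()

has_dict_plain = hasattr(plain, "__dict__")
has_dict_slotted = hasattr(slotted, "__dict__")

# sys.getsizeof reports the instance itself, not any separate dict object,
# and the exact numbers differ between CPython versions.
size_plain = sys.getsizeof(plain)
size_slotted = sys.getsizeof(slotted)

print(has_dict_plain, has_dict_slotted)
print(size_plain, size_slotted)
```

The slotted instance has no `__dict__` at all, which is where the saving in the heap dumps comes from.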

# `os.listdir()` - list in the current directory

With `listdir` in the `os` module you get the files and the folders in the current directory:

``````import os

arr = os.listdir()
print(arr)

>>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

## Looking in a directory

``````arr = os.listdir(r"c:\files")  # raw string, otherwise "\f" is read as a form feed
``````

# `glob` from glob

With `glob` you can restrict the listing to one type of file, like this:

``````import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)
``````

## `glob` in a list comprehension

``````mylist = [f for f in glob.glob("*.txt")]
``````

## Get the full path of only files in the current directory

``````import os

cwd = os.getcwd()
onlyfiles = [os.path.join(cwd, f) for f in os.listdir(cwd)
             if os.path.isfile(os.path.join(cwd, f))]
print(onlyfiles)

>>> ["G:\getfilesname\getfilesname.py", "G:\getfilesname\example.txt"]
``````

## Getting the full path name with `os.path.abspath`

You get the full path in return:

``````import os

files_path = [os.path.abspath(x) for x in os.listdir()]
print(files_path)

>>> ["F:\documenti\applications.txt", "F:\documenti\collections.txt"]
``````

## Walk: going through sub directories

`os.walk` yields the root, the list of directories and the list of files, which is why they are unpacked into `r, d, f` in the for loop. It then looks for files and directories in the subfolders of the root, and so on until there are no subfolders left.

``````import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            print(os.path.join(r, file))
``````

### `os.listdir()`: get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to give the argument as "." or os.getcwd() in the os.listdir method.

``````import os

arr = os.listdir(".")
print(arr)

>>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

### To go up in the directory tree

``````# Method 1: one level up, the parent of the current directory
x = os.listdir("..")

# Method 2: all the way up, the root directory
x = os.listdir("/")
``````

### Get files: `os.listdir()` in a particular directory (Python 2 and 3)

``````import os

arr = os.listdir(r"F:\python")  # raw string keeps the backslash literal
print(arr)

>>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

### Get files of a particular subdirectory with `os.listdir()`

``````import os

x = os.listdir("./content")
``````

### `os.walk(".")` - current directory

``````import os

arr = next(os.walk("."))[2]
print(arr)

>>> ["5bs_Turismo1.pdf", "5bs_Turismo1.pptx", "esperienza.txt"]
``````

### `next(os.walk("."))` and `os.path.join("dir", "file")`

``````import os

arr = []
# next() returns one (root, dirs, files) tuple for the top directory only
r, d, f = next(os.walk(r"F:\_python"))
for file in f:
    arr.append(os.path.join(r, file))

for f in arr:
    print(f)

>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt
``````

### `next(os.walk("F:\\"))` - get the full path - list comprehension

``````# wrap the single tuple in a list so it can be unpacked as r, d, f
[os.path.join(r, file) for r, d, f in [next(os.walk(r"F:\_python"))] for file in f]

>>> ["F:\_python\dict_class.py", "F:\_python\programmi.txt"]
``````

### `os.walk` - get full path - all files in sub dirs

``````x = [os.path.join(r, file) for r, d, f in os.walk(r"F:\_python") for file in f]
print(x)
``````

### `os.listdir()` - get only txt files

``````arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
print(arr_txt)

>>> ["work.txt", "3ebooks.txt"]
``````

## Using `glob` to get the full path of the files

If you need the absolute path of the files:

``````import os
from glob import glob

# os.path.abspath is used here so the snippet needs only the standard library
x = [os.path.abspath(f) for f in glob(r"F:\*.txt")]
for f in x:
    print(f)

>>> F:\acquistionline.txt
>>> F:\acquisti_2018.txt
>>> F:\bootstrap_jquery_ecc.txt
``````

## Using `os.path.isfile` to avoid directories in the list

``````import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ["a simple game.py", "data.txt", "decorator.py"]
``````

## Using `pathlib` from Python 3.4

``````import pathlib

flist = []
for p in pathlib.Path(".").iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speak_gui2.py
>>> thumb.PNG
``````

With `list comprehension`:

``````flist = [p for p in pathlib.Path(".").iterdir() if p.is_file()]
``````

Alternatively, use `pathlib.Path()` instead of `pathlib.Path(".")`

## Use glob method in pathlib.Path()

``````import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py
``````
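`Path.glob("*.py")` only looks at one directory. If subdirectories matter too, `Path.rglob` searches recursively. The sketch below is one possible wrapper; the name `find_files` and the throwaway directory layout are made up for the demonstration.

```python
import pathlib
import tempfile

def find_files(folder, pattern="*.py"):
    """Return sorted POSIX-style paths, relative to folder, of matching files at any depth."""
    root = pathlib.Path(folder)
    return sorted(p.relative_to(root).as_posix()
                  for p in root.rglob(pattern) if p.is_file())

# Demo on a temporary directory tree
with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    (base / "sub").mkdir()
    (base / "a.py").write_text("")
    (base / "sub" / "b.py").write_text("")
    (base / "notes.txt").write_text("")
    found = find_files(base)

print(found)  # ['a.py', 'sub/b.py']
```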

## Get all and only files with os.walk

``````import os

x = [i[2] for i in os.walk(".")]
y = []
for t in x:
    for f in t:
        y.append(f)
print(y)

>>> ["append_to_list.py", "data.txt", "data1.txt", "data2.txt", "data_180617", "os_walk.py", "READ2.py", "read_data.py", "somma_defaltdic.py", "substitute_words.py", "sum_data.py", "data.txt", "data1.txt", "data_180617"]
``````

## Get only files with next and walk in a directory

``````import os

x = next(os.walk("F://python"))[2]
print(x)

>>> ["calculator.bat","calculator.py"]
``````

## Get only directories with next and walk in a directory

``````import os

next(os.walk("F://python"))[1] # for the current dir use (".")

>>> ["python3","others"]
``````

## Get all the subdir names with `walk`

``````for r, d, f in os.walk(r"F:\_python"):
    for dirs in d:
        print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints
``````

## `os.scandir()` from Python 3.5 and greater

``````import os

x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ["calculator.bat","calculator.py"]

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os

with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG
``````

# Examples:

## Ex. 1: How many files are there in the subdirectories?

In this example, we count the files contained in a directory and all of its subdirectories.

``````import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + " files"

print(count(r"F:\python"))

>>> F:\python : 12057 files
``````
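The same count can be written more compactly by summing the length of each `files` list that `os.walk` yields. This is only a sketch: `count_files` is an illustrative name, and the temporary directory stands in for a real folder.

```python
import os
import tempfile

def count_files(folder):
    """Count files under folder, including all subdirectories."""
    return sum(len(files) for _root, _dirs, files in os.walk(folder))

# Demo on a temporary tree holding two files
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for name in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("")
    total = count_files(tmp)

print(total)  # 2
```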

## Ex.2: How to copy all files from a directory to another?

A script to tidy up your computer: it finds all files of a given type (default: pptx) and copies them into a new folder.

``````import os
import shutil

destination = r"F:\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype="pptx", counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = os.path.join(pack[0], f)
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print("-" * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")

for dir in os.listdir():
    "searches for folders that starts with `_`"
    if dir[0] == "_":
        # copyfile(dir, filetype="pdf")
        copyfile(dir, filetype="txt")

>>> _compiti18\Compito Contabilità 1\conti.txt
>>> _compiti18\Compito Contabilità 1\modula4.txt
>>> _compiti18\Compito Contabilità 1\moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files
``````

## Ex. 3: How to get all the files in a txt file

In case you want to create a txt file with all the file names:

``````import os

mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)
``````

## Example: txt with all the files of a hard drive

``````"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding="utf-8") as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(os.path.join(root, file))
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")

# On Windows, this opens each file with its associated program
os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")
``````

## All the files of C: in one text file

This is a shorter version of the previous code. Change the starting folder if you need to search from another location. On my computer this code generates a text file of about 50 MB with a little under 500,000 lines, each holding a file name with its complete path.

``````import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            # os.path.join inserts the separator that plain r + file would omit
            filewrite.write(os.path.join(r, file) + "\n")
``````

## How to write a file with all paths in a folder of a type

With this function you can create a txt file, named after the type of file you are looking for (e.g. pngfile.txt), that lists the full path of every file of that type. It can be useful sometimes, I think.

``````import os

def searchfiles(extension=".ttf", folder="H:\\"):
    "Create a txt file with all the file of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(os.path.join(r, file) + "\n")

# looking for png files in the hard disk H:
searchfiles(".png", "H:\\")

>>> H:4bs_18Dolphins5.png
>>> H:4bs_18Dolphins6.png
>>> H:4bs_18Dolphins7.png
>>> H:5_18marketing htmlassetsimageslogo2.png
>>> H:7z001.png
>>> H:7z002.png
``````

## (New) Find all files and open them with tkinter GUI

In 2019 I wanted to add a little app that searches for all files in a directory and lets you open them by double-clicking the file name in the list.

``````import tkinter as tk
import os

def searchfiles(extension=".txt", folder="H:\\"):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, os.path.join(r, file))

def open_file():
    # os.startfile is available on Windows only
    os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda: searchfiles(".png", "H:\\"))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()
``````

### Is there any reason for a class declaration to inherit from `object`?

In Python 3, apart from compatibility between Python 2 and 3, no reason. In Python 2, many reasons.

### Python 2.x story:

In Python 2.x (from 2.2 onwards) there are two styles of classes, depending on the presence or absence of `object` as a base class:

1. "classic" style classes: they don't have `object` as a base class:

``````>>> class ClassicSpam:      # no base class
...     pass
>>> ClassicSpam.__bases__
()
``````
2. "new" style classes: they have, directly or indirectly (e.g inherit from a built-in type), `object` as a base class:

``````>>> class NewSpam(object):           # directly inherit from object
...    pass
>>> NewSpam.__bases__
(<type 'object'>,)
>>> class IntSpam(int):              # indirectly inherit from object...
...    pass
>>> IntSpam.__bases__
(<type 'int'>,)
>>> IntSpam.__bases__[0].__bases__   # ... because int inherits from object
(<type 'object'>,)
``````

Without a doubt, when writing a class you'll always want to go for new-style classes. The perks of doing so are numerous, to list some of them:

If you don't inherit from `object`, forget these. A more exhaustive description of the previous bullet points along with other perks of "new" style classes can be found here.

One of the downsides of new-style classes is that the class itself is more memory demanding. Unless you're creating many class objects, though, I doubt this would be an issue and it's a negative sinking in a sea of positives.

### Python 3.x story:

In Python 3, things are simplified. Only new-style classes exist (referred to plainly as classes) so, the only difference in adding `object` is requiring you to type in 8 more characters. This:

``````class ClassicSpam:
    pass
``````

is completely equivalent (apart from their name :-) to this:

``````class NewSpam(object):
    pass
``````

and to this:

``````class Spam():
    pass
``````

All have `object` in their `__bases__`.

``````>>> [object in cls.__bases__ for cls in {Spam, NewSpam, ClassicSpam}]
[True, True, True]
``````

## So, what should you do?

In Python 2: always inherit from `object` explicitly. Get the perks.

In Python 3: inherit from `object` if you are writing code that tries to be Python agnostic, that is, it needs to work both in Python 2 and in Python 3. Otherwise don't, it really makes no difference since Python inserts it for you behind the scenes.

The simplest way to log to stdout:

``````import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
``````

Comparing strings in a case insensitive way seems trivial, but it's not. I will be using Python 3, since Python 2 is underdeveloped here.

The first thing to note is that case-removing conversions in Unicode aren't trivial. There is text for which `text.lower() != text.upper().lower()`, such as `"ß"`:

``````"ß".lower()
#>>> "ß"

"ß".upper().lower()
#>>> "ss"
``````

But let's say you wanted to caselessly compare `"BUSSE"` and `"Buße"`. Heck, you probably also want to compare `"BUSSE"` and `"BUẞE"` equal - that's the newer capital form. The recommended way is to use `casefold`:

str.casefold()

Return a casefolded copy of the string. Casefolded strings may be used for caseless matching.

Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. [...]

Do not just use `lower`. If `casefold` is not available, doing `.upper().lower()` helps (but only somewhat).

Then you should consider accents. If your font renderer is good, you probably think `"ê" == "ê"` - but it doesn't:

``````"ê" == "ê"
#>>> False
``````

This is because the accent on the latter is a combining character.

``````import unicodedata

[unicodedata.name(char) for char in "ê"]
#>>> ["LATIN SMALL LETTER E WITH CIRCUMFLEX"]

[unicodedata.name(char) for char in "ê"]
#>>> ["LATIN SMALL LETTER E", "COMBINING CIRCUMFLEX ACCENT"]
``````

The simplest way to deal with this is `unicodedata.normalize`. You probably want to use NFKD normalization, but feel free to check the documentation. Then one does

``````unicodedata.normalize("NFKD", "ê") == unicodedata.normalize("NFKD", "ê")
#>>> True
``````

To finish up, here is all of this expressed as functions:

``````import unicodedata

def normalize_caseless(text):
    return unicodedata.normalize("NFKD", text.casefold())

def caseless_equal(left, right):
    return normalize_caseless(left) == normalize_caseless(right)
``````
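As a quick usage check, the two functions can be run against the examples discussed above. The strings are written with `\u` escapes (my choice, so the exact code points are unambiguous even if the page is re-encoded), and the variable names are illustrative.

```python
import unicodedata

def normalize_caseless(text):
    return unicodedata.normalize("NFKD", text.casefold())

def caseless_equal(left, right):
    return normalize_caseless(left) == normalize_caseless(right)

# \u00df = sharp s, \u1e9e = capital sharp s,
# \u00ea = precomposed e-circumflex, e\u0302 = e + combining circumflex
sharp_s = caseless_equal("BUSSE", "Bu\u00dfe")
capital_sharp_s = caseless_equal("BUSSE", "BU\u1e9eE")
accents = caseless_equal("\u00ea", "e\u0302")

# For contrast: plain lower() leaves the sharp s alone, so the comparison fails
plain_lower = "BUSSE".lower() == "Bu\u00dfe".lower()

print(sharp_s, capital_sharp_s, accents, plain_lower)  # True True True False
```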

# tl;dr / quick fix

• Don't decode/encode willy nilly
• Don't assume your strings are UTF-8 encoded
• Try to convert strings to Unicode strings as soon as possible in your code
• Fix your locale: How to solve UnicodeDecodeError in Python 3.6?
• Don't be tempted to use quick `reload` hacks

# Unicode Zen in Python 2.x - The Long Version

Without seeing the source it's difficult to know the root cause, so I'll have to speak generally.

`UnicodeDecodeError: 'ascii' codec can't decode byte` generally happens when you try to convert a Python 2.x `str` that contains non-ASCII to a Unicode string without specifying the encoding of the original string.

In brief, Unicode strings are an entirely separate type of Python string that does not contain any encoding. They only hold Unicode code points and therefore can hold any code point from across the entire spectrum. Strings contain encoded text, be it UTF-8, UTF-16, ISO-8859-1, GBK, Big5 etc. Strings are decoded to Unicode and Unicodes are encoded to strings. Files and text data are always transferred in encoded strings.

The Markdown module authors probably use `unicode()` (where the exception is thrown) as a quality gate to the rest of the code - it will convert ASCII or re-wrap existing Unicode strings to a new Unicode string. The Markdown authors can't know the encoding of the incoming string so will rely on you to decode strings to Unicode strings before passing to Markdown.

Unicode strings can be declared in your code using the `u` prefix to strings. E.g.

``````>>> my_u = u"my ünicôdé strįng"
>>> type(my_u)
<type "unicode">
``````

Unicode strings may also come from files, databases and network modules. When this happens, you don't need to worry about the encoding.

# Gotchas

Conversion from `str` to Unicode can happen even when you don't explicitly call `unicode()`.

The following scenarios cause `UnicodeDecodeError` exceptions:

``````# Explicit conversion without encoding
unicode("€")

# New style format string into Unicode string
# Python will try to convert value string to Unicode first
u"The currency is: {}".format("€")

# Old style format string into Unicode string
# Python will try to convert value string to Unicode first
u"The currency is: %s" % "€"

# Append string to Unicode
# Python will try to convert string to Unicode first
u"The currency is: " + "€"
``````

## Examples

The word `café` is encoded differently in "UTF-8" and "Cp1252", depending on the terminal type. In both cases, `caf` is just regular ASCII. In UTF-8, `é` is encoded using two bytes. In "Cp1252", `é` is 0xE9 (which also happens to be the Unicode code point value; it's no coincidence). When the correct `decode()` is invoked, conversion to a Python Unicode string is successful.

If `decode()` is instead called with `ascii` (which is the same as calling `unicode()` without an encoding given), it fails: as ASCII can't contain bytes greater than `0x7F`, this will throw a `UnicodeDecodeError` exception.
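The byte-level picture described above can be reproduced in a few lines. This sketch assumes Python 3, where `str` is the Unicode type and `bytes` plays the role of the Python 2 `str`; the variable names are illustrative.

```python
word = "caf\u00e9"  # "café"; the escape keeps the code point unambiguous

utf8_bytes = word.encode("utf-8")      # é becomes the two bytes C3 A9
cp1252_bytes = word.encode("cp1252")   # é becomes the single byte E9

# Decoding with the matching codec round-trips cleanly
roundtrip_ok = (utf8_bytes.decode("utf-8") == word
                and cp1252_bytes.decode("cp1252") == word)

# Decoding as ASCII fails because of the bytes above 0x7F
try:
    utf8_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(utf8_bytes, cp1252_bytes, roundtrip_ok, ascii_failed)
```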

# The Unicode Sandwich

It's good practice to form a Unicode sandwich in your code, where you decode all incoming data to Unicode strings, work with Unicodes, then encode to `str`s on the way out. This saves you from worrying about the encoding of strings in the middle of your code.
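A minimal sketch of the sandwich, written for Python 3 (so "Unicode string" is `str` and the encoded form is `bytes`). The function name `shout` and the upper-casing step are placeholders for whatever text processing your code really does.

```python
def shout(raw, encoding="utf-8"):
    """Decode on the way in, work on text, encode on the way out."""
    text = raw.decode(encoding)      # input edge: bytes -> str
    text = text.upper()              # the middle: pure text processing
    return text.encode(encoding)     # output edge: str -> bytes

result = shout(b"z\xc3\xbcrich")     # UTF-8 bytes for "zürich"
print(result)
```

Only the two edges need to know the encoding; the middle never touches bytes.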

## Input / Decode

### Source code

If you need to bake non-ASCII into your source code, just create Unicode strings by prefixing the string with a `u`. E.g.

``````u"Zürich"
``````

To allow Python to decode your source code, you will need to add an encoding header to match the actual encoding of your file. For example, if your file was encoded as "UTF-8", you would use:

``````# encoding: utf-8
``````

This is only necessary when you have non-ASCII in your source code.

### Files

Usually non-ASCII data is received from a file. The `io` module provides a `TextIOWrapper` that decodes your file on the fly, using a given `encoding`. You must use the correct encoding for the file - it can't be easily guessed. For example, for a UTF-8 file:

``````import io
with io.open("my_utf8_file.txt", "r", encoding="utf-8") as my_file:
    my_unicode_string = my_file.read()
``````

`my_unicode_string` would then be suitable for passing to Markdown. If you get a `UnicodeDecodeError` from the `read()` line, then you've probably used the wrong encoding value.

### CSV Files

The Python 2.7 CSV module does not support non-ASCII characters 😩. Help is at hand, however, with https://pypi.python.org/pypi/backports.csv.

Use it like above but pass the opened file to it:

``````from backports import csv
import io

def read_rows():  # hypothetical wrapper, so that `yield` is inside a function
    with io.open("my_utf8_file.txt", "r", encoding="utf-8") as my_file:
        for row in csv.reader(my_file):
            yield row
``````

### Databases

Most Python database drivers can return data in Unicode, but usually require a little configuration. Always use Unicode strings for SQL queries.

MySQL

``````charset="utf8",
use_unicode=True
``````

E.g.

``````>>> db = MySQLdb.connect(host="localhost", user="root", passwd="passwd", db="sandbox", use_unicode=True, charset="utf8")
``````
PostgreSQL

``````psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
``````

### HTTP

Web pages can be encoded in just about any encoding. The `Content-type` header should contain a `charset` field to hint at the encoding. The content can then be decoded manually against this value. Alternatively, Python-Requests returns Unicodes in `response.text`.

### Manually

If you must decode strings manually, you can simply do `my_string.decode(encoding)`, where `encoding` is the appropriate encoding. Python 2.x supported codecs are given here: Standard Encodings. Again, if you get `UnicodeDecodeError` then you've probably got the wrong encoding.

## The meat of the sandwich

Work with Unicodes as you would normal strs.

## Output

### stdout / printing

`print` writes through the stdout stream. Python tries to configure an encoder on stdout so that Unicodes are encoded to the console's encoding. For example, if a Linux shell's `locale` is `en_GB.UTF-8`, the output will be encoded to `UTF-8`. On Windows, you will be limited to an 8-bit code page.

An incorrectly configured console, such as a corrupt locale, can lead to unexpected print errors. The `PYTHONIOENCODING` environment variable can force the encoding for stdout.

### Files

Just like input, `io.open` can be used to transparently convert Unicodes to encoded byte strings.

### Database

The same configuration for reading will allow Unicodes to be written directly.

# Python 3

Python 3 is no more Unicode capable than Python 2.x is; however, it is slightly less confused on the topic. E.g. the regular `str` is now a Unicode string and the old `str` is now `bytes`.

The default encoding is UTF-8, so if you `.decode()` a byte string without giving an encoding, Python 3 uses UTF-8 encoding. This probably fixes 50% of people's Unicode problems.

Further, `open()` operates in text mode by default, so returns decoded `str` (Unicode ones). The encoding is derived from your locale, which tends to be UTF-8 on Un*x systems or an 8-bit code page, such as windows-1251, on Windows boxes.

# Why you shouldn't use `sys.setdefaultencoding("utf8")`

It's a nasty hack (there's a reason you have to use `reload`) that will only mask problems and hinder your migration to Python 3.x. Understand the problem, fix the root cause and enjoy Unicode zen. See Why should we NOT use sys.setdefaultencoding("utf-8") in a py script? for further details.

# The short answer, or TL;DR

Basically, `eval` is used to evaluate a single dynamically generated Python expression, and `exec` is used to execute dynamically generated Python code only for its side effects.

`eval` and `exec` have these two differences:

1. `eval` accepts only a single expression, `exec` can take a code block that has Python statements: loops, `try: except:`, `class` and function/method `def`initions and so on.

An expression in Python is whatever you can have as the value in a variable assignment:

``````a_variable = (anything you can put within these parentheses is an expression)
``````
2. `eval` returns the value of the given expression, whereas `exec` ignores the return value from its code, and always returns `None` (in Python 2 it is a statement and cannot be used as an expression, so it really does not return anything).

In versions 1.0 - 2.7, `exec` was a statement, because CPython needed to produce a different kind of code object for functions that used `exec` for its side effects inside the function.

In Python 3, `exec` is a function; its use has no effect on the compiled bytecode of the function where it is used.

Thus basically:

``````>>> a = 5
>>> eval("37 + a")   # it is an expression
42
>>> exec("37 + a")   # it is an expression statement; value is ignored (None is returned)
>>> exec("a = 47")   # modify a global variable as a side effect
>>> a
47
>>> eval("a = 47")  # you cannot evaluate a statement
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    a = 47
      ^
SyntaxError: invalid syntax
``````

The `compile` in `"exec"` mode compiles any number of statements into a bytecode that implicitly always returns `None`, whereas in `"eval"` mode it compiles a single expression into bytecode that returns the value of that expression.

``````>>> eval(compile("42", "<string>", "exec"))  # code returns None
>>> eval(compile("42", "<string>", "eval"))  # code returns 42
42
>>> exec(compile("42", "<string>", "eval"))  # code returns 42,
>>>                                          # but ignored by exec
``````

In the `"eval"` mode (and thus with the `eval` function if a string is passed in), the `compile` raises an exception if the source code contains statements or anything else beyond a single expression:

``````>>> compile("for i in range(3): print(i)", "<string>", "eval")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    for i in range(3): print(i)
      ^
SyntaxError: invalid syntax
``````

Actually the statement "eval accepts only a single expression" applies only when a string (which contains Python source code) is passed to `eval`. Then it is internally compiled to bytecode using `compile(source, "<string>", "eval")`. This is where the difference really comes from.

If a `code` object (which contains Python bytecode) is passed to `exec` or `eval`, they behave identically, except for the fact that `exec` ignores the return value, still returning `None` always. So it is possible to use `eval` to execute something that has statements, if you just `compile`d it into bytecode before instead of passing it as a string:

``````>>> eval(compile('if 1: print("Hello")', "<string>", "exec"))
Hello
>>>
``````

works without problems, even though the compiled code contains statements. It still returns `None`, because that is the return value of the code object returned from `compile`.

# The longer answer, a.k.a the gory details

## `exec` and `eval`

The `exec` function (which was a statement in Python 2) is used for executing a dynamically created statement or program:

``````>>> program = """
... for i in range(3):
...     print("Python is cool")
... """
>>> exec(program)
Python is cool
Python is cool
Python is cool
>>>
``````

The `eval` function does the same for a single expression, and returns the value of the expression:

``````>>> a = 2
>>> my_calculation = "42 * a"
>>> result = eval(my_calculation)
>>> result
84
``````

`exec` and `eval` both accept the program/expression to be run either as a `str`, `unicode` or `bytes` object containing source code, or as a `code` object which contains Python bytecode.

If a `str`/`unicode`/`bytes` containing source code was passed to `exec`, it behaves equivalently to:

``````exec(compile(source, "<string>", "exec"))
``````

and `eval` similarly behaves equivalent to:

``````eval(compile(source, "<string>", "eval"))
``````

Since all expressions can be used as statements in Python (these are called the `Expr` nodes in the Python abstract grammar; the opposite is not true), you can always use `exec` if you do not need the return value. That is to say, you can use either `eval("my_func(42)")` or `exec("my_func(42)")`, the difference being that `eval` returns the value returned by `my_func`, and `exec` discards it:

``````>>> def my_func(arg):
...     print("Called with %d" % arg)
...     return arg * 2
...
>>> exec("my_func(42)")
Called with 42
>>> eval("my_func(42)")
Called with 42
84
>>>
``````

Of the two, only `exec` accepts source code that contains statements, like `def`, `for`, `while`, `import`, or `class`, the assignment statement (a.k.a. `a = 42`), or entire programs:

``````>>> exec("for i in range(3): print(i)")
0
1
2
>>> eval("for i in range(3): print(i)")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    for i in range(3): print(i)
      ^
SyntaxError: invalid syntax
``````

Both `exec` and `eval` accept 2 additional positional arguments - `globals` and `locals` - which are the global and local variable scopes that the code sees. These default to the `globals()` and `locals()` within the scope that called `exec` or `eval`, but any dictionary can be used for `globals` and any `mapping` for `locals` (including `dict` of course). These can be used not only to restrict/modify the variables that the code sees, but are often also used for capturing the variables that the `exec`uted code creates:

``````>>> g = dict()
>>> l = dict()
>>> exec("global a; a, b = 123, 42", g, l)
>>> g["a"]
123
>>> l
{"b": 42}
``````

(If you display the value of the entire `g`, it would be much longer, because `exec` and `eval` add the built-ins module as `__builtins__` to the globals automatically if it is missing).
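A small Python 3 sketch of capturing what executed code creates, using a single dictionary as the namespace. The names `namespace`, `double` and `x` are illustrative.

```python
namespace = {}
exec("x = 6 * 7\ndef double(n):\n    return n * 2", namespace)

# Everything the code defined is now available in the dictionary
result = namespace["double"](namespace["x"])

# exec added the built-ins automatically because the key was missing
builtins_added = "__builtins__" in namespace
user_names = sorted(k for k in namespace if k != "__builtins__")

print(result, builtins_added, user_names)  # 84 True ['double', 'x']
```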

In Python 2, the official syntax for the `exec` statement is actually `exec code in globals, locals`, as in

``````>>> exec "global a; a, b = 123, 42" in g, l
``````

However the alternate syntax `exec(code, globals, locals)` has always been accepted too (see below).

## `compile`

The `compile(source, filename, mode, flags=0, dont_inherit=False, optimize=-1)` built-in can be used to speed up repeated invocations of the same code with `exec` or `eval` by compiling the source into a `code` object beforehand. The `mode` parameter controls the kind of code fragment the `compile` function accepts and the kind of bytecode it produces. The choices are `"eval"`, `"exec"` and `"single"`:

• `"eval"` mode expects a single expression, and will produce bytecode that when run will return the value of that expression:

``````>>> dis.dis(compile("a + b", "<string>", "eval"))
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 BINARY_ADD
              7 RETURN_VALUE
``````
• `"exec"` accepts any kind of Python construct, from single expressions to whole modules of code, and executes them as if they were module top-level statements. The code object returns `None`:

``````>>> dis.dis(compile("a + b", "<string>", "exec"))
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 BINARY_ADD
              7 POP_TOP                             <- discard result
              8 LOAD_CONST               0 (None)   <- load None on stack
             11 RETURN_VALUE                        <- return top of stack
``````
• `"single"` is a limited form of `"exec"` which accepts source code containing a single statement (or multiple statements separated by `;`). If the last statement is an expression statement, the resulting bytecode also prints the `repr` of the value of that expression to the standard output(!).

An `if`-`elif`-`else` chain, a loop with `else`, and `try` with its `except`, `else` and `finally` blocks is considered a single statement.

A source fragment containing two top-level statements is an error in `"single"` mode, except that in Python 2 there is a bug that sometimes allows multiple top-level statements in the code; only the first is compiled and the rest are ignored:

In Python 2.7.8:

``````>>> exec(compile("a = 5\na = 6", "<string>", "single"))
>>> a
5
``````

And in Python 3.4.2:

``````>>> exec(compile("a = 5\na = 6", "<string>", "single"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    a = 5
        ^
SyntaxError: multiple statements found while compiling a single statement
``````

The `"single"` mode is very useful for making interactive Python shells. However, the value of the expression is not returned, even if you `eval` the resulting code.

Thus the greatest distinction between `exec` and `eval` actually comes from the `compile` function and its modes.
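The three modes described above can be compared side by side. This is a minimal Python 3 sketch; the variable names (`source`, `namespace`, `buffer` and so on) are only for illustration, and it assumes the default `sys.displayhook` is in place.

```python
import contextlib
import io

source = "a + b"
namespace = {"a": 1, "b": 2}

# "eval" mode: running the code object returns the value of the expression.
eval_result = eval(compile(source, "<string>", "eval"), namespace)

# "exec" mode: the expression is evaluated, its value is discarded,
# and the code object returns None.
exec_result = eval(compile(source, "<string>", "exec"), namespace)

# "single" mode: the repr of the value is printed to standard output,
# but the code object still returns None.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    single_result = eval(compile(source, "<string>", "single"), namespace)
printed = buffer.getvalue()

print(eval_result)    # 3
print(exec_result)    # None
print(single_result)  # None
print(repr(printed))  # '3\n'
```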

In addition to compiling source code to bytecode, `compile` supports compiling abstract syntax trees (parse trees of Python code) into `code` objects, and source code into abstract syntax trees (`ast.parse` is written in Python and just calls `compile(source, filename, mode, PyCF_ONLY_AST)`). These are used, for example, for modifying source code on the fly and for dynamic code creation, as it is often easier to handle the code as a tree of nodes instead of lines of text in complex cases.
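As a rough illustration of the tree-based workflow, the sketch below parses a statement, rewrites the tree, and compiles the modified tree with `compile`. It assumes Python 3.8 or later; the `AddToMult` class and the `"<demo>"` filename are made up for this example.

```python
import ast

# Parse source text into an abstract syntax tree.
tree = ast.parse("result = 2 + 3", filename="<demo>", mode="exec")


class AddToMult(ast.NodeTransformer):
    """Hypothetical transformer: turn every `+` into `*`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested operations first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


new_tree = ast.fix_missing_locations(AddToMult().visit(tree))

# compile() accepts the tree directly and produces a code object.
code = compile(new_tree, "<demo>", "exec")
namespace = {}
exec(code, namespace)
print(namespace["result"])  # 6, because 2 + 3 was rewritten to 2 * 3
```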

While `eval` only allows you to evaluate a string that contains a single expression, you can `eval` a whole statement, or even a whole module, that has been `compile`d into bytecode. That is, with Python 2, `print` is a statement, and cannot be `eval`led directly:

``````>>> eval('for i in range(3): print("Python is cool")')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
for i in range(3): print("Python is cool")
^
SyntaxError: invalid syntax
``````

`compile` it with `"exec"` mode into a `code` object and you can `eval` it; the `eval` function will return `None`.

``````>>> code = compile('for i in range(3): print("Python is cool")',
"foo.py", "exec")
>>> eval(code)
Python is cool
Python is cool
Python is cool
``````

If one looks into the `eval` and `exec` source code in CPython 3, this is very evident; they both call `PyEval_EvalCode` with the same arguments, the only difference being that `exec` explicitly returns `None`.
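The earlier point about speeding up repeated invocations can be sketched as follows: the source is compiled once and the resulting `code` object is evaluated many times. The expression and the names `expression` and `values` are only illustrative, and no timing claim is made here.

```python
# Compile the expression once, outside the loop.
expression = compile("x * x + 1", "<expr>", "eval")

# Evaluate the same code object with a different namespace each time.
values = [eval(expression, {"x": x}) for x in range(5)]
print(values)  # [1, 2, 5, 10, 17]
```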

## Syntax differences of `exec` between Python 2 and Python 3

One of the major differences in Python 2 is that `exec` is a statement and `eval` is a built-in function (both are built-in functions in Python 3). It is a well-known fact that the official syntax of `exec` in Python 2 is `exec code [in globals[, locals]]`.

Contrary to what the majority of the Python 2-to-3 porting guides seem to suggest, the `exec` statement in CPython 2 can also be used with syntax that looks exactly like the `exec` function invocation in Python 3. The reason is that Python 0.9.9 had the `exec(code, globals, locals)` built-in function! That built-in function was replaced with the `exec` statement somewhere before the Python 1.0 release.

Since it was desirable to not break backwards compatibility with Python 0.9.9, Guido van Rossum added a compatibility hack in 1993: if the `code` was a tuple of length 2 or 3, and `globals` and `locals` were not passed into the `exec` statement otherwise, the `code` would be interpreted as if the 2nd and 3rd element of the tuple were the `globals` and `locals` respectively. The compatibility hack was not mentioned even in Python 1.4 documentation (the earliest available version online); and thus was not known to many writers of the porting guides and tools, until it was documented again in November 2012:

The first expression may also be a tuple of length 2 or 3. In this case, the optional parts must be omitted. The form `exec(expr, globals)` is equivalent to `exec expr in globals`, while the form `exec(expr, globals, locals)` is equivalent to `exec expr in globals, locals`. The tuple form of `exec` provides compatibility with Python 3, where `exec` is a function rather than a statement.

Yes, in CPython 2.7 it is handily referred to as a forward-compatibility option (why confuse people by mentioning that there is a backward-compatibility option at all), when it had actually been there for backward compatibility for two decades.

Thus while `exec` is a statement in Python 1 and Python 2, and a built-in function in Python 3 and Python 0.9.9,

``````>>> exec("print(a)", globals(), {"a": 42})
42
``````

has had identical behaviour in possibly every widely released Python version ever; and works in Jython 2.5.2, PyPy 2.3.1 (Python 2.7.6) and IronPython 2.6.1 too (kudos to them for following the undocumented behaviour of CPython closely).

What you cannot do in Python 1.0 through 2.7, even with the compatibility hack, is store the return value of `exec` in a variable:

``````Python 2.7.11+ (default, Apr 17 2016, 14:00:29)
[GCC 5.3.1 20160413] on linux2
>>> a = exec("print(42)")
File "<stdin>", line 1
a = exec("print(42)")
^
SyntaxError: invalid syntax
``````

(which wouldn't be useful in Python 3 either, as `exec` always returns `None`), or pass a reference to `exec`:

``````>>> call_later(exec, "print(42)", delay=1000)
File "<stdin>", line 1
call_later(exec, "print(42)", delay=1000)
^
SyntaxError: invalid syntax
``````

which is a pattern that someone might actually have used, though it is unlikely;

Or use it in a list comprehension:

``````>>> [exec(i) for i in ["print(42)", "print(foo)"]
File "<stdin>", line 1
[exec(i) for i in ["print(42)", "print(foo)"]
^
SyntaxError: invalid syntax
``````

which is abuse of list comprehensions (use a `for` loop instead!).
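For contrast, here is a small Python 3 sketch in which `exec` is an ordinary built-in function, so its return value can be stored and the function itself can be passed around. The names `namespace`, `run` and `snippets` are only illustrative.

```python
namespace = {}

# The return value can be stored, although it is always None.
result = exec("x = 42", namespace)

# A reference to exec can be bound to another name or passed to a callable.
run = exec
run("y = x + 1", namespace)

# A plain for loop is the clearer way to run several snippets in turn.
snippets = ["z = y * 2", "w = z - x"]
for snippet in snippets:
    exec(snippet, namespace)

print(result, namespace["y"], namespace["z"], namespace["w"])  # None 43 86 44
```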

One easy way is to use pandas (here I want to use mean normalization):

``````normalized_df=(df-df.mean())/df.std()
``````

To use min-max normalization:

``````normalized_df=(df-df.min())/(df.max()-df.min())
``````

Edit: To address some concerns, note that pandas automatically applies these functions column-wise in the code above.
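A small worked example of both formulas, using a made-up two-column frame (the column names and values are assumptions for illustration). Note that `df.std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical data: each column is normalized independently.
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0],
    "weight": [50.0, 60.0, 70.0],
})

# Mean normalization: subtract the column mean, divide by the column std.
normalized_df = (df - df.mean()) / df.std()

# Min-max normalization: rescale each column to the range [0, 1].
min_max_df = (df - df.min()) / (df.max() - df.min())

print(normalized_df)
print(min_max_df)
```

With this data, each column of `normalized_df` comes out as -1, 0, 1 and each column of `min_max_df` as 0, 0.5, 1.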

# TL;DR version:

For the simple case of:

• I have a text column with a delimiter and I want two columns

The simplest solution is:

``````df[["A", "B"]] = df["AB"].str.split(" ", n=1, expand=True)
``````

You must use `expand=True` if your strings have a non-uniform number of splits and you want `None` to replace the missing values.

Notice how, in either case, the `.tolist()` method is not necessary. Neither is `zip()`.
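A self-contained version of the one-liner, with a made-up frame so it can be run as is. It passes `n` by keyword, which recent pandas versions require; the column names follow the example above.

```python
import pandas as pd

# Hypothetical input: one text column with a space delimiter.
df = pd.DataFrame({"AB": ["A1 B1", "A2 B2"]})

# Split at most once on the delimiter and assign the two pieces to new columns.
df[["A", "B"]] = df["AB"].str.split(" ", n=1, expand=True)

print(df)
```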

# In detail:

Andy Hayden's solution is most excellent in demonstrating the power of the `str.extract()` method.

But for a simple split over a known separator (like splitting by dashes, or splitting by whitespace), the `.str.split()` method is enough [1]. It operates on a column (Series) of strings, and returns a column (Series) of lists:

``````>>> import pandas as pd
>>> df = pd.DataFrame({"AB": ["A1-B1", "A2-B2"]})
>>> df

AB
0  A1-B1
1  A2-B2
>>> df["AB_split"] = df["AB"].str.split("-")
>>> df

AB  AB_split
0  A1-B1  [A1, B1]
1  A2-B2  [A2, B2]
``````

[1] If you're unsure what the first two parameters of `.str.split()` do, I recommend the docs for the plain Python version of the method.

But how do you go from:

• a column containing two-element lists

to:

• two columns, each containing the respective element of the lists?

Well, we need to take a closer look at the `.str` attribute of a column.

It's a magical object that is used to collect methods that treat each element in a column as a string, and then apply the respective method to each element as efficiently as possible:

``````>>> upper_lower_df = pd.DataFrame({"U": ["A", "B", "C"]})
>>> upper_lower_df

U
0  A
1  B
2  C
>>> upper_lower_df["L"] = upper_lower_df["U"].str.lower()
>>> upper_lower_df

U  L
0  A  a
1  B  b
2  C  c
``````

But it also has an "indexing" interface for getting each element of a string by its index:

``````>>> df["AB"].str[0]

0    A
1    A
Name: AB, dtype: object

>>> df["AB"].str[1]

0    1
1    2
Name: AB, dtype: object
``````

Of course, this indexing interface of `.str` doesn't really care if each element it's indexing is actually a string, as long as it can be indexed, so:

``````>>> df["AB"].str.split("-", n=1).str[0]

0    A1
1    A2
Name: AB, dtype: object

>>> df["AB"].str.split("-", n=1).str[1]

0    B1
1    B2
Name: AB, dtype: object
``````

Then, it's a simple matter of taking advantage of Python's tuple unpacking of iterables to do

``````>>> df["A"], df["B"] = df["AB"].str.split("-", n=1).str
>>> df

AB  AB_split   A   B
0  A1-B1  [A1, B1]  A1  B1
1  A2-B2  [A2, B2]  A2  B2
``````
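Unpacking the `.str` accessor directly depends on it being iterable, which, as far as I know, was deprecated in pandas 1.0 and later removed. A sketch of the same result using only the indexing interface shown earlier (the `parts` name is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"AB": ["A1-B1", "A2-B2"]})

# Split once into a Series of two-element lists.
parts = df["AB"].str.split("-", n=1)

# Index each list through .str instead of unpacking the accessor itself.
df["A"], df["B"] = parts.str[0], parts.str[1]

print(df)
```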

Of course, getting a DataFrame out of splitting a column of strings is so useful that the `.str.split()` method can do it for you with the `expand=True` parameter:

``````>>> df["AB"].str.split("-", n=1, expand=True)

0   1
0  A1  B1
1  A2  B2
``````

So, another way of accomplishing what we wanted is to do:

``````>>> df = df[["AB"]]
>>> df

AB
0  A1-B1
1  A2-B2

>>> df.join(df["AB"].str.split("-", n=1, expand=True).rename(columns={0:"A", 1:"B"}))

AB   A   B
0  A1-B1  A1  B1
1  A2-B2  A2  B2
``````

The `expand=True` version, although longer, has a distinct advantage over the tuple unpacking method. Tuple unpacking doesn't deal well with splits of different lengths:

``````>>> df = pd.DataFrame({"AB": ["A1-B1", "A2-B2", "A3-B3-C3"]})
>>> df
AB
0     A1-B1
1     A2-B2
2  A3-B3-C3
>>> df["A"], df["B"], df["C"] = df["AB"].str.split("-")
Traceback (most recent call last):
[...]
ValueError: Length of values does not match length of index
>>>
``````

But `expand=True` handles it nicely by placing `None` in the columns for which there aren't enough "splits":

``````>>> df.join(
...     df["AB"].str.split("-", expand=True).rename(
...         columns={0:"A", 1:"B", 2:"C"}
...     )
... )
AB   A   B     C
0     A1-B1  A1  B1  None
1     A2-B2  A2  B2  None
2  A3-B3-C3  A3  B3    C3
``````