# Python | Arithmetic operations in an excel file using openpyxl

File handling | open | Python Methods and Functions | sin

`Openpyxl ` — is a Python library with which you can perform several operations on Excel files, such as reading, writing, arithmetic, and graphing. Let's see how to perform various arithmetic operations using openpyxl.

• = SUM (cell1: cell2): adds all numbers in a range of cells.

 ` # import openpyxl module ` ` import ` ` openpyxl `   ` # Calling the Workbook () function from openpyxl ` ` # create a new blank Workbook object ` ` wb ` ` = ` ` openpyxl.Workbook () `   ` # Get a working sheet of the active sheet ` ` # from the active attribute. ` ` sheet ` ` = ` ` wb.active `   ` # write to worksheet cell Excel ` ` sheet [` ` 'A1' ` `] ` ` = ` ` 200 ` ` sheet [` ` 'A2' ` `] ` ` = ` ` 300 ` ` sheet [` ` 'A3' ` `] ` ` = ` ` 400 ` ` sheet [` ` 'A4' ` `] ` ` = ` ` 500 ` ` sheet [` ` 'A5' ` `] ` ` = ` ` 600 `   ` # The value in cell A7 is set to the formula ` ` # which sums the values ​​in A1, A2, A3, A4, A5 . ` ` sheet [` ` 'A7' ` `] ` ` = ` ` '= SUM (A1: A5)' ` ` `  ` # save file ` ` wb.save (` ` "sum.xlsx" ` `) `

Exit:

• = PRODUCT (cell 1: cell 2): ​​ Multiplies all numbers in a range of cells.

` `

` import openpyxl    wb = openpyxl.Workbook () sheet = wb.active   sheet [ 'A1' ] = 2 sheet [ 'A2' ] = 3 sheet [ 'A3' ] = 4 sheet [ 'A4' ] = 5 sheet [ ' A5' ] = 6   # The value in cell A7 is set to the formula # which multiplies the values ​​in A1, A2, A3, A4, A5. sheet [ 'A7' ] = ' = PRODUCT (A1: A5) '   wb.save ( "product.xlsx" ) `

Output:

• = MEDIUM (cell 1: cell 2): it gives the average (arithmetic mean) of all numbers that are present in the given range of cells.

 ` import ` ` openpyxl `   ` wb ` ` = ` ` openpyxl.Workbook () ` ` sheet ` ` = ` ` wb.active `   ` sheet [` ` 'A1' ` `] ` ` = ` ` 200 ` ` sheet [` ` 'A2 '` `] ` ` = ` ` 300 ` ` sheet [` ` 'A3' ] = 400 `` sheet [ 'A4' ] = 500 sheet [ 'A5' ] = 600   # The value in cell A7 is set to the formula # which return average in A1, A2, A3, A4, A5. sheet [ 'A7' ] = '= AVERAGE (A1: A5)'   wb.save ( "average.xlsx" ) `

Output:

• = QUOTIENT (num1, num2): returns the integer part of the division.

 ` import ` ` openpyxl `   ` wb ` ` = ` ` openpyxl.Workbook () ` ` sheet ` ` = ` ` wb.active `   ` # The value in the cell is set by the formula ` ` # which gives the factor value. ` ` sheet [` ` 'A1' ` `] ` ` = ` ` '= QUOTIENT (64, 8)' ` ` sheet [` ` 'A2 '` `] ` ` = ` `' = QUOTIENT (25, 4) '`   ` wb.save (` `" quotient.xlsx "` `) `

Output:

• = MOD (num1, num2): returns the remainder after dividing the number by the divisor.

` `

` import openpyxl   wb = openpyxl.Workbook () sheet = wb.active   # The value in the cell is set by the formula # which gives the remainder or value module. sheet [ 'A1' ] = '= MOD (64, 8)' sheet [ 'A2' ] = '= MOD (25, 4)'   wb.save ( "modulus.xlsx" ) `

` `

Output:

• = COUNT (cell a1: cell2): counts the number of cells in a range that contain a number.

 ` import ` ` openpyxl `   ` wb ` ` = ` ` openpyxl.Workbook () ` ` sheet ` ` = ` ` wb.active `   ` sheet [` ` 'A1' ` `] ` ` = ` ` 200 ` ` sheet [` ` 'A2' ` `] ` ` = ` ` 300 ` ` sheet [` ` 'A3' ` `] ` ` = ` ` 400 ` ` sheet [` ` 'A4' ` `] ` ` = ` ` 500 ` ` sheet [` ` 'A5' ` `] ` ` = ` ` 600 `   ` # The value in cell A7 is set to the formula ` ` # which gives a count of the number present in the cells. ` ` sheet [` ` 'A7' ` `] ` ` = ` ` '= COUNT (A1: A6)' `   ` wb.save (` ` "count.xlsx" ` `) `

Output:

## How can I open multiple files using "with open" in Python?

I want to change a couple of files at one time, iff I can write to all of them. I"m wondering if I somehow can combine the multiple open calls with the `with` statement:

``````try:
with open("a", "w") as a and open("b", "w") as b:
do_something()
except IOError as e:
print "Operation failed: %s" % e.strerror
``````

If that"s not possible, what would an elegant solution to this problem look like?

## open() in Python does not create a file if it doesn"t exist

What is the best way to open a file as read/write if it exists, or if it does not, then create it and open it as read/write? From what I read, `file = open("myfile.dat", "rw")` should do this, right?

It is not working for me (Python 2.6.2) and I"m wondering if it is a version problem, or not supposed to work like that or what.

The bottom line is, I just need a solution for the problem. I am curious about the other stuff, but all I need is a nice way to do the opening part.

The enclosing directory was writeable by user and group, not other (I"m on a Linux system... so permissions 775 in other words), and the exact error was:

IOError: no such file or directory.

## Difference between modes a, a+, w, w+, and r+ in built-in open function?

In the python built-in open function, what is the exact difference between the modes `w`, `a`, `w+`, `a+`, and `r+`?

In particular, the documentation implies that all of these will allow writing to the file, and says that it opens the files for "appending", "writing", and "updating" specifically, but does not define what these terms mean.

## Simple Digit Recognition OCR in OpenCV-Python

I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV.

I have 100 samples (i.e. images) of each digit. I would like to train with them.

There is a sample `letter_recog.py` that comes with OpenCV sample. But I still couldn"t figure out on how to use it. I don"t understand what are the samples, responses etc. Also, it loads a txt file at first, which I didn"t understand first.

Later on searching a little bit, I could find a letter_recognition.data in cpp samples. I used it and made a code for cv2.KNearest in the model of letter_recog.py (just for testing):

``````import numpy as np
import cv2

fn = "letter-recognition.data"
a = np.loadtxt(fn, np.float32, delimiter=",", converters={ 0 : lambda ch : ord(ch)-ord("A") })
samples, responses = a[:,1:], a[:,0]

model = cv2.KNearest()
retval = model.train(samples,responses)
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print results.ravel()
``````

It gave me an array of size 20000, I don"t understand what it is.

Questions:

1) What is letter_recognition.data file? How to build that file from my own data set?

2) What does `results.reval()` denote?

3) How we can write a simple digit recognition tool using letter_recognition.data file (either KNearest or SVM)?

## Does reading an entire file leave the file handle open?

If you read an entire file with `content = open("Path/to/file", "r").read()` is the file handle left open until the script exits? Is there a more concise method to read a whole file?

## Store output of subprocess.Popen call in a string

I"m trying to make a system call in Python and store the output to a string that I can manipulate in the Python program.

``````#!/usr/bin/python
import subprocess
p2 = subprocess.Popen("ntpq -p")
``````

I"ve tried a few things including some of the suggestions here:

Retrieving the output of subprocess.call()

but without any luck.

## "Unicode Error "unicodeescape" codec can"t decode bytes... Cannot open text files in Python 3

I am using Python 3.1 on a Windows 7 machine. Russian is the default system language, and utf-8 is the default encoding.

Looking at the answer to a previous question, I have attempting using the "codecs" module to give me a little luck. Here"s a few examples:

``````>>> g = codecs.open("C:UsersEricDesktopeeline.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) "unicodeescape" codec can"t decode bytes in position 2-4: truncated UXXXXXXXX escape (<pyshell#39>, line 1)
``````
``````>>> g = codecs.open("C:UsersEricDesktopSite.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) "unicodeescape" codec can"t decode bytes in position 2-4: truncated UXXXXXXXX escape (<pyshell#40>, line 1)
``````
``````>>> g = codecs.open("C:Python31Notes.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) "unicodeescape" codec can"t decode bytes in position 11-12: malformed N character escape (<pyshell#41>, line 1)
``````
``````>>> g = codecs.open("C:UsersEricDesktopSite.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) "unicodeescape" codec can"t decode bytes in position 2-4: truncated UXXXXXXXX escape (<pyshell#44>, line 1)
``````

My last idea was, I thought it might have been the fact that Windows "translates" a few folders, such as the "users" folder, into Russian (though typing "users" is still the correct path), so I tried it in the Python31 folder. Still, no luck. Any ideas?

## Python subprocess/Popen with a modified environment

I believe that running an external command with a slightly modified environment is a very common case. That"s how I tend to do it:

``````import subprocess, os
my_env = os.environ
my_env["PATH"] = "/usr/sbin:/sbin:" + my_env["PATH"]
subprocess.Popen(my_command, env=my_env)
``````

I"ve got a gut feeling that there"s a better way; does it look alright?

## Cannot find module cv2 when using OpenCV

I have installed OpenCV on the Occidentalis operating system (a variant of Raspbian) on a Raspberry Pi, using jayrambhia"s script found here. It installed version 2.4.5.

When I try `import cv2` in a Python program, I get the following message:

``````[email protected]~\$ python cam.py
Traceback (most recent call last)
File "cam.py", line 1, in <module>
import cv2
ImportError: No module named cv2
``````

The file `cv2.so` is stored in `/usr/local/lib/python2.7/site-packages/...`

There are also folders in `/usr/local/lib` called python3.2 and python2.6, which could be a problem but I"m not sure.

Is this a path error perhaps? Any help is appreciated, I am new to Linux.

## How to crop an image in OpenCV using Python

How can I crop images, like I"ve done before in PIL, using OpenCV.

Working example on PIL

``````im = Image.open("0.png").convert("L")
im = im.crop((1, 1, 98, 33))
im.save("_0.png")
``````

But how I can do it on OpenCV?

This is what I tried:

``````im = cv.imread("0.png", cv.CV_LOAD_IMAGE_GRAYSCALE)
(thresh, im_bw) = cv.threshold(im, 128, 255, cv.THRESH_OTSU)
im = cv.getRectSubPix(im_bw, (98, 33), (1, 1))
cv.imshow("Img", im)
cv.waitKey(0)
``````

But it doesn"t work.

I think I incorrectly used `getRectSubPix`. If this is the case, please explain how I can correctly use this function.

Since this question was asked in 2010, there has been real simplification in how to do simple multithreading with Python with map and pool.

The code below comes from an article/blog post that you should definitely check out (no affiliation) - Parallelism in one line: A Better Model for Day to Day Threading Tasks. I"ll summarize below - it ends up being just a few lines of code:

``````from multiprocessing.dummy import Pool as ThreadPool
results = pool.map(my_function, my_array)
``````

Which is the multithreaded version of:

``````results = []
for item in my_array:
results.append(my_function(item))
``````

Description

Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.

Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.

Implementation

Parallel versions of the map function are provided by two libraries:multiprocessing, and also its little known, but equally fantastic step child:multiprocessing.dummy.

`multiprocessing.dummy` is exactly the same as multiprocessing module, but uses threads instead (an important distinction - use multiple processes for CPU-intensive tasks; threads for (and during) I/O):

multiprocessing.dummy replicates the API of multiprocessing, but is no more than a wrapper around the threading module.

``````import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
"http://www.python.org",
"http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html",
"http://www.python.org/doc/",
"http://www.python.org/getit/",
"http://www.python.org/community/",
"https://wiki.python.org/moin/",
]

# Make the Pool of workers

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()
``````

And the timing results:

``````Single thread:   14.4 seconds
4 Pool:   3.1 seconds
8 Pool:   1.4 seconds
13 Pool:   1.3 seconds
``````

Passing multiple arguments (works like this only in Python 3.3 and later):

To pass multiple arrays:

``````results = pool.starmap(function, zip(list_a, list_b))
``````

Or to pass a constant and an array:

``````results = pool.starmap(function, zip(itertools.repeat(constant), list_a))
``````

If you are using an earlier version of Python, you can pass multiple arguments via this workaround).

(Thanks to user136036 for the helpful comment.)

# `os.listdir()` - list in the current directory

With listdir in os module you get the files and the folders in the current dir

`````` import os
arr = os.listdir()
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

## Looking in a directory

``````arr = os.listdir("c:\files")
``````

# `glob` from glob

with glob you can specify a type of file to list like this

``````import glob

txtfiles = []
for file in glob.glob("*.txt"):
txtfiles.append(file)
``````

## `glob` in a list comprehension

``````mylist = [f for f in glob.glob("*.txt")]
``````

## get the full path of only files in the current directory

``````import os
from os import listdir
from os.path import isfile, join

cwd = os.getcwd()
onlyfiles = [os.path.join(cwd, f) for f in os.listdir(cwd) if
os.path.isfile(os.path.join(cwd, f))]
print(onlyfiles)

["G:\getfilesname\getfilesname.py", "G:\getfilesname\example.txt"]
``````

## Getting the full path name with `os.path.abspath`

You get the full path in return

`````` import os
files_path = [os.path.abspath(x) for x in os.listdir()]
print(files_path)

["F:\documentiapplications.txt", "F:\documenticollections.txt"]
``````

## Walk: going through sub directories

os.walk returns the root, the directories list and the files list, that is why I unpacked them in r, d, f in the for loop; it, then, looks for other files and directories in the subfolders of the root and so on until there are no subfolders.

``````import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
for file in f:
if file.endswith(".docx"):
print(os.path.join(r, file))
``````

### `os.listdir()`: get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to give the argument as "." or os.getcwd() in the os.listdir method.

`````` import os
arr = os.listdir(".")
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

### To go up in the directory tree

``````# Method 1
x = os.listdir("..")

# Method 2
x= os.listdir("/")
``````

### Get files: `os.listdir()` in a particular directory (Python 2 and 3)

`````` import os
arr = os.listdir("F:\python")
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

### Get files of a particular subdirectory with `os.listdir()`

``````import os

x = os.listdir("./content")
``````

### `os.walk(".")` - current directory

`````` import os
arr = next(os.walk("."))[2]
print(arr)

>>> ["5bs_Turismo1.pdf", "5bs_Turismo1.pptx", "esperienza.txt"]
``````

### `next(os.walk("."))` and `os.path.join("dir", "file")`

`````` import os
arr = []
for d,r,f in next(os.walk("F:\_python")):
for file in f:
arr.append(os.path.join(r,file))

for f in arr:
print(files)

>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt
``````

### `next(os.walk("F:\")` - get the full path - list comprehension

`````` [os.path.join(r,file) for r,d,f in next(os.walk("F:\_python")) for file in f]

>>> ["F:\_python\dict_class.py", "F:\_python\programmi.txt"]
``````

### `os.walk` - get full path - all files in sub dirs**

``````x = [os.path.join(r,file) for r,d,f in os.walk("F:\_python") for file in f]
print(x)

``````

### `os.listdir()` - get only txt files

`````` arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
print(arr_txt)

>>> ["work.txt", "3ebooks.txt"]
``````

## Using `glob` to get the full path of the files

If I should need the absolute path of the files:

``````from path import path
from glob import glob
x = [path(f).abspath() for f in glob("F:\*.txt")]
for f in x:
print(f)

>>> F:acquistionline.txt
>>> F:acquisti_2018.txt
>>> F:ootstrap_jquery_ecc.txt
``````

## Using `os.path.isfile` to avoid directories in the list

``````import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ["a simple game.py", "data.txt", "decorator.py"]
``````

## Using `pathlib` from Python 3.4

``````import pathlib

flist = []
for p in pathlib.Path(".").iterdir():
if p.is_file():
print(p)
flist.append(p)

>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speak_gui2.py
>>> thumb.PNG
``````

With `list comprehension`:

``````flist = [p for p in pathlib.Path(".").iterdir() if p.is_file()]
``````

Alternatively, use `pathlib.Path()` instead of `pathlib.Path(".")`

## Use glob method in pathlib.Path()

``````import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py
``````

## Get all and only files with os.walk

``````import os
x = [i[2] for i in os.walk(".")]
y=[]
for t in x:
for f in t:
y.append(f)
print(y)

>>> ["append_to_list.py", "data.txt", "data1.txt", "data2.txt", "data_180617", "os_walk.py", "READ2.py", "read_data.py", "somma_defaltdic.py", "substitute_words.py", "sum_data.py", "data.txt", "data1.txt", "data_180617"]
``````

## Get only files with next and walk in a directory

`````` import os
x = next(os.walk("F://python"))[2]
print(x)

>>> ["calculator.bat","calculator.py"]
``````

## Get only directories with next and walk in a directory

`````` import os
next(os.walk("F://python"))[1] # for the current dir use (".")

>>> ["python3","others"]
``````

## Get all the subdir names with `walk`

``````for r,d,f in os.walk("F:\_python"):
for dirs in d:
print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints
``````

## `os.scandir()` from Python 3.5 and greater

``````import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ["calculator.bat","calculator.py"]

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
for entry in i:
if entry.is_file():
print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG
``````

# Examples:

## Ex. 1: How many files are there in the subdirectories?

In this example, we look for the number of files that are included in all the directory and its subdirectories.

``````import os

def count(dir, counter=0):
"returns number of files in dir and subdirs"
for pack in os.walk(dir):
for f in pack[2]:
counter += 1
return dir + " : " + str(counter) + "files"

print(count("F:\python"))

>>> "F:\python" : 12057 files"
``````

## Ex.2: How to copy all files from a directory to another?

A script to make order in your computer finding all files of a type (default: pptx) and copying them in a new folder.

``````import os
import shutil
from path import path

destination = "F:\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype="pptx", counter=0):
"Searches for pptx (or other - pptx is the default) files and copies them"
for pack in os.walk(dir):
for f in pack[2]:
if f.endswith(filetype):
fullpath = pack[0] + "\" + f
print(fullpath)
shutil.copy(fullpath, destination)
counter += 1
if counter > 0:
print("-" * 30)
print("	==> Found in: `" + dir + "` : " + str(counter) + " files
")

for dir in os.listdir():
"searches for folders that starts with `_`"
if dir[0] == "_":
# copyfile(dir, filetype="pdf")
copyfile(dir, filetype="txt")

>>> _compiti18Compito Contabilit√† 1conti.txt
>>> _compiti18Compito Contabilit√† 1modula4.txt
>>> _compiti18Compito Contabilit√† 1moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files
``````

## Ex. 3: How to get all the files in a txt file

In case you want to create a txt file with all the file names:

``````import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
for eachfile in os.listdir():
mylist += eachfile + "
"
file.write(mylist)
``````

## Example: txt with all the files of an hard drive

``````"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding="utf-8") as testo:
for root, dirs, files in os.walk("D:\"):
for file in files:
listafile.append(file)
percorso.append(root + "\" + file)
testo.write(file + "
")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
for file in listafile:
testo_ordinato.write(file + "
")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
for file in percorso:
file_percorso.write(file + "
")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")
``````

## All the file of C: in one text file

This is a shorter version of the previous code. Change the folder where to start finding the files if you need to start from another position. This code generate a 50 mb on text file on my computer with something less then 500.000 lines with files with the complete path.

``````import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
for r, d, f in os.walk("C:\"):
for file in f:
filewrite.write(f"{r + file}
")
``````

## How to write a file with all paths in a folder of a type

With this function you can create a txt file that will have the name of a type of file that you look for (ex. pngfile.txt) with all the full path of all the files of that type. It can be useful sometimes, I think.

``````import os

def searchfiles(extension=".ttf", folder="H:\"):
"Create a txt file with all the file of a type"
with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
for r, d, f in os.walk(folder):
for file in f:
if file.endswith(extension):
filewrite.write(f"{r + file}
")

# looking for png file (fonts) in the hard disk H:
searchfiles(".png", "H:\")

>>> H:4bs_18Dolphins5.png
>>> H:4bs_18Dolphins6.png
>>> H:4bs_18Dolphins7.png
>>> H:5_18marketing htmlassetsimageslogo2.png
>>> H:7z001.png
>>> H:7z002.png
``````

## (New) Find all files and open them with tkinter GUI

I just wanted to add in this 2019 a little app to search for all files in a dir and be able to open them by doubleclicking on the name of the file in the list.

``````import tkinter as tk
import os

def searchfiles(extension=".txt", folder="H:\"):
"insert all files in the listbox"
for r, d, f in os.walk(folder):
for file in f:
if file.endswith(extension):
lb.insert(0, r + "\" + file)

def open_file():
os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles(".png", "H:\"))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()
``````

I just used the following which was quite simple. First open a console then cd to where you"ve downloaded your file like some-package.whl and use

``````pip install some-package.whl
``````

Note: if pip.exe is not recognized, you may find it in the "Scripts" directory from where python has been installed. If pip is not installed, this page can help: How do I install pip on Windows?

Note: for clarification
If you copy the `*.whl` file to your local drive (ex. C:some-dirsome-file.whl) use the following command line parameters --

``````pip install C:/some-dir/some-file.whl
``````

This is the behaviour to adopt when the referenced object is deleted. It is not specific to Django; this is an SQL standard. Although Django has its own implementation on top of SQL. (1)

There are seven possible actions to take when such event occurs:

• `CASCADE`: When the referenced object is deleted, also delete the objects that have references to it (when you remove a blog post for instance, you might want to delete comments as well). SQL equivalent: `CASCADE`.
• `PROTECT`: Forbid the deletion of the referenced object. To delete it you will have to delete all objects that reference it manually. SQL equivalent: `RESTRICT`.
• `RESTRICT`: (introduced in Django 3.1) Similar behavior as `PROTECT` that matches SQL"s `RESTRICT` more accurately. (See django documentation example)
• `SET_NULL`: Set the reference to NULL (requires the field to be nullable). For instance, when you delete a User, you might want to keep the comments he posted on blog posts, but say it was posted by an anonymous (or deleted) user. SQL equivalent: `SET NULL`.
• `SET_DEFAULT`: Set the default value. SQL equivalent: `SET DEFAULT`.
• `SET(...)`: Set a given value. This one is not part of the SQL standard and is entirely handled by Django.
• `DO_NOTHING`: Probably a very bad idea since this would create integrity issues in your database (referencing an object that actually doesn"t exist). SQL equivalent: `NO ACTION`. (2)

Source: Django documentation

In most cases, `CASCADE` is the expected behaviour, but for every ForeignKey, you should always ask yourself what is the expected behaviour in this situation. `PROTECT` and `SET_NULL` are often useful. Setting `CASCADE` where it should not, can potentially delete all of your database in cascade, by simply deleting a single user.

It"s funny to notice that the direction of the `CASCADE` action is not clear to many people. Actually, it"s funny to notice that only the `CASCADE` action is not clear. I understand the cascade behavior might be confusing, however you must think that it is the same direction as any other action. Thus, if you feel that `CASCADE` direction is not clear to you, it actually means that `on_delete` behavior is not clear to you.

In your database, a foreign key is basically represented by an integer field which value is the primary key of the foreign object. Let"s say you have an entry comment_A, which has a foreign key to an entry article_B. If you delete the entry comment_A, everything is fine. article_B used to live without comment_A and don"t bother if it"s deleted. However, if you delete article_B, then comment_A panics! It never lived without article_B and needs it, and it"s part of its attributes (`article=article_B`, but what is article_B???). This is where `on_delete` steps in, to determine how to resolve this integrity error, either by saying:

• "No! Please! Don"t! I can"t live without you!" (which is said `PROTECT` or `RESTRICT` in Django/SQL)
• "All right, if I"m not yours, then I"m nobody"s" (which is said `SET_NULL`)
• "Good bye world, I can"t live without article_B" and commit suicide (this is the `CASCADE` behavior).
• "It"s OK, I"ve got spare lover, and I"ll reference article_C from now" (`SET_DEFAULT`, or even `SET(...)`).
• "I can"t face reality, and I"ll keep calling your name even if that"s the only thing left to me!" (`DO_NOTHING`)

I hope it makes cascade direction clearer. :)

Footnotes

(1) Django has its own implementation on top of SQL. And, as mentioned by @JoeMjr2 in the comments below, Django will not create the SQL constraints. If you want the constraints to be ensured by your database (for instance, if your database is used by another application, or if you hang in the database console from time to time), you might want to set the related constraints manually yourself. There is an open ticket to add support for database-level on delete constrains in Django.

(2) Actually, there is one case where `DO_NOTHING` can be useful: If you want to skip Django"s implementation and implement the constraint yourself at the database-level.

Running `brew reinstall [email protected]` didn"t work for my existing Python 2.7 virtual environments. Inside them there were still `ERROR:root:code for hash sha1 was not found` errors.

I encountered this problem after I ran `brew upgrade openssl`. And here"s the fix:

``````\$ ls /usr/local/Cellar/openssl
``````

...which shows

``````1.0.2t
``````

According to the existing version, run:

``````\$ brew switch openssl 1.0.2t
``````

...which shows

``````Cleaning /usr/local/Cellar/openssl/1.0.2t
``````

After that, run the following command in a Python 2.7 virtualenv:

``````(my-venv) \$ python -c "import hashlib;m=hashlib.md5();print(m.hexdigest())"
``````

...which shows

``````d41d8cd98f00b204e9800998ecf8427e
``````

No more errors.

You opened the file in binary mode:

``````with open(fname, "rb") as f:
``````

This means that all data read from the file is returned as `bytes` objects, not `str`. You cannot then use a string in a containment test:

``````if "some-pattern" in tmp: continue
``````

You"d have to use a `bytes` object to test against `tmp` instead:

``````if b"some-pattern" in tmp: continue
``````

or open the file as a textfile instead by replacing the `"rb"` mode with `"r"`.

‚ö°Ô∏è TL;DR ‚Äî One line solution.

All you have to do is:

``````sudo easy_install pip
``````

2019: ‚ö†Ô∏è`easy_install` has been deprecated. Check Method #2 below for preferred installation!

Details:

‚ö°Ô∏è OK, I read the solutions given above, but here"s an EASY solution to install `pip`.

MacOS comes with `Python` installed. But to make sure that you have `Python` installed open the terminal and run the following command.

``````python --version
``````

If this command returns a version number that means `Python` exists. Which also means that you already have access to `easy_install` considering you are using `macOS/OSX`.

‚ÑπÔ∏è Now, all you have to do is run the following command.

``````sudo easy_install pip
``````

After that, `pip` will be installed and you"ll be able to use it for installing other packages.

Let me know if you have any problems installing `pip` this way.

Cheers!

P.S. I ended up blogging a post about it. QuickTip: How Do I Install pip on macOS or OS X?

‚úÖ UPDATE (Jan 2019): METHOD #2: Two line solution ‚Äî

`easy_install` has been deprecated. Please use `get-pip.py` instead.

First of all download the `get-pip` file

``````curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
``````

Now run this file to install `pip`

``````python get-pip.py
``````

That should do it.

Another gif you said? Here ya go!

I noticed that every now and then I need to Google fopen all over again, just to build a mental image of what the primary differences between the modes are. So, I thought a diagram will be faster to read next time. Maybe someone else will find that helpful too.

It helps to install a python package `foo` on your machine (can also be in `virtualenv`) so that you can import the package `foo` from other projects and also from [I]Python prompts.

It does the similar job of `pip`, `easy_install` etc.,

Using `setup.py`

Package - A folder/directory that contains `__init__.py` file.
Module - A valid python file with `.py` extension.
Distribution - How one package relates to other packages and modules.

Let"s say you want to install a package named `foo`. Then you do,

``````\$ git clone https://github.com/user/foo
\$ cd foo
\$ python setup.py install
``````

Instead, if you don"t want to actually install it but still would like to use it. Then do,

``````\$ python setup.py develop
``````

This command will create symlinks to the source directory within site-packages instead of copying things. Because of this, it is quite fast (particularly for large packages).

Creating `setup.py`

If you have your package tree like,

``````foo
‚îú‚îÄ‚îÄ foo
‚îÇ¬†¬† ‚îú‚îÄ‚îÄ data_struct.py
‚îÇ¬†¬† ‚îú‚îÄ‚îÄ __init__.py
‚îÇ¬†¬† ‚îî‚îÄ‚îÄ internals.py
‚îú‚îÄ‚îÄ requirements.txt
‚îî‚îÄ‚îÄ setup.py
``````

Then, you do the following in your `setup.py` script so that it can be installed on some machine:

``````from setuptools import setup

setup(
name="foo",
version="1.0",
description="A useful module",
author="Man Foo",
author_email="[email protected]",
packages=["foo"],  #same as name
install_requires=["wheel", "bar", "greek"], #external packages as dependencies
)
``````

Instead, if your package tree is more complex like the one below:

``````foo
‚îú‚îÄ‚îÄ foo
‚îÇ¬†¬† ‚îú‚îÄ‚îÄ data_struct.py
‚îÇ¬†¬† ‚îú‚îÄ‚îÄ __init__.py
‚îÇ¬†¬† ‚îî‚îÄ‚îÄ internals.py
‚îú‚îÄ‚îÄ requirements.txt
‚îú‚îÄ‚îÄ scripts
‚îÇ¬†¬† ‚îú‚îÄ‚îÄ cool
‚îÇ¬†¬† ‚îî‚îÄ‚îÄ skype
‚îî‚îÄ‚îÄ setup.py
``````

Then, your `setup.py` in this case would be like:

``````from setuptools import setup

setup(
name="foo",
version="1.0",
description="A useful module",
author="Man Foo",
author_email="[email protected]",
packages=["foo"],  #same as name
install_requires=["wheel", "bar", "greek"], #external packages as dependencies
scripts=[
"scripts/cool",
"scripts/skype",
]
)
``````

Add more stuff to (`setup.py`) & make it decent:

``````from setuptools import setup

setup(
name="foo",
version="1.0",
description="A useful module",
long_description=long_description,
author="Man Foo",
author_email="[email protected]",
url="http://www.foopackage.com/",
packages=["foo"],  #same as name
install_requires=["wheel", "bar", "greek"], #external packages as dependencies
scripts=[
"scripts/cool",
"scripts/skype",
]
)
``````

The `long_description` is used in pypi.org as the README description of your package.

And finally, you"re now ready to upload your package to PyPi.org so that others can install your package using `pip install yourpackage`.

At this point there are two options.

• publish in the temporary test.pypi.org server to make oneself familiarize with the procedure, and then publish it on the permanent pypi.org server for the public to use your package.
• publish straight away on the permanent pypi.org server, if you are already familiar with the procedure and have your user credentials (e.g., username, password, package name)

Once your package name is registered in pypi.org, nobody can claim or use it. Python packaging suggests the twine package for uploading purposes (of your package to PyPi). Thus,

(1) the first step is to locally build the distributions using:

``````# prereq: wheel (pip install wheel)
\$ python setup.py sdist bdist_wheel
``````

(2) then using `twine` for uploading either to test.pypi.org or pypi.org:

``````\$ twine upload --repository testpypi dist/*
``````

It will take few minutes for the package to appear on test.pypi.org. Once you"re satisfied with it, you can then upload your package to the real & permanent index of pypi.org simply with:

``````\$ twine upload dist/*
``````

Optionally, you can also sign the files in your package with a `GPG` by:

``````\$ twine upload dist/* --sign
``````

# tl;dr / quick fix

• Don"t decode/encode willy nilly
• Don"t assume your strings are UTF-8 encoded
• Try to convert strings to Unicode strings as soon as possible in your code
• Fix your locale: How to solve UnicodeDecodeError in Python 3.6?
• Don"t be tempted to use quick `reload` hacks

# Unicode Zen in Python 2.x - The Long Version

Without seeing the source it"s difficult to know the root cause, so I"ll have to speak generally.

`UnicodeDecodeError: "ascii" codec can"t decode byte` generally happens when you try to convert a Python 2.x `str` that contains non-ASCII to a Unicode string without specifying the encoding of the original string.

In brief, Unicode strings are an entirely separate type of Python string that does not contain any encoding. They only hold Unicode point codes and therefore can hold any Unicode point from across the entire spectrum. Strings contain encoded text, beit UTF-8, UTF-16, ISO-8895-1, GBK, Big5 etc. Strings are decoded to Unicode and Unicodes are encoded to strings. Files and text data are always transferred in encoded strings.

The Markdown module authors probably use `unicode()` (where the exception is thrown) as a quality gate to the rest of the code - it will convert ASCII or re-wrap existing Unicodes strings to a new Unicode string. The Markdown authors can"t know the encoding of the incoming string so will rely on you to decode strings to Unicode strings before passing to Markdown.

Unicode strings can be declared in your code using the `u` prefix to strings. E.g.

``````>>> my_u = u"my √ºnic√¥d√© strƒØng"
>>> type(my_u)
<type "unicode">
``````

Unicode strings may also come from file, databases and network modules. When this happens, you don"t need to worry about the encoding.

# Gotchas

Conversion from `str` to Unicode can happen even when you don"t explicitly call `unicode()`.

The following scenarios cause `UnicodeDecodeError` exceptions:

``````# Explicit conversion without encoding
unicode("‚Ç¨")

# New style format string into Unicode string
# Python will try to convert value string to Unicode first
u"The currency is: {}".format("‚Ç¨")

# Old style format string into Unicode string
# Python will try to convert value string to Unicode first
u"The currency is: %s" % "‚Ç¨"

# Append string to Unicode
# Python will try to convert string to Unicode first
u"The currency is: " + "‚Ç¨"
``````

## Examples

In the following diagram, you can see how the word `caf√©` has been encoded in either "UTF-8" or "Cp1252" encoding depending on the terminal type. In both examples, `caf` is just regular ascii. In UTF-8, `√©` is encoded using two bytes. In "Cp1252", √© is 0xE9 (which is also happens to be the Unicode point value (it"s no coincidence)). The correct `decode()` is invoked and conversion to a Python Unicode is successfull:

In this diagram, `decode()` is called with `ascii` (which is the same as calling `unicode()` without an encoding given). As ASCII can"t contain bytes greater than `0x7F`, this will throw a `UnicodeDecodeError` exception:

# The Unicode Sandwich

It"s good practice to form a Unicode sandwich in your code, where you decode all incoming data to Unicode strings, work with Unicodes, then encode to `str`s on the way out. This saves you from worrying about the encoding of strings in the middle of your code.

## Input / Decode

### Source code

If you need to bake non-ASCII into your source code, just create Unicode strings by prefixing the string with a `u`. E.g.

``````u"Z√ºrich"
``````

To allow Python to decode your source code, you will need to add an encoding header to match the actual encoding of your file. For example, if your file was encoded as "UTF-8", you would use:

``````# encoding: utf-8
``````

This is only necessary when you have non-ASCII in your source code.

### Files

Usually non-ASCII data is received from a file. The `io` module provides a TextWrapper that decodes your file on the fly, using a given `encoding`. You must use the correct encoding for the file - it can"t be easily guessed. For example, for a UTF-8 file:

``````import io
with io.open("my_utf8_file.txt", "r", encoding="utf-8") as my_file:
``````

`my_unicode_string` would then be suitable for passing to Markdown. If a `UnicodeDecodeError` from the `read()` line, then you"ve probably used the wrong encoding value.

### CSV Files

The Python 2.7 CSV module does not support non-ASCII characters üò©. Help is at hand, however, with https://pypi.python.org/pypi/backports.csv.

Use it like above but pass the opened file to it:

``````from backports import csv
import io
with io.open("my_utf8_file.txt", "r", encoding="utf-8") as my_file:
yield row
``````

### Databases

Most Python database drivers can return data in Unicode, but usually require a little configuration. Always use Unicode strings for SQL queries.

MySQL

``````charset="utf8",
use_unicode=True
``````

E.g.

``````>>> db = MySQLdb.connect(host="localhost", user="root", passwd="passwd", db="sandbox", use_unicode=True, charset="utf8")
``````
PostgreSQL

``````psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
``````

### HTTP

Web pages can be encoded in just about any encoding. The `Content-type` header should contain a `charset` field to hint at the encoding. The content can then be decoded manually against this value. Alternatively, Python-Requests returns Unicodes in `response.text`.

### Manually

If you must decode strings manually, you can simply do `my_string.decode(encoding)`, where `encoding` is the appropriate encoding. Python 2.x supported codecs are given here: Standard Encodings. Again, if you get `UnicodeDecodeError` then you"ve probably got the wrong encoding.

## The meat of the sandwich

Work with Unicodes as you would normal strs.

## Output

### stdout / printing

`print` writes through the stdout stream. Python tries to configure an encoder on stdout so that Unicodes are encoded to the console"s encoding. For example, if a Linux shell"s `locale` is `en_GB.UTF-8`, the output will be encoded to `UTF-8`. On Windows, you will be limited to an 8bit code page.

An incorrectly configured console, such as corrupt locale, can lead to unexpected print errors. `PYTHONIOENCODING` environment variable can force the encoding for stdout.

### Files

Just like input, `io.open` can be used to transparently convert Unicodes to encoded byte strings.

### Database

The same configuration for reading will allow Unicodes to be written directly.

# Python 3

Python 3 is no more Unicode capable than Python 2.x is, however it is slightly less confused on the topic. E.g the regular `str` is now a Unicode string and the old `str` is now `bytes`.

The default encoding is UTF-8, so if you `.decode()` a byte string without giving an encoding, Python 3 uses UTF-8 encoding. This probably fixes 50% of people"s Unicode problems.

Further, `open()` operates in text mode by default, so returns decoded `str` (Unicode ones). The encoding is derived from your locale, which tends to be UTF-8 on Un*x systems or an 8-bit code page, such as windows-1251, on Windows boxes.

# Why you shouldn"t use `sys.setdefaultencoding("utf8")`

It"s a nasty hack (there"s a reason you have to use `reload`) that will only mask problems and hinder your migration to Python 3.x. Understand the problem, fix the root cause and enjoy Unicode zen. See Why should we NOT use sys.setdefaultencoding("utf-8") in a py script? for further details

## How do I merge two dictionaries in a single expression (taking union of dictionaries)?

### Question by Carl Meyer

I have two Python dictionaries, and I want to write a single expression that returns these two dictionaries, merged (i.e. taking the union). The `update()` method would be what I need, if it returned its result instead of modifying a dictionary in-place.

``````>>> x = {"a": 1, "b": 2}
>>> y = {"b": 10, "c": 11}
>>> z = x.update(y)
>>> print(z)
None
>>> x
{"a": 1, "b": 10, "c": 11}
``````

How can I get that final merged dictionary in `z`, not `x`?

(To be extra-clear, the last-one-wins conflict-handling of `dict.update()` is what I"m looking for as well.)

## Accessing the index in "for" loops?

### Question by Joan Venge

How do I access the index in a `for` loop like the following?

``````ints = [8, 23, 45, 12, 78]
for i in ints:
print("item #{} = {}".format(???, i))
``````

I want to get this output:

``````item #1 = 8
item #2 = 23
item #3 = 45
item #4 = 12
item #5 = 78
``````

When I loop through it using a `for` loop, how do I access the loop index, from 1 to 5 in this case?

## Iterating over dictionaries using "for" loops

I am a bit puzzled by the following code:

``````d = {"x": 1, "y": 2, "z": 3}
for key in d:
print (key, "corresponds to", d[key])
``````

What I don"t understand is the `key` portion. How does Python recognize that it needs only to read the key from the dictionary? Is `key` a special word in Python? Or is it simply a variable?

## Using global variables in a function

How can I create or use a global variable in a function?

If I create a global variable in one function, how can I use that global variable in another function? Do I need to store the global variable in a local variable of the function which needs its access?

## Manually raising (throwing) an exception in Python

How can I raise an exception in Python so that it can later be caught via an `except` block?

## Calling a function of a module by using its name (a string)

What is the best way to go about calling a function given a string with the function"s name in a Python program. For example, let"s say that I have a module `foo`, and I have a string whose content is `"bar"`. What is the best way to call `foo.bar()`?

I need to get the return value of the function, which is why I don"t just use `eval`. I figured out how to do it by using `eval` to define a temp function that returns the result of that function call, but I"m hoping that there is a more elegant way to do this.

## What is the meaning of single and double underscore before an object name?

Can someone please explain the exact meaning of having single and double leading underscores before an object"s name in Python, and the difference between both?

Also, does that meaning stay the same regardless of whether the object in question is a variable, a function, a method, etc.?

## Save plot to image file instead of displaying it using Matplotlib

I am writing a quick-and-dirty script to generate plots on the fly. I am using the code below (from Matplotlib documentation) as a starting point:

``````from pylab import figure, axes, pie, title, show

# Make a square figure and axes
figure(1, figsize=(6, 6))
ax = axes([0.1, 0.1, 0.8, 0.8])

labels = "Frogs", "Hogs", "Dogs", "Logs"
fracs = [15, 30, 45, 10]

explode = (0, 0.05, 0, 0)
title("Raining Hogs and Dogs", bbox={"facecolor": "0.8", "pad": 5})

show()  # Actually, don"t show, just save to foo.png
``````

I don"t want to display the plot on a GUI, instead, I want to save the plot to a file (say foo.png), so that, for example, it can be used in batch scripts. How do I do that?

## What are the differences between type() and isinstance()?

What are the differences between these two code fragments?

Using `type()`:

``````import types

if type(a) is types.DictType:
do_something()
if type(b) in types.StringTypes:
do_something_else()
``````

Using `isinstance()`:

``````if isinstance(a, dict):
do_something()
if isinstance(b, str) or isinstance(b, unicode):
do_something_else()
``````

## How can I install packages using pip according to the requirements.txt file from a local directory?

Here is the problem:

I have a requirements.txt file that looks like:

``````BeautifulSoup==3.2.0
Django==1.3
Fabric==1.2.0
Jinja2==2.5.5
PyYAML==3.09
Pygments==1.4
SQLAlchemy==0.7.1
South==0.7.3
amqplib==0.6.1
anyjson==0.3
...
``````

I have a local archive directory containing all the packages + others.

I have created a new virtualenv with

``````bin/virtualenv testing
``````

Upon activating it, I tried to install the packages according to requirements.txt from the local archive directory.

``````source bin/activate
pip install -r /path/to/requirements.txt -f file:///path/to/archive/
``````

I got some output that seems to indicate that the installation is fine:

``````Downloading/unpacking Fabric==1.2.0 (from -r ../testing/requirements.txt (line 3))
Running setup.py egg_info for package Fabric
warning: no previously-included files matching "*" found under directory "docs/_build"
warning: no files found matching "fabfile.py"
Running setup.py egg_info for package South
....
``````

But a later check revealed none of the package is installed properly. I cannot import the package, and none is found in the site-packages directory of my virtualenv. So what went wrong?

The Python 3 `range()` object doesn"t produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

The object also implements the `object.__contains__` hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.

The advantage of the `range` type over a regular `list` or `tuple` is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the `start`, `stop` and `step` values, calculating individual items and subranges as needed).

So at a minimum, your `range()` object would do:

``````class my_range:
def __init__(self, start, stop=None, step=1, /):
if stop is None:
start, stop = 0, start
self.start, self.stop, self.step = start, stop, step
if step < 0:
lo, hi, step = stop, start, -step
else:
lo, hi = start, stop
self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

def __iter__(self):
current = self.start
if self.step < 0:
while current > self.stop:
yield current
current += self.step
else:
while current < self.stop:
yield current
current += self.step

def __len__(self):
return self.length

def __getitem__(self, i):
if i < 0:
i += self.length
if 0 <= i < self.length:
return self.start + i * self.step
raise IndexError("my_range object index out of range")

def __contains__(self, num):
if self.step < 0:
if not (self.stop < num <= self.start):
return False
else:
if not (self.start <= num < self.stop):
return False
return (num - self.start) % self.step == 0
``````

This is still missing several things that a real `range()` supports (such as the `.index()` or `.count()` methods, hashing, equality testing, or slicing), but should give you an idea.

I also simplified the `__contains__` implementation to only focus on integer tests; if you give a real `range()` object a non-integer value (including subclasses of `int`), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.

* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it‚Äôs all executed in optimised C code and Python stores integer values in 30-bit chunks, you‚Äôd run out of memory before you saw any performance impact due to the size of the integers involved here.

# Recommendation for beginners:

This is my personal recommendation for beginners: start by learning `virtualenv` and `pip`, tools which work with both Python 2 and 3 and in a variety of situations, and pick up other tools once you start needing them.

# PyPI packages not in the standard library:

• `virtualenv` is a very popular tool that creates isolated Python environments for Python libraries. If you"re not familiar with this tool, I highly recommend learning it, as it is a very useful tool, and I"ll be making comparisons to it for the rest of this answer.

It works by installing a bunch of files in a directory (eg: `env/`), and then modifying the `PATH` environment variable to prefix it with a custom `bin` directory (eg: `env/bin/`). An exact copy of the `python` or `python3` binary is placed in this directory, but Python is programmed to look for libraries relative to its path first, in the environment directory. It"s not part of Python"s standard library, but is officially blessed by the PyPA (Python Packaging Authority). Once activated, you can install packages in the virtual environment using `pip`.

• `pyenv` is used to isolate Python versions. For example, you may want to test your code against Python 2.7, 3.6, 3.7 and 3.8, so you"ll need a way to switch between them. Once activated, it prefixes the `PATH` environment variable with `~/.pyenv/shims`, where there are special files matching the Python commands (`python`, `pip`). These are not copies of the Python-shipped commands; they are special scripts that decide on the fly which version of Python to run based on the `PYENV_VERSION` environment variable, or the `.python-version` file, or the `~/.pyenv/version` file. `pyenv` also makes the process of downloading and installing multiple Python versions easier, using the command `pyenv install`.

• `pyenv-virtualenv` is a plugin for `pyenv` by the same author as `pyenv`, to allow you to use `pyenv` and `virtualenv` at the same time conveniently. However, if you"re using Python 3.3 or later, `pyenv-virtualenv` will try to run `python -m venv` if it is available, instead of `virtualenv`. You can use `virtualenv` and `pyenv` together without `pyenv-virtualenv`, if you don"t want the convenience features.

• `virtualenvwrapper` is a set of extensions to `virtualenv` (see docs). It gives you commands like `mkvirtualenv`, `lssitepackages`, and especially `workon` for switching between different `virtualenv` directories. This tool is especially useful if you want multiple `virtualenv` directories.

• `pyenv-virtualenvwrapper` is a plugin for `pyenv` by the same author as `pyenv`, to conveniently integrate `virtualenvwrapper` into `pyenv`.

• `pipenv` aims to combine `Pipfile`, `pip` and `virtualenv` into one command on the command-line. The `virtualenv` directory typically gets placed in `~/.local/share/virtualenvs/XXX`, with `XXX` being a hash of the path of the project directory. This is different from `virtualenv`, where the directory is typically in the current working directory. `pipenv` is meant to be used when developing Python applications (as opposed to libraries). There are alternatives to `pipenv`, such as `poetry`, which I won"t list here since this question is only about the packages that are similarly named.

# Standard library:

• `pyvenv` (not to be confused with `pyenv` in the previous section) is a script shipped with Python 3 but deprecated in Python 3.6 as it had problems (not to mention the confusing name). In Python 3.6+, the exact equivalent is `python3 -m venv`.

• `venv` is a package shipped with Python 3, which you can run using `python3 -m venv` (although for some reason some distros separate it out into a separate distro package, such as `python3-venv` on Ubuntu/Debian). It serves the same purpose as `virtualenv`, but only has a subset of its features (see a comparison here). `virtualenv` continues to be more popular than `venv`, especially since the former supports both Python 2 and 3.

You have four main options for converting types in pandas:

1. `to_numeric()` - provides functionality to safely convert non-numeric types (e.g. strings) to a suitable numeric type. (See also `to_datetime()` and `to_timedelta()`.)

2. `astype()` - convert (almost) any type to (almost) any other type (even if it"s not necessarily sensible to do so). Also allows you to convert to categorial types (very useful).

3. `infer_objects()` - a utility method to convert object columns holding Python objects to a pandas type if possible.

4. `convert_dtypes()` - convert DataFrame columns to the "best possible" dtype that supports `pd.NA` (pandas" object to indicate a missing value).

Read on for more detailed explanations and usage of each of these methods.

# 1. `to_numeric()`

The best way to convert one or more columns of a DataFrame to numeric values is to use `pandas.to_numeric()`.

This function will try to change non-numeric objects (such as strings) into integers or floating point numbers as appropriate.

## Basic usage

The input to `to_numeric()` is a Series or a single column of a DataFrame.

``````>>> s = pd.Series(["8", 6, "7.5", 3, "0.9"]) # mixed string and numeric values
>>> s
0      8
1      6
2    7.5
3      3
4    0.9
dtype: object

>>> pd.to_numeric(s) # convert everything to float values
0    8.0
1    6.0
2    7.5
3    3.0
4    0.9
dtype: float64
``````

As you can see, a new Series is returned. Remember to assign this output to a variable or column name to continue using it:

``````# convert Series
my_series = pd.to_numeric(my_series)

# convert column "a" of a DataFrame
df["a"] = pd.to_numeric(df["a"])
``````

You can also use it to convert multiple columns of a DataFrame via the `apply()` method:

``````# convert all columns of DataFrame
df = df.apply(pd.to_numeric) # convert all columns of DataFrame

# convert just columns "a" and "b"
df[["a", "b"]] = df[["a", "b"]].apply(pd.to_numeric)
``````

As long as your values can all be converted, that"s probably all you need.

## Error handling

But what if some values can"t be converted to a numeric type?

`to_numeric()` also takes an `errors` keyword argument that allows you to force non-numeric values to be `NaN`, or simply ignore columns containing these values.

Here"s an example using a Series of strings `s` which has the object dtype:

``````>>> s = pd.Series(["1", "2", "4.7", "pandas", "10"])
>>> s
0         1
1         2
2       4.7
3    pandas
4        10
dtype: object
``````

The default behaviour is to raise if it can"t convert a value. In this case, it can"t cope with the string "pandas":

``````>>> pd.to_numeric(s) # or pd.to_numeric(s, errors="raise")
ValueError: Unable to parse string
``````

Rather than fail, we might want "pandas" to be considered a missing/bad numeric value. We can coerce invalid values to `NaN` as follows using the `errors` keyword argument:

``````>>> pd.to_numeric(s, errors="coerce")
0     1.0
1     2.0
2     4.7
3     NaN
4    10.0
dtype: float64
``````

The third option for `errors` is just to ignore the operation if an invalid value is encountered:

``````>>> pd.to_numeric(s, errors="ignore")
# the original Series is returned untouched
``````

This last option is particularly useful when you want to convert your entire DataFrame, but don"t not know which of our columns can be converted reliably to a numeric type. In that case just write:

``````df.apply(pd.to_numeric, errors="ignore")
``````

The function will be applied to each column of the DataFrame. Columns that can be converted to a numeric type will be converted, while columns that cannot (e.g. they contain non-digit strings or dates) will be left alone.

## Downcasting

By default, conversion with `to_numeric()` will give you either a `int64` or `float64` dtype (or whatever integer width is native to your platform).

That"s usually what you want, but what if you wanted to save some memory and use a more compact dtype, like `float32`, or `int8`?

`to_numeric()` gives you the option to downcast to either "integer", "signed", "unsigned", "float". Here"s an example for a simple series `s` of integer type:

``````>>> s = pd.Series([1, 2, -7])
>>> s
0    1
1    2
2   -7
dtype: int64
``````

Downcasting to "integer" uses the smallest possible integer that can hold the values:

``````>>> pd.to_numeric(s, downcast="integer")
0    1
1    2
2   -7
dtype: int8
``````

Downcasting to "float" similarly picks a smaller than normal floating type:

``````>>> pd.to_numeric(s, downcast="float")
0    1.0
1    2.0
2   -7.0
dtype: float32
``````

# 2. `astype()`

The `astype()` method enables you to be explicit about the dtype you want your DataFrame or Series to have. It"s very versatile in that you can try and go from one type to the any other.

## Basic usage

Just pick a type: you can use a NumPy dtype (e.g. `np.int16`), some Python types (e.g. bool), or pandas-specific types (like the categorical dtype).

Call the method on the object you want to convert and `astype()` will try and convert it for you:

``````# convert all DataFrame columns to the int64 dtype
df = df.astype(int)

# convert column "a" to int64 dtype and "b" to complex type
df = df.astype({"a": int, "b": complex})

# convert Series to float16 type
s = s.astype(np.float16)

# convert Series to Python strings
s = s.astype(str)

# convert Series to categorical type - see docs for more details
s = s.astype("category")
``````

Notice I said "try" - if `astype()` does not know how to convert a value in the Series or DataFrame, it will raise an error. For example if you have a `NaN` or `inf` value you"ll get an error trying to convert it to an integer.

As of pandas 0.20.0, this error can be suppressed by passing `errors="ignore"`. Your original object will be return untouched.

## Be careful

`astype()` is powerful, but it will sometimes convert values "incorrectly". For example:

``````>>> s = pd.Series([1, 2, -7])
>>> s
0    1
1    2
2   -7
dtype: int64
``````

These are small integers, so how about converting to an unsigned 8-bit type to save memory?

``````>>> s.astype(np.uint8)
0      1
1      2
2    249
dtype: uint8
``````

The conversion worked, but the -7 was wrapped round to become 249 (i.e. 28 - 7)!

Trying to downcast using `pd.to_numeric(s, downcast="unsigned")` instead could help prevent this error.

# 3. `infer_objects()`

Version 0.21.0 of pandas introduced the method `infer_objects()` for converting columns of a DataFrame that have an object datatype to a more specific type (soft conversions).

For example, here"s a DataFrame with two columns of object type. One holds actual integers and the other holds strings representing integers:

``````>>> df = pd.DataFrame({"a": [7, 1, 5], "b": ["3","2","1"]}, dtype="object")
>>> df.dtypes
a    object
b    object
dtype: object
``````

Using `infer_objects()`, you can change the type of column "a" to int64:

``````>>> df = df.infer_objects()
>>> df.dtypes
a     int64
b    object
dtype: object
``````

Column "b" has been left alone since its values were strings, not integers. If you wanted to try and force the conversion of both columns to an integer type, you could use `df.astype(int)` instead.

# 4. `convert_dtypes()`

Version 1.0 and above includes a method `convert_dtypes()` to convert Series and DataFrame columns to the best possible dtype that supports the `pd.NA` missing value.

Here "best possible" means the type most suited to hold the values. For example, this a pandas integer type if all of the values are integers (or missing values): an object column of Python integer objects is converted to `Int64`, a column of NumPy `int32` values will become the pandas dtype `Int32`.

With our `object` DataFrame `df`, we get the following result:

``````>>> df.convert_dtypes().dtypes
a     Int64
b    string
dtype: object
``````

Since column "a" held integer values, it was converted to the `Int64` type (which is capable of holding missing values, unlike `int64`).

Column "b" contained string objects, so was changed to pandas" `string` dtype.

By default, this method will infer the type from object values in each column. We can change this by passing `infer_objects=False`:

``````>>> df.convert_dtypes(infer_objects=False).dtypes
a    object
b    string
dtype: object
``````

Now column "a" remained an object column: pandas knows it can be described as an "integer" column (internally it ran `infer_dtype`) but didn"t infer exactly what dtype of integer it should have so did not convert it. Column "b" was again converted to "string" dtype as it was recognised as holding "string" values.

Since this question was asked in 2010, there has been real simplification in how to do simple multithreading with Python with map and pool.

The code below comes from an article/blog post that you should definitely check out (no affiliation) - Parallelism in one line: A Better Model for Day to Day Threading Tasks. I"ll summarize below - it ends up being just a few lines of code:

``````from multiprocessing.dummy import Pool as ThreadPool
results = pool.map(my_function, my_array)
``````

Which is the multithreaded version of:

``````results = []
for item in my_array:
results.append(my_function(item))
``````

Description

Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.

Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.

Implementation

Parallel versions of the map function are provided by two libraries:multiprocessing, and also its little known, but equally fantastic step child:multiprocessing.dummy.

`multiprocessing.dummy` is exactly the same as multiprocessing module, but uses threads instead (an important distinction - use multiple processes for CPU-intensive tasks; threads for (and during) I/O):

multiprocessing.dummy replicates the API of multiprocessing, but is no more than a wrapper around the threading module.

``````import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
"http://www.python.org",
"http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html",
"http://www.python.org/doc/",
"http://www.python.org/getit/",
"http://www.python.org/community/",
"https://wiki.python.org/moin/",
]

# Make the Pool of workers

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()
``````

And the timing results:

``````Single thread:   14.4 seconds
4 Pool:   3.1 seconds
8 Pool:   1.4 seconds
13 Pool:   1.3 seconds
``````

Passing multiple arguments (works like this only in Python 3.3 and later):

To pass multiple arrays:

``````results = pool.starmap(function, zip(list_a, list_b))
``````

Or to pass a constant and an array:

``````results = pool.starmap(function, zip(itertools.repeat(constant), list_a))
``````

If you are using an earlier version of Python, you can pass multiple arguments via this workaround).

(Thanks to user136036 for the helpful comment.)

## How to iterate over rows in a DataFrame in Pandas?

Iteration in Pandas is an anti-pattern and is something you should only do when you have exhausted every other option. You should not use any function with "`iter`" in its name for more than a few thousand rows or you will have to get used to a lot of waiting.

Do you want to print a DataFrame? Use `DataFrame.to_string()`.

Do you want to compute something? In that case, search for methods in this order (list modified from here):

1. Vectorization
2. Cython routines
3. List Comprehensions (vanilla `for` loop)
4. `DataFrame.apply()`: i) ¬†Reductions that can be performed in Cython, ii) Iteration in Python space
5. `DataFrame.itertuples()` and `iteritems()`
6. `DataFrame.iterrows()`

`iterrows` and `itertuples` (both receiving many votes in answers to this question) should be used in very rare circumstances, such as generating row objects/nametuples for sequential processing, which is really the only thing these functions are useful for.

Appeal to Authority

The documentation page on iteration has a huge red warning box that says:

Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed [...].

* It"s actually a little more complicated than "don"t". `df.iterrows()` is the correct answer to this question, but "vectorize your ops" is the better one. I will concede that there are circumstances where iteration cannot be avoided (for example, some operations where the result depends on the value computed for the previous row). However, it takes some familiarity with the library to know when. If you"re not sure whether you need an iterative solution, you probably don"t. PS: To know more about my rationale for writing this answer, skip to the very bottom.

## Faster than Looping: Vectorization, Cython

A good number of basic operations and computations are "vectorised" by pandas (either through NumPy, or through Cythonized functions). This includes arithmetic, comparisons, (most) reductions, reshaping (such as pivoting), joins, and groupby operations. Look through the documentation on Essential Basic Functionality to find a suitable vectorised method for your problem.

If none exists, feel free to write your own using custom Cython extensions.

## Next Best Thing: List Comprehensions*

List comprehensions should be your next port of call if 1) there is no vectorized solution available, 2) performance is important, but not important enough to go through the hassle of cythonizing your code, and 3) you"re trying to perform elementwise transformation on your code. There is a good amount of evidence to suggest that list comprehensions are sufficiently fast (and even sometimes faster) for many common Pandas tasks.

The formula is simple,

``````# Iterating over one column - `f` is some function that processes your data
result = [f(x) for x in df["col"]]
# Iterating over two columns, use `zip`
result = [f(x, y) for x, y in zip(df["col1"], df["col2"])]
# Iterating over multiple columns - same data type
result = [f(row[0], ..., row[n]) for row in df[["col1", ...,"coln"]].to_numpy()]
# Iterating over multiple columns - differing data type
result = [f(row[0], ..., row[n]) for row in zip(df["col1"], ..., df["coln"])]
``````

If you can encapsulate your business logic into a function, you can use a list comprehension that calls it. You can make arbitrarily complex things work through the simplicity and speed of raw Python code.

Caveats

List comprehensions assume that your data is easy to work with - what that means is your data types are consistent and you don"t have NaNs, but this cannot always be guaranteed.

1. The first one is more obvious, but when dealing with NaNs, prefer in-built pandas methods if they exist (because they have much better corner-case handling logic), or ensure your business logic includes appropriate NaN handling logic.
2. When dealing with mixed data types you should iterate over `zip(df["A"], df["B"], ...)` instead of `df[["A", "B"]].to_numpy()` as the latter implicitly upcasts data to the most common type. As an example if A is numeric and B is string, `to_numpy()` will cast the entire array to string, which may not be what you want. Fortunately `zip`ping your columns together is the most straightforward workaround to this.

*Your mileage may vary for the reasons outlined in the Caveats section above.

## An Obvious Example

Let"s demonstrate the difference with a simple example of adding two pandas columns `A + B`. This is a vectorizable operaton, so it will be easy to contrast the performance of the methods discussed above.

Benchmarking code, for your reference. The line at the bottom measures a function written in numpandas, a style of Pandas that mixes heavily with NumPy to squeeze out maximum performance. Writing numpandas code should be avoided unless you know what you"re doing. Stick to the API where you can (i.e., prefer `vec` over `vec_numpy`).

I should mention, however, that it isn"t always this cut and dry. Sometimes the answer to "what is the best method for an operation" is "it depends on your data". My advice is to test out different approaches on your data before settling on one.

* Pandas string methods are "vectorized" in the sense that they are specified on the series but operate on each element. The underlying mechanisms are still iterative, because string operations are inherently hard to vectorize.

## Why I Wrote this Answer

A common trend I notice from new users is to ask questions of the form "How can I iterate over my df to do X?". Showing code that calls `iterrows()` while doing something inside a `for` loop. Here is why. A new user to the library who has not been introduced to the concept of vectorization will likely envision the code that solves their problem as iterating over their data to do something. Not knowing how to iterate over a DataFrame, the first thing they do is Google it and end up here, at this question. They then see the accepted answer telling them how to, and they close their eyes and run this code without ever first questioning if iteration is not the right thing to do.

The aim of this answer is to help new users understand that iteration is not necessarily the solution to every problem, and that better, faster and more idiomatic solutions could exist, and that it is worth investing time in exploring them. I"m not trying to start a war of iteration vs. vectorization, but I want new users to be informed when developing solutions to their problems with this library.

# In Python, what is the purpose of `__slots__` and what are the cases one should avoid this?

## TLDR:

The special attribute `__slots__` allows you to explicitly state which instance attributes you expect your object instances to have, with the expected results:

1. faster attribute access.
2. space savings in memory.

The space savings is from

1. Storing value references in slots instead of `__dict__`.
2. Denying `__dict__` and `__weakref__` creation if parent classes deny them and you declare `__slots__`.

### Quick Caveats

Small caveat, you should only declare a particular slot one time in an inheritance tree. For example:

``````class Base:
__slots__ = "foo", "bar"

class Right(Base):
__slots__ = "baz",

class Wrong(Base):
__slots__ = "foo", "bar", "baz"        # redundant foo and bar
``````

Python doesn"t object when you get this wrong (it probably should), problems might not otherwise manifest, but your objects will take up more space than they otherwise should. Python 3.8:

``````>>> from sys import getsizeof
>>> getsizeof(Right()), getsizeof(Wrong())
(56, 72)
``````

This is because the Base"s slot descriptor has a slot separate from the Wrong"s. This shouldn"t usually come up, but it could:

``````>>> w = Wrong()
>>> w.foo = "foo"
>>> Base.foo.__get__(w)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: foo
>>> Wrong.foo.__get__(w)
"foo"
``````

The biggest caveat is for multiple inheritance - multiple "parent classes with nonempty slots" cannot be combined.

To accommodate this restriction, follow best practices: Factor out all but one or all parents" abstraction which their concrete class respectively and your new concrete class collectively will inherit from - giving the abstraction(s) empty slots (just like abstract base classes in the standard library).

See section on multiple inheritance below for an example.

### Requirements:

• To have attributes named in `__slots__` to actually be stored in slots instead of a `__dict__`, a class must inherit from `object` (automatic in Python 3, but must be explicit in Python 2).

• To prevent the creation of a `__dict__`, you must inherit from `object` and all classes in the inheritance must declare `__slots__` and none of them can have a `"__dict__"` entry.

There are a lot of details if you wish to keep reading.

## Why use `__slots__`: Faster attribute access.

The creator of Python, Guido van Rossum, states that he actually created `__slots__` for faster attribute access.

It is trivial to demonstrate measurably significant faster access:

``````import timeit

class Foo(object): __slots__ = "foo",

class Bar(object): pass

slotted = Foo()
not_slotted = Bar()

def get_set_delete_fn(obj):
def get_set_delete():
obj.foo = "foo"
obj.foo
del obj.foo
return get_set_delete
``````

and

``````>>> min(timeit.repeat(get_set_delete_fn(slotted)))
0.2846834529991611
>>> min(timeit.repeat(get_set_delete_fn(not_slotted)))
0.3664822799983085
``````

The slotted access is almost 30% faster in Python 3.5 on Ubuntu.

``````>>> 0.3664822799983085 / 0.2846834529991611
1.2873325658284342
``````

In Python 2 on Windows I have measured it about 15% faster.

## Why use `__slots__`: Memory Savings

Another purpose of `__slots__` is to reduce the space in memory that each object instance takes up.

The space saved over using `__dict__` can be significant.

SQLAlchemy attributes a lot of memory savings to `__slots__`.

To verify this, using the Anaconda distribution of Python 2.7 on Ubuntu Linux, with `guppy.hpy` (aka heapy) and `sys.getsizeof`, the size of a class instance without `__slots__` declared, and nothing else, is 64 bytes. That does not include the `__dict__`. Thank you Python for lazy evaluation again, the `__dict__` is apparently not called into existence until it is referenced, but classes without data are usually useless. When called into existence, the `__dict__` attribute is a minimum of 280 bytes additionally.

In contrast, a class instance with `__slots__` declared to be `()` (no data) is only 16 bytes, and 56 total bytes with one item in slots, 64 with two.

For 64 bit Python, I illustrate the memory consumption in bytes in Python 2.7 and 3.6, for `__slots__` and `__dict__` (no slots defined) for each point where the dict grows in 3.6 (except for 0, 1, and 2 attributes):

``````       Python 2.7             Python 3.6
attrs  __slots__  __dict__*   __slots__  __dict__* | *(no slots defined)
none   16         56 + 272‚Ä†   16         56 + 112‚Ä† | ‚Ä†if __dict__ referenced
one    48         56 + 272    48         56 + 112
two    56         56 + 272    56         56 + 112
six    88         56 + 1040   88         56 + 152
11     128        56 + 1040   128        56 + 240
22     216        56 + 3344   216        56 + 408
43     384        56 + 3344   384        56 + 752
``````

So, in spite of smaller dicts in Python 3, we see how nicely `__slots__` scale for instances to save us memory, and that is a major reason you would want to use `__slots__`.

Just for completeness of my notes, note that there is a one-time cost per slot in the class"s namespace of 64 bytes in Python 2, and 72 bytes in Python 3, because slots use data descriptors like properties, called "members".

``````>>> Foo.foo
<member "foo" of "Foo" objects>
>>> type(Foo.foo)
<class "member_descriptor">
>>> getsizeof(Foo.foo)
72
``````

## Demonstration of `__slots__`:

To deny the creation of a `__dict__`, you must subclass `object`. Everything subclasses `object` in Python 3, but in Python 2 you had to be explicit:

``````class Base(object):
__slots__ = ()
``````

now:

``````>>> b = Base()
>>> b.a = "a"
Traceback (most recent call last):
File "<pyshell#38>", line 1, in <module>
b.a = "a"
AttributeError: "Base" object has no attribute "a"
``````

Or subclass another class that defines `__slots__`

``````class Child(Base):
__slots__ = ("a",)
``````

and now:

``````c = Child()
c.a = "a"
``````

but:

``````>>> c.b = "b"
Traceback (most recent call last):
File "<pyshell#42>", line 1, in <module>
c.b = "b"
AttributeError: "Child" object has no attribute "b"
``````

To allow `__dict__` creation while subclassing slotted objects, just add `"__dict__"` to the `__slots__` (note that slots are ordered, and you shouldn"t repeat slots that are already in parent classes):

``````class SlottedWithDict(Child):
__slots__ = ("__dict__", "b")

swd = SlottedWithDict()
swd.a = "a"
swd.b = "b"
swd.c = "c"
``````

and

``````>>> swd.__dict__
{"c": "c"}
``````

Or you don"t even need to declare `__slots__` in your subclass, and you will still use slots from the parents, but not restrict the creation of a `__dict__`:

``````class NoSlots(Child): pass
ns = NoSlots()
ns.a = "a"
ns.b = "b"
``````

And:

``````>>> ns.__dict__
{"b": "b"}
``````

However, `__slots__` may cause problems for multiple inheritance:

``````class BaseA(object):
__slots__ = ("a",)

class BaseB(object):
__slots__ = ("b",)
``````

Because creating a child class from parents with both non-empty slots fails:

``````>>> class Child(BaseA, BaseB): __slots__ = ()
Traceback (most recent call last):
File "<pyshell#68>", line 1, in <module>
class Child(BaseA, BaseB): __slots__ = ()
TypeError: Error when calling the metaclass bases
multiple bases have instance lay-out conflict
``````

If you run into this problem, You could just remove `__slots__` from the parents, or if you have control of the parents, give them empty slots, or refactor to abstractions:

``````from abc import ABC

class AbstractA(ABC):
__slots__ = ()

class BaseA(AbstractA):
__slots__ = ("a",)

class AbstractB(ABC):
__slots__ = ()

class BaseB(AbstractB):
__slots__ = ("b",)

class Child(AbstractA, AbstractB):
__slots__ = ("a", "b")

c = Child() # no problem!
``````

### Add `"__dict__"` to `__slots__` to get dynamic assignment:

``````class Foo(object):
__slots__ = "bar", "baz", "__dict__"
``````

and now:

``````>>> foo = Foo()
>>> foo.boink = "boink"
``````

So with `"__dict__"` in slots we lose some of the size benefits with the upside of having dynamic assignment and still having slots for the names we do expect.

When you inherit from an object that isn"t slotted, you get the same sort of semantics when you use `__slots__` - names that are in `__slots__` point to slotted values, while any other values are put in the instance"s `__dict__`.

Avoiding `__slots__` because you want to be able to add attributes on the fly is actually not a good reason - just add `"__dict__"` to your `__slots__` if this is required.

You can similarly add `__weakref__` to `__slots__` explicitly if you need that feature.

### Set to empty tuple when subclassing a namedtuple:

The namedtuple builtin make immutable instances that are very lightweight (essentially, the size of tuples) but to get the benefits, you need to do it yourself if you subclass them:

``````from collections import namedtuple
class MyNT(namedtuple("MyNT", "bar baz")):
"""MyNT is an immutable and lightweight object"""
__slots__ = ()
``````

usage:

``````>>> nt = MyNT("bar", "baz")
>>> nt.bar
"bar"
>>> nt.baz
"baz"
``````

And trying to assign an unexpected attribute raises an `AttributeError` because we have prevented the creation of `__dict__`:

``````>>> nt.quux = "quux"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: "MyNT" object has no attribute "quux"
``````

You can allow `__dict__` creation by leaving off `__slots__ = ()`, but you can"t use non-empty `__slots__` with subtypes of tuple.

## Biggest Caveat: Multiple inheritance

Even when non-empty slots are the same for multiple parents, they cannot be used together:

``````class Foo(object):
__slots__ = "foo", "bar"
class Bar(object):
__slots__ = "foo", "bar" # alas, would work if empty, i.e. ()

>>> class Baz(Foo, Bar): pass
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
multiple bases have instance lay-out conflict
``````

Using an empty `__slots__` in the parent seems to provide the most flexibility, allowing the child to choose to prevent or allow (by adding `"__dict__"` to get dynamic assignment, see section above) the creation of a `__dict__`:

``````class Foo(object): __slots__ = ()
class Bar(object): __slots__ = ()
class Baz(Foo, Bar): __slots__ = ("foo", "bar")
b = Baz()
b.foo, b.bar = "foo", "bar"
``````

You don"t have to have slots - so if you add them, and remove them later, it shouldn"t cause any problems.

Going out on a limb here: If you"re composing mixins or using abstract base classes, which aren"t intended to be instantiated, an empty `__slots__` in those parents seems to be the best way to go in terms of flexibility for subclassers.

To demonstrate, first, let"s create a class with code we"d like to use under multiple inheritance

``````class AbstractBase:
__slots__ = ()
def __init__(self, a, b):
self.a = a
self.b = b
def __repr__(self):
return f"{type(self).__name__}({repr(self.a)}, {repr(self.b)})"
``````

We could use the above directly by inheriting and declaring the expected slots:

``````class Foo(AbstractBase):
__slots__ = "a", "b"
``````

But we don"t care about that, that"s trivial single inheritance, we need another class we might also inherit from, maybe with a noisy attribute:

``````class AbstractBaseC:
__slots__ = ()
@property
def c(self):
print("getting c!")
return self._c
@c.setter
def c(self, arg):
print("setting c!")
self._c = arg
``````

Now if both bases had nonempty slots, we couldn"t do the below. (In fact, if we wanted, we could have given `AbstractBase` nonempty slots a and b, and left them out of the below declaration - leaving them in would be wrong):

``````class Concretion(AbstractBase, AbstractBaseC):
__slots__ = "a b _c".split()
``````

And now we have functionality from both via multiple inheritance, and can still deny `__dict__` and `__weakref__` instantiation:

``````>>> c = Concretion("a", "b")
>>> c.c = c
setting c!
>>> c.c
getting c!
Concretion("a", "b")
>>> c.d = "d"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: "Concretion" object has no attribute "d"
``````

## Other cases to avoid slots:

• Avoid them when you want to perform `__class__` assignment with another class that doesn"t have them (and you can"t add them) unless the slot layouts are identical. (I am very interested in learning who is doing this and why.)
• Avoid them if you want to subclass variable length builtins like long, tuple, or str, and you want to add attributes to them.
• Avoid them if you insist on providing default values via class attributes for instance variables.

You may be able to tease out further caveats from the rest of the `__slots__` documentation (the 3.7 dev docs are the most current), which I have made significant recent contributions to.

The current top answers cite outdated information and are quite hand-wavy and miss the mark in some important ways.

### Do not "only use `__slots__` when instantiating lots of objects"

I quote:

"You would want to use `__slots__` if you are going to instantiate a lot (hundreds, thousands) of objects of the same class."

Abstract Base Classes, for example, from the `collections` module, are not instantiated, yet `__slots__` are declared for them.

Why?

If a user wishes to deny `__dict__` or `__weakref__` creation, those things must not be available in the parent classes.

`__slots__` contributes to reusability when creating interfaces or mixins.

It is true that many Python users aren"t writing for reusability, but when you are, having the option to deny unnecessary space usage is valuable.

### `__slots__` doesn"t break pickling

When pickling a slotted object, you may find it complains with a misleading `TypeError`:

``````>>> pickle.loads(pickle.dumps(f))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
``````

This is actually incorrect. This message comes from the oldest protocol, which is the default. You can select the latest protocol with the `-1` argument. In Python 2.7 this would be `2` (which was introduced in 2.3), and in 3.6 it is `4`.

``````>>> pickle.loads(pickle.dumps(f, -1))
<__main__.Foo object at 0x1129C770>
``````

in Python 2.7:

``````>>> pickle.loads(pickle.dumps(f, 2))
<__main__.Foo object at 0x1129C770>
``````

in Python 3.6

``````>>> pickle.loads(pickle.dumps(f, 4))
<__main__.Foo object at 0x1129C770>
``````

So I would keep this in mind, as it is a solved problem.

## Critique of the (until Oct 2, 2016) accepted answer

The first paragraph is half short explanation, half predictive. Here"s the only part that actually answers the question

The proper use of `__slots__` is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation. This saves the overhead of one dict for every object that uses slots

The second half is wishful thinking, and off the mark:

While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.

Python actually does something similar to this, only creating the `__dict__` when it is accessed, but creating lots of objects with no data is fairly ridiculous.

The second paragraph oversimplifies and misses actual reasons to avoid `__slots__`. The below is not a real reason to avoid slots (for actual reasons, see the rest of my answer above.):

They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies.

It then goes on to discuss other ways of accomplishing that perverse goal with Python, not discussing anything to do with `__slots__`.

The third paragraph is more wishful thinking. Together it is mostly off-the-mark content that the answerer didn"t even author and contributes to ammunition for critics of the site.

# Memory usage evidence

Create some normal objects and slotted objects:

``````>>> class Foo(object): pass
>>> class Bar(object): __slots__ = ()
``````

Instantiate a million of them:

``````>>> foos = [Foo() for f in xrange(1000000)]
>>> bars = [Bar() for b in xrange(1000000)]
``````

Inspect with `guppy.hpy().heap()`:

``````>>> guppy.hpy().heap()
Partition of a set of 2028259 objects. Total size = 99763360 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0 1000000  49 64000000  64  64000000  64 __main__.Foo
1     169   0 16281480  16  80281480  80 list
2 1000000  49 16000000  16  96281480  97 __main__.Bar
3   12284   1   987472   1  97268952  97 str
...
``````

Access the regular objects and their `__dict__` and inspect again:

``````>>> for f in foos:
...     f.__dict__
>>> guppy.hpy().heap()
Partition of a set of 3028258 objects. Total size = 379763480 bytes.
Index  Count   %      Size    % Cumulative  % Kind (class / dict of class)
0 1000000  33 280000000  74 280000000  74 dict of __main__.Foo
1 1000000  33  64000000  17 344000000  91 __main__.Foo
2     169   0  16281480   4 360281480  95 list
3 1000000  33  16000000   4 376281480  99 __main__.Bar
4   12284   0    987472   0 377268952  99 str
...
``````

This is consistent with the history of Python, from Unifying types and classes in Python 2.2

If you subclass a built-in type, extra space is automatically added to the instances to accomodate `__dict__` and `__weakrefs__`. (The `__dict__` is not initialized until you use it though, so you shouldn"t worry about the space occupied by an empty dictionary for each instance you create.) If you don"t need this extra space, you can add the phrase "`__slots__ = []`" to your class.

# `os.listdir()` - list in the current directory

With listdir in os module you get the files and the folders in the current dir

`````` import os
arr = os.listdir()
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

## Looking in a directory

``````arr = os.listdir("c:\files")
``````

# `glob` from glob

with glob you can specify a type of file to list like this

``````import glob

txtfiles = []
for file in glob.glob("*.txt"):
txtfiles.append(file)
``````

## `glob` in a list comprehension

``````mylist = [f for f in glob.glob("*.txt")]
``````

## get the full path of only files in the current directory

``````import os
from os import listdir
from os.path import isfile, join

cwd = os.getcwd()
onlyfiles = [os.path.join(cwd, f) for f in os.listdir(cwd) if
os.path.isfile(os.path.join(cwd, f))]
print(onlyfiles)

["G:\getfilesname\getfilesname.py", "G:\getfilesname\example.txt"]
``````

## Getting the full path name with `os.path.abspath`

You get the full path in return

`````` import os
files_path = [os.path.abspath(x) for x in os.listdir()]
print(files_path)

["F:\documentiapplications.txt", "F:\documenticollections.txt"]
``````

## Walk: going through sub directories

os.walk returns the root, the directories list and the files list, that is why I unpacked them in r, d, f in the for loop; it, then, looks for other files and directories in the subfolders of the root and so on until there are no subfolders.

``````import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
for file in f:
if file.endswith(".docx"):
print(os.path.join(r, file))
``````

### `os.listdir()`: get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to give the argument as "." or os.getcwd() in the os.listdir method.

`````` import os
arr = os.listdir(".")
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

### To go up in the directory tree

``````# Method 1
x = os.listdir("..")

# Method 2
x= os.listdir("/")
``````

### Get files: `os.listdir()` in a particular directory (Python 2 and 3)

`````` import os
arr = os.listdir("F:\python")
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

### Get files of a particular subdirectory with `os.listdir()`

``````import os

x = os.listdir("./content")
``````

### `os.walk(".")` - current directory

`````` import os
arr = next(os.walk("."))[2]
print(arr)

>>> ["5bs_Turismo1.pdf", "5bs_Turismo1.pptx", "esperienza.txt"]
``````

### `next(os.walk("."))` and `os.path.join("dir", "file")`

`````` import os
arr = []
for d,r,f in next(os.walk("F:\_python")):
for file in f:
arr.append(os.path.join(r,file))

for f in arr:
print(files)

>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt
``````

### `next(os.walk("F:\")` - get the full path - list comprehension

`````` [os.path.join(r,file) for r,d,f in next(os.walk("F:\_python")) for file in f]

>>> ["F:\_python\dict_class.py", "F:\_python\programmi.txt"]
``````

### `os.walk` - get full path - all files in sub dirs**

``````x = [os.path.join(r,file) for r,d,f in os.walk("F:\_python") for file in f]
print(x)

``````

### `os.listdir()` - get only txt files

`````` arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
print(arr_txt)

>>> ["work.txt", "3ebooks.txt"]
``````

## Using `glob` to get the full path of the files

If I should need the absolute path of the files:

``````from path import path
from glob import glob
x = [path(f).abspath() for f in glob("F:\*.txt")]
for f in x:
print(f)

>>> F:acquistionline.txt
>>> F:acquisti_2018.txt
>>> F:ootstrap_jquery_ecc.txt
``````

## Using `os.path.isfile` to avoid directories in the list

``````import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ["a simple game.py", "data.txt", "decorator.py"]
``````

## Using `pathlib` from Python 3.4

``````import pathlib

flist = []
for p in pathlib.Path(".").iterdir():
if p.is_file():
print(p)
flist.append(p)

>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speak_gui2.py
>>> thumb.PNG
``````

With `list comprehension`:

``````flist = [p for p in pathlib.Path(".").iterdir() if p.is_file()]
``````

Alternatively, use `pathlib.Path()` instead of `pathlib.Path(".")`

## Use glob method in pathlib.Path()

``````import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py
``````

## Get all and only files with os.walk

``````import os
x = [i[2] for i in os.walk(".")]
y=[]
for t in x:
for f in t:
y.append(f)
print(y)

>>> ["append_to_list.py", "data.txt", "data1.txt", "data2.txt", "data_180617", "os_walk.py", "READ2.py", "read_data.py", "somma_defaltdic.py", "substitute_words.py", "sum_data.py", "data.txt", "data1.txt", "data_180617"]
``````

## Get only files with next and walk in a directory

`````` import os
x = next(os.walk("F://python"))[2]
print(x)

>>> ["calculator.bat","calculator.py"]
``````

## Get only directories with next and walk in a directory

`````` import os
next(os.walk("F://python"))[1] # for the current dir use (".")

>>> ["python3","others"]
``````

## Get all the subdir names with `walk`

``````for r,d,f in os.walk("F:\_python"):
for dirs in d:
print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints
``````

## `os.scandir()` from Python 3.5 and greater

``````import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ["calculator.bat","calculator.py"]

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
for entry in i:
if entry.is_file():
print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG
``````

# Examples:

## Ex. 1: How many files are there in the subdirectories?

In this example, we look for the number of files that are included in all the directory and its subdirectories.

``````import os

def count(dir, counter=0):
"returns number of files in dir and subdirs"
for pack in os.walk(dir):
for f in pack[2]:
counter += 1
return dir + " : " + str(counter) + "files"

print(count("F:\python"))

>>> "F:\python" : 12057 files"
``````

## Ex.2: How to copy all files from a directory to another?

A script to make order in your computer finding all files of a type (default: pptx) and copying them in a new folder.

``````import os
import shutil
from path import path

destination = "F:\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype="pptx", counter=0):
"Searches for pptx (or other - pptx is the default) files and copies them"
for pack in os.walk(dir):
for f in pack[2]:
if f.endswith(filetype):
fullpath = pack[0] + "\" + f
print(fullpath)
shutil.copy(fullpath, destination)
counter += 1
if counter > 0:
print("-" * 30)
print("	==> Found in: `" + dir + "` : " + str(counter) + " files
")

for dir in os.listdir():
"searches for folders that starts with `_`"
if dir[0] == "_":
# copyfile(dir, filetype="pdf")
copyfile(dir, filetype="txt")

>>> _compiti18Compito Contabilit√† 1conti.txt
>>> _compiti18Compito Contabilit√† 1modula4.txt
>>> _compiti18Compito Contabilit√† 1moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files
``````

## Ex. 3: How to get all the files in a txt file

In case you want to create a txt file with all the file names:

``````import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
for eachfile in os.listdir():
mylist += eachfile + "
"
file.write(mylist)
``````

## Example: txt with all the files of an hard drive

``````"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding="utf-8") as testo:
for root, dirs, files in os.walk("D:\"):
for file in files:
listafile.append(file)
percorso.append(root + "\" + file)
testo.write(file + "
")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
for file in listafile:
testo_ordinato.write(file + "
")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
for file in percorso:
file_percorso.write(file + "
")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")
``````

## All the file of C: in one text file

This is a shorter version of the previous code. Change the folder where to start finding the files if you need to start from another position. This code generate a 50 mb on text file on my computer with something less then 500.000 lines with files with the complete path.

``````import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
for r, d, f in os.walk("C:\"):
for file in f:
filewrite.write(f"{r + file}
")
``````

## How to write a file with all paths in a folder of a type

With this function you can create a txt file that will have the name of a type of file that you look for (ex. pngfile.txt) with all the full path of all the files of that type. It can be useful sometimes, I think.

``````import os

def searchfiles(extension=".ttf", folder="H:\"):
"Create a txt file with all the file of a type"
with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
for r, d, f in os.walk(folder):
for file in f:
if file.endswith(extension):
filewrite.write(f"{r + file}
")

# looking for png file (fonts) in the hard disk H:
searchfiles(".png", "H:\")

>>> H:4bs_18Dolphins5.png
>>> H:4bs_18Dolphins6.png
>>> H:4bs_18Dolphins7.png
>>> H:5_18marketing htmlassetsimageslogo2.png
>>> H:7z001.png
>>> H:7z002.png
``````

## (New) Find all files and open them with tkinter GUI

I just wanted to add in this 2019 a little app to search for all files in a dir and be able to open them by doubleclicking on the name of the file in the list.

``````import tkinter as tk
import os

def searchfiles(extension=".txt", folder="H:\"):
"insert all files in the listbox"
for r, d, f in os.walk(folder):
for file in f:
if file.endswith(extension):
lb.insert(0, r + "\" + file)

def open_file():
os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles(".png", "H:\"))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()
``````

This is the behaviour to adopt when the referenced object is deleted. It is not specific to Django; this is an SQL standard. Although Django has its own implementation on top of SQL. (1)

There are seven possible actions to take when such event occurs:

• `CASCADE`: When the referenced object is deleted, also delete the objects that have references to it (when you remove a blog post for instance, you might want to delete comments as well). SQL equivalent: `CASCADE`.
• `PROTECT`: Forbid the deletion of the referenced object. To delete it you will have to delete all objects that reference it manually. SQL equivalent: `RESTRICT`.
• `RESTRICT`: (introduced in Django 3.1) Similar behavior as `PROTECT` that matches SQL"s `RESTRICT` more accurately. (See django documentation example)
• `SET_NULL`: Set the reference to NULL (requires the field to be nullable). For instance, when you delete a User, you might want to keep the comments he posted on blog posts, but say it was posted by an anonymous (or deleted) user. SQL equivalent: `SET NULL`.
• `SET_DEFAULT`: Set the default value. SQL equivalent: `SET DEFAULT`.
• `SET(...)`: Set a given value. This one is not part of the SQL standard and is entirely handled by Django.
• `DO_NOTHING`: Probably a very bad idea since this would create integrity issues in your database (referencing an object that actually doesn"t exist). SQL equivalent: `NO ACTION`. (2)

Source: Django documentation

In most cases, `CASCADE` is the expected behaviour, but for every ForeignKey, you should always ask yourself what is the expected behaviour in this situation. `PROTECT` and `SET_NULL` are often useful. Setting `CASCADE` where it should not, can potentially delete all of your database in cascade, by simply deleting a single user.

It"s funny to notice that the direction of the `CASCADE` action is not clear to many people. Actually, it"s funny to notice that only the `CASCADE` action is not clear. I understand the cascade behavior might be confusing, however you must think that it is the same direction as any other action. Thus, if you feel that `CASCADE` direction is not clear to you, it actually means that `on_delete` behavior is not clear to you.

In your database, a foreign key is basically represented by an integer field which value is the primary key of the foreign object. Let"s say you have an entry comment_A, which has a foreign key to an entry article_B. If you delete the entry comment_A, everything is fine. article_B used to live without comment_A and don"t bother if it"s deleted. However, if you delete article_B, then comment_A panics! It never lived without article_B and needs it, and it"s part of its attributes (`article=article_B`, but what is article_B???). This is where `on_delete` steps in, to determine how to resolve this integrity error, either by saying:

• "No! Please! Don"t! I can"t live without you!" (which is said `PROTECT` or `RESTRICT` in Django/SQL)
• "All right, if I"m not yours, then I"m nobody"s" (which is said `SET_NULL`)
• "Good bye world, I can"t live without article_B" and commit suicide (this is the `CASCADE` behavior).
• "It"s OK, I"ve got spare lover, and I"ll reference article_C from now" (`SET_DEFAULT`, or even `SET(...)`).
• "I can"t face reality, and I"ll keep calling your name even if that"s the only thing left to me!" (`DO_NOTHING`)

I hope it makes cascade direction clearer. :)

Footnotes

(1) Django has its own implementation on top of SQL. And, as mentioned by @JoeMjr2 in the comments below, Django will not create the SQL constraints. If you want the constraints to be ensured by your database (for instance, if your database is used by another application, or if you hang in the database console from time to time), you might want to set the related constraints manually yourself. There is an open ticket to add support for database-level on delete constrains in Django.

(2) Actually, there is one case where `DO_NOTHING` can be useful: If you want to skip Django"s implementation and implement the constraint yourself at the database-level.

## Label vs. Location

The main distinction between the two methods is:

• `loc` gets rows (and/or columns) with particular labels.

• `iloc` gets rows (and/or columns) at integer locations.

To demonstrate, consider a series `s` of characters with a non-monotonic integer index:

``````>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
49    a
48    b
47    c
0     d
1     e
2     f

>>> s.loc[0]    # value at index label 0
"d"

>>> s.iloc[0]   # value at index location 0
"a"

>>> s.loc[0:1]  # rows at index labels between 0 and 1 (inclusive)
0    d
1    e

>>> s.iloc[0:1] # rows at index location between 0 and 1 (exclusive)
49    a
``````

Here are some of the differences/similarities between `s.loc` and `s.iloc` when passed various objects:

<object> description `s.loc[<object>]` `s.iloc[<object>]`
`0` single item Value at index label `0` (the string `"d"`) Value at index location 0 (the string `"a"`)
`0:1` slice Two rows (labels `0` and `1`) One row (first row at location 0)
`1:47` slice with out-of-bounds end Zero rows (empty Series) Five rows (location 1 onwards)
`1:47:-1` slice with negative step three rows (labels `1` back to `47`) Zero rows (empty Series)
`[2, 0]` integer list Two rows with given labels Two rows with given locations
`s > "e"` Bool series (indicating which values have the property) One row (containing `"f"`) `NotImplementedError`
`(s>"e").values` Bool array One row (containing `"f"`) Same as `loc`
`999` int object not in index `KeyError` `IndexError` (out of bounds)
`-1` int object not in index `KeyError` Returns last value in `s`
`lambda x: x.index[3]` callable applied to series (here returning 3rd item in index) `s.loc[s.index[3]]` `s.iloc[s.index[3]]`

`loc`"s label-querying capabilities extend well-beyond integer indexes and it"s worth highlighting a couple of additional examples.

Here"s a Series where the index contains string objects:

``````>>> s2 = pd.Series(s.index, index=s.values)
>>> s2
a    49
b    48
c    47
d     0
e     1
f     2
``````

Since `loc` is label-based, it can fetch the first value in the Series using `s2.loc["a"]`. It can also slice with non-integer objects:

``````>>> s2.loc["c":"e"]  # all rows lying between "c" and "e" (inclusive)
c    47
d     0
e     1
``````

For DateTime indexes, we don"t need to pass the exact date/time to fetch by label. For example:

``````>>> s3 = pd.Series(list("abcde"), pd.date_range("now", periods=5, freq="M"))
>>> s3
2021-01-31 16:41:31.879768    a
2021-02-28 16:41:31.879768    b
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
2021-05-31 16:41:31.879768    e
``````

Then to fetch the row(s) for March/April 2021 we only need:

``````>>> s3.loc["2021-03":"2021-04"]
2021-03-31 17:04:30.742316    c
2021-04-30 17:04:30.742316    d
``````

## Rows and Columns

`loc` and `iloc` work the same way with DataFrames as they do with Series. It"s useful to note that both methods can address columns and rows together.

When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.

Consider the DataFrame defined below:

``````>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
index=list("abcde"),
columns=["x","y","z", 8, 9])
>>> df
x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24
``````

Then for example:

``````>>> df.loc["c": , :"z"]  # rows "c" and onwards AND columns up to "z"
x   y   z
c  10  11  12
d  15  16  17
e  20  21  22

>>> df.iloc[:, 3]        # all rows, but only the column at index location 3
a     3
b     8
c    13
d    18
e    23
``````

Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of `loc` and `iloc`.

For example, consider the following DataFrame. How best to slice the rows up to and including "c" and take the first four columns?

``````>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
index=list("abcde"),
columns=["x","y","z", 8, 9])
>>> df
x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24
``````

We can achieve this result using `iloc` and the help of another method:

``````>>> df.iloc[:df.index.get_loc("c") + 1, :4]
x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13
``````

`get_loc()` is an index method meaning "get the position of the label in this index". Note that since slicing with `iloc` is exclusive of its endpoint, we must add 1 to this value if we want row "c" as well.

The simplest way to get row counts per group is by calling `.size()`, which returns a `Series`:

``````df.groupby(["col1","col2"]).size()
``````

Usually you want this result as a `DataFrame` (instead of a `Series`) so you can do:

``````df.groupby(["col1", "col2"]).size().reset_index(name="counts")
``````

If you want to find out how to calculate the row counts and other statistics for each group continue reading below.

## Detailed example:

Consider the following example dataframe:

``````In [2]: df
Out[2]:
col1 col2  col3  col4  col5  col6
0    A    B  0.20 -0.61 -0.49  1.49
1    A    B -1.53 -1.01 -0.39  1.82
2    A    B -0.44  0.27  0.72  0.11
3    A    B  0.28 -1.32  0.38  0.18
4    C    D  0.12  0.59  0.81  0.66
5    C    D -0.13 -1.65 -1.64  0.50
6    C    D -1.42 -0.11 -0.18 -0.44
7    E    F -0.00  1.42 -0.26  1.17
8    E    F  0.91 -0.47  1.35 -0.34
9    G    H  1.48 -0.63 -1.14  0.17
``````

First let"s use `.size()` to get the row counts:

``````In [3]: df.groupby(["col1", "col2"]).size()
Out[3]:
col1  col2
A     B       4
C     D       3
E     F       2
G     H       1
dtype: int64
``````

Then let"s use `.size().reset_index(name="counts")` to get the row counts:

``````In [4]: df.groupby(["col1", "col2"]).size().reset_index(name="counts")
Out[4]:
col1 col2  counts
0    A    B       4
1    C    D       3
2    E    F       2
3    G    H       1
``````

### Including results for more statistics

When you want to calculate statistics on grouped data, it usually looks like this:

``````In [5]: (df
...: .groupby(["col1", "col2"])
...: .agg({
...:     "col3": ["mean", "count"],
...:     "col4": ["median", "min", "count"]
...: }))
Out[5]:
col4                  col3
median   min count      mean count
col1 col2
A    B    -0.810 -1.32     4 -0.372500     4
C    D    -0.110 -1.65     3 -0.476667     3
E    F     0.475 -0.47     2  0.455000     2
G    H    -0.630 -0.63     1  1.480000     1
``````

The result above is a little annoying to deal with because of the nested column labels, and also because row counts are on a per column basis.

To gain more control over the output I usually split the statistics into individual aggregations that I then combine using `join`. It looks like this:

``````In [6]: gb = df.groupby(["col1", "col2"])
...: counts = gb.size().to_frame(name="counts")
...: (counts
...:  .join(gb.agg({"col3": "mean"}).rename(columns={"col3": "col3_mean"}))
...:  .join(gb.agg({"col4": "median"}).rename(columns={"col4": "col4_median"}))
...:  .join(gb.agg({"col4": "min"}).rename(columns={"col4": "col4_min"}))
...:  .reset_index()
...: )
...:
Out[6]:
col1 col2  counts  col3_mean  col4_median  col4_min
0    A    B       4  -0.372500       -0.810     -1.32
1    C    D       3  -0.476667       -0.110     -1.65
2    E    F       2   0.455000        0.475     -0.47
3    G    H       1   1.480000       -0.630     -0.63
``````

### Footnotes

The code used to generate the test data is shown below:

``````In [1]: import numpy as np
...: import pandas as pd
...:
...: keys = np.array([
...:         ["A", "B"],
...:         ["A", "B"],
...:         ["A", "B"],
...:         ["A", "B"],
...:         ["C", "D"],
...:         ["C", "D"],
...:         ["C", "D"],
...:         ["E", "F"],
...:         ["E", "F"],
...:         ["G", "H"]
...:         ])
...:
...: df = pd.DataFrame(
...:     np.hstack([keys,np.random.randn(10,4).round(2)]),
...:     columns = ["col1", "col2", "col3", "col4", "col5", "col6"]
...: )
...:
...: df[["col3", "col4", "col5", "col6"]] =
...:     df[["col3", "col4", "col5", "col6"]].astype(float)
...:
``````

Disclaimer:

If some of the columns that you are aggregating have null values, then you really want to be looking at the group row counts as an independent aggregation for each column. Otherwise you may be misled as to how many records are actually being used to calculate things like the mean because pandas will drop `NaN` entries in the mean calculation without telling you about it.