# Python program to find the sum of the absolute difference between all pairs in a list

absolute | Counters | find | Loops | Python Methods and Functions

Examples :

`  Input:  [9, 2, 14]  Output:  24  Explanation:  (abs (9-2) + abs (9-14) + abs (2-14))  Input:  [1, 2, 3, 4]  Output:  10  Explanation:  (abs (1-2) + abs (1-3) + abs (1-4) + abs (2-3) + abs (2-4) + abs (3-4)) `

First approach — this is the brute force method that

` # Python3 program for finding sum `
` # absolute differences in all pairs `

` def ` ` sumPairs (lst): `

` `

` diffs ` ` = ` ` [] `

` for ` ` i, x ` ` in ` ` enumerate ` ` (lst): `

` for ` ` j, y ` ` in ` ` enumerate ` ` (lst): `

` ` ` if ` ` i! ` ` = ` ` j: `

` diffs.append (` ` abs ` ` (x ` ` - ` ` y)) `

` return ` ` int ` ` (` ` sum ` ` (diffs) ` ` / ` ` 2 ` `) `

` # Driver program `

` lst ` ` = ` ` [` ` 1 ` `, ` ` 2 ` `, ` ` 3 ` `, ` ` 4 ` `] `

` print ` ` (sumPairs (lst)) `

Exit:

` 10 `

Approach # 2 Using itertools
Python itertools consists of a permutation () method. This method takes a list as input and returns a list of tuple objects that contain all permutations in the form of a list. Here, to find the absolute difference, we essentially need to swap the two elements. Since each pair will be counted twice, we divide the total by 2.

 ` # Python3 program for finding sum ` ` # absolute differences in all pairs ` ` import ` ` itertools `   ` def ` ` sumPairs (lst): `   ` diffs ` ` = ` ` [` ` abs ` ` (e [` ` 1 ` `] ` ` - ` ` e [` ` 0 ` `]) ` ` for ` ` e ` ` in ` ` itertools.permutations (lst, ` ` 2 ` `)] ` ` return ` ` int ` ` (` ` sum ` ` ( diffs) ` ` / ` ` 2 ` `) `   ` # Driver program ` ` lst ` ` = ` ` [` ` 9 ` `, ` ` 8 ` `, ` ` 1 ` `, ` ` 16 ` `, ` ` 15 ` `] ` ` print ` (sumPairs (lst))

Output:

` 74 `

## How to get an absolute file path in Python

### Question by izb

Given a path such as `"mydir/myfile.txt"`, how do I find the file"s absolute path relative to the current working directory in Python? E.g. on Windows, I might end up with:

``````"C:/example/cwd/mydir/myfile.txt"
``````

## What does from __future__ import absolute_import actually do?

I have answered a question regarding absolute imports in Python, which I thought I understood based on reading the Python 2.5 changelog and accompanying PEP. However, upon installing Python 2.5 and attempting to craft an example of properly using `from __future__ import absolute_import`, I realize things are not so clear.

Straight from the changelog linked above, this statement accurately summarized my understanding of the absolute import change:

Let"s say you have a package directory like this:

``````pkg/
pkg/__init__.py
pkg/main.py
pkg/string.py
``````

This defines a package named `pkg` containing the `pkg.main` and `pkg.string` submodules.

Consider the code in the main.py module. What happens if it executes the statement `import string`? In Python 2.4 and earlier, it will first look in the package"s directory to perform a relative import, finds pkg/string.py, imports the contents of that file as the `pkg.string` module, and that module is bound to the name `"string"` in the `pkg.main` module"s namespace.

So I created this exact directory structure:

``````\$ ls -R
.:
pkg/

./pkg:
__init__.py  main.py  string.py
``````

`__init__.py` and `string.py` are empty. `main.py` contains the following code:

``````import string
print string.ascii_uppercase
``````

As expected, running this with Python 2.5 fails with an `AttributeError`:

``````\$ python2.5 pkg/main.py
Traceback (most recent call last):
File "pkg/main.py", line 2, in <module>
print string.ascii_uppercase
AttributeError: "module" object has no attribute "ascii_uppercase"
``````

However, further along in the 2.5 changelog, we find this (emphasis added):

In Python 2.5, you can switch `import`"s behaviour to absolute imports using a `from __future__ import absolute_import` directive. This absolute-import behaviour will become the default in a future version (probably Python 2.7). Once absolute imports are the default, `import string` will always find the standard library"s version.

I thus created `pkg/main2.py`, identical to `main.py` but with the additional future import directive. It now looks like this:

``````from __future__ import absolute_import
import string
print string.ascii_uppercase
``````

Running this with Python 2.5, however... fails with an `AttributeError`:

``````\$ python2.5 pkg/main2.py
Traceback (most recent call last):
File "pkg/main2.py", line 3, in <module>
print string.ascii_uppercase
AttributeError: "module" object has no attribute "ascii_uppercase"
``````

This pretty flatly contradicts the statement that `import string` will always find the std-lib version with absolute imports enabled. What"s more, despite the warning that absolute imports are scheduled to become the "new default" behavior, I hit this same problem using both Python 2.7, with or without the `__future__` directive:

``````\$ python2.7 pkg/main.py
Traceback (most recent call last):
File "pkg/main.py", line 2, in <module>
print string.ascii_uppercase
AttributeError: "module" object has no attribute "ascii_uppercase"

\$ python2.7 pkg/main2.py
Traceback (most recent call last):
File "pkg/main2.py", line 3, in <module>
print string.ascii_uppercase
AttributeError: "module" object has no attribute "ascii_uppercase"
``````

as well as Python 3.5, with or without (assuming the `print` statement is changed in both files):

``````\$ python3.5 pkg/main.py
Traceback (most recent call last):
File "pkg/main.py", line 2, in <module>
print(string.ascii_uppercase)
AttributeError: module "string" has no attribute "ascii_uppercase"

\$ python3.5 pkg/main2.py
Traceback (most recent call last):
File "pkg/main2.py", line 3, in <module>
print(string.ascii_uppercase)
AttributeError: module "string" has no attribute "ascii_uppercase"
``````

I have tested other variations of this. Instead of `string.py`, I have created an empty module -- a directory named `string` containing only an empty `__init__.py` -- and instead of issuing imports from `main.py`, I have `cd`"d to `pkg` and run imports directly from the REPL. Neither of these variations (nor a combination of them) changed the results above. I cannot reconcile this with what I have read about the `__future__` directive and absolute imports.

It seems to me that this is easily explicable by the following (this is from the Python 2 docs but this statement remains unchanged in the same docs for Python 3):

### sys.path

(...)

As initialized upon program startup, the first item of this list, `path[0]`, is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), `path[0]` is the empty string, which directs Python to search modules in the current directory first.

So what am I missing? Why does the `__future__` statement seemingly not do what it says, and what is the resolution of this contradiction between these two sections of documentation, as well as between described and actual behavior?

## How to check if a path is absolute path or relative path in a cross-platform way with Python?

UNIX absolute path starts with "/", whereas Windows starts with alphabet "C:" or "". Does python have a standard function to check if a path is absolute or relative?

## Get relative path from comparing two absolute paths

Say, I have two absolute paths. I need to check if the location referring to by one of the paths is a descendant of the other. If true, I need to find out the relative path of the descendant from the ancestor. What"s a good way to implement this in Python? Any library that I can benefit from?

## How to join absolute and relative urls?

I have two urls:

``````url1 = "http://127.0.0.1/test1/test2/test3/test5.xml"
url2 = "../../test4/test6.xml"
``````

How can I get an absolute url for url2?

## Is module __file__ attribute absolute or relative?

I"m having trouble understanding `__file__`. From what I understand, `__file__` returns the absolute path from which the module was loaded.

I"m having problem producing this: I have a `abc.py` with one statement `print __file__`, running from `/d/projects/` `python abc.py` returns `abc.py`. running from `/d/` returns `projects/abc.py`. Any reasons why?

# `os.listdir()` - list in the current directory

With listdir in os module you get the files and the folders in the current dir

`````` import os
arr = os.listdir()
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

## Looking in a directory

``````arr = os.listdir("c:\files")
``````

# `glob` from glob

with glob you can specify a type of file to list like this

``````import glob

txtfiles = []
for file in glob.glob("*.txt"):
txtfiles.append(file)
``````

## `glob` in a list comprehension

``````mylist = [f for f in glob.glob("*.txt")]
``````

## get the full path of only files in the current directory

``````import os
from os import listdir
from os.path import isfile, join

cwd = os.getcwd()
onlyfiles = [os.path.join(cwd, f) for f in os.listdir(cwd) if
os.path.isfile(os.path.join(cwd, f))]
print(onlyfiles)

["G:\getfilesname\getfilesname.py", "G:\getfilesname\example.txt"]
``````

## Getting the full path name with `os.path.abspath`

You get the full path in return

`````` import os
files_path = [os.path.abspath(x) for x in os.listdir()]
print(files_path)

["F:\documentiapplications.txt", "F:\documenticollections.txt"]
``````

## Walk: going through sub directories

os.walk returns the root, the directories list and the files list, that is why I unpacked them in r, d, f in the for loop; it, then, looks for other files and directories in the subfolders of the root and so on until there are no subfolders.

``````import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
for file in f:
if file.endswith(".docx"):
print(os.path.join(r, file))
``````

### `os.listdir()`: get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to give the argument as "." or os.getcwd() in the os.listdir method.

`````` import os
arr = os.listdir(".")
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

### To go up in the directory tree

``````# Method 1
x = os.listdir("..")

# Method 2
x= os.listdir("/")
``````

### Get files: `os.listdir()` in a particular directory (Python 2 and 3)

`````` import os
arr = os.listdir("F:\python")
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

### Get files of a particular subdirectory with `os.listdir()`

``````import os

x = os.listdir("./content")
``````

### `os.walk(".")` - current directory

`````` import os
arr = next(os.walk("."))[2]
print(arr)

>>> ["5bs_Turismo1.pdf", "5bs_Turismo1.pptx", "esperienza.txt"]
``````

### `next(os.walk("."))` and `os.path.join("dir", "file")`

`````` import os
arr = []
for d,r,f in next(os.walk("F:\_python")):
for file in f:
arr.append(os.path.join(r,file))

for f in arr:
print(files)

>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt
``````

### `next(os.walk("F:\")` - get the full path - list comprehension

`````` [os.path.join(r,file) for r,d,f in next(os.walk("F:\_python")) for file in f]

>>> ["F:\_python\dict_class.py", "F:\_python\programmi.txt"]
``````

### `os.walk` - get full path - all files in sub dirs**

``````x = [os.path.join(r,file) for r,d,f in os.walk("F:\_python") for file in f]
print(x)

``````

### `os.listdir()` - get only txt files

`````` arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
print(arr_txt)

>>> ["work.txt", "3ebooks.txt"]
``````

## Using `glob` to get the full path of the files

If I should need the absolute path of the files:

``````from path import path
from glob import glob
x = [path(f).abspath() for f in glob("F:\*.txt")]
for f in x:
print(f)

>>> F:acquistionline.txt
>>> F:acquisti_2018.txt
>>> F:ootstrap_jquery_ecc.txt
``````

## Using `os.path.isfile` to avoid directories in the list

``````import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ["a simple game.py", "data.txt", "decorator.py"]
``````

## Using `pathlib` from Python 3.4

``````import pathlib

flist = []
for p in pathlib.Path(".").iterdir():
if p.is_file():
print(p)
flist.append(p)

>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speak_gui2.py
>>> thumb.PNG
``````

With `list comprehension`:

``````flist = [p for p in pathlib.Path(".").iterdir() if p.is_file()]
``````

Alternatively, use `pathlib.Path()` instead of `pathlib.Path(".")`

## Use glob method in pathlib.Path()

``````import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py
``````

## Get all and only files with os.walk

``````import os
x = [i[2] for i in os.walk(".")]
y=[]
for t in x:
for f in t:
y.append(f)
print(y)

>>> ["append_to_list.py", "data.txt", "data1.txt", "data2.txt", "data_180617", "os_walk.py", "READ2.py", "read_data.py", "somma_defaltdic.py", "substitute_words.py", "sum_data.py", "data.txt", "data1.txt", "data_180617"]
``````

## Get only files with next and walk in a directory

`````` import os
x = next(os.walk("F://python"))[2]
print(x)

>>> ["calculator.bat","calculator.py"]
``````

## Get only directories with next and walk in a directory

`````` import os
next(os.walk("F://python"))[1] # for the current dir use (".")

>>> ["python3","others"]
``````

## Get all the subdir names with `walk`

``````for r,d,f in os.walk("F:\_python"):
for dirs in d:
print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints
``````

## `os.scandir()` from Python 3.5 and greater

``````import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ["calculator.bat","calculator.py"]

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
for entry in i:
if entry.is_file():
print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG
``````

# Examples:

## Ex. 1: How many files are there in the subdirectories?

In this example, we look for the number of files that are included in all the directory and its subdirectories.

``````import os

def count(dir, counter=0):
"returns number of files in dir and subdirs"
for pack in os.walk(dir):
for f in pack[2]:
counter += 1
return dir + " : " + str(counter) + "files"

print(count("F:\python"))

>>> "F:\python" : 12057 files"
``````

## Ex.2: How to copy all files from a directory to another?

A script to make order in your computer finding all files of a type (default: pptx) and copying them in a new folder.

``````import os
import shutil
from path import path

destination = "F:\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype="pptx", counter=0):
"Searches for pptx (or other - pptx is the default) files and copies them"
for pack in os.walk(dir):
for f in pack[2]:
if f.endswith(filetype):
fullpath = pack[0] + "\" + f
print(fullpath)
shutil.copy(fullpath, destination)
counter += 1
if counter > 0:
print("-" * 30)
print("	==> Found in: `" + dir + "` : " + str(counter) + " files
")

for dir in os.listdir():
"searches for folders that starts with `_`"
if dir[0] == "_":
# copyfile(dir, filetype="pdf")
copyfile(dir, filetype="txt")

>>> _compiti18Compito Contabilit√† 1conti.txt
>>> _compiti18Compito Contabilit√† 1modula4.txt
>>> _compiti18Compito Contabilit√† 1moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files
``````

## Ex. 3: How to get all the files in a txt file

In case you want to create a txt file with all the file names:

``````import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
for eachfile in os.listdir():
mylist += eachfile + "
"
file.write(mylist)
``````

## Example: txt with all the files of an hard drive

``````"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding="utf-8") as testo:
for root, dirs, files in os.walk("D:\"):
for file in files:
listafile.append(file)
percorso.append(root + "\" + file)
testo.write(file + "
")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
for file in listafile:
testo_ordinato.write(file + "
")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
for file in percorso:
file_percorso.write(file + "
")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")
``````

## All the file of C: in one text file

This is a shorter version of the previous code. Change the folder where to start finding the files if you need to start from another position. This code generate a 50 mb on text file on my computer with something less then 500.000 lines with files with the complete path.

``````import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
for r, d, f in os.walk("C:\"):
for file in f:
filewrite.write(f"{r + file}
")
``````

## How to write a file with all paths in a folder of a type

With this function you can create a txt file that will have the name of a type of file that you look for (ex. pngfile.txt) with all the full path of all the files of that type. It can be useful sometimes, I think.

``````import os

def searchfiles(extension=".ttf", folder="H:\"):
"Create a txt file with all the file of a type"
with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
for r, d, f in os.walk(folder):
for file in f:
if file.endswith(extension):
filewrite.write(f"{r + file}
")

# looking for png file (fonts) in the hard disk H:
searchfiles(".png", "H:\")

>>> H:4bs_18Dolphins5.png
>>> H:4bs_18Dolphins6.png
>>> H:4bs_18Dolphins7.png
>>> H:5_18marketing htmlassetsimageslogo2.png
>>> H:7z001.png
>>> H:7z002.png
``````

## (New) Find all files and open them with tkinter GUI

I just wanted to add in this 2019 a little app to search for all files in a dir and be able to open them by doubleclicking on the name of the file in the list.

``````import tkinter as tk
import os

def searchfiles(extension=".txt", folder="H:\"):
"insert all files in the listbox"
for r, d, f in os.walk(folder):
for file in f:
if file.endswith(extension):
lb.insert(0, r + "\" + file)

def open_file():
os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles(".png", "H:\"))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()
``````

# Explanation

From PEP 328

Relative imports use a module"s __name__ attribute to determine that module"s position in the package hierarchy. If the module"s name does not contain any package information (e.g. it is set to "__main__") then relative imports are resolved as if the module were a top level module, regardless of where the module is actually located on the file system.

At some point PEP 338 conflicted with PEP 328:

... relative imports rely on __name__ to determine the current module"s position in the package hierarchy. In a main module, the value of __name__ is always "__main__", so explicit relative imports will always fail (as they only work for a module inside a package)

and to address the issue, PEP 366 introduced the top level variable `__package__`:

By adding a new module level attribute, this PEP allows relative imports to work automatically if the module is executed using the -m switch. A small amount of boilerplate in the module itself will allow the relative imports to work when the file is executed by name. [...] When it [the attribute] is present, relative imports will be based on this attribute rather than the module __name__ attribute. [...] When the main module is specified by its filename, then the __package__ attribute will be set to None. [...] When the import system encounters an explicit relative import in a module without __package__ set (or with it set to None), it will calculate and store the correct value (__name__.rpartition(".")[0] for normal modules and __name__ for package initialisation modules)

(emphasis mine)

If the `__name__` is `"__main__"`, `__name__.rpartition(".")[0]` returns empty string. This is why there"s empty string literal in the error description:

``````SystemError: Parent module "" not loaded, cannot perform relative import
``````

The relevant part of the CPython"s `PyImport_ImportModuleLevelObject` function:

``````if (PyDict_GetItem(interp->modules, package) == NULL) {
PyErr_Format(PyExc_SystemError,
"Parent module %R not loaded, cannot perform relative "
"import", package);
goto error;
}
``````

CPython raises this exception if it was unable to find `package` (the name of the package) in `interp->modules` (accessible as `sys.modules`). Since `sys.modules` is "a dictionary that maps module names to modules which have already been loaded", it"s now clear that the parent module must be explicitly absolute-imported before performing relative import.

Note: The patch from the issue 18018 has added another `if` block, which will be executed before the code above:

``````if (PyUnicode_CompareWithASCIIString(package, "") == 0) {
PyErr_SetString(PyExc_ImportError,
"attempted relative import with no known parent package");
goto error;
} /* else if (PyDict_GetItem(interp->modules, package) == NULL) {
...
*/
``````

If `package` (same as above) is empty string, the error message will be

``````ImportError: attempted relative import with no known parent package
``````

However, you will only see this in Python 3.6 or newer.

# Solution #1: Run your script using -m

Consider a directory (which is a Python package):

``````.
‚îú‚îÄ‚îÄ package
‚îÇ¬†¬† ‚îú‚îÄ‚îÄ __init__.py
‚îÇ¬†¬† ‚îú‚îÄ‚îÄ module.py
‚îÇ¬†¬† ‚îî‚îÄ‚îÄ standalone.py
``````

All of the files in package begin with the same 2 lines of code:

``````from pathlib import Path
print("Running" if __name__ == "__main__" else "Importing", Path(__file__).resolve())
``````

I"m including these two lines only to make the order of operations obvious. We can ignore them completely, since they don"t affect the execution.

__init__.py and module.py contain only those two lines (i.e., they are effectively empty).

standalone.py additionally attempts to import module.py via relative import:

``````from . import module  # explicit relative import
``````

We"re well aware that `/path/to/python/interpreter package/standalone.py` will fail. However, we can run the module with the `-m` command line option that will "search `sys.path` for the named module and execute its contents as the `__main__` module":

``````[email protected]:~\$ python3 -i -m package.standalone
Importing /home/vaultah/package/__init__.py
Running /home/vaultah/package/standalone.py
Importing /home/vaultah/package/module.py
>>> __file__
"/home/vaultah/package/standalone.py"
>>> __package__
"package"
>>> # The __package__ has been correctly set and module.py has been imported.
... # What"s inside sys.modules?
... import sys
>>> sys.modules["__main__"]
<module "package.standalone" from "/home/vaultah/package/standalone.py">
>>> sys.modules["package.module"]
<module "package.module" from "/home/vaultah/package/module.py">
>>> sys.modules["package"]
<module "package" from "/home/vaultah/package/__init__.py">
``````

`-m` does all the importing stuff for you and automatically sets `__package__`, but you can do that yourself in the

# Solution #2: Set __package__ manually

Please treat it as a proof of concept rather than an actual solution. It isn"t well-suited for use in real-world code.

PEP 366 has a workaround to this problem, however, it"s incomplete, because setting `__package__` alone is not enough. You"re going to need to import at least N preceding packages in the module hierarchy, where N is the number of parent directories (relative to the directory of the script) that will be searched for the module being imported.

Thus,

1. Add the parent directory of the Nth predecessor of the current module to `sys.path`

2. Remove the current file"s directory from `sys.path`

3. Import the parent module of the current module using its fully-qualified name

4. Set `__package__` to the fully-qualified name from 2

5. Perform the relative import

I"ll borrow files from the Solution #1 and add some more subpackages:

``````package
‚îú‚îÄ‚îÄ __init__.py
‚îú‚îÄ‚îÄ module.py
‚îî‚îÄ‚îÄ subpackage
‚îú‚îÄ‚îÄ __init__.py
‚îî‚îÄ‚îÄ subsubpackage
‚îú‚îÄ‚îÄ __init__.py
‚îî‚îÄ‚îÄ standalone.py
``````

This time standalone.py will import module.py from the package package using the following relative import

``````from ... import module  # N = 3
``````

We"ll need to precede that line with the boilerplate code, to make it work.

``````import sys
from pathlib import Path

if __name__ == "__main__" and __package__ is None:
file = Path(__file__).resolve()
parent, top = file.parent, file.parents[3]

sys.path.append(str(top))
try:
sys.path.remove(str(parent))
pass

import package.subpackage.subsubpackage
__package__ = "package.subpackage.subsubpackage"

from ... import module # N = 3
``````

It allows us to execute standalone.py by filename:

``````[email protected]:~\$ python3 package/subpackage/subsubpackage/standalone.py
Running /home/vaultah/package/subpackage/subsubpackage/standalone.py
Importing /home/vaultah/package/__init__.py
Importing /home/vaultah/package/subpackage/__init__.py
Importing /home/vaultah/package/subpackage/subsubpackage/__init__.py
Importing /home/vaultah/package/module.py
``````

A more general solution wrapped in a function can be found here. Example usage:

``````if __name__ == "__main__" and __package__ is None:
import_parents(level=3) # N = 3

from ... import module
from ...module.submodule import thing
``````

# Solution #3: Use absolute imports and setuptools

The steps are -

1. Replace explicit relative imports with equivalent absolute imports

2. Install `package` to make it importable

For instance, the directory structure may be as follows

``````.
‚îú‚îÄ‚îÄ project
‚îÇ¬†¬† ‚îú‚îÄ‚îÄ package
‚îÇ¬†¬† ‚îÇ¬†¬† ‚îú‚îÄ‚îÄ __init__.py
‚îÇ¬†¬† ‚îÇ¬†¬† ‚îú‚îÄ‚îÄ module.py
‚îÇ¬†¬† ‚îÇ¬†¬† ‚îî‚îÄ‚îÄ standalone.py
‚îÇ¬†¬† ‚îî‚îÄ‚îÄ setup.py
``````

where setup.py is

``````from setuptools import setup, find_packages
setup(
name = "your_package_name",
packages = find_packages(),
)
``````

The rest of the files were borrowed from the Solution #1.

Installation will allow you to import the package regardless of your working directory (assuming there"ll be no naming issues).

We can modify standalone.py to use this advantage (step 1):

``````from package import module  # absolute import
``````

Change your working directory to `project` and run `/path/to/python/interpreter setup.py install --user` (`--user` installs the package in your site-packages directory) (step 2):

``````[email protected]:~\$ cd project
[email protected]:~/project\$ python3 setup.py install --user
``````

Let"s verify that it"s now possible to run standalone.py as a script:

``````[email protected]:~/project\$ python3 -i package/standalone.py
Running /home/vaultah/project/package/standalone.py
Importing /home/vaultah/.local/lib/python3.6/site-packages/your_package_name-0.0.0-py3.6.egg/package/__init__.py
Importing /home/vaultah/.local/lib/python3.6/site-packages/your_package_name-0.0.0-py3.6.egg/package/module.py
>>> module
<module "package.module" from "/home/vaultah/.local/lib/python3.6/site-packages/your_package_name-0.0.0-py3.6.egg/package/module.py">
>>> import sys
>>> sys.modules["package"]
<module "package" from "/home/vaultah/.local/lib/python3.6/site-packages/your_package_name-0.0.0-py3.6.egg/package/__init__.py">
>>> sys.modules["package.module"]
<module "package.module" from "/home/vaultah/.local/lib/python3.6/site-packages/your_package_name-0.0.0-py3.6.egg/package/module.py">
``````

Note: If you decide to go down this route, you"d be better off using virtual environments to install packages in isolation.

# Solution #4: Use absolute imports and some boilerplate code

Frankly, the installation is not necessary - you could add some boilerplate code to your script to make absolute imports work.

I"m going to borrow files from Solution #1 and change standalone.py:

1. Add the parent directory of package to `sys.path` before attempting to import anything from package using absolute imports:

``````import sys
from pathlib import Path # if you haven"t already done so
file = Path(__file__).resolve()
parent, root = file.parent, file.parents[1]
sys.path.append(str(root))

# Additionally remove the current file"s directory from sys.path
try:
sys.path.remove(str(parent))
pass
``````
2. Replace the relative import by the absolute import:

``````from package import module  # absolute import
``````

standalone.py runs without problems:

``````[email protected]:~\$ python3 -i package/standalone.py
Running /home/vaultah/package/standalone.py
Importing /home/vaultah/package/__init__.py
Importing /home/vaultah/package/module.py
>>> module
<module "package.module" from "/home/vaultah/package/module.py">
>>> import sys
>>> sys.modules["package"]
<module "package" from "/home/vaultah/package/__init__.py">
>>> sys.modules["package.module"]
<module "package.module" from "/home/vaultah/package/module.py">
``````

I feel that I should warn you: try not to do this, especially if your project has a complex structure.

As a side note, PEP 8 recommends the use of absolute imports, but states that in some scenarios explicit relative imports are acceptable:

Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages). [...] However, explicit relative imports are an acceptable alternative to absolute imports, especially when dealing with complex package layouts where using absolute imports would be unnecessarily verbose.

Python 3.5 adds the `math.isclose` and `cmath.isclose` functions as described in PEP 485.

If you"re using an earlier version of Python, the equivalent function is given in the documentation.

``````def isclose(a, b, rel_tol=1e-09, abs_tol=0.0):
return abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
``````

`rel_tol` is a relative tolerance, it is multiplied by the greater of the magnitudes of the two arguments; as the values get larger, so does the allowed difference between them while still considering them equal.

`abs_tol` is an absolute tolerance that is applied as-is in all cases. If the difference is less than either of those tolerances, the values are considered equal.

I posted a similar answer also to the question regarding imports from sibling packages. You can see it here.

# Solution without `sys.path` hacks

### Summary

• Wrap the code into one folder (e.g. `packaged_stuff`)
• Use create `setup.py` script where you use setuptools.setup().
• Pip install the package in editable state with `pip install -e <myproject_folder>`
• Import using `from packaged_stuff.modulename import function_name`

## Setup

I assume the same folder structure as in the question

``````.
‚îî‚îÄ‚îÄ ptdraft
‚îú‚îÄ‚îÄ __init__.py
‚îú‚îÄ‚îÄ nib.py
‚îî‚îÄ‚îÄ simulations
‚îú‚îÄ‚îÄ __init__.py
‚îî‚îÄ‚îÄ life
‚îú‚îÄ‚îÄ __init__.py
‚îî‚îÄ‚îÄ life.py
``````

I call the `.` the root folder, and in my case it is located in `C: mp est_imports`.

## Steps

1. Add a `setup.py` to the root folder -- The contents of the `setup.py` can be simply

from setuptools import setup, find_packages

setup(name="myproject", version="1.0", packages=find_packages())

Basically "any" `setup.py` would work. This is just a minimal working example.

1. Use a virtual environment

If you are familiar with virtual environments, activate one, and skip to the next step. Usage of virtual environments are not absolutely required, but they will really help you out in the long run (when you have more than 1 project ongoing..). The most basic steps are (run in the root folder)

• Create virtual env
• `python -m venv venv`
• Activate virtual env
• `. venv/bin/activate` (Linux) or `./venv/Scripts/activate` (Win)
• Deactivate virtual env
• `deactivate` (Linux)

Once you have made and activated a virtual environment, your console should give the name of the virtual environment in parenthesis

``````PS C:	mp	est_imports> python -m venv venv
PS C:	mp	est_imports> .venvScriptsactivate
(venv) PS C:	mp	est_imports>
``````
1. pip install your project in editable state

Install your top level package `myproject` using `pip`. The trick is to use the `-e` flag when doing the install. This way it is installed in an editable state, and all the edits made to the .py files will be automatically included in the installed package.

In the root directory, run

`pip install -e .` (note the dot, it stands for "current directory")

You can also see that it is installed by using `pip freeze`

``````(venv) PS C:	mp	est_imports> pip install -e .
Obtaining file:///C:/tmp/test_imports
Installing collected packages: myproject
Running setup.py develop for myproject
Successfully installed myproject
(venv) PS C:	mp	est_imports> pip freeze
myproject==1.0
``````
1. Import by prepending `mainfolder` to every import

In this example, the `mainfolder` would be `ptdraft`. This has the advantage that you will not run into name collisions with other module names (from python standard library or 3rd party modules).

# Example Usage

## nib.py

``````def function_from_nib():
print("I am the return value from function_from_nib!")
``````

## life.py

``````from ptdraft.nib import function_from_nib

if __name__ == "__main__":
function_from_nib()
``````

## Running life.py

``````(venv) PS C:	mp	est_imports> python .ptdraftsimulationslifelife.py
I am the return value from function_from_nib!
``````

To somewhat expand on the earlier answers here, there are a number of details which are commonly overlooked.

• Prefer `subprocess.run()` over `subprocess.check_call()` and friends over `subprocess.call()` over `subprocess.Popen()` over `os.system()` over `os.popen()`
• Understand and probably use `text=True`, aka `universal_newlines=True`.
• Understand the meaning of `shell=True` or `shell=False` and how it changes quoting and the availability of shell conveniences.
• Understand differences between `sh` and Bash
• Understand how a subprocess is separate from its parent, and generally cannot change the parent.
• Avoid running the Python interpreter as a subprocess of Python.

These topics are covered in some more detail below.

# Prefer `subprocess.run()` or `subprocess.check_call()`

The `subprocess.Popen()` function is a low-level workhorse but it is tricky to use correctly and you end up copy/pasting multiple lines of code ... which conveniently already exist in the standard library as a set of higher-level wrapper functions for various purposes, which are presented in more detail in the following.

Here"s a paragraph from the documentation:

The recommended approach to invoking subprocesses is to use the `run()` function for all use cases it can handle. For more advanced use cases, the underlying `Popen` interface can be used directly.

Unfortunately, the availability of these wrapper functions differs between Python versions.

• `subprocess.run()` was officially introduced in Python 3.5. It is meant to replace all of the following.
• `subprocess.check_output()` was introduced in Python 2.7 / 3.1. It is basically equivalent to `subprocess.run(..., check=True, stdout=subprocess.PIPE).stdout`
• `subprocess.check_call()` was introduced in Python 2.5. It is basically equivalent to `subprocess.run(..., check=True)`
• `subprocess.call()` was introduced in Python 2.4 in the original `subprocess` module (PEP-324). It is basically equivalent to `subprocess.run(...).returncode`

### High-level API vs `subprocess.Popen()`

The refactored and extended `subprocess.run()` is more logical and more versatile than the older legacy functions it replaces. It returns a `CompletedProcess` object which has various methods which allow you to retrieve the exit status, the standard output, and a few other results and status indicators from the finished subprocess.

`subprocess.run()` is the way to go if you simply need a program to run and return control to Python. For more involved scenarios (background processes, perhaps with interactive I/O with the Python parent program) you still need to use `subprocess.Popen()` and take care of all the plumbing yourself. This requires a fairly intricate understanding of all the moving parts and should not be undertaken lightly. The simpler `Popen` object represents the (possibly still-running) process which needs to be managed from your code for the remainder of the lifetime of the subprocess.

It should perhaps be emphasized that just `subprocess.Popen()` merely creates a process. If you leave it at that, you have a subprocess running concurrently alongside with Python, so a "background" process. If it doesn"t need to do input or output or otherwise coordinate with you, it can do useful work in parallel with your Python program.

### Avoid `os.system()` and `os.popen()`

Since time eternal (well, since Python 2.5) the `os` module documentation has contained the recommendation to prefer `subprocess` over `os.system()`:

The `subprocess` module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.

The problems with `system()` are that it"s obviously system-dependent and doesn"t offer ways to interact with the subprocess. It simply runs, with standard output and standard error outside of Python"s reach. The only information Python receives back is the exit status of the command (zero means success, though the meaning of non-zero values is also somewhat system-dependent).

PEP-324 (which was already mentioned above) contains a more detailed rationale for why `os.system` is problematic and how `subprocess` attempts to solve those issues.

`os.popen()` used to be even more strongly discouraged:

Deprecated since version 2.6: This function is obsolete. Use the `subprocess` module.

However, since sometime in Python 3, it has been reimplemented to simply use `subprocess`, and redirects to the `subprocess.Popen()` documentation for details.

### Understand and usually use `check=True`

You"ll also notice that `subprocess.call()` has many of the same limitations as `os.system()`. In regular use, you should generally check whether the process finished successfully, which `subprocess.check_call()` and `subprocess.check_output()` do (where the latter also returns the standard output of the finished subprocess). Similarly, you should usually use `check=True` with `subprocess.run()` unless you specifically need to allow the subprocess to return an error status.

In practice, with `check=True` or `subprocess.check_*`, Python will throw a `CalledProcessError` exception if the subprocess returns a nonzero exit status.

A common error with `subprocess.run()` is to omit `check=True` and be surprised when downstream code fails if the subprocess failed.

On the other hand, a common problem with `check_call()` and `check_output()` was that users who blindly used these functions were surprised when the exception was raised e.g. when `grep` did not find a match. (You should probably replace `grep` with native Python code anyway, as outlined below.)

All things counted, you need to understand how shell commands return an exit code, and under what conditions they will return a non-zero (error) exit code, and make a conscious decision how exactly it should be handled.

# Understand and probably use `text=True` aka `universal_newlines=True`

Since Python 3, strings internal to Python are Unicode strings. But there is no guarantee that a subprocess generates Unicode output, or strings at all.

(If the differences are not immediately obvious, Ned Batchelder"s Pragmatic Unicode is recommended, if not outright obligatory, reading. There is a 36-minute video presentation behind the link if you prefer, though reading the page yourself will probably take significantly less time.)

Deep down, Python has to fetch a `bytes` buffer and interpret it somehow. If it contains a blob of binary data, it shouldn"t be decoded into a Unicode string, because that"s error-prone and bug-inducing behavior - precisely the sort of pesky behavior which riddled many Python 2 scripts, before there was a way to properly distinguish between encoded text and binary data.

With `text=True`, you tell Python that you, in fact, expect back textual data in the system"s default encoding, and that it should be decoded into a Python (Unicode) string to the best of Python"s ability (usually UTF-8 on any moderately up to date system, except perhaps Windows?)

If that"s not what you request back, Python will just give you `bytes` strings in the `stdout` and `stderr` strings. Maybe at some later point you do know that they were text strings after all, and you know their encoding. Then, you can decode them.

``````normal = subprocess.run([external, arg],
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
check=True,
text=True)
print(normal.stdout)

convoluted = subprocess.run([external, arg],
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
check=True)
# You have to know (or guess) the encoding
print(convoluted.stdout.decode("utf-8"))
``````

Python 3.7 introduced the shorter and more descriptive and understandable alias `text` for the keyword argument which was previously somewhat misleadingly called `universal_newlines`.

# Understand `shell=True` vs `shell=False`

With `shell=True` you pass a single string to your shell, and the shell takes it from there.

With `shell=False` you pass a list of arguments to the OS, bypassing the shell.

When you don"t have a shell, you save a process and get rid of a fairly substantial amount of hidden complexity, which may or may not harbor bugs or even security problems.

On the other hand, when you don"t have a shell, you don"t have redirection, wildcard expansion, job control, and a large number of other shell features.

A common mistake is to use `shell=True` and then still pass Python a list of tokens, or vice versa. This happens to work in some cases, but is really ill-defined and could break in interesting ways.

``````# XXX AVOID THIS BUG
buggy = subprocess.run("dig +short stackoverflow.com")

# XXX AVOID THIS BUG TOO
broken = subprocess.run(["dig", "+short", "stackoverflow.com"],
shell=True)

# XXX DEFINITELY AVOID THIS
pathological = subprocess.run(["dig +short stackoverflow.com"],
shell=True)

correct = subprocess.run(["dig", "+short", "stackoverflow.com"],
# Probably don"t forget these, too
check=True, text=True)

# XXX Probably better avoid shell=True
# but this is nominally correct
fixed_but_fugly = subprocess.run("dig +short stackoverflow.com",
shell=True,
# Probably don"t forget these, too
check=True, text=True)
``````

The common retort "but it works for me" is not a useful rebuttal unless you understand exactly under what circumstances it could stop working.

### Refactoring Example

Very often, the features of the shell can be replaced with native Python code. Simple Awk or `sed` scripts should probably simply be translated to Python instead.

To partially illustrate this, here is a typical but slightly silly example which involves many shell features.

``````cmd = """while read -r x;
do ping -c 3 "\$x" | grep "round-trip min/avg/max"
done <hosts.txt"""

# Trivial but horrible
results = subprocess.run(
cmd, shell=True, universal_newlines=True, check=True)
print(results.stdout)

# Reimplement with shell=False
with open("hosts.txt") as hosts:
for host in hosts:
host = host.rstrip("
")  # drop newline
ping = subprocess.run(
["ping", "-c", "3", host],
text=True,
stdout=subprocess.PIPE,
check=True)
for line in ping.stdout.split("
"):
if "round-trip min/avg/max" in line:
print("{}: {}".format(host, line))
``````

Some things to note here:

• With `shell=False` you don"t need the quoting that the shell requires around strings. Putting quotes anyway is probably an error.
• It often makes sense to run as little code as possible in a subprocess. This gives you more control over execution from within your Python code.
• Having said that, complex shell pipelines are tedious and sometimes challenging to reimplement in Python.

The refactored code also illustrates just how much the shell really does for you with a very terse syntax -- for better or for worse. Python says explicit is better than implicit but the Python code is rather verbose and arguably looks more complex than this really is. On the other hand, it offers a number of points where you can grab control in the middle of something else, as trivially exemplified by the enhancement that we can easily include the host name along with the shell command output. (This is by no means challenging to do in the shell, either, but at the expense of yet another diversion and perhaps another process.)

### Common Shell Constructs

For completeness, here are brief explanations of some of these shell features, and some notes on how they can perhaps be replaced with native Python facilities.

• Globbing aka wildcard expansion can be replaced with `glob.glob()` or very often with simple Python string comparisons like `for file in os.listdir("."): if not file.endswith(".png"): continue`. Bash has various other expansion facilities like `.{png,jpg}` brace expansion and `{1..100}` as well as tilde expansion (`~` expands to your home directory, and more generally `~account` to the home directory of another user)
• Shell variables like `\$SHELL` or `\$my_exported_var` can sometimes simply be replaced with Python variables. Exported shell variables are available as e.g. `os.environ["SHELL"]` (the meaning of `export` is to make the variable available to subprocesses -- a variable which is not available to subprocesses will obviously not be available to Python running as a subprocess of the shell, or vice versa. The `env=` keyword argument to `subprocess` methods allows you to define the environment of the subprocess as a dictionary, so that"s one way to make a Python variable visible to a subprocess). With `shell=False` you will need to understand how to remove any quotes; for example, `cd "\$HOME"` is equivalent to `os.chdir(os.environ["HOME"])` without quotes around the directory name. (Very often `cd` is not useful or necessary anyway, and many beginners omit the double quotes around the variable and get away with it until one day ...)
• Redirection allows you to read from a file as your standard input, and write your standard output to a file. `grep "foo" <inputfile >outputfile` opens `outputfile` for writing and `inputfile` for reading, and passes its contents as standard input to `grep`, whose standard output then lands in `outputfile`. This is not generally hard to replace with native Python code.
• Pipelines are a form of redirection. `echo foo | nl` runs two subprocesses, where the standard output of `echo` is the standard input of `nl` (on the OS level, in Unix-like systems, this is a single file handle). If you cannot replace one or both ends of the pipeline with native Python code, perhaps think about using a shell after all, especially if the pipeline has more than two or three processes (though look at the `pipes` module in the Python standard library or a number of more modern and versatile third-party competitors).
• Job control lets you interrupt jobs, run them in the background, return them to the foreground, etc. The basic Unix signals to stop and continue a process are of course available from Python, too. But jobs are a higher-level abstraction in the shell which involve process groups etc which you have to understand if you want to do something like this from Python.
• Quoting in the shell is potentially confusing until you understand that everything is basically a string. So `ls -l /` is equivalent to `"ls" "-l" "/"` but the quoting around literals is completely optional. Unquoted strings which contain shell metacharacters undergo parameter expansion, whitespace tokenization and wildcard expansion; double quotes prevent whitespace tokenization and wildcard expansion but allow parameter expansions (variable substitution, command substitution, and backslash processing). This is simple in theory but can get bewildering, especially when there are several layers of interpretation (a remote shell command, for example).

# Understand differences between `sh` and Bash

`subprocess` runs your shell commands with `/bin/sh` unless you specifically request otherwise (except of course on Windows, where it uses the value of the `COMSPEC` variable). This means that various Bash-only features like arrays, `[[` etc are not available.

If you need to use Bash-only syntax, you can pass in the path to the shell as `executable="/bin/bash"` (where of course if your Bash is installed somewhere else, you need to adjust the path).

``````subprocess.run("""
# This for loop syntax is Bash only
for((i=1;i<=\$#;i++)); do
# Arrays are Bash-only
array[i]+=123
done""",
shell=True, check=True,
executable="/bin/bash")
``````

# A `subprocess` is separate from its parent, and cannot change it

A somewhat common mistake is doing something like

``````subprocess.run("cd /tmp", shell=True)
subprocess.run("pwd", shell=True)  # Oops, doesn"t print /tmp
``````

The same thing will happen if the first subprocess tries to set an environment variable, which of course will have disappeared when you run another subprocess, etc.

A child process runs completely separate from Python, and when it finishes, Python has no idea what it did (apart from the vague indicators that it can infer from the exit status and output from the child process). A child generally cannot change the parent"s environment; it cannot set a variable, change the working directory, or, in so many words, communicate with its parent without cooperation from the parent.

The immediate fix in this particular case is to run both commands in a single subprocess;

``````subprocess.run("cd /tmp; pwd", shell=True)
``````

though obviously this particular use case isn"t very useful; instead, use the `cwd` keyword argument, or simply `os.chdir()` before running the subprocess. Similarly, for setting a variable, you can manipulate the environment of the current process (and thus also its children) via

``````os.environ["foo"] = "bar"
``````

or pass an environment setting to a child process with

``````subprocess.run("echo "\$foo"", shell=True, env={"foo": "bar"})
``````

(not to mention the obvious refactoring `subprocess.run(["echo", "bar"])`; but `echo` is a poor example of something to run in a subprocess in the first place, of course).

# Don"t run Python from Python

This is slightly dubious advice; there are certainly situations where it does make sense or is even an absolute requirement to run the Python interpreter as a subprocess from a Python script. But very frequently, the correct approach is simply to `import` the other Python module into your calling script and call its functions directly.

If the other Python script is under your control, and it isn"t a module, consider turning it into one. (This answer is too long already so I will not delve into details here.)

If you need parallelism, you can run Python functions in subprocesses with the `multiprocessing` module. There is also `threading` which runs multiple tasks in a single process (which is more lightweight and gives you more control, but also more constrained in that threads within a process are tightly coupled, and bound to a single GIL.)

I recommend anytree (I am the author).

Example:

``````from anytree import Node, RenderTree

udo = Node("Udo")
marc = Node("Marc", parent=udo)
lian = Node("Lian", parent=marc)
dan = Node("Dan", parent=udo)
jet = Node("Jet", parent=dan)
jan = Node("Jan", parent=dan)
joe = Node("Joe", parent=dan)

print(udo)
Node("/Udo")
print(joe)
Node("/Udo/Dan/Joe")

for pre, fill, node in RenderTree(udo):
print("%s%s" % (pre, node.name))
Udo
‚îú‚îÄ‚îÄ Marc
‚îÇ   ‚îî‚îÄ‚îÄ Lian
‚îî‚îÄ‚îÄ Dan
‚îú‚îÄ‚îÄ Jet
‚îú‚îÄ‚îÄ Jan
‚îî‚îÄ‚îÄ Joe

print(dan.children)
(Node("/Udo/Dan/Jet"), Node("/Udo/Dan/Jan"), Node("/Udo/Dan/Joe"))
``````

anytree has also a powerful API with:

• simple tree creation
• simple tree modification
• pre-order tree iteration
• post-order tree iteration
• resolve relative and absolute node paths
• walking from one node to an other.
• tree rendering (see example above)
• node attach/detach hookups

Anaconda has not updated python internally to 3.6.

a) Method 1

1. If you wanted to update you will type `conda update python`
2. To update anaconda type `conda update anaconda`
3. If you want to upgrade between major python version like 3.5 to 3.6, you"ll have to do

``````conda install python=\$pythonversion\$
``````

b) Method 2 - Create a new environment (Better Method)

``````conda create --name py36 python=3.6
``````

c) To get the absolute latest python(3.6.5 at time of writing)

``````conda create --name py365 python=3.6.5 --channel conda-forge
``````

You can see all this from here

Also, refer to this for force upgrading

EDIT: Anaconda now has a Python 3.6 version here

Consider the following example python package where `a.py` and `b.py` depend on each other:

``````/package
__init__.py
a.py
b.py
``````

# Types of circular import problems

Circular import dependencies typically fall into two categories depending on what you"re trying to import and where you"re using it inside each module. (And whether you"re using python 2 or 3).

# 1. Errors importing modules with circular imports

In some cases, just importing a module with a circular import dependency can result in errors even if you"re not referencing anything from the imported module.

There are several standard ways to import a module in python

``````import package.a           # (1) Absolute import
import package.a as a_mod  # (2) Absolute import bound to different name
from package import a      # (3) Alternate absolute import
import a                   # (4) Implicit relative import (deprecated, python 2 only)
from . import a            # (5) Explicit relative import
``````

Unfortunately, only the 1st and 4th options actually work when you have circular dependencies (the rest all raise `ImportError` or `AttributeError`). In general, you shouldn"t be using the 4th syntax, since it only works in python2 and runs the risk of clashing with other 3rd party modules. So really, only the first syntax is guaranteed to work.

EDIT: The `ImportError` and `AttributeError` issues only occur in python 2. In python 3 the import machinery has been rewritten and all of these import statements (with the exception of 4) will work, even with circular dependencies. While the solutions in this section may help refactoring python 3 code, they are mainly intended for people using python 2.

## Absolute Import

Just use the first import syntax above. The downside to this method is that the import names can get super long for large packages.

In `a.py`

``````import package.b
``````

In `b.py`

``````import package.a
``````

## Defer import until later

I"ve seen this method used in lots of packages, but it still feels hacky to me, and I dislike that I can"t look at the top of a module and see all its dependencies, I have to go searching through all the functions as well.

In `a.py`

``````def func():
from package import b
``````

In `b.py`

``````def func():
from package import a
``````

## Put all imports in a central module

This also works, but has the same problem as the first method, where all the package and submodule calls get super long. It also has two major flaws -- it forces all the submodules to be imported, even if you"re only using one or two, and you still can"t look at any of the submodules and quickly see their dependencies at the top, you have to go sifting through functions.

In `__init__.py`

``````from . import a
from . import b
``````

In `a.py`

``````import package

def func():
package.b.some_object()
``````

In `b.py`

``````import package

def func():
package.a.some_object()
``````

# 2. Errors using imported objects with circular dependencies

Now, while you may be able to import a module with a circular import dependency, you won"t be able to import any objects defined in the module or actually be able to reference that imported module anywhere in the top level of the module where you"re importing it. You can, however, use the imported module inside functions and code blocks that don"t get run on import.

For example, this will work:

package/a.py

``````import package.b

def func_a():
return "a"
``````

package/b.py

``````import package.a

def func_b():
# Notice how package.a is only referenced *inside* a function
# and not the top level of the module.
return package.a.func_a() + "b"
``````

But this won"t work

package/a.py

``````import package.b

class A(object):
pass
``````

package/b.py

``````import package.a

# package.a is referenced at the top level of the module
class B(package.a.A):
pass
``````

You"ll get an exception

AttributeError: module "package" has no attribute "a"

Generally, in most valid cases of circular dependencies, it"s possible to refactor or reorganize the code to prevent these errors and move module references inside a code block.

# Tired of sys.path hacks?

There are plenty of `sys.path.append` -hacks available, but I found an alternative way of solving the problem in hand.

### Summary

• Wrap the code into one folder (e.g. `packaged_stuff`)
• Create `setup.py` script where you use setuptools.setup(). (see minimal `setup.py` below)
• Pip install the package in editable state with `pip install -e <myproject_folder>`
• Import using `from packaged_stuff.modulename import function_name`

# Setup

The starting point is the file structure you have provided, wrapped in a folder called `myproject`.

``````.
‚îî‚îÄ‚îÄ myproject
‚îú‚îÄ‚îÄ api
‚îÇ   ‚îú‚îÄ‚îÄ api_key.py
‚îÇ   ‚îú‚îÄ‚îÄ api.py
‚îÇ   ‚îî‚îÄ‚îÄ __init__.py
‚îú‚îÄ‚îÄ examples
‚îÇ   ‚îú‚îÄ‚îÄ example_one.py
‚îÇ   ‚îú‚îÄ‚îÄ example_two.py
‚îÇ   ‚îî‚îÄ‚îÄ __init__.py
‚îú‚îÄ‚îÄ LICENCE.md
‚îî‚îÄ‚îÄ tests
‚îú‚îÄ‚îÄ __init__.py
‚îî‚îÄ‚îÄ test_one.py
``````

I will call the `.` the root folder, and in my example case it is located at `C: mp est_imports`.

## api.py

As a test case, let"s use the following ./api/api.py

``````def function_from_api():
return "I am the return value from api.api!"
``````

## test_one.py

``````from api.api import function_from_api

def test_function():
print(function_from_api())

if __name__ == "__main__":
test_function()
``````

## Try to run test_one:

``````PS C:	mp	est_imports> python .myproject	ests	est_one.py
Traceback (most recent call last):
File ".myproject	ests	est_one.py", line 1, in <module>
from api.api import function_from_api
ModuleNotFoundError: No module named "api"
``````

## Also trying relative imports wont work:

Using `from ..api.api import function_from_api` would result into

``````PS C:	mp	est_imports> python .myproject	ests	est_one.py
Traceback (most recent call last):
File ".	ests	est_one.py", line 1, in <module>
from ..api.api import function_from_api
ValueError: attempted relative import beyond top-level package
``````

# Steps

1. Make a setup.py file to the root level directory

The contents for the `setup.py` would be*

``````from setuptools import setup, find_packages

setup(name="myproject", version="1.0", packages=find_packages())
``````
1. Use a virtual environment

If you are familiar with virtual environments, activate one, and skip to the next step. Usage of virtual environments are not absolutely required, but they will really help you out in the long run (when you have more than 1 project ongoing..). The most basic steps are (run in the root folder)

• Create virtual env
• `python -m venv venv`
• Activate virtual env
• `source ./venv/bin/activate` (Linux, macOS) or `./venv/Scripts/activate` (Win)

Once you have made and activated a virtual environment, your console should give the name of the virtual environment in parenthesis

``````PS C:	mp	est_imports> python -m venv venv
PS C:	mp	est_imports> .venvScriptsactivate
(venv) PS C:	mp	est_imports>
``````

and your folder tree should look like this**

``````.
‚îú‚îÄ‚îÄ myproject
‚îÇ   ‚îú‚îÄ‚îÄ api
‚îÇ   ‚îÇ   ‚îú‚îÄ‚îÄ api_key.py
‚îÇ   ‚îÇ   ‚îú‚îÄ‚îÄ api.py
‚îÇ   ‚îÇ   ‚îî‚îÄ‚îÄ __init__.py
‚îÇ   ‚îú‚îÄ‚îÄ examples
‚îÇ   ‚îÇ   ‚îú‚îÄ‚îÄ example_one.py
‚îÇ   ‚îÇ   ‚îú‚îÄ‚îÄ example_two.py
‚îÇ   ‚îÇ   ‚îî‚îÄ‚îÄ __init__.py
‚îÇ   ‚îú‚îÄ‚îÄ LICENCE.md
‚îÇ   ‚îî‚îÄ‚îÄ tests
‚îÇ       ‚îú‚îÄ‚îÄ __init__.py
‚îÇ       ‚îî‚îÄ‚îÄ test_one.py
‚îú‚îÄ‚îÄ setup.py
‚îî‚îÄ‚îÄ venv
‚îú‚îÄ‚îÄ Include
‚îú‚îÄ‚îÄ Lib
‚îú‚îÄ‚îÄ pyvenv.cfg
‚îî‚îÄ‚îÄ Scripts [87 entries exceeds filelimit, not opening dir]
``````
1. pip install your project in editable state

Install your top level package `myproject` using `pip`. The trick is to use the `-e` flag when doing the install. This way it is installed in an editable state, and all the edits made to the .py files will be automatically included in the installed package.

In the root directory, run

`pip install -e .` (note the dot, it stands for "current directory")

You can also see that it is installed by using `pip freeze`

``````(venv) PS C:	mp	est_imports> pip install -e .
Obtaining file:///C:/tmp/test_imports
Installing collected packages: myproject
Running setup.py develop for myproject
Successfully installed myproject
(venv) PS C:	mp	est_imports> pip freeze
myproject==1.0
``````
1. Add `myproject.` into your imports

Note that you will have to add `myproject.` only into imports that would not work otherwise. Imports that worked without the `setup.py` & `pip install` will work still work fine. See an example below.

# Test the solution

Now, let"s test the solution using `api.py` defined above, and `test_one.py` defined below.

## test_one.py

``````from myproject.api.api import function_from_api

def test_function():
print(function_from_api())

if __name__ == "__main__":
test_function()
``````

## running the test

``````(venv) PS C:	mp	est_imports> python .myproject	ests	est_one.py
I am the return value from api.api!
``````

* See the setuptools docs for more verbose setup.py examples.

** In reality, you could put your virtual environment anywhere on your hard disk.

There are several ways to post an image in Jupyter notebooks:

## via HTML:

``````from IPython.display import Image
from IPython.core.display import HTML
Image(url= "http://my_site.com/my_picture.jpg")
``````

You retain the ability to use HTML tags to resize, etc...

``````Image(url= "http://my_site.com/my_picture.jpg", width=100, height=100)
``````

You can also display images stored locally, either via relative or absolute path.

``````PATH = "/Users/reblochonMasque/Documents/Drawings/"
Image(filename = PATH + "My_picture.jpg", width=100, height=100)
``````

if the image it wider than the display settings: thanks

use `unconfined=True` to disable max-width confinement of the image

``````from IPython.core.display import Image, display
display(Image(url="https://i.ytimg.com/vi/j22DmsZEv30/maxresdefault.jpg", width=1900, unconfined=True))
``````

## or via markdown:

• make sure the cell is a markdown cell, and not a code cell, thanks @Ê∏∏ÂáØË∂Ö in the comments)
• Please note that on some systems, the markdown does not allow white space in the filenames. Thanks to @CoffeeTableEspresso and @zebralamy in the comments)
(On macos, as long as you are on a markdown cell you would do like this: `![title](../image 1.png)`, and not worry about the white space).

for a web image:

``````![Image of Yaktocat](https://octodex.github.com/images/yaktocat.png)
``````

as shown by @cristianmtr Paying attention not to use either these quotes `""` or those `""` around the url.

or a local one:

``````![title](img/picture.png)
``````

demonstrated by @Sebastian

## Finding the index of an item in a list

Given a list `["foo", "bar", "baz"]` and an item in the list `"bar"`, how do I get its index (`1`) in Python?

## Find current directory and file"s directory

In Python, what commands can I use to find:

1. the current directory (where I was in the terminal when I ran the Python script), and
2. where the file I am executing is?

## How to find if directory exists in Python

In the `os` module in Python, is there a way to find if a directory exists, something like:

``````>>> os.direxists(os.path.join(os.getcwd()), "new_folder")) # in pseudocode
True/False
``````

## How do I find the location of my Python site-packages directory?

### Question by Daryl Spitzer

How do I find the location of my site-packages directory?

## Find all files in a directory with extension .txt in Python

How can I find all the files in a directory having the extension `.txt` in python?

## Find which version of package is installed with pip

Using pip, is it possible to figure out which version of a package is currently installed?

I know about `pip install XYZ --upgrade` but I am wondering if there is anything like `pip info XYZ`. If not what would be the best way to tell what version I am currently using.

## error: Unable to find vcvarsall.bat

I tried to install the Python package dulwich:

``````pip install dulwich
``````

But I get a cryptic error message:

``````error: Unable to find vcvarsall.bat
``````

The same happens if I try installing the package manually:

``````> python setup.py install
running build_ext
building "dulwich._objects" extension
error: Unable to find vcvarsall.bat
``````

## How to use glob() to find files recursively?

This is what I have:

``````glob(os.path.join("src","*.c"))
``````

but I want to search the subfolders of src. Something like this would work:

``````glob(os.path.join("src","*.c"))
glob(os.path.join("src","*","*.c"))
glob(os.path.join("src","*","*","*.c"))
glob(os.path.join("src","*","*","*","*.c"))
``````

But this is obviously limited and clunky.

## Python: Find in list

I have come across this:

``````item = someSortOfSelection()
if item in myList:
doMySpecialFunction(item)
``````

but sometimes it does not work with all my items, as if they weren"t recognized in the list (when it"s a list of string).

Is this the most "pythonic" way of finding an item in a list: `if x in l:`?

## How to find out the number of CPUs using python

I want to know the number of CPUs on the local machine using Python. The result should be `user/real` as output by `time(1)` when called with an optimally scaling userspace-only program.

## How to iterate over rows in a DataFrame in Pandas?

Iteration in Pandas is an anti-pattern and is something you should only do when you have exhausted every other option. You should not use any function with "`iter`" in its name for more than a few thousand rows or you will have to get used to a lot of waiting.

Do you want to print a DataFrame? Use `DataFrame.to_string()`.

Do you want to compute something? In that case, search for methods in this order (list modified from here):

1. Vectorization
2. Cython routines
3. List Comprehensions (vanilla `for` loop)
4. `DataFrame.apply()`: i) ¬†Reductions that can be performed in Cython, ii) Iteration in Python space
5. `DataFrame.itertuples()` and `iteritems()`
6. `DataFrame.iterrows()`

`iterrows` and `itertuples` (both receiving many votes in answers to this question) should be used in very rare circumstances, such as generating row objects/nametuples for sequential processing, which is really the only thing these functions are useful for.

Appeal to Authority

The documentation page on iteration has a huge red warning box that says:

Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed [...].

* It"s actually a little more complicated than "don"t". `df.iterrows()` is the correct answer to this question, but "vectorize your ops" is the better one. I will concede that there are circumstances where iteration cannot be avoided (for example, some operations where the result depends on the value computed for the previous row). However, it takes some familiarity with the library to know when. If you"re not sure whether you need an iterative solution, you probably don"t. PS: To know more about my rationale for writing this answer, skip to the very bottom.

## Faster than Looping: Vectorization, Cython

A good number of basic operations and computations are "vectorised" by pandas (either through NumPy, or through Cythonized functions). This includes arithmetic, comparisons, (most) reductions, reshaping (such as pivoting), joins, and groupby operations. Look through the documentation on Essential Basic Functionality to find a suitable vectorised method for your problem.

If none exists, feel free to write your own using custom Cython extensions.

## Next Best Thing: List Comprehensions*

List comprehensions should be your next port of call if 1) there is no vectorized solution available, 2) performance is important, but not important enough to go through the hassle of cythonizing your code, and 3) you"re trying to perform elementwise transformation on your code. There is a good amount of evidence to suggest that list comprehensions are sufficiently fast (and even sometimes faster) for many common Pandas tasks.

The formula is simple,

``````# Iterating over one column - `f` is some function that processes your data
result = [f(x) for x in df["col"]]
# Iterating over two columns, use `zip`
result = [f(x, y) for x, y in zip(df["col1"], df["col2"])]
# Iterating over multiple columns - same data type
result = [f(row[0], ..., row[n]) for row in df[["col1", ...,"coln"]].to_numpy()]
# Iterating over multiple columns - differing data type
result = [f(row[0], ..., row[n]) for row in zip(df["col1"], ..., df["coln"])]
``````

If you can encapsulate your business logic into a function, you can use a list comprehension that calls it. You can make arbitrarily complex things work through the simplicity and speed of raw Python code.

Caveats

List comprehensions assume that your data is easy to work with - what that means is your data types are consistent and you don"t have NaNs, but this cannot always be guaranteed.

1. The first one is more obvious, but when dealing with NaNs, prefer in-built pandas methods if they exist (because they have much better corner-case handling logic), or ensure your business logic includes appropriate NaN handling logic.
2. When dealing with mixed data types you should iterate over `zip(df["A"], df["B"], ...)` instead of `df[["A", "B"]].to_numpy()` as the latter implicitly upcasts data to the most common type. As an example if A is numeric and B is string, `to_numpy()` will cast the entire array to string, which may not be what you want. Fortunately `zip`ping your columns together is the most straightforward workaround to this.

*Your mileage may vary for the reasons outlined in the Caveats section above.

## An Obvious Example

Let"s demonstrate the difference with a simple example of adding two pandas columns `A + B`. This is a vectorizable operaton, so it will be easy to contrast the performance of the methods discussed above.

Benchmarking code, for your reference. The line at the bottom measures a function written in numpandas, a style of Pandas that mixes heavily with NumPy to squeeze out maximum performance. Writing numpandas code should be avoided unless you know what you"re doing. Stick to the API where you can (i.e., prefer `vec` over `vec_numpy`).

I should mention, however, that it isn"t always this cut and dry. Sometimes the answer to "what is the best method for an operation" is "it depends on your data". My advice is to test out different approaches on your data before settling on one.

* Pandas string methods are "vectorized" in the sense that they are specified on the series but operate on each element. The underlying mechanisms are still iterative, because string operations are inherently hard to vectorize.

## Why I Wrote this Answer

A common trend I notice from new users is to ask questions of the form "How can I iterate over my df to do X?". Showing code that calls `iterrows()` while doing something inside a `for` loop. Here is why. A new user to the library who has not been introduced to the concept of vectorization will likely envision the code that solves their problem as iterating over their data to do something. Not knowing how to iterate over a DataFrame, the first thing they do is Google it and end up here, at this question. They then see the accepted answer telling them how to, and they close their eyes and run this code without ever first questioning if iteration is not the right thing to do.

The aim of this answer is to help new users understand that iteration is not necessarily the solution to every problem, and that better, faster and more idiomatic solutions could exist, and that it is worth investing time in exploring them. I"m not trying to start a war of iteration vs. vectorization, but I want new users to be informed when developing solutions to their problems with this library.

# In Python, what is the purpose of `__slots__` and what are the cases one should avoid this?

## TLDR:

The special attribute `__slots__` allows you to explicitly state which instance attributes you expect your object instances to have, with the expected results:

1. faster attribute access.
2. space savings in memory.

The space savings is from

1. Storing value references in slots instead of `__dict__`.
2. Denying `__dict__` and `__weakref__` creation if parent classes deny them and you declare `__slots__`.

### Quick Caveats

Small caveat, you should only declare a particular slot one time in an inheritance tree. For example:

``````class Base:
__slots__ = "foo", "bar"

class Right(Base):
__slots__ = "baz",

class Wrong(Base):
__slots__ = "foo", "bar", "baz"        # redundant foo and bar
``````

Python doesn"t object when you get this wrong (it probably should), problems might not otherwise manifest, but your objects will take up more space than they otherwise should. Python 3.8:

``````>>> from sys import getsizeof
>>> getsizeof(Right()), getsizeof(Wrong())
(56, 72)
``````

This is because the Base"s slot descriptor has a slot separate from the Wrong"s. This shouldn"t usually come up, but it could:

``````>>> w = Wrong()
>>> w.foo = "foo"
>>> Base.foo.__get__(w)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: foo
>>> Wrong.foo.__get__(w)
"foo"
``````

The biggest caveat is for multiple inheritance - multiple "parent classes with nonempty slots" cannot be combined.

To accommodate this restriction, follow best practices: Factor out all but one or all parents" abstraction which their concrete class respectively and your new concrete class collectively will inherit from - giving the abstraction(s) empty slots (just like abstract base classes in the standard library).

See section on multiple inheritance below for an example.

### Requirements:

• To have attributes named in `__slots__` to actually be stored in slots instead of a `__dict__`, a class must inherit from `object` (automatic in Python 3, but must be explicit in Python 2).

• To prevent the creation of a `__dict__`, you must inherit from `object` and all classes in the inheritance must declare `__slots__` and none of them can have a `"__dict__"` entry.

There are a lot of details if you wish to keep reading.

## Why use `__slots__`: Faster attribute access.

The creator of Python, Guido van Rossum, states that he actually created `__slots__` for faster attribute access.

It is trivial to demonstrate measurably significant faster access:

``````import timeit

class Foo(object): __slots__ = "foo",

class Bar(object): pass

slotted = Foo()
not_slotted = Bar()

def get_set_delete_fn(obj):
def get_set_delete():
obj.foo = "foo"
obj.foo
del obj.foo
return get_set_delete
``````

and

``````>>> min(timeit.repeat(get_set_delete_fn(slotted)))
0.2846834529991611
>>> min(timeit.repeat(get_set_delete_fn(not_slotted)))
0.3664822799983085
``````

The slotted access is almost 30% faster in Python 3.5 on Ubuntu.

``````>>> 0.3664822799983085 / 0.2846834529991611
1.2873325658284342
``````

In Python 2 on Windows I have measured it about 15% faster.

## Why use `__slots__`: Memory Savings

Another purpose of `__slots__` is to reduce the space in memory that each object instance takes up.

The space saved over using `__dict__` can be significant.

SQLAlchemy attributes a lot of memory savings to `__slots__`.

To verify this, using the Anaconda distribution of Python 2.7 on Ubuntu Linux, with `guppy.hpy` (aka heapy) and `sys.getsizeof`, the size of a class instance without `__slots__` declared, and nothing else, is 64 bytes. That does not include the `__dict__`. Thank you Python for lazy evaluation again, the `__dict__` is apparently not called into existence until it is referenced, but classes without data are usually useless. When called into existence, the `__dict__` attribute is a minimum of 280 bytes additionally.

In contrast, a class instance with `__slots__` declared to be `()` (no data) is only 16 bytes, and 56 total bytes with one item in slots, 64 with two.

For 64 bit Python, I illustrate the memory consumption in bytes in Python 2.7 and 3.6, for `__slots__` and `__dict__` (no slots defined) for each point where the dict grows in 3.6 (except for 0, 1, and 2 attributes):

``````       Python 2.7             Python 3.6
attrs  __slots__  __dict__*   __slots__  __dict__* | *(no slots defined)
none   16         56 + 272‚Ä†   16         56 + 112‚Ä† | ‚Ä†if __dict__ referenced
one    48         56 + 272    48         56 + 112
two    56         56 + 272    56         56 + 112
six    88         56 + 1040   88         56 + 152
11     128        56 + 1040   128        56 + 240
22     216        56 + 3344   216        56 + 408
43     384        56 + 3344   384        56 + 752
``````

So, in spite of smaller dicts in Python 3, we see how nicely `__slots__` scale for instances to save us memory, and that is a major reason you would want to use `__slots__`.

Just for completeness of my notes, note that there is a one-time cost per slot in the class"s namespace of 64 bytes in Python 2, and 72 bytes in Python 3, because slots use data descriptors like properties, called "members".

``````>>> Foo.foo
<member "foo" of "Foo" objects>
>>> type(Foo.foo)
<class "member_descriptor">
>>> getsizeof(Foo.foo)
72
``````

## Demonstration of `__slots__`:

To deny the creation of a `__dict__`, you must subclass `object`. Everything subclasses `object` in Python 3, but in Python 2 you had to be explicit:

``````class Base(object):
__slots__ = ()
``````

now:

``````>>> b = Base()
>>> b.a = "a"
Traceback (most recent call last):
File "<pyshell#38>", line 1, in <module>
b.a = "a"
AttributeError: "Base" object has no attribute "a"
``````

Or subclass another class that defines `__slots__`

``````class Child(Base):
__slots__ = ("a",)
``````

and now:

``````c = Child()
c.a = "a"
``````

but:

``````>>> c.b = "b"
Traceback (most recent call last):
File "<pyshell#42>", line 1, in <module>
c.b = "b"
AttributeError: "Child" object has no attribute "b"
``````

To allow `__dict__` creation while subclassing slotted objects, just add `"__dict__"` to the `__slots__` (note that slots are ordered, and you shouldn"t repeat slots that are already in parent classes):

``````class SlottedWithDict(Child):
__slots__ = ("__dict__", "b")

swd = SlottedWithDict()
swd.a = "a"
swd.b = "b"
swd.c = "c"
``````

and

``````>>> swd.__dict__
{"c": "c"}
``````

Or you don"t even need to declare `__slots__` in your subclass, and you will still use slots from the parents, but not restrict the creation of a `__dict__`:

``````class NoSlots(Child): pass
ns = NoSlots()
ns.a = "a"
ns.b = "b"
``````

And:

``````>>> ns.__dict__
{"b": "b"}
``````

However, `__slots__` may cause problems for multiple inheritance:

``````class BaseA(object):
__slots__ = ("a",)

class BaseB(object):
__slots__ = ("b",)
``````

Because creating a child class from parents with both non-empty slots fails:

``````>>> class Child(BaseA, BaseB): __slots__ = ()
Traceback (most recent call last):
File "<pyshell#68>", line 1, in <module>
class Child(BaseA, BaseB): __slots__ = ()
TypeError: Error when calling the metaclass bases
multiple bases have instance lay-out conflict
``````

If you run into this problem, You could just remove `__slots__` from the parents, or if you have control of the parents, give them empty slots, or refactor to abstractions:

``````from abc import ABC

class AbstractA(ABC):
__slots__ = ()

class BaseA(AbstractA):
__slots__ = ("a",)

class AbstractB(ABC):
__slots__ = ()

class BaseB(AbstractB):
__slots__ = ("b",)

class Child(AbstractA, AbstractB):
__slots__ = ("a", "b")

c = Child() # no problem!
``````

### Add `"__dict__"` to `__slots__` to get dynamic assignment:

``````class Foo(object):
__slots__ = "bar", "baz", "__dict__"
``````

and now:

``````>>> foo = Foo()
>>> foo.boink = "boink"
``````

So with `"__dict__"` in slots we lose some of the size benefits with the upside of having dynamic assignment and still having slots for the names we do expect.

When you inherit from an object that isn"t slotted, you get the same sort of semantics when you use `__slots__` - names that are in `__slots__` point to slotted values, while any other values are put in the instance"s `__dict__`.

Avoiding `__slots__` because you want to be able to add attributes on the fly is actually not a good reason - just add `"__dict__"` to your `__slots__` if this is required.

You can similarly add `__weakref__` to `__slots__` explicitly if you need that feature.

### Set to empty tuple when subclassing a namedtuple:

The namedtuple builtin make immutable instances that are very lightweight (essentially, the size of tuples) but to get the benefits, you need to do it yourself if you subclass them:

``````from collections import namedtuple
class MyNT(namedtuple("MyNT", "bar baz")):
"""MyNT is an immutable and lightweight object"""
__slots__ = ()
``````

usage:

``````>>> nt = MyNT("bar", "baz")
>>> nt.bar
"bar"
>>> nt.baz
"baz"
``````

And trying to assign an unexpected attribute raises an `AttributeError` because we have prevented the creation of `__dict__`:

``````>>> nt.quux = "quux"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: "MyNT" object has no attribute "quux"
``````

You can allow `__dict__` creation by leaving off `__slots__ = ()`, but you can"t use non-empty `__slots__` with subtypes of tuple.

## Biggest Caveat: Multiple inheritance

Even when non-empty slots are the same for multiple parents, they cannot be used together:

``````class Foo(object):
__slots__ = "foo", "bar"
class Bar(object):
__slots__ = "foo", "bar" # alas, would work if empty, i.e. ()

>>> class Baz(Foo, Bar): pass
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
multiple bases have instance lay-out conflict
``````

Using an empty `__slots__` in the parent seems to provide the most flexibility, allowing the child to choose to prevent or allow (by adding `"__dict__"` to get dynamic assignment, see section above) the creation of a `__dict__`:

``````class Foo(object): __slots__ = ()
class Bar(object): __slots__ = ()
class Baz(Foo, Bar): __slots__ = ("foo", "bar")
b = Baz()
b.foo, b.bar = "foo", "bar"
``````

You don"t have to have slots - so if you add them, and remove them later, it shouldn"t cause any problems.

Going out on a limb here: If you"re composing mixins or using abstract base classes, which aren"t intended to be instantiated, an empty `__slots__` in those parents seems to be the best way to go in terms of flexibility for subclassers.

To demonstrate, first, let"s create a class with code we"d like to use under multiple inheritance

``````class AbstractBase:
__slots__ = ()
def __init__(self, a, b):
self.a = a
self.b = b
def __repr__(self):
return f"{type(self).__name__}({repr(self.a)}, {repr(self.b)})"
``````

We could use the above directly by inheriting and declaring the expected slots:

``````class Foo(AbstractBase):
__slots__ = "a", "b"
``````

But we don"t care about that, that"s trivial single inheritance, we need another class we might also inherit from, maybe with a noisy attribute:

``````class AbstractBaseC:
__slots__ = ()
@property
def c(self):
print("getting c!")
return self._c
@c.setter
def c(self, arg):
print("setting c!")
self._c = arg
``````

Now if both bases had nonempty slots, we couldn"t do the below. (In fact, if we wanted, we could have given `AbstractBase` nonempty slots a and b, and left them out of the below declaration - leaving them in would be wrong):

``````class Concretion(AbstractBase, AbstractBaseC):
__slots__ = "a b _c".split()
``````

And now we have functionality from both via multiple inheritance, and can still deny `__dict__` and `__weakref__` instantiation:

``````>>> c = Concretion("a", "b")
>>> c.c = c
setting c!
>>> c.c
getting c!
Concretion("a", "b")
>>> c.d = "d"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: "Concretion" object has no attribute "d"
``````

## Other cases to avoid slots:

• Avoid them when you want to perform `__class__` assignment with another class that doesn"t have them (and you can"t add them) unless the slot layouts are identical. (I am very interested in learning who is doing this and why.)
• Avoid them if you want to subclass variable length builtins like long, tuple, or str, and you want to add attributes to them.
• Avoid them if you insist on providing default values via class attributes for instance variables.

You may be able to tease out further caveats from the rest of the `__slots__` documentation (the 3.7 dev docs are the most current), which I have made significant recent contributions to.

The current top answers cite outdated information and are quite hand-wavy and miss the mark in some important ways.

### Do not "only use `__slots__` when instantiating lots of objects"

I quote:

"You would want to use `__slots__` if you are going to instantiate a lot (hundreds, thousands) of objects of the same class."

Abstract Base Classes, for example, from the `collections` module, are not instantiated, yet `__slots__` are declared for them.

Why?

If a user wishes to deny `__dict__` or `__weakref__` creation, those things must not be available in the parent classes.

`__slots__` contributes to reusability when creating interfaces or mixins.

It is true that many Python users aren"t writing for reusability, but when you are, having the option to deny unnecessary space usage is valuable.

### `__slots__` doesn"t break pickling

When pickling a slotted object, you may find it complains with a misleading `TypeError`:

``````>>> pickle.loads(pickle.dumps(f))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
``````

This is actually incorrect. This message comes from the oldest protocol, which is the default. You can select the latest protocol with the `-1` argument. In Python 2.7 this would be `2` (which was introduced in 2.3), and in 3.6 it is `4`.

``````>>> pickle.loads(pickle.dumps(f, -1))
<__main__.Foo object at 0x1129C770>
``````

in Python 2.7:

``````>>> pickle.loads(pickle.dumps(f, 2))
<__main__.Foo object at 0x1129C770>
``````

in Python 3.6

``````>>> pickle.loads(pickle.dumps(f, 4))
<__main__.Foo object at 0x1129C770>
``````

So I would keep this in mind, as it is a solved problem.

## Critique of the (until Oct 2, 2016) accepted answer

The first paragraph is half short explanation, half predictive. Here"s the only part that actually answers the question

The proper use of `__slots__` is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation. This saves the overhead of one dict for every object that uses slots

The second half is wishful thinking, and off the mark:

While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.

Python actually does something similar to this, only creating the `__dict__` when it is accessed, but creating lots of objects with no data is fairly ridiculous.

The second paragraph oversimplifies and misses actual reasons to avoid `__slots__`. The below is not a real reason to avoid slots (for actual reasons, see the rest of my answer above.):

They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies.

It then goes on to discuss other ways of accomplishing that perverse goal with Python, not discussing anything to do with `__slots__`.

The third paragraph is more wishful thinking. Together it is mostly off-the-mark content that the answerer didn"t even author and contributes to ammunition for critics of the site.

# Memory usage evidence

Create some normal objects and slotted objects:

``````>>> class Foo(object): pass
>>> class Bar(object): __slots__ = ()
``````

Instantiate a million of them:

``````>>> foos = [Foo() for f in xrange(1000000)]
>>> bars = [Bar() for b in xrange(1000000)]
``````

Inspect with `guppy.hpy().heap()`:

``````>>> guppy.hpy().heap()
Partition of a set of 2028259 objects. Total size = 99763360 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0 1000000  49 64000000  64  64000000  64 __main__.Foo
1     169   0 16281480  16  80281480  80 list
2 1000000  49 16000000  16  96281480  97 __main__.Bar
3   12284   1   987472   1  97268952  97 str
...
``````

Access the regular objects and their `__dict__` and inspect again:

``````>>> for f in foos:
...     f.__dict__
>>> guppy.hpy().heap()
Partition of a set of 3028258 objects. Total size = 379763480 bytes.
Index  Count   %      Size    % Cumulative  % Kind (class / dict of class)
0 1000000  33 280000000  74 280000000  74 dict of __main__.Foo
1 1000000  33  64000000  17 344000000  91 __main__.Foo
2     169   0  16281480   4 360281480  95 list
3 1000000  33  16000000   4 376281480  99 __main__.Bar
4   12284   0    987472   0 377268952  99 str
...
``````

This is consistent with the history of Python, from Unifying types and classes in Python 2.2

If you subclass a built-in type, extra space is automatically added to the instances to accomodate `__dict__` and `__weakrefs__`. (The `__dict__` is not initialized until you use it though, so you shouldn"t worry about the space occupied by an empty dictionary for each instance you create.) If you don"t need this extra space, you can add the phrase "`__slots__ = []`" to your class.

# `os.listdir()` - list in the current directory

With listdir in os module you get the files and the folders in the current dir

`````` import os
arr = os.listdir()
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

## Looking in a directory

``````arr = os.listdir("c:\files")
``````

# `glob` from glob

with glob you can specify a type of file to list like this

``````import glob

txtfiles = []
for file in glob.glob("*.txt"):
txtfiles.append(file)
``````

## `glob` in a list comprehension

``````mylist = [f for f in glob.glob("*.txt")]
``````

## get the full path of only files in the current directory

``````import os
from os import listdir
from os.path import isfile, join

cwd = os.getcwd()
onlyfiles = [os.path.join(cwd, f) for f in os.listdir(cwd) if
os.path.isfile(os.path.join(cwd, f))]
print(onlyfiles)

["G:\getfilesname\getfilesname.py", "G:\getfilesname\example.txt"]
``````

## Getting the full path name with `os.path.abspath`

You get the full path in return

`````` import os
files_path = [os.path.abspath(x) for x in os.listdir()]
print(files_path)

["F:\documentiapplications.txt", "F:\documenticollections.txt"]
``````

## Walk: going through sub directories

os.walk returns the root, the directories list and the files list, that is why I unpacked them in r, d, f in the for loop; it, then, looks for other files and directories in the subfolders of the root and so on until there are no subfolders.

``````import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
for file in f:
if file.endswith(".docx"):
print(os.path.join(r, file))
``````

### `os.listdir()`: get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to give the argument as "." or os.getcwd() in the os.listdir method.

`````` import os
arr = os.listdir(".")
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

### To go up in the directory tree

``````# Method 1
x = os.listdir("..")

# Method 2
x= os.listdir("/")
``````

### Get files: `os.listdir()` in a particular directory (Python 2 and 3)

`````` import os
arr = os.listdir("F:\python")
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

### Get files of a particular subdirectory with `os.listdir()`

``````import os

x = os.listdir("./content")
``````

### `os.walk(".")` - current directory

`````` import os
arr = next(os.walk("."))[2]
print(arr)

>>> ["5bs_Turismo1.pdf", "5bs_Turismo1.pptx", "esperienza.txt"]
``````

### `next(os.walk("."))` and `os.path.join("dir", "file")`

`````` import os
arr = []
for d,r,f in next(os.walk("F:\_python")):
for file in f:
arr.append(os.path.join(r,file))

for f in arr:
print(files)

>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt
``````

### `next(os.walk("F:\")` - get the full path - list comprehension

`````` [os.path.join(r,file) for r,d,f in next(os.walk("F:\_python")) for file in f]

>>> ["F:\_python\dict_class.py", "F:\_python\programmi.txt"]
``````

### `os.walk` - get full path - all files in sub dirs**

``````x = [os.path.join(r,file) for r,d,f in os.walk("F:\_python") for file in f]
print(x)

``````

### `os.listdir()` - get only txt files

`````` arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
print(arr_txt)

>>> ["work.txt", "3ebooks.txt"]
``````

## Using `glob` to get the full path of the files

If I should need the absolute path of the files:

``````from path import path
from glob import glob
x = [path(f).abspath() for f in glob("F:\*.txt")]
for f in x:
print(f)

>>> F:acquistionline.txt
>>> F:acquisti_2018.txt
>>> F:ootstrap_jquery_ecc.txt
``````

## Using `os.path.isfile` to avoid directories in the list

``````import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ["a simple game.py", "data.txt", "decorator.py"]
``````

## Using `pathlib` from Python 3.4

``````import pathlib

flist = []
for p in pathlib.Path(".").iterdir():
if p.is_file():
print(p)
flist.append(p)

>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speak_gui2.py
>>> thumb.PNG
``````

With `list comprehension`:

``````flist = [p for p in pathlib.Path(".").iterdir() if p.is_file()]
``````

Alternatively, use `pathlib.Path()` instead of `pathlib.Path(".")`

## Use glob method in pathlib.Path()

``````import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py
``````

## Get all and only files with os.walk

``````import os
x = [i[2] for i in os.walk(".")]
y=[]
for t in x:
for f in t:
y.append(f)
print(y)

>>> ["append_to_list.py", "data.txt", "data1.txt", "data2.txt", "data_180617", "os_walk.py", "READ2.py", "read_data.py", "somma_defaltdic.py", "substitute_words.py", "sum_data.py", "data.txt", "data1.txt", "data_180617"]
``````

## Get only files with next and walk in a directory

`````` import os
x = next(os.walk("F://python"))[2]
print(x)

>>> ["calculator.bat","calculator.py"]
``````

## Get only directories with next and walk in a directory

`````` import os
next(os.walk("F://python"))[1] # for the current dir use (".")

>>> ["python3","others"]
``````

## Get all the subdir names with `walk`

``````for r,d,f in os.walk("F:\_python"):
for dirs in d:
print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints
``````

## `os.scandir()` from Python 3.5 and greater

``````import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ["calculator.bat","calculator.py"]

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
for entry in i:
if entry.is_file():
print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG
``````

# Examples:

## Ex. 1: How many files are there in the subdirectories?

In this example, we look for the number of files that are included in all the directory and its subdirectories.

``````import os

def count(dir, counter=0):
"returns number of files in dir and subdirs"
for pack in os.walk(dir):
for f in pack[2]:
counter += 1
return dir + " : " + str(counter) + "files"

print(count("F:\python"))

>>> "F:\python" : 12057 files"
``````

## Ex.2: How to copy all files from a directory to another?

A script to make order in your computer finding all files of a type (default: pptx) and copying them in a new folder.

``````import os
import shutil
from path import path

destination = "F:\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype="pptx", counter=0):
"Searches for pptx (or other - pptx is the default) files and copies them"
for pack in os.walk(dir):
for f in pack[2]:
if f.endswith(filetype):
fullpath = pack[0] + "\" + f
print(fullpath)
shutil.copy(fullpath, destination)
counter += 1
if counter > 0:
print("-" * 30)
print("	==> Found in: `" + dir + "` : " + str(counter) + " files
")

for dir in os.listdir():
"searches for folders that starts with `_`"
if dir[0] == "_":
# copyfile(dir, filetype="pdf")
copyfile(dir, filetype="txt")

>>> _compiti18Compito Contabilit√† 1conti.txt
>>> _compiti18Compito Contabilit√† 1modula4.txt
>>> _compiti18Compito Contabilit√† 1moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files
``````

## Ex. 3: How to get all the files in a txt file

In case you want to create a txt file with all the file names:

``````import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
for eachfile in os.listdir():
mylist += eachfile + "
"
file.write(mylist)
``````

## Example: txt with all the files of an hard drive

``````"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding="utf-8") as testo:
for root, dirs, files in os.walk("D:\"):
for file in files:
listafile.append(file)
percorso.append(root + "\" + file)
testo.write(file + "
")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
for file in listafile:
testo_ordinato.write(file + "
")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
for file in percorso:
file_percorso.write(file + "
")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")
``````

## All the file of C: in one text file

This is a shorter version of the previous code. Change the folder where to start finding the files if you need to start from another position. This code generate a 50 mb on text file on my computer with something less then 500.000 lines with files with the complete path.

``````import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
for r, d, f in os.walk("C:\"):
for file in f:
filewrite.write(f"{r + file}
")
``````

## How to write a file with all paths in a folder of a type

With this function you can create a txt file that will have the name of a type of file that you look for (ex. pngfile.txt) with all the full path of all the files of that type. It can be useful sometimes, I think.

``````import os

def searchfiles(extension=".ttf", folder="H:\"):
"Create a txt file with all the file of a type"
with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
for r, d, f in os.walk(folder):
for file in f:
if file.endswith(extension):
filewrite.write(f"{r + file}
")

# looking for png file (fonts) in the hard disk H:
searchfiles(".png", "H:\")

>>> H:4bs_18Dolphins5.png
>>> H:4bs_18Dolphins6.png
>>> H:4bs_18Dolphins7.png
>>> H:5_18marketing htmlassetsimageslogo2.png
>>> H:7z001.png
>>> H:7z002.png
``````

## (New) Find all files and open them with tkinter GUI

I just wanted to add in this 2019 a little app to search for all files in a dir and be able to open them by doubleclicking on the name of the file in the list.

``````import tkinter as tk
import os

def searchfiles(extension=".txt", folder="H:\"):
"insert all files in the listbox"
for r, d, f in os.walk(folder):
for file in f:
if file.endswith(extension):
lb.insert(0, r + "\" + file)

def open_file():
os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles(".png", "H:\"))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()
``````

I just used the following which was quite simple. First open a console then cd to where you"ve downloaded your file like some-package.whl and use

``````pip install some-package.whl
``````

Note: if pip.exe is not recognized, you may find it in the "Scripts" directory from where python has been installed. If pip is not installed, this page can help: How do I install pip on Windows?

Note: for clarification
If you copy the `*.whl` file to your local drive (ex. C:some-dirsome-file.whl) use the following command line parameters --

``````pip install C:/some-dir/some-file.whl
``````

The simplest way to get row counts per group is by calling `.size()`, which returns a `Series`:

``````df.groupby(["col1","col2"]).size()
``````

Usually you want this result as a `DataFrame` (instead of a `Series`) so you can do:

``````df.groupby(["col1", "col2"]).size().reset_index(name="counts")
``````

If you want to find out how to calculate the row counts and other statistics for each group continue reading below.

## Detailed example:

Consider the following example dataframe:

``````In [2]: df
Out[2]:
col1 col2  col3  col4  col5  col6
0    A    B  0.20 -0.61 -0.49  1.49
1    A    B -1.53 -1.01 -0.39  1.82
2    A    B -0.44  0.27  0.72  0.11
3    A    B  0.28 -1.32  0.38  0.18
4    C    D  0.12  0.59  0.81  0.66
5    C    D -0.13 -1.65 -1.64  0.50
6    C    D -1.42 -0.11 -0.18 -0.44
7    E    F -0.00  1.42 -0.26  1.17
8    E    F  0.91 -0.47  1.35 -0.34
9    G    H  1.48 -0.63 -1.14  0.17
``````

First let"s use `.size()` to get the row counts:

``````In [3]: df.groupby(["col1", "col2"]).size()
Out[3]:
col1  col2
A     B       4
C     D       3
E     F       2
G     H       1
dtype: int64
``````

Then let"s use `.size().reset_index(name="counts")` to get the row counts:

``````In [4]: df.groupby(["col1", "col2"]).size().reset_index(name="counts")
Out[4]:
col1 col2  counts
0    A    B       4
1    C    D       3
2    E    F       2
3    G    H       1
``````

### Including results for more statistics

When you want to calculate statistics on grouped data, it usually looks like this:

``````In [5]: (df
...: .groupby(["col1", "col2"])
...: .agg({
...:     "col3": ["mean", "count"],
...:     "col4": ["median", "min", "count"]
...: }))
Out[5]:
col4                  col3
median   min count      mean count
col1 col2
A    B    -0.810 -1.32     4 -0.372500     4
C    D    -0.110 -1.65     3 -0.476667     3
E    F     0.475 -0.47     2  0.455000     2
G    H    -0.630 -0.63     1  1.480000     1
``````

The result above is a little annoying to deal with because of the nested column labels, and also because row counts are on a per column basis.

To gain more control over the output I usually split the statistics into individual aggregations that I then combine using `join`. It looks like this:

``````In [6]: gb = df.groupby(["col1", "col2"])
...: counts = gb.size().to_frame(name="counts")
...: (counts
...:  .join(gb.agg({"col3": "mean"}).rename(columns={"col3": "col3_mean"}))
...:  .join(gb.agg({"col4": "median"}).rename(columns={"col4": "col4_median"}))
...:  .join(gb.agg({"col4": "min"}).rename(columns={"col4": "col4_min"}))
...:  .reset_index()
...: )
...:
Out[6]:
col1 col2  counts  col3_mean  col4_median  col4_min
0    A    B       4  -0.372500       -0.810     -1.32
1    C    D       3  -0.476667       -0.110     -1.65
2    E    F       2   0.455000        0.475     -0.47
3    G    H       1   1.480000       -0.630     -0.63
``````

### Footnotes

The code used to generate the test data is shown below:

``````In [1]: import numpy as np
...: import pandas as pd
...:
...: keys = np.array([
...:         ["A", "B"],
...:         ["A", "B"],
...:         ["A", "B"],
...:         ["A", "B"],
...:         ["C", "D"],
...:         ["C", "D"],
...:         ["C", "D"],
...:         ["E", "F"],
...:         ["E", "F"],
...:         ["G", "H"]
...:         ])
...:
...: df = pd.DataFrame(
...:     np.hstack([keys,np.random.randn(10,4).round(2)]),
...:     columns = ["col1", "col2", "col3", "col4", "col5", "col6"]
...: )
...:
...: df[["col3", "col4", "col5", "col6"]] =
...:     df[["col3", "col4", "col5", "col6"]].astype(float)
...:
``````

Disclaimer:

If some of the columns that you are aggregating have null values, then you really want to be looking at the group row counts as an independent aggregation for each column. Otherwise you may be misled as to how many records are actually being used to calculate things like the mean because pandas will drop `NaN` entries in the mean calculation without telling you about it.

# Using a for loop, how do I access the loop index, from 1 to 5 in this case?

Use `enumerate` to get the index with the element as you iterate:

``````for index, item in enumerate(items):
print(index, item)
``````

And note that Python"s indexes start at zero, so you would get 0 to 4 with the above. If you want the count, 1 to 5, do this:

``````count = 0 # in case items is empty and you need it after the loop
for count, item in enumerate(items, start=1):
print(count, item)
``````

# Unidiomatic control flow

What you are asking for is the Pythonic equivalent of the following, which is the algorithm most programmers of lower-level languages would use:

``````index = 0            # Python"s indexing starts at zero
for item in items:   # Python"s for loops are a "for each" loop
print(index, item)
index += 1
``````

Or in languages that do not have a for-each loop:

``````index = 0
while index < len(items):
print(index, items[index])
index += 1
``````

or sometimes more commonly (but unidiomatically) found in Python:

``````for index in range(len(items)):
print(index, items[index])
``````

# Use the Enumerate Function

Python"s `enumerate` function reduces the visual clutter by hiding the accounting for the indexes, and encapsulating the iterable into another iterable (an `enumerate` object) that yields a two-item tuple of the index and the item that the original iterable would provide. That looks like this:

``````for index, item in enumerate(items, start=0):   # default is zero
print(index, item)
``````

This code sample is fairly well the canonical example of the difference between code that is idiomatic of Python and code that is not. Idiomatic code is sophisticated (but not complicated) Python, written in the way that it was intended to be used. Idiomatic code is expected by the designers of the language, which means that usually this code is not just more readable, but also more efficient.

## Getting a count

Even if you don"t need indexes as you go, but you need a count of the iterations (sometimes desirable) you can start with `1` and the final number will be your count.

``````count = 0 # in case items is empty
for count, item in enumerate(items, start=1):   # default is zero
print(item)

print("there were {0} items printed".format(count))
``````

The count seems to be more what you intend to ask for (as opposed to index) when you said you wanted from 1 to 5.

## Breaking it down - a step by step explanation

To break these examples down, say we have a list of items that we want to iterate over with an index:

``````items = ["a", "b", "c", "d", "e"]
``````

Now we pass this iterable to enumerate, creating an enumerate object:

``````enumerate_object = enumerate(items) # the enumerate object
``````

We can pull the first item out of this iterable that we would get in a loop with the `next` function:

``````iteration = next(enumerate_object) # first iteration from enumerate
print(iteration)
``````

And we see we get a tuple of `0`, the first index, and `"a"`, the first item:

``````(0, "a")
``````

we can use what is referred to as "sequence unpacking" to extract the elements from this two-tuple:

``````index, item = iteration
#   0,  "a" = (0, "a") # essentially this.
``````

and when we inspect `index`, we find it refers to the first index, 0, and `item` refers to the first item, `"a"`.

``````>>> print(index)
0
>>> print(item)
a
``````

# Conclusion

• Python indexes start at zero
• To get these indexes from an iterable as you iterate over it, use the enumerate function
• Using enumerate in the idiomatic way (along with tuple unpacking) creates code that is more readable and maintainable:

So do this:

``````for index, item in enumerate(items, start=0):   # Python indexes start at zero
print(index, item)
``````

Getting some sort of modification date in a cross-platform way is easy - just call `os.path.getmtime(path)` and you"ll get the Unix timestamp of when the file at `path` was last modified.

Getting file creation dates, on the other hand, is fiddly and platform-dependent, differing even between the three big OSes:

Putting this all together, cross-platform code should look something like this...

``````import os
import platform

def creation_date(path_to_file):
"""
Try to get the date that a file was created, falling back to when it was
See http://stackoverflow.com/a/39501288/1709587 for explanation.
"""
if platform.system() == "Windows":
return os.path.getctime(path_to_file)
else:
stat = os.stat(path_to_file)
try:
return stat.st_birthtime
except AttributeError:
# We"re probably on Linux. No easy way to get creation dates here,
return stat.st_mtime
``````

I noticed that every now and then I need to Google fopen all over again, just to build a mental image of what the primary differences between the modes are. So, I thought a diagram will be faster to read next time. Maybe someone else will find that helpful too.

I would suggest using the duplicated method on the Pandas Index itself:

``````df3 = df3[~df3.index.duplicated(keep="first")]
``````

While all the other methods work, `.drop_duplicates` is by far the least performant for the provided example. Furthermore, while the groupby method is only slightly less performant, I find the duplicated method to be more readable.

Using the sample data provided:

``````>>> %timeit df3.reset_index().drop_duplicates(subset="index", keep="first").set_index("index")
1000 loops, best of 3: 1.54 ms per loop

>>> %timeit df3.groupby(df3.index).first()
1000 loops, best of 3: 580 ¬µs per loop

>>> %timeit df3[~df3.index.duplicated(keep="first")]
1000 loops, best of 3: 307 ¬µs per loop
``````

Note that you can keep the last element by changing the keep argument to `"last"`.

It should also be noted that this method works with `MultiIndex` as well (using df1 as specified in Paul"s example):

``````>>> %timeit df1.groupby(level=df1.index.names).last()
1000 loops, best of 3: 771 ¬µs per loop

>>> %timeit df1[~df1.index.duplicated(keep="last")]
1000 loops, best of 3: 365 ¬µs per loop
``````

Here"s a concise solution which avoids regular expressions and slow in-Python loops:

``````def principal_period(s):
i = (s+s).find(s, 1, -1)
return None if i == -1 else s[:i]
``````

See the Community Wiki answer started by @davidism for benchmark results. In summary,

David Zhang"s solution is the clear winner, outperforming all others by at least 5x for the large example set.

This is based on the observation that a string is periodic if and only if it is equal to a nontrivial rotation of itself. Kudos to @AleksiTorhamo for realizing that we can then recover the principal period from the index of the first occurrence of `s` in `(s+s)[1:-1]`, and for informing me of the optional `start` and `end` arguments of Python"s `string.find`.