# Line Detection in Python with OpenCV | Houghline method


We will see how line detection works using the Houghline method. Before applying the Houghline method, it is desirable to first detect the edges of the image; for that technique, see Edge detection.

## Basics of the Houghline method

A line can be represented as y = mx + c, or in parametric form as r = xcosθ + ysinθ, where r is the perpendicular distance from the origin to the line, and θ is the angle formed by this perpendicular and the horizontal axis, measured counterclockwise (the direction depends on how you represent the coordinate system; this representation is used in OpenCV).

Thus, any line can be represented in these two terms (r, θ).
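A quick numeric check of this parameterization (a standalone sketch, not OpenCV code): every point on the horizontal line y = 50 yields the same pair (r, θ) = (50, 90°).

```python
import numpy as np

# Every point (x, 50) on the horizontal line y = 50 satisfies
# r = x*cos(theta) + y*sin(theta) = 50 when theta = 90 degrees.
theta = np.deg2rad(90)
rs = [round(x * np.cos(theta) + 50 * np.sin(theta)) for x in (0, 25, 99)]
print(rs)  # [50, 50, 50]
```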

How the Houghline method works:

• It first creates a two-dimensional array, the accumulator (to hold the values of the two parameters), initialized to zero.
• Let the rows denote r and the columns denote θ (theta).
• The size of the array depends on the required precision. If you want 1-degree angle precision, you need 180 columns (the maximum angle for a straight line is 180 degrees).
• For r, the maximum possible distance is the diagonal length of the image. So, with one-pixel precision, the number of rows can be the diagonal length of the image.

Example:
Consider a 100 × 100 image with a horizontal line in the middle. Take the first point of the line. You know its (x, y) values. Now, in the line equation, put the values θ (theta) = 0, 1, 2, ..., 180 and check the r you get. For every (r, θ) pair, increment by one the accumulator in the corresponding (r, θ) cell. So now accumulator cell (50, 90) = 1, along with some other cells.
Now take the second point on the line. Do the same as above. Increment the cells corresponding to the (r, θ) values you got. This time, cell (50, 90) = 2. We are actually voting for (r, θ) values. You continue this process for every point on the line. At each point, cell (50, 90) will be incremented, or voted for, while other cells may or may not be voted for. Thus, at the end, cell (50, 90) will have the maximum number of votes. So if you search the accumulator for the maximum number of votes, you get (50, 90), which says there is a line in this image at a distance of 50 from the origin and at an angle of 90 degrees.
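The voting procedure described above can be sketched in plain NumPy (a toy illustration of the idea, not OpenCV's implementation): each edge point votes for every (r, θ) bin it could lie on, and the bin of the actual line collects the most votes.

```python
import numpy as np

# Toy Hough accumulator for a horizontal line y = 50 in a 100x100 image.
points = [(x, 50) for x in range(100)]      # edge points on the line
thetas = np.deg2rad(np.arange(0, 180))      # 1-degree precision
diag = int(np.ceil(np.hypot(100, 100)))     # maximum possible |r|
acc = np.zeros((2 * diag, 180), dtype=int)  # rows: r (shifted), cols: theta

for x, y in points:
    for t_idx, t in enumerate(thetas):
        r = int(round(x * np.cos(t) + y * np.sin(t)))
        acc[r + diag, t_idx] += 1           # cast one vote

# The strongest peak is the (r, theta) of the detected line.
r_idx, t_idx = np.unravel_index(acc.argmax(), acc.shape)
print(r_idx - diag, t_idx)  # 50 90: distance 50 from origin, angle 90 degrees
```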

Everything explained above is encapsulated in the OpenCV function cv2.HoughLines(). It simply returns an array of (r, θ) values, where r is measured in pixels and θ in radians.

``````
# Python program to illustrate HoughLine
# method for line detection
import cv2
import numpy as np

# Read the image on which the
# operations are to be performed.
# Make sure the image is in the same
# directory as this Python program.
img = cv2.imread('image.jpg')

# Convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Apply edge detection on the image
edges = cv2.Canny(gray, 50, 150, apertureSize=3)

# This returns an array of r and theta values
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)

# The loop below iterates over the r and theta
# values stored in the 2D array
for r, theta in lines[0]:
    # Stores the value of cos(theta) in a
    a = np.cos(theta)
    # Stores the value of sin(theta) in b
    b = np.sin(theta)
    # x0 stores the value r*cos(theta)
    x0 = a * r
    # y0 stores the value r*sin(theta)
    y0 = b * r
    # x1 stores the rounded value of (r*cos(theta) - 1000*sin(theta))
    x1 = int(x0 + 1000 * (-b))
    # y1 stores the rounded value of (r*sin(theta) + 1000*cos(theta))
    y1 = int(y0 + 1000 * (a))
    # x2 stores the rounded value of (r*cos(theta) + 1000*sin(theta))
    x2 = int(x0 - 1000 * (-b))
    # y2 stores the rounded value of (r*sin(theta) - 1000*cos(theta))
    y2 = int(y0 - 1000 * (a))
    # cv2.line draws a line on img from point (x1, y1) to (x2, y2).
    # (0, 0, 255) denotes the color of the line to be
    # drawn. In this case, it is red.
    cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)

# All changes made to the input image are finally
# written to a new image, linesDetected.jpg
cv2.imwrite('linesDetected.jpg', img)
``````

Explanation of cv2.HoughLines(edges, 1, np.pi / 180, 200):

1. The first parameter, the input image, should be a binary image, so apply thresholding or edge detection before applying the Hough transform.
2. The second and third parameters are the accuracies of r and θ (theta), respectively.
3. The fourth argument is the threshold: the minimum number of votes a candidate must receive to be considered a line.
4. Remember that the number of votes depends on the number of points on the line, so the threshold effectively sets the minimum length of a line to be detected.

## Summing up the process

• In the context of image analysis, the coordinates of the edge points (X, Y) in the image are known and therefore serve as constants in the parametric line equation, while r (rho) and θ (theta) are the unknown variables we are looking for.
• If we plot the possible values of r defined by each θ, points in Cartesian space map to curves (i.e. sinusoids) in the polar Hough parameter space. This point-to-curve transformation is the Hough transform for straight lines.
• The transform is implemented by quantizing the Hough parameter space into finite intervals, the accumulator cells. As the algorithm runs, each (X, Y) is converted to a discretized (r, θ) curve, and the accumulator cells that lie along this curve are incremented.
• The resulting peaks in the accumulator array are strong evidence that a corresponding straight line exists in the image.

## Applications of the Hough transform

1. It is used to isolate features of a particular shape within an image.
2. It tolerates gaps in the description of object boundaries and is relatively unaffected by image noise.
3. It is widely used in document scanning, verification, and barcode recognition.

This article is courtesy of Pratima Upadhyay.

## How can I open multiple files using "with open" in Python?

I want to change a couple of files at one time, if I can write to all of them. I'm wondering if I somehow can combine the multiple open calls with the `with` statement:

``````
try:
    with open("a", "w") as a and open("b", "w") as b:
        do_something()
except IOError as e:
    print "Operation failed: %s" % e.strerror
``````

If that"s not possible, what would an elegant solution to this problem look like?

## open() in Python does not create a file if it doesn't exist

What is the best way to open a file as read/write if it exists, or if it does not, then create it and open it as read/write? From what I read, `file = open("myfile.dat", "rw")` should do this, right?

It is not working for me (Python 2.6.2), and I'm wondering if it is a version problem, or not supposed to work like that, or what.

The bottom line is, I just need a solution for the problem. I am curious about the other stuff, but all I need is a nice way to do the opening part.

The enclosing directory was writable by user and group, not other (I'm on a Linux system... so permissions 775, in other words), and the exact error was:

IOError: no such file or directory.
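For the record, `"rw"` has never been a valid mode, which is why the call fails. One pattern that does what the asker wants (sketched against a temp directory) is mode `"a+"`: it creates the file if missing and opens it for both reading and writing, with writes appended:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.dat")

# "a+" creates the file if it does not exist; reads need a seek(0)
# first, because the position starts at the end of the file.
with open(path, "a+") as f:
    f.write("hello")
    f.seek(0)
    data = f.read()

print(data)  # hello
```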

## Difference between modes a, a+, w, w+, and r+ in built-in open function?

In the Python built-in open function, what is the exact difference between the modes `w`, `a`, `w+`, `a+`, and `r+`?

In particular, the documentation implies that all of these will allow writing to the file, and says that it opens the files for "appending", "writing", and "updating" specifically, but does not define what these terms mean.
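A small experiment against a temp file (a sketch, not the official definitions) makes the differences concrete: `w`/`w+` truncate, `a`/`a+` force writes to the end, and `r+` updates in place without truncating.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:        # w: write-only, create/truncate
    f.write("abcdef")

with open(path, "r+") as f:       # r+: read/write, no truncation
    f.write("XY")                 # overwrites bytes at position 0
after_rplus = open(path).read()   # "XYcdef"

with open(path, "a") as f:        # a: write-only, writes appended at the end
    f.write("Z")
after_append = open(path).read()  # "XYcdefZ"

with open(path, "w+") as f:       # w+: read/write, truncates first
    f.write("new")
after_wplus = open(path).read()   # "new"

print(after_rplus, after_append, after_wplus)
```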

## Simple Digit Recognition OCR in OpenCV-Python

I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV.

I have 100 samples (i.e. images) of each digit. I would like to train with them.

There is a sample `letter_recog.py` that comes with the OpenCV samples. But I still couldn't figure out how to use it. I don't understand what the samples, responses, etc. are. Also, it loads a txt file at first, which I didn't understand at first.

Later, on searching a little bit, I could find a letter_recognition.data in the cpp samples. I used it and made a code for cv2.KNearest in the model of letter_recog.py (just for testing):

``````import numpy as np
import cv2

fn = "letter-recognition.data"
a = np.loadtxt(fn, np.float32, delimiter=",", converters={ 0 : lambda ch : ord(ch)-ord("A") })
samples, responses = a[:,1:], a[:,0]

model = cv2.KNearest()
retval = model.train(samples,responses)
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print results.ravel()
``````

It gave me an array of size 20000; I don't understand what it is.

Questions:

1) What is the letter_recognition.data file? How do I build that file from my own data set?

2) What does `results.ravel()` denote?

3) How can we write a simple digit recognition tool using the letter_recognition.data file (either KNearest or SVM)?

## Does reading an entire file leave the file handle open?

If you read an entire file with `content = open("Path/to/file", "r").read()` is the file handle left open until the script exits? Is there a more concise method to read a whole file?
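In CPython, the handle is closed as soon as the file object is garbage-collected, which happens right after that expression; but this is an implementation detail, not a guarantee (other interpreters may keep it open longer). The concise and safe idiom is a `with` statement:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "f.txt")
with open(path, "w") as f:
    f.write("data")

# Read the whole file; the handle is closed deterministically
# as soon as the with-block exits.
with open(path) as f:
    content = f.read()

print(content, f.closed)  # data True
```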

## Store output of subprocess.Popen call in a string

I"m trying to make a system call in Python and store the output to a string that I can manipulate in the Python program.

``````#!/usr/bin/python
import subprocess
p2 = subprocess.Popen("ntpq -p")
``````

I"ve tried a few things including some of the suggestions here:

Retrieving the output of subprocess.call()

but without any luck.
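A sketch of the standard fixes: either the high-level `subprocess.check_output`, or `Popen` with `stdout=subprocess.PIPE` plus `communicate()`. (The `ntpq -p` call is swapped for a portable `python -c` command so the example runs anywhere; note that a string command like `"ntpq -p"` must either be split into a list or run with `shell=True`.)

```python
import subprocess
import sys

# High level: check_output runs the command and returns its stdout as bytes.
out = subprocess.check_output([sys.executable, "-c", "print('hello')"])
print(out.decode().strip())  # hello

# Lower level: Popen with a pipe, then communicate() to collect the output.
p = subprocess.Popen([sys.executable, "-c", "print('world')"],
                     stdout=subprocess.PIPE)
stdout, _ = p.communicate()
print(stdout.decode().strip())  # world
```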

## "Unicode Error "unicodeescape" codec can"t decode bytes... Cannot open text files in Python 3

I am using Python 3.1 on a Windows 7 machine. Russian is the default system language, and utf-8 is the default encoding.

Looking at the answer to a previous question, I have attempted using the "codecs" module with a little luck. Here are a few examples:

``````
>>> g = codecs.open("C:\Users\Eric\Desktop\beeline.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape (<pyshell#39>, line 1)
``````
``````
>>> g = codecs.open("C:\Users\Eric\Desktop\Site.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape (<pyshell#40>, line 1)
``````
``````
>>> g = codecs.open("C:\Python31\Notes.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 11-12: malformed \N character escape (<pyshell#41>, line 1)
``````
``````
>>> g = codecs.open("C:\Users\Eric\Desktop\Site.txt", "r", encoding="utf-8")
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \UXXXXXXXX escape (<pyshell#44>, line 1)
``````

My last idea was that Windows "translates" a few folders, such as the "users" folder, into Russian (though typing "users" is still the correct path), so I tried it in the Python31 folder. Still no luck. Any ideas?
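The real culprit is neither codecs nor the folders: inside an ordinary string literal, `\U` and `\N` begin escape sequences, so a path like `"C:\Users\..."` fails at compile time with exactly this `unicodeescape` error. Use a raw string, doubled backslashes, or forward slashes (sketched here with the question's path):

```python
# Three spellings of the same Windows path; none is parsed for escapes,
# so none raises the "unicodeescape" SyntaxError.
p1 = r"C:\Users\Eric\Desktop\beeline.txt"     # raw string
p2 = "C:\\Users\\Eric\\Desktop\\beeline.txt"  # doubled backslashes
p3 = "C:/Users/Eric/Desktop/beeline.txt"      # forward slashes work too

print(p1 == p2)  # True
```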

## Python subprocess/Popen with a modified environment

I believe that running an external command with a slightly modified environment is a very common case. That's how I tend to do it:

``````import subprocess, os
my_env = os.environ
my_env["PATH"] = "/usr/sbin:/sbin:" + my_env["PATH"]
subprocess.Popen(my_command, env=my_env)
``````

I"ve got a gut feeling that there"s a better way; does it look alright?

## Cannot find module cv2 when using OpenCV

I have installed OpenCV on the Occidentalis operating system (a variant of Raspbian) on a Raspberry Pi, using jayrambhia's script found here. It installed version 2.4.5.

When I try `import cv2` in a Python program, I get the following message:

``````
pi@raspberrypi ~ $ python cam.py
Traceback (most recent call last):
  File "cam.py", line 1, in <module>
    import cv2
ImportError: No module named cv2
``````

The file `cv2.so` is stored in `/usr/local/lib/python2.7/site-packages/...`

There are also folders in `/usr/local/lib` called python3.2 and python2.6, which could be a problem, but I'm not sure.

Is this a path error perhaps? Any help is appreciated, I am new to Linux.

## How to crop an image in OpenCV using Python

How can I crop images, like I've done before in PIL, using OpenCV?

Working example in PIL:

``````im = Image.open("0.png").convert("L")
im = im.crop((1, 1, 98, 33))
im.save("_0.png")
``````

But how can I do it in OpenCV?

This is what I tried:

``````im = cv.imread("0.png", cv.CV_LOAD_IMAGE_GRAYSCALE)
(thresh, im_bw) = cv.threshold(im, 128, 255, cv.THRESH_OTSU)
im = cv.getRectSubPix(im_bw, (98, 33), (1, 1))
cv.imshow("Img", im)
cv.waitKey(0)
``````

But it doesn"t work.

I think I incorrectly used `getRectSubPix`. If this is the case, please explain how I can correctly use this function.
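In cv2, an image is just a NumPy array, so the idiomatic crop is array slicing; `getRectSubPix` is unnecessary here. A sketch with a synthetic array standing in for the loaded image (PIL's box `(1, 1, 98, 33)` maps to rows `1:33` and columns `1:98`):

```python
import numpy as np

# Stand-in for a grayscale image loaded with cv2.imread("0.png", 0).
im = (np.arange(100 * 100) % 256).astype(np.uint8).reshape(100, 100)

# PIL crop((left, upper, right, lower))  ==  NumPy [upper:lower, left:right]
cropped = im[1:33, 1:98]
print(cropped.shape)  # (32, 97)
```

Slicing returns a view of the same data; call `.copy()` on the result if the crop must outlive modifications to the original image.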

Since this question was asked in 2010, there has been real simplification in how to do simple multithreading with Python with map and pool.

The code below comes from an article/blog post that you should definitely check out (no affiliation) - Parallelism in one line: A Better Model for Day to Day Threading Tasks. I'll summarize below - it ends up being just a few lines of code:

``````
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(4)
results = pool.map(my_function, my_array)
``````

Which is the multithreaded version of:

``````
results = []
for item in my_array:
    results.append(my_function(item))
``````

Description

Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.

Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.

Implementation

Parallel versions of the map function are provided by two libraries: multiprocessing, and also its little-known, but equally fantastic, stepchild: multiprocessing.dummy.

`multiprocessing.dummy` is exactly the same as the multiprocessing module, but uses threads instead (an important distinction - use multiple processes for CPU-intensive tasks; threads for (and during) I/O):

multiprocessing.dummy replicates the API of multiprocessing, but is no more than a wrapper around the threading module.

``````import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
"http://www.python.org",
"http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html",
"http://www.python.org/doc/",
"http://www.python.org/getit/",
"http://www.python.org/community/",
"https://wiki.python.org/moin/",
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()
``````

And the timing results:

``````Single thread:   14.4 seconds
4 Pool:   3.1 seconds
8 Pool:   1.4 seconds
13 Pool:   1.3 seconds
``````

Passing multiple arguments (works like this only in Python 3.3 and later):

To pass multiple arrays:

``````results = pool.starmap(function, zip(list_a, list_b))
``````

Or to pass a constant and an array:

``````results = pool.starmap(function, zip(itertools.repeat(constant), list_a))
``````

If you are using an earlier version of Python, you can pass multiple arguments via this workaround.

(Thanks to user136036 for the helpful comment.)

# `os.listdir()` - list in the current directory

With listdir in the os module, you get the files and the folders in the current dir.

``````
import os
arr = os.listdir()
print(arr)

>>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

## Looking in a directory

``````
arr = os.listdir("c:\\files")
``````

# `glob` from glob

With glob, you can specify a type of file to list, like this:

``````
import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)
``````

## `glob` in a list comprehension

``````mylist = [f for f in glob.glob("*.txt")]
``````

## get the full path of only files in the current directory

``````import os
from os import listdir
from os.path import isfile, join

cwd = os.getcwd()
onlyfiles = [os.path.join(cwd, f) for f in os.listdir(cwd) if
os.path.isfile(os.path.join(cwd, f))]
print(onlyfiles)

["G:\getfilesname\getfilesname.py", "G:\getfilesname\example.txt"]
``````

## Getting the full path name with `os.path.abspath`

You get the full path in return

`````` import os
files_path = [os.path.abspath(x) for x in os.listdir()]
print(files_path)

["F:\documentiapplications.txt", "F:\documenticollections.txt"]
``````

## Walk: going through sub directories

os.walk returns the root, the directories list, and the files list; that is why I unpacked them into r, d, f in the for loop. It then looks for other files and directories in the subfolders of the root, and so on, until there are no more subfolders.

``````import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            print(os.path.join(r, file))
``````

### `os.listdir()`: get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to give the argument as "." or os.getcwd() in the os.listdir method.

`````` import os
arr = os.listdir(".")
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

### To go up in the directory tree

``````# Method 1
x = os.listdir("..")

# Method 2
x= os.listdir("/")
``````

### Get files: `os.listdir()` in a particular directory (Python 2 and 3)

`````` import os
arr = os.listdir("F:\python")
print(arr)

>>> ["\$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]
``````

### Get files of a particular subdirectory with `os.listdir()`

``````import os

x = os.listdir("./content")
``````

### `os.walk(".")` - current directory

`````` import os
arr = next(os.walk("."))[2]
print(arr)

>>> ["5bs_Turismo1.pdf", "5bs_Turismo1.pptx", "esperienza.txt"]
``````

### `next(os.walk("."))` and `os.path.join("dir", "file")`

``````
import os

arr = []
r, d, f = next(os.walk("F:\_python"))
for file in f:
    arr.append(os.path.join(r, file))

for f in arr:
    print(f)

>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt
``````

### `next(os.walk("F:\")` - get the full path - list comprehension

``````
r, d, f = next(os.walk("F:\_python"))
x = [os.path.join(r, file) for file in f]

>>> ["F:\_python\dict_class.py", "F:\_python\programmi.txt"]
``````

### `os.walk` - get full path - all files in sub dirs

``````x = [os.path.join(r,file) for r,d,f in os.walk("F:\_python") for file in f]
print(x)

``````

### `os.listdir()` - get only txt files

`````` arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
print(arr_txt)

>>> ["work.txt", "3ebooks.txt"]
``````

## Using `glob` to get the full path of the files

If I should need the absolute path of the files:

``````
from path import path
from glob import glob

x = [path(f).abspath() for f in glob("F:\*.txt")]
for f in x:
    print(f)

>>> F:\acquistionline.txt
>>> F:\acquisti_2018.txt
>>> F:\bootstrap_jquery_ecc.txt
``````

## Using `os.path.isfile` to avoid directories in the list

``````import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ["a simple game.py", "data.txt", "decorator.py"]
``````

## Using `pathlib` from Python 3.4

``````
import pathlib

flist = []
for p in pathlib.Path(".").iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speak_gui2.py
>>> thumb.PNG
``````

With `list comprehension`:

``````flist = [p for p in pathlib.Path(".").iterdir() if p.is_file()]
``````

Alternatively, use `pathlib.Path()` instead of `pathlib.Path(".")`

## Use glob method in pathlib.Path()

``````
import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py
``````

## Get all and only files with os.walk

``````
import os

x = [i[2] for i in os.walk(".")]
y = []
for t in x:
    for f in t:
        y.append(f)
print(y)

>>> ["append_to_list.py", "data.txt", "data1.txt", "data2.txt", "data_180617", "os_walk.py", "READ2.py", "read_data.py", "somma_defaltdic.py", "substitute_words.py", "sum_data.py", "data.txt", "data1.txt", "data_180617"]
``````

## Get only files with next and walk in a directory

`````` import os
x = next(os.walk("F://python"))[2]
print(x)

>>> ["calculator.bat","calculator.py"]
``````

## Get only directories with next and walk in a directory

`````` import os
next(os.walk("F://python"))[1] # for the current dir use (".")

>>> ["python3","others"]
``````

## Get all the subdir names with `walk`

``````
for r, d, f in os.walk("F:\_python"):
    for dirs in d:
        print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints
``````

## `os.scandir()` from Python 3.5 and greater

``````import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ["calculator.bat","calculator.py"]

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os

with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG
``````

# Examples:

## Ex. 1: How many files are there in the subdirectories?

In this example, we count the number of files contained in a directory and all its subdirectories.

``````
import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + " files"

print(count("F:\python"))

>>> F:\python : 12057 files
``````

## Ex.2: How to copy all files from a directory to another?

A script to make order in your computer, finding all files of a type (default: pptx) and copying them into a new folder.

``````
import os
import shutil

destination = "F:\\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype="pptx", counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = pack[0] + "\\" + f
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print("-" * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")

for dir in os.listdir():
    # searches for folders that start with `_`
    if dir[0] == "_":
        # copyfile(dir, filetype="pdf")
        copyfile(dir, filetype="txt")

>>> _compiti18\Compito Contabilità 1\conti.txt
>>> _compiti18\Compito Contabilità 1\modula4.txt
>>> _compiti18\Compito Contabilità 1\moduloa4.txt
>>> ------------------------------
>>> ==> Found in: `_compiti18` : 3 files
``````

## Ex. 3: How to get all the files in a txt file

In case you want to create a txt file with all the file names:

``````
import os

mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)
``````

## Example: txt with all the files of a hard drive

``````"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding="utf-8") as testo:
for root, dirs, files in os.walk("D:\"):
for file in files:
listafile.append(file)
percorso.append(root + "\" + file)
testo.write(file + "
")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
for file in listafile:
testo_ordinato.write(file + "
")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
for file in percorso:
file_percorso.write(file + "
")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")
``````

## All the files of C: in one text file

This is a shorter version of the previous code. Change the starting folder if you need to begin from another position. On my computer, this code generates a text file of about 50 MB, with a little fewer than 500,000 lines of files with the complete path.

``````
import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            filewrite.write(f"{os.path.join(r, file)}\n")
``````

## How to write a file with all paths in a folder of a type

With this function you can create a txt file, named after the type of file you look for (ex. pngfile.txt), with the full path of all the files of that type. It can be useful sometimes, I think.

``````
import os

def searchfiles(extension=".ttf", folder="H:\\"):
    "Create a txt file with all the files of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(f"{os.path.join(r, file)}\n")

# looking for png files in the hard disk H:
searchfiles(".png", "H:\\")

>>> H:4bs_18Dolphins5.png
>>> H:4bs_18Dolphins6.png
>>> H:4bs_18Dolphins7.png
>>> H:5_18marketing htmlassetsimageslogo2.png
>>> H:7z001.png
>>> H:7z002.png
``````

## (New) Find all files and open them with tkinter GUI

I just wanted to add, in this 2019, a little app to search for all files in a dir, with the ability to open them by double-clicking on the name of the file in the list.

``````
import tkinter as tk
import os

def searchfiles(extension=".txt", folder="H:\\"):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, r + "\\" + file)

def open_file():
    os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda: searchfiles(".png", "H:\\"))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()
``````

I just used the following, which was quite simple. First open a console, then cd to where you've downloaded your file, like some-package.whl, and use:

``````pip install some-package.whl
``````

Note: if pip.exe is not recognized, you may find it in the "Scripts" directory of your Python installation. If pip is not installed, this page can help: How do I install pip on Windows?

Note: for clarification
If you copy the `*.whl` file to your local drive (ex. C:\some-dir\some-file.whl), use the following command line parameters:

``````pip install C:/some-dir/some-file.whl
``````

This is the behaviour to adopt when the referenced object is deleted. It is not specific to Django; this is an SQL standard, although Django has its own implementation on top of SQL. (1)

There are seven possible actions to take when such an event occurs:

• `CASCADE`: When the referenced object is deleted, also delete the objects that have references to it (when you remove a blog post for instance, you might want to delete comments as well). SQL equivalent: `CASCADE`.
• `PROTECT`: Forbid the deletion of the referenced object. To delete it you will have to delete all objects that reference it manually. SQL equivalent: `RESTRICT`.
• `RESTRICT`: (introduced in Django 3.1) Similar behavior to `PROTECT` that matches SQL's `RESTRICT` more accurately. (See the Django documentation example.)
• `SET_NULL`: Set the reference to NULL (requires the field to be nullable). For instance, when you delete a User, you might want to keep the comments he posted on blog posts, but say it was posted by an anonymous (or deleted) user. SQL equivalent: `SET NULL`.
• `SET_DEFAULT`: Set the default value. SQL equivalent: `SET DEFAULT`.
• `SET(...)`: Set a given value. This one is not part of the SQL standard and is entirely handled by Django.
• `DO_NOTHING`: Probably a very bad idea, since this would create integrity issues in your database (referencing an object that actually doesn't exist). SQL equivalent: `NO ACTION`. (2)

Source: Django documentation

In most cases, `CASCADE` is the expected behaviour, but for every ForeignKey, you should always ask yourself what is the expected behaviour in this situation. `PROTECT` and `SET_NULL` are often useful. Setting `CASCADE` where it should not, can potentially delete all of your database in cascade, by simply deleting a single user.

It"s funny to notice that the direction of the `CASCADE` action is not clear to many people. Actually, it"s funny to notice that only the `CASCADE` action is not clear. I understand the cascade behavior might be confusing, however you must think that it is the same direction as any other action. Thus, if you feel that `CASCADE` direction is not clear to you, it actually means that `on_delete` behavior is not clear to you.

In your database, a foreign key is basically represented by an integer field whose value is the primary key of the foreign object. Let's say you have an entry comment_A, which has a foreign key to an entry article_B. If you delete the entry comment_A, everything is fine. article_B used to live without comment_A and doesn't mind if it's deleted. However, if you delete article_B, then comment_A panics! It never lived without article_B and needs it; it's part of its attributes (`article=article_B`, but what is article_B???). This is where `on_delete` steps in, to determine how to resolve this integrity error, either by saying:

• "No! Please! Don"t! I can"t live without you!" (which is said `PROTECT` or `RESTRICT` in Django/SQL)
• "All right, if I"m not yours, then I"m nobody"s" (which is said `SET_NULL`)
• "Good bye world, I can"t live without article_B" and commit suicide (this is the `CASCADE` behavior).
• "It"s OK, I"ve got spare lover, and I"ll reference article_C from now" (`SET_DEFAULT`, or even `SET(...)`).
• "I can"t face reality, and I"ll keep calling your name even if that"s the only thing left to me!" (`DO_NOTHING`)

I hope it makes cascade direction clearer. :)

Footnotes

(1) Django has its own implementation on top of SQL. And, as mentioned by @JoeMjr2 in the comments below, Django will not create the SQL constraints. If you want the constraints to be ensured by your database (for instance, if your database is used by another application, or if you hang around in the database console from time to time), you might want to set the related constraints manually yourself. There is an open ticket to add support for database-level on-delete constraints in Django.

(2) Actually, there is one case where `DO_NOTHING` can be useful: if you want to skip Django's implementation and implement the constraint yourself at the database level.

Running `brew reinstall python@2` didn't work for my existing Python 2.7 virtual environments. Inside them there were still `ERROR:root:code for hash sha1 was not found` errors.

I encountered this problem after I ran `brew upgrade openssl`. And here"s the fix:

``````\$ ls /usr/local/Cellar/openssl
``````

...which shows

``````1.0.2t
``````

According to the existing version, run:

``````\$ brew switch openssl 1.0.2t
``````

...which shows

``````Cleaning /usr/local/Cellar/openssl/1.0.2t
``````

After that, run the following command in a Python 2.7 virtualenv:

``````(my-venv) \$ python -c "import hashlib;m=hashlib.md5();print(m.hexdigest())"
``````

...which shows

``````d41d8cd98f00b204e9800998ecf8427e
``````

No more errors.

You opened the file in binary mode:

```
with open(fname, "rb") as f:
```

This means that all data read from the file is returned as `bytes` objects, not `str`. You cannot then use a string in a containment test:

```
if "some-pattern" in tmp: continue
```

You'd have to use a `bytes` object to test against `tmp` instead:

```
if b"some-pattern" in tmp: continue
```

or open the file as a text file instead by replacing the `"rb"` mode with `"r"`.
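
A minimal, self-contained illustration of the Python 3 behavior (the `data` value here is a hypothetical stand-in for a line read in `"rb"` mode):

```python
data = b"first line\nsome-pattern here\n"  # what reads return in "rb" mode

# str-in-bytes is rejected outright in Python 3:
try:
    "some-pattern" in data
except TypeError as exc:
    print(exc)  # a bytes-like object is required, not 'str'

# bytes-in-bytes works as expected:
print(b"some-pattern" in data)  # True
```
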

⚡️ TL;DR — One line solution.

All you have to do is:

```
sudo easy_install pip
```

2019: ⚠️ `easy_install` has been deprecated. Check Method #2 below for the preferred installation!

Details:

⚡️ OK, I read the solutions given above, but here's an EASY solution to install `pip`.

macOS comes with `Python` installed. But to make sure that you have `Python` installed, open the terminal and run the following command.

```
python --version
```

If this command returns a version number, that means `Python` exists. Which also means that you already have access to `easy_install`, considering you are using macOS/OSX.

ℹ️ Now, all you have to do is run the following command.

```
sudo easy_install pip
```

After that, `pip` will be installed and you'll be able to use it for installing other packages.

Let me know if you have any problems installing `pip` this way.

Cheers!

P.S. I ended up blogging a post about it. QuickTip: How Do I Install pip on macOS or OS X?

✅ UPDATE (Jan 2019): METHOD #2: Two line solution —

`easy_install` has been deprecated. Please use `get-pip.py` instead.

First of all, download the `get-pip` file:

```
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
```

Now run this file to install `pip`:

```
python get-pip.py
```

That should do it.

Another gif you said? Here ya go!

I noticed that every now and then I need to Google fopen all over again, just to build a mental image of what the primary differences between the modes are. So, I thought a diagram would be faster to read next time. Maybe someone else will find it helpful too.
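
Since this answer is about the `fopen` mode letters, here is a small, hedged sketch in Python (whose `open` uses the same mode letters) that demonstrates the key differences with a throwaway file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:   # "w": create or truncate, write-only
    f.write("abc")
with open(path, "a") as f:   # "a": append at the end, never truncates
    f.write("def")
with open(path, "r") as f:   # "r": read-only, file must already exist
    assert f.read() == "abcdef"
with open(path, "r+") as f:  # "r+": read/write, no truncation, starts at byte 0
    f.write("XY")            # overwrites in place
with open(path) as f:
    assert f.read() == "XYcdef"
```
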

It helps to install a Python package `foo` on your machine (can also be in a `virtualenv`) so that you can import the package `foo` from other projects and also from IPython prompts.

It does a similar job to `pip`, `easy_install`, etc.

Using `setup.py`

Package - A folder/directory that contains an `__init__.py` file.
Module - A valid Python file with a `.py` extension.
Distribution - How one package relates to other packages and modules.

Let's say you want to install a package named `foo`. Then you do,

```
$ git clone https://github.com/user/foo
$ cd foo
$ python setup.py install
```

Instead, if you don't want to actually install it but would still like to use it, then do,

```
$ python setup.py develop
```

This command will create symlinks to the source directory within site-packages instead of copying things. Because of this, it is quite fast (particularly for large packages).

Creating `setup.py`

If you have your package tree like,

```
foo
├── foo
│   ├── data_struct.py
│   ├── __init__.py
│   └── internals.py
├── requirements.txt
└── setup.py
```

Then, you do the following in your `setup.py` script so that it can be installed on some machine:

```
from setuptools import setup

setup(
    name="foo",
    version="1.0",
    description="A useful module",
    author="Man Foo",
    author_email="[email protected]",
    packages=["foo"],  # same as name
    install_requires=["wheel", "bar", "greek"],  # external packages as dependencies
)
```

Instead, if your package tree is more complex like the one below:

```
foo
├── foo
│   ├── data_struct.py
│   ├── __init__.py
│   └── internals.py
├── requirements.txt
├── scripts
│   ├── cool
│   └── skype
└── setup.py
```

Then, your `setup.py` in this case would be like:

```
from setuptools import setup

setup(
    name="foo",
    version="1.0",
    description="A useful module",
    author="Man Foo",
    author_email="[email protected]",
    packages=["foo"],  # same as name
    install_requires=["wheel", "bar", "greek"],  # external packages as dependencies
    scripts=[
        "scripts/cool",
        "scripts/skype",
    ],
)
```

Add more stuff to `setup.py` to make it decent:

```
from setuptools import setup

setup(
    name="foo",
    version="1.0",
    description="A useful module",
    long_description=long_description,
    author="Man Foo",
    author_email="[email protected]",
    url="http://www.foopackage.com/",
    packages=["foo"],  # same as name
    install_requires=["wheel", "bar", "greek"],  # external packages as dependencies
    scripts=[
        "scripts/cool",
        "scripts/skype",
    ],
)
```

The `long_description` is used on pypi.org as the README description of your package.
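
For completeness (this part is an assumption, not shown above): `long_description` has to come from somewhere, and a common pattern is to read it from the project README. A self-contained sketch:

```python
import io

# Write a stand-in README so the sketch runs anywhere (normally it already exists).
with io.open("README.md", "w", encoding="utf-8") as fh:
    fh.write("# foo\nA useful module.\n")

with io.open("README.md", encoding="utf-8") as fh:
    long_description = fh.read()

# Then pass it to setup(..., long_description=long_description, ...)
print(long_description.splitlines()[0])  # "# foo"
```
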

And finally, you're now ready to upload your package to PyPI so that others can install your package using `pip install yourpackage`.

At this point there are two options.

• Publish on the temporary test.pypi.org server to familiarize yourself with the procedure, and then publish on the permanent pypi.org server for the public to use your package.
• Publish straight away on the permanent pypi.org server, if you are already familiar with the procedure and have your user credentials (e.g., username, password, package name).

Once your package name is registered on pypi.org, nobody else can claim or use it. Python packaging suggests the twine package for uploading purposes (of your package to PyPI). Thus,

(1) the first step is to locally build the distributions using:

```
# prereq: wheel (pip install wheel)
$ python setup.py sdist bdist_wheel
```

(2) then use `twine` for uploading either to test.pypi.org or pypi.org:

```
$ twine upload --repository testpypi dist/*
```

It will take a few minutes for the package to appear on test.pypi.org. Once you're satisfied with it, you can then upload your package to the real & permanent index of pypi.org simply with:

```
$ twine upload dist/*
```

Optionally, you can also sign the files in your package with GPG:

```
$ twine upload dist/* --sign
```

# tl;dr / quick fix

• Don't decode/encode willy nilly
• Don't assume your strings are UTF-8 encoded
• Try to convert strings to Unicode strings as soon as possible in your code
• Fix your locale: How to solve UnicodeDecodeError in Python 3.6?
• Don't be tempted to use quick `reload` hacks

# Unicode Zen in Python 2.x - The Long Version

Without seeing the source it's difficult to know the root cause, so I'll have to speak generally.

`UnicodeDecodeError: 'ascii' codec can't decode byte` generally happens when you try to convert a Python 2.x `str` that contains non-ASCII to a Unicode string without specifying the encoding of the original string.

In brief, Unicode strings are an entirely separate type of Python string that does not contain any encoding. They only hold Unicode code points and therefore can hold any code point from across the entire spectrum. Strings contain encoded text, be it UTF-8, UTF-16, ISO-8859-1, GBK, Big5, etc. Strings are decoded to Unicode and Unicodes are encoded to strings. Files and text data are always transferred in encoded strings.
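
In Python 3 terms (where `str` plays the old `unicode` role and `bytes` the old `str` role), the decode/encode relationship above looks like:

```python
text = "café"                     # a Unicode string: code points, no encoding
encoded = text.encode("utf-8")    # encode: Unicode -> bytes
assert encoded == b"caf\xc3\xa9"  # é takes two bytes in UTF-8
decoded = encoded.decode("utf-8") # decode: bytes -> Unicode
assert decoded == text
```
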

The Markdown module authors probably use `unicode()` (where the exception is thrown) as a quality gate to the rest of the code - it will convert ASCII or re-wrap existing Unicode strings into a new Unicode string. The Markdown authors can't know the encoding of the incoming string, so they rely on you to decode strings to Unicode strings before passing them to Markdown.

Unicode strings can be declared in your code using the `u` prefix to strings. E.g.

```
>>> my_u = u"my ünicôdé strįng"
>>> type(my_u)
<type 'unicode'>
```

Unicode strings may also come from files, databases and network modules. When this happens, you don't need to worry about the encoding.

# Gotchas

Conversion from `str` to Unicode can happen even when you don't explicitly call `unicode()`.

The following scenarios cause `UnicodeDecodeError` exceptions:

```
# Explicit conversion without encoding
unicode("€")

# New style format string into Unicode string
# Python will try to convert value string to Unicode first
u"The currency is: {}".format("€")

# Old style format string into Unicode string
# Python will try to convert value string to Unicode first
u"The currency is: %s" % "€"

# Append string to Unicode
# Python will try to convert string to Unicode first
u"The currency is: " + "€"
```

## Examples

In the following diagram, you can see how the word `café` has been encoded in either "UTF-8" or "Cp1252" encoding depending on the terminal type. In both examples, `caf` is just regular ASCII. In UTF-8, `é` is encoded using two bytes. In "Cp1252", `é` is 0xE9 (which also happens to be the Unicode code point value; it's no coincidence). The correct `decode()` is invoked and conversion to a Python Unicode is successful:

In this diagram, `decode()` is called with `ascii` (which is the same as calling `unicode()` without an encoding given). As ASCII can't contain bytes greater than `0x7F`, this will throw a `UnicodeDecodeError` exception:
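
In Python 3 syntax (where `bytes.decode` corresponds to the Python 2 `unicode(...)` call), the failing path can be reproduced like this:

```python
utf8_bytes = "café".encode("utf-8")  # b'caf\xc3\xa9'

try:
    utf8_bytes.decode("ascii")       # like unicode(s) with no encoding in Py2
except UnicodeDecodeError as exc:
    print(exc)                       # 'ascii' codec can't decode byte 0xc3 ...

# Decoding with the real encoding succeeds:
assert utf8_bytes.decode("utf-8") == "café"
```
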

# The Unicode Sandwich

It's good practice to form a Unicode sandwich in your code, where you decode all incoming data to Unicode strings, work with Unicodes, then encode to `str`s on the way out. This saves you from worrying about the encoding of strings in the middle of your code.

## Input / Decode

### Source code

If you need to bake non-ASCII into your source code, just create Unicode strings by prefixing the string with a `u`. E.g.

```
u"Zürich"
```

To allow Python to decode your source code, you will need to add an encoding header to match the actual encoding of your file. For example, if your file was encoded as "UTF-8", you would use:

```
# encoding: utf-8
```

This is only necessary when you have non-ASCII in your source code.

### Files

Usually non-ASCII data is received from a file. The `io` module provides a `TextIOWrapper` that decodes your file on the fly, using a given `encoding`. You must use the correct encoding for the file - it can't be easily guessed. For example, for a UTF-8 file:

```
import io
with io.open("my_utf8_file.txt", "r", encoding="utf-8") as my_file:
    my_unicode_string = my_file.read()
```

`my_unicode_string` would then be suitable for passing to Markdown. If you get a `UnicodeDecodeError` from the `read()` line, then you've probably used the wrong encoding value.

### CSV Files

The Python 2.7 CSV module does not support non-ASCII characters 😩. Help is at hand, however, with https://pypi.python.org/pypi/backports.csv.

Use it like above but pass the opened file to it:

```
from backports import csv
import io

def read_rows():
    with io.open("my_utf8_file.txt", "r", encoding="utf-8") as my_file:
        for row in csv.reader(my_file):
            yield row
```

### Databases

Most Python database drivers can return data in Unicode, but usually require a little configuration. Always use Unicode strings for SQL queries.

MySQL

```
charset="utf8",
use_unicode=True
```

E.g.

```
>>> db = MySQLdb.connect(host="localhost", user="root", passwd="passwd", db="sandbox", use_unicode=True, charset="utf8")
```

PostgreSQL

```
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
```

### HTTP

Web pages can be encoded in just about any encoding. The `Content-Type` header should contain a `charset` field to hint at the encoding. The content can then be decoded manually against this value. Alternatively, Python Requests returns Unicode in `response.text`.
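
A hedged sketch of the manual route (the header and body here are made-up values; real code would take them from the HTTP response):

```python
# Hypothetical response pieces:
content_type = "text/html; charset=ISO-8859-1"
body = b"caf\xe9"  # "café" encoded as ISO-8859-1

charset = "utf-8"  # a common fallback when no charset is advertised
if "charset=" in content_type:
    charset = content_type.split("charset=")[-1].strip()

text = body.decode(charset)
print(text)  # café
```
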

### Manually

If you must decode strings manually, you can simply do `my_string.decode(encoding)`, where `encoding` is the appropriate encoding. Python 2.x supported codecs are given here: Standard Encodings. Again, if you get `UnicodeDecodeError` then you've probably got the wrong encoding.

## The meat of the sandwich

Work with Unicodes as you would normal strs.

## Output

### stdout / printing

`print` writes through the stdout stream. Python tries to configure an encoder on stdout so that Unicodes are encoded to the console's encoding. For example, if a Linux shell's `locale` is `en_GB.UTF-8`, the output will be encoded to `UTF-8`. On Windows, you will be limited to an 8-bit code page.

An incorrectly configured console, such as a corrupt locale, can lead to unexpected print errors. The `PYTHONIOENCODING` environment variable can force the encoding for stdout.

### Files

Just like input, `io.open` can be used to transparently convert Unicodes to encoded byte strings.

### Database

The same configuration for reading will allow Unicodes to be written directly.

# Python 3

Python 3 is no more Unicode capable than Python 2.x is; however, it is slightly less confused on the topic. E.g., the regular `str` is now a Unicode string and the old `str` is now `bytes`.

The default encoding is UTF-8, so if you `.decode()` a byte string without giving an encoding, Python 3 uses UTF-8 encoding. This probably fixes 50% of people's Unicode problems.
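
A two-line confirmation of that default:

```python
raw = b"caf\xc3\xa9"            # UTF-8 bytes
assert raw.decode() == "café"   # no argument: Python 3 assumes UTF-8
assert "café".encode() == raw   # encode() defaults to UTF-8 as well
```
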

Further, `open()` operates in text mode by default, so returns decoded `str` (Unicode ones). The encoding is derived from your locale, which tends to be UTF-8 on Un*x systems or an 8-bit code page, such as windows-1251, on Windows boxes.

# Why you shouldn't use `sys.setdefaultencoding("utf8")`

It's a nasty hack (there's a reason you have to use `reload`) that will only mask problems and hinder your migration to Python 3.x. Understand the problem, fix the root cause and enjoy Unicode zen. See Why should we NOT use sys.setdefaultencoding("utf-8") in a py script? for further details.