Python | os.getcwdb() method


All functions in the os module raise OSError when given invalid or inaccessible file names and paths, or other arguments that have the correct type but are not accepted by the operating system.

os.getcwdb() in Python is the bytes version of the os.getcwd() method. This method returns the current working directory (CWD) as a bytestring.

Syntax: os.getcwdb()

Parameter: No parameter is required.

Return Type: This method returns a bytestring which represents the current working directory.

Code: using the os.getcwdb() method to get the current working directory

# Python program to explain the os.getcwdb() method

# import of the os module
import os

# Get the current working
# directory (CWD)
cwd = os.getcwdb()

# Print the current working
# directory (CWD)
print("Current working directory:", cwd)

Output:

 Current working directory: b'/home/ihritik' 
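Since the return value is a bytestring, it can be converted back to a regular string. A minimal sketch using os.fsdecode(), which applies the filesystem encoding; on typical systems this round-trips to the same value os.getcwd() returns:

import os

# os.getcwdb() returns bytes; os.fsdecode() converts them back to str
# using the filesystem encoding, matching what os.getcwd() returns.
cwd_bytes = os.getcwdb()
print(os.fsdecode(cwd_bytes) == os.getcwd())  # True on typical systems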




Python | os.getcwdb() method: StackOverflow Questions

Answer #1

os.listdir() - list in the current directory

With listdir in the os module, you get the files and folders in the current directory.

 import os
 arr = os.listdir()
 print(arr)
 
 >>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]

Looking in a directory

arr = os.listdir("c:\files")

glob from glob

With glob you can specify a type of file to list, like this:

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)

glob in a list comprehension

mylist = [f for f in glob.glob("*.txt")]

get the full path of only files in the current directory

import os

cwd = os.getcwd()
onlyfiles = [os.path.join(cwd, f) for f in os.listdir(cwd) if 
os.path.isfile(os.path.join(cwd, f))]
print(onlyfiles) 

["G:\getfilesname\getfilesname.py", "G:\getfilesname\example.txt"]

Getting the full path name with os.path.abspath

You get the full path in return

 import os
 files_path = [os.path.abspath(x) for x in os.listdir()]
 print(files_path)
 
 ["F:\documentiapplications.txt", "F:\documenticollections.txt"]

Walk: going through sub directories

os.walk returns the root, the list of directories, and the list of files; that is why I unpacked them as r, d, f in the for loop. It then looks for other files and directories in the subfolders of the root, and so on, until there are no more subfolders.

import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            print(os.path.join(r, file))

os.listdir(): get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to pass "." or os.getcwd() as the argument to the os.listdir method.

 import os
 arr = os.listdir(".")
 print(arr)
 
 >>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]

To go up in the directory tree

# Method 1: parent directory
x = os.listdir("..")

# Method 2: filesystem root
x = os.listdir("/")

Get files: os.listdir() in a particular directory (Python 2 and 3)

 import os
 arr = os.listdir("F:\python")
 print(arr)
 
 >>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]

Get files of a particular subdirectory with os.listdir()

import os

x = os.listdir("./content")

os.walk(".") - current directory

 import os
 arr = next(os.walk("."))[2]
 print(arr)
 
 >>> ["5bs_Turismo1.pdf", "5bs_Turismo1.pptx", "esperienza.txt"]

next(os.walk(".")) and os.path.join("dir", "file")

 import os
 arr = []
 root, dirs, files = next(os.walk(r"F:\_python"))
 for file in files:
     arr.append(os.path.join(root, file))

 for f in arr:
     print(f)

>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt

next(os.walk("F:\") - get the full path - list comprehension

 r, d, f = next(os.walk(r"F:\_python"))
 [os.path.join(r, file) for file in f]
 
 >>> ["F:\_python\dict_class.py", "F:\_python\programmi.txt"]

os.walk - get full path - all files in sub dirs

x = [os.path.join(r, file) for r, d, f in os.walk(r"F:\_python") for file in f]
print(x)

>>> ["F:\_python\dict.py", "F:\_python\progr.txt", "F:\_python\readl.py"]

os.listdir() - get only txt files

 arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
 print(arr_txt)
 
 >>> ["work.txt", "3ebooks.txt"]

Using glob to get the full path of the files

If I should need the absolute path of the files:

import os
from glob import glob

# os.path.abspath from the standard library is enough here
x = [os.path.abspath(f) for f in glob(r"F:\*.txt")]
for f in x:
    print(f)

>>> F:\acquistionline.txt
>>> F:\acquisti_2018.txt
>>> F:\bootstrap_jquery_ecc.txt

Using os.path.isfile to avoid directories in the list

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ["a simple game.py", "data.txt", "decorator.py"]

Using pathlib from Python 3.4

import pathlib

flist = []
for p in pathlib.Path(".").iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

 >>> error.PNG
 >>> exemaker.bat
 >>> guiprova.mp3
 >>> setup.py
 >>> speak_gui2.py
 >>> thumb.PNG

With list comprehension:

flist = [p for p in pathlib.Path(".").iterdir() if p.is_file()]

Alternatively, use pathlib.Path() instead of pathlib.Path(".")

Use glob method in pathlib.Path()

import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py

Get all and only files with os.walk

import os
x = [i[2] for i in os.walk(".")]
y=[]
for t in x:
    for f in t:
        y.append(f)
print(y)

>>> ["append_to_list.py", "data.txt", "data1.txt", "data2.txt", "data_180617", "os_walk.py", "READ2.py", "read_data.py", "somma_defaltdic.py", "substitute_words.py", "sum_data.py", "data.txt", "data1.txt", "data_180617"]

Get only files with next and walk in a directory

 import os
 x = next(os.walk("F:/python"))[2]
 print(x)
 
 >>> ["calculator.bat","calculator.py"]

Get only directories with next and walk in a directory

 import os
 next(os.walk("F://python"))[1] # for the current dir use (".")
 
 >>> ["python3","others"]

Get all the subdir names with walk

for r,d,f in os.walk(r"F:\_python"):
    for dirs in d:
        print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints

os.scandir() from Python 3.5 and greater

import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ["calculator.bat","calculator.py"]

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG

Examples:

Ex. 1: How many files are there in the subdirectories?

In this example, we count the number of files contained in a directory and all of its subdirectories.

import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + " files"

print(count("F:\python"))

>>> "F:\python" : 12057 files"

Ex.2: How to copy all files from a directory to another?

A script to bring order to your computer by finding all files of a given type (default: pptx) and copying them into a new folder.

import os
import shutil

destination = r"F:\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype="pptx", counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = os.path.join(pack[0], f)
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print("-" * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")

for dir in os.listdir():
    # search only folders whose names start with `_`
    if dir[0] == "_":
        # copyfile(dir, filetype="pdf")
        copyfile(dir, filetype="txt")


>>> _compiti18\Compito Contabilità 1\conti.txt
>>> _compiti18\Compito Contabilità 1\modula4.txt
>>> _compiti18\Compito Contabilità 1\moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files

Ex. 3: How to get all the file names into a txt file

In case you want to create a txt file with all the file names:

import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)

Example: txt with all the files of a hard drive

"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding="utf-8") as testo:
    for root, dirs, files in os.walk("D:\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\" + file)
            testo.write(file + "
")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "
")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "
")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")

All the files of C: in one text file

This is a shorter version of the previous code. Change the folder where the search starts if you need to begin from another position. On my computer, this code generates a text file of about 50 MB with a little under 500,000 lines, each holding a file name with its complete path.

import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\"):
        for file in f:
            filewrite.write(f"{r + file}
")

How to write a file with all the paths of a given file type in a folder

With this function you can create a txt file, named after the type of file that you look for (e.g. ttffile.txt), containing the full paths of all the files of that type. It can be useful sometimes, I think.

import os

def searchfiles(extension=".ttf", folder="H:\\"):
    "Create a txt file with all the files of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(f"{os.path.join(r, file)}\n")

# looking for .png files on the hard disk H:
searchfiles(".png", "H:\\")

>>> H:\4bs_18\Dolphins5.png
>>> H:\4bs_18\Dolphins6.png
>>> H:\4bs_18\Dolphins7.png
>>> H:\5_18\marketing html\assets\images\logo2.png
>>> H:\7z\001.png
>>> H:\7z\002.png

(New) Find all files and open them with tkinter GUI

I just wanted to add, in this 2019, a little app to search for all files in a dir and be able to open them by double-clicking on the name of the file in the list.

import tkinter as tk
import os

def searchfiles(extension=".txt", folder="H:\\"):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, os.path.join(r, file))

def open_file():
    os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda: searchfiles(".png", "H:\\"))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()

Answer #2

To somewhat expand on the earlier answers here, there are a number of details which are commonly overlooked.

  • Prefer subprocess.run() over subprocess.check_call() and friends over subprocess.call() over subprocess.Popen() over os.system() over os.popen()
  • Understand and probably use text=True, aka universal_newlines=True.
  • Understand the meaning of shell=True or shell=False and how it changes quoting and the availability of shell conveniences.
  • Understand differences between sh and Bash
  • Understand how a subprocess is separate from its parent, and generally cannot change the parent.
  • Avoid running the Python interpreter as a subprocess of Python.

These topics are covered in some more detail below.

Prefer subprocess.run() or subprocess.check_call()

The subprocess.Popen() function is a low-level workhorse but it is tricky to use correctly and you end up copy/pasting multiple lines of code ... which conveniently already exist in the standard library as a set of higher-level wrapper functions for various purposes, which are presented in more detail in the following.

Here"s a paragraph from the documentation:

The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.

Unfortunately, the availability of these wrapper functions differs between Python versions.

  • subprocess.run() was officially introduced in Python 3.5. It is meant to replace all of the following.
  • subprocess.check_output() was introduced in Python 2.7 / 3.1. It is basically equivalent to subprocess.run(..., check=True, stdout=subprocess.PIPE).stdout
  • subprocess.check_call() was introduced in Python 2.5. It is basically equivalent to subprocess.run(..., check=True)
  • subprocess.call() was introduced in Python 2.4 in the original subprocess module (PEP-324). It is basically equivalent to subprocess.run(...).returncode
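As a rough sketch of those equivalences (assuming a Unix echo command and Python 3.7+ for text=True); the behavior matches for simple cases, since both raise CalledProcessError on a nonzero exit status:

import subprocess

# Older helper vs. its subprocess.run() equivalent.
out1 = subprocess.check_output(["echo", "hello"], text=True)
out2 = subprocess.run(["echo", "hello"], check=True,
                      stdout=subprocess.PIPE, text=True).stdout
assert out1 == out2 == "hello\n"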

High-level API vs subprocess.Popen()

The refactored and extended subprocess.run() is more logical and more versatile than the older legacy functions it replaces. It returns a CompletedProcess object with various attributes from which you can retrieve the exit status, the standard output, and a few other results and status indicators from the finished subprocess.

subprocess.run() is the way to go if you simply need a program to run and return control to Python. For more involved scenarios (background processes, perhaps with interactive I/O with the Python parent program) you still need to use subprocess.Popen() and take care of all the plumbing yourself. This requires a fairly intricate understanding of all the moving parts and should not be undertaken lightly. The simpler Popen object represents the (possibly still-running) process which needs to be managed from your code for the remainder of the lifetime of the subprocess.

It should perhaps be emphasized that just subprocess.Popen() merely creates a process. If you leave it at that, you have a subprocess running concurrently alongside with Python, so a "background" process. If it doesn't need to do input or output or otherwise coordinate with you, it can do useful work in parallel with your Python program.
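For instance, a minimal background-process sketch (assuming a Unix-like system with a sleep command):

import subprocess

# Start a long-running command without waiting for it; Python continues
# immediately while the child runs in the background.
proc = subprocess.Popen(["sleep", "5"])
print("doing other work while the subprocess runs...")
proc.wait()  # eventually collect the child's exit status
print("child exited with", proc.returncode)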

Avoid os.system() and os.popen()

Since time eternal (well, since Python 2.5) the os module documentation has contained the recommendation to prefer subprocess over os.system():

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.

The problems with system() are that it's obviously system-dependent and doesn't offer ways to interact with the subprocess. It simply runs, with standard output and standard error outside of Python's reach. The only information Python receives back is the exit status of the command (zero means success, though the meaning of non-zero values is also somewhat system-dependent).

PEP-324 (which was already mentioned above) contains a more detailed rationale for why os.system is problematic and how subprocess attempts to solve those issues.

os.popen() used to be even more strongly discouraged:

Deprecated since version 2.6: This function is obsolete. Use the subprocess module.

However, since sometime in Python 3, it has been reimplemented to simply use subprocess, and redirects to the subprocess.Popen() documentation for details.

Understand and usually use check=True

You"ll also notice that subprocess.call() has many of the same limitations as os.system(). In regular use, you should generally check whether the process finished successfully, which subprocess.check_call() and subprocess.check_output() do (where the latter also returns the standard output of the finished subprocess). Similarly, you should usually use check=True with subprocess.run() unless you specifically need to allow the subprocess to return an error status.

In practice, with check=True or subprocess.check_*, Python will throw a CalledProcessError exception if the subprocess returns a nonzero exit status.

A common error with subprocess.run() is to omit check=True and be surprised when downstream code fails if the subprocess failed.

On the other hand, a common problem with check_call() and check_output() was that users who blindly used these functions were surprised when the exception was raised e.g. when grep did not find a match. (You should probably replace grep with native Python code anyway, as outlined below.)

All things counted, you need to understand how shell commands return an exit code, and under what conditions they will return a non-zero (error) exit code, and make a conscious decision how exactly it should be handled.
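As a sketch of that conscious handling (the Unix false command always exits with a nonzero status, so it is a handy stand-in for a failing command):

import subprocess

try:
    # check=True turns the nonzero exit status into an exception
    subprocess.run(["false"], check=True)
except subprocess.CalledProcessError as exc:
    # The exception carries the command and its exit status.
    print("command", exc.cmd, "failed with status", exc.returncode)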

Understand and probably use text=True aka universal_newlines=True

Since Python 3, strings internal to Python are Unicode strings. But there is no guarantee that a subprocess generates Unicode output, or strings at all.

(If the differences are not immediately obvious, Ned Batchelder's Pragmatic Unicode is recommended, if not outright obligatory, reading. There is a 36-minute video presentation behind the link if you prefer, though reading the page yourself will probably take significantly less time.)

Deep down, Python has to fetch a bytes buffer and interpret it somehow. If it contains a blob of binary data, it shouldn't be decoded into a Unicode string, because that's error-prone and bug-inducing behavior - precisely the sort of pesky behavior which riddled many Python 2 scripts, before there was a way to properly distinguish between encoded text and binary data.

With text=True, you tell Python that you, in fact, expect back textual data in the system"s default encoding, and that it should be decoded into a Python (Unicode) string to the best of Python"s ability (usually UTF-8 on any moderately up to date system, except perhaps Windows?)

If that"s not what you request back, Python will just give you bytes strings in the stdout and stderr strings. Maybe at some later point you do know that they were text strings after all, and you know their encoding. Then, you can decode them.

normal = subprocess.run([external, arg],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    check=True,
    text=True)
print(normal.stdout)

convoluted = subprocess.run([external, arg],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    check=True)
# You have to know (or guess) the encoding
print(convoluted.stdout.decode("utf-8"))

Python 3.7 introduced the shorter and more descriptive and understandable alias text for the keyword argument which was previously somewhat misleadingly called universal_newlines.

Understand shell=True vs shell=False

With shell=True you pass a single string to your shell, and the shell takes it from there.

With shell=False you pass a list of arguments to the OS, bypassing the shell.

When you don"t have a shell, you save a process and get rid of a fairly substantial amount of hidden complexity, which may or may not harbor bugs or even security problems.

On the other hand, when you don't have a shell, you don't have redirection, wildcard expansion, job control, and a large number of other shell features.

A common mistake is to use shell=True and then still pass Python a list of tokens, or vice versa. This happens to work in some cases, but is really ill-defined and could break in interesting ways.

# XXX AVOID THIS BUG
buggy = subprocess.run("dig +short stackoverflow.com")

# XXX AVOID THIS BUG TOO
broken = subprocess.run(["dig", "+short", "stackoverflow.com"],
    shell=True)

# XXX DEFINITELY AVOID THIS
pathological = subprocess.run(["dig +short stackoverflow.com"],
    shell=True)

correct = subprocess.run(["dig", "+short", "stackoverflow.com"],
    # Probably don"t forget these, too
    check=True, text=True)

# XXX Probably better avoid shell=True
# but this is nominally correct
fixed_but_fugly = subprocess.run("dig +short stackoverflow.com",
    shell=True,
    # Probably don"t forget these, too
    check=True, text=True)

The common retort "but it works for me" is not a useful rebuttal unless you understand exactly under what circumstances it could stop working.

Refactoring Example

Very often, the features of the shell can be replaced with native Python code. Simple Awk or sed scripts should probably simply be translated to Python instead.

To partially illustrate this, here is a typical but slightly silly example which involves many shell features.

cmd = """while read -r x;
   do ping -c 3 "$x" | grep "round-trip min/avg/max"
   done <hosts.txt"""

# Trivial but horrible
results = subprocess.run(
    cmd, shell=True, universal_newlines=True, check=True,
    stdout=subprocess.PIPE)
print(results.stdout)

# Reimplement with shell=False
with open("hosts.txt") as hosts:
    for host in hosts:
        host = host.rstrip("
")  # drop newline
        ping = subprocess.run(
             ["ping", "-c", "3", host],
             text=True,
             stdout=subprocess.PIPE,
             check=True)
        for line in ping.stdout.split("
"):
             if "round-trip min/avg/max" in line:
                 print("{}: {}".format(host, line))

Some things to note here:

  • With shell=False you don't need the quoting that the shell requires around strings. Putting quotes anyway is probably an error.
  • It often makes sense to run as little code as possible in a subprocess. This gives you more control over execution from within your Python code.
  • Having said that, complex shell pipelines are tedious and sometimes challenging to reimplement in Python.

The refactored code also illustrates just how much the shell really does for you with a very terse syntax -- for better or for worse. Python says explicit is better than implicit but the Python code is rather verbose and arguably looks more complex than it really is. On the other hand, it offers a number of points where you can grab control in the middle of something else, as trivially exemplified by the enhancement that we can easily include the host name along with the shell command output. (This is by no means challenging to do in the shell, either, but at the expense of yet another diversion and perhaps another process.)

Common Shell Constructs

For completeness, here are brief explanations of some of these shell features, and some notes on how they can perhaps be replaced with native Python facilities.

  • Globbing aka wildcard expansion can be replaced with glob.glob() or very often with simple Python string comparisons like for file in os.listdir("."): if not file.endswith(".png"): continue. Bash has various other expansion facilities like .{png,jpg} brace expansion and {1..100} as well as tilde expansion (~ expands to your home directory, and more generally ~account to the home directory of another user)
  • Shell variables like $SHELL or $my_exported_var can sometimes simply be replaced with Python variables. Exported shell variables are available as e.g. os.environ["SHELL"] (the meaning of export is to make the variable available to subprocesses -- a variable which is not available to subprocesses will obviously not be available to Python running as a subprocess of the shell, or vice versa. The env= keyword argument to subprocess methods allows you to define the environment of the subprocess as a dictionary, so that's one way to make a Python variable visible to a subprocess). With shell=False you will need to understand how to remove any quotes; for example, cd "$HOME" is equivalent to os.chdir(os.environ["HOME"]) without quotes around the directory name. (Very often cd is not useful or necessary anyway, and many beginners omit the double quotes around the variable and get away with it until one day ...)
  • Redirection allows you to read from a file as your standard input, and write your standard output to a file. grep "foo" <inputfile >outputfile opens outputfile for writing and inputfile for reading, and passes its contents as standard input to grep, whose standard output then lands in outputfile. This is not generally hard to replace with native Python code.
  • Pipelines are a form of redirection. echo foo | nl runs two subprocesses, where the standard output of echo is the standard input of nl (on the OS level, in Unix-like systems, this is a single file handle). If you cannot replace one or both ends of the pipeline with native Python code, perhaps think about using a shell after all, especially if the pipeline has more than two or three processes (though look at the pipes module in the Python standard library or a number of more modern and versatile third-party competitors). A minimal plumbing sketch follows this list.
  • Job control lets you interrupt jobs, run them in the background, return them to the foreground, etc. The basic Unix signals to stop and continue a process are of course available from Python, too. But jobs are a higher-level abstraction in the shell which involve process groups etc which you have to understand if you want to do something like this from Python.
  • Quoting in the shell is potentially confusing until you understand that everything is basically a string. So ls -l / is equivalent to "ls" "-l" "/" but the quoting around literals is completely optional. Unquoted strings which contain shell metacharacters undergo parameter expansion, whitespace tokenization and wildcard expansion; double quotes prevent whitespace tokenization and wildcard expansion but allow parameter expansions (variable substitution, command substitution, and backslash processing). This is simple in theory but can get bewildering, especially when there are several layers of interpretation (a remote shell command, for example).
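Here is the pipeline sketch promised above; it replicates echo foo | nl with two Popen objects (assuming the Unix echo and nl commands):

import subprocess

# Plumbing for `echo foo | nl` without a shell:
# echo's standard output feeds nl's standard input.
echo = subprocess.Popen(["echo", "foo"], stdout=subprocess.PIPE)
nl = subprocess.Popen(["nl"], stdin=echo.stdout,
                      stdout=subprocess.PIPE, text=True)
echo.stdout.close()  # allow echo to receive SIGPIPE if nl exits first
output, _ = nl.communicate()
print(output)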

Understand differences between sh and Bash

subprocess runs your shell commands with /bin/sh unless you specifically request otherwise (except of course on Windows, where it uses the value of the COMSPEC variable). This means that various Bash-only features like arrays, [[ etc are not available.

If you need to use Bash-only syntax, you can pass in the path to the shell as executable="/bin/bash" (where of course if your Bash is installed somewhere else, you need to adjust the path).

subprocess.run("""
    # This for loop syntax is Bash only
    for((i=1;i<=$#;i++)); do
        # Arrays are Bash-only
        array[i]+=123
    done""",
    shell=True, check=True,
    executable="/bin/bash")

A subprocess is separate from its parent, and cannot change it

A somewhat common mistake is doing something like

subprocess.run("cd /tmp", shell=True)
subprocess.run("pwd", shell=True)  # Oops, doesn"t print /tmp

The same thing will happen if the first subprocess tries to set an environment variable, which of course will have disappeared when you run another subprocess, etc.

A child process runs completely separate from Python, and when it finishes, Python has no idea what it did (apart from the vague indicators that it can infer from the exit status and output from the child process). A child generally cannot change the parent's environment; it cannot set a variable, change the working directory, or, in so many words, communicate with its parent without cooperation from the parent.

The immediate fix in this particular case is to run both commands in a single subprocess;

subprocess.run("cd /tmp; pwd", shell=True)

though obviously this particular use case isn't very useful; instead, use the cwd keyword argument, or simply os.chdir() before running the subprocess. Similarly, for setting a variable, you can manipulate the environment of the current process (and thus also its children) via

os.environ["foo"] = "bar"

or pass an environment setting to a child process with

subprocess.run("echo "$foo"", shell=True, env={"foo": "bar"})

(not to mention the obvious refactoring subprocess.run(["echo", "bar"]); but echo is a poor example of something to run in a subprocess in the first place, of course).

Don"t run Python from Python

This is slightly dubious advice; there are certainly situations where it does make sense or is even an absolute requirement to run the Python interpreter as a subprocess from a Python script. But very frequently, the correct approach is simply to import the other Python module into your calling script and call its functions directly.

If the other Python script is under your control, and it isn't a module, consider turning it into one. (This answer is too long already so I will not delve into details here.)

If you need parallelism, you can run Python functions in subprocesses with the multiprocessing module. There is also threading which runs multiple tasks in a single process (which is more lightweight and gives you more control, but also more constrained in that threads within a process are tightly coupled, and bound to a single GIL.)
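A minimal multiprocessing sketch of that parallelism (the square function here is just an illustration):

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    # Run a pure Python function in worker processes instead of
    # shelling out to a separate Python interpreter.
    with Pool(processes=4) as pool:
        print(pool.map(square, range(10)))  # [0, 1, 4, ...]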

Answer #3

@SpeedCoder5 "s comment deserves to be an answer;

Specifically, you can specify a dynamic working directory (i.e., whichever directory the currently-open Python file is located in) using "cwd": "${fileDirname}".

If you"re using the Python: Current File (Integrated Terminal) option when you run Python, your launch.json file might look like mine, below.

{
    "version": "0.2.0",
    "configurations": [
    {
            "name": "Python: Current File (Integrated Terminal)",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "cwd": "${fileDirname}"
    },

    //... other settings, but I modified the "Current File" setting above ...
    ]
}

Remember the launch.json file controls the run/debug settings of your Visual Studio Code project; my launch.json file was auto-generated by VS Code, in the directory of my current "Open Project". I just edited the file manually to add "cwd": "${fileDirname}" as shown above.

Remember the launch.json file may be specific to your project, or specific to your directory, so confirm you're editing the correct launch.json (see comment)

If you don"t have a launch.json file, try this:

To create a launch.json file, open your project folder in VS Code (File > Open Folder) and then select the Configure gear icon on the Debug view top bar.

Answer #4

tl;dr

Call the is_path_exists_or_creatable() function defined below.

Strictly Python 3. That's just how we roll.

A Tale of Two Questions

The question of "How do I test pathname validity and, for valid pathnames, the existence or writability of those paths?" is clearly two separate questions. Both are interesting, and neither has received a genuinely satisfactory answer here... or, well, anywhere that I could grep.

vikki"s answer probably hews the closest, but has the remarkable disadvantages of:

  • Needlessly opening (...and then failing to reliably close) file handles.
  • Needlessly writing (...and then failing to reliably close or delete) 0-byte files.
  • Ignoring OS-specific errors differentiating between non-ignorable invalid pathnames and ignorable filesystem issues. Unsurprisingly, this is critical under Windows. (See below.)
  • Ignoring race conditions resulting from external processes concurrently (re)moving parent directories of the pathname to be tested. (See below.)
  • Ignoring connection timeouts resulting from this pathname residing on stale, slow, or otherwise temporarily inaccessible filesystems. This could expose public-facing services to potential DoS-driven attacks. (See below.)

We"re gonna fix all that.

Question #0: What"s Pathname Validity Again?

Before hurling our fragile meat suits into the python-riddled moshpits of pain, we should probably define what we mean by "pathname validity." What defines validity, exactly?

By "pathname validity," we mean the syntactic correctness of a pathname with respect to the root filesystem of the current system – regardless of whether that path or parent directories thereof physically exist. A pathname is syntactically correct under this definition if it complies with all syntactic requirements of the root filesystem.

By "root filesystem," we mean:

  • On POSIX-compatible systems, the filesystem mounted to the root directory (/).
  • On Windows, the filesystem mounted to %HOMEDRIVE%, the colon-suffixed drive letter containing the current Windows installation (typically but not necessarily C:).

The meaning of "syntactic correctness," in turn, depends on the type of root filesystem. For ext4 (and most but not all POSIX-compatible) filesystems, a pathname is syntactically correct if and only if that pathname:

  • Contains no null bytes (i.e., \x00 in Python). This is a hard requirement for all POSIX-compatible filesystems.
  • Contains no path components longer than 255 bytes (e.g., "a"*256 in Python). A path component is a longest substring of a pathname containing no / character (e.g., bergtatt, ind, i, and fjeldkamrene in the pathname /bergtatt/ind/i/fjeldkamrene).

Syntactic correctness. Root filesystem. That"s it.

Question #1: How Now Shall We Do Pathname Validity?

Validating pathnames in Python is surprisingly non-intuitive. I'm in firm agreement with Fake Name here: the official os.path package should provide an out-of-the-box solution for this. For unknown (and probably uncompelling) reasons, it doesn't. Fortunately, rolling your own ad-hoc solution isn't that gut-wrenching...

O.K., it actually is. It's hairy; it's nasty; it probably chortles as it burbles and giggles as it glows. But what you gonna do? Nuthin'.

We"ll soon descend into the radioactive abyss of low-level code. But first, let"s talk high-level shop. The standard os.stat() and os.lstat() functions raise the following exceptions when passed invalid pathnames:

  • For pathnames residing in non-existing directories, instances of FileNotFoundError.
  • For pathnames residing in existing directories:
    • Under Windows, instances of WindowsError whose winerror attribute is 123 (i.e., ERROR_INVALID_NAME).
    • Under all other OSes:
    • For pathnames containing null bytes (i.e., "\x00"), instances of TypeError.
    • For pathnames containing path components longer than 255 bytes, instances of OSError whose errno attribute is:
      • Under SunOS and the *BSD family of OSes, errno.ERANGE. (This appears to be an OS-level bug, otherwise referred to as "selective interpretation" of the POSIX standard.)
      • Under all other OSes, errno.ENAMETOOLONG.

Crucially, this implies that only pathnames residing in existing directories are validatable. The os.stat() and os.lstat() functions raise generic FileNotFoundError exceptions when passed pathnames residing in non-existing directories, regardless of whether those pathnames are invalid or not. Directory existence takes precedence over pathname invalidity.

Does this mean that pathnames residing in non-existing directories are not validatable? Yes – unless we modify those pathnames to reside in existing directories. Is that even safely feasible, however? Shouldn't modifying a pathname prevent us from validating the original pathname?

To answer this question, recall from above that syntactically correct pathnames on the ext4 filesystem contain no path components (A) containing null bytes or (B) over 255 bytes in length. Hence, an ext4 pathname is valid if and only if all path components in that pathname are valid. This is true of most real-world filesystems of interest.

Does that pedantic insight actually help us? Yes. It reduces the larger problem of validating the full pathname in one fell swoop to the smaller problem of only validating all path components in that pathname. Any arbitrary pathname is validatable (regardless of whether that pathname resides in an existing directory or not) in a cross-platform manner by following the following algorithm:

  1. Split that pathname into path components (e.g., the pathname /troldskog/faren/vild into the list ["", "troldskog", "faren", "vild"]).
  2. For each such component:
    1. Join the pathname of a directory guaranteed to exist with that component into a new temporary pathname (e.g., /troldskog).
    2. Pass that pathname to os.stat() or os.lstat(). If that pathname and hence that component is invalid, this call is guaranteed to raise an exception exposing the type of invalidity rather than a generic FileNotFoundError exception. Why? Because that pathname resides in an existing directory. (Circular logic is circular.)

Is there a directory guaranteed to exist? Yes, but typically only one: the topmost directory of the root filesystem (as defined above).

Passing pathnames residing in any other directory (and hence not guaranteed to exist) to os.stat() or os.lstat() invites race conditions, even if that directory was previously tested to exist. Why? Because external processes cannot be prevented from concurrently removing that directory after that test has been performed but before that pathname is passed to os.stat() or os.lstat(). Unleash the dogs of mind-fellating insanity!

There exists a substantial side benefit to the above approach as well: security. (Isn't that nice?) Specifically:

Front-facing applications validating arbitrary pathnames from untrusted sources by simply passing such pathnames to os.stat() or os.lstat() are susceptible to Denial of Service (DoS) attacks and other black-hat shenanigans. Malicious users may attempt to repeatedly validate pathnames residing on filesystems known to be stale or otherwise slow (e.g., NFS Samba shares); in that case, blindly statting incoming pathnames is liable to either eventually fail with connection timeouts or consume more time and resources than your feeble capacity to withstand unemployment.

The above approach obviates this by only validating the path components of a pathname against the root directory of the root filesystem. (If even that's stale, slow, or inaccessible, you've got larger problems than pathname validation.)

Lost? Great. Let's begin. (Python 3 assumed. See "What Is Fragile Hope for 300, leycec?")

import errno, os, sys

# Sadly, Python fails to provide the following magic number for us.
ERROR_INVALID_NAME = 123
"""
Windows-specific error code indicating an invalid pathname.

See Also
----------
https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
    Official listing of all such codes.
"""

def is_pathname_valid(pathname: str) -> bool:
    """
    `True` if the passed pathname is a valid pathname for the current OS;
    `False` otherwise.
    """
    # If this pathname is either not a string or is but is empty, this pathname
    # is invalid.
    try:
        if not isinstance(pathname, str) or not pathname:
            return False

        # Strip this pathname"s Windows-specific drive specifier (e.g., `C:`)
        # if any. Since Windows prohibits path components from containing `:`
        # characters, failing to strip this `:`-suffixed prefix would
        # erroneously invalidate all valid absolute Windows pathnames.
        _, pathname = os.path.splitdrive(pathname)

        # Directory guaranteed to exist. If the current OS is Windows, this is
        # the drive to which Windows was installed (e.g., the "%HOMEDRIVE%"
        # environment variable); else, the typical root directory.
        root_dirname = os.environ.get("HOMEDRIVE", "C:") \
            if sys.platform == "win32" else os.path.sep
        assert os.path.isdir(root_dirname)   # ...Murphy and her ironclad Law

        # Append a path separator to this directory if needed.
        root_dirname = root_dirname.rstrip(os.path.sep) + os.path.sep

        # Test whether each path component split from this pathname is valid or
        # not, ignoring non-existent and non-readable path components.
        for pathname_part in pathname.split(os.path.sep):
            try:
                os.lstat(root_dirname + pathname_part)
            # If an OS-specific exception is raised, its error code
            # indicates whether this pathname is valid or not. Unless this
            # is the case, this exception implies an ignorable kernel or
            # filesystem complaint (e.g., path not found or inaccessible).
            #
            # Only the following exceptions indicate invalid pathnames:
            #
            # * Instances of the Windows-specific "WindowsError" class
            #   defining the "winerror" attribute whose value is
            #   "ERROR_INVALID_NAME". Under Windows, "winerror" is more
            #   fine-grained and hence useful than the generic "errno"
            #   attribute. When a too-long pathname is passed, for example,
            #   "errno" is "ENOENT" (i.e., no such file or directory) rather
            #   than "ENAMETOOLONG" (i.e., file name too long).
            # * Instances of the cross-platform "OSError" class defining the
            #   generic "errno" attribute whose value is either:
            #   * Under most POSIX-compatible OSes, "ENAMETOOLONG".
            #   * Under some edge-case OSes (e.g., SunOS, *BSD), "ERANGE".
            except OSError as exc:
                if hasattr(exc, "winerror"):
                    if exc.winerror == ERROR_INVALID_NAME:
                        return False
                elif exc.errno in {errno.ENAMETOOLONG, errno.ERANGE}:
                    return False
    # If a "TypeError" exception was raised, it almost certainly has the
    # error message "embedded NUL character" indicating an invalid pathname.
    except TypeError as exc:
        return False
    # If no exception was raised, all path components and hence this
    # pathname itself are valid. (Praise be to the curmudgeonly python.)
    else:
        return True
    # If any other exception was raised, this is an unrelated fatal issue
    # (e.g., a bug). Permit this exception to unwind the call stack.
    #
    # Did we mention this should be shipped with Python already?

Done. Don"t squint at that code. (It bites.)

Question #2: Possibly Invalid Pathname Existence or Creatability, Eh?

Testing the existence or creatability of possibly invalid pathnames is, given the above solution, mostly trivial. The little key here is to call the previously defined function before testing the passed path:

def is_path_creatable(pathname: str) -> bool:
    """
    `True` if the current user has sufficient permissions to create the passed
    pathname; `False` otherwise.
    """
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()
    return os.access(dirname, os.W_OK)

def is_path_exists_or_creatable(pathname: str) -> bool:
    """
    `True` if the passed pathname is a valid pathname for the current OS _and_
    either currently exists or is hypothetically creatable; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    """
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Done and done. Except not quite.

Question #3: Possibly Invalid Pathname Existence or Writability on Windows

There exists a caveat. Of course there does.

As the official os.access() documentation admits:

Note: I/O operations may fail even when os.access() indicates that they would succeed, particularly for operations on network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model.

To no one"s surprise, Windows is the usual suspect here. Thanks to extensive use of Access Control Lists (ACL) on NTFS filesystems, the simplistic POSIX permission-bit model maps poorly to the underlying Windows reality. While this (arguably) isn"t Python"s fault, it might nonetheless be of concern for Windows-compatible applications.

If this is you, a more robust alternative is wanted. If the passed path does not exist, we instead attempt to create a temporary file guaranteed to be immediately deleted in the parent directory of that path – a more portable (if expensive) test of creatability:

import os, tempfile

def is_path_sibling_creatable(pathname: str) -> bool:
    """
    `True` if the current user has sufficient permissions to create **siblings**
    (i.e., arbitrary files in the parent directory) of the passed pathname;
    `False` otherwise.
    """
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()

    try:
        # For safety, explicitly close and hence delete this temporary file
        # immediately after creating it in the passed path's parent directory.
        with tempfile.TemporaryFile(dir=dirname): pass
        return True
    # While the exact type of exception raised by the above function depends on
    # the current version of the Python interpreter, all such types subclass the
    # following exception superclass.
    except EnvironmentError:
        return False

def is_path_exists_or_creatable_portable(pathname: str) -> bool:
    """
    `True` if the passed pathname is a valid pathname on the current OS _and_
    either currently exists or is hypothetically creatable in a cross-platform
    manner optimized for POSIX-unfriendly filesystems; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    """
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_sibling_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Note, however, that even this may not be enough.

Thanks to User Access Control (UAC), the ever-inimicable Windows Vista and all subsequent iterations thereof blatantly lie about permissions pertaining to system directories. When non-Administrator users attempt to create files in either the canonical C:\Windows or C:\Windows\system32 directories, UAC superficially permits the user to do so while actually isolating all created files into a "Virtual Store" in that user's profile. (Who could have possibly imagined that deceiving users would have harmful long-term consequences?)

This is crazy. This is Windows.

Prove It

Dare we? It's time to test-drive the above tests.

Since NULL is the only character prohibited in pathnames on UNIX-oriented filesystems, let's leverage that to demonstrate the cold, hard truth – ignoring non-ignorable Windows shenanigans, which frankly bore and anger me in equal measure:

>>> print(""foo.bar" valid? " + str(is_pathname_valid("foo.bar")))
"foo.bar" valid? True
>>> print("Null byte valid? " + str(is_pathname_valid("x00")))
Null byte valid? False
>>> print("Long path valid? " + str(is_pathname_valid("a" * 256)))
Long path valid? False
>>> print(""/dev" exists or creatable? " + str(is_path_exists_or_creatable("/dev")))
"/dev" exists or creatable? True
>>> print(""/dev/foo.bar" exists or creatable? " + str(is_path_exists_or_creatable("/dev/foo.bar")))
"/dev/foo.bar" exists or creatable? False
>>> print("Null byte exists or creatable? " + str(is_path_exists_or_creatable("x00")))
Null byte exists or creatable? False

Beyond sanity. Beyond pain. You will find Python portability concerns.

Answer #5

The pathlib module, introduced in Python 3.4 (PEP 428 — The pathlib module — object-oriented filesystem paths), makes the path-related experience much, much better.

$ pwd
/home/skovorodkin/stack
$ tree
.
└── scripts
    ├── 1.py
    └── 2.py

In order to get the current working directory, use Path.cwd():

from pathlib import Path

print(Path.cwd())  # /home/skovorodkin/stack

To get an absolute path to your script file, use the Path.resolve() method:

print(Path(__file__).resolve())  # /home/skovorodkin/stack/scripts/1.py

And to get the path of the directory where your script is located, access .parent (it is recommended to call .resolve() before .parent):

print(Path(__file__).resolve().parent)  # /home/skovorodkin/stack/scripts

Remember that __file__ is not reliable in some situations: How do I get the path of the current executed file in Python?.
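If __file__ is unavailable (e.g., in a frozen executable or an interactive session), one hedged fallback sketch uses sys.argv[0], which has caveats of its own:

import sys
from pathlib import Path

# When run as a script, sys.argv[0] holds the invoked script path;
# in an interactive session it may be empty, so fall back to the CWD.
script = Path(sys.argv[0]).resolve() if sys.argv[0] else Path.cwd()
print(script)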


Please note that Path.cwd(), Path.resolve() and other Path methods return path objects (PosixPath in my case), not strings. In Python 3.4 and 3.5 that caused some pain, because the open built-in function could only work with string or bytes objects, and did not support Path objects, so you had to convert Path objects to strings or use the Path.open() method, but the latter option required you to change old code:

$ cat scripts/2.py
from pathlib import Path

p = Path(__file__).resolve()

with p.open() as f: pass
with open(str(p)) as f: pass
with open(p) as f: pass

print("OK")

$ python3.5 scripts/2.py
Traceback (most recent call last):
  File "scripts/2.py", line 11, in <module>
    with open(p) as f:
TypeError: invalid file: PosixPath("/home/skovorodkin/stack/scripts/2.py")

As you can see open(p) does not work with Python 3.5.

PEP 519 — Adding a file system path protocol, implemented in Python 3.6, adds support of PathLike objects to open function, so now you can pass Path objects to open function directly:

$ python3.6 scripts/2.py
OK

Answer #6

Tensorflow 2 Docs

Saving Checkpoints

Adapted from the docs

# -------------------------
# -----  Toy Context  -----
# -------------------------
import tensorflow as tf


class Net(tf.keras.Model):
    """A simple linear model."""

    def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)

    def call(self, x):
        return self.l1(x)


def toy_dataset():
    inputs = tf.range(10.0)[:, None]
    labels = inputs * 5.0 + tf.range(5.0)[None, :]
    return (
        tf.data.Dataset.from_tensor_slices(dict(x=inputs, y=labels)).repeat().batch(2)
    )


def train_step(net, example, optimizer):
    """Trains `net` on `example` using `optimizer`."""
    with tf.GradientTape() as tape:
        output = net(example["x"])
        loss = tf.reduce_mean(tf.abs(output - example["y"]))
    variables = net.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss


# ----------------------------
# -----  Create Objects  -----
# ----------------------------

net = Net()
opt = tf.keras.optimizers.Adam(0.1)
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)

# ----------------------------
# -----  Train and Save  -----
# ----------------------------

ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")

for _ in range(50):
    example = next(iterator)
    loss = train_step(net, example, opt)
    ckpt.step.assign_add(1)
    if int(ckpt.step) % 10 == 0:
        save_path = manager.save()
        print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
        print("loss {:1.2f}".format(loss.numpy()))


# ---------------------
# -----  Restore  -----
# ---------------------

# In another script, re-initialize objects
opt = tf.keras.optimizers.Adam(0.1)
net = Net()
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)

# Re-use the manager code above ^

ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")

for _ in range(50):
    example = next(iterator)
    # Continue training or evaluate etc.

More links

Checkpoints capture the exact value of all parameters (tf.Variable objects) used by a model. Checkpoints do not contain any description of the computation defined by the model and thus are typically only useful when source code that will use the saved parameter values is available.

The SavedModel format on the other hand includes a serialized description of the computation defined by the model in addition to the parameter values (checkpoint). Models in this format are independent of the source code that created the model. They are thus suitable for deployment via TensorFlow Serving, TensorFlow Lite, TensorFlow.js, or programs in other programming languages (the C, C++, Java, Go, Rust, C# etc. TensorFlow APIs).

(Highlights are my own)
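To contrast with checkpoints, here is a minimal SavedModel sketch for the toy Net above; tf.saved_model.save() and tf.saved_model.load() are the TF2 entry points, and the "./saved_net" path is just an illustration:

import tensorflow as tf

# Build the toy Net from above and call it once so the weights exist.
net = Net()
net(tf.zeros([1, 1]))

# SavedModel bundles the computation with the weights...
tf.saved_model.save(net, "./saved_net")

# ...so it can be reloaded without the original Net class definition.
restored = tf.saved_model.load("./saved_net")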


Tensorflow < 2


From the docs:

Save

# Create some variables.
v1 = tf.get_variable("v1", shape=[3], initializer = tf.zeros_initializer)
v2 = tf.get_variable("v2", shape=[5], initializer = tf.zeros_initializer)

inc_v1 = v1.assign(v1+1)
dec_v2 = v2.assign(v2-1)

# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, and save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  # Do some work with the model.
  inc_v1.op.run()
  dec_v2.op.run()
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print("Model saved in path: %s" % save_path)

Restore

tf.reset_default_graph()

# Create some variables.
v1 = tf.get_variable("v1", shape=[3])
v2 = tf.get_variable("v2", shape=[5])

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Check the values of the variables
  print("v1 : %s" % v1.eval())
  print("v2 : %s" % v2.eval())

simple_save

Many good answers; for completeness I'll add my 2 cents: simple_save. Also a standalone code example using the tf.data.Dataset API.

Python 3; Tensorflow 1.14

import tensorflow as tf
from tensorflow.saved_model import tag_constants

with tf.Graph().as_default():
    with tf.Session() as sess:
        ...

        # Saving
        inputs = {
            "batch_size_placeholder": batch_size_placeholder,
            "features_placeholder": features_placeholder,
            "labels_placeholder": labels_placeholder,
        }
        outputs = {"prediction": model_output}
        tf.saved_model.simple_save(
            sess, "path/to/your/location/", inputs, outputs
        )

Restoring:

graph = tf.Graph()
with graph.as_default():
    with tf.Session() as sess:
        tf.saved_model.loader.load(
            sess,
            [tag_constants.SERVING],
            "path/to/your/location/",
        )
        batch_size_placeholder = graph.get_tensor_by_name("batch_size_placeholder:0")
        features_placeholder = graph.get_tensor_by_name("features_placeholder:0")
        labels_placeholder = graph.get_tensor_by_name("labels_placeholder:0")
        prediction = graph.get_tensor_by_name("dense/BiasAdd:0")

        sess.run(prediction, feed_dict={
            batch_size_placeholder: some_value,
            features_placeholder: some_other_value,
            labels_placeholder: another_value
        })
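If you'd rather not hard-code tensor names like "dense/BiasAdd:0", note that tf.saved_model.loader.load returns the MetaGraphDef, whose signature map records the names that simple_save registered under the default "serving_default" signature. A minimal sketch, reusing the path from above:

import tensorflow as tf
from tensorflow.saved_model import tag_constants

graph = tf.Graph()
with graph.as_default():
    with tf.Session() as sess:
        meta_graph_def = tf.saved_model.loader.load(
            sess, [tag_constants.SERVING], "path/to/your/location/"
        )
        sig = meta_graph_def.signature_def["serving_default"]
        # Map the friendly keys passed to simple_save to actual tensor names.
        print({k: v.name for k, v in sig.inputs.items()})
        print({k: v.name for k, v in sig.outputs.items()})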

Standalone example

Original blog post

The following code generates random data for the sake of the demonstration.

  1. We start by creating the placeholders. They will hold the data at runtime. From them, we create the Dataset and then its Iterator. We get the iterator's generated tensor, called input_tensor, which will serve as input to our model.
  2. The model itself is built from input_tensor: a GRU-based bidirectional RNN followed by a dense classifier. Because why not.
  3. The loss is a softmax_cross_entropy_with_logits, optimized with Adam. After 2 epochs (of 2 batches each), we save the "trained" model with tf.saved_model.simple_save. If you run the code as is, then the model will be saved in a folder called simple/ in your current working directory.
  4. In a new graph, we then restore the saved model with tf.saved_model.loader.load. We grab the placeholders and logits with graph.get_tensor_by_name and the Iterator initializing operation with graph.get_operation_by_name.
  5. Lastly we run an inference for both batches in the dataset, and check that the saved and restored model both yield the same values. They do!

Code:

import os
import shutil
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants


def model(graph, input_tensor):
    """Create the model which consists of
    a bidirectional rnn (GRU(10)) followed by a dense classifier

    Args:
        graph (tf.Graph): Tensors' graph
        input_tensor (tf.Tensor): Tensor fed as input to the model

    Returns:
        tf.Tensor: the model's output layer Tensor
    """
    cell = tf.nn.rnn_cell.GRUCell(10)
    with graph.as_default():
        ((fw_outputs, bw_outputs), (fw_state, bw_state)) = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=cell,
            cell_bw=cell,
            inputs=input_tensor,
            sequence_length=[10] * 32,
            dtype=tf.float32,
            swap_memory=True,
            scope=None)
        outputs = tf.concat((fw_outputs, bw_outputs), 2)
        mean = tf.reduce_mean(outputs, axis=1)
        dense = tf.layers.dense(mean, 5, activation=None)

        return dense


def get_opt_op(graph, logits, labels_tensor):
    """Create optimization operation from model"s logits and labels

    Args:
        graph (tf.Graph): Tensors' graph
        logits (tf.Tensor): The model's output without activation
        labels_tensor (tf.Tensor): Target labels

    Returns:
        tf.Operation: the operation performing one step of the Adam optimizer
    """
    with graph.as_default():
        with tf.variable_scope("loss"):
            loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                    logits=logits, labels=labels_tensor, name="xent"),
                    name="mean-xent"
                    )
        with tf.variable_scope("optimizer"):
            opt_op = tf.train.AdamOptimizer(1e-2).minimize(loss)
        return opt_op


if __name__ == "__main__":
    # Set random seed for reproducibility
    # and create synthetic data
    np.random.seed(0)
    features = np.random.randn(64, 10, 30)
    labels = np.eye(5)[np.random.randint(0, 5, (64,))]

    graph1 = tf.Graph()
    with graph1.as_default():
        # Random seed for reproducibility
        tf.set_random_seed(0)
        # Placeholders
        batch_size_ph = tf.placeholder(tf.int64, name="batch_size_ph")
        features_data_ph = tf.placeholder(tf.float32, [None, None, 30], "features_data_ph")
        labels_data_ph = tf.placeholder(tf.int32, [None, 5], "labels_data_ph")
        # Dataset
        dataset = tf.data.Dataset.from_tensor_slices((features_data_ph, labels_data_ph))
        dataset = dataset.batch(batch_size_ph)
        iterator = tf.data.Iterator.from_structure(dataset.output_types, dataset.output_shapes)
        dataset_init_op = iterator.make_initializer(dataset, name="dataset_init")
        input_tensor, labels_tensor = iterator.get_next()

        # Model
        logits = model(graph1, input_tensor)
        # Optimization
        opt_op = get_opt_op(graph1, logits, labels_tensor)

        with tf.Session(graph=graph1) as sess:
            # Initialize variables
            tf.global_variables_initializer().run(session=sess)
            for epoch in range(3):
                batch = 0
                # Initialize dataset (could feed epochs in Dataset.repeat(epochs))
                sess.run(
                    dataset_init_op,
                    feed_dict={
                        features_data_ph: features,
                        labels_data_ph: labels,
                        batch_size_ph: 32
                    })
                values = []
                while True:
                    try:
                        if epoch < 2:
                            # Training
                            _, value = sess.run([opt_op, logits])
                            print("Epoch {}, batch {} | Sample value: {}".format(epoch, batch, value[0]))
                            batch += 1
                        else:
                            # Final inference
                            values.append(sess.run(logits))
                            print("Epoch {}, batch {} | Final inference | Sample value: {}".format(epoch, batch, values[-1][0]))
                            batch += 1
                    except tf.errors.OutOfRangeError:
                        break
            # Save model state
            print("
Saving...")
            cwd = os.getcwd()
            path = os.path.join(cwd, "simple")
            shutil.rmtree(path, ignore_errors=True)
            inputs_dict = {
                "batch_size_ph": batch_size_ph,
                "features_data_ph": features_data_ph,
                "labels_data_ph": labels_data_ph
            }
            outputs_dict = {
                "logits": logits
            }
            tf.saved_model.simple_save(
                sess, path, inputs_dict, outputs_dict
            )
            print("Ok")
    # Restoring
    graph2 = tf.Graph()
    with graph2.as_default():
        with tf.Session(graph=graph2) as sess:
            # Restore saved values
            print("
Restoring...")
            tf.saved_model.loader.load(
                sess,
                [tag_constants.SERVING],
                path
            )
            print("Ok")
            # Get restored placeholders
            labels_data_ph = graph2.get_tensor_by_name("labels_data_ph:0")
            features_data_ph = graph2.get_tensor_by_name("features_data_ph:0")
            batch_size_ph = graph2.get_tensor_by_name("batch_size_ph:0")
            # Get restored model output
            restored_logits = graph2.get_tensor_by_name("dense/BiasAdd:0")
            # Get dataset initializing operation
            dataset_init_op = graph2.get_operation_by_name("dataset_init")

            # Initialize restored dataset
            sess.run(
                dataset_init_op,
                feed_dict={
                    features_data_ph: features,
                    labels_data_ph: labels,
                    batch_size_ph: 32
                }
            )
            # Compute inference for both batches in dataset
            restored_values = []
            for i in range(2):
                restored_values.append(sess.run(restored_logits))
                print("Restored values: ", restored_values[i][0])

    # Check if original inference and restored inference are equal
    valid = all((v == rv).all() for v, rv in zip(values, restored_values))
    print("
Inferences match: ", valid)

This will print:

$ python3 save_and_restore.py

Epoch 0, batch 0 | Sample value: [-0.13851789 -0.3087595   0.12804556  0.20013677 -0.08229901]
Epoch 0, batch 1 | Sample value: [-0.00555491 -0.04339041 -0.05111827 -0.2480045  -0.00107776]
Epoch 1, batch 0 | Sample value: [-0.19321944 -0.2104792  -0.00602257  0.07465433  0.11674127]
Epoch 1, batch 1 | Sample value: [-0.05275984  0.05981954 -0.15913513 -0.3244143   0.10673307]
Epoch 2, batch 0 | Final inference | Sample value: [-0.26331693 -0.13013336 -0.12553    -0.04276478  0.2933622 ]
Epoch 2, batch 1 | Final inference | Sample value: [-0.07730117  0.11119192 -0.20817074 -0.35660955  0.16990358]

Saving...
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b"/some/path/simple/saved_model.pb"
Ok

Restoring...
INFO:tensorflow:Restoring parameters from b"/some/path/simple/variables/variables"
Ok
Restored values:  [-0.26331693 -0.13013336 -0.12553    -0.04276478  0.2933622 ]
Restored values:  [-0.07730117  0.11119192 -0.20817074 -0.35660955  0.16990358]

Inferences match:  True

Answer #7

What is the __main__.py file for?

When creating a Python module, it is common to make the module execute some functionality (usually contained in a main function) when run as the entry point of the program. This is typically done with the following common idiom placed at the bottom of most Python files:

if __name__ == "__main__":
    # execute only if run as the entry point into the program
    main()

You can get the same semantics for a Python package with __main__.py, which might have the following structure:

.
└── demo
    ├── __init__.py
    └── __main__.py

To see this, paste the below into a Python 3 shell:

from pathlib import Path

demo = Path.cwd() / "demo"
demo.mkdir()

(demo / "__init__.py").write_text("""
print("demo/__init__.py executed")

def main():
    print("main() executed")
""")

(demo / "__main__.py").write_text("""
print("demo/__main__.py executed")

from demo import main

main()
""")

We can treat demo as a package and actually import it, which executes the top-level code in the __init__.py (but not the main function):

>>> import demo
demo/__init__.py executed

When we use the package as the entry point to the program, we execute the code in the __main__.py, which imports the __init__.py first:

$ python -m demo
demo/__init__.py executed
demo/__main__.py executed
main() executed

You can derive this from the documentation. The documentation says:

__main__ — Top-level script environment

"__main__" is the name of the scope in which top-level code executes. A module’s __name__ is set equal to "__main__" when read from standard input, a script, or from an interactive prompt.

A module can discover whether or not it is running in the main scope by checking its own __name__, which allows a common idiom for conditionally executing code in a module when it is run as a script or with python -m but not when it is imported:

if __name__ == "__main__":
     # execute only if run as a script
     main()

For a package, the same effect can be achieved by including a __main__.py module, the contents of which will be executed when the module is run with -m.
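The same machinery is available programmatically through the standard-library runpy module; a minimal sketch, assuming the demo package created above is importable:

import runpy

# Equivalent to `python -m demo`: runs demo/__main__.py
# with __name__ set to "__main__".
runpy.run_module("demo", run_name="__main__")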

Zipped

You can also zip up this directory, including the __main__.py, into a single file and run it from the command line like this - but note that zipped packages can't execute sub-packages or submodules as the entry point:

from pathlib import Path

demo = Path.cwd() / "demo2"
demo.mkdir()

(demo / "__init__.py").write_text("""
print("demo2/__init__.py executed")

def main():
    print("main() executed")
""")

(demo / "__main__.py").write_text("""
print("demo2/__main__.py executed")

from __init__ import main

main()
""")

Note the subtle change - we are importing main from __init__ instead of demo2 - this zipped directory is not being treated as a package, but as a directory of scripts. So it must be used without the -m flag.

Particularly relevant to the question - zipapp causes the zipped directory to execute the __main__.py by default - and it is executed first, before __init__.py:

$ python -m zipapp demo2 -o demo2zip
$ python demo2zip
demo2/__main__.py executed
demo2/__init__.py executed
main() executed

Note again, this zipped directory is not a package - you cannot import it either.
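For reference, the archive can also be built from Python via the standard-library zipapp module instead of the command line; a minimal sketch:

import zipapp

# Equivalent to `python -m zipapp demo2 -o demo2zip`.
zipapp.create_archive("demo2", target="demo2zip")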

Answer #8

As an extension to @VinaySajip's answer, there are additional nargs values worth mentioning.

  1. parser.add_argument("dir", nargs=1, default=os.getcwd())

N (an integer). N arguments from the command line will be gathered together into a list

  2. parser.add_argument("dir", nargs="*", default=os.getcwd())

"*". All command-line arguments present are gathered into a list. Note that it generally doesn't make much sense to have more than one positional argument with nargs="*", but multiple optional arguments with nargs="*" is possible.

  3. parser.add_argument("dir", nargs="+", default=os.getcwd())

"+". Just like "*", all command-line args present are gathered into a list. Additionally, an error message will be generated if there wasn’t at least one command-line argument present.

  4. parser.add_argument("dir", nargs=argparse.REMAINDER, default=os.getcwd())

argparse.REMAINDER. All the remaining command-line arguments are gathered into a list. This is commonly useful for command-line utilities that dispatch to other command-line utilities.

If the nargs keyword argument is not provided, the number of arguments consumed is determined by the action. Generally this means a single command-line argument will be consumed and a single item (not a list) will be produced.

Edit (copied from a comment by @Acumenus): nargs="?". The docs say: One argument will be consumed from the command line if possible and produced as a single item. If no command-line argument is present, the value from default will be produced.
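As a quick runnable illustration of that fallback behaviour (the argument name here is my own):

import argparse
import os

parser = argparse.ArgumentParser()
# With nargs="?", a missing positional falls back to the default.
parser.add_argument("dir", nargs="?", default=os.getcwd())
print(parser.parse_args([]).dir)        # no argument -> current working directory
print(parser.parse_args(["/tmp"]).dir)  # one argument -> "/tmp"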

Answer #9

I have summarized my now fully working solution, OpenCV-Python - How to install OpenCV-Python package to Anaconda (Windows). Nevertheless I've copied and pasted the important bits to this post.


At the time of writing I was using a Windows 8.1, 64-bit machine with Anaconda/Python 2.x (see notes below - this also works for Windows 10, and likely Python 3.x too).

  • NOTE 1: as mentioned by @great_raisin (thank you) in the comments section, this solution appears to also work for Windows 10.

  • NOTE 2: this will probably work for Anaconda/Python 3.x too. If you are using Windows 10 and Anaconda/Python 3.x, and this solution works, please add a comment below. Thanks! (Update: a comment notes "Working on Windows 10".)

  • NOTE 3: depending on whether you are using Python 2.x or 3.x, just adjust the print statement accordingly in code snippets. i.e. in Python 3.x it would be print("hello"), and in Python 2.x it would be print "hello".

TL;DR

To use OpenCV fully with Anaconda (and Spyder IDE), we need to:

  1. Download the OpenCV package from the official OpenCV site
  2. Copy and paste the cv2.pyd to the Anaconda site-packages directory.
  3. Set user environmental variables so that Anaconda knows where to find the FFMPEG utility.
  4. Do some testing to confirm OpenCV and FFMPEG are now working.

(Read on for the detail instructions...)

Prerequisite

Install Anaconda

Anaconda is essentially a nicely packaged Python IDE that is shipped with tons of useful packages, such as NumPy, Pandas, IPython Notebook, etc. It seems to be recommended everywhere in the scientific community. Check out Anaconda to get it installed.

Install OpenCV-Python to Anaconda

Cautious Note: I originally tried out installing the binstar.org OpenCV package, as suggested. That method however does not include the FFMPEG codec - i.e. you may be able to use OpenCV, but you won't be able to process videos.

The following instructions, which work for me, are inspired by this OpenCV YouTube video. So far I have got it working on both my desktop and laptop, both 64-bit machines running Windows 8.1.

Download OpenCV Package

Firstly, go to the official OpenCV site to download the complete OpenCV package. Pick a version you like (2.x or 3.x). I am on Python 2.x and OpenCV 3.x - mainly because this is how the OpenCV-Python Tutorials are setup/based on.

In my case, I've extracted the package (essentially a folder) straight to my C drive (C:\opencv).

Copy and Paste the cv2.pyd file

The Anaconda site-packages directory (e.g. C:\Users\Johnny\Anaconda\Lib\site-packages in my case) contains the Python packages that you may import. Our goal is to copy and paste the cv2.pyd file to this directory (so that we can use import cv2 in our Python code).

To do this, copy the cv2.pyd file...

From this OpenCV directory (the beginning part might be slightly different on your machine). For Python 3.x, I guess, just change the 2.x to 3.x accordingly.

# Python 2.7 and 32-bit machine:
C:\opencv\build\python\2.7\x86

# Python 2.7 and 64-bit machine:
C:\opencv\build\python\2.7\x64

To this Anaconda directory (the beginning part might be slightly different on your machine):

C:\Users\Johnny\Anaconda\Lib\site-packages

After performing this step we shall now be able to use import cv2 in Python code. BUT, we still need to do a little bit more work to get FFMPEG (video codec) to work (to enable us to do things like processing videos).

Set Environmental Variables

Right-click on "My Computer" (or "This PC" on Windows 8.1) → left-click Properties → left-click "Advanced" tab → left-click "Environment Variables..." button.

Add a new User Variable to point to the OpenCV build directory (either x86 for a 32-bit system or x64 for a 64-bit system). I am currently on a 64-bit machine.

| 32-bit or 64-bit machine? | Variable     | Value                      |
|---------------------------|--------------|----------------------------|
| 32-bit                    | `OPENCV_DIR` | `C:\opencv\build\x86\vc12` |
| 64-bit                    | `OPENCV_DIR` | `C:\opencv\build\x64\vc12` |

Append %OPENCV_DIR%\bin to the User Variable PATH.

For example, my PATH user variable looks like this...

Before:

C:\Users\Johnny\Anaconda;C:\Users\Johnny\Anaconda\Scripts

After:

C:\Users\Johnny\Anaconda;C:\Users\Johnny\Anaconda\Scripts;%OPENCV_DIR%\bin

That's it, we are done! FFMPEG is ready to be used!

Test to confirm

We need to test whether we can now do the following in Anaconda (via the Spyder IDE):

  • Import OpenCV package
  • Use the FFMPEG utility (to read/write/process videos)

Test 1: Can we import OpenCV?

To confirm that Anaconda is now able to import the OpenCV-Python package (namely, cv2), issue these in the IPython console:

import cv2
print cv2.__version__

If the package cv2 is imported OK with no errors, and the cv2 version is printed out, then we are all good! Here is a snapshot:

(Screenshot: import-cv2-ok-in-anaconda-python-2.png - source: mathalope.co.uk)

Test 2: Can we Use the FFMPEG codec?

Place a sample input_video.mp4 video file in a directory. We want to test whether we can:

  • read this .mp4 video file, and
  • write out a new video file (can be .avi or .mp4 etc.)

To do this we need a test Python script, call it test.py. Place it in the same directory as the sample input_video.mp4 file.

This is what test.py may look like (I've listed out both newer and older version codes here - do let us know which one works / doesn't work for you!).

(Newer version...)

import cv2
cap = cv2.VideoCapture("input_video.mp4")
print cap.isOpened()   # True = read video successfully. False - fail to read video.

fourcc = cv2.VideoWriter_fourcc(*"XVID")
out = cv2.VideoWriter("output_video.avi", fourcc, 20.0, (640, 360))
print out.isOpened()  # True = write out video successfully. False - fail to write out video.

cap.release()
out.release()

(Or the older version...)

import cv2
cap = cv2.VideoCapture("input_video.mp4")
print cap.isOpened()   # True = read video successfully. False - fail to read video.

fourcc = cv2.cv.CV_FOURCC(*"XVID")
out = cv2.VideoWriter("output_video.avi",fourcc, 20.0, (640,360))
print out.isOpened()  # True = write out video successfully. False - fail to write out video.

cap.release()
out.release()

This test is VERY IMPORTANT. If you'd like to process video files, you'll need to ensure that Anaconda / Spyder IDE can use FFMPEG (the video codec). It took me days to get it working. But I hope it will take you much less time! :)

Note: One more very important tip when using the Anaconda Spyder IDE. Make sure you check the current working directory (CWD)!!!
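A quick way to check (and, if needed, change) the CWD from the IPython console - a minimal sketch, the path is illustrative:

import os

print(os.getcwd())                      # where Spyder will look for input_video.mp4
os.chdir("C:\\path\\to\\your\\videos")  # point the CWD at your video files
print(os.getcwd())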

Conclusion

To use OpenCV fully with Anaconda (and Spyder IDE), we need to:

  1. Download the OpenCV package from the official OpenCV site
  2. Copy and paste the cv2.pyd to the Anaconda site-packages directory.
  3. Set user environmental variables so that Anaconda knows where to find the FFMPEG utility.
  4. Do some testing to confirm OpenCV and FFMPEG are now working.

Good luck!

Answer #10

To run your_command as a subprocess in a different directory, pass the cwd parameter, as suggested in @wim's answer:

import subprocess

subprocess.check_call(["your_command", "arg 1", "arg 2"], cwd=working_dir)

A child process can't change its parent's working directory (normally). Running cd .. in a child shell process using subprocess won't change your parent Python script's working directory; i.e., the code example in @glglgl's answer is wrong. cd is a shell builtin (not a separate executable), so it can only change the directory within its own process.
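A minimal sketch demonstrating both points (POSIX-only, since it calls pwd; Python 3.7+ for text=True):

import os
import subprocess

print("parent cwd:", os.getcwd())
# The child runs in /tmp, but the parent's working directory is untouched.
out = subprocess.check_output(["pwd"], cwd="/tmp", text=True)
print("child cwd: ", out.strip())
print("parent cwd:", os.getcwd())  # unchanged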

Python | os.getcwdb () method: StackOverflow Questions

Answer #1

os.listdir() - list in the current directory

With listdir in os module you get the files and the folders in the current dir

 import os
 arr = os.listdir()
 print(arr)
 
 >>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]

Looking in a directory

arr = os.listdir("c:\files")

glob from glob

with glob you can specify a type of file to list like this

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)

glob in a list comprehension

mylist = [f for f in glob.glob("*.txt")]

get the full path of only files in the current directory

import os
from os import listdir
from os.path import isfile, join

cwd = os.getcwd()
onlyfiles = [os.path.join(cwd, f) for f in os.listdir(cwd) if 
os.path.isfile(os.path.join(cwd, f))]
print(onlyfiles) 

["G:\getfilesname\getfilesname.py", "G:\getfilesname\example.txt"]

Getting the full path name with os.path.abspath

You get the full path in return

 import os
 files_path = [os.path.abspath(x) for x in os.listdir()]
 print(files_path)
 
 ["F:\documentiapplications.txt", "F:\documenticollections.txt"]

Walk: going through sub directories

os.walk returns the root, the directories list and the files list, that is why I unpacked them in r, d, f in the for loop; it, then, looks for other files and directories in the subfolders of the root and so on until there are no subfolders.

import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            print(os.path.join(r, file))

os.listdir(): get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to give the argument as "." or os.getcwd() in the os.listdir method.

 import os
 arr = os.listdir(".")
 print(arr)
 
 >>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]

To go up in the directory tree

# Method 1
x = os.listdir("..")

# Method 2
x= os.listdir("/")

Get files: os.listdir() in a particular directory (Python 2 and 3)

 import os
 arr = os.listdir("F:\python")
 print(arr)
 
 >>> ["$RECYCLE.BIN", "work.txt", "3ebooks.txt", "documents"]

Get files of a particular subdirectory with os.listdir()

import os

x = os.listdir("./content")

os.walk(".") - current directory

 import os
 arr = next(os.walk("."))[2]
 print(arr)
 
 >>> ["5bs_Turismo1.pdf", "5bs_Turismo1.pptx", "esperienza.txt"]

next(os.walk(".")) and os.path.join("dir", "file")

 import os
 arr = []
 for d,r,f in next(os.walk("F:\_python")):
     for file in f:
         arr.append(os.path.join(r,file))

 for f in arr:
     print(files)

>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt

next(os.walk("F:\") - get the full path - list comprehension

 [os.path.join(r,file) for r,d,f in next(os.walk("F:\_python")) for file in f]
 
 >>> ["F:\_python\dict_class.py", "F:\_python\programmi.txt"]

os.walk - get full path - all files in sub dirs**

x = [os.path.join(r,file) for r,d,f in os.walk("F:\_python") for file in f]
print(x)

>>> ["F:\_python\dict.py", "F:\_python\progr.txt", "F:\_python\readl.py"]

os.listdir() - get only txt files

 arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
 print(arr_txt)
 
 >>> ["work.txt", "3ebooks.txt"]

Using glob to get the full path of the files

If I should need the absolute path of the files:

from path import path
from glob import glob
x = [path(f).abspath() for f in glob("F:\*.txt")]
for f in x:
    print(f)

>>> F:acquistionline.txt
>>> F:acquisti_2018.txt
>>> F:ootstrap_jquery_ecc.txt

Using os.path.isfile to avoid directories in the list

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ["a simple game.py", "data.txt", "decorator.py"]

Using pathlib from Python 3.4

import pathlib

flist = []
for p in pathlib.Path(".").iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

 >>> error.PNG
 >>> exemaker.bat
 >>> guiprova.mp3
 >>> setup.py
 >>> speak_gui2.py
 >>> thumb.PNG

With list comprehension:

flist = [p for p in pathlib.Path(".").iterdir() if p.is_file()]

Alternatively, use pathlib.Path() instead of pathlib.Path(".")

Use glob method in pathlib.Path()

import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py

Get all and only files with os.walk

import os
x = [i[2] for i in os.walk(".")]
y=[]
for t in x:
    for f in t:
        y.append(f)
print(y)

>>> ["append_to_list.py", "data.txt", "data1.txt", "data2.txt", "data_180617", "os_walk.py", "READ2.py", "read_data.py", "somma_defaltdic.py", "substitute_words.py", "sum_data.py", "data.txt", "data1.txt", "data_180617"]

Get only files with next and walk in a directory

 import os
 x = next(os.walk("F://python"))[2]
 print(x)
 
 >>> ["calculator.bat","calculator.py"]

Get only directories with next and walk in a directory

 import os
 next(os.walk("F://python"))[1] # for the current dir use (".")
 
 >>> ["python3","others"]

Get all the subdir names with walk

for r,d,f in os.walk("F:\_python"):
    for dirs in d:
        print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints

os.scandir() from Python 3.5 and greater

import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ["calculator.bat","calculator.py"]

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG

Examples:

Ex. 1: How many files are there in the subdirectories?

In this example, we look for the number of files that are included in all the directory and its subdirectories.

import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + "files"

print(count("F:\python"))

>>> "F:\python" : 12057 files"

Ex.2: How to copy all files from a directory to another?

A script to make order in your computer finding all files of a type (default: pptx) and copying them in a new folder.

import os
import shutil
from path import path

destination = "F:\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype="pptx", counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = pack[0] + "\" + f
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print("-" * 30)
        print("	==> Found in: `" + dir + "` : " + str(counter) + " files
")

for dir in os.listdir():
    "searches for folders that starts with `_`"
    if dir[0] == "_":
        # copyfile(dir, filetype="pdf")
        copyfile(dir, filetype="txt")


>>> _compiti18Compito Contabilità 1conti.txt
>>> _compiti18Compito Contabilità 1modula4.txt
>>> _compiti18Compito Contabilità 1moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files

Ex. 3: How to get all the files in a txt file

In case you want to create a txt file with all the file names:

import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "
"
    file.write(mylist)

Example: txt with all the files of an hard drive

"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding="utf-8") as testo:
    for root, dirs, files in os.walk("D:\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\" + file)
            testo.write(file + "
")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "
")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "
")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")

All the file of C: in one text file

This is a shorter version of the previous code. Change the folder where to start finding the files if you need to start from another position. This code generate a 50 mb on text file on my computer with something less then 500.000 lines with files with the complete path.

import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\"):
        for file in f:
            filewrite.write(f"{r + file}
")

How to write a file with all paths in a folder of a type

With this function you can create a txt file that will have the name of a type of file that you look for (ex. pngfile.txt) with all the full path of all the files of that type. It can be useful sometimes, I think.

import os

def searchfiles(extension=".ttf", folder="H:\"):
    "Create a txt file with all the file of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(f"{r + file}
")

# looking for png file (fonts) in the hard disk H:
searchfiles(".png", "H:\")

>>> H:4bs_18Dolphins5.png
>>> H:4bs_18Dolphins6.png
>>> H:4bs_18Dolphins7.png
>>> H:5_18marketing htmlassetsimageslogo2.png
>>> H:7z001.png
>>> H:7z002.png

(New) Find all files and open them with tkinter GUI

I just wanted to add in this 2019 a little app to search for all files in a dir and be able to open them by doubleclicking on the name of the file in the list. enter image description here

import tkinter as tk
import os

def searchfiles(extension=".txt", folder="H:\"):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, r + "\" + file)

def open_file():
    os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles(".png", "H:\"))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()

Answer #2

tl;dr

Call the is_path_exists_or_creatable() function defined below.

Strictly Python 3. That"s just how we roll.

A Tale of Two Questions

The question of "How do I test pathname validity and, for valid pathnames, the existence or writability of those paths?" is clearly two separate questions. Both are interesting, and neither have received a genuinely satisfactory answer here... or, well, anywhere that I could grep.

vikki"s answer probably hews the closest, but has the remarkable disadvantages of:

  • Needlessly opening (...and then failing to reliably close) file handles.
  • Needlessly writing (...and then failing to reliable close or delete) 0-byte files.
  • Ignoring OS-specific errors differentiating between non-ignorable invalid pathnames and ignorable filesystem issues. Unsurprisingly, this is critical under Windows. (See below.)
  • Ignoring race conditions resulting from external processes concurrently (re)moving parent directories of the pathname to be tested. (See below.)
  • Ignoring connection timeouts resulting from this pathname residing on stale, slow, or otherwise temporarily inaccessible filesystems. This could expose public-facing services to potential DoS-driven attacks. (See below.)

We"re gonna fix all that.

Question #0: What"s Pathname Validity Again?

Before hurling our fragile meat suits into the python-riddled moshpits of pain, we should probably define what we mean by "pathname validity." What defines validity, exactly?

By "pathname validity," we mean the syntactic correctness of a pathname with respect to the root filesystem of the current system – regardless of whether that path or parent directories thereof physically exist. A pathname is syntactically correct under this definition if it complies with all syntactic requirements of the root filesystem.

By "root filesystem," we mean:

  • On POSIX-compatible systems, the filesystem mounted to the root directory (/).
  • On Windows, the filesystem mounted to %HOMEDRIVE%, the colon-suffixed drive letter containing the current Windows installation (typically but not necessarily C:).

The meaning of "syntactic correctness," in turn, depends on the type of root filesystem. For ext4 (and most but not all POSIX-compatible) filesystems, a pathname is syntactically correct if and only if that pathname:

  • Contains no null bytes (i.e., x00 in Python). This is a hard requirement for all POSIX-compatible filesystems.
  • Contains no path components longer than 255 bytes (e.g., "a"*256 in Python). A path component is a longest substring of a pathname containing no / character (e.g., bergtatt, ind, i, and fjeldkamrene in the pathname /bergtatt/ind/i/fjeldkamrene).

Syntactic correctness. Root filesystem. That"s it.

Question #1: How Now Shall We Do Pathname Validity?

Validating pathnames in Python is surprisingly non-intuitive. I"m in firm agreement with Fake Name here: the official os.path package should provide an out-of-the-box solution for this. For unknown (and probably uncompelling) reasons, it doesn"t. Fortunately, unrolling your own ad-hoc solution isn"t that gut-wrenching...

O.K., it actually is. It"s hairy; it"s nasty; it probably chortles as it burbles and giggles as it glows. But what you gonna do? Nuthin".

We"ll soon descend into the radioactive abyss of low-level code. But first, let"s talk high-level shop. The standard os.stat() and os.lstat() functions raise the following exceptions when passed invalid pathnames:

  • For pathnames residing in non-existing directories, instances of FileNotFoundError.
  • For pathnames residing in existing directories:
    • Under Windows, instances of WindowsError whose winerror attribute is 123 (i.e., ERROR_INVALID_NAME).
    • Under all other OSes:
    • For pathnames containing null bytes (i.e., "x00"), instances of TypeError.
    • For pathnames containing path components longer than 255 bytes, instances of OSError whose errcode attribute is:
      • Under SunOS and the *BSD family of OSes, errno.ERANGE. (This appears to be an OS-level bug, otherwise referred to as "selective interpretation" of the POSIX standard.)
      • Under all other OSes, errno.ENAMETOOLONG.

Crucially, this implies that only pathnames residing in existing directories are validatable. The os.stat() and os.lstat() functions raise generic FileNotFoundError exceptions when passed pathnames residing in non-existing directories, regardless of whether those pathnames are invalid or not. Directory existence takes precedence over pathname invalidity.

Does this mean that pathnames residing in non-existing directories are not validatable? Yes – unless we modify those pathnames to reside in existing directories. Is that even safely feasible, however? Shouldn"t modifying a pathname prevent us from validating the original pathname?

To answer this question, recall from above that syntactically correct pathnames on the ext4 filesystem contain no path components (A) containing null bytes or (B) over 255 bytes in length. Hence, an ext4 pathname is valid if and only if all path components in that pathname are valid. This is true of most real-world filesystems of interest.

Does that pedantic insight actually help us? Yes. It reduces the larger problem of validating the full pathname in one fell swoop to the smaller problem of only validating all path components in that pathname. Any arbitrary pathname is validatable (regardless of whether that pathname resides in an existing directory or not) in a cross-platform manner by following the following algorithm:

  1. Split that pathname into path components (e.g., the pathname /troldskog/faren/vild into the list ["", "troldskog", "faren", "vild"]).
  2. For each such component:
    1. Join the pathname of a directory guaranteed to exist with that component into a new temporary pathname (e.g., /troldskog) .
    2. Pass that pathname to os.stat() or os.lstat(). If that pathname and hence that component is invalid, this call is guaranteed to raise an exception exposing the type of invalidity rather than a generic FileNotFoundError exception. Why? Because that pathname resides in an existing directory. (Circular logic is circular.)

Is there a directory guaranteed to exist? Yes, but typically only one: the topmost directory of the root filesystem (as defined above).

Passing pathnames residing in any other directory (and hence not guaranteed to exist) to os.stat() or os.lstat() invites race conditions, even if that directory was previously tested to exist. Why? Because external processes cannot be prevented from concurrently removing that directory after that test has been performed but before that pathname is passed to os.stat() or os.lstat(). Unleash the dogs of mind-fellating insanity!

There exists a substantial side benefit to the above approach as well: security. (Isn"t that nice?) Specifically:

Front-facing applications validating arbitrary pathnames from untrusted sources by simply passing such pathnames to os.stat() or os.lstat() are susceptible to Denial of Service (DoS) attacks and other black-hat shenanigans. Malicious users may attempt to repeatedly validate pathnames residing on filesystems known to be stale or otherwise slow (e.g., NFS Samba shares); in that case, blindly statting incoming pathnames is liable to either eventually fail with connection timeouts or consume more time and resources than your feeble capacity to withstand unemployment.

The above approach obviates this by only validating the path components of a pathname against the root directory of the root filesystem. (If even that"s stale, slow, or inaccessible, you"ve got larger problems than pathname validation.)

Lost? Great. Let"s begin. (Python 3 assumed. See "What Is Fragile Hope for 300, leycec?")

import errno, os

# Sadly, Python fails to provide the following magic number for us.
ERROR_INVALID_NAME = 123
"""
Windows-specific error code indicating an invalid pathname.

See Also
----------
https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
    Official listing of all such codes.
"""

def is_pathname_valid(pathname: str) -> bool:
    """
    `True` if the passed pathname is a valid pathname for the current OS;
    `False` otherwise.
    """
    # If this pathname is either not a string or is but is empty, this pathname
    # is invalid.
    try:
        if not isinstance(pathname, str) or not pathname:
            return False

        # Strip this pathname"s Windows-specific drive specifier (e.g., `C:`)
        # if any. Since Windows prohibits path components from containing `:`
        # characters, failing to strip this `:`-suffixed prefix would
        # erroneously invalidate all valid absolute Windows pathnames.
        _, pathname = os.path.splitdrive(pathname)

        # Directory guaranteed to exist. If the current OS is Windows, this is
        # the drive to which Windows was installed (e.g., the "%HOMEDRIVE%"
        # environment variable); else, the typical root directory.
        root_dirname = os.environ.get("HOMEDRIVE", "C:") 
            if sys.platform == "win32" else os.path.sep
        assert os.path.isdir(root_dirname)   # ...Murphy and her ironclad Law

        # Append a path separator to this directory if needed.
        root_dirname = root_dirname.rstrip(os.path.sep) + os.path.sep

        # Test whether each path component split from this pathname is valid or
        # not, ignoring non-existent and non-readable path components.
        for pathname_part in pathname.split(os.path.sep):
            try:
                os.lstat(root_dirname + pathname_part)
            # If an OS-specific exception is raised, its error code
            # indicates whether this pathname is valid or not. Unless this
            # is the case, this exception implies an ignorable kernel or
            # filesystem complaint (e.g., path not found or inaccessible).
            #
            # Only the following exceptions indicate invalid pathnames:
            #
            # * Instances of the Windows-specific "WindowsError" class
            #   defining the "winerror" attribute whose value is
            #   "ERROR_INVALID_NAME". Under Windows, "winerror" is more
            #   fine-grained and hence useful than the generic "errno"
            #   attribute. When a too-long pathname is passed, for example,
            #   "errno" is "ENOENT" (i.e., no such file or directory) rather
            #   than "ENAMETOOLONG" (i.e., file name too long).
            # * Instances of the cross-platform "OSError" class defining the
            #   generic "errno" attribute whose value is either:
            #   * Under most POSIX-compatible OSes, "ENAMETOOLONG".
            #   * Under some edge-case OSes (e.g., SunOS, *BSD), "ERANGE".
            except OSError as exc:
                if hasattr(exc, "winerror"):
                    if exc.winerror == ERROR_INVALID_NAME:
                        return False
                elif exc.errno in {errno.ENAMETOOLONG, errno.ERANGE}:
                    return False
    # If a "TypeError" exception was raised, it almost certainly has the
    # error message "embedded NUL character" indicating an invalid pathname.
    except TypeError as exc:
        return False
    # If no exception was raised, all path components and hence this
    # pathname itself are valid. (Praise be to the curmudgeonly python.)
    else:
        return True
    # If any other exception was raised, this is an unrelated fatal issue
    # (e.g., a bug). Permit this exception to unwind the call stack.
    #
    # Did we mention this should be shipped with Python already?

Done. Don"t squint at that code. (It bites.)

Question #2: Possibly Invalid Pathname Existence or Creatability, Eh?

Testing the existence or creatability of possibly invalid pathnames is, given the above solution, mostly trivial. The little key here is to call the previously defined function before testing the passed path:

def is_path_creatable(pathname: str) -> bool:
    """
    `True` if the current user has sufficient permissions to create the passed
    pathname; `False` otherwise.
    """
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()
    return os.access(dirname, os.W_OK)

def is_path_exists_or_creatable(pathname: str) -> bool:
    """
    `True` if the passed pathname is a valid pathname for the current OS _and_
    either currently exists or is hypothetically creatable; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    """
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Done and done. Except not quite.

Question #3: Possibly Invalid Pathname Existence or Writability on Windows

There exists a caveat. Of course there does.

As the official os.access() documentation admits:

Note: I/O operations may fail even when os.access() indicates that they would succeed, particularly for operations on network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model.

To no one"s surprise, Windows is the usual suspect here. Thanks to extensive use of Access Control Lists (ACL) on NTFS filesystems, the simplistic POSIX permission-bit model maps poorly to the underlying Windows reality. While this (arguably) isn"t Python"s fault, it might nonetheless be of concern for Windows-compatible applications.

If this is you, a more robust alternative is wanted. If the passed path does not exist, we instead attempt to create a temporary file guaranteed to be immediately deleted in the parent directory of that path – a more portable (if expensive) test of creatability:

import os, tempfile

def is_path_sibling_creatable(pathname: str) -> bool:
    """
    `True` if the current user has sufficient permissions to create **siblings**
    (i.e., arbitrary files in the parent directory) of the passed pathname;
    `False` otherwise.
    """
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()

    try:
        # For safety, explicitly close and hence delete this temporary file
        # immediately after creating it in the passed path"s parent directory.
        with tempfile.TemporaryFile(dir=dirname): pass
        return True
    # While the exact type of exception raised by the above function depends on
    # the current version of the Python interpreter, all such types subclass the
    # following exception superclass.
    except EnvironmentError:
        return False

def is_path_exists_or_creatable_portable(pathname: str) -> bool:
    """
    `True` if the passed pathname is a valid pathname on the current OS _and_
    either currently exists or is hypothetically creatable in a cross-platform
    manner optimized for POSIX-unfriendly filesystems; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    """
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_sibling_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Note, however, that even this may not be enough.

Thanks to User Access Control (UAC), the ever-inimicable Windows Vista and all subsequent iterations thereof blatantly lie about permissions pertaining to system directories. When non-Administrator users attempt to create files in either the canonical C:Windows or C:Windowssystem32 directories, UAC superficially permits the user to do so while actually isolating all created files into a "Virtual Store" in that user"s profile. (Who could have possibly imagined that deceiving users would have harmful long-term consequences?)

This is crazy. This is Windows.

Prove It

Dare we? It"s time to test-drive the above tests.

Since NULL is the only character prohibited in pathnames on UNIX-oriented filesystems, let"s leverage that to demonstrate the cold, hard truth – ignoring non-ignorable Windows shenanigans, which frankly bore and anger me in equal measure:

>>> print(""foo.bar" valid? " + str(is_pathname_valid("foo.bar")))
"foo.bar" valid? True
>>> print("Null byte valid? " + str(is_pathname_valid("x00")))
Null byte valid? False
>>> print("Long path valid? " + str(is_pathname_valid("a" * 256)))
Long path valid? False
>>> print(""/dev" exists or creatable? " + str(is_path_exists_or_creatable("/dev")))
"/dev" exists or creatable? True
>>> print(""/dev/foo.bar" exists or creatable? " + str(is_path_exists_or_creatable("/dev/foo.bar")))
"/dev/foo.bar" exists or creatable? False
>>> print("Null byte exists or creatable? " + str(is_path_exists_or_creatable("x00")))
Null byte exists or creatable? False

Beyond sanity. Beyond pain. You will find Python portability concerns.

Answer #3

Tensorflow 2 Docs

Saving Checkpoints

Adapted from the docs

# -------------------------
# -----  Toy Context  -----
# -------------------------
import tensorflow as tf


class Net(tf.keras.Model):
    """A simple linear model."""

    def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)

    def call(self, x):
        return self.l1(x)


def toy_dataset():
    inputs = tf.range(10.0)[:, None]
    labels = inputs * 5.0 + tf.range(5.0)[None, :]
    return (
        tf.data.Dataset.from_tensor_slices(dict(x=inputs, y=labels)).repeat().batch(2)
    )


def train_step(net, example, optimizer):
    """Trains `net` on `example` using `optimizer`."""
    with tf.GradientTape() as tape:
        output = net(example["x"])
        loss = tf.reduce_mean(tf.abs(output - example["y"]))
    variables = net.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss


# ----------------------------
# -----  Create Objects  -----
# ----------------------------

net = Net()
opt = tf.keras.optimizers.Adam(0.1)
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)

# ----------------------------
# -----  Train and Save  -----
# ----------------------------

ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")

for _ in range(50):
    example = next(iterator)
    loss = train_step(net, example, opt)
    ckpt.step.assign_add(1)
    if int(ckpt.step) % 10 == 0:
        save_path = manager.save()
        print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
        print("loss {:1.2f}".format(loss.numpy()))


# ---------------------
# -----  Restore  -----
# ---------------------

# In another script, re-initialize objects
opt = tf.keras.optimizers.Adam(0.1)
net = Net()
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
    step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)

# Re-use the manager code above ^

ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")

for _ in range(50):
    example = next(iterator)
    # Continue training or evaluate etc.

More links

Checkpoints capture the exact value of all parameters (tf.Variable objects) used by a model. Checkpoints do not contain any description of the computation defined by the model and thus are typically only useful when source code that will use the saved parameter values is available.

The SavedModel format on the other hand includes a serialized description of the computation defined by the model in addition to the parameter values (checkpoint). Models in this format are independent of the source code that created the model. They are thus suitable for deployment via TensorFlow Serving, TensorFlow Lite, TensorFlow.js, or programs in other programming languages (the C, C++, Java, Go, Rust, C# etc. TensorFlow APIs).

(Highlights are my own)


Tensorflow < 2


From the docs:

Save

# Create some variables.
v1 = tf.get_variable("v1", shape=[3], initializer = tf.zeros_initializer)
v2 = tf.get_variable("v2", shape=[5], initializer = tf.zeros_initializer)

inc_v1 = v1.assign(v1+1)
dec_v2 = v2.assign(v2-1)

# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, and save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  # Do some work with the model.
  inc_v1.op.run()
  dec_v2.op.run()
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print("Model saved in path: %s" % save_path)

Restore

tf.reset_default_graph()

# Create some variables.
v1 = tf.get_variable("v1", shape=[3])
v2 = tf.get_variable("v2", shape=[5])

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Check the values of the variables
  print("v1 : %s" % v1.eval())
  print("v2 : %s" % v2.eval())

simple_save

Many good answer, for completeness I"ll add my 2 cents: simple_save. Also a standalone code example using the tf.data.Dataset API.

Python 3 ; Tensorflow 1.14

import tensorflow as tf
from tensorflow.saved_model import tag_constants

with tf.Graph().as_default():
    with tf.Session() as sess:
        ...

        # Saving
        inputs = {
            "batch_size_placeholder": batch_size_placeholder,
            "features_placeholder": features_placeholder,
            "labels_placeholder": labels_placeholder,
        }
        outputs = {"prediction": model_output}
        tf.saved_model.simple_save(
            sess, "path/to/your/location/", inputs, outputs
        )

Restoring:

graph = tf.Graph()
with restored_graph.as_default():
    with tf.Session() as sess:
        tf.saved_model.loader.load(
            sess,
            [tag_constants.SERVING],
            "path/to/your/location/",
        )
        batch_size_placeholder = graph.get_tensor_by_name("batch_size_placeholder:0")
        features_placeholder = graph.get_tensor_by_name("features_placeholder:0")
        labels_placeholder = graph.get_tensor_by_name("labels_placeholder:0")
        prediction = restored_graph.get_tensor_by_name("dense/BiasAdd:0")

        sess.run(prediction, feed_dict={
            batch_size_placeholder: some_value,
            features_placeholder: some_other_value,
            labels_placeholder: another_value
        })

Standalone example

Original blog post

The following code generates random data for the sake of the demonstration.

  1. We start by creating the placeholders. They will hold the data at runtime. From them, we create the Dataset and then its Iterator. We get the iterator"s generated tensor, called input_tensor which will serve as input to our model.
  2. The model itself is built from input_tensor: a GRU-based bidirectional RNN followed by a dense classifier. Because why not.
  3. The loss is a softmax_cross_entropy_with_logits, optimized with Adam. After 2 epochs (of 2 batches each), we save the "trained" model with tf.saved_model.simple_save. If you run the code as is, then the model will be saved in a folder called simple/ in your current working directory.
  4. In a new graph, we then restore the saved model with tf.saved_model.loader.load. We grab the placeholders and logits with graph.get_tensor_by_name and the Iterator initializing operation with graph.get_operation_by_name.
  5. Lastly we run an inference for both batches in the dataset, and check that the saved and restored model both yield the same values. They do!

Code:

import os
import shutil
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants


def model(graph, input_tensor):
    """Create the model which consists of
    a bidirectional rnn (GRU(10)) followed by a dense classifier

    Args:
        graph (tf.Graph): Tensors' graph
        input_tensor (tf.Tensor): Tensor fed as input to the model

    Returns:
        tf.Tensor: the model's output layer Tensor
    """
    cell = tf.nn.rnn_cell.GRUCell(10)
    with graph.as_default():
        ((fw_outputs, bw_outputs), (fw_state, bw_state)) = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=cell,
            cell_bw=cell,
            inputs=input_tensor,
            sequence_length=[10] * 32,
            dtype=tf.float32,
            swap_memory=True,
            scope=None)
        outputs = tf.concat((fw_outputs, bw_outputs), 2)
        mean = tf.reduce_mean(outputs, axis=1)
        dense = tf.layers.dense(mean, 5, activation=None)

        return dense


def get_opt_op(graph, logits, labels_tensor):
    """Create optimization operation from model"s logits and labels

    Args:
        graph (tf.Graph): Tensors' graph
        logits (tf.Tensor): The model's output without activation
        labels_tensor (tf.Tensor): Target labels

    Returns:
        tf.Operation: the operation performing a step of the Adam optimizer
    """
    with graph.as_default():
        with tf.variable_scope("loss"):
            loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                    logits=logits, labels=labels_tensor, name="xent"),
                    name="mean-xent"
                    )
        with tf.variable_scope("optimizer"):
            opt_op = tf.train.AdamOptimizer(1e-2).minimize(loss)
        return opt_op


if __name__ == "__main__":
    # Set random seed for reproducibility
    # and create synthetic data
    np.random.seed(0)
    features = np.random.randn(64, 10, 30)
    labels = np.eye(5)[np.random.randint(0, 5, (64,))]

    graph1 = tf.Graph()
    with graph1.as_default():
        # Random seed for reproducibility
        tf.set_random_seed(0)
        # Placeholders
        batch_size_ph = tf.placeholder(tf.int64, name="batch_size_ph")
        features_data_ph = tf.placeholder(tf.float32, [None, None, 30], "features_data_ph")
        labels_data_ph = tf.placeholder(tf.int32, [None, 5], "labels_data_ph")
        # Dataset
        dataset = tf.data.Dataset.from_tensor_slices((features_data_ph, labels_data_ph))
        dataset = dataset.batch(batch_size_ph)
        iterator = tf.data.Iterator.from_structure(dataset.output_types, dataset.output_shapes)
        dataset_init_op = iterator.make_initializer(dataset, name="dataset_init")
        input_tensor, labels_tensor = iterator.get_next()

        # Model
        logits = model(graph1, input_tensor)
        # Optimization
        opt_op = get_opt_op(graph1, logits, labels_tensor)

        with tf.Session(graph=graph1) as sess:
            # Initialize variables
            tf.global_variables_initializer().run(session=sess)
            for epoch in range(3):
                batch = 0
                # Initialize dataset (could feed epochs in Dataset.repeat(epochs))
                sess.run(
                    dataset_init_op,
                    feed_dict={
                        features_data_ph: features,
                        labels_data_ph: labels,
                        batch_size_ph: 32
                    })
                values = []
                while True:
                    try:
                        if epoch < 2:
                            # Training
                            _, value = sess.run([opt_op, logits])
                            print("Epoch {}, batch {} | Sample value: {}".format(epoch, batch, value[0]))
                            batch += 1
                        else:
                            # Final inference
                            values.append(sess.run(logits))
                            print("Epoch {}, batch {} | Final inference | Sample value: {}".format(epoch, batch, values[-1][0]))
                            batch += 1
                    except tf.errors.OutOfRangeError:
                        break
            # Save model state
            print("
Saving...")
            cwd = os.getcwd()
            path = os.path.join(cwd, "simple")
            shutil.rmtree(path, ignore_errors=True)
            inputs_dict = {
                "batch_size_ph": batch_size_ph,
                "features_data_ph": features_data_ph,
                "labels_data_ph": labels_data_ph
            }
            outputs_dict = {
                "logits": logits
            }
            tf.saved_model.simple_save(
                sess, path, inputs_dict, outputs_dict
            )
            print("Ok")
    # Restoring
    graph2 = tf.Graph()
    with graph2.as_default():
        with tf.Session(graph=graph2) as sess:
            # Restore saved values
            print("
Restoring...")
            tf.saved_model.loader.load(
                sess,
                [tag_constants.SERVING],
                path
            )
            print("Ok")
            # Get restored placeholders
            labels_data_ph = graph2.get_tensor_by_name("labels_data_ph:0")
            features_data_ph = graph2.get_tensor_by_name("features_data_ph:0")
            batch_size_ph = graph2.get_tensor_by_name("batch_size_ph:0")
            # Get restored model output
            restored_logits = graph2.get_tensor_by_name("dense/BiasAdd:0")
            # Get dataset initializing operation
            dataset_init_op = graph2.get_operation_by_name("dataset_init")

            # Initialize restored dataset
            sess.run(
                dataset_init_op,
                feed_dict={
                    features_data_ph: features,
                    labels_data_ph: labels,
                    batch_size_ph: 32
                }

            )
            # Compute inference for both batches in dataset
            restored_values = []
            for i in range(2):
                restored_values.append(sess.run(restored_logits))
                print("Restored values: ", restored_values[i][0])

    # Check if original inference and restored inference are equal
    valid = all((v == rv).all() for v, rv in zip(values, restored_values))
    print("
Inferences match: ", valid)

This will print:

$ python3 save_and_restore.py

Epoch 0, batch 0 | Sample value: [-0.13851789 -0.3087595   0.12804556  0.20013677 -0.08229901]
Epoch 0, batch 1 | Sample value: [-0.00555491 -0.04339041 -0.05111827 -0.2480045  -0.00107776]
Epoch 1, batch 0 | Sample value: [-0.19321944 -0.2104792  -0.00602257  0.07465433  0.11674127]
Epoch 1, batch 1 | Sample value: [-0.05275984  0.05981954 -0.15913513 -0.3244143   0.10673307]
Epoch 2, batch 0 | Final inference | Sample value: [-0.26331693 -0.13013336 -0.12553    -0.04276478  0.2933622 ]
Epoch 2, batch 1 | Final inference | Sample value: [-0.07730117  0.11119192 -0.20817074 -0.35660955  0.16990358]

Saving...
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b"/some/path/simple/saved_model.pb"
Ok

Restoring...
INFO:tensorflow:Restoring parameters from b"/some/path/simple/variables/variables"
Ok
Restored values:  [-0.26331693 -0.13013336 -0.12553    -0.04276478  0.2933622 ]
Restored values:  [-0.07730117  0.11119192 -0.20817074 -0.35660955  0.16990358]

Inferences match:  True

Answer #4

As an extension to @VinaySajip's answer, there are additional nargs values worth mentioning.

  1. parser.add_argument("dir", nargs=1, default=os.getcwd())

N (an integer). N arguments from the command line will be gathered together into a list.

  2. parser.add_argument("dir", nargs="*", default=os.getcwd())

"*". All command-line arguments present are gathered into a list. Note that it generally doesn't make much sense to have more than one positional argument with nargs="*", but multiple optional arguments with nargs="*" is possible.

  3. parser.add_argument("dir", nargs="+", default=os.getcwd())

"+". Just like "*", all command-line args present are gathered into a list. Additionally, an error message will be generated if there wasn’t at least one command-line argument present.

  4. parser.add_argument("dir", nargs=argparse.REMAINDER, default=os.getcwd())

argparse.REMAINDER. All the remaining command-line arguments are gathered into a list. This is commonly useful for command-line utilities that dispatch to other command-line utilities.

If the nargs keyword argument is not provided, the number of arguments consumed is determined by the action. Generally this means a single command-line argument will be consumed and a single item (not a list) will be produced.

Edit (copied from a comment by @Acumenus): nargs="?". The docs say: "One argument will be consumed from the command line if possible, and produced as a single item. If no command-line argument is present, the value from default will be produced."
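
A quick way to see these variants in action; a minimal sketch (the argument names and values below are just illustrative):

import argparse
import os

parser = argparse.ArgumentParser()
# "?": zero or one positional value, falling back to the default
parser.add_argument("dir", nargs="?", default=os.getcwd())
# "*": zero or more values gathered into a list (as an optional argument)
parser.add_argument("--tags", nargs="*", default=[])

args = parser.parse_args(["/tmp", "--tags", "a", "b"])
print(args.dir, args.tags)   # /tmp ['a', 'b']

args = parser.parse_args([])
print(args.dir, args.tags)   # <current working directory> []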

Answer #5

Preliminary notes

  • Although there"s a clear differentiation between file and directory terms in the question text, some may argue that directories are actually special files
  • The statement: "all files of a directory" can be interpreted in two ways:
    1. All direct (or level 1) descendants only
    2. All descendants in the whole directory tree (including the ones in sub-directories)
  • When the question was asked, I imagine that Python 2, was the LTS version, however the code samples will be run by Python 3(.5) (I"ll keep them as Python 2 compliant as possible; also, any code belonging to Python that I"m going to post, is from v3.5.4 - unless otherwise specified). That has consequences related to another keyword in the question: "add them into a list":

    • In versions before Python 2.2, sequences (iterables) were mostly represented by lists (tuples, sets, ...)
    • In Python 2.2, the concept of generator ([Python.Wiki]: Generators) - courtesy of [Python 3]: The yield statement - was introduced. As time passed, generator counterparts started to appear for functions that returned/worked with lists
    • In Python 3, generators are the default behavior
    • Not sure if returning a list is still mandatory (or a generator would do as well), but passing a generator to the list constructor will create a list out of it (and also consume it). The example below illustrates the differences on [Python 3]: map(function, iterable, ...)
    >>> import sys
    >>> sys.version
    '2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
    >>> m = map(lambda x: x, [1, 2, 3])  # Just a dummy lambda function
    >>> m, type(m)
    ([1, 2, 3], <type 'list'>)
    >>> len(m)
    3
    


    >>> import sys
    >>> sys.version
    '3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
    >>> m = map(lambda x: x, [1, 2, 3])
    >>> m, type(m)
    (<map object at 0x000001B4257342B0>, <class 'map'>)
    >>> len(m)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'map' has no len()
    >>> lm0 = list(m)  # Build a list from the generator
    >>> lm0, type(lm0)
    ([1, 2, 3], <class 'list'>)
    >>>
    >>> lm1 = list(m)  # Build a list from the same generator
    >>> lm1, type(lm1)  # Empty list now - generator already consumed
    ([], <class 'list'>)
    
  • The examples will be based on a directory called root_dir with the following structure (this example is for Win, but I'm using the same tree on Lnx as well):

    E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
    Folder PATH listing for volume Work
    Volume serial number is 00000029 3655:6FED
    E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
    ¦   file0
    ¦   file1
    ¦
    +---dir0
    ¦   +---dir00
    ¦   ¦   ¦   file000
    ¦   ¦   ¦
    ¦   ¦   +---dir000
    ¦   ¦           file0000
    ¦   ¦
    ¦   +---dir01
    ¦   ¦       file010
    ¦   ¦       file011
    ¦   ¦
    ¦   +---dir02
    ¦       +---dir020
    ¦           +---dir0200
    +---dir1
    ¦       file10
    ¦       file11
    ¦       file12
    ¦
    +---dir2
    ¦   ¦   file20
    ¦   ¦
    ¦   +---dir20
    ¦           file200
    ¦
    +---dir3
    


Solutions

Programmatic approaches:

  1. [Python 3]: os.listdir(path=".")

    Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries "." and ".." ...


    >>> import os
    >>> root_dir = "root_dir"  # Path relative to current dir (os.getcwd())
    >>>
    >>> os.listdir(root_dir)  # List all the items in root_dir
    ["dir0", "dir1", "dir2", "dir3", "file0", "file1"]
    >>>
    >>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))]  # Filter items and only keep files (strip out directories)
    ["file0", "file1"]
    

    A more elaborate example (code_os_listdir.py):

    import os
    from pprint import pformat
    
    
    def _get_dir_content(path, include_folders, recursive):
        entries = os.listdir(path)
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    yield entry_with_path
                if recursive:
                    for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                        yield sub_entry
            else:
                yield entry_with_path
    
    
    def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        for item in _get_dir_content(path, include_folders, recursive):
            yield item if prepend_folder_name else item[path_len:]
    
    
    def _get_dir_content_old(path, include_folders, recursive):
        entries = os.listdir(path)
        ret = list()
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    ret.append(entry_with_path)
                if recursive:
                    ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
            else:
                ret.append(entry_with_path)
        return ret
    
    
    def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]
    
    
    def main():
        root_dir = "root_dir"
        ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
        lret0 = list(ret0)
        print(ret0, len(lret0), pformat(lret0))
        ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
        print(len(ret1), pformat(ret1))
    
    
    if __name__ == "__main__":
        main()
    

    Notes:

    • There are two implementations:
      • One that uses generators (of course here it seems useless, since I immediately convert the result to a list)
      • The classic one (function names ending in _old)
    • Recursion is used (to get into subdirectories)
    • For each implementation there are two functions:
      • One that starts with an underscore (_): "private" (should not be called directly) - that does all the work
      • The public one (wrapper over previous): it just strips off the initial path (if required) from the returned entries. It's an ugly implementation, but it's the only idea that I could come up with at this point
    • In terms of performance, generators are generally a little bit faster (considering both creation and iteration times), but I didn't test them in recursive functions, and I am also iterating inside the function over inner generators - I don't know how performance-friendly that is
    • Play with the arguments to get different results


    Output:

    (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" "code_os_listdir.py"
    <generator object get_dir_content at 0x000001BDDBB3DF10> 22 ["root_dir\dir0",
     "root_dir\dir0\dir00",
     "root_dir\dir0\dir00\dir000",
     "root_dir\dir0\dir00\dir000\file0000",
     "root_dir\dir0\dir00\file000",
     "root_dir\dir0\dir01",
     "root_dir\dir0\dir01\file010",
     "root_dir\dir0\dir01\file011",
     "root_dir\dir0\dir02",
     "root_dir\dir0\dir02\dir020",
     "root_dir\dir0\dir02\dir020\dir0200",
     "root_dir\dir1",
     "root_dir\dir1\file10",
     "root_dir\dir1\file11",
     "root_dir\dir1\file12",
     "root_dir\dir2",
     "root_dir\dir2\dir20",
     "root_dir\dir2\dir20\file200",
     "root_dir\dir2\file20",
     "root_dir\dir3",
     "root_dir\file0",
     "root_dir\file1"]
    11 ["dir0\dir00\dir000\file0000",
     "dir0\dir00\file000",
     "dir0\dir01\file010",
     "dir0\dir01\file011",
     "dir1\file10",
     "dir1\file11",
     "dir1\file12",
     "dir2\dir20\file200",
     "dir2\file20",
     "file0",
     "file1"]
    


  2. [Python 3]: os.scandir(path=".") (Python 3.5+, backport: [PyPI]: scandir)

    Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries "." and ".." are not included.

    Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.


    >>> import os
    >>> root_dir = os.path.join(".", "root_dir")  # Explicitly prepending current directory
    >>> root_dir
    ".\root_dir"
    >>>
    >>> scandir_iterator = os.scandir(root_dir)
    >>> scandir_iterator
    <nt.ScandirIterator object at 0x00000268CF4BC140>
    >>> [item.path for item in scandir_iterator]
    [".\root_dir\dir0", ".\root_dir\dir1", ".\root_dir\dir2", ".\root_dir\dir3", ".\root_dir\file0", ".\root_dir\file1"]
    >>>
    >>> [item.path for item in scandir_iterator]  # Will yield an empty list as it was consumed by previous iteration (automatically performed by the list comprehension)
    []
    >>>
    >>> scandir_iterator = os.scandir(root_dir)  # Reinitialize the generator
    >>> for item in scandir_iterator:
    ...     if os.path.isfile(item.path):
    ...             print(item.name)
    ...
    file0
    file1
    

    Notes:

    • It"s similar to os.listdir
    • But it"s also more flexible (and offers more functionality), more Pythonic (and in some cases, faster)


  3. [Python 3]: os.walk(top, topdown=True, onerror=None, followlinks=False)

    Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).


    >>> import os
    >>> root_dir = os.path.join(os.getcwd(), "root_dir")  # Specify the full path
    >>> root_dir
    "E:\Work\Dev\StackOverflow\q003207219\root_dir"
    >>>
    >>> walk_generator = os.walk(root_dir)
    >>> root_dir_entry = next(walk_generator)  # First entry corresponds to the root dir (passed as an argument)
    >>> root_dir_entry
    ("E:\Work\Dev\StackOverflow\q003207219\root_dir", ["dir0", "dir1", "dir2", "dir3"], ["file0", "file1"])
    >>>
    >>> root_dir_entry[1] + root_dir_entry[2]  # Display dirs and files (direct descendants) in a single list
    ["dir0", "dir1", "dir2", "dir3", "file0", "file1"]
    >>>
    >>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]]  # Display all the entries in the previous list by their full path
    ["E:\Work\Dev\StackOverflow\q003207219\root_dir\dir0", "E:\Work\Dev\StackOverflow\q003207219\root_dir\dir1", "E:\Work\Dev\StackOverflow\q003207219\root_dir\dir2", "E:\Work\Dev\StackOverflow\q003207219\root_dir\dir3", "E:\Work\Dev\StackOverflow\q003207219\root_dir\file0", "E:\Work\Dev\StackOverflow\q003207219\root_dir\file1"]
    >>>
    >>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
    ...     print(entry)
    ...
    ("E:\Work\Dev\StackOverflow\q003207219\root_dir\dir0", ["dir00", "dir01", "dir02"], [])
    ("E:\Work\Dev\StackOverflow\q003207219\root_dir\dir0\dir00", ["dir000"], ["file000"])
    ("E:\Work\Dev\StackOverflow\q003207219\root_dir\dir0\dir00\dir000", [], ["file0000"])
    ("E:\Work\Dev\StackOverflow\q003207219\root_dir\dir0\dir01", [], ["file010", "file011"])
    ("E:\Work\Dev\StackOverflow\q003207219\root_dir\dir0\dir02", ["dir020"], [])
    ("E:\Work\Dev\StackOverflow\q003207219\root_dir\dir0\dir02\dir020", ["dir0200"], [])
    ("E:\Work\Dev\StackOverflow\q003207219\root_dir\dir0\dir02\dir020\dir0200", [], [])
    ("E:\Work\Dev\StackOverflow\q003207219\root_dir\dir1", [], ["file10", "file11", "file12"])
    ("E:\Work\Dev\StackOverflow\q003207219\root_dir\dir2", ["dir20"], ["file20"])
    ("E:\Work\Dev\StackOverflow\q003207219\root_dir\dir2\dir20", [], ["file200"])
    ("E:\Work\Dev\StackOverflow\q003207219\root_dir\dir3", [], [])
    

    Notes:

    • Under the hood, it uses os.scandir (os.listdir on older versions)
    • It does the heavy lifting by recursing into subfolders
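
    To tie this back to the question's "add them into a list" part, a minimal sketch that flattens the walk into a single list of file paths (same root_dir as above):

    import os

    all_files = []
    for dirpath, dirnames, filenames in os.walk("root_dir"):
        for filename in filenames:
            all_files.append(os.path.join(dirpath, filename))
    # Or, as a comprehension:
    # all_files = [os.path.join(d, f) for d, _, fs in os.walk("root_dir") for f in fs]
    print(all_files)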


  4. [Python 3]: glob.glob(pathname, *, recursive=False) ([Python 3]: glob.iglob(pathname, *, recursive=False))

    Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).
    ...
    Changed in version 3.5: Support for recursive globs using “**”.


    >>> import glob, os
    >>> wildcard_pattern = "*"
    >>> root_dir = os.path.join("root_dir", wildcard_pattern)  # Match every file/dir name
    >>> root_dir
    "root_dir\*"
    >>>
    >>> glob_list = glob.glob(root_dir)
    >>> glob_list
    ["root_dir\dir0", "root_dir\dir1", "root_dir\dir2", "root_dir\dir3", "root_dir\file0", "root_dir\file1"]
    >>>
    >>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list]  # Strip the dir name and the path separator from the beginning
    ["dir0", "dir1", "dir2", "dir3", "file0", "file1"]
    >>>
    >>> for entry in glob.iglob(root_dir + "*", recursive=True):
    ...     print(entry)
    ...
    root_dir
    root_dir\dir0
    root_dir\dir0\dir00
    root_dir\dir0\dir00\dir000
    root_dir\dir0\dir00\dir000\file0000
    root_dir\dir0\dir00\file000
    root_dir\dir0\dir01
    root_dir\dir0\dir01\file010
    root_dir\dir0\dir01\file011
    root_dir\dir0\dir02
    root_dir\dir0\dir02\dir020
    root_dir\dir0\dir02\dir020\dir0200
    root_dir\dir1
    root_dir\dir1\file10
    root_dir\dir1\file11
    root_dir\dir1\file12
    root_dir\dir2
    root_dir\dir2\dir20
    root_dir\dir2\dir20\file200
    root_dir\dir2\file20
    root_dir\dir3
    root_dir\file0
    root_dir\file1
    

    Notes:

    • Uses os.listdir
    • For large trees (especially if recursive is on), iglob is preferred
    • Allows advanced filtering based on name (due to the wildcard)
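
    Since the quoted docs mention recursive globs (Python 3.5+), a minimal sketch using the "**" pattern on the same tree:

    import glob, os

    # "**" with recursive=True matches entries at any depth
    pattern = os.path.join("root_dir", "**", "*")
    all_entries = glob.glob(pattern, recursive=True)
    files_only = [entry for entry in all_entries if os.path.isfile(entry)]
    print(files_only)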


  5. [Python 3]: class pathlib.Path(*pathsegments) (Python 3.4+, backport: [PyPI]: pathlib2)

    >>> import pathlib
    >>> root_dir = "root_dir"
    >>> root_dir_instance = pathlib.Path(root_dir)
    >>> root_dir_instance
    WindowsPath("root_dir")
    >>> root_dir_instance.name
    "root_dir"
    >>> root_dir_instance.is_dir()
    True
    >>>
    >>> [item.name for item in root_dir_instance.glob("*")]  # Wildcard searching for all direct descendants
    ["dir0", "dir1", "dir2", "dir3", "file0", "file1"]
    >>>
    >>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()]  # Display paths (including parent) for files only
    ["root_dir\file0", "root_dir\file1"]
    

    Notes:

    • This is one way of achieving our goal
    • It"s the OOP style of handling paths
    • Offers lots of functionalities
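
    For the recursive interpretation of the question, pathlib covers it with rglob; a minimal sketch:

    import pathlib

    # rglob("*") yields every descendant; keep files only
    all_files = [item for item in pathlib.Path("root_dir").rglob("*") if item.is_file()]
    print(all_files)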


  6. [Python 2]: dircache.listdir(path) (Python 2 only)


    def listdir(path):
        """List directory contents, using cache."""
        try:
            cached_mtime, list = cache[path]
            del cache[path]
        except KeyError:
            cached_mtime, list = -1, []
        mtime = os.stat(path).st_mtime
        if mtime != cached_mtime:
            list = os.listdir(path)
            list.sort()
        cache[path] = mtime, list
        return list
    


  7. [man7]: OPENDIR(3) / [man7]: READDIR(3) / [man7]: CLOSEDIR(3) via [Python 3]: ctypes - A foreign function library for Python (POSIX specific)

    ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.

    code_ctypes.py:

    #!/usr/bin/env python3
    
    import sys
    from ctypes import Structure, \
        c_ulonglong, c_longlong, c_ushort, c_ubyte, c_char, c_int, \
        CDLL, POINTER, \
        create_string_buffer, get_errno, set_errno, cast
    
    
    DT_DIR = 4
    DT_REG = 8
    
    char256 = c_char * 256
    
    
    class LinuxDirent64(Structure):
        _fields_ = [
            ("d_ino", c_ulonglong),
            ("d_off", c_longlong),
            ("d_reclen", c_ushort),
            ("d_type", c_ubyte),
            ("d_name", char256),
        ]
    
    LinuxDirent64Ptr = POINTER(LinuxDirent64)
    
    libc_dll = this_process = CDLL(None, use_errno=True)
    # ALWAYS set argtypes and restype for functions, otherwise it's UB!!!
    opendir = libc_dll.opendir
    readdir = libc_dll.readdir
    closedir = libc_dll.closedir
    
    
    def get_dir_content(path):
        ret = [path, list(), list()]
        dir_stream = opendir(create_string_buffer(path.encode()))
        if (dir_stream == 0):
            print("opendir returned NULL (errno: {:d})".format(get_errno()))
            return ret
        set_errno(0)
        dirent_addr = readdir(dir_stream)
        while dirent_addr:
            dirent_ptr = cast(dirent_addr, LinuxDirent64Ptr)
            dirent = dirent_ptr.contents
            name = dirent.d_name.decode()
            if dirent.d_type & DT_DIR:
                if name not in (".", ".."):
                    ret[1].append(name)
            elif dirent.d_type & DT_REG:
                ret[2].append(name)
            dirent_addr = readdir(dir_stream)
        if get_errno():
            print("readdir returned NULL (errno: {:d})".format(get_errno()))
        closedir(dir_stream)
        return ret
    
    
    def main():
        print("{:s} on {:s}
    ".format(sys.version, sys.platform))
        root_dir = "root_dir"
        entries = get_dir_content(root_dir)
        print(entries)
    
    
    if __name__ == "__main__":
        main()
    

    Notes:

    • It loads the three functions from libc (loaded in the current process) and calls them (for more details check [SO]: How do I check whether a file exists without exceptions? (@CristiFati's answer) - last notes from item #4.). That would place this approach very close to the Python / C edge
    • LinuxDirent64 is the ctypes representation of struct dirent64 from [man7]: dirent.h(0P) (so are the DT_ constants) from my machine: Ubuntu 16 x64 (4.10.0-40-generic and libc6-dev:amd64). On other flavors/versions, the struct definition might differ, and if so, the ctypes alias should be updated, otherwise it will yield Undefined Behavior
    • It returns data in os.walk's format. I didn't bother to make it recursive, but starting from the existing code, that would be a fairly trivial task
    • Everything is doable on Win as well, but the data (libraries, functions, structs, constants, ...) differs


    Output:

    [~/Work/Dev/StackOverflow/q003207219]> ./code_ctypes.py
    3.5.2 (default, Nov 12 2018, 13:43:14)
    [GCC 5.4.0 20160609] on linux
    
    ["root_dir", ["dir2", "dir1", "dir3", "dir0"], ["file1", "file0"]]
    


  8. [ActiveState.Docs]: win32file.FindFilesW (Win specific)

    Retrieves a list of matching filenames, using the Windows Unicode API. An interface to the FindFirstFileW / FindNextFileW / FindClose API functions.


    >>> import os, win32file, win32con
    >>> root_dir = "root_dir"
    >>> wildcard = "*"
    >>> root_dir_wildcard = os.path.join(root_dir, wildcard)
    >>> entry_list = win32file.FindFilesW(root_dir_wildcard)
    >>> len(entry_list)  # Don't display the whole content as it's too long
    8
    >>> [entry[-2] for entry in entry_list]  # Only display the entry names
    [".", "..", "dir0", "dir1", "dir2", "dir3", "file0", "file1"]
    >>>
    >>> [entry[-2] for entry in entry_list if entry[0] & win32con.FILE_ATTRIBUTE_DIRECTORY and entry[-2] not in (".", "..")]  # Filter entries and only display dir names (except self and parent)
    ["dir0", "dir1", "dir2", "dir3"]
    >>>
    >>> [os.path.join(root_dir, entry[-2]) for entry in entry_list if entry[0] & (win32con.FILE_ATTRIBUTE_NORMAL | win32con.FILE_ATTRIBUTE_ARCHIVE)]  # Only display file "full" names
    ["root_dir\file0", "root_dir\file1"]
    


  9. Install some (other) third-party package that does the trick
    • Most likely, it will rely on one (or more) of the above (maybe with slight customizations)


Notes:

  • Code is meant to be portable (except places that target a specific area - which are marked) or cross:

    • platform (Nix, Win, ...)
    • Python version (2, 3, ...)
  • Multiple path styles (absolute, relative) were used across the above variants, to illustrate the fact that the "tools" used are flexible in this direction

  • os.listdir and os.scandir use opendir / readdir / closedir ([MS.Docs]: FindFirstFileW function / [MS.Docs]: FindNextFileW function / [MS.Docs]: FindClose function) (via [GitHub]: python/cpython - (master) cpython/Modules/posixmodule.c)

  • win32file.FindFilesW uses those (Win specific) functions as well (via [GitHub]: mhammond/pywin32 - (master) pywin32/win32/src/win32file.i)

  • _get_dir_content (from point #1.) can be implemented using any of these approaches (some will require more work and some less)

    • Some advanced filtering (instead of just file vs. dir) could be done: e.g. the include_folders argument could be replaced by another one (e.g. filter_func) which would be a function that takes a path as an argument: filter_func=lambda x: True (this doesn't strip out anything) and inside _get_dir_content something like: if not filter_func(entry_with_path): continue (if the function fails for one entry, it will be skipped), but the more complex the code becomes, the longer it will take to execute (see the sketch after these notes)
  • Nota bene! Since recursion is used, I must mention that I did some tests on my laptop (Win 10 x64), totally unrelated to this problem, and when the recursion level was reaching values somewhere in the (990 .. 1000) range (recursionlimit defaults to 1000), I got StackOverflow :). If the directory tree exceeds that limit (I am not an FS expert, so I don't know if that is even possible), that could be a problem.
    I must also mention that I didn't try to increase recursionlimit because I have no experience in the area (how much can I increase it before having to also increase the stack at the OS level), but in theory there will always be the possibility of failure, if the dir depth is larger than the highest possible recursionlimit (on that machine)

  • The code samples are for demonstrative purposes only. That means that I didn't take error handling into account (I don't think there's any try / except / else / finally block), so the code is not robust (the reason is: to keep it as simple and short as possible). For production, error handling should be added as well
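
As a sketch of the filter_func idea from the notes above (the function name and the example predicate are purely illustrative, building on _get_dir_content from point #1.):

    import os

    def get_dir_content_filtered(path, filter_func=lambda p: True, recursive=True):
        # Yield entries accepted by filter_func; entries that fail it are
        # skipped entirely (including recursion into them, for directories)
        for entry in os.listdir(path):
            entry_with_path = os.path.join(path, entry)
            if not filter_func(entry_with_path):
                continue
            yield entry_with_path
            if recursive and os.path.isdir(entry_with_path):
                for sub_entry in get_dir_content_filtered(entry_with_path, filter_func, recursive):
                    yield sub_entry

    # Example: keep directories (so recursion continues) and files whose name starts with "file1"
    predicate = lambda p: os.path.isdir(p) or os.path.basename(p).startswith("file1")
    print(list(get_dir_content_filtered("root_dir", predicate)))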

Other approaches:

  1. Use Python only as a wrapper

    • Everything is done using another technology
    • That technology is invoked from Python
    • The most famous flavor that I know is what I call the system administrator approach:

      • Use Python (or any programming language for that matter) in order to execute shell commands (and parse their outputs)
      • Some consider this a neat hack
      • I consider it more like a lame workaround (a cheap trick), as the action per se is performed from the shell (cmd in this case), and thus doesn't have anything to do with Python.
      • Filtering (grep / findstr) or output formatting could be done on both sides, but I'm not going to insist on it. Also, I deliberately used os.system instead of subprocess.Popen.
      (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system('dir /b root_dir')"
      dir0
      dir1
      dir2
      dir3
      file0
      file1
      

    In general, this approach is to be avoided, since if some command output format differs slightly between OS versions/flavors, the parsing code should be adapted as well - not to mention differences between locales.
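
    For completeness, the subprocess counterpart the last note alludes to (still the same "wrapper" approach, just capturing the output instead of printing it; the command below is Windows-specific):

    import subprocess

    # shell=True is needed because "dir" is a cmd builtin; on Nix use e.g. ["ls", "root_dir"]
    output = subprocess.check_output("dir /b root_dir", shell=True)
    print(output.decode().splitlines())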

Answer #6

To get the full path to the directory a Python file is contained in, write this in that file:

import os 
dir_path = os.path.dirname(os.path.realpath(__file__))

(Note that the incantation above won't work if you've already used os.chdir() to change your current working directory, since the value of the __file__ constant is relative to the current working directory and is not changed by an os.chdir() call.)


To get the current working directory use

import os
cwd = os.getcwd()


Answer #7

The special variable __file__ contains the path to the current file. From that we can get the directory using either pathlib or the os.path module.

Python 3

For the directory of the script being run:

import pathlib
pathlib.Path(__file__).parent.resolve()

For the current working directory:

import pathlib
pathlib.Path().resolve()

Python 2 and 3

For the directory of the script being run:

import os
os.path.dirname(os.path.abspath(__file__))

If you mean the current working directory:

import os
os.path.abspath(os.getcwd())

Note that __file__ has two underscores before and after, not just one.

Also note that if you are running interactively or have loaded code from something other than a file (e.g. a database or online resource), __file__ may not be set since there is no notion of a "current file". The above answer assumes the most common scenario of running a Python script that is in a file.
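
A minimal defensive sketch for that interactive case (falling back to the current working directory is just one possible policy):

import pathlib

try:
    here = pathlib.Path(__file__).parent.resolve()
except NameError:
    # __file__ is not defined, e.g. in a REPL; fall back to the cwd
    here = pathlib.Path.cwd()
print(here)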

References

  1. pathlib in the Python documentation.
  2. os.path - Python 2.7, os.path - Python 3
  3. os.getcwd - Python 2.7, os.getcwd - Python 3
  4. what does the __file__ variable mean/do?

Answer #8

You"re looking for os.path.isdir, or os.path.exists if you don"t care whether it"s a file or a directory:

>>> import os
>>> os.path.isdir("new_folder")
True
>>> os.path.exists(os.path.join(os.getcwd(), "new_folder", "file.txt"))
False

Alternatively, you can use pathlib:

 >>> from pathlib import Path
 >>> Path("new_folder").is_dir()
 True
 >>> (Path.cwd() / "new_folder" / "file.txt").exists()
 False

Answer #9

Use nargs="?" (or nargs="*" if you need more than one dir)

parser.add_argument("dir", nargs="?", default=os.getcwd())

Extended example:

>>> import os, argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument("-v", action="store_true")
_StoreTrueAction(option_strings=["-v"], dest="v", nargs=0, const=True, default=False, type=None, choices=None, help=None, metavar=None)
>>> parser.add_argument("dir", nargs="?", default=os.getcwd())
_StoreAction(option_strings=[], dest="dir", nargs="?", const=None, default="/home/vinay", type=None, choices=None, help=None, metavar=None)
>>> parser.parse_args("somedir -v".split())
Namespace(dir="somedir", v=True)
>>> parser.parse_args("-v".split())
Namespace(dir="/home/vinay", v=True)
>>> parser.parse_args("".split())
Namespace(dir="/home/vinay", v=False)
>>> parser.parse_args(["somedir"])
Namespace(dir="somedir", v=False)
>>> parser.parse_args("somedir -h -v".split())
usage: [-h] [-v] [dir]

positional arguments:
  dir

optional arguments:
  -h, --help  show this help message and exit
  -v

Answer #10

unfortunately, this module needs to be inside the package, and it also needs to be runnable as a script, sometimes. Any idea how I could achieve that?

It"s quite common to have a layout like this...

main.py
mypackage/
    __init__.py
    mymodule.py
    myothermodule.py

...with a mymodule.py like this...

#!/usr/bin/env python3

# Exported function
def as_int(a):
    return int(a)

# Test function for module  
def _test():
    assert as_int("1") == 1

if __name__ == "__main__":
    _test()

...a myothermodule.py like this...

#!/usr/bin/env python3

from .mymodule import as_int

# Exported function
def add(a, b):
    return as_int(a) + as_int(b)

# Test function for module  
def _test():
    assert add("1", "1") == 2

if __name__ == "__main__":
    _test()

...and a main.py like this...

#!/usr/bin/env python3

from mypackage.myothermodule import add

def main():
    print(add("1", "1"))

if __name__ == "__main__":
    main()

...which works fine when you run main.py or mypackage/mymodule.py, but fails with mypackage/myothermodule.py, due to the relative import...

from .mymodule import as_int

The way you"re supposed to run it is...

python3 -m mypackage.myothermodule

...but it"s somewhat verbose, and doesn"t mix well with a shebang line like #!/usr/bin/env python3.

The simplest fix for this case, assuming the name mymodule is globally unique, would be to avoid using relative imports, and just use...

from mymodule import as_int

...although, if it"s not unique, or your package structure is more complex, you"ll need to include the directory containing your package directory in PYTHONPATH, and do it like this...

from mypackage.mymodule import as_int

...or if you want it to work "out of the box", you can frob the PYTHONPATH in code first with this...

import sys
import os

PACKAGE_PARENT = ".."
SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__))))
sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT)))

from mypackage.mymodule import as_int

It"s kind of a pain, but there"s a clue as to why in an email written by a certain Guido van Rossum...

I"m -1 on this and on any other proposed twiddlings of the __main__ machinery. The only use case seems to be running scripts that happen to be living inside a module"s directory, which I"ve always seen as an antipattern. To make me change my mind you"d have to convince me that it isn"t.

Whether running scripts inside a package is an antipattern or not is subjective, but personally I find it really useful in a package I have which contains some custom wxPython widgets, so I can run the script for any of the source files to display a wx.Frame containing only that widget for testing purposes.
