os.path.basename() in Python is used to get the base name of a given path. This method internally uses the os.path.split() method to split the specified path into a pair (head, tail).
os.path.basename() returns the tail after splitting the specified path into a pair (head, tail).
Syntax: os.path.basename (path)
path : A path-like object representing a file system path.
Return Type: This method returns a string value which represents the base name of the specified path.
Code: method usage
Documents file.txt file.txt
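A minimal sketch of the usage described above. The three paths are hypothetical, and posixpath (the POSIX flavour of os.path) is used here only so that the result does not depend on the operating system running the example.

```python
import posixpath  # POSIX flavour of os.path; behaves the same on every OS

# Hypothetical paths chosen for illustration.
paths = [
    "/home/User/Documents",
    "/home/User/Documents/file.txt",
    "file.txt",
]

# basename() returns the tail of the (head, tail) pair produced by split().
base_names = [posixpath.basename(path) for path in paths]
print(base_names)  # ['Documents', 'file.txt', 'file.txt']
```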
What is the difference between
I already searched for answers and read some links, but didn't understand. Can anyone give a simple explanation?
You can use
Return the directory name of pathname path. This is the first element of the pair returned by passing path to the function split().
And given the full path, then you can split normally to get the last portion of the path. For example, by using
Return the base name of pathname path. This is the second element of the pair returned by passing path to the function split(). Note that the result of this function is different from the Unix basename program; where basename for "/foo/bar/" returns "bar", the basename() function returns an empty string ("").
>>> import os
>>> path = os.path.dirname("C:/folder1/folder2/filename.xml")
>>> path
'C:/folder1/folder2'
>>> os.path.basename(path)
'folder2'
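The relationship between split(), dirname() and basename() quoted above, including the trailing-slash case, can be checked with a short sketch. The path is hypothetical, and posixpath is used so the output is the same on any platform.

```python
import posixpath  # same behaviour as os.path on Linux and macOS

path = "/foo/bar/item.txt"  # hypothetical path
head, tail = posixpath.split(path)

# dirname() is the first element of split(), basename() is the second.
same_as_split = (posixpath.dirname(path), posixpath.basename(path)) == (head, tail)

# With a trailing slash the tail, and therefore the base name, is empty.
trailing_head, trailing_tail = posixpath.split("/foo/bar/")

print(head, tail)           # /foo/bar item.txt
print(same_as_split)        # True
print(repr(trailing_tail))  # ''
```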
For completeness' sake, I thought it would be worthwhile summarizing the various possible outcomes and supplying references for the exact behaviour of each.
The answer is composed of four sections:
A list of different approaches that return the full path to the currently executing script.
A caveat regarding handling of relative paths.
A recommendation regarding handling of symbolic links.
An account of a few methods that could be used to extract the actual file name, with or without its suffix, from the full file path.
__file__ is the currently executing file, as detailed in the official documentation:
__file__ is the pathname of the file from which the module was loaded, if it was loaded from a file. The
__file__ attribute may be missing for certain types of modules, such as C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.
From Python 3.4 onwards, per issue 18416,
__file__ is always an absolute path, unless the currently executing file is a script that has been executed directly (not via the interpreter with the
-m command line option) using a relative path.
__main__.__file__ (requires importing
__main__) simply accesses the aforementioned
__file__ attribute of the main module, e.g. of the script that was invoked from the command line.
sys.argv[0] (requires importing
sys) is the script name that was invoked from the command line, and might be an absolute path, as detailed in the official documentation:
argv[0] is the script name (it is operating system dependent whether this is a full pathname or not). If the command was executed using the
-c command line option to the interpreter,
argv[0] is set to the string
'-c'. If no script name was passed to the Python interpreter,
argv[0] is the empty string.
As mentioned in another answer to this question, Python scripts that were converted into stand-alone executable programs via tools such as py2exe or PyInstaller might not display the desired result when using this approach (i.e.
sys.argv[0] would hold the name of the executable rather than the name of the main Python file within that executable).
If none of the aforementioned options seem to work, probably due to an atypical execution process or an irregular import operation, the inspect module might prove useful. In particular, invoking
inspect.stack()[-1][1] should work, although it would raise an exception when running in an implementation without Python stack frame support.
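The approaches listed so far can be compared side by side with a small sketch. The helper name candidate_script_paths is hypothetical, and the values it reports depend entirely on how the program was launched, so no particular output is implied.

```python
import inspect
import sys


def candidate_script_paths():
    """Collect what each approach reports; None means it was unavailable."""
    main_module = sys.modules.get("__main__")
    candidates = {
        # __file__ may be missing, e.g. in an interactive session.
        "__main__.__file__": getattr(main_module, "__file__", None),
        "sys.argv[0]": sys.argv[0] if sys.argv else None,
    }
    try:
        # The outermost frame belongs to the code that started the program;
        # index 1 of a frame record is its file name.
        candidates["inspect.stack()[-1][1]"] = inspect.stack()[-1][1]
    except Exception:  # e.g. an implementation without stack frame support
        candidates["inspect.stack()[-1][1]"] = None
    return candidates


for approach, value in candidate_script_paths().items():
    print(f"{approach}: {value!r}")
```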
From Python 3.6 onwards, and as detailed in another answer to this question, it's possible to install an external open source library, lib_programname, which is tailored to provide a complete solution to this problem.
This library iterates through all of the approaches listed above until a valid path is returned. If all of them fail, it raises an exception. It also tries to address various pitfalls, such as invocations via the pytest framework or the pydoc module.
import lib_programname
# this returns the fully resolved path to the launched python program
path_to_program = lib_programname.get_path_executed_script()  # type: pathlib.Path
When dealing with an approach that happens to return a relative path, it might be tempting to invoke various path manipulation functions, such as
os.path.realpath(...) in order to extract the full or real path.
However, these methods rely on the current path in order to derive the full path. Thus, if a program first changes the current working directory, for example via
os.chdir(...), and only then invokes these methods, they would return an incorrect path.
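A minimal sketch of this caveat, assuming a hypothetical relative path such as the one sys.argv[0] might hold. It resolves the same relative path before and after a directory change, using a temporary directory as the new working directory.

```python
import os
import tempfile

relative_script = "script.py"  # hypothetical relative path
start_dir = os.getcwd()

# Resolved against the directory the program was started from.
resolved_early = os.path.abspath(relative_script)

with tempfile.TemporaryDirectory() as other_dir:
    os.chdir(other_dir)
    # The same call now resolves against other_dir instead.
    resolved_late = os.path.abspath(relative_script)
    os.chdir(start_dir)  # restore before the temporary directory is removed

print(resolved_early == resolved_late)  # False
```

Resolving the path once at startup, before any os.chdir(...) call, and reusing the stored value avoids the problem.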
If the current script is a symbolic link, then all of the above would return the path of the symbolic link rather than the path of the real file and
os.path.realpath(...) should be invoked in order to extract the latter.
os.path.basename(...) may be invoked on any of the above in order to extract the actual file name and
os.path.splitext(...) may be invoked on the actual file name in order to truncate its suffix, as in
From Python 3.4 onwards, per PEP 428, the
PurePath class of the
pathlib module may be used as well on any of the above. Specifically,
pathlib.PurePath(...).name extracts the actual file name and
pathlib.PurePath(...).stem extracts the actual file name without its suffix.
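Both ways of extracting the file name can be sketched as follows. The path is hypothetical and stands in for whatever one of the approaches above returned.

```python
import os.path
from pathlib import PurePath

script_path = "/home/user/project/main.py"  # hypothetical full path

# os.path flavour: take the base name, then drop the suffix.
file_name = os.path.basename(script_path)
name_without_suffix = os.path.splitext(file_name)[0]

# pathlib flavour: .name and .stem give the same two values.
pure = PurePath(script_path)

print(file_name, name_without_suffix)  # main.py main
print(pure.name, pure.stem)            # main.py main
```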
S3 is an object store; it doesn't have a real directory structure. The "/" is rather cosmetic. One reason people want a directory structure is that they can maintain, prune, or add a tree to the application. For S3, you treat such a structure as a sort of index or search tag.
To manipulate objects in S3, you need boto3.client or boto3.resource, e.g. to list all objects:
import boto3
s3 = boto3.client("s3")
all_objects = s3.list_objects(Bucket="bucket-name")
In fact, if the S3 object names are stored using the "/" separator, the more recent version of list_objects (list_objects_v2) allows you to limit the response to keys that begin with a specified prefix.
To limit the items to items under certain sub-folders:
import boto3
s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket=BUCKET,
    Prefix="DIR1/DIR2",
    MaxKeys=100,
)
Another option is to use Python's os.path functions to extract the folder prefix. The problem is that this requires listing objects from undesired directories.
import os
s3_key = "first-level/1456753904534/part-00014"
filename = os.path.basename(s3_key)
foldername = os.path.dirname(s3_key)

# if you are using an unconventional delimiter such as "#"
s3_key = "first-level#1456753904534#part-00014"
filename = s3_key.split("#")[-1]
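Because S3 keys use "/" regardless of the operating system running the code, one option is to split them with posixpath instead of os.path, so the result does not depend on the local platform's separator rules. This is a sketch under that assumption; split_s3_key is a hypothetical helper, not part of boto3.

```python
import posixpath


def split_s3_key(key):
    """Return (prefix, object_name) for a "/"-delimited S3 key."""
    return posixpath.dirname(key), posixpath.basename(key)


prefix, name = split_s3_key("first-level/1456753904534/part-00014")
print(prefix)  # first-level/1456753904534
print(name)    # part-00014

# For a custom delimiter, str.rpartition splits on the last occurrence only.
custom_prefix, _, custom_name = "first-level#1456753904534#part-00014".rpartition("#")
print(custom_prefix)  # first-level#1456753904534
print(custom_name)    # part-00014
```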
A reminder about boto3: boto3.resource is a nice high-level API. There are pros and cons to using boto3.client vs boto3.resource. If you develop an internal shared library, using boto3.resource will give you a black-box layer over the resources used.
Here is a summary of experiments with Python 2 and 3. With
main.py - runs foo.py
foo.py - runs lib/bar.py
lib/bar.py - prints filepath expressions
| Python | Run statement       | Filepath expression                    |
|--------+---------------------+----------------------------------------|
| 2      | execfile            | os.path.abspath(inspect.stack()[0][1]) |
| 2      | from lib import bar | __file__                               |
| 3      | exec                | (wasn't able to obtain it)             |
| 3      | import lib.bar      | __file__                               |
For Python 2, it might be clearer to switch to packages so you can use
from lib import bar - just add empty
__init__.py files to the two folders.
For Python 3,
execfile doesn't exist - the nearest alternative is
exec(open(<filename>).read()), though this affects the stack frames. It's simplest to just use
import foo and
import lib.bar - no
__init__.py files needed.
Here is an experiment based on the answers in this thread - with Python 2.7.10 on Windows.
The stack-based ones are the only ones that seem to give reliable results. The last two have the shortest syntax, i.e. -
print os.path.abspath(inspect.stack()[0][1])                   # C:\filepaths\lib\bar.py
print os.path.dirname(os.path.abspath(inspect.stack()[0][1]))  # C:\filepaths\lib
Here's to these being added to sys as functions! Credit to @Usagi and @pablog
Based on the following three files, and running main.py from its folder with
python main.py (also tried execfiles with absolute paths and calling from a separate folder).
import sys
import os
import inspect

print "Python " + sys.version
print

print __file__                                         # main.py
print sys.argv[0]                                      # main.py
print inspect.stack()[0][1]                            # lib/bar.py
print sys.path[0]                                      # C:\filepaths
print

print os.path.realpath(__file__)                       # C:\filepaths\main.py
print os.path.abspath(__file__)                        # C:\filepaths\main.py
print os.path.basename(__file__)                       # main.py
print os.path.basename(os.path.realpath(sys.argv[0]))  # main.py
print

print sys.path[0]                                      # C:\filepaths
print os.path.abspath(os.path.split(sys.argv[0])[0])   # C:\filepaths
print os.path.dirname(os.path.abspath(__file__))       # C:\filepaths
print os.path.dirname(os.path.realpath(sys.argv[0]))   # C:\filepaths
print os.path.dirname(__file__)                        # (empty string)
print

print inspect.getfile(inspect.currentframe())          # lib/bar.py
print os.path.abspath(inspect.getfile(inspect.currentframe()))                   # C:\filepaths\lib\bar.py
print os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))  # C:\filepaths\lib
print

print os.path.abspath(inspect.stack()[0][1])                   # C:\filepaths\lib\bar.py
print os.path.dirname(os.path.abspath(inspect.stack()[0][1]))  # C:\filepaths\lib
print
The exec system call of the Linux kernel understands shebangs (#!).
When you do on bash:
on Linux, this calls the
exec system call with the path
This line of the kernel gets called on the file passed to
if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
It reads the very first bytes of the file, and compares them to #!.
If the comparison is true, then the rest of the line is parsed by the Linux kernel, which makes another
exec call with:
therefore equivalent to:
/usr/bin/env python /path/to/script.py
env is an executable that searches
PATH to e.g. find
/usr/bin/python, and then finally calls:
The Python interpreter does see the
#! line in the file, but
# is the comment character in Python, so that line just gets ignored as a regular comment.
And yes, you can make an infinite loop with:
printf '#!/a\n' | sudo tee /a
sudo chmod +x /a
/a
Bash recognizes the error:
-bash: /a: /a: bad interpreter: Too many levels of symbolic links
#! just happens to be human readable, but that is not required.
If the file started with different bytes, then the
exec system call would use a different handler. The other most important built-in handler is for ELF executable files: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_elf.c#L1305 which checks for bytes
7f 45 4c 46 (which also happens to be human readable for
.ELF). Let's confirm that by reading the 4 first bytes of
/bin/ls, which is an ELF executable:
head -c 4 "$(which ls)" | hd
00000000  7f 45 4c 46  |.ELF|
00000004
So when the kernel sees those bytes, it takes the ELF file, puts it into memory correctly, and starts a new process with it. See also: How does kernel get an executable binary file running under linux?
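The magic-byte dispatch described above can be imitated in user space with a short sketch. This is only an illustration of the idea, not the kernel's logic: classify_executable and parse_shebang are hypothetical helpers, and treating everything after the first space as one argument is an assumption about how the rest of the line is handled.

```python
def classify_executable(first_bytes):
    """Guess the handler from the leading bytes of a file."""
    if first_bytes.startswith(b"#!"):
        return "script"
    if first_bytes.startswith(b"\x7fELF"):
        return "elf"
    return "unknown"


def parse_shebang(first_line):
    """Split a shebang line into (interpreter, argument), or return None."""
    if not first_line.startswith(b"#!"):
        return None
    rest = first_line[2:].strip().decode()
    interpreter, _, argument = rest.partition(" ")
    return interpreter, argument.strip()


print(classify_executable(b"#!/usr/bin/env python\n"))  # script
print(classify_executable(b"\x7fELF\x02\x01\x01"))      # elf
print(parse_shebang(b"#!/usr/bin/env python\n"))        # ('/usr/bin/env', 'python')
```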
Finally, you can add your own shebang handlers with the
binfmt_misc mechanism. For example, you can add a custom handler for
.jar files. This mechanism even supports handlers by file extension. Another application is to transparently run executables of a different architecture with QEMU.
I don't think POSIX specifies shebangs however: https://unix.stackexchange.com/a/346214/32558 , although it does mention them in rationale sections, in the form "if executable scripts are supported by the system something may happen". macOS and FreeBSD also seem to implement it however.
PATH search motivation
Likely, one big motivation for the existence of shebangs is the fact that in Linux, we often want to run commands from
PATH just as:
But then, without the shebang mechanism, how would Linux know how to launch each type of file?
Hardcoding the extension in commands:
or implementing PATH search on every interpreter:
would be a possibility, but this has the major problem that everything breaks if we ever decide to refactor the command into another language.
Shebangs solve this problem beautifully.
Major use case of
pyenv and other version managers
One major use case of why you should use
#!/usr/bin/env python instead of just
/usr/bin/python is that of version managers with
pyenv allows you to easily install multiple python versions on a single machine, to be able to better reproduce other projects without virtualization.
Then, it manages the "current" Python version by setting its order in the PATH: e.g. as shown at apt-get install for different python versions, a pyenv-managed Python could be located at:
so nowhere close to
/usr/bin/python, which some systems might deal with via
Actually, there's a function that returns exactly what you want
import os
print(os.path.basename(your_path))
If os.path.basename() is used on a POSIX system to get the base name from a Windows styled path (e.g.
"C:\my\file.txt"), the entire path will be returned.
Example below from interactive python shell running on a Linux host:
Python 3.8.2 (default, Mar 13 2020, 10:14:16)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> filepath = "C:\\my\\path\\to\\file.txt"  # A Windows style file path.
>>> os.path.basename(filepath)
'C:\\my\\path\\to\\file.txt'
os.path.basename as others suggest won't work in all cases: if you're running the script on Linux and attempt to process a classic Windows-style path, it will fail.
Windows paths can use either backslash or forward slash as path separator. Therefore, the
ntpath module (which is equivalent to os.path when running on Windows) will work for all(1) paths on all platforms.
import ntpath
ntpath.basename("a/b/c")
Of course, if the file ends with a slash, the basename will be empty, so make your own function to deal with it:
def path_leaf(path):
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)
>>> paths = ["a/b/c/", "a/b/c", "\\a\\b\\c", "\\a\\b\\c\\", "a\\b\\c",
...          "a/b/../../a/b/c/", "a/b/../../a/b/c"]
>>> [path_leaf(path) for path in paths]
['c', 'c', 'c', 'c', 'c', 'c', 'c']
(1) There's one caveat: Linux filenames may contain backslashes. So on Linux,
r"a/b\c" always refers to the file
b\c in the
a folder, while on Windows, it always refers to the
c file in the
b subfolder of the
a folder. So when both forward and backward slashes are used in a path, you need to know the associated platform to be able to interpret it correctly. In practice it's usually safe to assume it's a Windows path since backslashes are seldom used in Linux filenames, but keep this in mind when you code so you don't create accidental security holes.
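The caveat in the footnote can be seen by feeding the same mixed-separator string to both path flavours. Both modules are importable on every platform, so this sketch runs the same way on Linux and Windows.

```python
import ntpath     # Windows path rules: both "/" and "\" are separators
import posixpath  # POSIX path rules: only "/" is a separator

ambiguous = r"a/b\c"

posix_leaf = posixpath.basename(ambiguous)
windows_leaf = ntpath.basename(ambiguous)

print(posix_leaf)    # b\c
print(windows_leaf)  # c
```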
dst to be a directory (instead of the complete target filename), in which case the basename of
src is used for creating the new file;
Here is a short example:
import shutil
shutil.copy2("/src/dir/file.ext", "/dst/dir/newname.ext")  # complete target filename given
shutil.copy2("/src/file.ext", "/dst/dir")  # target filename is /dst/dir/file.ext
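The directory-as-destination behaviour can be tried safely inside a temporary directory. The file and directory names below are hypothetical; the sketch relies on shutil.copy2 returning the path of the newly created file.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "file.ext")
    dst_dir = os.path.join(workdir, "dst")
    os.mkdir(dst_dir)
    with open(src, "w") as handle:
        handle.write("example")

    # dst is a directory, so the base name of src names the new file.
    copied_to = shutil.copy2(src, dst_dir)
    copied_name = os.path.basename(copied_to)
    copied_exists = os.path.exists(os.path.join(dst_dir, "file.ext"))

print(copied_name)    # file.ext
print(copied_exists)  # True
```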
You can use
__file__ to get the name of the current file. When used in the main module, this is the name of the script that was originally invoked.
If you want to omit the directory part (which might be present), you can use
You can make your own with:
>>> import os
>>> base = os.path.basename("/root/dir/sub/file.ext")
>>> base
'file.ext'
>>> os.path.splitext(base)
('file', '.ext')
>>> os.path.splitext(base)[0]
'file'
Important note: If there is more than one
. in the filename, only the last one is removed. For example:
/root/dir/sub/file.ext.zip -> file.ext
/root/dir/sub/file.ext.tar.gz -> file.ext.tar
See below for other answers that address that.
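One possible way to drop every suffix rather than only the last one is sketched here with pathlib. strip_all_suffixes is a hypothetical helper, and it assumes that every dot-separated part after the first is a suffix, which is wrong for names such as report.v1.2.txt.

```python
from pathlib import PurePosixPath


def strip_all_suffixes(path):
    """Return the file name with all of its suffixes removed."""
    pure = PurePosixPath(path)
    all_suffixes = "".join(pure.suffixes)  # e.g. ".ext.tar.gz"
    return pure.name[: len(pure.name) - len(all_suffixes)]


print(strip_all_suffixes("/root/dir/sub/file.ext"))         # file
print(strip_all_suffixes("/root/dir/sub/file.ext.zip"))     # file
print(strip_all_suffixes("/root/dir/sub/file.ext.tar.gz"))  # file
```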