Python | os.path.basename() method

os.path.basename() in Python returns the base name of a given path. Internally it calls os.path.split() to split the specified path into a pair (head, tail) and returns the tail.

Syntax: os.path.basename(path)

Parameter:
path : A path-like object representing a file system path.

Return Type: This method returns a string that represents the base name of the specified path.
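As a quick check of the relationship described above, the following sketch compares os.path.basename() with the second element of os.path.split() and passes a pathlib object to show that path-like objects are accepted. The sample paths are illustrative only, and PurePosixPath is used here just so the example does not touch the file system.

```python
import os.path
from pathlib import PurePosixPath

# Illustrative paths (they do not need to exist on disk)
sample_paths = ["/home/User/Documents", "/home/User/Documents/file.txt", "file.txt"]

for path in sample_paths:
    head, tail = os.path.split(path)
    # basename() returns the tail of the (head, tail) pair
    assert os.path.basename(path) == tail
    print(repr(head), repr(tail))

# A path-like object is accepted as well; the result is still a str
from_path_object = os.path.basename(PurePosixPath("/home/User/Documents/file.txt"))
print(type(from_path_object).__name__, from_path_object)
```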

Code: Using the os.path.basename() method

# Python program to explain the os.path.basename() method

# Import the os.path module
import os.path

# Path
path = '/home/User/Documents'

# The path above will be split into
# the (head, tail) pair
# ('/home/User', 'Documents')

# Get the base name of the specified path
basename = os.path.basename(path)

# Print the base name
print(basename)

# Path
path = '/home/User/Documents/file.txt'

# The path above will be split into
# the (head, tail) pair
# ('/home/User/Documents', 'file.txt')

# Get the base name of the specified path
basename = os.path.basename(path)

# Print the base name
print(basename)

# Path
path = 'file.txt'

# The path above will be split into
# the (head, tail) pair ('', 'file.txt'),
# so file.txt will be printed

# Get the base name of the specified path
basename = os.path.basename(path)

# Print the base name
print(basename)

Output:

Documents
file.txt
file.txt

Link: https://docs.python.org/3/library/os.path.html





Python | os.path.basename() method: StackOverflow Questions

What is the difference between os.path.basename() and os.path.dirname()?

I already searched for answers and read some links, but didn't understand. Can anyone give a simple explanation?

Answer #1

You can use dirname:

os.path.dirname(path)

Return the directory name of pathname path. This is the first element of the pair returned by passing path to the function split().

And given the full path, then you can split normally to get the last portion of the path. For example, by using basename:

os.path.basename(path)

Return the base name of pathname path. This is the second element of the pair returned by passing path to the function split(). Note that the result of this function is different from the Unix basename program; where basename for "/foo/bar/" returns "bar", the basename() function returns an empty string ("").
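To make the trailing-slash note concrete, here is a small sketch. The path "/foo/bar/" comes from the note above; stripping the slash or normalizing the path first are just two possible workarounds, not something the documentation prescribes.

```python
import os.path

path_with_slash = "/foo/bar/"

# basename() returns whatever follows the final separator, which is empty here
plain = os.path.basename(path_with_slash)

# Stripping trailing separators first gives the Unix-like answer
stripped = os.path.basename(path_with_slash.rstrip("/"))

# normpath() also drops the trailing separator (and collapses ".." and ".")
normalized = os.path.basename(os.path.normpath(path_with_slash))

print(repr(plain), repr(stripped), repr(normalized))
```

Keep in mind that normpath() collapses ".." components textually, which may change the meaning of a path that contains symbolic links.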


All together:

>>> import os
>>> path = os.path.dirname("C:/folder1/folder2/filename.xml")
>>> path
'C:/folder1/folder2'
>>> os.path.basename(path)
'folder2'

Answer #2

For completeness' sake, I thought it would be worthwhile summarizing the various possible outcomes and supplying references for the exact behaviour of each.

The answer is composed of four sections:

  1. A list of different approaches that return the full path to the currently executing script.

  2. A caveat regarding handling of relative paths.

  3. A recommendation regarding handling of symbolic links.

  4. An account of a few methods that could be used to extract the actual file name, with or without its suffix, from the full file path.


Extracting the full file path

  • __file__ is the currently executing file, as detailed in the official documentation:

    __file__ is the pathname of the file from which the module was loaded, if it was loaded from a file. The __file__ attribute may be missing for certain types of modules, such as C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.

    From Python 3.4 onwards, per issue 18416, __file__ is always an absolute path, unless the currently executing file is a script that has been executed directly (not via the interpreter with the -m command line option) using a relative path.

  • __main__.__file__ (requires importing __main__) simply accesses the aforementioned __file__ attribute of the main module, e.g. of the script that was invoked from the command line.

    From Python 3.9 onwards, per issue 20443, the __file__ attribute of the __main__ module became an absolute path, rather than a relative path.

  • sys.argv[0] (requires importing sys) is the script name that was invoked from the command line, and might be an absolute path, as detailed in the official documentation:

    argv[0] is the script name (it is operating system dependent whether this is a full pathname or not). If the command was executed using the -c command line option to the interpreter, argv[0] is set to the string "-c". If no script name was passed to the Python interpreter, argv[0] is the empty string.

    As mentioned in another answer to this question, Python scripts that were converted into stand-alone executable programs via tools such as py2exe or PyInstaller might not display the desired result when using this approach (i.e. sys.argv[0] would hold the name of the executable rather than the name of the main Python file within that executable).

  • If none of the aforementioned options seem to work, probably due to an atypical execution process or an irregular import operation, the inspect module might prove useful. In particular, invoking inspect.stack()[-1][1] should work, although it would raise an exception when running in an implementation without Python stack frame support.

  • From Python 3.6 onwards, and as detailed in another answer to this question, it's possible to install an external open source library, lib_programname, which is tailored to provide a complete solution to this problem.

    This library iterates through all of the approaches listed above until a valid path is returned. If all of them fail, it raises an exception. It also tries to address various pitfalls, such as invocations via the pytest framework or the pydoc module.

    import lib_programname
    # this returns the fully resolved path to the launched python program
    path_to_program = lib_programname.get_path_executed_script()  # type: pathlib.Path
    

Handling relative paths

When dealing with an approach that happens to return a relative path, it might be tempting to invoke various path manipulation functions, such as os.path.abspath(...) or os.path.realpath(...) in order to extract the full or real path.

However, these methods rely on the current path in order to derive the full path. Thus, if a program first changes the current working directory, for example via os.chdir(...), and only then invokes these methods, they would return an incorrect path.
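The caveat above can be reproduced with a short sketch. The name "script.py" is a hypothetical relative path (the file does not need to exist), and a temporary directory stands in for whatever directory a program might change into.

```python
import os
import tempfile

relative_name = "script.py"  # hypothetical relative path

original_cwd = os.getcwd()
before = os.path.abspath(relative_name)

with tempfile.TemporaryDirectory() as other_dir:
    os.chdir(other_dir)
    try:
        # abspath() now resolves against the new working directory
        after = os.path.abspath(relative_name)
    finally:
        # Restore the working directory before the temp dir is removed
        os.chdir(original_cwd)

print(before)
print(after)
print(before == after)
```

One way to sidestep the problem is to resolve the path once at program start, before any os.chdir(...) call, and keep the absolute result.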


Handling symbolic links

If the current script is a symbolic link, then all of the above would return the path of the symbolic link rather than the path of the real file and os.path.realpath(...) should be invoked in order to extract the latter.


Further manipulations that extract the actual file name

os.path.basename(...) may be invoked on any of the above in order to extract the actual file name and os.path.splitext(...) may be invoked on the actual file name in order to truncate its suffix, as in os.path.splitext(os.path.basename(...)).

From Python 3.4 onwards, per PEP 428, the PurePath class of the pathlib module may be used as well on any of the above. Specifically, pathlib.PurePath(...).name extracts the actual file name and pathlib.PurePath(...).stem extracts the actual file name without its suffix.
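A minimal side-by-side sketch of the two approaches mentioned above, using an illustrative path. PurePosixPath is chosen here only so the result is the same on every platform; PurePath would pick the flavour of the host system.

```python
import os.path
from pathlib import PurePosixPath

full_path = "/root/dir/sub/script.py"  # illustrative path

# os.path approach: base name, then drop the last suffix
name_os = os.path.basename(full_path)
stem_os = os.path.splitext(name_os)[0]

# pathlib approach: the same two values as attributes
pure = PurePosixPath(full_path)
name_pathlib = pure.name
stem_pathlib = pure.stem

print(name_os, stem_os)
print(name_pathlib, stem_pathlib)
```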

Answer #3

S3 is an object store; it doesn't have a real directory structure. The "/" is rather cosmetic. One reason people want a directory structure is that it lets them maintain, prune, or add a tree in the application. For S3, you treat such a structure as a sort of index or search tag.

To manipulate objects in S3, you need boto3.client or boto3.resource. For example, to list all objects:

import boto3 
s3 = boto3.client("s3")
all_objects = s3.list_objects(Bucket = "bucket-name") 

http://boto3.readthedocs.org/en/latest/reference/services/s3.html#S3.Client.list_objects

In fact, if the S3 object names are stored using the "/" separator, the more recent version of list_objects (list_objects_v2) allows you to limit the response to keys that begin with the specified prefix.

To limit the items to items under certain sub-folders:

    import boto3 
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(
            Bucket=BUCKET,
            Prefix ="DIR1/DIR2",
            MaxKeys=100 )

Another option is to use Python's os.path functions to extract the folder prefix. The problem is that this requires listing objects from undesired directories.

import os
s3_key = "first-level/1456753904534/part-00014"
filename = os.path.basename(s3_key) 
foldername = os.path.dirname(s3_key)

# if you are using a non-conventional delimiter such as "#"
s3_key = "first-level#1456753904534#part-00014"
filename = s3_key.split("#")[-1]
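As a rough illustration of the client-side approach, the sketch below filters a hard-coded list of keys instead of calling S3. The keys and the prefix are hypothetical. It uses posixpath rather than os.path as a design choice, so that "/" is the only separator considered no matter which operating system runs the code.

```python
import posixpath

# Hypothetical keys, as they might come back from a bucket listing
keys = [
    "first-level/1456753904534/part-00014",
    "first-level/1456753904534/part-00015",
    "first-level/other/part-00001",
    "second-level/readme.txt",
]

wanted_prefix = "first-level/1456753904534/"  # hypothetical "folder"

# File names under the wanted prefix
file_names = [posixpath.basename(key) for key in keys if key.startswith(wanted_prefix)]

# Distinct "folder" prefixes seen in the listing
folders = sorted({posixpath.dirname(key) for key in keys})

print(file_names)
print(folders)
```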

A reminder about boto3: boto3.resource is a nice high-level API. There are pros and cons to using boto3.client vs boto3.resource. If you develop an internal shared library, using boto3.resource will give you a black-box layer over the resources used.

Answer #4

Update 2018-11-28:

Here is a summary of experiments with Python 2 and 3. With

main.py - runs foo.py
foo.py - runs lib/bar.py
lib/bar.py - prints filepath expressions

| Python | Run statement       | Filepath expression                    |
|--------+---------------------+----------------------------------------|
|      2 | execfile            | os.path.abspath(inspect.stack()[0][1]) |
|      2 | from lib import bar | __file__                               |
|      3 | exec                | (wasn't able to obtain it)             |
|      3 | import lib.bar      | __file__                               |

For Python 2, it might be clearer to switch to packages so you can use from lib import bar - just add empty __init__.py files to the two folders.

For Python 3, execfile doesn't exist - the nearest alternative is exec(open(<filename>).read()), though this affects the stack frames. It's simplest to just use import foo and import lib.bar - no __init__.py files needed.

See also Difference between import and execfile


Original Answer:

Here is an experiment based on the answers in this thread - with Python 2.7.10 on Windows.

The stack-based ones are the only ones that seem to give reliable results. The last two have the shortest syntax, i.e. -

print os.path.abspath(inspect.stack()[0][1])                   # C:\filepaths\lib\bar.py
print os.path.dirname(os.path.abspath(inspect.stack()[0][1]))  # C:\filepaths\lib

Here's to these being added to sys as functions! Credit to @Usagi and @pablog

Based on the following three files, and running main.py from its folder with python main.py (also tried execfiles with absolute paths and calling from a separate folder).

C:\filepaths\main.py: execfile("foo.py")
C:\filepaths\foo.py: execfile("lib/bar.py")
C:\filepaths\lib\bar.py:

import sys
import os
import inspect

print "Python " + sys.version
print

print __file__                                        # main.py
print sys.argv[0]                                     # main.py
print inspect.stack()[0][1]                           # lib/bar.py
print sys.path[0]                                     # C:\filepaths
print

print os.path.realpath(__file__)                      # C:\filepaths\main.py
print os.path.abspath(__file__)                       # C:\filepaths\main.py
print os.path.basename(__file__)                      # main.py
print os.path.basename(os.path.realpath(sys.argv[0])) # main.py
print

print sys.path[0]                                     # C:\filepaths
print os.path.abspath(os.path.split(sys.argv[0])[0])  # C:\filepaths
print os.path.dirname(os.path.abspath(__file__))      # C:\filepaths
print os.path.dirname(os.path.realpath(sys.argv[0]))  # C:\filepaths
print os.path.dirname(__file__)                       # (empty string)
print

print inspect.getfile(inspect.currentframe())         # lib/bar.py

print os.path.abspath(inspect.getfile(inspect.currentframe())) # C:\filepaths\lib\bar.py
print os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe()))) # C:\filepaths\lib
print

print os.path.abspath(inspect.stack()[0][1])          # C:\filepaths\lib\bar.py
print os.path.dirname(os.path.abspath(inspect.stack()[0][1]))  # C:\filepaths\lib
print

Answer #5

The exec system call of the Linux kernel understands shebangs (#!) natively

When you do on bash:

./something

on Linux, this calls the exec system call with the path ./something.

This line of the kernel gets called on the file passed to exec: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_script.c#L25

if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))

It reads the very first bytes of the file, and compares them to #!.

If the comparison is true, then the rest of the line is parsed by the Linux kernel, which makes another exec call with:

  • executable: /usr/bin/env
  • first argument: python
  • second argument: script path

therefore equivalent to:

/usr/bin/env python /path/to/script.py

env is an executable that searches PATH to e.g. find /usr/bin/python, and then finally calls:

/usr/bin/python /path/to/script.py

The Python interpreter does see the #! line in the file, but # is the comment character in Python, so that line just gets ignored as a regular comment.

And yes, you can make an infinite loop with:

printf '#!/a\n' | sudo tee /a
sudo chmod +x /a
/a

Bash recognizes the error:

-bash: /a: /a: bad interpreter: Too many levels of symbolic links

#! just happens to be human readable, but that is not required.

If the file started with different bytes, then the exec system call would use a different handler. The other most important built-in handler is for ELF executable files: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_elf.c#L1305 which checks for bytes 7f 45 4c 46 (which also happens to be human readable for .ELF). Let's confirm that by reading the 4 first bytes of /bin/ls, which is an ELF executable:

head -c 4 "$(which ls)" | hd 

output:

00000000  7f 45 4c 46                                       |.ELF|
00000004                                                                 

So when the kernel sees those bytes, it takes the ELF file, puts it into memory correctly, and starts a new process with it. See also: How does kernel get an executable binary file running under linux?
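To illustrate the dispatch-by-first-bytes idea described above, here is a rough Python sketch. It is not how the kernel is implemented; guess_handler is a hypothetical helper that only mimics the two checks mentioned (the #! prefix and the 7f 45 4c 46 ELF magic), and the two sample files are created in a temporary directory purely for the demonstration.

```python
import os
import tempfile


def guess_handler(path):
    """Return a rough label based on the first bytes of the file."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic[:2] == b"#!":
        return "script"
    if magic == b"\x7fELF":
        return "elf"
    return "unknown"


with tempfile.TemporaryDirectory() as tmp:
    # A file that starts with a shebang line
    script = os.path.join(tmp, "script")
    with open(script, "wb") as f:
        f.write(b"#!/usr/bin/env python\nprint('hi')\n")

    # A file that only carries the ELF magic bytes (not a real executable)
    fake_elf = os.path.join(tmp, "fake_elf")
    with open(fake_elf, "wb") as f:
        f.write(b"\x7fELF" + b"\x00" * 12)

    results = (guess_handler(script), guess_handler(fake_elf))

print(results)
```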

Finally, you can add your own shebang handlers with the binfmt_misc mechanism. For example, you can add a custom handler for .jar files. This mechanism even supports handlers by file extension. Another application is to transparently run executables of a different architecture with QEMU.

I don't think POSIX specifies shebangs, however: https://unix.stackexchange.com/a/346214/32558 , although it does mention them in rationale sections, in the form "if executable scripts are supported by the system something may happen". macOS and FreeBSD also seem to implement them, however.

PATH search motivation

Likely, one big motivation for the existence of shebangs is the fact that in Linux, we often want to run commands from PATH just as:

basename-of-command

instead of:

/full/path/to/basename-of-command

But then, without the shebang mechanism, how would Linux know how to launch each type of file?

Hardcoding the extension in commands:

 basename-of-command.py

or implementing PATH search on every interpreter:

python basename-of-command

would be a possibility, but this has the major problem that everything breaks if we ever decide to refactor the command into another language.

Shebangs solve this problem beautifully.

Major use case of env: pyenv and other version managers

One major reason to use #!/usr/bin/env python instead of just #!/usr/bin/python is version managers such as pyenv.

pyenv allows you to easily install multiple python versions on a single machine, to be able to better reproduce other projects without virtualization.

Then, it manages the "current" python version by setting its order in the PATH: e.g. as shown at "apt-get install for different python versions", a pyenv-managed python could be located at:

/home/ciro/.pyenv/shims/python

so nowhere close to /usr/bin/python, which some systems might deal with via update-alternatives symlinks.

Answer #6

Actually, there's a function that returns exactly what you want

import os
print(os.path.basename(your_path))

WARNING: When os.path.basename() is used on a POSIX system to get the base name from a Windows styled path (e.g. "C:\my\file.txt"), the entire path will be returned.

Example below from interactive python shell running on a Linux host:

Python 3.8.2 (default, Mar 13 2020, 10:14:16)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> filepath = "C:\\my\\path\\to\\file.txt" # A Windows style file path.
>>> os.path.basename(filepath)
'C:\\my\\path\\to\\file.txt'

Answer #7

Using os.path.split or os.path.basename as others suggest won't work in all cases: if you're running the script on Linux and attempt to process a classic Windows-style path, it will fail.

Windows paths can use either backslash or forward slash as path separator. Therefore, the ntpath module (which is equivalent to os.path when running on windows) will work for all(1) paths on all platforms.

import ntpath
ntpath.basename("a/b/c")

Of course, if the file ends with a slash, the basename will be empty, so make your own function to deal with it:

def path_leaf(path):
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)

Verification:

>>> paths = ["a/b/c/", "a/b/c", "\\a\\b\\c", "\\a\\b\\c\\", "a\\b\\c",
...     "a/b/../../a/b/c/", "a/b/../../a/b/c"]
>>> [path_leaf(path) for path in paths]
['c', 'c', 'c', 'c', 'c', 'c', 'c']


(1) There's one caveat: Linux filenames may contain backslashes. So on Linux, r"a/b\c" always refers to the file b\c in the a folder, while on Windows, it always refers to the c file in the b subfolder of the a folder. So when both forward and backward slashes are used in a path, you need to know the associated platform to be able to interpret it correctly. In practice it's usually safe to assume it's a Windows path since backslashes are seldom used in Linux filenames, but keep this in mind when you code so you don't create accidental security holes.
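The platform dependence described in this answer can be checked directly, because both path flavours are importable on any system. The sketch below is only an illustration: it compares posixpath and ntpath (and their pathlib counterparts) on a mixed-separator path and on a Windows-style path.

```python
import ntpath
import posixpath
from pathlib import PurePosixPath, PureWindowsPath

mixed = r"a/b\c"                       # both kinds of slash
windows_style = r"C:\my\path\to\file.txt"

# POSIX rules: only "/" separates components, "\" is an ordinary character
posix_mixed = posixpath.basename(mixed)
posix_windows = posixpath.basename(windows_style)

# Windows rules: both "/" and "\" separate components
nt_mixed = ntpath.basename(mixed)
nt_windows = ntpath.basename(windows_style)

# pathlib exposes the same two flavours as pure path classes
pathlib_posix = PurePosixPath(mixed).name
pathlib_windows = PureWindowsPath(mixed).name

print(repr(posix_mixed), repr(nt_mixed))
print(repr(posix_windows), repr(nt_windows))
print(repr(pathlib_posix), repr(pathlib_windows))
```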

Answer #8

copy2(src,dst) is often more useful than copyfile(src,dst) because:

  • it allows dst to be a directory (instead of the complete target filename), in which case the basename of src is used for creating the new file;
  • it preserves the original modification and access info (mtime and atime) in the file metadata (however, this comes with a slight overhead).

Here is a short example:

import shutil
shutil.copy2("/src/dir/file.ext", "/dst/dir/newname.ext") # complete target filename given
shutil.copy2("/src/file.ext", "/dst/dir") # target filename is /dst/dir/file.ext
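For a version that can be run anywhere, here is a sketch of the same two calls against a temporary directory. The file and directory names are made up for the demonstration; the return value of shutil.copy2 is the path of the newly created file.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Set up a source file and an empty destination directory
    src = os.path.join(tmp, "file.ext")
    dst_dir = os.path.join(tmp, "dst")
    os.mkdir(dst_dir)
    with open(src, "w") as f:
        f.write("payload")

    # Destination is a directory: the basename of src is reused
    copied_into_dir = shutil.copy2(src, dst_dir)

    # Destination is a complete target filename
    copied_renamed = shutil.copy2(src, os.path.join(dst_dir, "newname.ext"))

    copied_basename = os.path.basename(copied_into_dir)
    names = sorted(os.listdir(dst_dir))

print(copied_basename)
print(names)
```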

Answer #9

You can use __file__ to get the name of the current file. When used in the main module, this is the name of the script that was originally invoked.

If you want to omit the directory part (which might be present), you can use os.path.basename(__file__).

Answer #10

You can make your own with:

>>> import os
>>> base = os.path.basename("/root/dir/sub/file.ext")
>>> base
'file.ext'
>>> os.path.splitext(base)
('file', '.ext')
>>> os.path.splitext(base)[0]
'file'

Important note: If there is more than one . in the filename, only the last extension is removed. For example:

/root/dir/sub/file.ext.zip -> file.ext

/root/dir/sub/file.ext.tar.gz -> file.ext.tar

See below for other answers that address that.
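One possible way to drop every suffix at once is sketched below. The helper name strip_all_suffixes is made up for this example, and it relies on pathlib's suffixes attribute. Note that this is a different behaviour from os.path.splitext: it also treats version-like parts as suffixes (for example, "archive-1.2.tar.gz" would become "archive-1"), so it only fits when the file names are known not to contain such dots.

```python
from pathlib import PurePosixPath


def strip_all_suffixes(path):
    """Return the file name with every trailing suffix removed."""
    name = PurePosixPath(path).name
    # suffixes is a list such as ['.ext', '.tar', '.gz']
    suffixes = "".join(PurePosixPath(name).suffixes)
    return name[: len(name) - len(suffixes)] if suffixes else name


print(strip_all_suffixes("/root/dir/sub/file.ext"))
print(strip_all_suffixes("/root/dir/sub/file.ext.tar.gz"))
print(strip_all_suffixes("/root/dir/sub/.bashrc"))
```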