How can I read a file that is inside my Python package?
A package that I load has a number of templates (text files used as strings) that I want to load from within the program. But how do I specify the path to such a file?
Imagine I want to read a file from:
package\templates\temp_file
Some kind of path manipulation? Package base path tracking?
Use the standard library's importlib.resources module, as explained in method no. 2 below.
The traditional pkg_resources from setuptools is not recommended anymore, because the new method does not need an extra dependency (setuptools) but relies on Python's standard library alone.
I kept the traditional method listed first, to explain the differences with the new method when porting existing code (porting is also explained here).
Let's assume your templates are located in a folder nested inside your module's package:

    <your-package>
      +--<module-asking-the-file>
      +--templates/
           +--temp_file         <-- We want this file.
Note 1: For sure, we should NOT fiddle with the __file__ attribute (e.g. code will break when served from a zip).
Note 2: If you are building this package, remember to declare your data files as package_data or data_files in your setup.py.
    import pkg_resources

    # Could be any dot-separated package/module name or a "Requirement"
    resource_package = __name__
    resource_path = "/".join(("templates", "temp_file"))  # Do not use os.path.join()
    template = pkg_resources.resource_string(resource_package, resource_path)
    # or for a file-like stream:
    template = pkg_resources.resource_stream(resource_package, resource_path)
This will read data even if your distribution is zipped, so you may set zip_safe=True in your setup.py, and/or use the long-awaited zipapp packer from Python 3.5 to create self-contained distributions.
Remember to add setuptools to your run-time requirements (e.g. in install_requires).
... and notice that according to the Setuptools/pkg_resources docs, you should not use os.path.join:
Basic Resource Access
Note that resource names must be /-separated paths and cannot be absolute (i.e. no leading /) or contain relative names like "..". Do not use os.path routines to manipulate resource paths, as they are not filesystem paths.
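As a rough illustration of the naming rules quoted above, the sketch below builds a resource name by joining segments with "/" and rejects absolute names and ".." segments. The helper `resource_name` is hypothetical (it is not part of pkg_resources); it only mirrors the constraints from the docs excerpt.

```python
def resource_name(*parts):
    """Join parts into a '/'-separated resource name (hypothetical helper).

    Rejects names that the quoted rules forbid: absolute names
    (leading '/') and names containing a '..' segment.
    """
    name = "/".join(parts)
    if name.startswith("/"):
        raise ValueError("resource names cannot be absolute: %r" % name)
    if ".." in name.split("/"):
        raise ValueError("resource names cannot contain '..': %r" % name)
    return name


# Same name on every platform, unlike os.path.join() on Windows.
template_name = resource_name("templates", "temp_file")
```

The resulting string is what would be passed as the second argument to resource_string() or resource_stream().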
Use the standard library's importlib.resources module, which is more efficient than the setuptools method above:
    try:
        import importlib.resources as pkg_resources
    except ImportError:
        # Try backported to PY<37 'importlib_resources'.
        import importlib_resources as pkg_resources

    from . import templates  # relative-import the *package* containing the templates

    template = pkg_resources.read_text(templates, "temp_file")
    # or for a file-like stream:
    template = pkg_resources.open_text(templates, "temp_file")
Regarding the function read_text(package, resource):
- package can be either a string or a module.
- resource is NOT a path anymore, but just the filename of the resource to open, within an existing package; it may not contain path separators and it may not have sub-resources (i.e. it cannot be a directory).
For the example asked in the question, we must now:
- turn <your_package>/templates/ into a proper package, by creating an empty __init__.py file in it,
- reach it with a plain import statement (no more parsing package/module names),
- ask for resource_name = "temp_file" (no path).
- To access a file inside the current module, set the package argument to __package__, e.g. pkg_resources.read_text(__package__, "temp_file") (thanks to @ben-mares).
- Things become interesting when an actual filename is asked with path(), since now context-managers are used for temporarily-created files (read this).
- Add the backported library, conditionally for older Pythons, with install_requires=["importlib_resources ; python_version<'3.7'"] (check this if you package your project with
- Remember to remove the setuptools library from your runtime requirements, if you migrated from the traditional method.
- Remember to customize MANIFEST to include any static files.
- You may also set zip_safe=True in your setup.py.
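To make the note about path() more concrete, here is a minimal sketch of the context-manager pattern. It builds a throwaway package on disk so that it can run on its own; the package name `pathdemo_pkg` and the file contents are made up for the demonstration. Note that some Python versions mark path() as deprecated in favor of as_file(), which follows the same with-block pattern.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path

# Create a hypothetical package "pathdemo_pkg" with a "templates" sub-package.
root = Path(tempfile.mkdtemp())
templates_dir = root / "pathdemo_pkg" / "templates"
templates_dir.mkdir(parents=True)
(root / "pathdemo_pkg" / "__init__.py").write_text("")
(templates_dir / "__init__.py").write_text("")
(templates_dir / "temp_file").write_text("Hello, $name!\n")

# Make the throwaway package importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# path() is a context manager: the yielded filesystem path is only
# guaranteed to exist inside the with-block, because for zipped packages
# the resource may be extracted to a temporary file that is cleaned up
# on exit.
with importlib.resources.path("pathdemo_pkg.templates", "temp_file") as file_path:
    existed_inside_block = file_path.is_file()
    contents = file_path.read_text()
```

Code that needs the file beyond the with-block should copy its contents (as done with `contents` here) rather than keep the path.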
Before you can even worry about reading resource files, the first step is to make sure that the data files are getting packaged into your distribution in the first place - it is easy to read them directly from the source tree, but the important part is making sure these resource files are accessible from code within an installed package.
Structure your project like this, putting data files into a subdirectory within the package:
    .
    ├── package
    │   ├── __init__.py
    │   ├── templates
    │   │   └── temp_file
    │   ├── mymodule1.py
    │   └── mymodule2.py
    ├── README.rst
    ├── MANIFEST.in
    └── setup.py
You should pass
include_package_data=True in the
setup() call. The manifest file is only needed if you want to use setuptools/distutils and build source distributions. To make sure the
templates/temp_file gets packaged for this example project structure, add a line like this into the manifest file:
recursive-include package *
Historical cruft note: Using a manifest file is not needed for modern build backends such as flit and poetry, which will include the package data files by default. So, if you're using pyproject.toml and you don't have a setup.py file, then you can ignore all the stuff about MANIFEST.in.
Now, with packaging out of the way, onto the reading part...
Use the standard library pkgutil APIs. It's going to look like this in library code:

    # within package/mymodule1.py, for example
    import pkgutil

    data = pkgutil.get_data(__name__, "templates/temp_file")

It works in zips. It works on Python 2 and Python 3. It doesn't require third-party dependencies. I'm not really aware of any downsides (if you are, then please comment on the answer).
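The "works in zips" claim can be checked with a small self-contained sketch. It packs a throwaway package into a zip archive, puts the archive on sys.path, and reads the data file through pkgutil.get_data(). The package name `zipdemo_pkg` and the file contents are hypothetical, chosen only to mirror the package/templates/temp_file layout.

```python
import pkgutil
import sys
import tempfile
import zipfile
from pathlib import Path

# Build a zip archive holding a hypothetical package with one data file.
archive = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipdemo_pkg/__init__.py", "")
    zf.writestr("zipdemo_pkg/templates/temp_file", "Hello, $name!\n")

# With the archive on sys.path, the import system loads straight from it.
sys.path.insert(0, str(archive))

# get_data() goes through the package's loader, so no real file on disk
# is needed. It returns bytes; decode them to get a string.
data = pkgutil.get_data("zipdemo_pkg", "templates/temp_file")
text = data.decode("utf-8")
```

Inside library code the first argument would normally be `__name__` or `__package__` instead of a hard-coded string.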
This is currently the accepted answer. At best, it looks something like this:
    from pathlib import Path

    resource_path = Path(__file__).parent / "templates"
    data = resource_path.joinpath("temp_file").read_bytes()

What's wrong with that? The assumption that you have files and subdirectories available is not correct. This approach doesn't work if executing code which is packed in a zip or a wheel, and it may be entirely out of the user's control whether or not your package gets extracted to a filesystem at all.
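A short sketch of that failure mode, under the assumption that the package is imported from a zip archive: the path derived from `__file__` points inside the archive, so it does not name a real file on disk. The package name `filedemo_pkg` is hypothetical.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Pack a hypothetical package with a data file into a zip archive.
archive = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("filedemo_pkg/__init__.py", "")
    zf.writestr("filedemo_pkg/templates/temp_file", "Hello\n")
sys.path.insert(0, str(archive))

pkg = importlib.import_module("filedemo_pkg")

# __file__ now looks like ".../bundle.zip/filedemo_pkg/__init__.py".
# "bundle.zip" is a regular file, not a directory, so nothing below it
# can be opened with ordinary filesystem calls.
candidate = Path(pkg.__file__).parent / "templates" / "temp_file"
found_on_disk = candidate.is_file()
```

Calling `candidate.read_bytes()` here would raise an OSError, even though the data is present in the archive.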
This is described in the top-voted answer. It looks something like this:
    from pkg_resources import resource_string

    data = resource_string(__name__, "templates/temp_file")

What's wrong with that? It adds a runtime dependency on setuptools, which should preferably be an install time dependency only. Importing and using pkg_resources can become really slow, as the code builds up a working set of all installed packages, even though you were only interested in your own package resources. That's not a big deal at install time (since installation is once-off), but it's ugly at runtime.
This is currently the recommendation in the top-voted answer. It's a recent standard library addition (new in Python 3.7). It looks like this:

    from importlib.resources import read_binary

    data = read_binary("package.templates", "temp_file")

What's wrong with that? Well, unfortunately, it doesn't work...yet. This is still an incomplete API: using importlib.resources will require you to add an empty file templates/__init__.py so that the data files reside within a sub-package rather than in a subdirectory. It will also expose the package/templates subdirectory as an importable package.templates sub-package in its own right. If that's not a big deal and it doesn't bother you, then you can go ahead and add the __init__.py file there and use the import system to access resources. However, while you're at it you may as well make it into a my_resources.py file instead, and just define some bytes or string variables in the module, then import them in Python code. It's the import system doing the heavy lifting here either way.
This has not been mentioned in any other answers yet, but
importlib_resources is more than a simple backport of the Python 3.7+
importlib.resources code. It has traversable APIs which you can use like this:
    import importlib_resources

    my_resources = importlib_resources.files("package")
    data = (my_resources / "templates" / "temp_file").read_bytes()

This works on Python 2 and 3, it works in zips, and it doesn't require spurious __init__.py files to be added in resource subdirectories. The only downside vs pkgutil that I can see is that these new APIs haven't yet arrived in the stdlib, so there is still a third-party dependency. Newer APIs from importlib_resources should arrive in the stdlib importlib.resources in Python 3.9.
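For readers on Python 3.9 or newer, the same traversal can be sketched with the standard library's importlib.resources.files(), with no third-party dependency. The example creates a throwaway package so that it runs on its own; the name `filesdemo_pkg` is hypothetical, and the templates directory deliberately has no __init__.py.

```python
import importlib
import sys
import tempfile
from importlib.resources import files  # stdlib, assuming Python 3.9+
from pathlib import Path

# Create a hypothetical package whose "templates" directory is a plain
# subdirectory (no __init__.py inside it).
root = Path(tempfile.mkdtemp())
templates_dir = root / "filesdemo_pkg" / "templates"
templates_dir.mkdir(parents=True)
(root / "filesdemo_pkg" / "__init__.py").write_text("")
(templates_dir / "temp_file").write_bytes(b"Hello, $name!\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# files() returns a traversable object rooted at the package; the "/"
# operator walks into subdirectories without importing them.
my_resources = files("filesdemo_pkg")
data = (my_resources / "templates" / "temp_file").read_bytes()
```

On older Pythons the importlib_resources backport exposes the same files() call, so the import line is the only part that would change.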
    $ pip install resources-example
    $ resources-example
See https://github.com/wimglenn/resources-example for more info.