numpy.around() in Python


Syntax: numpy.around(a, decimals=0, out=None)

Parameters:

a: [array_like] Input array.
decimals: [int, optional] Number of decimal places to round to. Default = 0. If decimals is negative, it specifies the number of positions to the left of the decimal point.
out: [ndarray, optional] Alternative output array in which to place the result; it must have the same shape as the expected output.

Return:

An array with all elements rounded off, having the same type as the input.
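As a quick illustrative sketch of the parameters described above (the values here are arbitrary), note that the out array must be preallocated with the shape of the expected result:

import numpy as np

arr = np.array([1.2345, 2.3456, 3.4567])

# Round to 2 decimal places
print(np.around(arr, decimals=2))      # [1.23 2.35 3.46]

# Write the result into a preallocated array via `out`
result = np.empty_like(arr)
np.around(arr, decimals=1, out=result)
print(result)                          # [1.2 2.3 3.5]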

Code #1: Working

# Python program explaining
# around() function

import numpy as np

in_array = [.5, 1.5, 2.5, 3.5, 4.5, 10.1]
print("Input array:", in_array)

round_off_values = np.around(in_array)
print("Rounded values:", round_off_values)

in_array = [.53, 1.54, .71]
print("Input array:", in_array)

round_off_values = np.around(in_array)
print("Rounded values:", round_off_values)

in_array = [.5538, 1.33354, .71445]
print("Input array:", in_array)

round_off_values = np.around(in_array, decimals=3)
print("Rounded values:", round_off_values)

Output:

Input array: [0.5, 1.5, 2.5, 3.5, 4.5, 10.1]
Rounded values: [0. 2. 2. 4. 4. 10.]
Input array: [0.53, 1.54, 0.71]
Rounded values: [1. 2. 1.]
Input array: [0.5538, 1.33354, 0.71445]
Rounded values: [0.554 1.334 0.714]

Code #2: Working

# Python program explaining
# around() function

  

import numpy as np

 

in_array = [1, 4, 7, 9, 12]
print("Input array:", in_array)

round_off_values = np.around(in_array)
print("Rounded values:", round_off_values)

in_array = [133, 344, 437, 449, 12]
print("Input array:", in_array)

round_off_values = np.around(in_array, decimals=-2)
print("Rounded values up to 2:", round_off_values)

in_array = [133, 344, 437, 449, 12]
print("Input array:", in_array)

round_off_values = np.around(in_array, decimals=-3)
print("Rounded values up to 3:", round_off_values)

Output:

Input array: [1, 4, 7, 9, 12]
Rounded values: [1 4 7 9 12]
Input array: [133, 344, 437, 449, 12]
Rounded values up to 2: [100 300 400 400 0]
Input array: [133, 344, 437, 449, 12]
Rounded values up to 3: [0 0 0 0 0]
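One detail worth noting about the outputs above: np.around rounds values that are exactly halfway to the nearest even value ("round half to even", the same rule as Python's built-in round), which is why 0.5 rounds to 0 and 2.5 rounds to 2 in Code #1. A minimal sketch:

import numpy as np

# Halfway cases go to the nearest even value, not always up
print(np.around([0.5, 1.5, 2.5, 3.5]))     # [0. 2. 2. 4.]

# Python's built-in round follows the same rule
print(round(0.5), round(1.5), round(2.5))  # 0 2 2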

Links:
https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.around.html#numpy.around





numpy.around() in Python: StackOverflow Questions

Removing white space around a saved image in matplotlib

I need to take an image and save it after some processing. The figure looks fine when I display it, but after saving the figure I get some white space around the saved image. I have tried the "tight" option for the savefig method, but it did not work either. The code:

  import matplotlib.image as mpimg
  import matplotlib.pyplot as plt

  fig = plt.figure(1)
  img = mpimg.imread(path)
  plt.imshow(img)
  ax=fig.add_subplot(1,1,1)

  extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
  plt.savefig("1.png", bbox_inches=extent)

  plt.axis("off") 
  plt.show()

I am trying to draw a basic graph by using NetworkX on a figure and save it. I realized that without the graph it works, but when a graph is added I get white space around the saved image:

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import networkx as nx

G = nx.Graph()
G.add_node(1)
G.add_node(2)
G.add_node(3)
G.add_edge(1,3)
G.add_edge(1,2)
pos = {1:[100,120], 2:[200,300], 3:[50,75]}

fig = plt.figure(1)
img = mpimg.imread("image.jpg")
plt.imshow(img)
ax=fig.add_subplot(1,1,1)

nx.draw(G, pos=pos)

extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
plt.savefig("1.png", bbox_inches = extent)

plt.axis("off") 
plt.show()
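One common way to get rid of the surrounding whitespace (a hedged sketch of a frequently used approach, not necessarily the fix the asker ended up with; "image.jpg" is assumed to exist as in the question) is to hide the axes before saving and pass bbox_inches="tight" together with pad_inches=0:

import matplotlib.image as mpimg
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
img = mpimg.imread("image.jpg")   # assumed input image, as in the question
ax.imshow(img)
ax.set_axis_off()                 # hide ticks, labels and the axes frame *before* saving

# Save with a tight bounding box and no padding around it
fig.savefig("1.png", bbox_inches="tight", pad_inches=0)
plt.show()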

PEP 8, why no spaces around "=" in keyword argument or a default parameter value?

Why does PEP 8 recommend not having spaces around = in a keyword argument or a default parameter value?

Is this inconsistent with recommending spaces around every other occurrence of = in Python code?

How is:

func(1, 2, very_long_variable_name=another_very_long_variable_name)

better than:

func(1, 2, very_long_variable_name = another_very_long_variable_name)

Any links to discussion/explanation by Python's BDFL will be appreciated.

Mind, this question is more about kwargs than default values, I just used the phrasing from PEP 8.

I'm not soliciting opinions. I'm asking for reasons behind this decision. It's more like asking why I would use { on the same line as an if statement in a C program, not whether I should use it or not.

Answer #1

Since this question was asked in 2010, there has been real simplification in how to do simple multithreading with Python with map and pool.

The code below comes from an article/blog post that you should definitely check out (no affiliation) - Parallelism in one line: A Better Model for Day to Day Threading Tasks. I'll summarize below - it ends up being just a few lines of code:

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
results = pool.map(my_function, my_array)

Which is the multithreaded version of:

results = []
for item in my_array:
    results.append(my_function(item))

Description

Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.

Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.



Implementation

Parallel versions of the map function are provided by two libraries: multiprocessing, and also its little known, but equally fantastic stepchild: multiprocessing.dummy.

multiprocessing.dummy is exactly the same as the multiprocessing module, but uses threads instead (an important distinction - use multiple processes for CPU-intensive tasks; threads for (and during) I/O):

multiprocessing.dummy replicates the API of multiprocessing, but is no more than a wrapper around the threading module.

import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
  "http://www.python.org",
  "http://www.python.org/about/",
  "http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html",
  "http://www.python.org/doc/",
  "http://www.python.org/download/",
  "http://www.python.org/getit/",
  "http://www.python.org/community/",
  "https://wiki.python.org/moin/",
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()

And the timing results:

Single thread:   14.4 seconds
       4 Pool:   3.1 seconds
       8 Pool:   1.4 seconds
      13 Pool:   1.3 seconds

Passing multiple arguments (works like this only in Python 3.3 and later):

To pass multiple arrays:

results = pool.starmap(function, zip(list_a, list_b))

Or to pass a constant and an array:

results = pool.starmap(function, zip(itertools.repeat(constant), list_a))

If you are using an earlier version of Python, you can pass multiple arguments via a workaround such as the one sketched below.
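For instance, a minimal sketch of such a workaround (the download function and the timeout value are hypothetical): freeze the constant argument with functools.partial so that plain pool.map only has to supply the varying one:

from functools import partial
from multiprocessing.dummy import Pool as ThreadPool

def download(timeout, url):
    # hypothetical worker; a real one would fetch `url` using `timeout`
    return (url, timeout)

urls = ["http://www.python.org", "http://www.python.org/about/"]

pool = ThreadPool(4)
results = pool.map(partial(download, 10), urls)  # timeout=10 is fixed, urls vary
pool.close()
pool.join()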

(Thanks to user136036 for the helpful comment.)

Answer #2

How to iterate over rows in a DataFrame in Pandas?

Answer: DON"T*!

Iteration in Pandas is an anti-pattern and is something you should only do when you have exhausted every other option. You should not use any function with "iter" in its name for more than a few thousand rows or you will have to get used to a lot of waiting.

Do you want to print a DataFrame? Use DataFrame.to_string().

Do you want to compute something? In that case, search for methods in this order (list modified from here):

  1. Vectorization
  2. Cython routines
  3. List Comprehensions (vanilla for loop)
  4. DataFrame.apply(): i)  Reductions that can be performed in Cython, ii) Iteration in Python space
  5. DataFrame.itertuples() and iteritems()
  6. DataFrame.iterrows()

iterrows and itertuples (both receiving many votes in answers to this question) should be used in very rare circumstances, such as generating row objects/namedtuples for sequential processing, which is really the only thing these functions are useful for.

Appeal to Authority

The documentation page on iteration has a huge red warning box that says:

Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed [...].

* It"s actually a little more complicated than "don"t". df.iterrows() is the correct answer to this question, but "vectorize your ops" is the better one. I will concede that there are circumstances where iteration cannot be avoided (for example, some operations where the result depends on the value computed for the previous row). However, it takes some familiarity with the library to know when. If you"re not sure whether you need an iterative solution, you probably don"t. PS: To know more about my rationale for writing this answer, skip to the very bottom.


Faster than Looping: Vectorization, Cython

A good number of basic operations and computations are "vectorised" by pandas (either through NumPy, or through Cythonized functions). This includes arithmetic, comparisons, (most) reductions, reshaping (such as pivoting), joins, and groupby operations. Look through the documentation on Essential Basic Functionality to find a suitable vectorised method for your problem.

If none exists, feel free to write your own using custom Cython extensions.


Next Best Thing: List Comprehensions*

List comprehensions should be your next port of call if 1) there is no vectorized solution available, 2) performance is important, but not important enough to go through the hassle of cythonizing your code, and 3) you're trying to perform elementwise transformation on your code. There is a good amount of evidence to suggest that list comprehensions are sufficiently fast (and even sometimes faster) for many common Pandas tasks.

The formula is simple,

# Iterating over one column - `f` is some function that processes your data
result = [f(x) for x in df["col"]]
# Iterating over two columns, use `zip`
result = [f(x, y) for x, y in zip(df["col1"], df["col2"])]
# Iterating over multiple columns - same data type
result = [f(row[0], ..., row[n]) for row in df[["col1", ...,"coln"]].to_numpy()]
# Iterating over multiple columns - differing data type
result = [f(row[0], ..., row[n]) for row in zip(df["col1"], ..., df["coln"])]

If you can encapsulate your business logic into a function, you can use a list comprehension that calls it. You can make arbitrarily complex things work through the simplicity and speed of raw Python code.

Caveats

List comprehensions assume that your data is easy to work with - what that means is your data types are consistent and you don't have NaNs, but this cannot always be guaranteed.

  1. The first one is more obvious, but when dealing with NaNs, prefer in-built pandas methods if they exist (because they have much better corner-case handling logic), or ensure your business logic includes appropriate NaN handling logic.
  2. When dealing with mixed data types you should iterate over zip(df["A"], df["B"], ...) instead of df[["A", "B"]].to_numpy(), as the latter implicitly upcasts data to the most common type. As an example, if A is numeric and B is string, to_numpy() will cast the entire array to string, which may not be what you want. Fortunately, zipping your columns together is the most straightforward workaround to this (see the short sketch after this list).
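A small sketch of the upcasting issue from caveat 2, using a toy frame with an integer and a float column (mixing numbers and strings behaves similarly, collapsing everything to one common dtype):

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [0.5, 1.5]})

# to_numpy() on several columns upcasts to a single common dtype (float64 here),
# so the integer column loses its original dtype
print(df[["A", "B"]].to_numpy().dtype)               # float64

# zip keeps each column's own dtype while iterating
print([(a, b) for a, b in zip(df["A"], df["B"])])    # [(1, 0.5), (2, 1.5)]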

*Your mileage may vary for the reasons outlined in the Caveats section above.


An Obvious Example

Let"s demonstrate the difference with a simple example of adding two pandas columns A + B. This is a vectorizable operaton, so it will be easy to contrast the performance of the methods discussed above.

Benchmarking code, for your reference. The line at the bottom measures a function written in numpandas, a style of Pandas that mixes heavily with NumPy to squeeze out maximum performance. Writing numpandas code should be avoided unless you know what you're doing. Stick to the API where you can (i.e., prefer vec over vec_numpy).

I should mention, however, that it isn't always this cut and dried. Sometimes the answer to "what is the best method for an operation" is "it depends on your data". My advice is to test out different approaches on your data before settling on one.
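As a rough illustration only (a sketch, not the answer's original benchmark; the row count is arbitrary and absolute timings will vary by machine), here is a tiny timing comparison of the approaches discussed above for A + B:

import time
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.random.rand(10_000), "B": np.random.rand(10_000)})

def bench(label, fn):
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.4f} s")

bench("vectorized", lambda: df["A"] + df["B"])
bench("list comp ", lambda: [a + b for a, b in zip(df["A"], df["B"])])
bench("apply     ", lambda: df.apply(lambda row: row["A"] + row["B"], axis=1))
bench("iterrows  ", lambda: [row["A"] + row["B"] for _, row in df.iterrows()])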


Further Reading

* Pandas string methods are "vectorized" in the sense that they are specified on the series but operate on each element. The underlying mechanisms are still iterative, because string operations are inherently hard to vectorize.


Why I Wrote this Answer

A common trend I notice from new users is to ask questions of the form "How can I iterate over my df to do X?", showing code that calls iterrows() while doing something inside a for loop. Here is why. A new user to the library who has not been introduced to the concept of vectorization will likely envision the code that solves their problem as iterating over their data to do something. Not knowing how to iterate over a DataFrame, the first thing they do is Google it and end up here, at this question. They then see the accepted answer telling them how to, and they close their eyes and run this code without ever first questioning whether iteration is really the right thing to do.

The aim of this answer is to help new users understand that iteration is not necessarily the solution to every problem, and that better, faster and more idiomatic solutions could exist, and that it is worth investing time in exploring them. I'm not trying to start a war of iteration vs. vectorization, but I want new users to be informed when developing solutions to their problems with this library.

Answer #3

-----> pip install gensim config --global http.sslVerify false

Just install any package with the "config --global http.sslVerify false" statement

You can ignore SSL errors by setting pypi.org and files.pythonhosted.org as trusted hosts.

$ pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org <package_name>

Note: Sometime during April 2018, the Python Package Index was migrated from pypi.python.org to pypi.org. This means "trusted-host" commands using the old domain no longer work.

Permanent Fix

Since the release of pip 10.0, you should be able to fix this permanently just by upgrading pip itself:

$ pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org pip setuptools

Or by just reinstalling it to get the latest version:

$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

(… and then running get-pip.py with the relevant Python interpreter).

pip install <otherpackage> should just work after this. If not, then you will need to do more, as explained below.


You may want to add the trusted hosts and proxy to your config file.

pip.ini (Windows) or pip.conf (unix)

[global]
trusted-host = pypi.python.org
               pypi.org
               files.pythonhosted.org

Alternate Solutions (Less secure)

Most of the answers could pose a security issue.

Two of the workarounds that help in installing most of the python packages with ease would be:

  • Using easy_install: if you are really lazy and don't want to waste much time, use easy_install <package_name>. Note that some packages won't be found or will give small errors.
  • Using Wheel: download the Wheel of the python package and use the pip command pip install wheel_package_name.whl to install the package.

Answer #4

I tested most suggested solutions with perfplot (a pet project of mine, essentially a wrapper around timeit), and found

import functools
import operator
functools.reduce(operator.iconcat, a, [])

to be the fastest solution, both when many small lists and few long lists are concatenated. (operator.iadd is equally fast.)
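As a quick sanity check on a tiny input (a sketch), the winning variant flattens a list of lists like this:

import functools
import operator

a = [[1, 2], [3, 4], [5]]
flat = functools.reduce(operator.iconcat, a, [])
print(flat)   # [1, 2, 3, 4, 5]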

(perfplot timing plots: runtime vs. number of lists of length 10, and runtime vs. list length for 10 lists)


Code to reproduce the plot:

import functools
import itertools
import numpy
import operator
import perfplot


def forfor(a):
    return [item for sublist in a for item in sublist]


def sum_brackets(a):
    return sum(a, [])


def functools_reduce(a):
    return functools.reduce(operator.concat, a)


def functools_reduce_iconcat(a):
    return functools.reduce(operator.iconcat, a, [])


def itertools_chain(a):
    return list(itertools.chain.from_iterable(a))


def numpy_flat(a):
    return list(numpy.array(a).flat)


def numpy_concatenate(a):
    return list(numpy.concatenate(a))


perfplot.show(
    setup=lambda n: [list(range(10))] * n,
    # setup=lambda n: [list(range(n))] * 10,
    kernels=[
        forfor,
        sum_brackets,
        functools_reduce,
        functools_reduce_iconcat,
        itertools_chain,
        numpy_flat,
        numpy_concatenate,
    ],
    n_range=[2 ** k for k in range(16)],
    xlabel="num lists (of length 10)",
    # xlabel="len lists (10 lists total)"
)

Answer #5

You can"t.

One workaround is to create clone a new environment and then remove the original one.

First, remember to deactivate your current environment. You can do this with the commands:

  • deactivate on Windows or
  • source deactivate on macOS/Linux.

Then:

conda create --name new_name --clone old_name
conda remove --name old_name --all # or its alias: `conda env remove --name old_name`

Notice there are several drawbacks of this method:

  1. It redownloads packages (you can use the --offline flag to disable it)
  2. Time consumed on copying the environment's files
  3. Temporary double disk usage

There is an open issue requesting this feature.

Answer #6

Watch out for the parentheses. As has been pointed out above, in Python 3, assert is still a statement, so by analogy with print(..), one may extrapolate the same to assert(..) or raise(..), but you shouldn't.

This is wrong:

assert(2 + 2 == 5, "Houston we've got a problem")

This is correct:

assert 2 + 2 == 5, "Houston we've got a problem"

The reason the first one will not work is that bool( (False, "Houston we've got a problem") ) evaluates to True.

In the statement assert(False), these are just redundant parentheses around False, which evaluate to their contents. But with assert(False,) the parentheses are now a tuple, and a non-empty tuple evaluates to True in a boolean context.
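Recent CPython versions even flag the mistake at compile time; a small sketch (the exact warning wording may differ between versions):

# Compiling this line prints something like:
#   SyntaxWarning: assertion is always true, perhaps remove parentheses?
# and the assert never fails, because a non-empty tuple is truthy.
assert (2 + 2 == 5, "Houston we've got a problem")

# The unparenthesized form actually checks the condition (and passes here).
assert 2 + 2 == 4, "this message is only shown when the condition is False"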

Answer #7

Using plt.rcParams

There is also this workaround in case you want to change the size without using the figure environment. So in case you are using plt.plot() for example, you can set a tuple with width and height.

import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (20,3)

This is very useful when you plot inline (e.g., with IPython Notebook). As asmaier noticed, it is preferable to not put this statement in the same cell as the import statements.

To reset the global figure size back to default for subsequent plots:

plt.rcParams["figure.figsize"] = plt.rcParamsDefault["figure.figsize"]

Conversion to cm

The figsize tuple accepts inches, so if you want to set it in centimetres you have to divide them by 2.54. Have a look at this question.
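For instance (a small sketch), dividing by 2.54 converts centimetres into the inches that figsize expects:

import matplotlib.pyplot as plt

cm = 1 / 2.54                                        # centimetres to inches
plt.rcParams["figure.figsize"] = (20 * cm, 8 * cm)   # roughly a 20 cm x 8 cm figure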

Answer #8

df.to_numpy() is better than df.values, here's why.*

It's time to deprecate your usage of values and as_matrix().

pandas v0.24.0 introduced two new methods for obtaining NumPy arrays from pandas objects:

  1. to_numpy(), which is defined on Index, Series, and DataFrame objects, and
  2. array, which is defined on Index and Series objects only.

If you visit the v0.24 docs for .values, you will see a big red warning that says:

Warning: We recommend using DataFrame.to_numpy() instead.

See this section of the v0.24.0 release notes, and this answer for more information.

* - to_numpy() is my recommended method for any production code that needs to run reliably for many versions into the future. However, if you're just making a scratchpad in jupyter or the terminal, using .values to save a few milliseconds of typing is a permissible exception. You can always add the fit and finish later.



Towards Better Consistency: to_numpy()

In the spirit of better consistency throughout the API, a new method to_numpy has been introduced to extract the underlying NumPy array from DataFrames.

# Setup
df = pd.DataFrame(data={"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}, 
                  index=["a", "b", "c"])

# Convert the entire DataFrame
df.to_numpy()
# array([[1, 4, 7],
#        [2, 5, 8],
#        [3, 6, 9]])

# Convert specific columns
df[["A", "C"]].to_numpy()
# array([[1, 7],
#        [2, 8],
#        [3, 9]])

As mentioned above, this method is also defined on Index and Series objects (see here).

df.index.to_numpy()
# array(["a", "b", "c"], dtype=object)

df["A"].to_numpy()
#  array([1, 2, 3])

By default, a view is returned, so any modifications made will affect the original.

v = df.to_numpy()
v[0, 0] = -1
 
df
   A  B  C
a -1  4  7
b  2  5  8
c  3  6  9

If you need a copy instead, use to_numpy(copy=True).
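Continuing the example, a quick sketch showing that copy=True detaches the returned array from the DataFrame:

import pandas as pd

df = pd.DataFrame(data={"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=["a", "b", "c"])

v2 = df.to_numpy(copy=True)   # an independent copy of the data
v2[0, 0] = -1                 # modifying the copy...
print(df.iloc[0, 0])          # ...leaves the DataFrame unchanged: prints 1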


pandas >= 1.0 update for ExtensionTypes

If you"re using pandas 1.x, chances are you"ll be dealing with extension types a lot more. You"ll have to be a little more careful that these extension types are correctly converted.

a = pd.array([1, 2, None], dtype="Int64")                                  
a                                                                          

<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64 

# Wrong
a.to_numpy()                                                               
# array([1, 2, <NA>], dtype=object)  # yuck, objects

# Correct
a.to_numpy(dtype="float", na_value=np.nan)                                 
# array([ 1.,  2., nan])

# Also correct
a.to_numpy(dtype="int", na_value=-1)
# array([ 1,  2, -1])

This is called out in the docs.


If you need the dtypes in the result...

As shown in another answer, DataFrame.to_records is a good way to do this.

df.to_records()
# rec.array([("a", 1, 4, 7), ("b", 2, 5, 8), ("c", 3, 6, 9)],
#           dtype=[("index", "O"), ("A", "<i8"), ("B", "<i8"), ("C", "<i8")])

This cannot be done with to_numpy, unfortunately. However, as an alternative, you can use np.rec.fromrecords:

v = df.reset_index()
np.rec.fromrecords(v, names=v.columns.tolist())
# rec.array([("a", 1, 4, 7), ("b", 2, 5, 8), ("c", 3, 6, 9)],
#           dtype=[("index", "<U1"), ("A", "<i8"), ("B", "<i8"), ("C", "<i8")])

Performance wise, it"s nearly the same (actually, using rec.fromrecords is a bit faster).

df2 = pd.concat([df] * 10000)

%timeit df2.to_records()
%%timeit
v = df2.reset_index()
np.rec.fromrecords(v, names=v.columns.tolist())

12.9 ms ± 511 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.56 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Rationale for Adding a New Method

to_numpy() (in addition to array) was added as a result of discussions under two GitHub issues GH19954 and GH23623.

Specifically, the docs mention the rationale:

[...] with .values it was unclear whether the returned value would be the actual array, some transformation of it, or one of pandas custom arrays (like Categorical). For example, with PeriodIndex, .values generates a new ndarray of period objects each time. [...]

to_numpy aims to improve the consistency of the API, which is a major step in the right direction. .values will not be deprecated in the current version, but I expect this may happen at some point in the future, so I would urge users to migrate towards the newer API, as soon as you can.



Critique of Other Solutions

DataFrame.values has inconsistent behaviour, as already noted.

DataFrame.get_values() is simply a wrapper around DataFrame.values, so everything said above applies.

DataFrame.as_matrix() is deprecated now, do NOT use!

Answer #9

Explanation

From PEP 328

Relative imports use a module"s __name__ attribute to determine that module"s position in the package hierarchy. If the module"s name does not contain any package information (e.g. it is set to "__main__") then relative imports are resolved as if the module were a top level module, regardless of where the module is actually located on the file system.

At some point PEP 338 conflicted with PEP 328:

... relative imports rely on __name__ to determine the current module's position in the package hierarchy. In a main module, the value of __name__ is always "__main__", so explicit relative imports will always fail (as they only work for a module inside a package)

and to address the issue, PEP 366 introduced the top level variable __package__:

By adding a new module level attribute, this PEP allows relative imports to work automatically if the module is executed using the -m switch. A small amount of boilerplate in the module itself will allow the relative imports to work when the file is executed by name. [...] When it [the attribute] is present, relative imports will be based on this attribute rather than the module __name__ attribute. [...] When the main module is specified by its filename, then the __package__ attribute will be set to None. [...] When the import system encounters an explicit relative import in a module without __package__ set (or with it set to None), it will calculate and store the correct value (__name__.rpartition(".")[0] for normal modules and __name__ for package initialisation modules)

(emphasis mine)

If __name__ is "__main__", __name__.rpartition(".")[0] returns an empty string. This is why there's an empty string literal in the error description:

SystemError: Parent module "" not loaded, cannot perform relative import

The relevant part of the CPython"s PyImport_ImportModuleLevelObject function:

if (PyDict_GetItem(interp->modules, package) == NULL) {
    PyErr_Format(PyExc_SystemError,
            "Parent module %R not loaded, cannot perform relative "
            "import", package);
    goto error;
}

CPython raises this exception if it was unable to find package (the name of the package) in interp->modules (accessible as sys.modules). Since sys.modules is "a dictionary that maps module names to modules which have already been loaded", it's now clear that the parent module must be explicitly absolute-imported before performing relative import.

Note: The patch from the issue 18018 has added another if block, which will be executed before the code above:

if (PyUnicode_CompareWithASCIIString(package, "") == 0) {
    PyErr_SetString(PyExc_ImportError,
            "attempted relative import with no known parent package");
    goto error;
} /* else if (PyDict_GetItem(interp->modules, package) == NULL) {
    ...
*/

If package (same as above) is empty string, the error message will be

ImportError: attempted relative import with no known parent package

However, you will only see this in Python 3.6 or newer.

Solution #1: Run your script using -m

Consider a directory (which is a Python package):

.
├── package
│   ├── __init__.py
│   ├── module.py
│   └── standalone.py

All of the files in package begin with the same 2 lines of code:

from pathlib import Path
print("Running" if __name__ == "__main__" else "Importing", Path(__file__).resolve())

I"m including these two lines only to make the order of operations obvious. We can ignore them completely, since they don"t affect the execution.

__init__.py and module.py contain only those two lines (i.e., they are effectively empty).

standalone.py additionally attempts to import module.py via relative import:

from . import module  # explicit relative import

We"re well aware that /path/to/python/interpreter package/standalone.py will fail. However, we can run the module with the -m command line option that will "search sys.path for the named module and execute its contents as the __main__ module":

$ python3 -i -m package.standalone
Importing /home/vaultah/package/__init__.py
Running /home/vaultah/package/standalone.py
Importing /home/vaultah/package/module.py
>>> __file__
"/home/vaultah/package/standalone.py"
>>> __package__
"package"
>>> # The __package__ has been correctly set and module.py has been imported.
... # What"s inside sys.modules?
... import sys
>>> sys.modules["__main__"]
<module "package.standalone" from "/home/vaultah/package/standalone.py">
>>> sys.modules["package.module"]
<module "package.module" from "/home/vaultah/package/module.py">
>>> sys.modules["package"]
<module "package" from "/home/vaultah/package/__init__.py">

-m does all the importing stuff for you and automatically sets __package__, but you can also set it yourself, which is what the next solution does.

Solution #2: Set __package__ manually

Please treat it as a proof of concept rather than an actual solution. It isn't well-suited for use in real-world code.

PEP 366 has a workaround to this problem; however, it's incomplete, because setting __package__ alone is not enough. You're going to need to import at least N preceding packages in the module hierarchy, where N is the number of parent directories (relative to the directory of the script) that will be searched for the module being imported.

Thus,

  1. Add the parent directory of the Nth predecessor of the current module to sys.path

  2. Remove the current file"s directory from sys.path

  3. Import the parent module of the current module using its fully-qualified name

  4. Set __package__ to the fully-qualified name from 2

  5. Perform the relative import

I"ll borrow files from the Solution #1 and add some more subpackages:

package
├── __init__.py
├── module.py
└── subpackage
    ├── __init__.py
    └── subsubpackage
        ├── __init__.py
        └── standalone.py

This time standalone.py will import module.py from the package package using the following relative import

from ... import module  # N = 3

We"ll need to precede that line with the boilerplate code, to make it work.

import sys
from pathlib import Path

if __name__ == "__main__" and __package__ is None:
    file = Path(__file__).resolve()
    parent, top = file.parent, file.parents[3]

    sys.path.append(str(top))
    try:
        sys.path.remove(str(parent))
    except ValueError: # Already removed
        pass

    import package.subpackage.subsubpackage
    __package__ = "package.subpackage.subsubpackage"

from ... import module # N = 3

It allows us to execute standalone.py by filename:

$ python3 package/subpackage/subsubpackage/standalone.py
Running /home/vaultah/package/subpackage/subsubpackage/standalone.py
Importing /home/vaultah/package/__init__.py
Importing /home/vaultah/package/subpackage/__init__.py
Importing /home/vaultah/package/subpackage/subsubpackage/__init__.py
Importing /home/vaultah/package/module.py

A more general solution wrapped in a function can be found here. Example usage:

if __name__ == "__main__" and __package__ is None:
    import_parents(level=3) # N = 3

from ... import module
from ...module.submodule import thing

Solution #3: Use absolute imports and setuptools

The steps are -

  1. Replace explicit relative imports with equivalent absolute imports

  2. Install package to make it importable

For instance, the directory structure may be as follows

.
├── project
│   ├── package
│   │   ├── __init__.py
│   │   ├── module.py
│   │   └── standalone.py
│   └── setup.py

where setup.py is

from setuptools import setup, find_packages
setup(
    name = "your_package_name",
    packages = find_packages(),
)

The rest of the files were borrowed from Solution #1.

Installation will allow you to import the package regardless of your working directory (assuming there'll be no naming issues).

We can modify standalone.py to use this advantage (step 1):

from package import module  # absolute import

Change your working directory to project and run /path/to/python/interpreter setup.py install --user (--user installs the package in your site-packages directory) (step 2):

$ cd project
$ python3 setup.py install --user

Let"s verify that it"s now possible to run standalone.py as a script:

$ python3 -i package/standalone.py
Running /home/vaultah/project/package/standalone.py
Importing /home/vaultah/.local/lib/python3.6/site-packages/your_package_name-0.0.0-py3.6.egg/package/__init__.py
Importing /home/vaultah/.local/lib/python3.6/site-packages/your_package_name-0.0.0-py3.6.egg/package/module.py
>>> module
<module "package.module" from "/home/vaultah/.local/lib/python3.6/site-packages/your_package_name-0.0.0-py3.6.egg/package/module.py">
>>> import sys
>>> sys.modules["package"]
<module "package" from "/home/vaultah/.local/lib/python3.6/site-packages/your_package_name-0.0.0-py3.6.egg/package/__init__.py">
>>> sys.modules["package.module"]
<module "package.module" from "/home/vaultah/.local/lib/python3.6/site-packages/your_package_name-0.0.0-py3.6.egg/package/module.py">

Note: If you decide to go down this route, you'd be better off using virtual environments to install packages in isolation.

Solution #4: Use absolute imports and some boilerplate code

Frankly, the installation is not necessary - you could add some boilerplate code to your script to make absolute imports work.

I"m going to borrow files from Solution #1 and change standalone.py:

  1. Add the parent directory of package to sys.path before attempting to import anything from package using absolute imports:

    import sys
    from pathlib import Path # if you haven't already done so
    file = Path(__file__).resolve()
    parent, root = file.parent, file.parents[1]
    sys.path.append(str(root))
    
    # Additionally remove the current file's directory from sys.path
    try:
        sys.path.remove(str(parent))
    except ValueError: # Already removed
        pass
    
  2. Replace the relative import by the absolute import:

    from package import module  # absolute import
    

standalone.py runs without problems:

$ python3 -i package/standalone.py
Running /home/vaultah/package/standalone.py
Importing /home/vaultah/package/__init__.py
Importing /home/vaultah/package/module.py
>>> module
<module "package.module" from "/home/vaultah/package/module.py">
>>> import sys
>>> sys.modules["package"]
<module "package" from "/home/vaultah/package/__init__.py">
>>> sys.modules["package.module"]
<module "package.module" from "/home/vaultah/package/module.py">

I feel that I should warn you: try not to do this, especially if your project has a complex structure.


As a side note, PEP 8 recommends the use of absolute imports, but states that in some scenarios explicit relative imports are acceptable:

Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages). [...] However, explicit relative imports are an acceptable alternative to absolute imports, especially when dealing with complex package layouts where using absolute imports would be unnecessarily verbose.

Answer #10

What"s the pythonic way to use getters and setters?

The "Pythonic" way is not to use "getters" and "setters", but to use plain attributes, like the question demonstrates, and del for deleting (but the names are changed to protect the innocent... builtins):

value = "something"

obj.attribute = value  
value = obj.attribute
del obj.attribute

If later, you want to modify the setting and getting, you can do so without having to alter user code, by using the property decorator:

class Obj:
    """property demo"""
    #
    @property            # first decorate the getter method
    def attribute(self): # This getter method name is *the* name
        return self._attribute
    #
    @attribute.setter    # the property decorates with `.setter` now
    def attribute(self, value):   # name, e.g. "attribute", is the same
        self._attribute = value   # the "value" name isn't special
    #
    @attribute.deleter     # decorate with `.deleter`
    def attribute(self):   # again, the method name is the same
        del self._attribute

(Each decorator usage copies and updates the prior property object, so note that you should use the same name for each set, get, and delete function/method.)

After defining the above, the original setting, getting, and deleting code is the same:

obj = Obj()
obj.attribute = value  
the_value = obj.attribute
del obj.attribute

You should avoid this:

def set_property(property,value):  
def get_property(property):  

Firstly, the above doesn"t work, because you don"t provide an argument for the instance that the property would be set to (usually self), which would be:

class Obj:

    def set_property(self, property, value): # don't do this
        ...
    def get_property(self, property):        # don't do this either
        ...

Secondly, this duplicates the purpose of two special methods, __setattr__ and __getattr__.

Thirdly, we also have the setattr and getattr builtin functions.

setattr(object, "property_name", value)
getattr(object, "property_name", default_value)  # default is optional
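A tiny sketch of those builtins in action (the class and attribute names here are just for illustration):

class Thing:
    pass

t = Thing()
setattr(t, "name", "widget")        # equivalent to: t.name = "widget"
print(getattr(t, "name"))           # widget
print(getattr(t, "missing", None))  # None - the default avoids AttributeError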

The @property decorator is for creating getters and setters.

For example, we could modify the setting behavior to place restrictions on the value being set:

class Protective(object):

    @property
    def protected_value(self):
        return self._protected_value

    @protected_value.setter
    def protected_value(self, value):
        if acceptable(value): # e.g. type or range check
            self._protected_value = value

In general, we want to avoid using property and just use direct attributes.

This is what is expected by users of Python. Following the rule of least-surprise, you should try to give your users what they expect unless you have a very compelling reason to the contrary.

Demonstration

For example, say we needed our object"s protected attribute to be an integer between 0 and 100 inclusive, and prevent its deletion, with appropriate messages to inform the user of its proper usage:

class Protective(object):
    """protected property demo"""
    #
    def __init__(self, start_protected_value=0):
        self.protected_value = start_protected_value
    # 
    @property
    def protected_value(self):
        return self._protected_value
    #
    @protected_value.setter
    def protected_value(self, value):
        if value != int(value):
            raise TypeError("protected_value must be an integer")
        if 0 <= value <= 100:
            self._protected_value = int(value)
        else:
            raise ValueError("protected_value must be " +
                             "between 0 and 100 inclusive")
    #
    @protected_value.deleter
    def protected_value(self):
        raise AttributeError("do not delete, protected_value can be set to 0")

(Note that __init__ refers to self.protected_value but the property methods refer to self._protected_value. This is so that __init__ uses the property through the public API, ensuring it is "protected".)

And usage:

>>> p1 = Protective(3)
>>> p1.protected_value
3
>>> p1 = Protective(5.0)
>>> p1.protected_value
5
>>> p2 = Protective(-5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __init__
  File "<stdin>", line 15, in protected_value
ValueError: protected_value must be between 0 and 100 inclusive
>>> p1.protected_value = 7.3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 17, in protected_value
TypeError: protected_value must be an integer
>>> p1.protected_value = 101
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 15, in protected_value
ValueError: protected_value must be between 0 and 100 inclusive
>>> del p1.protected_value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 18, in protected_value
AttributeError: do not delete, protected_value can be set to 0

Do the names matter?

Yes they do. .setter and .deleter make copies of the original property. This allows subclasses to properly modify behavior without altering the behavior in the parent.

class Obj:
    """property demo"""
    #
    @property
    def get_only(self):
        return self._attribute
    #
    @get_only.setter
    def get_or_set(self, value):
        self._attribute = value
    #
    @get_or_set.deleter
    def get_set_or_delete(self):
        del self._attribute

Now for this to work, you have to use the respective names:

obj = Obj()
# obj.get_only = "value" # would error
obj.get_or_set = "value"  
obj.get_set_or_delete = "new value"
the_value = obj.get_only
del obj.get_set_or_delete
# del obj.get_or_set # would error

I"m not sure where this would be useful, but the use-case is if you want a get, set, and/or delete-only property. Probably best to stick to semantically same property having the same name.

Conclusion

Start with simple attributes.

If you later need functionality around the setting, getting, and deleting, you can add it with the property decorator.

Avoid functions named set_... and get_... - that's what properties are for.
