# numpy.float_power() in Python


Parameters:

• `arr1`: [array_like] Input array or object that serves as the base.
• `arr2`: [array_like] Input array or object that serves as the exponent.
• `out`: [ndarray, optional] Output array with the same dimensions as the input array, in which the result is placed.
• `**kwargs`: Allows you to pass a variable number of keyword arguments to the function.
• `where`: [array_like, optional] A True value means the universal function (ufunc) is calculated at that position; a False value means the value in the output is left alone.

Return:

` An array with elements of arr1 raised to exponents in arr2 `

Code 1: arr1 raised to arr2

```
# Python program explaining
# float_power() function
import numpy as np

# input_array
arr1 = [2, 2, 2, 2, 2]
arr2 = [2, 3, 4, 5, 6]
print("arr1 :", arr1)
print("arr2 :", arr2)

# output_array
out = np.float_power(arr1, arr2)
print("Output array:", out)
```

Output:

```
arr1 : [2, 2, 2, 2, 2]
arr2 : [2, 3, 4, 5, 6]
Output array: [4. 8. 16. 32. 64.]
```

Code 2: arr1 elements raised to power of 2

```
# Python program explaining
# float_power() function
import numpy as np

# input_array
arr1 = np.arange(8)
exponent = 2
print("arr1 :", arr1)

# output_array
out = np.float_power(arr1, exponent)
print("Output array:", out)
```

Output:

```
arr1 : [0 1 2 3 4 5 6 7]
Output array: [0. 1. 4. 9. 16. 25. 36. 49.]
```

Code 3: results of float_power when arr2 has negative elements

```
# Python program explaining
# float_power() function
import numpy as np

# input_array
arr1 = [2, 2, 2, 2, 2]
arr2 = [2, -3, 4, -5, 6]
print("arr1 :", arr1)
print("arr2 :", arr2)

# output_array
out = np.float_power(arr1, arr2)
print("Output array:", out)
```

Output:

```
arr1 : [2, 2, 2, 2, 2]
arr2 : [2, -3, 4, -5, 6]
Output array: [4.00000000e+00 1.25000000e-01 1.60000000e+01 3.12500000e-02 6.40000000e+01]
```

Links: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.float_power.html#numpy.float_power
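The negative-exponent case is where `float_power` differs most visibly from `np.power`. The sketch below is an illustration rather than part of the original tutorial; the variable names `base`, `exponent` and `power_error` are made up for it. It shows that `float_power` returns a `float64` result, whereas `np.power` on integer arrays refuses negative integer exponents.

```python
import numpy as np

base = np.array([2, 2, 2])
exponent = np.array([2, -3, 4])

# float_power promotes integer inputs to floating point,
# so a negative exponent simply yields a fraction.
result = np.float_power(base, exponent)

# np.power keeps the integer dtype and raises for negative integer exponents.
try:
    np.power(base, exponent)
    power_error = None
except ValueError as exc:
    power_error = str(exc)

print(result.dtype, result)
print("np.power raised:", power_error)
```

If an integer result is wanted, `np.power` with non-negative exponents remains the usual choice; `float_power` is the one to reach for when fractional results are expected.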

## How to get all subsets of a set? (powerset)

Given a set

``````{0, 1, 2, 3}
``````

How can I produce the subsets:

``````[set(),
{0},
{1},
{2},
{3},
{0, 1},
{0, 2},
{0, 3},
{1, 2},
{1, 3},
{2, 3},
{0, 1, 2},
{0, 1, 3},
{0, 2, 3},
{1, 2, 3},
{0, 1, 2, 3}]
``````
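One common way to produce the subsets listed above, sketched here with only the standard library, is to chain `itertools.combinations` over every subset size. The function name `powerset` and its return type (a list of sets) are choices made for this illustration, not something given in the question.

```python
from itertools import chain, combinations


def powerset(items):
    """Return every subset of `items` as a list of sets, smallest first."""
    pool = list(items)
    # One combinations() iterator per subset size: 0, 1, ..., len(pool).
    sized_combos = (combinations(pool, r) for r in range(len(pool) + 1))
    return [set(combo) for combo in chain.from_iterable(sized_combos)]


subsets = powerset({0, 1, 2, 3})
print(len(subsets))
print(subsets)
```

A set with n elements has 2**n subsets, so this is only practical for small inputs.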

You have four main options for converting types in pandas:

1. `to_numeric()` - provides functionality to safely convert non-numeric types (e.g. strings) to a suitable numeric type. (See also `to_datetime()` and `to_timedelta()`.)

2. `astype()` - convert (almost) any type to (almost) any other type (even if it's not necessarily sensible to do so). Also allows you to convert to categorical types (very useful).

3. `infer_objects()` - a utility method to convert object columns holding Python objects to a pandas type if possible.

4. `convert_dtypes()` - convert DataFrame columns to the "best possible" dtype that supports `pd.NA` (pandas' object to indicate a missing value).

Read on for more detailed explanations and usage of each of these methods.

# 1. `to_numeric()`

The best way to convert one or more columns of a DataFrame to numeric values is to use `pandas.to_numeric()`.

This function will try to change non-numeric objects (such as strings) into integers or floating point numbers as appropriate.

## Basic usage

The input to `to_numeric()` is a Series or a single column of a DataFrame.

``````>>> s = pd.Series(["8", 6, "7.5", 3, "0.9"]) # mixed string and numeric values
>>> s
0      8
1      6
2    7.5
3      3
4    0.9
dtype: object

>>> pd.to_numeric(s) # convert everything to float values
0    8.0
1    6.0
2    7.5
3    3.0
4    0.9
dtype: float64
``````

As you can see, a new Series is returned. Remember to assign this output to a variable or column name to continue using it:

``````# convert Series
my_series = pd.to_numeric(my_series)

# convert column "a" of a DataFrame
df["a"] = pd.to_numeric(df["a"])
``````

You can also use it to convert multiple columns of a DataFrame via the `apply()` method:

``````# convert all columns of DataFrame
df = df.apply(pd.to_numeric) # convert all columns of DataFrame

# convert just columns "a" and "b"
df[["a", "b"]] = df[["a", "b"]].apply(pd.to_numeric)
``````

As long as your values can all be converted, that's probably all you need.

## Error handling

But what if some values can't be converted to a numeric type?

`to_numeric()` also takes an `errors` keyword argument that allows you to force non-numeric values to be `NaN`, or simply ignore columns containing these values.

Here's an example using a Series of strings `s` which has the object dtype:

``````>>> s = pd.Series(["1", "2", "4.7", "pandas", "10"])
>>> s
0         1
1         2
2       4.7
3    pandas
4        10
dtype: object
``````

The default behaviour is to raise an error if it can't convert a value. In this case, it can't cope with the string "pandas":

``````>>> pd.to_numeric(s) # or pd.to_numeric(s, errors="raise")
ValueError: Unable to parse string
``````

Rather than fail, we might want "pandas" to be considered a missing/bad numeric value. We can coerce invalid values to `NaN` as follows using the `errors` keyword argument:

``````>>> pd.to_numeric(s, errors="coerce")
0     1.0
1     2.0
2     4.7
3     NaN
4    10.0
dtype: float64
``````

The third option for `errors` is just to ignore the operation if an invalid value is encountered:

``````>>> pd.to_numeric(s, errors="ignore")
# the original Series is returned untouched
``````

This last option is particularly useful when you want to convert your entire DataFrame, but don't know which of your columns can be converted reliably to a numeric type. In that case just write:

``````df.apply(pd.to_numeric, errors="ignore")
``````

The function will be applied to each column of the DataFrame. Columns that can be converted to a numeric type will be converted, while columns that cannot (e.g. they contain non-digit strings or dates) will be left alone.
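When it matters which values failed to convert, `errors="coerce"` can be combined with `isna()` to list them. This is a minimal sketch reusing the example Series from above; the names `converted` and `bad_values` are invented for the illustration. As far as I know, recent pandas releases (2.2 onwards) also deprecate `errors="ignore"`, which is another reason to prefer the coerce-and-inspect pattern in new code.

```python
import pandas as pd

s = pd.Series(["1", "2", "4.7", "pandas", "10"])

# Invalid entries become NaN instead of raising.
converted = pd.to_numeric(s, errors="coerce")

# The positions that turned into NaN point back at the offending originals.
bad_values = s[converted.isna()]

print(converted.dtype)
print(bad_values.tolist())
```

Note that values that were already missing in the input would also show up in `bad_values`, so this check is most useful on columns without pre-existing gaps.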

## Downcasting

By default, conversion with `to_numeric()` will give you either an `int64` or `float64` dtype (or whatever integer width is native to your platform).

That's usually what you want, but what if you wanted to save some memory and use a more compact dtype, like `float32`, or `int8`?

`to_numeric()` gives you the option to downcast to "integer", "signed", "unsigned", or "float". Here's an example for a simple Series `s` of integer type:

``````>>> s = pd.Series([1, 2, -7])
>>> s
0    1
1    2
2   -7
dtype: int64
``````

Downcasting to "integer" uses the smallest possible integer that can hold the values:

``````>>> pd.to_numeric(s, downcast="integer")
0    1
1    2
2   -7
dtype: int8
``````

Downcasting to "float" similarly picks a smaller than normal floating type:

``````>>> pd.to_numeric(s, downcast="float")
0    1.0
1    2.0
2   -7.0
dtype: float32
``````

# 2. `astype()`

The `astype()` method enables you to be explicit about the dtype you want your DataFrame or Series to have. It's very versatile in that you can try to go from one type to any other.

## Basic usage

Just pick a type: you can use a NumPy dtype (e.g. `np.int16`), some Python types (e.g. bool), or pandas-specific types (like the categorical dtype).

Call the method on the object you want to convert and `astype()` will try and convert it for you:

``````# convert all DataFrame columns to the int64 dtype
df = df.astype(int)

# convert column "a" to int64 dtype and "b" to complex type
df = df.astype({"a": int, "b": complex})

# convert Series to float16 type
s = s.astype(np.float16)

# convert Series to Python strings
s = s.astype(str)

# convert Series to categorical type - see docs for more details
s = s.astype("category")
``````

Notice I said "try" - if `astype()` does not know how to convert a value in the Series or DataFrame, it will raise an error. For example, if you have a `NaN` or `inf` value you'll get an error trying to convert it to an integer.

As of pandas 0.20.0, this error can be suppressed by passing `errors="ignore"`. Your original object will be returned untouched.

## Be careful

`astype()` is powerful, but it will sometimes convert values "incorrectly". For example:

``````>>> s = pd.Series([1, 2, -7])
>>> s
0    1
1    2
2   -7
dtype: int64
``````

These are small integers, so how about converting to an unsigned 8-bit type to save memory?

``````>>> s.astype(np.uint8)
0      1
1      2
2    249
dtype: uint8
``````

The conversion worked, but the -7 was wrapped round to become 249 (i.e. 2^8 - 7)!

Trying to downcast using `pd.to_numeric(s, downcast="unsigned")` instead could help prevent this error.
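To make the contrast concrete, here is a small sketch (the variable names `safe` and `non_negative` are invented for it) of how `downcast="unsigned"` behaves on the same Series. My understanding is that when the values do not fit an unsigned type, `to_numeric()` leaves the dtype alone rather than wrapping them.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, -7])

# The negative value cannot be represented as unsigned, so no downcast
# happens and the values are preserved.
safe = pd.to_numeric(s, downcast="unsigned")

# With only non-negative values the smallest unsigned dtype is chosen.
non_negative = pd.to_numeric(pd.Series([1, 2, 7]), downcast="unsigned")

print(safe.dtype, safe.tolist())
print(non_negative.dtype)
```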

# 3. `infer_objects()`

Version 0.21.0 of pandas introduced the method `infer_objects()` for converting columns of a DataFrame that have an object datatype to a more specific type (soft conversions).

For example, here's a DataFrame with two columns of object type. One holds actual integers and the other holds strings representing integers:

``````>>> df = pd.DataFrame({"a": [7, 1, 5], "b": ["3","2","1"]}, dtype="object")
>>> df.dtypes
a    object
b    object
dtype: object
``````

Using `infer_objects()`, you can change the type of column "a" to int64:

``````>>> df = df.infer_objects()
>>> df.dtypes
a     int64
b    object
dtype: object
``````

Column "b" has been left alone since its values were strings, not integers. If you wanted to try and force the conversion of both columns to an integer type, you could use `df.astype(int)` instead.

# 4. `convert_dtypes()`

Version 1.0 and above includes a method `convert_dtypes()` to convert Series and DataFrame columns to the best possible dtype that supports the `pd.NA` missing value.

Here "best possible" means the type most suited to hold the values. For example, this is a pandas integer type if all of the values are integers (or missing values): an object column of Python integer objects is converted to `Int64`, and a column of NumPy `int32` values will become the pandas dtype `Int32`.

With our `object` DataFrame `df`, we get the following result:

``````>>> df.convert_dtypes().dtypes
a     Int64
b    string
dtype: object
``````

Since column "a" held integer values, it was converted to the `Int64` type (which is capable of holding missing values, unlike `int64`).

Column "b" contained string objects, so was changed to pandas' `string` dtype.

By default, this method will infer the type from object values in each column. We can change this by passing `infer_objects=False`:

``````>>> df.convert_dtypes(infer_objects=False).dtypes
a    object
b    string
dtype: object
``````

Now column "a" remained an object column: pandas knows it can be described as an "integer" column (internally it ran `infer_dtype`) but didn't infer exactly what dtype of integer it should have, so it did not convert it. Column "b" was again converted to "string" dtype as it was recognised as holding "string" values.

There are two types of site-packages directories, global and per user.

1. Global site-packages ("dist-packages") directories are listed in `sys.path` when you run:

``````python -m site
``````

For a more concise list run `getsitepackages` from the site module in Python code:

``````python -c "import site; print(site.getsitepackages())"
``````

Note: With virtualenvs `getsitepackages` is not available; `sys.path` from above will list the virtualenv's site-packages directory correctly, though. In Python 3, you may use the sysconfig module instead:

``````python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])'
``````
2. The per user site-packages directory (PEP 370) is where Python installs your local packages:

``````python -m site --user-site
``````

If this points to a non-existing directory check the exit status of Python and see `python -m site --help` for explanations.

Hint: Running `pip list --user` or `pip freeze --user` gives you a list of all installed per user site-packages.
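The same information can be gathered from inside a running interpreter. The following is a minimal sketch using only the standard library; the variable names are invented here, and the virtualenv check (`sys.prefix != sys.base_prefix`) is a common heuristic for Python 3 rather than something stated above.

```python
import site
import sys
import sysconfig

# Where pure-Python packages are installed for this interpreter.
purelib = sysconfig.get_paths()["purelib"]

# The per-user site-packages directory (PEP 370); it may not exist on disk.
user_site = site.getusersitepackages()

# Heuristic: inside a venv the two prefixes differ.
in_virtualenv = sys.prefix != sys.base_prefix

print("site-packages:", purelib)
print("user site:    ", user_site)
print("virtualenv:   ", in_virtualenv)
```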

## Practical Tips

• `<package>.__path__` lets you identify the location(s) of a specific package:

``````$ python -c "import setuptools as _; print(_.__path__)"
["/usr/lib/python2.7/dist-packages/setuptools"]
``````
• `<module>.__file__` lets you identify the location of a specific module:

``````$ python3 -c "import os as _; print(_.__file__)"
/usr/lib/python3.6/os.py
``````
• Run `pip show <package>` to show Debian-style package information:

``````$ pip show pytest
Name: pytest
Version: 3.8.2
Summary: pytest: simple powerful testing with Python
Home-page: https://docs.pytest.org/en/latest/
Author: Holger Krekel, Bruno Oliveira, Ronny Pfannschmidt, Floris Bruynooghe, Brianna Laugher, Florian Bruhin and others
Author-email: None
Location: /home/peter/.local/lib/python3.4/site-packages
Requires: more-itertools, atomicwrites, setuptools, attrs, pathlib2, six, py, pluggy
``````

# TL;DR version:

For the simple case of:

• I have a text column with a delimiter and I want two columns

The simplest solution is:

``````df[["A", "B"]] = df["AB"].str.split(" ", n=1, expand=True)
``````

You must use `expand=True` if your strings have a non-uniform number of splits and you want `None` to replace the missing values.

Notice how, in either case, the `.tolist()` method is not necessary. Neither is `zip()`.

# In detail:

Andy Hayden's solution is most excellent in demonstrating the power of the `str.extract()` method.

But for a simple split over a known separator (like splitting by dashes, or splitting by whitespace), the `.str.split()` method is enough¹. It operates on a column (Series) of strings, and returns a column (Series) of lists:

``````>>> import pandas as pd
>>> df = pd.DataFrame({"AB": ["A1-B1", "A2-B2"]})
>>> df

AB
0  A1-B1
1  A2-B2
>>> df["AB_split"] = df["AB"].str.split("-")
>>> df

AB  AB_split
0  A1-B1  [A1, B1]
1  A2-B2  [A2, B2]
``````

¹ If you're unsure what the first two parameters of `.str.split()` do, I recommend the docs for the plain Python version of the method.

But how do you go from:

• a column containing two-element lists

to:

• two columns, each containing the respective element of the lists?

Well, we need to take a closer look at the `.str` attribute of a column.

It's a magical object that is used to collect methods that treat each element in a column as a string, and then apply the respective method to each element as efficiently as possible:

``````>>> upper_lower_df = pd.DataFrame({"U": ["A", "B", "C"]})
>>> upper_lower_df

U
0  A
1  B
2  C
>>> upper_lower_df["L"] = upper_lower_df["U"].str.lower()
>>> upper_lower_df

U  L
0  A  a
1  B  b
2  C  c
``````

But it also has an "indexing" interface for getting each element of a string by its index:

``````>>> df["AB"].str[0]

0    A
1    A
Name: AB, dtype: object

>>> df["AB"].str[1]

0    1
1    2
Name: AB, dtype: object
``````

Of course, this indexing interface of `.str` doesn't really care if each element it's indexing is actually a string, as long as it can be indexed, so:

``````>>> df["AB"].str.split("-", n=1).str[0]

0    A1
1    A2
Name: AB, dtype: object

>>> df["AB"].str.split("-", n=1).str[1]

0    B1
1    B2
Name: AB, dtype: object
``````

Then, it's a simple matter of taking advantage of the Python tuple unpacking of iterables to do

``````>>> df["A"], df["B"] = df["AB"].str.split("-", 1).str
>>> df

AB  AB_split   A   B
0  A1-B1  [A1, B1]  A1  B1
1  A2-B2  [A2, B2]  A2  B2
``````

Of course, getting a DataFrame out of splitting a column of strings is so useful that the `.str.split()` method can do it for you with the `expand=True` parameter:

``````>>> df["AB"].str.split("-", n=1, expand=True)

0   1
0  A1  B1
1  A2  B2
``````

So, another way of accomplishing what we wanted is to do:

``````>>> df = df[["AB"]]
>>> df

AB
0  A1-B1
1  A2-B2

>>> df.join(df["AB"].str.split("-", n=1, expand=True).rename(columns={0:"A", 1:"B"}))

AB   A   B
0  A1-B1  A1  B1
1  A2-B2  A2  B2
``````

The `expand=True` version, although longer, has a distinct advantage over the tuple unpacking method. Tuple unpacking doesn't deal well with splits of different lengths:

``````>>> df = pd.DataFrame({"AB": ["A1-B1", "A2-B2", "A3-B3-C3"]})
>>> df
AB
0     A1-B1
1     A2-B2
2  A3-B3-C3
>>> df["A"], df["B"], df["C"] = df["AB"].str.split("-")
Traceback (most recent call last):
[...]
ValueError: Length of values does not match length of index
>>>
``````

But `expand=True` handles it nicely by placing `None` in the columns for which there aren't enough "splits":

``````>>> df.join(
...     df["AB"].str.split("-", expand=True).rename(
...         columns={0:"A", 1:"B", 2:"C"}
...     )
... )
AB   A   B     C
0     A1-B1  A1  B1  None
1     A2-B2  A2  B2  None
2  A3-B3-C3  A3  B3    C3
``````
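Combining `expand=True` with the `n` limit gives a fixed number of columns even when some rows contain extra separators. The sketch below is an illustration under that assumption; the column names `A` and `B` are chosen for it, and `n` is passed by keyword because, as far as I know, recent pandas versions no longer accept it positionally.

```python
import pandas as pd

df = pd.DataFrame({"AB": ["A1-B1", "A2-B2", "A3-B3-C3"]})

# n=1 splits at most once, so every row yields exactly two parts;
# anything after the first dash stays together in the second column.
parts = df["AB"].str.split("-", n=1, expand=True)
parts.columns = ["A", "B"]

result = df.join(parts)
print(result)
```

The trade-off is that the third row keeps `B3-C3` unsplit in column `B`; drop the `n` argument if a separate `C` column is wanted instead.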

If you haven't got Python installed along with all the node-gyp dependencies, simply open PowerShell or Git Bash with administrator privileges and execute:

``````npm install --global --production windows-build-tools
``````

and then to install the package:

``````npm install --global node-gyp
``````

Once installed, you will have all the node-gyp dependencies downloaded, but you still need the environment variable. Validate that Python is indeed found in the correct folder:

``````C:Usersen.windows-build-toolspython27python.exe
``````

If it doesn't moan, go ahead and create your (user) environment variable:

``````setx PYTHON "%USERPROFILE%\.windows-build-tools\python27\python.exe"
``````

Restart cmd, and verify the variable exists via `set PYTHON`, which should return the variable (`$env:PYTHON` if using PowerShell).

Lastly re-apply `npm install <module>`

## How do I determine the size of an object in Python?

The answer, "Just use `sys.getsizeof`", is not a complete answer.

That answer does work for builtin objects directly, but it does not account for what those objects may contain, specifically, what types such as custom objects, tuples, lists, dicts, and sets contain. They can contain instances of each other, as well as numbers, strings and other objects.

Using 64-bit Python 3.6 from the Anaconda distribution, with `sys.getsizeof`, I have determined the minimum size of the following objects. Note that sets and dicts preallocate space, so empty ones don't grow again until after a set amount (which may vary by implementation of the language):

Python 3:

``````Empty
Bytes  type        scaling notes
28     int         +4 bytes about every 30 powers of 2
37     bytes       +1 byte per additional byte
49     str         +1-4 per additional character (depending on max width)
48     tuple       +8 per additional item
64     list        +8 for each additional
224    set         5th increases to 736; 21st, 2272; 85th, 8416; 341st, 32992
240    dict        6th increases to 368; 22nd, 1184; 43rd, 2280; 86th, 4704; 171st, 9320
136    func def    does not include default args and other attrs
1056   class def   no slots
56     class inst  has a __dict__ attr, same scaling as dict above
888    class def   with slots
16     __slots__   seems to store in mutable tuple-like structure
first slot grows to 48, and so on.
``````

How do you interpret this? Well, say you have a set with 10 items in it. If each item is 100 bytes, how big is the whole data structure? The set itself is 736 bytes because it has sized up one time to 736 bytes. Then you add the size of the items (10 × 100 = 1000 bytes), so that's 1736 bytes in total.
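The same arithmetic can be sketched directly with `sys.getsizeof`: measure the container, then add the sizes of its members. The ten strings below are arbitrary stand-ins (they are not 100 bytes each), and the exact numbers depend on the Python build, so none are shown.

```python
import sys

# Ten distinct strings standing in for the items of the set.
payload = {f"item-{i}" for i in range(10)}

# Size of the set structure itself (hash table plus object header).
container_size = sys.getsizeof(payload)

# Sizes of the objects the set refers to, counted one level deep.
items_size = sum(sys.getsizeof(item) for item in payload)

total = container_size + items_size
print(container_size, items_size, total)
```

This one-level sum is only correct when the items do not themselves contain other objects, which is what motivates the more complete functions later on.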

Some caveats for function and class definitions:

Note each class definition has a proxy `__dict__` (48 bytes) structure for class attrs. Each slot has a descriptor (like a `property`) in the class definition.

Slotted instances start out with 48 bytes on their first element, and increase by 8 each additional. Only empty slotted objects have 16 bytes, and an instance with no data makes very little sense.

Also, each function definition has code objects, maybe docstrings, and other possible attributes, even a `__dict__`.

Also note that we use `sys.getsizeof()` because we care about the marginal space usage, which includes the garbage collection overhead for the object, from the docs:

`getsizeof()` calls the object's `__sizeof__` method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

Also note that resizing lists (e.g. repetitively appending to them) causes them to preallocate space, similarly to sets and dicts. From the listobj.c source code:

``````    /* This over-allocates proportional to the list size, making room
* for additional growth.  The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
* Note: new_allocated won't overflow because the largest possible value
*       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
*/
new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
``````
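The over-allocation can be observed from Python by appending to a list and watching when `sys.getsizeof` changes. This is a rough sketch for CPython; the growth pattern is an implementation detail, so the exact sizes and resize points will differ between versions and are deliberately not listed.

```python
import sys

data = []
sizes = []
for _ in range(32):
    data.append(None)
    sizes.append(sys.getsizeof(data))

# List lengths at which the reported size jumped, i.e. a reallocation happened.
resize_points = [i + 1 for i in range(1, len(sizes)) if sizes[i] != sizes[i - 1]]

print(sizes)
print("resized at lengths:", resize_points)
```

On CPython the size stays flat between resize points, which is the spare capacity the C comment describes.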

### Historical data

Python 2.7 analysis, confirmed with `guppy.hpy` and `sys.getsizeof`:

``````Bytes  type        empty + scaling notes
24     int         NA
28     long        NA
37     str         + 1 byte per additional character
52     unicode     + 4 bytes per additional character
56     tuple       + 8 bytes per additional item
72     list        + 32 for first, 8 for each additional
232    set         sixth item increases to 744; 22nd, 2280; 86th, 8424
280    dict        sixth item increases to 1048; 22nd, 3352; 86th, 12568 *
120    func def    does not include default args and other attrs
64     class inst  has a __dict__ attr, same scaling as dict above
16     __slots__   class with slots has no dict, seems to store in
mutable tuple-like structure.
904    class def   has a proxy __dict__ structure for class attrs
104    old class   makes sense, less stuff, has real dict though.
``````

Note that dictionaries (but not sets) got a more compact representation in Python 3.6.

I think 8 bytes per additional item to reference makes a lot of sense on a 64 bit machine. Those 8 bytes point to the place in memory the contained item is at. The 4 bytes are fixed width for unicode in Python 2, if I recall correctly, but in Python 3, str becomes a unicode of width equal to the max width of the characters.

And for more on slots, see this answer.

## A More Complete Function

We want a function that searches the elements in lists, tuples, sets, dicts, `obj.__dict__`'s, and `obj.__slots__`, as well as other things we may not have yet thought of.

We want to rely on `gc.get_referents` to do this search because it works at the C level (making it very fast). The downside is that get_referents can return redundant members, so we need to ensure we don't double count.

Classes, modules, and functions are singletons - they exist one time in memory. We're not so interested in their size, as there's not much we can do about them - they're a part of the program. So we'll avoid counting them if they happen to be referenced.

We're going to use a blacklist of types so we don't include the entire program in our size count.

``````import sys
from types import ModuleType, FunctionType
from gc import get_referents

# Custom objects know their class.
# Function objects seem to know way too much, including modules.
# Exclude modules as well.
BLACKLIST = type, ModuleType, FunctionType

def getsize(obj):
    """sum size of object & members."""
    if isinstance(obj, BLACKLIST):
        raise TypeError("getsize() does not take argument of type: "+ str(type(obj)))
    seen_ids = set()
    size = 0
    objects = [obj]
    while objects:
        need_referents = []
        for obj in objects:
            if not isinstance(obj, BLACKLIST) and id(obj) not in seen_ids:
                # record the id so shared or cyclic references count only once
                seen_ids.add(id(obj))
                size += sys.getsizeof(obj)
                need_referents.append(obj)
        objects = get_referents(*need_referents)
    return size
``````

To contrast this with the following whitelisted function: most objects know how to traverse themselves for the purposes of garbage collection, which is approximately what we're looking for when we want to know how expensive in memory certain objects are. This functionality is used by `gc.get_referents`. However, this measure is going to be much more expansive in scope than we intended if we are not careful.

For example, functions know quite a lot about the modules they are created in.

Another point of contrast is that strings that are keys in dictionaries are usually interned so they are not duplicated. Checking for `id(key)` will also allow us to avoid counting duplicates, which we do in the next section. The blacklist solution skips counting keys that are strings altogether.

## Whitelisted Types, Recursive visitor

To cover most of these types myself, instead of relying on the `gc` module, I wrote this recursive function to try to estimate the size of most Python objects, including most builtins, types in the collections module, and custom types (slotted and otherwise).

This sort of function gives much more fine-grained control over the types we're going to count for memory usage, but has the danger of leaving important types out:

``````import sys
from numbers import Number
from collections import deque
from collections.abc import Set, Mapping

ZERO_DEPTH_BASES = (str, bytes, Number, range, bytearray)

def getsize(obj_0):
    """Recursively iterate to sum size of object & members."""
    _seen_ids = set()
    def inner(obj):
        obj_id = id(obj)
        if obj_id in _seen_ids:
            return 0
        # remember visited objects so shared ones are counted once
        _seen_ids.add(obj_id)
        size = sys.getsizeof(obj)
        if isinstance(obj, ZERO_DEPTH_BASES):
            pass # bypass remaining control flow and return
        elif isinstance(obj, (tuple, list, Set, deque)):
            size += sum(inner(i) for i in obj)
        elif isinstance(obj, Mapping) or hasattr(obj, "items"):
            size += sum(inner(k) + inner(v) for k, v in getattr(obj, "items")())
        # Check for custom object instances - may subclass above too
        if hasattr(obj, "__dict__"):
            size += inner(vars(obj))
        if hasattr(obj, "__slots__"): # can have __slots__ with __dict__
            size += sum(inner(getattr(obj, s)) for s in obj.__slots__ if hasattr(obj, s))
        return size
    return inner(obj_0)
``````

And I tested it rather casually (I should unittest it):

``````>>> getsize(["a", tuple("bcd"), Foo()])
344
>>> getsize(Foo())
16
>>> getsize(tuple("bcd"))
194
>>> getsize(["a", tuple("bcd"), Foo(), {"foo": "bar", "baz": "bar"}])
752
>>> getsize({"foo": "bar", "baz": "bar"})
400
>>> getsize({})
280
>>> getsize({"foo":"bar"})
360
>>> getsize("foo")
40
>>> class Bar():
...     def baz():
...         pass
>>> getsize(Bar())
352
>>> getsize(Bar().__dict__)
280
>>> sys.getsizeof(Bar())
72
>>> getsize(Bar.__dict__)
872
>>> sys.getsizeof(Bar.__dict__)
280
``````

This implementation breaks down on class definitions and function definitions because we don't go after all of their attributes, but since they should only exist once in memory for the process, their size really doesn't matter too much.

To somewhat expand on the earlier answers here, there are a number of details which are commonly overlooked.

• Prefer `subprocess.run()` over `subprocess.check_call()` and friends over `subprocess.call()` over `subprocess.Popen()` over `os.system()` over `os.popen()`
• Understand and probably use `text=True`, aka `universal_newlines=True`.
• Understand the meaning of `shell=True` or `shell=False` and how it changes quoting and the availability of shell conveniences.
• Understand differences between `sh` and Bash
• Understand how a subprocess is separate from its parent, and generally cannot change the parent.
• Avoid running the Python interpreter as a subprocess of Python.

These topics are covered in some more detail below.

# Prefer `subprocess.run()` or `subprocess.check_call()`

The `subprocess.Popen()` function is a low-level workhorse but it is tricky to use correctly and you end up copy/pasting multiple lines of code ... which conveniently already exist in the standard library as a set of higher-level wrapper functions for various purposes, which are presented in more detail in the following.

Here's a paragraph from the documentation:

The recommended approach to invoking subprocesses is to use the `run()` function for all use cases it can handle. For more advanced use cases, the underlying `Popen` interface can be used directly.

Unfortunately, the availability of these wrapper functions differs between Python versions.

• `subprocess.run()` was officially introduced in Python 3.5. It is meant to replace all of the following.
• `subprocess.check_output()` was introduced in Python 2.7 / 3.1. It is basically equivalent to `subprocess.run(..., check=True, stdout=subprocess.PIPE).stdout`
• `subprocess.check_call()` was introduced in Python 2.5. It is basically equivalent to `subprocess.run(..., check=True)`
• `subprocess.call()` was introduced in Python 2.4 in the original `subprocess` module (PEP-324). It is basically equivalent to `subprocess.run(...).returncode`
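The equivalences in the list above can be sketched side by side. To keep the example portable it runs a tiny child Python process via `sys.executable`, purely as a stand-in for any external command (the general advice against spawning Python from Python still applies to real code); the command and variable names are made up for the illustration.

```python
import subprocess
import sys

# Stand-in command: any external program would do.
cmd = [sys.executable, "-c", "print('hello')"]

# Modern form: one call, explicit options.
completed = subprocess.run(cmd, check=True, stdout=subprocess.PIPE, text=True)

# Legacy wrappers that the run() call above can replace.
legacy_output = subprocess.check_output(cmd, text=True)
legacy_status = subprocess.call(cmd, stdout=subprocess.DEVNULL)

print(repr(completed.stdout), completed.returncode)
print(repr(legacy_output), legacy_status)
```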

### High-level API vs `subprocess.Popen()`

The refactored and extended `subprocess.run()` is more logical and more versatile than the older legacy functions it replaces. It returns a `CompletedProcess` object which has various attributes which allow you to retrieve the exit status, the standard output, and a few other results and status indicators from the finished subprocess.

`subprocess.run()` is the way to go if you simply need a program to run and return control to Python. For more involved scenarios (background processes, perhaps with interactive I/O with the Python parent program) you still need to use `subprocess.Popen()` and take care of all the plumbing yourself. This requires a fairly intricate understanding of all the moving parts and should not be undertaken lightly. The simpler `Popen` object represents the (possibly still-running) process which needs to be managed from your code for the remainder of the lifetime of the subprocess.

It should perhaps be emphasized that `subprocess.Popen()` on its own merely creates a process. If you leave it at that, you have a subprocess running concurrently alongside Python, that is, a "background" process. If it doesn't need to do input or output or otherwise coordinate with you, it can do useful work in parallel with your Python program.
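A minimal sketch of that background behaviour follows, again using a short-lived child Python process as a placeholder for a real program. `poll()` returns `None` while the child is still running, and `wait()` blocks until it exits; the sleep duration is arbitrary.

```python
import subprocess
import sys

# Start the child; Popen returns immediately without waiting for it.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])

# The parent is free to do other work here while the child runs.
still_running = child.poll() is None  # usually True at this point, not guaranteed

# Eventually the parent should collect the exit status.
exit_status = child.wait()

print("was running:", still_running, "exit status:", exit_status)
```

Forgetting the final `wait()` (or an equivalent) is a typical source of leftover processes, which is part of the plumbing `subprocess.run()` handles for you.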

### Avoid `os.system()` and `os.popen()`

Since time eternal (well, since Python 2.5) the `os` module documentation has contained the recommendation to prefer `subprocess` over `os.system()`:

The `subprocess` module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.

The problems with `system()` are that it's obviously system-dependent and doesn't offer ways to interact with the subprocess. It simply runs, with standard output and standard error outside of Python's reach. The only information Python receives back is the exit status of the command (zero means success, though the meaning of non-zero values is also somewhat system-dependent).

PEP-324 (which was already mentioned above) contains a more detailed rationale for why `os.system` is problematic and how `subprocess` attempts to solve those issues.

`os.popen()` used to be even more strongly discouraged:

Deprecated since version 2.6: This function is obsolete. Use the `subprocess` module.

However, since sometime in Python 3, it has been reimplemented to simply use `subprocess`, and redirects to the `subprocess.Popen()` documentation for details.

### Understand and usually use `check=True`

You'll also notice that `subprocess.call()` has many of the same limitations as `os.system()`. In regular use, you should generally check whether the process finished successfully, which `subprocess.check_call()` and `subprocess.check_output()` do (where the latter also returns the standard output of the finished subprocess). Similarly, you should usually use `check=True` with `subprocess.run()` unless you specifically need to allow the subprocess to return an error status.

In practice, with `check=True` or `subprocess.check_*`, Python will throw a `CalledProcessError` exception if the subprocess returns a nonzero exit status.

A common error with `subprocess.run()` is to omit `check=True` and be surprised when downstream code fails if the subprocess failed.

On the other hand, a common problem with `check_call()` and `check_output()` was that users who blindly used these functions were surprised when the exception was raised e.g. when `grep` did not find a match. (You should probably replace `grep` with native Python code anyway, as outlined below.)

All things counted, you need to understand how shell commands return an exit code, and under what conditions they will return a non-zero (error) exit code, and make a conscious decision how exactly it should be handled.
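To make the trade-off concrete, here is a small sketch. The failing "command" is a stand-in (the current interpreter exiting with a chosen status) and `classify()` is a hypothetical helper, not part of `subprocess`.

```python
import subprocess
import sys

# Hypothetical stand-in for a failing command: exits with status 3.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]

# Without check=True the failure passes silently unless you inspect returncode.
unchecked = subprocess.run(failing)
print("unchecked returncode:", unchecked.returncode)

# With check=True the same failure raises CalledProcessError.
try:
    subprocess.run(failing, check=True)
    raised = False
except subprocess.CalledProcessError as exc:
    raised = True
    print("command failed with status", exc.returncode)


def classify(returncode):
    """Interpret grep-style statuses: 0 = match, 1 = no match, else error."""
    if returncode == 0:
        return "match"
    if returncode == 1:
        return "no match"
    raise RuntimeError("unexpected exit status {}".format(returncode))


# When a non-zero status is expected, skip check=True and decide explicitly.
grep_like = subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"])
print(classify(grep_like.returncode))
```

The last part is the "conscious decision": status 1 is treated as an ordinary outcome, while anything else still surfaces as an error.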

### Understand and probably use `text=True` aka `universal_newlines=True`

Since Python 3, strings internal to Python are Unicode strings. But there is no guarantee that a subprocess generates Unicode output, or strings at all.

(If the differences are not immediately obvious, Ned Batchelder's Pragmatic Unicode is recommended, if not outright obligatory, reading. There is a 36-minute video presentation behind the link if you prefer, though reading the page yourself will probably take significantly less time.)

Deep down, Python has to fetch a `bytes` buffer and interpret it somehow. If it contains a blob of binary data, it shouldn't be decoded into a Unicode string, because that's error-prone and bug-inducing behavior - precisely the sort of pesky behavior which riddled many Python 2 scripts, before there was a way to properly distinguish between encoded text and binary data.

With `text=True`, you tell Python that you, in fact, expect back textual data in the system's default encoding, and that it should be decoded into a Python (Unicode) string to the best of Python's ability (usually UTF-8 on any moderately up-to-date system, except perhaps Windows).

If that's not what you request back, Python will just give you `bytes` strings in the `stdout` and `stderr` attributes. Maybe at some later point you do know that they were text strings after all, and you know their encoding. Then, you can decode them.

```python
import subprocess

# external and arg are placeholders for your command and its argument
normal = subprocess.run([external, arg],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    check=True,
    text=True)
print(normal.stdout)

convoluted = subprocess.run([external, arg],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    check=True)
# You have to know (or guess) the encoding
print(convoluted.stdout.decode("utf-8"))
```

Python 3.7 introduced the shorter and more descriptive and understandable alias `text` for the keyword argument which was previously somewhat misleadingly called `universal_newlines`.

### Understand `shell=True` vs `shell=False`

With `shell=True` you pass a single string to your shell, and the shell takes it from there.

With `shell=False` you pass a list of arguments to the OS, bypassing the shell.

When you don't have a shell, you save a process and get rid of a fairly substantial amount of hidden complexity, which may or may not harbor bugs or even security problems.

On the other hand, when you don't have a shell, you don't have redirection, wildcard expansion, job control, and a large number of other shell features.

A common mistake is to use `shell=True` and then still pass Python a list of tokens, or vice versa. This happens to work in some cases, but is really ill-defined and could break in interesting ways.

```python
import subprocess

# XXX AVOID THIS BUG
buggy = subprocess.run("dig +short stackoverflow.com")

# XXX AVOID THIS BUG TOO
broken = subprocess.run(["dig", "+short", "stackoverflow.com"],
    shell=True)

# XXX DEFINITELY AVOID THIS
pathological = subprocess.run(["dig +short stackoverflow.com"],
    shell=True)

correct = subprocess.run(["dig", "+short", "stackoverflow.com"],
    # Probably don't forget these, too
    check=True, text=True)

# XXX Probably better avoid shell=True
# but this is nominally correct
fixed_but_fugly = subprocess.run("dig +short stackoverflow.com",
    shell=True,
    # Probably don't forget these, too
    check=True, text=True)
```

The common retort "but it works for me" is not a useful rebuttal unless you understand exactly under what circumstances it could stop working.
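If you start from a command line written as one string, the standard `shlex` module can convert between the two forms. This is only a sketch of the string-versus-list distinction; the hostile `hostname` value is made up for illustration.

```python
import shlex

command_line = "dig +short stackoverflow.com"

# shell=False wants a list of tokens; shlex.split() tokenizes like a POSIX shell.
tokens = shlex.split(command_line)
print(tokens)  # ['dig', '+short', 'stackoverflow.com']

# shell=True wants one string; shlex.quote() protects a value you do not trust.
hostname = "example.com; rm -rf ~"  # hypothetical hostile input
safe_line = "dig +short {}".format(shlex.quote(hostname))
print(safe_line)  # dig +short 'example.com; rm -rf ~'
```

With the list form there is nothing to quote, because no shell ever parses the arguments.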

### Refactoring Example

Very often, the features of the shell can be replaced with native Python code. Simple Awk or `sed` scripts should probably simply be translated to Python instead.

To partially illustrate this, here is a typical but slightly silly example which involves many shell features.

```python
import subprocess

cmd = """while read -r x;
   do ping -c 3 "$x" | grep "round-trip min/avg/max"
   done <hosts.txt"""

# Trivial but horrible
results = subprocess.run(
    cmd, shell=True, universal_newlines=True, check=True,
    stdout=subprocess.PIPE)  # capture, or results.stdout would be None
print(results.stdout)

# Reimplement with shell=False
with open("hosts.txt") as hosts:
    for host in hosts:
        host = host.rstrip("\n")  # drop newline
        ping = subprocess.run(
            ["ping", "-c", "3", host],
            text=True,
            stdout=subprocess.PIPE,
            check=True)
        for line in ping.stdout.split("\n"):
            if "round-trip min/avg/max" in line:
                print("{}: {}".format(host, line))
```

Some things to note here:

• With `shell=False` you don't need the quoting that the shell requires around strings. Putting quotes anyway is probably an error.
• It often makes sense to run as little code as possible in a subprocess. This gives you more control over execution from within your Python code.
• Having said that, complex shell pipelines are tedious and sometimes challenging to reimplement in Python.

The refactored code also illustrates just how much the shell really does for you with a very terse syntax -- for better or for worse. Python says explicit is better than implicit, but the Python code is rather verbose and arguably looks more complex than it really is. On the other hand, it offers a number of points where you can grab control in the middle of something else, as trivially exemplified by the enhancement that we can easily include the host name along with the shell command output. (This is by no means challenging to do in the shell, either, but at the expense of yet another diversion and perhaps another process.)

### Common Shell Constructs

For completeness, here are brief explanations of some of these shell features, and some notes on how they can perhaps be replaced with native Python facilities.

• Globbing aka wildcard expansion can be replaced with `glob.glob()` or very often with simple Python string comparisons like `for file in os.listdir("."): if not file.endswith(".png"): continue`. Bash has various other expansion facilities like `.{png,jpg}` brace expansion and `{1..100}` as well as tilde expansion (`~` expands to your home directory, and more generally `~account` to the home directory of another user)
• Shell variables like `$SHELL` or `$my_exported_var` can sometimes simply be replaced with Python variables. Exported shell variables are available as e.g. `os.environ["SHELL"]` (the meaning of `export` is to make the variable available to subprocesses -- a variable which is not available to subprocesses will obviously not be available to Python running as a subprocess of the shell, or vice versa. The `env=` keyword argument to `subprocess` methods allows you to define the environment of the subprocess as a dictionary, so that's one way to make a Python variable visible to a subprocess). With `shell=False` you will need to understand how to remove any quotes; for example, `cd "$HOME"` is equivalent to `os.chdir(os.environ["HOME"])` without quotes around the directory name. (Very often `cd` is not useful or necessary anyway, and many beginners omit the double quotes around the variable and get away with it until one day ...)
• Redirection allows you to read from a file as your standard input, and write your standard output to a file. `grep "foo" <inputfile >outputfile` opens `outputfile` for writing and `inputfile` for reading, and passes its contents as standard input to `grep`, whose standard output then lands in `outputfile`. This is not generally hard to replace with native Python code.
• Pipelines are a form of redirection. `echo foo | nl` runs two subprocesses, where the standard output of `echo` is the standard input of `nl` (on the OS level, in Unix-like systems, this is a single file handle). If you cannot replace one or both ends of the pipeline with native Python code, perhaps think about using a shell after all, especially if the pipeline has more than two or three processes (though look at the `pipes` module in the Python standard library, which was deprecated in Python 3.11 and removed in 3.13, or a number of more modern and versatile third-party competitors).
• Job control lets you interrupt jobs, run them in the background, return them to the foreground, etc. The basic Unix signals to stop and continue a process are of course available from Python, too. But jobs are a higher-level abstraction in the shell which involve process groups etc which you have to understand if you want to do something like this from Python.
• Quoting in the shell is potentially confusing until you understand that everything is basically a string. So `ls -l /` is equivalent to `"ls" "-l" "/"` but the quoting around literals is completely optional. Unquoted strings which contain shell metacharacters undergo parameter expansion, whitespace tokenization and wildcard expansion; double quotes prevent whitespace tokenization and wildcard expansion but allow parameter expansions (variable substitution, command substitution, and backslash processing). This is simple in theory but can get bewildering, especially when there are several layers of interpretation (a remote shell command, for example).
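A few of these replacements can be sketched in a handful of lines. The file names and the two small child programs below are invented for the example; the children are run with the current interpreter so that the sketch does not depend on `grep` or `nl` being installed.

```python
import glob
import os
import subprocess
import sys
import tempfile

# Hypothetical scratch directory with a few files to work on.
workdir = tempfile.mkdtemp()
for name in ("a.png", "b.png"):
    open(os.path.join(workdir, name), "w").close()
notes = os.path.join(workdir, "notes.txt")
with open(notes, "w") as handle:
    handle.write("foo\nbar\nfoo bar\n")

# Globbing: the equivalent of *.png
pngs = sorted(os.path.basename(path)
              for path in glob.glob(os.path.join(workdir, "*.png")))

# Redirection: the equivalent of  grep foo <notes.txt >matches.txt
matches = os.path.join(workdir, "matches.txt")
with open(notes) as infile, open(matches, "w") as outfile:
    for line in infile:
        if "foo" in line:
            outfile.write(line)

# Pipeline: producer | consumer, with Python carrying the data in between.
producer = subprocess.run(
    [sys.executable, "-c", "print('foo')"],
    stdout=subprocess.PIPE, text=True, check=True)
consumer = subprocess.run(
    [sys.executable, "-c",
     "import sys; print(sys.stdin.read().upper(), end='')"],
    input=producer.stdout, stdout=subprocess.PIPE, text=True, check=True)

print(pngs, consumer.stdout.strip())
```

Passing the producer's output through `input=` holds it all in memory, which is fine for small outputs; for large streams you would connect two `Popen` objects instead.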

### Understand differences between `sh` and Bash

`subprocess` runs your shell commands with `/bin/sh` unless you specifically request otherwise (except of course on Windows, where it uses the value of the `COMSPEC` variable). This means that various Bash-only features like arrays, `[[` etc are not available.

If you need to use Bash-only syntax, you can pass in the path to the shell as `executable="/bin/bash"` (where of course if your Bash is installed somewhere else, you need to adjust the path).

```python
import subprocess

subprocess.run("""
    # This for loop syntax is Bash only
    for((i=1;i<=$#;i++)); do
        # Arrays are Bash-only
        array[i]+=123
    done""",
    shell=True, check=True,
    executable="/bin/bash")
```

### A `subprocess` is separate from its parent, and cannot change it

A somewhat common mistake is doing something like

```python
subprocess.run("cd /tmp", shell=True)
subprocess.run("pwd", shell=True)  # Oops, doesn't print /tmp
```

The same thing will happen if the first subprocess tries to set an environment variable, which of course will have disappeared when you run another subprocess, etc.

A child process runs completely separate from Python, and when it finishes, Python has no idea what it did (apart from the vague indicators that it can infer from the exit status and output from the child process). A child generally cannot change the parent's environment; it cannot set a variable, change the working directory, or, in so many words, communicate with its parent without cooperation from the parent.

The immediate fix in this particular case is to run both commands in a single subprocess;

```python
subprocess.run("cd /tmp; pwd", shell=True)
```

though obviously this particular use case isn't very useful; instead, use the `cwd` keyword argument, or simply `os.chdir()` before running the subprocess. Similarly, for setting a variable, you can manipulate the environment of the current process (and thus also its children) via

```python
os.environ["foo"] = "bar"
```

or pass an environment setting to a child process with

```python
subprocess.run('echo "$foo"', shell=True, env={"foo": "bar"})
```

(not to mention the obvious refactoring `subprocess.run(["echo", "bar"])`; but `echo` is a poor example of something to run in a subprocess in the first place, of course).
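Here is a shell-free sketch of the same two fixes. The temporary directory stands in for `/tmp`, and the child is the current interpreter, both chosen only so the example is portable.

```python
import os
import subprocess
import sys
import tempfile

target = tempfile.mkdtemp()  # hypothetical directory standing in for /tmp

# cwd= changes the working directory for the child only; the parent stays put.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getcwd())"],
    cwd=target, stdout=subprocess.PIPE, text=True, check=True)
child_cwd = child.stdout.strip()

# env= replaces the child's whole environment, so extend a copy of ours
# instead of passing a dictionary with just the one new variable.
child_env = dict(os.environ, foo="bar")
echoed = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['foo'])"],
    env=child_env, stdout=subprocess.PIPE, text=True, check=True)

print(child_cwd, echoed.stdout.strip())
```

Copying `os.environ` matters in practice: a child started with only `{"foo": "bar"}` loses `PATH` and everything else it might need.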

### Don't run Python from Python

This is slightly dubious advice; there are certainly situations where it does make sense or is even an absolute requirement to run the Python interpreter as a subprocess from a Python script. But very frequently, the correct approach is simply to `import` the other Python module into your calling script and call its functions directly.

If the other Python script is under your control, and it isn't a module, consider turning it into one. (This answer is too long already so I will not delve into details here.)

If you need parallelism, you can run Python functions in subprocesses with the `multiprocessing` module. There is also `threading` which runs multiple tasks in a single process (which is more lightweight and gives you more control, but also more constrained in that threads within a process are tightly coupled, and bound to a single GIL.)
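As a rough sketch of the alternative, assume the other script's logic has been moved into a function (`word_count` below is a made-up example). It can then be called directly, or handed to a pool of workers from `concurrent.futures`.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Hypothetical function that would otherwise live in a separate script."""
    return len(text.split())


documents = ["one two three", "four five", "six"]

# Call the function directly instead of spawning `python other_script.py`.
sequential = [word_count(doc) for doc in documents]

# Threads share one process (and one GIL), so they suit I/O-bound work.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(word_count, documents))

print(sequential, parallel)
```

For CPU-bound work, `concurrent.futures.ProcessPoolExecutor` offers the same interface on top of `multiprocessing`; on platforms that spawn fresh interpreters, the pool creation then has to sit under an `if __name__ == "__main__":` guard.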

To summarize and complement the existing answers:

• `python.exe` is a console (terminal) application for launching CLI-type scripts (console applications).
  • Unless run from an existing console window, `python.exe` opens a new console window.
  • Standard streams `sys.stdin`, `sys.stdout` and `sys.stderr` are connected to the console window.
  • Execution is synchronous when launched from a `cmd.exe` or PowerShell console window (see eryksun's 1st comment below):
    • If a new console window was created, it stays open until the script terminates.
    • When invoked from an existing console window, the prompt is blocked until the script terminates.

• `pythonw.exe` is a GUI app for launching GUI/no-UI-at-all scripts.
  • NO console window is opened.
  • Execution is asynchronous:
    • When invoked from a console window, the script is merely launched and the prompt returns right away, whether the script is still running or not.
  • Standard streams `sys.stdin`, `sys.stdout` and `sys.stderr` are NOT available.
    • Caution: Unless you take extra steps, this has potentially unexpected side effects:
      • Unhandled exceptions cause the script to abort silently.
      • In Python 2.x, simply trying to use `print()` can cause that to happen (in 3.x, `print()` simply has no effect).
      • Ad-hoc, you can use output redirection (thanks, @handle):
        `pythonw.exe yourScript.pyw 1>stdout.txt 2>stderr.txt`
        (from PowerShell: `cmd /c pythonw.exe yourScript.pyw 1>stdout.txt 2>stderr.txt`) to capture stdout and stderr output in files.
        If you're confident that use of `print()` is the only reason your script fails silently with `pythonw.exe`, and you're not interested in stdout output, use @handle's command from the comments:
        `pythonw.exe yourScript.pyw 1>NUL 2>&1`
        Caveat: This output redirection technique does not work when invoking `*.pyw` scripts directly (as opposed to by passing the script file path to `pythonw.exe`). See eryksun's 2nd comment and its follow-ups below.
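One way to take those "extra steps" from inside the script is sketched below. It relies on the documented Python 3 behavior that `sys.stdout` and `sys.stderr` can be `None` for apps started with `pythonw.exe`; `ensure_std_streams` is a hypothetical helper name, and sending output to `os.devnull` is just one possible choice.

```python
import os
import sys


def ensure_std_streams(fallback_path=os.devnull):
    """Point missing stdout/stderr (None under pythonw.exe) at a file.

    Returns the names of the streams that had to be replaced.
    """
    replaced = []
    for name in ("stdout", "stderr"):
        if getattr(sys, name) is None:
            setattr(sys, name, open(fallback_path, "a", encoding="utf-8"))
            replaced.append(name)
    return replaced


# Call once at the top of a .pyw script, before anything tries to write.
ensure_std_streams()
print("safe to print, even without a console")
```

Passing a real log file path instead of `os.devnull` also preserves tracebacks from unhandled exceptions, which would otherwise vanish.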

You can control which of the executables runs your script by default - such as when opened from Explorer - by choosing the right filename extension:

• `*.py` files are by default associated (invoked) with `python.exe`
• `*.pyw` files are by default associated (invoked) with `pythonw.exe`

The simple answer is because `3*0.1 != 0.3` due to quantization (roundoff) error (whereas `4*0.1 == 0.4` because multiplying by a power of two is usually an "exact" operation). Python tries to find the shortest string that would round to the desired value, so it can display `4*0.1` as `0.4` as these are equal, but it cannot display `3*0.1` as `0.3` because these are not equal.

You can use the `.hex` method in Python to view the internal representation of a number (basically, the exact binary floating point value, rather than the base-10 approximation). This can help to explain what's going on under the hood.

```python
>>> (0.1).hex()
'0x1.999999999999ap-4'
>>> (0.3).hex()
'0x1.3333333333333p-2'
>>> (0.1*3).hex()
'0x1.3333333333334p-2'
>>> (0.4).hex()
'0x1.999999999999ap-2'
>>> (0.1*4).hex()
'0x1.999999999999ap-2'
```

0.1 is 0x1.999999999999a times 2^-4. The "a" at the end means the digit 10 - in other words, 0.1 in binary floating point is very slightly larger than the "exact" value of 0.1 (because the final 0x0.99 is rounded up to 0x0.a). When you multiply this by 4, a power of two, the exponent shifts up (from 2^-4 to 2^-2) but the number is otherwise unchanged, so `4*0.1 == 0.4`.

However, when you multiply by 3, the tiny little difference between 0x0.99 and 0x0.a0 (0x0.07) magnifies into a 0x0.15 error, which shows up as a one-digit error in the last position. This causes 0.1*3 to be very slightly larger than the rounded value of 0.3.

Python 3's float `repr` is designed to be round-trippable, that is, the value shown should be exactly convertible into the original value (`float(repr(f)) == f` for all floats `f`). Therefore, it cannot display `0.3` and `0.1*3` exactly the same way, or the two different numbers would end up the same after round-tripping. Consequently, Python 3's `repr` engine chooses to display one with a slight apparent error.
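These claims are easy to check yourself. The following sketch assumes ordinary IEEE 754 doubles (as in CPython on common platforms) and uses `fractions.Fraction`, which converts a float exactly, to measure the one-digit difference seen in the hex output.

```python
from fractions import Fraction

print(3 * 0.1 == 0.3)  # False
print(4 * 0.1 == 0.4)  # True

# Fraction(float) is exact, so the subtraction shows the true gap:
# one unit in the last place of a value with exponent 2**-2.
excess = Fraction(3 * 0.1) - Fraction(0.3)
print(excess == Fraction(1, 2 ** 54))  # True

# repr() picks the shortest string that still round-trips.
samples = [0.1, 0.3, 3 * 0.1, 4 * 0.1]
round_trips = all(float(repr(f)) == f for f in samples)
print(repr(3 * 0.1), repr(4 * 0.1), round_trips)
```

When you want to compare such values for practical equality, `math.isclose(3 * 0.1, 0.3)` is the usual tool.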

I recommend anytree (I am the author).

Example:

```python
from anytree import Node, RenderTree

udo = Node("Udo")
marc = Node("Marc", parent=udo)
lian = Node("Lian", parent=marc)
dan = Node("Dan", parent=udo)
jet = Node("Jet", parent=dan)
jan = Node("Jan", parent=dan)
joe = Node("Joe", parent=dan)

print(udo)
# Node('/Udo')
print(joe)
# Node('/Udo/Dan/Joe')

for pre, fill, node in RenderTree(udo):
    print("%s%s" % (pre, node.name))
# Udo
# ├── Marc
# │   └── Lian
# └── Dan
#     ├── Jet
#     ├── Jan
#     └── Joe

print(dan.children)
# (Node('/Udo/Dan/Jet'), Node('/Udo/Dan/Jan'), Node('/Udo/Dan/Joe'))
```

anytree also has a powerful API with:

• simple tree creation
• simple tree modification
• pre-order tree iteration
• post-order tree iteration
• resolve relative and absolute node paths
• walking from one node to another
• tree rendering (see example above)
• node attach/detach hookups

# Distribution Fitting with Sum of Square Error (SSE)

This is an update and modification to Saullo's answer, that uses the full list of the current `scipy.stats` distributions and returns the distribution with the least SSE between the distribution's histogram and the data's histogram.
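To show what the SSE criterion measures before bringing in `scipy`, here is a standard-library-only sketch for a single candidate, the normal distribution. It is not the method used in this answer's code, just the same idea in miniature: build a density histogram, fit the candidate, and sum the squared differences at the bin centers. The helper names and the synthetic data are assumptions made for the illustration.

```python
import random
from statistics import NormalDist


def histogram_density(data, bins):
    """Return bin centers and density-normalized bar heights."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in data:
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    heights = [count / (len(data) * width) for count in counts]
    return centers, heights


def sse_for_normal(data, bins=20):
    """SSE between the data histogram and a normal PDF fitted to the data."""
    fitted = NormalDist.from_samples(data)
    centers, heights = histogram_density(data, bins)
    return sum((h - fitted.pdf(c)) ** 2 for c, h in zip(centers, heights))


rng = random.Random(0)
normal_data = [rng.gauss(10.0, 2.0) for _ in range(5000)]   # synthetic
skewed_data = [rng.expovariate(0.5) for _ in range(5000)]   # synthetic

sse_normal = sse_for_normal(normal_data)
sse_skewed = sse_for_normal(skewed_data)
print(sse_normal < sse_skewed)
```

A normal curve fits the normally distributed sample far better than the skewed one, so its SSE is much smaller; ranking many candidate distributions by this number is what the code further down does.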

## Example Fitting

Using the El Niño dataset from `statsmodels`, the distributions are fit and error is determined. The distribution with the least error is returned.

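### All Distributions

[Figure: El Niño sea temperature histogram with all fitted distributions overlaid]

### Best Fit Distribution

[Figure: El Niño sea temperature histogram with the best fit distribution]

### Example Code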

```python
# IPython/Jupyter magic; drop this line when running as a plain script
%matplotlib inline

import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
from scipy.stats._continuous_distns import _distn_names
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rcParams["figure.figsize"] = (16.0, 12.0)
matplotlib.style.use("ggplot")

# Create models from data
def best_fit_distribution(data, bins=200, ax=None):
    """Model data by finding best fit distribution to data"""
    # Get histogram of original data
    y, x = np.histogram(data, bins=bins, density=True)
    # Convert bin edges to bin centers
    x = (x + np.roll(x, -1))[:-1] / 2.0

    # Best holders
    best_distributions = []

    # Estimate distribution parameters from data
    for ii, distribution in enumerate([d for d in _distn_names if d not in ["levy_stable", "studentized_range"]]):

        print("{:>3} / {:<3}: {}".format(ii + 1, len(_distn_names), distribution))

        distribution = getattr(st, distribution)

        # Try to fit the distribution
        try:
            # Ignore warnings from data that can't be fit
            with warnings.catch_warnings():
                warnings.filterwarnings("ignore")

                # fit dist to data
                params = distribution.fit(data)

                # Separate parts of parameters
                arg = params[:-2]
                loc = params[-2]
                scale = params[-1]

                # Calculate fitted PDF and error with fit in distribution
                pdf = distribution.pdf(x, *arg, loc=loc, scale=scale)
                sse = np.sum(np.power(y - pdf, 2.0))

                # if axis pass in add to plot
                try:
                    if ax:
                        pd.Series(pdf, x).plot(ax=ax)
                except Exception:
                    pass

                # record this fit: (distribution, params, sse)
                best_distributions.append((distribution, params, sse))

        except Exception:
            pass

    # Sort by SSE (third tuple element), smallest error first
    return sorted(best_distributions, key=lambda x: x[2])

def make_pdf(dist, params, size=10000):
    """Generate distribution's Probability Distribution Function"""

    # Separate parts of parameters
    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]

    # Get sane start and end points of distribution
    start = dist.ppf(0.01, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.01, loc=loc, scale=scale)
    end = dist.ppf(0.99, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.99, loc=loc, scale=scale)

    # Build PDF and turn into pandas Series
    x = np.linspace(start, end, size)
    y = dist.pdf(x, *arg, loc=loc, scale=scale)
    pdf = pd.Series(y, x)

    return pdf

# Load data from statsmodels datasets
# (reconstructed line: flattens the monthly El Niño table into one series)
data = pd.Series(sm.datasets.elnino.load_pandas().data.set_index("YEAR").values.ravel())

# Plot for comparison
plt.figure(figsize=(12, 8))
ax = data.plot(kind="hist", bins=50, density=True, alpha=0.5, color=list(matplotlib.rcParams["axes.prop_cycle"])[1]["color"])

# Save plot limits
dataYLim = ax.get_ylim()

# Find best fit distribution
best_distributions = best_fit_distribution(data, 200, ax)
best_dist = best_distributions[0]  # (distribution, params, sse)

# Update plots
ax.set_ylim(dataYLim)
ax.set_title("El Niño sea temp.\nAll Fitted Distributions")
ax.set_xlabel("Temp (°C)")
ax.set_ylabel("Frequency")

# Make PDF with best params
pdf = make_pdf(best_dist[0], best_dist[1])

# Display
plt.figure(figsize=(12, 8))
ax = pdf.plot(lw=2, label="PDF", legend=True)
data.plot(kind="hist", bins=50, density=True, alpha=0.5, label="Data", legend=True, ax=ax)

param_names = (best_dist[0].shapes + ", loc, scale").split(", ") if best_dist[0].shapes else ["loc", "scale"]
param_str = ", ".join(["{}={:0.2f}".format(k, v) for k, v in zip(param_names, best_dist[1])])
dist_str = "{}({})".format(best_dist[0].name, param_str)

ax.set_title("El Niño sea temp. with best fit distribution\n" + dist_str)
ax.set_xlabel("Temp. (°C)")
ax.set_ylabel("Frequency")
```