PHP Imagick getSizeOffset() function


Syntax:
int Imagick::getSizeOffset(void)

Parameters: This function takes no parameters.

Return value: This function returns an integer containing the size offset.

Exceptions: This function throws an ImagickException on error.

The following programs illustrate the Imagick::getSizeOffset() function in PHP:

Program 1:
<?php
// Create a new Imagick object
$imagick = new Imagick('https://media.engineerforengineer.org/wp-content/uploads/engineerforengineer-13.png');

// Get the size offset
$sizeOffset = $imagick->getSizeOffset();
echo $sizeOffset;
?>
Output:
0
Program 2:
<?php
// Create a new Imagick object
$imagick = new Imagick('https://media.engineerforengineer.org/wp-content/uploads/engineerforengineer-13.png');

// Set the size (columns, rows) and the offset
$imagick->setSizeOffset(100, 200, 25);

// Get the size offset
$sizeOffset = $imagick->getSizeOffset();
echo $sizeOffset;
?>
Output:
25
Link: https://www.php.net/manual/en/imagick.getsizeoffset.php



PHP GetSizeOffset () Imagick function: StackOverflow Questions

Answer #1

In Python, what is the purpose of __slots__ and what are the cases one should avoid this?

TLDR:

The special attribute __slots__ allows you to explicitly state which instance attributes you expect your object instances to have, with the expected results:

  1. faster attribute access.
  2. space savings in memory.

The space savings is from

  1. Storing value references in slots instead of __dict__.
  2. Denying __dict__ and __weakref__ creation if parent classes deny them and you declare __slots__.

Quick Caveats

Small caveat, you should only declare a particular slot one time in an inheritance tree. For example:

class Base:
    __slots__ = "foo", "bar"

class Right(Base):
    __slots__ = "baz", 

class Wrong(Base):
    __slots__ = "foo", "bar", "baz"        # redundant foo and bar

Python doesn't object when you get this wrong (it probably should), and problems might not otherwise manifest, but your objects will take up more space than they otherwise should. In Python 3.8:

>>> from sys import getsizeof
>>> getsizeof(Right()), getsizeof(Wrong())
(56, 72)

This is because Base's slot descriptor has a slot separate from Wrong's. This shouldn't usually come up, but it could:

>>> w = Wrong()
>>> w.foo = "foo"
>>> Base.foo.__get__(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: foo
>>> Wrong.foo.__get__(w)
"foo"

The biggest caveat is for multiple inheritance: multiple parent classes with nonempty slots cannot be combined.

To accommodate this restriction, follow best practices: factor out an abstraction for all but one (or all) of the parents, which both the parent's concrete class and your new concrete class will inherit from, and give the abstraction(s) empty slots (just like abstract base classes in the standard library).

See section on multiple inheritance below for an example.

Requirements:

  • To have attributes named in __slots__ to actually be stored in slots instead of a __dict__, a class must inherit from object (automatic in Python 3, but must be explicit in Python 2).

  • To prevent the creation of a __dict__, you must inherit from object and all classes in the inheritance must declare __slots__ and none of them can have a "__dict__" entry.
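
A minimal sketch of these two requirements, assuming Python 3 (the class names are hypothetical and exist only for illustration). It shows that one unslotted class anywhere in the inheritance chain brings the __dict__ back:

```python
class PlainParent:
    """No __slots__ here, so every instance gets a __dict__."""


class LeakyChild(PlainParent):
    __slots__ = ("x",)  # x lives in a slot, but __dict__ still exists


class SlottedParent:
    __slots__ = ()


class TightChild(SlottedParent):
    __slots__ = ("x",)  # the whole chain is slotted: no __dict__


leaky = LeakyChild()
leaky.x = 1  # stored in the slot
leaky.y = 2  # silently accepted, stored in leaky.__dict__

tight = TightChild()
tight.x = 1
try:
    tight.y = 2  # no slot named y and no __dict__ to fall back on
    rejected = False
except AttributeError:
    rejected = True
```

Here leaky.__dict__ ends up holding only y, while tight has no __dict__ at all and rejects the unexpected attribute.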

There are a lot of details if you wish to keep reading.

Why use __slots__: Faster attribute access.

The creator of Python, Guido van Rossum, states that he actually created __slots__ for faster attribute access.

It is trivial to demonstrate measurably significant faster access:

import timeit

class Foo(object): __slots__ = "foo",

class Bar(object): pass

slotted = Foo()
not_slotted = Bar()

def get_set_delete_fn(obj):
    def get_set_delete():
        obj.foo = "foo"
        obj.foo
        del obj.foo
    return get_set_delete

and

>>> min(timeit.repeat(get_set_delete_fn(slotted)))
0.2846834529991611
>>> min(timeit.repeat(get_set_delete_fn(not_slotted)))
0.3664822799983085

The slotted access is almost 30% faster in Python 3.5 on Ubuntu.

>>> 0.3664822799983085 / 0.2846834529991611
1.2873325658284342

In Python 2 on Windows I have measured it about 15% faster.

Why use __slots__: Memory Savings

Another purpose of __slots__ is to reduce the space in memory that each object instance takes up.

My own contribution to the documentation clearly states the reasons behind this:

The space saved over using __dict__ can be significant.

SQLAlchemy attributes a lot of memory savings to __slots__.

To verify this, using the Anaconda distribution of Python 2.7 on Ubuntu Linux, with guppy.hpy (aka heapy) and sys.getsizeof, the size of a class instance without __slots__ declared, and nothing else, is 64 bytes. That does not include the __dict__. Thank you Python for lazy evaluation again, the __dict__ is apparently not called into existence until it is referenced, but classes without data are usually useless. When called into existence, the __dict__ attribute is a minimum of 280 bytes additionally.

In contrast, a class instance with __slots__ declared to be () (no data) is only 16 bytes, and 56 total bytes with one item in slots, 64 with two.

For 64 bit Python, I illustrate the memory consumption in bytes in Python 2.7 and 3.6, for __slots__ and __dict__ (no slots defined) for each point where the dict grows in 3.6 (except for 0, 1, and 2 attributes):

       Python 2.7             Python 3.6
attrs  __slots__  __dict__*   __slots__  __dict__* | *(no slots defined)
none   16         56 + 272†   16         56 + 112† | †if __dict__ referenced
one    48         56 + 272    48         56 + 112
two    56         56 + 272    56         56 + 112
six    88         56 + 1040   88         56 + 152
11     128        56 + 1040   128        56 + 240
22     216        56 + 3344   216        56 + 408     
43     384        56 + 3344   384        56 + 752

So, in spite of smaller dicts in Python 3, we see how nicely __slots__ scale for instances to save us memory, and that is a major reason you would want to use __slots__.

Just for completeness of my notes, note that there is a one-time cost per slot in the class's namespace of 64 bytes in Python 2, and 72 bytes in Python 3, because slots use data descriptors like properties, called "members".

>>> Foo.foo
<member "foo" of "Foo" objects>
>>> type(Foo.foo)
<class "member_descriptor">
>>> getsizeof(Foo.foo)
72

Demonstration of __slots__:

To deny the creation of a __dict__, you must subclass object. Everything subclasses object in Python 3, but in Python 2 you had to be explicit:

class Base(object): 
    __slots__ = ()

now:

>>> b = Base()
>>> b.a = "a"
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    b.a = "a"
AttributeError: 'Base' object has no attribute 'a'

Or subclass another class that defines __slots__

class Child(Base):
    __slots__ = ("a",)

and now:

c = Child()
c.a = "a"

but:

>>> c.b = "b"
Traceback (most recent call last):
  File "<pyshell#42>", line 1, in <module>
    c.b = "b"
AttributeError: 'Child' object has no attribute 'b'

To allow __dict__ creation while subclassing slotted objects, just add "__dict__" to the __slots__ (note that slots are ordered, and you shouldn't repeat slots that are already in parent classes):

class SlottedWithDict(Child): 
    __slots__ = ("__dict__", "b")

swd = SlottedWithDict()
swd.a = "a"
swd.b = "b"
swd.c = "c"

and

>>> swd.__dict__
{"c": "c"}

Or you don't even need to declare __slots__ in your subclass, and you will still use slots from the parents, but not restrict the creation of a __dict__:

class NoSlots(Child): pass
ns = NoSlots()
ns.a = "a"
ns.b = "b"

And:

>>> ns.__dict__
{"b": "b"}

However, __slots__ may cause problems for multiple inheritance:

class BaseA(object): 
    __slots__ = ("a",)

class BaseB(object): 
    __slots__ = ("b",)

Because creating a child class from parents with both non-empty slots fails:

>>> class Child(BaseA, BaseB): __slots__ = ()
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    class Child(BaseA, BaseB): __slots__ = ()
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

If you run into this problem, you could just remove __slots__ from the parents, or if you have control of the parents, give them empty slots, or refactor to abstractions:

from abc import ABC

class AbstractA(ABC):
    __slots__ = ()

class BaseA(AbstractA): 
    __slots__ = ("a",)

class AbstractB(ABC):
    __slots__ = ()

class BaseB(AbstractB): 
    __slots__ = ("b",)

class Child(AbstractA, AbstractB): 
    __slots__ = ("a", "b")

c = Child() # no problem!

Add "__dict__" to __slots__ to get dynamic assignment:

class Foo(object):
    __slots__ = "bar", "baz", "__dict__"

and now:

>>> foo = Foo()
>>> foo.boink = "boink"

So with "__dict__" in slots we lose some of the size benefits with the upside of having dynamic assignment and still having slots for the names we do expect.

When you inherit from an object that isn't slotted, you get the same sort of semantics when you use __slots__: names that are in __slots__ point to slotted values, while any other values are put in the instance's __dict__.

Avoiding __slots__ because you want to be able to add attributes on the fly is actually not a good reason - just add "__dict__" to your __slots__ if this is required.

You can similarly add __weakref__ to __slots__ explicitly if you need that feature.
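
As a small illustration of the __weakref__ point (a sketch with hypothetical class names, using only the standard weakref module): a slotted class cannot be weakly referenced unless "__weakref__" is listed among its slots.

```python
import weakref


class NoWeakref:
    __slots__ = ("value",)


class WithWeakref:
    __slots__ = ("value", "__weakref__")


try:
    weakref.ref(NoWeakref())  # no __weakref__ slot, so this raises TypeError
    supports_weakref = True
except TypeError:
    supports_weakref = False

target = WithWeakref()
reference = weakref.ref(target)  # works because __weakref__ is a slot
```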

Set to empty tuple when subclassing a namedtuple:

The namedtuple factory (from the collections module) makes immutable instances that are very lightweight (essentially, the size of tuples), but to keep those benefits in a subclass, you need to declare empty __slots__ yourself:

from collections import namedtuple
class MyNT(namedtuple("MyNT", "bar baz")):
    """MyNT is an immutable and lightweight object"""
    __slots__ = ()

usage:

>>> nt = MyNT("bar", "baz")
>>> nt.bar
"bar"
>>> nt.baz
"baz"

And trying to assign an unexpected attribute raises an AttributeError because we have prevented the creation of __dict__:

>>> nt.quux = "quux"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyNT' object has no attribute 'quux'

You can allow __dict__ creation by leaving off __slots__ = (), but you can't use non-empty __slots__ with subtypes of tuple.
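
To make that last restriction concrete, here is a hedged sketch (Point, LabeledPoint and PlainPoint are hypothetical names): declaring a non-empty __slots__ on a tuple subtype fails when the class is created, while an empty one is accepted.

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")

try:
    class LabeledPoint(Point):
        __slots__ = ("label",)  # nonempty slots on a tuple subtype
    class_created = True
except TypeError:
    class_created = False


class PlainPoint(Point):
    __slots__ = ()  # empty slots are fine and keep instances dict-free
```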

Biggest Caveat: Multiple inheritance

Even when non-empty slots are the same for multiple parents, they cannot be used together:

class Foo(object): 
    __slots__ = "foo", "bar"
class Bar(object):
    __slots__ = "foo", "bar" # alas, would work if empty, i.e. ()

>>> class Baz(Foo, Bar): pass
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

Using an empty __slots__ in the parent seems to provide the most flexibility, allowing the child to choose to prevent or allow (by adding "__dict__" to get dynamic assignment, see section above) the creation of a __dict__:

class Foo(object): __slots__ = ()
class Bar(object): __slots__ = ()
class Baz(Foo, Bar): __slots__ = ("foo", "bar")
b = Baz()
b.foo, b.bar = "foo", "bar"

You don't have to have slots, so if you add them and remove them later, it shouldn't cause any problems.

Going out on a limb here: If you're composing mixins or using abstract base classes, which aren't intended to be instantiated, an empty __slots__ in those parents seems to be the best way to go in terms of flexibility for subclassers.

To demonstrate, first, let's create a class with code we'd like to use under multiple inheritance:

class AbstractBase:
    __slots__ = ()
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return f"{type(self).__name__}({repr(self.a)}, {repr(self.b)})"

We could use the above directly by inheriting and declaring the expected slots:

class Foo(AbstractBase):
    __slots__ = "a", "b"

But we don't care about that, that's trivial single inheritance; we need another class we might also inherit from, maybe with a noisy attribute:

class AbstractBaseC:
    __slots__ = ()
    @property
    def c(self):
        print("getting c!")
        return self._c
    @c.setter
    def c(self, arg):
        print("setting c!")
        self._c = arg

Now if both bases had nonempty slots, we couldn't do the below. (In fact, if we wanted, we could have given AbstractBase nonempty slots a and b, and left them out of the below declaration; leaving them in would be wrong):

class Concretion(AbstractBase, AbstractBaseC):
    __slots__ = "a b _c".split()

And now we have functionality from both via multiple inheritance, and can still deny __dict__ and __weakref__ instantiation:

>>> c = Concretion("a", "b")
>>> c.c = c
setting c!
>>> c.c
getting c!
Concretion("a", "b")
>>> c.d = "d"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Concretion' object has no attribute 'd'

Other cases to avoid slots:

  • Avoid them when you want to perform __class__ assignment with another class that doesn't have them (and you can't add them) unless the slot layouts are identical. (I am very interested in learning who is doing this and why.)
  • Avoid them if you want to subclass variable length builtins like long, tuple, or str, and you want to add attributes to them.
  • Avoid them if you insist on providing default values via class attributes for instance variables.
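
The last case can be sketched as follows (Config and ConfigWithInit are hypothetical names used only for this example): a class attribute that shares its name with a slot is rejected at class creation, so defaults have to come from somewhere else, such as __init__.

```python
try:
    class Config:
        __slots__ = ("retries",)
        retries = 3  # class attribute with the same name as a slot
    conflict = False
except ValueError:
    conflict = True


class ConfigWithInit:
    __slots__ = ("retries",)

    def __init__(self, retries=3):  # default supplied via __init__ instead
        self.retries = retries
```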

You may be able to tease out further caveats from the rest of the __slots__ documentation (the 3.7 dev docs are the most current), which I have made significant recent contributions to.

Critiques of other answers

The current top answers cite outdated information and are quite hand-wavy and miss the mark in some important ways.

Do not "only use __slots__ when instantiating lots of objects"

I quote:

"You would want to use __slots__ if you are going to instantiate a lot (hundreds, thousands) of objects of the same class."

Abstract Base Classes, for example, from the collections module, are not instantiated, yet __slots__ are declared for them.

Why?

If a user wishes to deny __dict__ or __weakref__ creation, those things must not be available in the parent classes.

__slots__ contributes to reusability when creating interfaces or mixins.

It is true that many Python users aren't writing for reusability, but when you are, having the option to deny unnecessary space usage is valuable.
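
A short sketch of why this matters, assuming Python 3 (Box is a hypothetical class): because collections.abc.Sized declares empty __slots__, a subclass can still opt out of __dict__ entirely.

```python
from collections.abc import Sized


class Box(Sized):
    __slots__ = ("_items",)

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


box = Box([1, 2, 3])
has_dict = hasattr(box, "__dict__")  # False: nothing in the chain adds one
```

If Sized did not declare __slots__, every Box instance would carry a __dict__ no matter what Box itself declared.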

__slots__ doesn't break pickling

When pickling a slotted object, you may find it complains with a misleading TypeError:

>>> pickle.loads(pickle.dumps(f))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

This is actually incorrect. This message comes from the oldest protocol, which is the default. You can select the latest protocol with the -1 argument. In Python 2.7 this would be 2 (which was introduced in 2.3), and in 3.6 it is 4.

>>> pickle.loads(pickle.dumps(f, -1))
<__main__.Foo object at 0x1129C770>

in Python 2.7:

>>> pickle.loads(pickle.dumps(f, 2))
<__main__.Foo object at 0x1129C770>

in Python 3.6

>>> pickle.loads(pickle.dumps(f, 4))
<__main__.Foo object at 0x1129C770>

So I would keep this in mind, as it is a solved problem.

Critique of the (until Oct 2, 2016) accepted answer

The first paragraph is half short explanation, half predictive. Here's the only part that actually answers the question:

The proper use of __slots__ is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation. This saves the overhead of one dict for every object that uses slots

The second half is wishful thinking, and off the mark:

While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.

Python actually does something similar to this, only creating the __dict__ when it is accessed, but creating lots of objects with no data is fairly ridiculous.

The second paragraph oversimplifies and misses actual reasons to avoid __slots__. The below is not a real reason to avoid slots (for actual reasons, see the rest of my answer above.):

They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies.

It then goes on to discuss other ways of accomplishing that perverse goal with Python, not discussing anything to do with __slots__.

The third paragraph is more wishful thinking. Together it is mostly off-the-mark content that the answerer didn't even author and contributes to ammunition for critics of the site.

Memory usage evidence

Create some normal objects and slotted objects:

>>> class Foo(object): pass
>>> class Bar(object): __slots__ = ()

Instantiate a million of them:

>>> foos = [Foo() for f in xrange(1000000)]
>>> bars = [Bar() for b in xrange(1000000)]

Inspect with guppy.hpy().heap():

>>> guppy.hpy().heap()
Partition of a set of 2028259 objects. Total size = 99763360 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 1000000  49 64000000  64  64000000  64 __main__.Foo
     1     169   0 16281480  16  80281480  80 list
     2 1000000  49 16000000  16  96281480  97 __main__.Bar
     3   12284   1   987472   1  97268952  97 str
...

Access the regular objects and their __dict__ and inspect again:

>>> for f in foos:
...     f.__dict__
>>> guppy.hpy().heap()
Partition of a set of 3028258 objects. Total size = 379763480 bytes.
 Index  Count   %      Size    % Cumulative  % Kind (class / dict of class)
     0 1000000  33 280000000  74 280000000  74 dict of __main__.Foo
     1 1000000  33  64000000  17 344000000  91 __main__.Foo
     2     169   0  16281480   4 360281480  95 list
     3 1000000  33  16000000   4 376281480  99 __main__.Bar
     4   12284   0    987472   0 377268952  99 str
...

This is consistent with the history of Python, from Unifying types and classes in Python 2.2

If you subclass a built-in type, extra space is automatically added to the instances to accommodate __dict__ and __weakrefs__. (The __dict__ is not initialized until you use it though, so you shouldn't worry about the space occupied by an empty dictionary for each instance you create.) If you don't need this extra space, you can add the phrase "__slots__ = []" to your class.

Answer #2

How do I determine the size of an object in Python?

The answer, "Just use sys.getsizeof", is not a complete answer.

That answer does work for builtin objects directly, but it does not account for what those objects may contain, specifically, what types such as custom objects, tuples, lists, dicts, and sets contain. They can contain instances of each other, as well as numbers, strings and other objects.

A More Complete Answer

Using 64-bit Python 3.6 from the Anaconda distribution, with sys.getsizeof, I have determined the minimum size of the following objects, and note that sets and dicts preallocate space so empty ones don't grow again until after a set amount (which may vary by implementation of the language):

Python 3:

Empty
Bytes  type        scaling notes
28     int         +4 bytes about every 30 powers of 2
37     bytes       +1 byte per additional byte
49     str         +1-4 per additional character (depending on max width)
48     tuple       +8 per additional item
64     list        +8 for each additional
224    set         5th increases to 736; 21st, 2272; 85th, 8416; 341st, 32992
240    dict        6th increases to 368; 22nd, 1184; 43rd, 2280; 86th, 4704; 171st, 9320
136    func def    does not include default args and other attrs
1056   class def   no slots 
56     class inst  has a __dict__ attr, same scaling as dict above
888    class def   with slots
16     __slots__   seems to store in mutable tuple-like structure
                   first slot grows to 48, and so on.

How do you interpret this? Say you have a set with 10 items in it. If each item is 100 bytes, how big is the whole data structure? The set itself is 736 bytes because it has sized up one time. Then you add the size of the items (10 × 100 = 1000 bytes), so that's 1736 bytes in total.
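
The same arithmetic can be done directly with sys.getsizeof. This is only a sketch: the item strings are made up for illustration, and the exact byte counts depend on the Python version and platform, so no specific totals are claimed here.

```python
import sys

items = {"item-%03d" % i for i in range(10)}

container_size = sys.getsizeof(items)                 # the set itself
contents_size = sum(sys.getsizeof(i) for i in items)  # the ten strings
total_size = container_size + contents_size
```

Note that this one-level sum only works because the items are strings, which contain no further objects.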

Some caveats for function and class definitions:

Note each class definition has a proxy __dict__ (48 bytes) structure for class attrs. Each slot has a descriptor (like a property) in the class definition.

Slotted instances start out with 48 bytes on their first element, and increase by 8 each additional. Only empty slotted objects have 16 bytes, and an instance with no data makes very little sense.

Also, each function definition has code objects, maybe docstrings, and other possible attributes, even a __dict__.

Also note that we use sys.getsizeof() because we care about the marginal space usage, which includes the garbage collection overhead for the object, from the docs:

getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

Also note that resizing lists (e.g. repetitively appending to them) causes them to preallocate space, similarly to sets and dicts. From the listobject.c source code:

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     * Note: new_allocated won't overflow because the largest possible value
     *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
     */
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
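
To check the quoted formula against the growth pattern in the comment, here is a small Python simulation. It is a sketch of the formula as quoted above only (the function names are hypothetical, and other CPython versions use a different over-allocation rule):

```python
def new_allocated(newsize):
    """Capacity chosen when a list must grow to hold newsize items."""
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)


def growth_pattern(steps):
    """Successive capacities when appending one item at a time."""
    allocated = 0
    pattern = [allocated]
    while len(pattern) < steps:
        # The list is full, so the next append asks for allocated + 1 slots.
        allocated = new_allocated(allocated + 1)
        pattern.append(allocated)
    return pattern


pattern = growth_pattern(10)
# pattern is [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]
```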

Historical data

Python 2.7 analysis, confirmed with guppy.hpy and sys.getsizeof:

Bytes  type        empty + scaling notes
24     int         NA
28     long        NA
37     str         + 1 byte per additional character
52     unicode     + 4 bytes per additional character
56     tuple       + 8 bytes per additional item
72     list        + 32 for first, 8 for each additional
232    set         sixth item increases to 744; 22nd, 2280; 86th, 8424
280    dict        sixth item increases to 1048; 22nd, 3352; 86th, 12568 *
120    func def    does not include default args and other attrs
64     class inst  has a __dict__ attr, same scaling as dict above
16     __slots__   class with slots has no dict, seems to store in 
                    mutable tuple-like structure.
904    class def   has a proxy __dict__ structure for class attrs
104    old class   makes sense, less stuff, has real dict though.

Note that dictionaries (but not sets) got a more compact representation in Python 3.6

I think 8 bytes per additional item to reference makes a lot of sense on a 64 bit machine. Those 8 bytes point to the place in memory the contained item is at. The 4 bytes are fixed width for unicode in Python 2, if I recall correctly, but in Python 3, str becomes a unicode of width equal to the max width of the characters.
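
The per-character width of Python 3 strings can be observed with a quick sketch like the following (assuming CPython 3.3 or later; the sample characters are arbitrary choices for illustration):

```python
import sys

ascii_text = "a" * 100
bmp_text = "\u20ac" * 100         # euro sign: needs 2 bytes per character
astral_text = "\U0001F600" * 100  # emoji: needs 4 bytes per character

# Same length in characters, but increasing storage per character.
sizes = [sys.getsizeof(s) for s in (ascii_text, bmp_text, astral_text)]
```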

And for more on slots, see this answer.

A More Complete Function

We want a function that searches the elements in lists, tuples, sets, dicts, obj.__dict__'s, and obj.__slots__, as well as other things we may not have yet thought of.

We want to rely on gc.get_referents to do this search because it works at the C level (making it very fast). The downside is that get_referents can return redundant members, so we need to ensure we don't double count.

Classes, modules, and functions are singletons: they exist one time in memory. We're not so interested in their size, as there's not much we can do about them; they're a part of the program. So we'll avoid counting them if they happen to be referenced.

We're going to use a blacklist of types so we don't include the entire program in our size count.

import sys
from types import ModuleType, FunctionType
from gc import get_referents

# Custom objects know their class.
# Function objects seem to know way too much, including modules.
# Exclude modules as well.
BLACKLIST = type, ModuleType, FunctionType


def getsize(obj):
    """sum size of object & members."""
    if isinstance(obj, BLACKLIST):
        raise TypeError("getsize() does not take argument of type: "+ str(type(obj)))
    seen_ids = set()
    size = 0
    objects = [obj]
    while objects:
        need_referents = []
        for obj in objects:
            if not isinstance(obj, BLACKLIST) and id(obj) not in seen_ids:
                seen_ids.add(id(obj))
                size += sys.getsizeof(obj)
                need_referents.append(obj)
        objects = get_referents(*need_referents)
    return size

To contrast this with the following whitelisted function: most objects know how to traverse themselves for the purposes of garbage collection, which is approximately what we're looking for when we want to know how expensive in memory certain objects are. (This functionality is used by gc.get_referents.) However, this measure is going to be much more expansive in scope than we intended if we are not careful.

For example, functions know quite a lot about the modules they are created in.

Another point of contrast is that strings that are keys in dictionaries are usually interned so they are not duplicated. Checking for id(key) will also allow us to avoid counting duplicates, which we do in the next section. The blacklist solution skips counting keys that are strings altogether.

Whitelisted Types, Recursive visitor

To cover most of these types myself, instead of relying on the gc module, I wrote this recursive function to try to estimate the size of most Python objects, including most builtins, types in the collections module, and custom types (slotted and otherwise).

This sort of function gives much more fine-grained control over the types we're going to count for memory usage, but has the danger of leaving important types out:

import sys
from numbers import Number
from collections import deque
from collections.abc import Set, Mapping


ZERO_DEPTH_BASES = (str, bytes, Number, range, bytearray)


def getsize(obj_0):
    """Recursively iterate to sum size of object & members."""
    _seen_ids = set()
    def inner(obj):
        obj_id = id(obj)
        if obj_id in _seen_ids:
            return 0
        _seen_ids.add(obj_id)
        size = sys.getsizeof(obj)
        if isinstance(obj, ZERO_DEPTH_BASES):
            pass # bypass remaining control flow and return
        elif isinstance(obj, (tuple, list, Set, deque)):
            size += sum(inner(i) for i in obj)
        elif isinstance(obj, Mapping) or hasattr(obj, "items"):
            size += sum(inner(k) + inner(v) for k, v in getattr(obj, "items")())
        # Check for custom object instances - may subclass above too
        if hasattr(obj, "__dict__"):
            size += inner(vars(obj))
        if hasattr(obj, "__slots__"): # can have __slots__ with __dict__
            size += sum(inner(getattr(obj, s)) for s in obj.__slots__ if hasattr(obj, s))
        return size
    return inner(obj_0)

And I tested it rather casually (I should unittest it):

>>> getsize(["a", tuple("bcd"), Foo()])
344
>>> getsize(Foo())
16
>>> getsize(tuple("bcd"))
194
>>> getsize(["a", tuple("bcd"), Foo(), {"foo": "bar", "baz": "bar"}])
752
>>> getsize({"foo": "bar", "baz": "bar"})
400
>>> getsize({})
280
>>> getsize({"foo":"bar"})
360
>>> getsize("foo")
40
>>> class Bar():
...     def baz():
...         pass
>>> getsize(Bar())
352
>>> getsize(Bar().__dict__)
280
>>> sys.getsizeof(Bar())
72
>>> getsize(Bar.__dict__)
872
>>> sys.getsizeof(Bar.__dict__)
280

This implementation breaks down on class definitions and function definitions because we don't go after all of their attributes, but since they should only exist once in memory for the process, their size really doesn't matter too much.

Answer #3

When you write [None] * 10, Python knows that it will need a list of exactly 10 objects, so it allocates exactly that.

When you use a list comprehension, Python doesn't know how much it will need. So it gradually grows the list as elements are added. For each reallocation it allocates more room than is immediately needed, so that it doesn't have to reallocate for each element. The resulting list is likely to be somewhat bigger than needed.

You can see this behavior when comparing lists created with similar sizes:

>>> sys.getsizeof([None]*15)
184
>>> sys.getsizeof([None]*16)
192
>>> sys.getsizeof([None for _ in range(15)])
192
>>> sys.getsizeof([None for _ in range(16)])
192
>>> sys.getsizeof([None for _ in range(17)])
264

You can see that the first method allocates just what is needed, while the second one grows periodically. In this example, it allocates enough for 16 elements, and had to reallocate when reaching the 17th.

Answer #4

I assume you're using 64-bit CPython (I got the same results on my CPython 2.7 64-bit). There could be differences in other Python implementations or if you have a 32-bit Python.

Regardless of the implementation, lists are variable-sized while tuples are fixed-size.

So tuples can store the elements directly inside the struct; lists, on the other hand, need a layer of indirection (the struct stores a pointer to the array of elements). That layer of indirection is a pointer, which on 64-bit systems is 64 bits, hence 8 bytes.

But there"s another thing that lists do: They over-allocate. Otherwise list.append would be an O(n) operation always - to make it amortized O(1) (much faster!!!) it over-allocates. But now it has to keep track of the allocated size and the filled size (tuples only need to store one size, because allocated and filled size are always identical). That means each list has to store another "size" which on 64bit systems is a 64bit integer, again 8 bytes.

So lists need at least 16 bytes more memory than tuples. Why did I say "at least"? Because of the over-allocation. Over-allocation means it allocates more space than needed. However, the amount of over-allocation depends on "how" you create the list and the append/deletion history:

>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4)  # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96

>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1)  # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2)  # no re-alloc
>>> l.append(3)  # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4)  # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72

Images

I decided to create some images to accompany the explanation above. Maybe these are helpful

This is how it (schematically) is stored in memory in your example. I highlighted the differences with red (free-hand) circles:

[Image not available: schematic memory layout of the list and the tuple]

That's actually just an approximation because int objects are also Python objects and CPython even reuses small integers, so a probably more accurate (although less readable) representation of the objects in memory would be:

[Image not available: memory layout including the shared int objects]


Note that __sizeof__ doesn't really return the "correct" size! It only returns the size of the stored values. However when you use sys.getsizeof the result is different:

>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72

There are 24 "extra" bytes. These are real; that's the garbage collector overhead that isn't accounted for in the __sizeof__ method. That's because you're generally not supposed to use magic methods directly. Use the functions that know how to handle them, in this case sys.getsizeof (which actually adds the GC overhead to the value returned from __sizeof__).
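
A quick way to see that overhead for yourself, as a sketch that assumes a standard CPython build (the exact number of extra bytes varies between versions, so none is asserted here): objects tracked by the garbage collector, like lists, show a difference, while untracked ones, like ints, do not.

```python
import sys

numbers = [1, 2, 3]
# Lists are tracked by the garbage collector, so getsizeof adds its header.
list_overhead = sys.getsizeof(numbers) - numbers.__sizeof__()

value = 12345
# Ints are not tracked by the garbage collector, so nothing is added.
int_overhead = sys.getsizeof(value) - value.__sizeof__()
```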

Answer #5

Overview

The question has been addressed. However, this answer adds some practical examples to aid in the basic understanding of dataclasses.

What exactly are python data classes and when is it best to use them?

  1. code generators: generate boilerplate code; you can choose to implement special methods in a regular class or have a dataclass implement them automatically.
  2. data containers: structures that hold data (e.g. tuples and dicts), often with dotted, attribute access such as classes, namedtuple and others.

"mutable namedtuples with default[s]"

Here is what the latter phrase means:

  • mutable: by default, dataclass attributes can be reassigned. You can optionally make them immutable (see Examples below).
  • namedtuple: you have dotted, attribute access like a namedtuple or a regular class.
  • default: you can assign default values to attributes.

Compared to common classes, you primarily save on typing boilerplate code.
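
As a rough illustration of the boilerplate being saved, the sketch below puts a hand-written class (ManualColor, a name made up for this example) next to the dataclass that gets comparable __init__, __repr__ and __eq__ methods generated for it:

```python
import dataclasses

class ManualColor:
    """Hand-written equivalent of the dataclass below."""
    def __init__(self, r=0, g=0, b=0):
        self.r, self.g, self.b = r, g, b

    def __repr__(self):
        return f"ManualColor(r={self.r}, g={self.g}, b={self.b})"

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.r, self.g, self.b) == (other.r, other.g, other.b)

@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

print(ManualColor(1, 2, 3))   # ManualColor(r=1, g=2, b=3)
print(Color(1, 2, 3))         # Color(r=1, g=2, b=3)
```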


Features

This is an overview of dataclass features (TL;DR? See the Summary Table in the next section).

What you get

Here are features you get by default from dataclasses.

Attributes + Representation + Comparison

import dataclasses


@dataclasses.dataclass
#@dataclasses.dataclass()                                       # alternative
class Color:
    r : int = 0
    g : int = 0
    b : int = 0

These defaults are provided by automatically setting the following keywords to True:

@dataclasses.dataclass(init=True, repr=True, eq=True)

What you can turn on

Additional features are available if the appropriate keywords are set to True.

Order

@dataclasses.dataclass(order=True)
class Color:
    r : int = 0
    g : int = 0
    b : int = 0

The ordering methods are now implemented (overloading operators: < > <= >=), similarly to functools.total_ordering with stronger equality tests.
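
A short usage sketch of what order=True enables; instances are compared as if they were tuples of their fields, in declaration order:

```python
import dataclasses

@dataclasses.dataclass(order=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

colors = [Color(0, 50, 0), Color(), Color(0, 0, 255)]
ordered = sorted(colors)      # compares (r, g, b) field by field
print(ordered)
```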

Hashable, Mutable

@dataclasses.dataclass(unsafe_hash=True)                        # override base `__hash__`
class Color:
    ...

Although the object is potentially mutable (possibly undesired), a hash is implemented.

Hashable, Immutable

@dataclasses.dataclass(frozen=True)                             # `eq=True` (default) to be immutable 
class Color:
    ...

A hash is now implemented and changing the object or assigning to attributes is disallowed.

Overall, the object is hashable if either unsafe_hash=True or frozen=True.

See also the original hashing logic table with more details.
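
The following sketch shows both effects of frozen=True together: instances can go into a set, and attribute assignment is rejected with dataclasses.FrozenInstanceError (a subclass of AttributeError):

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

# Equal instances hash the same, so they collapse into one set element.
unique = {Color(), Color(0, 0, 0), Color(255, 0, 0)}

frozen_error = None
try:
    Color().r = 10
except dataclasses.FrozenInstanceError as exc:
    frozen_error = exc        # assignment is rejected

print(len(unique), type(frozen_error).__name__)
```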

What you don't get

To get the following features, special methods must be manually implemented:

Unpacking

@dataclasses.dataclass
class Color:
    r : int = 0
    g : int = 0
    b : int = 0

    def __iter__(self):
        yield from dataclasses.astuple(self)
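
With __iter__ defined as above, instances can be unpacked or passed to anything that accepts an iterable. A small self-contained usage sketch:

```python
import dataclasses

@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

    def __iter__(self):
        yield from dataclasses.astuple(self)

r, g, b = Color(128, 0, 255)          # tuple-style unpacking
as_list = list(Color(1, 2, 3))        # any iterable consumer works
print(r, g, b, as_list)
```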

Optimization

@dataclasses.dataclass
class SlottedColor:
    __slots__ = ["r", "b", "g"]
    r : int
    g : int
    b : int

The object size is now reduced:

>>> import sys
>>> sys.getsizeof(Color)
1056
>>> sys.getsizeof(SlottedColor)
888

In some circumstances, __slots__ also improves the speed of creating instances and accessing attributes. Also, slots do not allow default assignments; otherwise, a ValueError is raised.
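
Both points can be checked on instances. In this sketch (Broken is a throwaway class name used only to trigger the error), the slotted instance has no per-instance __dict__, and combining __slots__ with a default value fails when the class is created:

```python
import dataclasses

@dataclasses.dataclass
class SlottedColor:
    __slots__ = ("r", "g", "b")
    r: int
    g: int
    b: int

color = SlottedColor(1, 2, 3)
has_dict = hasattr(color, "__dict__")     # slots replace the instance dict

error = None
try:
    @dataclasses.dataclass
    class Broken:
        __slots__ = ("r",)
        r: int = 0                        # default collides with the slot name
except ValueError as exc:
    error = str(exc)

print(has_dict, error)
```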

See more on slots in this blog post.


Summary Table

+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
|       Feature        |       Keyword        |                      Example                       |           Implement in a Class          |
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
| Attributes           |  init                |  Color().r -> 0                                    |  __init__                               |
| Representation       |  repr                |  Color() -> Color(r=0, g=0, b=0)                   |  __repr__                               |
| Comparison*          |  eq                  |  Color() == Color(0, 0, 0) -> True                 |  __eq__                                 |
|                      |                      |                                                    |                                         |
| Order                |  order               |  sorted([Color(0, 50, 0), Color()]) -> ...         |  __lt__, __le__, __gt__, __ge__         |
| Hashable             |  unsafe_hash/frozen  |  {Color(), Color()} -> {Color(r=0, g=0, b=0)}      |  __hash__                               |
| Immutable            |  frozen + eq         |  Color().r = 10 -> FrozenInstanceError             |  __setattr__, __delattr__               |
|                      |                      |                                                    |                                         |
| Unpacking+           |  -                   |  r, g, b = Color()                                 |   __iter__                              |
| Optimization+        |  -                   |  sys.getsizeof(SlottedColor) -> 888                |  __slots__                              |
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+

+These methods are not automatically generated and require manual implementation in a dataclass.

* __ne__ is not needed and thus not implemented.


Additional features

Post-initialization

@dataclasses.dataclass
class RGBA:
    r : int = 0
    g : int = 0
    b : int = 0
    a : float = 1.0

    def __post_init__(self):
        self.a : int =  int(self.a * 255)


RGBA(127, 0, 255, 0.5)
# RGBA(r=127, g=0, b=255, a=127)

Inheritance

@dataclasses.dataclass
class RGBA(Color):
    a : int = 0
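
A self-contained sketch of the inheritance example; the subclass collects the base class fields first and then appends its own:

```python
import dataclasses

@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

@dataclasses.dataclass
class RGBA(Color):
    a: int = 0

field_names = [f.name for f in dataclasses.fields(RGBA)]
print(field_names)            # ['r', 'g', 'b', 'a']
print(RGBA(1, 2, 3, 4))       # RGBA(r=1, g=2, b=3, a=4)
```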

Conversions

Convert a dataclass to a tuple or a dict, recursively:

>>> dataclasses.astuple(Color(128, 0, 255))
(128, 0, 255)
>>> dataclasses.asdict(Color(128, 0, 255))
{'r': 128, 'g': 0, 'b': 255}


References

  • R. Hettinger's talk on Dataclasses: The code generator to end all code generators
  • T. Hunner's talk on Easier Classes: Python Classes Without All the Cruft
  • Python's documentation on hashing details
  • Real Python's guide on The Ultimate Guide to Data Classes in Python 3.7
  • A. Shaw's blog post on A brief tour of Python 3.7 data classes
  • E. Smith's github repository on dataclasses

Answer #6

Reducing memory usage in Python is difficult, because Python does not actually release memory back to the operating system. If you delete objects, then the memory is available to new Python objects, but not free()'d back to the system (see this question).

If you stick to numeric numpy arrays, those are freed, but boxed objects are not.

>>> import os, psutil, numpy as np # psutil may need to be installed
>>> def usage():
...     process = psutil.Process(os.getpid())
...     return process.memory_info()[0] / float(2 ** 20)
... 
>>> usage() # initial memory usage
27.5 

>>> arr = np.arange(10 ** 8) # create a large array without boxing
>>> usage()
790.46875
>>> del arr
>>> usage()
27.52734375 # numpy just free()'d the array

>>> arr = np.arange(10 ** 8, dtype="O") # create lots of objects
>>> usage()
3135.109375
>>> del arr
>>> usage()
2372.16796875  # numpy frees the array, but python keeps the heap big

Reducing the Number of Dataframes

Python keeps our memory at the high watermark, but we can reduce the total number of dataframes we create. When modifying your dataframe, prefer inplace=True, so you don't create copies.

Another common gotcha is holding on to copies of previously created dataframes in ipython:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"foo": [1,2,3,4]})

In [3]: df + 1
Out[3]: 
   foo
0    2
1    3
2    4
3    5

In [4]: df + 2
Out[4]: 
   foo
0    3
1    4
2    5
3    6

In [5]: Out # Still has all our temporary DataFrame objects!
Out[5]: 
{3:    foo
 0    2
 1    3
 2    4
 3    5, 4:    foo
 0    3
 1    4
 2    5
 3    6}

You can fix this by typing %reset Out to clear your history. Alternatively, you can adjust how much history ipython keeps with ipython --cache-size=5 (default is 1000).

Reducing Dataframe Size

Wherever possible, avoid using object dtypes.

>>> df.dtypes
foo    float64 # 8 bytes per value
bar      int64 # 8 bytes per value
baz     object # at least 48 bytes per value, often more

Values with an object dtype are boxed, which means the numpy array just contains a pointer and you have a full Python object on the heap for every value in your dataframe. This includes strings.
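
The cost of boxing can be illustrated without numpy or pandas. As a standard-library analogy only (array.array stands in for a numeric numpy column and a plain list for an object column), the sketch compares the bytes needed per value:

```python
import struct
import sys
from array import array

values = [float(i) for i in range(1000)]

unboxed = array("d", values)              # raw doubles stored inline
pointer_size = struct.calcsize("P")       # size of one object pointer

# A boxed value costs a pointer in the container plus a full float object.
boxed_per_value = pointer_size + sys.getsizeof(values[0])

print(unboxed.itemsize, boxed_per_value)
```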

Whilst numpy supports fixed-size strings in arrays, pandas does not (it's caused user confusion). This can make a significant difference:

>>> import numpy as np
>>> arr = np.array([b"foo", b"bar", b"baz"])  # byte strings; str input gives dtype('<U3') on Python 3
>>> arr.dtype
dtype('S3')
>>> arr.nbytes
9

>>> import sys; import pandas as pd
>>> s = pd.Series(["foo", "bar", "baz"])
>>> s.dtype
dtype('O')
>>> sum(sys.getsizeof(x) for x in s)  # Python 2 figure; the exact total varies by Python version
120

You may want to avoid using string columns, or find a way of representing string data as numbers.
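
One way to represent string data as numbers is to store each distinct string once and keep only small integer codes per row. pandas offers this kind of encoding through its category dtype; the sketch below shows the idea in plain Python, with encode_labels as a hypothetical helper:

```python
def encode_labels(values):
    """Map each distinct string to a small integer code."""
    categories = sorted(set(values))
    code_of = {label: i for i, label in enumerate(categories)}
    return [code_of[v] for v in values], categories

codes, categories = encode_labels(["foo", "bar", "foo", "baz", "bar"])
print(codes)        # [2, 0, 2, 1, 0]
print(categories)   # ['bar', 'baz', 'foo']

# The original strings can be recovered from the codes.
decoded = [categories[c] for c in codes]
```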

If you have a dataframe that contains many repeated values (NaN is very common), then you can use a sparse data structure to reduce memory usage:

>>> df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 39681584 entries, 0 to 39681583
Data columns (total 1 columns):
foo    float64
dtypes: float64(1)
memory usage: 605.5 MB

>>> df1.shape
(39681584, 1)

>>> df1.foo.isnull().sum() * 100. / len(df1)
20.628483479893344 # so 20% of values are NaN

>>> df1.to_sparse().info()
<class 'pandas.sparse.frame.SparseDataFrame'>
Int64Index: 39681584 entries, 0 to 39681583
Data columns (total 1 columns):
foo    float64
dtypes: float64(1)
memory usage: 543.0 MB

Viewing Memory Usage

You can view the memory usage (docs):

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 39681584 entries, 0 to 39681583
Data columns (total 14 columns):
...
dtypes: datetime64[ns](1), float64(8), int64(1), object(4)
memory usage: 4.4+ GB

As of pandas 0.17.1, you can also do df.info(memory_usage="deep") to see memory usage including objects.

Answer #7

The Pympler package's asizeof module can do this.

Use as follows:

from pympler import asizeof
asizeof.asizeof(my_object)

Unlike sys.getsizeof, it works for your self-created objects. It even works with numpy.

>>> asizeof.asizeof(tuple("bcd"))
200
>>> asizeof.asizeof({"foo": "bar", "baz": "bar"})
400
>>> asizeof.asizeof({})
280
>>> asizeof.asizeof({"foo":"bar"})
360
>>> asizeof.asizeof("foo")
40
>>> asizeof.asizeof(Bar())
352
>>> asizeof.asizeof(Bar().__dict__)
280
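>>> from numpy.random import rand  # assumed import; the original snippet does not show where rand comes from
>>> A = rand(10)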
>>> B = rand(10000)
>>> asizeof.asizeof(A)
176
>>> asizeof.asizeof(B)
80096
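
To make the difference from sys.getsizeof concrete, here is a rough sketch of a recursive size calculation. deep_getsizeof is a hypothetical helper written for this illustration; it is not how Pympler works internally and it only follows a few built-in container types:

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Rough recursive size in bytes, counting shared objects once."""
    seen = set() if seen is None else seen
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

data = {"foo": "bar", "baz": ["bar", "qux"]}
print(sys.getsizeof(data), deep_getsizeof(data))
```

sys.getsizeof reports only the container itself, while the recursive version also adds the keys, values and nested items.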

As mentioned,

The (byte)code size of objects like classes, functions, methods, modules, etc. can be included by setting option code=True.

And if you need another view on live data, Pympler's

module muppy is used for on-line monitoring of a Python application and module Class Tracker provides off-line analysis of the lifetime of selected Python objects.

Answer #8

Here's a comparison of the different methods - sys.getsizeof(df) is simplest.

For this example, df is a dataframe with 814 rows, 11 columns (2 ints, 9 objects) - read from a 427kb shapefile

sys.getsizeof(df)

>>> import sys
>>> sys.getsizeof(df)
(gives results in bytes)
462456

df.memory_usage()

>>> df.memory_usage()
...
(lists each column at 8 bytes/row)

>>> df.memory_usage().sum()
71712
(roughly rows * cols * 8 bytes)

>>> df.memory_usage(deep=True)
(lists each column's full memory usage)

>>> df.memory_usage(deep=True).sum()
(gives results in bytes)
462432

df.info()

Prints dataframe info to stdout. Technically these are kibibytes (KiB), not kilobytes - as the docstring says, "Memory usage is shown in human-readable units (base-2 representation)." So to get bytes, multiply by 1024, e.g. 451.6 KiB = 462,438 bytes.

>>> df.info()
...
memory usage: 70.0+ KB

>>> df.info(memory_usage="deep")
...
memory usage: 451.6 KB
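
The unit conversion between the byte counts and the figures printed by df.info() is plain arithmetic, using the numbers from this example:

```python
KIB = 1024                                # bytes per kibibyte

shallow_bytes = 71712                     # df.memory_usage().sum()
deep_kib = 451.6                          # df.info(memory_usage="deep")

shallow_kib = round(shallow_bytes / KIB, 1)
deep_bytes = round(deep_kib * KIB)

print(shallow_kib)                        # 70.0
print(deep_bytes)                         # 462438
```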

Answer #9

[*a] is internally doing the C equivalent of:

  1. Make a new, empty list
  2. Call newlist.extend(a)
  3. Return the list.

So if you expand your test to:

from sys import getsizeof

for n in range(13):
    a = [None] * n
    l = []
    l.extend(a)
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]),
             getsizeof(l))

you'll see the results for getsizeof([*a]) and l = []; l.extend(a); getsizeof(l) are the same.

This is usually the right thing to do; when extending you're usually expecting to add more later, and similarly for generalized unpacking, it's assumed that multiple things will be added one after the other. [*a] is not the normal case; Python assumes there are multiple items or iterables being added to the list ([*a, b, c, *d]), so overallocation saves work in the common case.

By contrast, a list constructed from a single, presized iterable (with list()) may not grow or shrink during use, and overallocating is premature until proven otherwise; Python recently fixed a bug that made the constructor overallocate even for inputs with known size.

As for list comprehensions, they're effectively equivalent to repeated appends, so you're seeing the final result of the normal overallocation growth pattern when adding an element at a time.
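
The equivalence with repeated appends can be checked in the same style as the test above. This sketch builds each list both ways and collects the two sizes side by side; like everything else here, it describes CPython behaviour rather than a language guarantee:

```python
from sys import getsizeof

size_pairs = []
for n in range(13):
    a = [None] * n
    appended = []
    for x in a:
        appended.append(x)                # one element at a time
    comprehension = [x for x in a]
    size_pairs.append((getsizeof(comprehension), getsizeof(appended)))

for n, (comp_size, append_size) in enumerate(size_pairs):
    print(n, comp_size, append_size)
```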

To be clear, none of this is a language guarantee. It's just how CPython implements it. The Python language spec is generally unconcerned with specific growth patterns in list (aside from guaranteeing amortized O(1) appends and pops from the end). As noted in the comments, the specific implementation changes again in 3.9; while it won't affect [*a], it could affect other cases where what used to be "build a temporary tuple of individual items and then extend with the tuple" now becomes multiple applications of LIST_APPEND, which can change when the overallocation occurs and what numbers go into the calculation.

Answer #10

Using os.path.getsize:

>>> import os
>>> b = os.path.getsize("/path/isa_005.mp3")
>>> b
2071611

The output is in bytes.
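
A self-contained sketch that can be run without an existing file: it writes a temporary file of known length and reads the size back, both with os.path.getsize and with os.stat, which reports the same number in st_size:

```python
import os
import tempfile

payload = b"x" * 2048                     # 2048 bytes of sample data

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name                       # file is flushed when the block exits

size_via_getsize = os.path.getsize(path)
size_via_stat = os.stat(path).st_size     # same value from the stat result
os.remove(path)                           # clean up the temporary file

print(size_via_getsize)                   # 2048
```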