Change language

Python lists remember what you did to them

Python lists remember what you did to them

as any python programmer hopefully knows a list uses memory sys.getsizeof will tell us how many bytes python thinks this object takes this list takes 120 bytes generally speaking a bigger list will take more memory than a smaller one but did you know that two equal lists can actually take up different amounts of memory both of these lists contain just three zeros and they compare equal in fact the sizes of the lists are not at all what determines how much memory they take here i have an eight element list taking up the same 120 bytes that the three element list takes up and the seven element list takes up even less space than the three element list and it just keeps going all three of these lists contain three zeros theyre all equal to each other and they all take different amounts of memory so what on earth is going on here how could this list take 50 percent more memory than this list when they contain identical elements and compare equal okay lets investigate maybe theres some kind of caching going on where the first list to be created somehow just takes more memory nope thats not it if we reorder them their sizes remain the same does anything change if i use variables instead of literals nope same answers was there something special about a length of 3 well we get different sizes because this is a bigger list but theyre still all different answers what if i extract the common element out into a variable interestingly this actually does change the answer for the first list using literal zeros gave 120 but using a variable e gave 80. what about the size of the object this number is much larger and will take more memory than zero but we dont see any increase in the size of the list what if i use a really big object clearly the size of this element e does not play a role in whatever getsizeof is returning what if after the lists are created we change their contents nope still the same numbers as in the original example what if we add more elements now were getting somewhere two of the lists became the same size and if we add a bunch of elements then they all become equal and if we shrink them all back down to three elements then they remain all equal in size but if we start them at 1000 and see 8056 and then shrink them just a little bit we still see 8056.

its almost like the lists remember whats happened to them before but then if you change them enough then they forget okay enough anthropomorphizing lists whats actually going on here in order to understand whats going on we really have to understand what get size of is actually returning to us what getsize of returns is supposed to be the size of the object in bytes as it would be represented in memory on your computer and as we already saw getsizeof only counts the size of the object itself not any sub-objects so to understand what getsizeof is actually returning to us we need to understand what list actually looks like under the hood lets build up what the internal structure of a list looks like first we have the object reference count and a pointer to the objects type under the hood every object in python starts with these two fields a list is also a variable sized object so it contains an object size field this contains the length of the list then we have two list specific fields this ob item points to the contiguous memory where all the actual data for the list is stored the final field is allocated its an integer telling us how many elements could fit into the memory that ob item points to so notice ob size tells us how many elements are in the memory and allocated tells us how many elements could fit in the memory and this distinction is the key to understanding what get size of returns the value that get size of returns is based off of this allocated number not the length of the list if you think about it whether you put python objects in this memory or not once its allocated youre using it so that something that gets size of should be counting and the last piece of the puzzle is that for garbage collection purposes python stores two pointers just before the object starts under the hood this is what a python lists internal structure looks like heres how we actually calculate the value the getsizeof returns first is the static part of the list each of these fields takes up probably eight bytes on your machine if we ask for the size of an empty list we see that its 56 that corresponds to seven fields times eight bytes apiece is 56 bytes for the variable part of the list python stores a pointer to each of the objects in the list so the variable size is just the size of a pointer which is another eight bytes times the number of slots for those pointers that have been allocated and again in this calculation its using allocated which is the number of elements that could fit in the memory not ob size which contains the actual number of elements that are stored in the list to better understand whats going on lets solve for the allocated variable and then examine it in most programming languages this would be referred to as the capacity of the list its how many total elements could fit into this list before i need to get more memory lets start with the empty list the capacity of the empty list is zero meaning when you make an empty list python has not allocated any extra space for the list if you want to start adding elements in it will first need to allocate memory now lets try repeatedly appending elements to x and seeing how the capacity changes after we first add an element in the capacity jumps up to 4 then it remains at 4 until we hit that capacity then the capacity jumps to 8 and remains until we hit 8 then 16 and remains until we hit 16 so it looks like its always a multiple of four and so far its been doubling but once we get to 16 then it jumps up to 24. so it stops doubling every time then from 24 jumps to 32 and from 32 it jumps to 40 and 40 to 52 then to 64 76 92 and 108. basically whats happening here is whenever we hit a capacity limit python must allocate more memory and then it has to copy all of the old data into the new memory copying can be expensive and more expensive the longer the list is thats why theres a difference between the capacity and the length at all if youre running a business and youre renting out space for your employees to work in say your current building could fit 20 employees well your business is expanding and youre looking to hire your 21st employee are you going to go out and find a building that can fit 21 people no youre expecting to continue growing and it would be a complete waste to get a building with 21 people and then have to get a whole new building for your 22nd employee instead youll probably rent a room for maybe 30 employees or 40 employees you dont want to go crazy and rent a room for a thousand employees but a fixed percentage bigger than your current capacity is something that makes sense were doing the exact same thing here with the memory in the list this kind of proportional or geometric resizing is something that you see in almost every programming language its used in pythons list c buses vector and javas arraylist it turns out that python resizing uses a factor of approximately 9 8. we can check this by repeatedly appending to a list and seeing when the capacity changes and then looking at the ratios the ratio starts out larger but very quickly approaches 1.125 or 9 8.

does anyone remember what the original question was at this point yeah that was quite the deep dive but lets get back to it why does getsizeof return different sizes for all of these lists well now that we know that getsizeof is just a shifted scaled version of the capacity lets take a look at the capacities instead when a list is constructed what determines the capacity that python chooses well every different way of constructing a list is a different function and it could choose whatever capacity it wants in this case it knows theres going to be exactly three elements so it chooses a capacity of three okay that makes sense if you think about how a list comprehension might work its akin to just repeatedly appending elements and indeed four is the capacity that you would get if you started with the empty list and then repeatedly appended elements when you add the first element the capacity jumps up to four and then it stays there because we only added three elements so the only really weird one is this why did this get a capacity of eight and remember its even weirder because if we used a variable e instead of the literal zero then we get a capacity of three again makes sense because we just have a literal with 3 elements so the capacity of 8 in this case is just a quirk of using literal zeros again its a different function called to construct the list so they can just choose whatever capacity they want if we subsequently add a thousand elements into each of them then they all end up with the same capacity of a thousand if we then drop them down to 500 elements they still have that same capacity the length of the list changed but its capacity didnt but if we drop to 499 one less than half the size then python adjusts and actually shrinks its capacity i guess they decided no need to waste as much space if youre using less than half of what you were so reality check its really good to understand the internals of a language if youre really trying to master it but in practice is knowing these capacity things going to help you in your daily programming life probably not especially since you cant change the capacity algorithms at best knowing the internals of how capacities work in python will save you a factor of two times memory but lets be honest in python youre rarely ever concerned with how much memory youre using so good to know good to understand the internals but not necessarily something that you need to think about as always thank you to my patrons and donors for supporting me if you enjoyed the video please consider becoming a patron on patreon and of course dont forget to like comment and subscribe see you next time

Shop

Learn programming in R: courses

$

Best Python online courses for 2022

$

Best laptop for Fortnite

$

Best laptop for Excel

$

Best laptop for Solidworks

$

Best laptop for Roblox

$

Best computer for crypto mining

$

Best laptop for Sims 4

$

Latest questions

NUMPYNUMPY

Common xlabel/ylabel for matplotlib subplots

12 answers

NUMPYNUMPY

How to specify multiple return types using type-hints

12 answers

NUMPYNUMPY

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

12 answers

NUMPYNUMPY

Flake8: Ignore specific warning for entire file

12 answers

NUMPYNUMPY

glob exclude pattern

12 answers

NUMPYNUMPY

How to avoid HTTP error 429 (Too Many Requests) python

12 answers

NUMPYNUMPY

Python CSV error: line contains NULL byte

12 answers

NUMPYNUMPY

csv.Error: iterator should return strings, not bytes

12 answers

News


Wiki

Python | How to copy data from one Excel sheet to another

Common xlabel/ylabel for matplotlib subplots

Check if one list is a subset of another in Python

sin

How to specify multiple return types using type-hints

exp

Printing words vertically in Python

exp

Python Extract words from a given string

Cyclic redundancy check in Python

Finding mean, median, mode in Python without libraries

cos

Python add suffix / add prefix to strings in a list

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

Python - Move item to the end of the list

Python - Print list vertically