# Second most repeated word in a sequence in Python

Examples:

``````
Input:  {"aaa", "bbb", "ccc", "bbb", "aaa", "aaa"}
Output: bbb

Input:  {"geeks", "for", "geeks", "for", "geeks", "aaa"}
Output: for
``````

This problem has an existing solution; please refer to the linked article on finding the second most repeated word. Here we solve it using the `Counter` (iterator) method from the `collections` module.

The approach is very simple:

1. Create a dictionary using the Counter (iterator) method which contains words as keys and their frequency as values.
2. Now get a list of all the values in the dictionary and sort it in descending order. Select the second item from the sorted list, because it will be the second largest.
3. Now go through the dictionary again and print the key whose value is equal to the second largest item.

``````
# Python program to print the second most
# repeated word in a sequence
from collections import Counter

def secondFrequent(words):
    # Build a frequency dictionary, e.g.
    # {'aaa': 3, 'bbb': 2, 'ccc': 1}
    counts = Counter(words)

    # Get all frequency values, sorted in descending order
    values = sorted(counts.values(), reverse=True)

    # Select the second-largest frequency
    second_largest = values[1]

    # Go through the dictionary and print the key whose
    # frequency equals the second-largest value
    for key, val in counts.items():
        if val == second_largest:
            print(key)
            return

# Driver program
if __name__ == "__main__":
    words = ['aaa', 'bbb', 'ccc', 'bbb', 'aaa', 'aaa']
    secondFrequent(words)
``````

Output:

` bbb `

## Create list of single item repeated N times

I want to create a series of lists, all of varying lengths. Each list will contain the same element `e`, repeated `n` times (where `n` = length of the list).

How do I create the lists, without using a list comprehension `[e for number in xrange(n)]` for each list?
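One common approach (a sketch, with an illustrative element value): sequence multiplication `[e] * n` repeats a single item without a per-list comprehension, and only the outer series needs a loop or comprehension over the lengths.

```python
e = 5

# [e] * n builds a list containing e repeated exactly n times
lists = [[e] * n for n in range(1, 5)]
print(lists)  # [[5], [5, 5], [5, 5, 5], [5, 5, 5, 5]]
```

Beware that if `e` is mutable (say, a list), all `n` entries reference the same object, so mutating one slot appears to mutate them all.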

## What is the best way to repeatedly execute a function every x seconds?

### Question by DavidM

I want to repeatedly execute a function in Python every 60 seconds forever (just like an NSTimer in Objective C). This code will run as a daemon and is effectively like calling the python script every minute using a cron, but without requiring that to be set up by the user.

In this question about a cron implemented in Python, the solution appears to effectively just `sleep()` for x seconds. I don't need such advanced functionality, so perhaps something like this would work:

``````
import time

while True:
    # Code executed here
    time.sleep(60)
``````

Are there any foreseeable problems with this code?
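One foreseeable problem is drift: the loop body takes time of its own, so each cycle lasts slightly more than 60 seconds and the error accumulates. A drift-free sketch (the helper name is mine, not a stdlib API) sleeps until the next interval boundary instead of a fixed 60 seconds:

```python
import time

def seconds_until_next_tick(start, interval, now=None):
    # Seconds remaining until the next interval boundary,
    # measured from `start`; keeps ticks aligned without drift.
    if now is None:
        now = time.monotonic()
    return interval - ((now - start) % interval)

# A daemon loop would then look like:
#   start = time.monotonic()
#   while True:
#       do_work()
#       time.sleep(seconds_until_next_tick(start, 60))
```

`time.monotonic()` is used rather than `time.time()` so that system clock adjustments cannot skew the schedule.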

## How can I tell if a string repeats itself in Python?

I'm looking for a way to test whether or not a given string repeats itself for the entire string.

Examples:

``````[
"0045662100456621004566210045662100456621",             # "00456621"
"0072992700729927007299270072992700729927",             # "00729927"
"001443001443001443001443001443001443001443",           # "001443"
"037037037037037037037037037037037037037037037",        # "037"
"047619047619047619047619047619047619047619",           # "047619"
"002457002457002457002457002457002457002457",           # "002457"
"001221001221001221001221001221001221001221",           # "001221"
"001230012300123001230012300123001230012300123",        # "00123"
"0013947001394700139470013947001394700139470013947",    # "0013947"
"001001001001001001001001001001001001001001001001001",  # "001"
"001406469760900140646976090014064697609",              # "0014064697609"
]
``````

are strings which repeat themselves, and

``````[
"004608294930875576036866359447",
"00469483568075117370892018779342723",
"004739336492890995260663507109",
"001508295625942684766214177978883861236802413273",
"007518796992481203",
"0071942446043165467625899280575539568345323741",
"0434782608695652173913",
"0344827586206896551724137931",
"002481389578163771712158808933",
"002932551319648093841642228739",
"0035587188612099644128113879",
"003484320557491289198606271777",
"00115074798619102416570771",
]
``````

are examples of ones that do not.

The repeating sections of the strings I'm given can be quite long, and the strings themselves can be 500 or more characters, so looping through each character trying to build a pattern, then checking the pattern against the rest of the string, seems awfully slow. Multiply that by potentially hundreds of strings and I can't see any intuitive solution.

I've looked into regexes a bit and they seem good for when you know what you're looking for, or at least the length of the pattern you're looking for. Unfortunately, I know neither.

How can I tell if a string is repeating itself and if it is, what the shortest repeating subsequence is?
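A well-known trick (the function name here is mine) answers both questions in one pass: a string consists of a repeated block exactly when it occurs inside its own doubling at an offset other than 0 and `len(s)`, and the first such offset is the length of the shortest repeating unit.

```python
def principal_period(s):
    # Search for s inside s + s, skipping the trivial match at 0.
    # A match before offset len(s) means s repeats itself, and the
    # offset is the length of the shortest repeating unit.
    i = (s + s).find(s, 1)
    return None if i == len(s) else s[:i]

print(principal_period("0045662100456621004566210045662100456621"))  # 00456621
print(principal_period("004608294930875576036866359447"))            # None
```

Since `str.find` is implemented in C, this tends to be far faster than building candidate patterns character by character.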

## Repeat string to certain length

What is an efficient way to repeat a string to a certain length? Eg: `repeat("abc", 7) -> "abcabca"`

Here is my current code:

``````
def repeat(string, length):
    cur, old = 1, string
    while len(string) < length:
        string += old[cur - 1]
        cur = (cur + 1) % len(old)
    return string
``````

Is there a better (more pythonic) way to do this? Maybe using list comprehension?
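A shorter, more Pythonic sketch of the same idea: tile the string one copy more than needed using sequence multiplication, then slice to the exact length.

```python
def repeat(string, length):
    # Tile enough whole copies (one extra to cover the remainder),
    # then trim to the requested length.
    return (string * (length // len(string) + 1))[:length]

print(repeat("abc", 7))  # abcabca
```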

## tqdm in Jupyter Notebook prints new progress bars repeatedly

I am using `tqdm` to print progress in a script I'm running in a Jupyter notebook. I am printing all messages to the console via `tqdm.write()`. However, this still gives me skewed output: each time a new line has to be printed, a new progress bar is printed on the next line. This does not happen when I run the script from a terminal. How can I solve this?

## How to repeat last command in python interpreter shell?

How do I repeat the last command? The usual keys (Up, Ctrl+Up, Alt-p) don't work; they produce nonsensical characters.

``````
(ve)[[email protected] ve]$ python
Python 2.6.6 (r266:84292, Nov 15 2010, 21:48:32)
[GCC 4.4.4 20100630 (Red Hat 4.4.4-10)] on linux2
>>> print "hello world"
hello world
>>> ^[[A
File "<stdin>", line 1
^
SyntaxError: invalid syntax
>>> ^[[1;5A
File "<stdin>", line 1
[1;5A
^
SyntaxError: invalid syntax
>>> ^[p
File "<stdin>", line 1
p
^
SyntaxError: invalid syntax
>>>
``````
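Those `^[[A` sequences are the raw escape codes the terminal sends for the arrow keys; history recall only works when the interpreter was built with GNU readline support, which is a property of this particular build (an assumption about the setup shown). You can check whether the `readline` module is available:

```python
import importlib.util

# If this prints False, the interpreter lacks readline support,
# and arrow keys in the REPL emit raw escape codes like ^[[A.
has_readline = importlib.util.find_spec("readline") is not None
print(has_readline)
```

Common remedies are installing the readline development headers and rebuilding Python, or wrapping the shell with a line-editing front end such as `rlwrap python`.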

## Given a string of a million numbers, return all repeating 3 digit numbers

I had an interview with a hedge fund company in New York a few months ago and unfortunately, I did not get the internship offer as a data/software engineer. (They also asked the solution to be in Python.)

I pretty much screwed up on the first interview problem...

Question: Given a string of a million numbers (Pi, for example), write a function/program that returns all repeating 3-digit numbers with a repetition count greater than 1.

For example: if the string was: `123412345123456` then the function/program would return:

``````123 - 3 times
234 - 3 times
345 - 2 times
``````

They did not give me the solution after I failed the interview, but they did tell me that the time complexity of the solution was a constant of 1000, since all the possible outcomes lie between:

000 --> 999

Now that I'm thinking about it, I don't think it's possible to come up with a constant-time algorithm. Is it?
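One linear-time sketch (the function name is mine): slide a 3-character window over the string and tally with `collections.Counter`. The "constant of 1000" can only bound the number of distinct keys (`000` through `999`); reading the input is still O(n), so a truly constant-time algorithm is indeed not possible.

```python
from collections import Counter

def repeated_triples(digits):
    # Count every 3-digit window, then keep those seen more than once
    counts = Counter(digits[i:i + 3] for i in range(len(digits) - 2))
    return {t: c for t, c in counts.items() if c > 1}

print(repeated_triples("123412345123456"))
# {'123': 3, '234': 3, '345': 2}
```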

## Python threading.timer - repeat function every "n" seconds

I want to fire off a function every 0.5 seconds and be able to start, stop, and reset the timer. I'm not too knowledgeable about how Python threads work and am having difficulties with the Python timer.

However, I keep getting `RuntimeError: threads can only be started once` when I execute `threading.Timer.start()` twice. Is there a workaround for this? I tried applying `threading.Timer.cancel()` before each start.

Pseudo code:

``````
t = threading.Timer(0.5, function)
while True:
    t.cancel()
    t.start()
``````
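Since a `Timer` is a one-shot thread, the usual workaround is to schedule a fresh `Timer` from inside the callback each time it fires. A minimal sketch (the helper name and return convention are mine):

```python
import threading

def make_repeating_timer(interval, fn):
    # Timer objects cannot be restarted, so each firing
    # schedules the next one with a brand-new Timer.
    stopped = threading.Event()

    def run():
        if not stopped.is_set():
            fn()
            t = threading.Timer(interval, run)
            t.daemon = True
            t.start()

    first = threading.Timer(interval, run)
    first.daemon = True
    first.start()
    return stopped  # call .set() to stop repeating
```

To "reset", set the returned event and create a new repeating timer; that sidesteps the one-shot restriction entirely.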

## How to assign to repeated field?

I am using protocol buffers in python and I have a `Person` message

``````repeated uint64 id
``````

but when I try to assign a value to it like:

``````person.id = [1, 32, 43432]
``````

I get an error: `Assignment not allowed for repeated field "id" in protocol message object`. How do I assign a value to a repeated field?

Since this question was asked in 2010, there has been real simplification in how to do simple multithreading with Python with map and pool.

The code below comes from an article/blog post that you should definitely check out (no affiliation): Parallelism in one line: A Better Model for Day to Day Threading Tasks. I'll summarize below; it ends up being just a few lines of code:

``````
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(4)  # the pool must be created first
results = pool.map(my_function, my_array)
``````

Which is the multithreaded version of:

``````
results = []
for item in my_array:
    results.append(my_function(item))
``````

Description

Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.

Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.

Implementation

Parallel versions of the map function are provided by two libraries: `multiprocessing`, and its little-known but equally fantastic stepchild, `multiprocessing.dummy`.

`multiprocessing.dummy` is exactly the same as the `multiprocessing` module, but uses threads instead (an important distinction: use multiple processes for CPU-intensive tasks, and threads for (and during) I/O):

multiprocessing.dummy replicates the API of multiprocessing, but is no more than a wrapper around the threading module.

``````import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
"http://www.python.org",
"http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html",
"http://www.python.org/doc/",
"http://www.python.org/getit/",
"http://www.python.org/community/",
"https://wiki.python.org/moin/",
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()
``````

And the timing results:

``````Single thread:   14.4 seconds
4 Pool:   3.1 seconds
8 Pool:   1.4 seconds
13 Pool:   1.3 seconds
``````

Passing multiple arguments (works like this only in Python 3.3 and later):

To pass multiple arrays:

``````results = pool.starmap(function, zip(list_a, list_b))
``````

Or to pass a constant and an array:

``````results = pool.starmap(function, zip(itertools.repeat(constant), list_a))
``````

If you are using an earlier version of Python, you can pass multiple arguments via this workaround.

(Thanks to user136036 for the helpful comment.)
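A self-contained sketch of the `starmap` patterns above (Python 3; the `add` function and the input lists are illustrative):

```python
from itertools import repeat
from multiprocessing.dummy import Pool as ThreadPool

def add(a, b):
    return a + b

with ThreadPool(4) as pool:
    # Pair up two arrays element-wise
    sums = pool.starmap(add, zip([1, 2, 3], [10, 20, 30]))
    # Or pair a constant with an array
    offsets = pool.starmap(add, zip(repeat(100), [1, 2, 3]))

print(sums)     # [11, 22, 33]
print(offsets)  # [101, 102, 103]
```

The `with` block closes the pool and joins the workers automatically when it exits.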

# In Python, what is the purpose of `__slots__`, and in which cases should one avoid it?

## TLDR:

The special attribute `__slots__` allows you to explicitly state which instance attributes you expect your object instances to have, with the expected results:

1. faster attribute access.
2. space savings in memory.

The space savings is from

1. Storing value references in slots instead of `__dict__`.
2. Denying `__dict__` and `__weakref__` creation if parent classes deny them and you declare `__slots__`.

### Quick Caveats

A small caveat: you should declare a particular slot only one time in an inheritance tree. For example:

``````
class Base:
    __slots__ = "foo", "bar"

class Right(Base):
    __slots__ = "baz",

class Wrong(Base):
    __slots__ = "foo", "bar", "baz"  # redundant foo and bar
``````

Python doesn't object when you get this wrong (it probably should), and problems might not otherwise manifest, but your objects will take up more space than they otherwise should. Python 3.8:

``````>>> from sys import getsizeof
>>> getsizeof(Right()), getsizeof(Wrong())
(56, 72)
``````

This is because `Base`'s slot descriptor has a slot separate from `Wrong`'s. This shouldn't usually come up, but it could:

``````
>>> w = Wrong()
>>> w.foo = "foo"
>>> Base.foo.__get__(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: foo
>>> Wrong.foo.__get__(w)
'foo'
``````

The biggest caveat is for multiple inheritance - multiple "parent classes with nonempty slots" cannot be combined.

To accommodate this restriction, follow best practice: factor out abstractions from all (or all but one) of the parents, give those abstractions empty slots (just like the abstract base classes in the standard library), and have both the parents' concrete classes and your new concrete class inherit from them.

See section on multiple inheritance below for an example.

### Requirements:

• For attributes named in `__slots__` to actually be stored in slots instead of a `__dict__`, a class must inherit from `object` (automatic in Python 3, but must be explicit in Python 2).

• To prevent the creation of a `__dict__`, you must inherit from `object`, all classes in the inheritance tree must declare `__slots__`, and none of them can have a `"__dict__"` entry.

There are a lot of details if you wish to keep reading.

## Why use `__slots__`: Faster attribute access.

The creator of Python, Guido van Rossum, states that he actually created `__slots__` for faster attribute access.

It is trivial to demonstrate significantly faster attribute access:

``````
import timeit

class Foo(object): __slots__ = "foo",

class Bar(object): pass

slotted = Foo()
not_slotted = Bar()

def get_set_delete_fn(obj):
    def get_set_delete():
        obj.foo = "foo"
        obj.foo
        del obj.foo
    return get_set_delete
``````

and

``````>>> min(timeit.repeat(get_set_delete_fn(slotted)))
0.2846834529991611
>>> min(timeit.repeat(get_set_delete_fn(not_slotted)))
0.3664822799983085
``````

The slotted access is almost 30% faster in Python 3.5 on Ubuntu.

``````>>> 0.3664822799983085 / 0.2846834529991611
1.2873325658284342
``````

In Python 2 on Windows I have measured it about 15% faster.

## Why use `__slots__`: Memory Savings

Another purpose of `__slots__` is to reduce the space in memory that each object instance takes up.

The space saved over using `__dict__` can be significant.

SQLAlchemy attributes a lot of memory savings to `__slots__`.

To verify this, using the Anaconda distribution of Python 2.7 on Ubuntu Linux, with `guppy.hpy` (aka heapy) and `sys.getsizeof`, the size of a class instance without `__slots__` declared, and nothing else, is 64 bytes. That does not include the `__dict__`; thanks to lazy evaluation, the `__dict__` is apparently not called into existence until it is referenced (though classes without data are usually useless). When called into existence, the `__dict__` attribute takes a minimum of 280 additional bytes.

In contrast, a class instance with `__slots__` declared to be `()` (no data) is only 16 bytes, and 56 total bytes with one item in slots, 64 with two.

For 64 bit Python, I illustrate the memory consumption in bytes in Python 2.7 and 3.6, for `__slots__` and `__dict__` (no slots defined) for each point where the dict grows in 3.6 (except for 0, 1, and 2 attributes):

``````
       Python 2.7             Python 3.6
attrs  __slots__  __dict__*   __slots__  __dict__* | *(no slots defined)
none   16         56 + 272†   16         56 + 112† | †if __dict__ referenced
one    48         56 + 272    48         56 + 112
two    56         56 + 272    56         56 + 112
six    88         56 + 1040   88         56 + 152
11     128        56 + 1040   128        56 + 240
22     216        56 + 3344   216        56 + 408
43     384        56 + 3344   384        56 + 752
``````

So, in spite of smaller dicts in Python 3, we see how nicely `__slots__` scale for instances to save us memory, and that is a major reason you would want to use `__slots__`.

Just for completeness of my notes, note that there is a one-time cost per slot in the class's namespace of 64 bytes in Python 2, and 72 bytes in Python 3, because slots use data descriptors like properties, called "members".

``````
>>> Foo.foo
<member 'foo' of 'Foo' objects>
>>> type(Foo.foo)
<class 'member_descriptor'>
>>> getsizeof(Foo.foo)
72
``````

## Demonstration of `__slots__`:

To deny the creation of a `__dict__`, you must subclass `object`. Everything subclasses `object` in Python 3, but in Python 2 you had to be explicit:

``````
class Base(object):
    __slots__ = ()
``````

now:

``````
>>> b = Base()
>>> b.a = "a"
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    b.a = "a"
AttributeError: 'Base' object has no attribute 'a'
``````

Or subclass another class that defines `__slots__`:

``````
class Child(Base):
    __slots__ = ("a",)
``````

and now:

``````c = Child()
c.a = "a"
``````

but:

``````
>>> c.b = "b"
Traceback (most recent call last):
  File "<pyshell#42>", line 1, in <module>
    c.b = "b"
AttributeError: 'Child' object has no attribute 'b'
``````

To allow `__dict__` creation while subclassing slotted objects, just add `"__dict__"` to the `__slots__` (note that slots are ordered, and you shouldn't repeat slots that are already in parent classes):

``````
class SlottedWithDict(Child):
    __slots__ = ("__dict__", "b")

swd = SlottedWithDict()
swd.a = "a"
swd.b = "b"
swd.c = "c"
``````

and

``````
>>> swd.__dict__
{'c': 'c'}
``````

Or you don't even need to declare `__slots__` in your subclass; you will still use the slots from the parents, but not restrict the creation of a `__dict__`:

``````class NoSlots(Child): pass
ns = NoSlots()
ns.a = "a"
ns.b = "b"
``````

And:

``````
>>> ns.__dict__
{'b': 'b'}
``````

However, `__slots__` may cause problems for multiple inheritance:

``````
class BaseA(object):
    __slots__ = ("a",)

class BaseB(object):
    __slots__ = ("b",)
``````

Because creating a child class from parents with both non-empty slots fails:

``````
>>> class Child(BaseA, BaseB): __slots__ = ()
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    class Child(BaseA, BaseB): __slots__ = ()
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict
``````

If you run into this problem, you could just remove `__slots__` from the parents; or, if you have control of the parents, give them empty slots or refactor to abstractions:

``````
from abc import ABC

class AbstractA(ABC):
    __slots__ = ()

class BaseA(AbstractA):
    __slots__ = ("a",)

class AbstractB(ABC):
    __slots__ = ()

class BaseB(AbstractB):
    __slots__ = ("b",)

class Child(AbstractA, AbstractB):
    __slots__ = ("a", "b")

c = Child()  # no problem!
``````

### Add `"__dict__"` to `__slots__` to get dynamic assignment:

``````
class Foo(object):
    __slots__ = "bar", "baz", "__dict__"
``````

and now:

``````>>> foo = Foo()
>>> foo.boink = "boink"
``````

So with `"__dict__"` in slots we lose some of the size benefits with the upside of having dynamic assignment and still having slots for the names we do expect.

When you inherit from an object that isn't slotted, you get the same sort of semantics when you use `__slots__`: names that are in `__slots__` point to slotted values, while any other values are put in the instance's `__dict__`.

Avoiding `__slots__` because you want to be able to add attributes on the fly is actually not a good reason - just add `"__dict__"` to your `__slots__` if this is required.

You can similarly add `__weakref__` to `__slots__` explicitly if you need that feature.

### Set to empty tuple when subclassing a namedtuple:

The namedtuple builtin make immutable instances that are very lightweight (essentially, the size of tuples) but to get the benefits, you need to do it yourself if you subclass them:

``````
from collections import namedtuple

class MyNT(namedtuple("MyNT", "bar baz")):
    """MyNT is an immutable and lightweight object"""
    __slots__ = ()
``````

usage:

``````
>>> nt = MyNT("bar", "baz")
>>> nt.bar
'bar'
>>> nt.baz
'baz'
``````

And trying to assign an unexpected attribute raises an `AttributeError` because we have prevented the creation of `__dict__`:

``````
>>> nt.quux = "quux"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyNT' object has no attribute 'quux'
``````

You can allow `__dict__` creation by leaving off `__slots__ = ()`, but you can't use non-empty `__slots__` with subtypes of tuple.

## Biggest Caveat: Multiple inheritance

Even when non-empty slots are the same for multiple parents, they cannot be used together:

``````
class Foo(object):
    __slots__ = "foo", "bar"

class Bar(object):
    __slots__ = "foo", "bar"  # alas, would work if empty, i.e. ()

>>> class Baz(Foo, Bar): pass
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict
``````

Using an empty `__slots__` in the parent seems to provide the most flexibility, allowing the child to choose to prevent or allow (by adding `"__dict__"` to get dynamic assignment, see section above) the creation of a `__dict__`:

``````class Foo(object): __slots__ = ()
class Bar(object): __slots__ = ()
class Baz(Foo, Bar): __slots__ = ("foo", "bar")
b = Baz()
b.foo, b.bar = "foo", "bar"
``````

You don't have to have slots, so if you add them and remove them later, it shouldn't cause any problems.

Going out on a limb here: if you're composing mixins or using abstract base classes, which aren't intended to be instantiated, an empty `__slots__` in those parents seems to be the best way to go in terms of flexibility for subclassers.

To demonstrate, first, let's create a class with code we'd like to use under multiple inheritance:

``````
class AbstractBase:
    __slots__ = ()
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return f"{type(self).__name__}({repr(self.a)}, {repr(self.b)})"
``````

We could use the above directly by inheriting and declaring the expected slots:

``````
class Foo(AbstractBase):
    __slots__ = "a", "b"
``````

But we don't care about that; that's trivial single inheritance. We need another class we might also inherit from, maybe with a noisy attribute:

``````
class AbstractBaseC:
    __slots__ = ()
    @property
    def c(self):
        print("getting c!")
        return self._c
    @c.setter
    def c(self, arg):
        print("setting c!")
        self._c = arg
``````

Now, if both bases had nonempty slots, we couldn't do the below. (In fact, if we wanted, we could have given `AbstractBase` nonempty slots `a` and `b` and left them out of the declaration below; leaving them in would be wrong):

``````
class Concretion(AbstractBase, AbstractBaseC):
    __slots__ = "a b _c".split()
``````

And now we have functionality from both via multiple inheritance, and can still deny `__dict__` and `__weakref__` instantiation:

``````
>>> c = Concretion("a", "b")
>>> c.c = c
setting c!
>>> c.c
getting c!
Concretion('a', 'b')
>>> c.d = "d"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Concretion' object has no attribute 'd'
``````

## Other cases to avoid slots:

• Avoid them when you want to perform `__class__` assignment with another class that doesn't have them (and you can't add them), unless the slot layouts are identical. (I am very interested in learning who is doing this and why.)
• Avoid them if you want to subclass variable-length builtins like `long`, `tuple`, or `str`, and you want to add attributes to them.
• Avoid them if you insist on providing default values via class attributes for instance variables.

You may be able to tease out further caveats from the rest of the `__slots__` documentation (the 3.7 dev docs are the most current), which I have made significant recent contributions to.

The current top answers cite outdated information, are quite hand-wavy, and miss the mark in some important ways.

### Do not "only use `__slots__` when instantiating lots of objects"

I quote:

"You would want to use `__slots__` if you are going to instantiate a lot (hundreds, thousands) of objects of the same class."

Abstract Base Classes, for example, from the `collections` module, are not instantiated, yet `__slots__` are declared for them.

Why?

If a user wishes to deny `__dict__` or `__weakref__` creation, those things must not be available in the parent classes.

`__slots__` contributes to reusability when creating interfaces or mixins.

It is true that many Python users aren't writing for reusability, but when you are, having the option to deny unnecessary space usage is valuable.

### `__slots__` doesn't break pickling

When pickling a slotted object, you may find it complains with a misleading `TypeError`:

``````>>> pickle.loads(pickle.dumps(f))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
``````

This is actually incorrect. This message comes from the oldest protocol, which is the default. You can select the latest protocol with the `-1` argument. In Python 2.7 this would be `2` (which was introduced in 2.3), and in 3.6 it is `4`.

``````>>> pickle.loads(pickle.dumps(f, -1))
<__main__.Foo object at 0x1129C770>
``````

in Python 2.7:

``````>>> pickle.loads(pickle.dumps(f, 2))
<__main__.Foo object at 0x1129C770>
``````

in Python 3.6

``````>>> pickle.loads(pickle.dumps(f, 4))
<__main__.Foo object at 0x1129C770>
``````

So I would keep this in mind, as it is a solved problem.

## Critique of the (until Oct 2, 2016) accepted answer

The first paragraph is half short explanation, half predictive. Here's the only part that actually answers the question:

The proper use of `__slots__` is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation. This saves the overhead of one dict for every object that uses slots

The second half is wishful thinking, and off the mark:

While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.

Python actually does something similar to this, only creating the `__dict__` when it is accessed, but creating lots of objects with no data is fairly ridiculous.

The second paragraph oversimplifies and misses actual reasons to avoid `__slots__`. The below is not a real reason to avoid slots (for actual reasons, see the rest of my answer above):

They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies.

It then goes on to discuss other ways of accomplishing that perverse goal with Python, not discussing anything to do with `__slots__`.

The third paragraph is more wishful thinking. Together it is mostly off-the-mark content that the answerer didn't even author, and it contributes ammunition for critics of the site.

# Memory usage evidence

Create some normal objects and slotted objects:

``````>>> class Foo(object): pass
>>> class Bar(object): __slots__ = ()
``````

Instantiate a million of them:

``````>>> foos = [Foo() for f in xrange(1000000)]
>>> bars = [Bar() for b in xrange(1000000)]
``````

Inspect with `guppy.hpy().heap()`:

``````>>> guppy.hpy().heap()
Partition of a set of 2028259 objects. Total size = 99763360 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0 1000000  49 64000000  64  64000000  64 __main__.Foo
1     169   0 16281480  16  80281480  80 list
2 1000000  49 16000000  16  96281480  97 __main__.Bar
3   12284   1   987472   1  97268952  97 str
...
``````

Access the regular objects and their `__dict__` and inspect again:

``````>>> for f in foos:
...     f.__dict__
>>> guppy.hpy().heap()
Partition of a set of 3028258 objects. Total size = 379763480 bytes.
Index  Count   %      Size    % Cumulative  % Kind (class / dict of class)
0 1000000  33 280000000  74 280000000  74 dict of __main__.Foo
1 1000000  33  64000000  17 344000000  91 __main__.Foo
2     169   0  16281480   4 360281480  95 list
3 1000000  33  16000000   4 376281480  99 __main__.Bar
4   12284   0    987472   0 377268952  99 str
...
``````

This is consistent with the history of Python, from Unifying types and classes in Python 2.2

If you subclass a built-in type, extra space is automatically added to the instances to accommodate `__dict__` and `__weakrefs__`. (The `__dict__` is not initialized until you use it, though, so you shouldn't worry about the space occupied by an empty dictionary for each instance you create.) If you don't need this extra space, you can add the phrase "`__slots__ = []`" to your class.

# The short answer, or TL;DR

Basically, `eval` is used to evaluate a single dynamically generated Python expression, and `exec` is used to execute dynamically generated Python code only for its side effects.

`eval` and `exec` have these two differences:

1. `eval` accepts only a single expression, `exec` can take a code block that has Python statements: loops, `try: except:`, `class` and function/method `def`initions and so on.

An expression in Python is whatever you can have as the value in a variable assignment:

``````a_variable = (anything you can put within these parentheses is an expression)
``````
2. `eval` returns the value of the given expression, whereas `exec` ignores the return value from its code, and always returns `None` (in Python 2 it is a statement and cannot be used as an expression, so it really does not return anything).

In versions 1.0 - 2.7, `exec` was a statement, because CPython needed to produce a different kind of code object for functions that used `exec` for its side effects inside the function.

In Python 3, `exec` is a function; its use has no effect on the compiled bytecode of the function where it is used.

Thus basically:

``````>>> a = 5
>>> eval("37 + a")   # it is an expression
42
>>> exec("37 + a")   # it is an expression statement; value is ignored (None is returned)
>>> exec("a = 47")   # modify a global variable as a side effect
>>> a
47
>>> eval("a = 47")  # you cannot evaluate a statement
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
a = 47
^
SyntaxError: invalid syntax
``````

The `compile` in `"exec"` mode compiles any number of statements into a bytecode that implicitly always returns `None`, whereas in `"eval"` mode it compiles a single expression into bytecode that returns the value of that expression.

``````>>> eval(compile("42", "<string>", "exec"))  # code returns None
>>> eval(compile("42", "<string>", "eval"))  # code returns 42
42
>>> exec(compile("42", "<string>", "eval"))  # code returns 42,
>>>                                          # but ignored by exec
``````

In the `"eval"` mode (and thus with the `eval` function if a string is passed in), the `compile` raises an exception if the source code contains statements or anything else beyond a single expression:

``````>>> compile("for i in range(3): print(i)", "<string>", "eval")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
for i in range(3): print(i)
^
SyntaxError: invalid syntax
``````

Actually the statement "eval accepts only a single expression" applies only when a string (which contains Python source code) is passed to `eval`. Then it is internally compiled to bytecode using `compile(source, "<string>", "eval")`, and this is where the difference really comes from.

If a `code` object (which contains Python bytecode) is passed to `exec` or `eval`, they behave identically, except that `exec` ignores the return value, still always returning `None`. So it is possible to use `eval` to execute something that has statements, if you just `compile`d it into bytecode beforehand instead of passing it as a string:

``````
>>> eval(compile("if 1: print('Hello')", "<string>", "exec"))
Hello
>>>
``````

works without problems, even though the compiled code contains statements. It still returns `None`, because that is the return value of the code object returned from `compile`.

# The longer answer, a.k.a. the gory details

## `exec` and `eval`

The `exec` function (which was a statement in Python 2) is used for executing a dynamically created statement or program:

``````>>> program = """
for i in range(3):
    print("Python is cool")
"""
>>> exec(program)
Python is cool
Python is cool
Python is cool
>>>
``````

The `eval` function does the same for a single expression, and returns the value of the expression:

``````>>> a = 2
>>> my_calculation = "42 * a"
>>> result = eval(my_calculation)
>>> result
84
``````

`exec` and `eval` both accept the program/expression to be run either as a `str`, `unicode` or `bytes` object containing source code, or as a `code` object which contains Python bytecode.

If a `str`/`unicode`/`bytes` containing source code was passed to `exec`, it behaves equivalently to:

``````exec(compile(source, "<string>", "exec"))
``````

and `eval` similarly behaves equivalent to:

``````eval(compile(source, "<string>", "eval"))
``````
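This equivalence is also why precompiling pays off when the same snippet runs many times: the parse and compile step happens once, and only the bytecode is re-evaluated. A minimal sketch (the variable names are illustrative):

```python
# Parse and compile once...
expr = compile("x * x", "<string>", "eval")

# ...then evaluate the resulting code object many times with different globals.
results = [eval(expr, {"x": x}) for x in range(4)]
print(results)  # [0, 1, 4, 9]
```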

Since all expressions can be used as statements in Python (these are called the `Expr` nodes in the Python abstract grammar; the opposite is not true), you can always use `exec` if you do not need the return value. That is to say, you can use either `eval("my_func(42)")` or `exec("my_func(42)")`, the difference being that `eval` returns the value returned by `my_func`, and `exec` discards it:

``````>>> def my_func(arg):
...     print("Called with %d" % arg)
...     return arg * 2
...
>>> exec("my_func(42)")
Called with 42
>>> eval("my_func(42)")
Called with 42
84
>>>
``````

Of the two, only `exec` accepts source code that contains statements, like `def`, `for`, `while`, `import`, or `class`, the assignment statement (e.g. `a = 42`), or entire programs:

``````>>> exec("for i in range(3): print(i)")
0
1
2
>>> eval("for i in range(3): print(i)")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
for i in range(3): print(i)
^
SyntaxError: invalid syntax
``````

Both `exec` and `eval` accept 2 additional positional arguments - `globals` and `locals` - which are the global and local variable scopes that the code sees. These default to the `globals()` and `locals()` within the scope that called `exec` or `eval`, but any dictionary can be used for `globals` and any `mapping` for `locals` (including `dict` of course). These can be used not only to restrict/modify the variables that the code sees, but are often also used for capturing the variables that the `exec`uted code creates:

``````>>> g = dict()
>>> l = dict()
>>> exec("global a; a, b = 123, 42", g, l)
>>> g["a"]
123
>>> l
{'b': 42}
``````

(If you display the value of the entire `g`, it would be much longer, because `exec` and `eval` add the built-ins module as `__builtins__` to the globals automatically if it is missing).
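That injection is easy to observe directly; a quick sketch:

```python
g = {}
exec("x = 1", g)

# exec added the built-ins module alongside our own variable
print("__builtins__" in g)                          # True
print(sorted(k for k in g if k != "__builtins__"))  # ['x']
```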

In Python 2, the official syntax for the `exec` statement is actually `exec code in globals, locals`, as in

``````>>> exec "global a; a, b = 123, 42" in g, l
``````

However the alternate syntax `exec(code, globals, locals)` has always been accepted too (see below).

## `compile`

The `compile(source, filename, mode, flags=0, dont_inherit=False, optimize=-1)` built-in can be used to speed up repeated invocations of the same code with `exec` or `eval` by compiling the source into a `code` object beforehand. The `mode` parameter controls the kind of code fragment the `compile` function accepts and the kind of bytecode it produces. The choices are `"eval"`, `"exec"` and `"single"`:

• `"eval"` mode expects a single expression, and will produce bytecode that when run will return the value of that expression:

``````>>> dis.dis(compile("a + b", "<string>", "eval"))
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 BINARY_ADD
              7 RETURN_VALUE
``````
• `"exec"` accepts any kind of Python construct, from single expressions to whole modules of code, and executes them as if they were module top-level statements. The code object returns `None`:

``````>>> dis.dis(compile("a + b", "<string>", "exec"))
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 BINARY_ADD
              7 POP_TOP
              8 LOAD_CONST               0 (None)
             11 RETURN_VALUE
``````
• `"single"` is a limited form of `"exec"` which accepts source code containing a single statement (or multiple statements separated by `;`). If the last statement is an expression statement, the resulting bytecode also prints the `repr` of the value of that expression to the standard output(!).

An `if`-`elif`-`else` chain, a loop with `else`, and `try` with its `except`, `else` and `finally` blocks are each considered a single statement.

A source fragment containing two top-level statements is an error in `"single"` mode, except that in Python 2 there is a bug that sometimes allows multiple top-level statements in the code; only the first is compiled, and the rest are ignored:

In Python 2.7.8:

``````>>> exec(compile("a = 5\na = 6", "<string>", "single"))
>>> a
5
``````

And in Python 3.4.2:

``````>>> exec(compile("a = 5\na = 6", "<string>", "single"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
a = 5
^
SyntaxError: multiple statements found while compiling a single statement
``````

This is very useful for making interactive Python shells. However, the value of the expression is not returned, even if you `eval` the resulting code.
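The printing behaviour of `"single"` mode can be observed by capturing standard output; a small sketch:

```python
import contextlib
import io

code = compile("1 + 1", "<string>", "single")

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    result = eval(code)  # runs the bytecode; sys.displayhook prints the repr

print(repr(buf.getvalue()))  # '2\n' -- the repr was printed, not returned
print(result)                # None
```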

Thus the greatest distinction between `exec` and `eval` actually comes from the `compile` function and its modes.

In addition to compiling source code to bytecode, `compile` supports compiling abstract syntax trees (parse trees of Python code) into `code` objects, and source code into abstract syntax trees (`ast.parse` is written in Python and just calls `compile(source, filename, mode, PyCF_ONLY_AST)`). These are used for example for modifying source code on the fly, and also for dynamic code creation, as it is often easier to handle the code as a tree of nodes instead of lines of text in complex cases.
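A sketch of the source → AST → bytecode round trip:

```python
import ast

# Parse source into an AST; ast.parse delegates to compile with PyCF_ONLY_AST.
tree = ast.parse("a + b", mode="eval")   # an ast.Expression node

# An AST can be handed straight back to compile in place of a source string.
code = compile(tree, "<string>", "eval")
print(eval(code, {"a": 2, "b": 3}))      # 5
```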

While `eval` of a string only allows you to evaluate a single expression, you can `eval` a whole statement, or even a whole module, that has been `compile`d into bytecode. For instance, with Python 2, `print` is a statement, and a block containing it cannot be `eval`led directly:

``````>>> eval("for i in range(3): print('Python is cool')")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
for i in range(3): print('Python is cool')
^
SyntaxError: invalid syntax
``````

`compile` it with `"exec"` mode into a `code` object first, and then you can `eval` it; the `eval` call will return `None`.

``````>>> code = compile("for i in range(3): print('Python is cool')",
...                "foo.py", "exec")
>>> eval(code)
Python is cool
Python is cool
Python is cool
``````

If one looks into the `eval` and `exec` source code in CPython 3, this is very evident; they both call `PyEval_EvalCode` with the same arguments, the only difference being that `exec` explicitly returns `None`.

## Syntax differences of `exec` between Python 2 and Python 3

One of the major differences in Python 2 is that `exec` is a statement and `eval` is a built-in function (both are built-in functions in Python 3). It is a well-known fact that the official syntax of `exec` in Python 2 is `exec code [in globals[, locals]]`.

Unlike what the majority of Python 2-to-3 porting guides seem to suggest, the `exec` statement in CPython 2 can also be used with syntax that looks exactly like the `exec` function invocation in Python 3. The reason is that Python 0.9.9 had an `exec(code, globals, locals)` built-in function! That built-in function was replaced with the `exec` statement somewhere before the Python 1.0 release.

Since it was desirable not to break backwards compatibility with Python 0.9.9, Guido van Rossum added a compatibility hack in 1993: if the `code` was a tuple of length 2 or 3, and `globals` and `locals` were not otherwise passed into the `exec` statement, the `code` would be interpreted as if the 2nd and 3rd elements of the tuple were the `globals` and `locals` respectively. The compatibility hack was not mentioned even in the Python 1.4 documentation (the earliest version available online), and thus was not known to many writers of porting guides and tools, until it was documented again in November 2012:

The first expression may also be a tuple of length 2 or 3. In this case, the optional parts must be omitted. The form `exec(expr, globals)` is equivalent to `exec expr in globals`, while the form `exec(expr, globals, locals)` is equivalent to `exec expr in globals, locals`. The tuple form of `exec` provides compatibility with Python 3, where `exec` is a function rather than a statement.

Yes, in CPython 2.7 it is handily referred to as a forward-compatibility option (why confuse people with the fact that there is a backward-compatibility option at all), when it had actually been there for backward compatibility for two decades.

Thus while `exec` is a statement in Python 1 and Python 2, and a built-in function in Python 3 and Python 0.9.9,

``````>>> exec("print(a)", globals(), {"a": 42})
42
``````

has had identical behaviour in possibly every widely released Python version ever; and works in Jython 2.5.2, PyPy 2.3.1 (Python 2.7.6) and IronPython 2.6.1 too (kudos to them following the undocumented behaviour of CPython closely).

What you cannot do in Pythons 1.0 - 2.7 with its compatibility hack, is to store the return value of `exec` into a variable:

``````Python 2.7.11+ (default, Apr 17 2016, 14:00:29)
[GCC 5.3.1 20160413] on linux2
>>> a = exec("print(42)")
File "<stdin>", line 1
a = exec("print(42)")
^
SyntaxError: invalid syntax
``````

(which wouldn't be useful in Python 3 either, as `exec` always returns `None`), or pass a reference to `exec`:

``````>>> call_later(exec, "print(42)", delay=1000)
File "<stdin>", line 1
call_later(exec, "print(42)", delay=1000)
^
SyntaxError: invalid syntax
``````

This is a pattern that someone might actually have used, though it is unlikely.

Or use it in a list comprehension:

``````>>> [exec(i) for i in ["print(42)", "print(foo)"]]
File "<stdin>", line 1
[exec(i) for i in ["print(42)", "print(foo)"]]
^
SyntaxError: invalid syntax
``````

which is abuse of list comprehensions (use a `for` loop instead!).

## What is the difference between the list methods append and extend?

• `append` adds its argument as a single element to the end of a list. The length of the list itself will increase by one.
• `extend` iterates over its argument adding each element to the list, extending the list. The length of the list will increase by however many elements were in the iterable argument.

## `append`

The `list.append` method appends an object to the end of the list.

``````my_list.append(object)
``````

Whatever the object is, whether a number, a string, another list, or something else, it gets added onto the end of `my_list` as a single entry on the list.

``````>>> my_list
['foo', 'bar']
>>> my_list.append("baz")
>>> my_list
['foo', 'bar', 'baz']
``````

So keep in mind that a list is an object. If you append another list onto a list, the first list will be a single object at the end of the list (which may not be what you want):

``````>>> another_list = [1, 2, 3]
>>> my_list.append(another_list)
>>> my_list
['foo', 'bar', 'baz', [1, 2, 3]]
#                     ^^^^^^^^^--- single item at the end of the list.
``````

## `extend`

The `list.extend` method extends a list by appending elements from an iterable:

``````my_list.extend(iterable)
``````

So with extend, each element of the iterable gets appended onto the list. For example:

``````>>> my_list
['foo', 'bar']
>>> another_list = [1, 2, 3]
>>> my_list.extend(another_list)
>>> my_list
['foo', 'bar', 1, 2, 3]
``````

Keep in mind that a string is an iterable, so if you extend a list with a string, you'll append each character as you iterate over the string (which may not be what you want):

``````>>> my_list.extend("baz")
>>> my_list
['foo', 'bar', 1, 2, 3, 'b', 'a', 'z']
``````

## Operator Overload, `__add__` (`+`) and `__iadd__` (`+=`)

Both `+` and `+=` operators are defined for `list`. They are semantically similar to extend.

`my_list + another_list` creates a third list in memory, so you can return the result of it, but it requires that the second iterable be a list.

`my_list += another_list` modifies the list in-place (it is the in-place operator, and lists are mutable objects, as we've seen), so it does not create a new list. It also works like extend, in that the second iterable can be any kind of iterable.

Don't get confused - `my_list = my_list + another_list` is not equivalent to `+=` - it gives you a brand new list assigned to `my_list`.
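The asymmetry between `+` and `+=` is easy to demonstrate; a short sketch:

```python
my_list = ["foo", "bar"]

my_list += (1, 2)      # in-place: any iterable is accepted, like extend
print(my_list)         # ['foo', 'bar', 1, 2]

try:
    my_list + (3, 4)   # + insists that both operands be lists
except TypeError:
    print("TypeError: can only concatenate list to list")
```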

## Time Complexity

Append has (amortized) constant time complexity, O(1).

Extend has time complexity, O(k).

Iterating through multiple calls to `append` adds to the complexity, making the total equivalent to that of extend, and since extend's iteration is implemented in C, it will always be faster if you intend to append successive items from an iterable onto a list.

Regarding "amortized" - from the list object implementation source:

``````    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     */
``````

This means that we get the benefit of a larger-than-needed memory allocation up front, but we may pay for it on the next marginal reallocation with an even larger one. The total time for n appends is linear, O(n), so the amortized time per append is O(1).
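The over-allocation is visible from outside via `sys.getsizeof`; a sketch (the exact sizes vary by interpreter version):

```python
import sys

lst, sizes = [], []
for i in range(32):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# The reported size only jumps occasionally; the appends in between
# reuse the slack from the previous over-allocation.
print(len(sizes), len(set(sizes)))
```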

## Performance

You may wonder what is more performant, since append can be used to achieve the same outcome as extend. The following functions do the same thing:

``````def append(alist, iterable):
    for item in iterable:
        alist.append(item)

def extend(alist, iterable):
    alist.extend(iterable)
``````

So let's time them:

``````import timeit

>>> min(timeit.repeat(lambda: append([], "abcdefghijklmnopqrstuvwxyz")))
2.867846965789795
>>> min(timeit.repeat(lambda: extend([], "abcdefghijklmnopqrstuvwxyz")))
0.8060121536254883
``````

### Addressing a comment on timings

A commenter said:

Perfect answer, I just miss the timing of comparing adding only one element

Do the semantically correct thing. If you want to append all elements in an iterable, use `extend`. If you're just adding one element, use `append`.

Ok, so let"s create an experiment to see how this works out in time:

``````def append_one(a_list, element):
    a_list.append(element)

def extend_one(a_list, element):
    """Creating a new list is semantically the most direct
    way to create an iterable to give to extend."""
    a_list.extend([element])

import timeit
``````

And we see that going out of our way to create an iterable just to use extend is a (minor) waste of time:

``````>>> min(timeit.repeat(lambda: append_one([], 0)))
0.2082819009956438
>>> min(timeit.repeat(lambda: extend_one([], 0)))
0.2397019260097295
``````

We learn from this that there's nothing gained from using `extend` when we have only one element to append.

Also, these timings are not that important. I am just showing them to make the point that, in Python, doing the semantically correct thing is doing things the Right Way™.

It's conceivable that you might test timings on two comparable operations and get an ambiguous or inverse result. Just focus on doing the semantically correct thing.

## Conclusion

We see that `extend` is semantically clearer, and that it can run much faster than `append`, when you intend to append each element in an iterable to a list.

If you only have a single element (not in an iterable) to add to the list, use `append`.

(Note: this answer is based on a short blog post about `einsum` I wrote a while ago.)

## What does `einsum` do?

Imagine that we have two multi-dimensional arrays, `A` and `B`. Now let's suppose we want to...

• multiply `A` with `B` in a particular way to create new array of products; and then maybe
• sum this new array along particular axes; and then maybe
• transpose the axes of the new array in a particular order.

There's a good chance that `einsum` will help us do this faster and more memory-efficiently than combinations of the NumPy functions like `multiply`, `sum` and `transpose` will allow.

## How does `einsum` work?

Here's a simple (but not completely trivial) example. Take the following two arrays:

``````A = np.array([0, 1, 2])

B = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
``````

We will multiply `A` and `B` element-wise and then sum along the rows of the new array. In "normal" NumPy we'd write:

``````>>> (A[:, np.newaxis] * B).sum(axis=1)
array([ 0, 22, 76])
``````

So here, the indexing operation on `A` lines up the first axes of the two arrays so that the multiplication can be broadcast. The rows of the array of products are then summed to return the answer.

Now if we wanted to use `einsum` instead, we could write:

``````>>> np.einsum("i,ij->i", A, B)
array([ 0, 22, 76])
``````

The signature string `"i,ij->i"` is the key here and needs a little bit of explaining. You can think of it in two halves. On the left-hand side (left of the `->`) we've labelled the two input arrays. To the right of `->`, we've labelled the array we want to end up with.

Here is what happens next:

• `A` has one axis; we've labelled it `i`. And `B` has two axes; we've labelled axis 0 as `i` and axis 1 as `j`.

• By repeating the label `i` in both input arrays, we are telling `einsum` that these two axes should be multiplied together. In other words, we're multiplying array `A` with each column of array `B`, just like `A[:, np.newaxis] * B` does.

• Notice that `j` does not appear as a label in our desired output; we've just used `i` (we want to end up with a 1D array). By omitting the label, we're telling `einsum` to sum along this axis. In other words, we're summing the rows of the products, just like `.sum(axis=1)` does.

That's basically all you need to know to use `einsum`. It helps to play about a little; if we leave both labels in the output, `"i,ij->ij"`, we get back a 2D array of products (same as `A[:, np.newaxis] * B`). If we give no output labels, `"i,ij->"`, we get back a single number (same as doing `(A[:, np.newaxis] * B).sum()`).

The great thing about `einsum` however, is that it does not build a temporary array of products first; it just sums the products as it goes. This can lead to big savings in memory use.
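The three signature variants above can be checked against their "normal" NumPy equivalents; a sketch using the arrays from this example (assuming NumPy is installed):

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.arange(12).reshape(3, 4)

# sum over j (the rows of the products)
assert np.array_equal(np.einsum("i,ij->i", A, B),
                      (A[:, np.newaxis] * B).sum(axis=1))
# keep both labels: the raw 2D array of products
assert np.array_equal(np.einsum("i,ij->ij", A, B), A[:, np.newaxis] * B)
# no output labels: sum everything to a single number
assert np.einsum("i,ij->", A, B) == (A[:, np.newaxis] * B).sum()
print("einsum variants match")
```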

## A slightly bigger example

To explain the dot product, here are two new arrays:

``````A = np.array([[1, 1, 1],
              [2, 2, 2],
              [5, 5, 5]])

B = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 1, 1]])
``````

We will compute the dot product using `np.einsum("ij,jk->ik", A, B)`. Here's a picture showing the labelling of `A` and `B` and the output array that we get from the function:

You can see that label `j` is repeated - this means we're multiplying the rows of `A` with the columns of `B`. Furthermore, the label `j` is not included in the output - we're summing these products. Labels `i` and `k` are kept for the output, so we get back a 2D array.

It might be even clearer to compare this result with the array where the label `j` is not summed. Below, on the left you can see the 3D array that results from writing `np.einsum("ij,jk->ijk", A, B)` (i.e. we've kept label `j`):

Summing axis `j` gives the expected dot product, shown on the right.
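Without the pictures, the same relationship can be verified in code; a sketch (assuming NumPy is installed):

```python
import numpy as np

A = np.array([[1, 1, 1],
              [2, 2, 2],
              [5, 5, 5]])
B = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 1, 1]])

# summing over the repeated label j gives the ordinary dot product
assert np.array_equal(np.einsum("ij,jk->ik", A, B), A @ B)

# keeping j yields the 3D array of products; summing its j axis recovers A @ B
assert np.array_equal(np.einsum("ij,jk->ijk", A, B).sum(axis=1), A @ B)
print(A @ B)
```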

## Some exercises

To get more of a feel for `einsum`, it can be useful to implement familiar NumPy array operations using the subscript notation. Anything that involves combinations of multiplying and summing axes can be written using `einsum`.

Let A and B be two 1D arrays with the same length. For example, `A = np.arange(10)` and `B = np.arange(5, 15)`.

• The sum of `A` can be written:

``````np.einsum("i->", A)
``````
• Element-wise multiplication, `A * B`, can be written:

``````np.einsum("i,i->i", A, B)
``````
• The inner product or dot product, `np.inner(A, B)` or `np.dot(A, B)`, can be written:

``````np.einsum("i,i->", A, B) # or just use "i,i"
``````
• The outer product, `np.outer(A, B)`, can be written:

``````np.einsum("i,j->ij", A, B)
``````

For 2D arrays, `C` and `D`, provided that the axes are compatible lengths (both the same length or one of them has length 1), here are a few examples:

• The trace of `C` (sum of main diagonal), `np.trace(C)`, can be written:

``````np.einsum("ii", C)
``````
• Element-wise multiplication of `C` and the transpose of `D`, `C * D.T`, can be written:

``````np.einsum("ij,ji->ij", C, D)
``````
• Multiplying each element of `C` by the array `D` (to make a 4D array), `C[:, :, None, None] * D`, can be written:

``````np.einsum("ij,kl->ijkl", C, D)
``````
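Each of these identities can be checked directly; a sketch (assuming NumPy is installed; `C` and `D` are arbitrary example arrays):

```python
import numpy as np

A, B = np.arange(10), np.arange(5, 15)
assert np.einsum("i->", A) == A.sum()
assert np.array_equal(np.einsum("i,i->i", A, B), A * B)
assert np.einsum("i,i->", A, B) == np.dot(A, B)
assert np.array_equal(np.einsum("i,j->ij", A, B), np.outer(A, B))

C = np.arange(9).reshape(3, 3)
D = np.arange(1, 10).reshape(3, 3)
assert np.einsum("ii", C) == np.trace(C)
assert np.array_equal(np.einsum("ij,ji->ij", C, D), C * D.T)
assert np.array_equal(np.einsum("ij,kl->ijkl", C, D), C[:, :, None, None] * D)
print("all einsum identities hold")
```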

`root` is the old (pre-conda 4.4) name for the main environment; in conda 4.4 it was renamed to `base`.

## What 95% of people actually want

In most cases what you want to do when you say that you want to update Anaconda is to execute the command:

``````conda update --all
``````

(But this should be preceded by `conda update -n base conda` so you have the latest `conda` version installed.)

This will update all packages in the current environment to the latest version -- with the small print being that it may use an older version of some packages in order to satisfy dependency constraints (often this won't be necessary, and when it is necessary the package plan solver will do its best to minimize the impact).

This needs to be executed from the command line, and the best way to get there is from Anaconda Navigator, then the "Environments" tab, then click on the triangle beside the `base` environment, selecting "Open Terminal":

This operation will only update the one selected environment (in this case, the `base` environment). If you have other environments you'd like to update you can repeat the process above, but first click on the environment. When it is selected there is a triangular marker on the right (see image above, step 3). Or from the command line you can provide the environment name (`-n envname`) or path (`-p /path/to/env`), for example to update your `dspyr` environment from the screenshot above:

``````conda update -n dspyr --all
``````

## Update individual packages

If you are only interested in updating an individual package then simply click on the blue arrow or blue version number in Navigator, e.g. for `astroid` or `astropy` in the screenshot above, and this will tag those packages for an upgrade. When you are done you need to click the "Apply" button:

Or from the command line:

``````conda update astroid astropy
``````

## Updating just the packages in the standard Anaconda Distribution

If you don't care about package versions and just want "the latest set of all packages in the standard Anaconda Distribution, so long as they work together", then you should take a look at this gist.

## Why updating the Anaconda package is almost always a bad idea

In most cases updating the Anaconda package in the package list will have a surprising result: you may actually downgrade many packages (in fact, this is likely if it indicates the version as `custom`). The gist above provides details.

## Leverage conda environments

Your `base` environment is probably not a good place to try and manage an exact set of packages: it is going to be a dynamic working space with new packages installed and packages randomly updated. If you need an exact set of packages then create a conda environment to hold them. Thanks to the conda package cache and the way file linking is used, doing this is typically i) fast and ii) consumes very little additional disk space. E.g.

``````conda create -n myspecialenv -c bioconda -c conda-forge python=3.5 pandas beautifulsoup seaborn nltk
``````

The conda documentation has more details and examples.

## pip, PyPI, and setuptools?

None of this is going to help with updating packages that have been installed from PyPI via `pip` or any packages installed using `python setup.py install`. `conda list` will give you some hints about the pip-based Python packages you have in an environment, but it won't do anything special to update them.

## Commercial use of Anaconda or Anaconda Enterprise

It is pretty much exactly the same story, with the exception that you may not be able to update the `base` environment if it was installed by someone else (say to `/opt/anaconda/latest`). If you're not able to update the environments you are using, you should be able to clone and then update:

``````conda create -n myenv --clone base
conda update -n myenv --all
``````

First consider if you really need to iterate over rows in a DataFrame. See this answer for alternatives.

If you still need to iterate over rows, you can use the methods below. Note some important caveats which are not mentioned in any of the other answers.

• DataFrame.iterrows()

``````for index, row in df.iterrows():
    print(row["c1"], row["c2"])
``````
• DataFrame.itertuples()

``````for row in df.itertuples(index=True, name="Pandas"):
    print(row.c1, row.c2)
``````

`itertuples()` is supposed to be faster than `iterrows()`.

But be aware, according to the docs (pandas 0.24.2 at the moment):

• iterrows: `dtype` might not match from row to row

Because iterrows returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). To preserve dtypes while iterating over the rows, it is better to use itertuples() which returns namedtuples of the values and which is generally much faster than iterrows()

• iterrows: Do not modify rows

You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.

``````new_df = df.apply(lambda x: x * 2)
``````
• itertuples:

The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore. With a large number of columns (>255), regular tuples are returned.

See pandas docs on iteration for more details.
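The dtype caveat above is easy to see with a mixed-dtype frame; a sketch (assuming pandas is installed; the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"i": [1], "f": [1.5]})

# iterrows: each row becomes a Series, so the int column is upcast to float64
row = next(df.iterrows())[1]
print(type(row["i"]).__name__)

# itertuples: the per-column dtypes survive
tup = next(df.itertuples(index=False))
print(type(tup.i).__name__)
```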

Client and Resource are two different abstractions within the boto3 SDK for making AWS service requests. You would typically choose to use either the Client abstraction or the Resource abstraction. I've outlined the differences between Client and Resource below to help readers decide which to use.

Session is largely orthogonal to the concepts of Client and Resource (but is used by both).

Here's some more detailed information on what Client, Resource, and Session are all about.

Client:

• this is the original boto3 API abstraction
• provides low-level AWS service access
• all AWS service operations are supported by clients
• exposes botocore client to the developer
• typically maps 1:1 with the AWS service API
• snake-cased method names (e.g. ListBuckets API => list_buckets method)
• generated from AWS service description

``````import boto3

client = boto3.client("s3")
response = client.list_objects_v2(Bucket="mybucket")
for content in response["Contents"]:
    obj_dict = client.get_object(Bucket="mybucket", Key=content["Key"])
    print(content["Key"], obj_dict["LastModified"])
``````

Note: this client-level code is limited to listing at most 1000 objects. You would have to use a paginator, or implement your own loop calling `list_objects_v2()` repeatedly with a continuation marker, if there were more than 1000 objects.

Resource:

• this is the newer boto3 API abstraction
• provides high-level, object-oriented API
• does not provide 100% API coverage of AWS services
• uses identifiers and attributes
• has actions (operations on resources)
• exposes subresources and collections of AWS resources
• generated from resource description

Here's the equivalent example using resource-level access to an S3 bucket's objects (all):

``````import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket("mybucket")
for obj in bucket.objects.all():
    print(obj.key, obj.last_modified)
``````

Note: in this case you do not have to make a second API call to get the objects; they're available to you as a collection on the bucket. These collections of subresources are lazily loaded.

You can see that the `Resource` version of the code is much simpler, more compact, and has more capability (it does pagination for you). The `Client` version of the code would actually be more complicated than shown above if you wanted to include pagination.

Session:

• stores configuration information (primarily credentials and selected region)
• allows you to create service clients and resources
• boto3 creates a default session for you when needed

## Best way to check if a list is empty

For example, if passed the following:

``````a = []
``````

How do I check to see if a is empty?

Place the list in a boolean context (for example, with an `if` or `while` statement). It will test `False` if it is empty, and `True` otherwise. For example:

``````if not a:                           # do this!
    print("a is an empty list")
``````

## PEP 8

PEP 8, the official style guide for Python code in Python's standard library, asserts:

For sequences, (strings, lists, tuples), use the fact that empty sequences are false.

``````Yes: if not seq:
     if seq:

No:  if len(seq):
     if not len(seq):
``````

We should expect that standard library code should be as performant and correct as possible. But why is that the case, and why do we need this guidance?

## Explanation

I frequently see code like this from experienced programmers new to Python:

``````if len(a) == 0:                     # Don't do this!
    print("a is an empty list")
``````

And users of lazy languages may be tempted to do this:

``````if a == []:                         # Don't do this!
    print("a is an empty list")
``````

These are correct in their respective other languages, and the second is even semantically correct in Python.

But we consider it un-Pythonic because Python supports these semantics directly in the list object's interface via boolean coercion.

From the docs (and note specifically the inclusion of the empty list, `[]`):

By default, an object is considered true unless its class defines either a `__bool__()` method that returns `False` or a `__len__()` method that returns zero, when called with the object. Here are most of the built-in objects considered false:

• constants defined to be false: `None` and `False`.
• zero of any numeric type: `0`, `0.0`, `0j`, `Decimal(0)`, `Fraction(0, 1)`
• empty sequences and collections: `""`, `()`, `[]`, `{}`, `set()`, `range(0)`

And the datamodel documentation:

`object.__bool__(self)`

Called to implement truth value testing and the built-in operation `bool()`; should return `False` or `True`. When this method is not defined, `__len__()` is called, if it is defined, and the object is considered true if its result is nonzero. If a class defines neither `__len__()` nor `__bool__()`, all its instances are considered true.

and

`object.__len__(self)`

Called to implement the built-in function `len()`. Should return the length of the object, an integer >= 0. Also, an object that doesn't define a `__bool__()` method and whose `__len__()` method returns zero is considered to be false in a Boolean context.
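The same protocol extends to user-defined classes; a minimal sketch with a hypothetical `Box` container:

```python
class Box:
    """Hypothetical container: defining __len__ makes it participate
    in truth testing just like the built-in sequences."""
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

print(bool(Box([])))    # False -- __len__ returned 0
print(bool(Box([1])))   # True
```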

So instead of this:

``````if len(a) == 0:                     # Don't do this!
    print("a is an empty list")
``````

or this:

``````if a == []:                     # Don't do this!
    print("a is an empty list")
``````

Do this:

``````if not a:
    print("a is an empty list")
``````

## Doing what's Pythonic usually pays off in performance

Does it pay off? (Note that less time to perform an equivalent operation is better:)

``````>>> import timeit
>>> min(timeit.repeat(lambda: len([]) == 0, repeat=100))
0.13775854044661884
>>> min(timeit.repeat(lambda: [] == [], repeat=100))
0.0984637276455409
>>> min(timeit.repeat(lambda: not [], repeat=100))
0.07878462291455435
``````

For scale, here's the cost of calling the function and constructing and returning an empty list, which you might subtract from the costs of the emptiness checks used above:

``````>>> min(timeit.repeat(lambda: [], repeat=100))
0.07074015751817342
``````

We see that either checking for length with the builtin function `len` compared to `0` or checking against an empty list is much less performant than using the builtin syntax of the language as documented.

Why?

For the `len(a) == 0` check:

First Python has to check the globals to see if `len` is shadowed.

Then it must call the function, load `0`, and do the equality comparison in Python (instead of with C):

``````>>> import dis
>>> dis.dis(lambda: len([]) == 0)
  1           0 LOAD_GLOBAL              0 (len)
              2 BUILD_LIST               0
              4 CALL_FUNCTION            1
              6 LOAD_CONST               1 (0)
              8 COMPARE_OP               2 (==)
             10 RETURN_VALUE
``````

And for the `[] == []` check, it has to build an unnecessary list and then, again, do the comparison operation in Python's virtual machine (as opposed to C):

``````>>> dis.dis(lambda: [] == [])
  1           0 BUILD_LIST               0
              2 BUILD_LIST               0
              4 COMPARE_OP               2 (==)
              6 RETURN_VALUE
``````

The "Pythonic" way is a much simpler and faster check since the length of the list is cached in the object instance header:

``````>>> dis.dis(lambda: not [])
  1           0 BUILD_LIST               0
              2 UNARY_NOT
              4 RETURN_VALUE
``````

## Evidence from the C source and documentation

`PyVarObject`

This is an extension of `PyObject` that adds the `ob_size` field. This is only used for objects that have some notion of length. This type does not often appear in the Python/C API. It corresponds to the fields defined by the expansion of the `PyObject_VAR_HEAD` macro.

From the C source in Include/listobject.h:

``````typedef struct {
    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
``````
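Because `ob_size` lives in the object header, `len()` and boolean coercion read a cached count rather than scanning elements; a rough sketch of that O(1) behavior:

```python
# len() reads the cached ob_size field; it does not walk the list,
# so the call costs the same for a tiny list and a huge one.
small = [0] * 10
large = [0] * 1_000_000

assert len(small) == 10
assert len(large) == 1_000_000
assert bool(large) is True   # truthiness also just checks ob_size != 0
assert bool([]) is False
```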

I would point out that this holds for the non-empty case too, though it gets pretty ugly: with `l=[]`, `%timeit len(l) != 0` gives 90.6 ns ± 8.3 ns, `%timeit l != []` gives 55.6 ns ± 3.09 ns, and `%timeit not not l` gives 38.5 ns ± 0.372 ns. But nobody is going to enjoy `not not l` despite triple the speed; it looks ridiculous, even if the speed wins out.
I suppose the problem is testing with timeit, since just `if l:` is sufficient, but surprisingly `%timeit bool(l)` yields 101 ns ± 2.64 ns. Interestingly, there is no way to coerce to bool without this penalty. `%timeit l` is useless since no conversion would occur.

IPython magic, `%timeit`, is not entirely useless here:

``````In [1]: l = []

In [2]: %timeit l
20 ns ± 0.155 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

In [3]: %timeit not l
24.4 ns ± 1.58 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [4]: %timeit not not l
30.1 ns ± 2.16 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
``````

We can see there's a bit of linear cost for each additional `not` here. We want to see the costs, ceteris paribus, that is, all else equal, with everything else minimized as far as possible:

``````In [5]: %timeit if l: pass
22.6 ns ± 0.963 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [6]: %timeit if not l: pass
24.4 ns ± 0.796 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [7]: %timeit if not not l: pass
23.4 ns ± 0.793 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
``````

Now let's look at the case of a non-empty list:

``````In [8]: l = [1]

In [9]: %timeit if l: pass
23.7 ns ± 1.06 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [10]: %timeit if not l: pass
23.6 ns ± 1.64 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [11]: %timeit if not not l: pass
26.3 ns ± 1 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
``````

What we can see here is that it makes little difference whether you pass an actual `bool` to the condition check or the list itself; if anything, giving the list as-is is faster.

Python is written in C; it uses its logic at the C level. Anything you write in Python will be slower, likely by orders of magnitude, unless you're using the mechanisms built into Python directly.
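Putting the three styles side by side in a sketch: all of them agree on the result for any list, so the only reasons to prefer `not a` are idiom and the avoided work shown in the bytecode above. (The function names here are hypothetical, just for illustration.)

```python
def is_empty_verbose_len(a):
    return len(a) == 0   # calls len(), then compares in bytecode

def is_empty_verbose_eq(a):
    return a == []       # builds a throwaway list to compare against

def is_empty_pythonic(a):
    return not a         # direct boolean coercion via __len__

for sample in ([], [1, 2, 3]):
    results = {is_empty_verbose_len(sample),
               is_empty_verbose_eq(sample),
               is_empty_pythonic(sample)}
    assert len(results) == 1  # all three checks agree
```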

I know columns of `object` dtype make the data hard to convert with a `pandas` function. When I receive data like this, the first thing that comes to mind is to "flatten" or unnest the columns.

I am using `pandas` and `python` functions for this type of question. If you are worried about the speed of the above solutions, check user3483203's answer, since it uses `numpy`, and most of the time `numpy` is faster. I recommend `Cython` or `numba` if speed matters.

Method 0 [pandas >= 0.25]
Starting from pandas 0.25, if you only need to explode one column, you can use the `pandas.DataFrame.explode` function:

``````df.explode("B")

A  B
0  1  1
1  1  2
0  2  1
1  2  2
``````
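A side note not in the original answer: if I recall correctly, pandas 1.3 and later also let `explode` take a list of column names, which covers the multi-column case directly (the listed columns must have equal-length lists row by row):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2],
                   "B": [[1, 2], [3, 4]],
                   "C": [[1, 2], [3, 4]]})

# Explode B and C together; their per-row lengths must match.
out = df.explode(["B", "C"])
print(out)
```

This sketch assumes pandas >= 1.3; on older versions `explode` accepts only a single column.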

Given a dataframe with an empty `list` or a `NaN` in the column: an empty list will not cause an issue, but a `NaN` will need to be filled with a `list`.

``````df = pd.DataFrame({"A": [1, 2, 3, 4],"B": [[1, 2], [1, 2], [], np.nan]})
df.B = df.B.fillna({i: [] for i in df.index})  # replace NaN with []
df.explode("B")

A    B
0  1    1
0  1    2
1  2    1
1  2    2
2  3  NaN
3  4  NaN
``````

Method 1
`apply + pd.Series` (easy to understand, but not recommended in terms of performance).

``````df.set_index("A").B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:"B"})
Out[463]:
A  B
0  1  1
1  1  2
0  2  1
1  2  2
``````

Method 2
Using `repeat` with the `DataFrame` constructor to re-create your dataframe (good performance, but not good with multiple columns).

``````df=pd.DataFrame({"A":df.A.repeat(df.B.str.len()),"B":np.concatenate(df.B.values)})
df
Out[465]:
A  B
0  1  1
0  1  2
1  2  1
1  2  2
``````

Method 2.1
For example, besides A we have A.1, ..., A.n. If we still use the method above (Method 2), it is hard to re-create the columns one by one.

Solution: `join` or `merge` with the `index` after "unnesting" the single column.

``````s=pd.DataFrame({"B":np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))
s.join(df.drop("B",1),how="left")
Out[477]:
B  A
0  1  1
0  2  1
1  1  2
1  2  2
``````

If you need the column order exactly the same as before, add `reindex` at the end.

``````s.join(df.drop("B",1),how="left").reindex(columns=df.columns)
``````

Method 3
recreate the `list`

``````pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)
Out[488]:
A  B
0  1  1
1  1  2
2  2  1
3  2  2
``````
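The comprehension at the heart of Method 3 works on plain Python data too; a minimal sketch without pandas (the sample `rows` data is hypothetical, mimicking `df.values`):

```python
# Pairs of (scalar A, list B), like the rows of the example dataframe.
rows = [(1, [1, 2]), (2, [1, 2])]

# Pair each scalar with every element of its list: one output row per element.
flat = [[a, b] for a, lst in rows for b in lst]
print(flat)  # [[1, 1], [1, 2], [2, 1], [2, 2]]
```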

If there are more than two columns, use

``````s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])
s.merge(df,left_on=0,right_index=True)
Out[491]:
0  1  A       B
0  0  1  1  [1, 2]
1  0  2  1  [1, 2]
2  1  1  2  [1, 2]
3  1  2  2  [1, 2]
``````

Method 4
using `reindex` or `loc`

``````df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
Out[554]:
A  B
0  1  1
0  1  2
1  2  1
1  2  2

#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))
``````

Method 5
when the list only contains unique values:

``````df=pd.DataFrame({"A":[1,2],"B":[[1,2],[3,4]]})
from collections import ChainMap
d = dict(ChainMap(*map(dict.fromkeys, df["B"], df["A"])))
pd.DataFrame(list(d.items()),columns=df.columns[::-1])
Out[574]:
B  A
0  1  1
1  2  1
2  3  2
3  4  2
``````

Method 6
using `numpy` for high performance:

``````newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))
pd.DataFrame(data=newvalues[0],columns=df.columns)
A  B
0  1  1
1  1  2
2  2  1
3  2  2
``````

Method 7
using the base functions `cycle` and `chain` from `itertools`: a pure Python solution, just for fun

``````from itertools import cycle,chain
l=df.values.tolist()
l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)
A  B
0  1  1
1  1  2
2  2  1
3  2  2
``````

Generalizing to multiple columns

``````df=pd.DataFrame({"A":[1,2],"B":[[1,2],[3,4]],"C":[[1,2],[3,4]]})
df
Out[592]:
A       B       C
0  1  [1, 2]  [1, 2]
1  2  [3, 4]  [3, 4]
``````

Self-defined function:

``````def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(explode, 1), how="left")

unnesting(df,["B","C"])
Out[609]:
B  C  A
0  1  1  1
0  2  2  1
1  3  3  2
1  4  4  2
``````

### Column-wise Unnesting

All the methods above deal with vertical unnesting and exploding. If you do need to expand the list horizontally, check the `pd.DataFrame` constructor:

``````df.join(pd.DataFrame(df.B.tolist(),index=df.index).add_prefix("B_"))
Out[33]:
A       B       C  B_0  B_1
0  1  [1, 2]  [1, 2]    1    2
1  2  [3, 4]  [3, 4]    3    4
``````

Updated function

``````def unnesting(df, explode, axis):
    if axis == 1:
        idx = df.index.repeat(df[explode[0]].str.len())
        df1 = pd.concat([
            pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
        df1.index = idx
        return df1.join(df.drop(explode, 1), how="left")
    else:
        df1 = pd.concat([
            pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(explode, 1), how="left")
``````

Test Output

``````unnesting(df, ["B","C"], axis=0)
Out[36]:
B0  B1  C0  C1  A
0   1   2   1   2  1
1   3   4   3   4  2
``````

Update 2021-02-17 with original explode function

``````def unnesting(df, explode, axis):
    if axis == 1:
        df1 = pd.concat([df[x].explode() for x in explode], axis=1)
        return df1.join(df.drop(explode, 1), how="left")
    else:
        df1 = pd.concat([
            pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(explode, 1), how="left")
``````