# Python | List of tuples to dictionary conversion


Let's discuss some of the ways this can be accomplished.

Method #1: Using dictionary comprehension

This problem can be solved with a dictionary comprehension, which runs the classic naive loop as a single dictionary-building expression.

``````# Python3 demo code
# Convert a list of tuples to a dictionary
# using a dictionary comprehension

# initializing list
test_list = [('Nikhil', 21, 'JIIT'), ('Akash', 22, 'JIIT'), ('Akshat', 22, 'JIIT')]

# print original list
print("The original list: " + str(test_list))

# using a dictionary comprehension to convert
# the list of tuples to a dictionary
res = {sub[0]: sub[1:] for sub in test_list}

# print result
print("The dictionary after conversion: " + str(res))
``````

Output:

The original list: [('Nikhil', 21, 'JIIT'), ('Akash', 22, 'JIIT'), ('Akshat', 22, 'JIIT')]
The dictionary after conversion: {'Nikhil': (21, 'JIIT'), 'Akshat': (22, 'JIIT'), 'Akash': (22, 'JIIT')}

Method #2: Using `dict()` + a generator expression

This performs a task similar to the method described above; the only difference is in the way the dictionary is created. In the above method, a comprehension builds the dictionary directly; here, the `dict()` function consumes key/value pairs from a generator expression.

``````# Python3 demo code
# Convert a list of tuples to a dictionary
# using dict() + a generator expression

# initializing list
test_list = [('Nikhil', 21, 'JIIT'), ('Akash', 22, 'JIIT'), ('Akshat', 22, 'JIIT')]

# print original list
print("The original list: " + str(test_list))

# using dict() + a generator expression to convert
# the list of tuples to a dictionary
res = dict((idx[0], idx[1:]) for idx in test_list)

# print result
print("The dictionary after conversion: " + str(res))
``````

Output:

The original list: [('Nikhil', 21, 'JIIT'), ('Akash', 22, 'JIIT'), ('Akshat', 22, 'JIIT')]
The dictionary after conversion: {'Nikhil': (21, 'JIIT'), 'Akshat': (22, 'JIIT'), 'Akash': (22, 'JIIT')}

## How do I merge two dictionaries in a single expression (taking union of dictionaries)?

### Question by Carl Meyer

I have two Python dictionaries, and I want to write a single expression that returns these two dictionaries, merged (i.e. taking the union). The `update()` method would be what I need, if it returned its result instead of modifying a dictionary in-place.

``````>>> x = {"a": 1, "b": 2}
>>> y = {"b": 10, "c": 11}
>>> z = x.update(y)
>>> print(z)
None
>>> x
{'a': 1, 'b': 10, 'c': 11}
``````

How can I get that final merged dictionary in `z`, not `x`?

(To be extra-clear, the last-one-wins conflict-handling of `dict.update()` is what I'm looking for as well.)
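For the record, a couple of common ways to get the merged result as an expression, sketched below (the `{**x, **y}` form needs Python 3.5+; Python 3.9+ also offers `x | y`):

```python
x = {"a": 1, "b": 2}
y = {"b": 10, "c": 11}

# Unpack both dicts into a new literal; later entries win on conflict.
z = {**x, **y}
print(z)  # {'a': 1, 'b': 10, 'c': 11}

# A version that also works on older Pythons: copy, then update.
z2 = dict(x)
z2.update(y)
print(z2)  # {'a': 1, 'b': 10, 'c': 11}
```

Both leave `x` and `y` untouched, which is the whole point versus calling `x.update(y)` directly.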

## Iterating over dictionaries using "for" loops

I am a bit puzzled by the following code:

``````d = {"x": 1, "y": 2, "z": 3}
for key in d:
    print(key, "corresponds to", d[key])
``````

What I don't understand is the `key` portion. How does Python recognize that it needs only to read the key from the dictionary? Is `key` a special word in Python? Or is it simply a variable?
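For context, `key` is just an ordinary variable name; iterating over a dict yields its keys. To walk key/value pairs directly, `dict.items()` is the usual tool, sketched here:

```python
d = {"x": 1, "y": 2, "z": 3}

# Iterating a dict yields its keys; "key" is a plain loop variable.
for key in d:
    print(key, "corresponds to", d[key])

# items() yields (key, value) pairs, avoiding the extra lookup.
for key, value in d.items():
    print(key, "corresponds to", value)
```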

## How do I sort a dictionary by value?

### Question by FKCoder

I have a dictionary of values read from two fields in a database: a string field and a numeric field. The string field is unique, so that is the key of the dictionary.

I can sort on the keys, but how can I sort based on the values?

Note: I have read Stack Overflow question here How do I sort a list of dictionaries by a value of the dictionary? and probably could change my code to have a list of dictionaries, but since I do not really need a list of dictionaries I wanted to know if there is a simpler solution to sort either in ascending or descending order.
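A minimal sketch of the usual approach: `sorted()` with a key function over `dict.items()`. Since dicts preserve insertion order in Python 3.7+, the sorted pairs can be fed back into `dict()` (the data below is made up for illustration):

```python
data = {"banana": 3, "apple": 1, "cherry": 2}  # hypothetical field data

# Sort the (key, value) pairs by value, ascending.
ascending = dict(sorted(data.items(), key=lambda kv: kv[1]))
print(ascending)   # {'apple': 1, 'cherry': 2, 'banana': 3}

# reverse=True gives descending order.
descending = dict(sorted(data.items(), key=lambda kv: kv[1], reverse=True))
print(descending)  # {'banana': 3, 'cherry': 2, 'apple': 1}
```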

## How can I add new keys to a dictionary?

Is it possible to add a key to a Python dictionary after it has been created?

It doesn't seem to have an `.add()` method.
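There is no `.add()`; plain subscript assignment creates the key. A quick sketch:

```python
d = {"a": 1}

# Assigning to a new key creates it; assigning to an existing key overwrites.
d["b"] = 2
d["a"] = 10
print(d)  # {'a': 10, 'b': 2}
```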

## Check if a given key already exists in a dictionary

I wanted to test if a key exists in a dictionary before updating the value for the key. I wrote the following code:

``````if "key1" in dict.keys():
    print "blah"
else:
    print "boo"
``````

I think this is not the best way to accomplish this task. Is there a better way to test for a key in the dictionary?
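The idiomatic test is `in` applied to the dictionary itself, with no `.keys()` call needed. A sketch in Python 3 syntax:

```python
my_dict = {"key1": 1}

# Membership test goes straight against the dict.
if "key1" in my_dict:
    print("blah")
else:
    print("boo")

# get() collapses test-then-read into one call with a default.
print(my_dict.get("key2", "missing"))  # missing
```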

## How do I sort a list of dictionaries by a value of the dictionary?

I have a list of dictionaries and want each item to be sorted by a specific value.

Take into consideration the list:

``````[{"name":"Homer", "age":39}, {"name":"Bart", "age":10}]
``````

When sorted by `name`, it should become:

``````[{"name":"Bart", "age":10}, {"name":"Homer", "age":39}]
``````
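A common way to do this is `sorted()` with a key that extracts the field, e.g. via `operator.itemgetter`; a sketch:

```python
from operator import itemgetter

people = [{"name": "Homer", "age": 39}, {"name": "Bart", "age": 10}]

# Sort by the "name" value of each dict; itemgetter("name")
# is equivalent to lambda d: d["name"].
by_name = sorted(people, key=itemgetter("name"))
print(by_name)  # [{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
```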

## How can I remove a key from a Python dictionary?

When deleting a key from a dictionary, I use:

``````if "key" in my_dict:
    del my_dict["key"]
``````

Is there a one line way of doing this?
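One common one-liner is `dict.pop()` with a default, which removes the key if present and does not raise when it is absent; a sketch:

```python
my_dict = {"key": 1, "other": 2}

# pop() removes the key and returns its value; the second
# argument suppresses the KeyError when the key is missing.
my_dict.pop("key", None)
my_dict.pop("not-there", None)  # no exception
print(my_dict)  # {'other': 2}
```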

## Delete an element from a dictionary

Is there a way to delete an item from a dictionary in Python?

Additionally, how can I delete an item from a dictionary to return a copy (i.e., not modifying the original)?
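For the second part, one option is a dict comprehension that simply skips the unwanted key, leaving the original untouched; a sketch:

```python
original = {"a": 1, "b": 2, "c": 3}

# Build a new dict containing every entry except "b".
copy_without_b = {k: v for k, v in original.items() if k != "b"}
print(copy_without_b)  # {'a': 1, 'c': 3}
print(original)        # unchanged: {'a': 1, 'b': 2, 'c': 3}
```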

## How do I convert two lists into a dictionary?

### Question by Guido García

Imagine that you have the following list.

``````keys = ["name", "age", "food"]
values = ["Monty", 42, "spam"]
``````

What is the simplest way to produce the following dictionary?

``````a_dict = {"name": "Monty", "age": 42, "food": "spam"}
``````
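The usual answer: `zip()` pairs the two lists up and `dict()` consumes the pairs; a sketch:

```python
keys = ["name", "age", "food"]
values = ["Monty", 42, "spam"]

# zip() yields ("name", "Monty"), ("age", 42), ... which dict() accepts.
a_dict = dict(zip(keys, values))
print(a_dict)  # {'name': 'Monty', 'age': 42, 'food': 'spam'}
```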

## Create a dictionary with list comprehension

I like the Python list comprehension syntax.

Can it be used to create dictionaries too? For example, by iterating over pairs of keys and values:

``````mydict = {(k,v) for (k,v) in blah blah blah}  # doesn't work
``````
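Dict comprehensions (Python 2.7+) use a colon between key and value; the curly-brace form in the question actually builds a set. A sketch, with `pairs` standing in for the elided "blah blah blah":

```python
pairs = [("a", 1), ("b", 2)]  # hypothetical stand-in for the iterable

# {key: value for ...} builds a dict;
# {(k, v) for ...} would build a set of tuples instead.
mydict = {k: v for (k, v) in pairs}
print(mydict)  # {'a': 1, 'b': 2}
```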

# In Python, what is the purpose of `__slots__`, and in what cases should one avoid it?

## TLDR:

The special attribute `__slots__` allows you to explicitly state which instance attributes you expect your object instances to have, with the expected results:

1. faster attribute access.
2. space savings in memory.

The space savings come from

1. Storing value references in slots instead of `__dict__`.
2. Denying `__dict__` and `__weakref__` creation if parent classes deny them and you declare `__slots__`.

### Quick Caveats

A small caveat: you should declare a particular slot only one time in an inheritance tree. For example:

``````class Base:
    __slots__ = "foo", "bar"

class Right(Base):
    __slots__ = "baz",

class Wrong(Base):
    __slots__ = "foo", "bar", "baz"  # redundant foo and bar
``````

Python doesn't object when you get this wrong (it probably should), and problems might not otherwise manifest, but your objects will take up more space than they otherwise should. Python 3.8:

``````>>> from sys import getsizeof
>>> getsizeof(Right()), getsizeof(Wrong())
(56, 72)
``````

This is because the Base's slot descriptor has a slot separate from the Wrong's. This shouldn't usually come up, but it could:

``````>>> w = Wrong()
>>> w.foo = "foo"
>>> Base.foo.__get__(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: foo
>>> Wrong.foo.__get__(w)
'foo'
``````

The biggest caveat is for multiple inheritance - multiple "parent classes with nonempty slots" cannot be combined.

To accommodate this restriction, follow best practice: factor the parents' behavior into abstractions with empty slots (just like the abstract base classes in the standard library), and have their concrete classes, and your new concrete class, inherit from those abstractions.

See section on multiple inheritance below for an example.

### Requirements:

• To have attributes named in `__slots__` actually stored in slots instead of a `__dict__`, a class must inherit from `object` (automatic in Python 3, but this must be explicit in Python 2).

• To prevent the creation of a `__dict__`, you must inherit from `object`, all classes in the inheritance tree must declare `__slots__`, and none of them can have a `"__dict__"` entry.

There are a lot of details if you wish to keep reading.

## Why use `__slots__`: Faster attribute access.

The creator of Python, Guido van Rossum, states that he actually created `__slots__` for faster attribute access.

It is trivial to demonstrate measurably faster access:

``````import timeit

class Foo(object): __slots__ = "foo",

class Bar(object): pass

slotted = Foo()
not_slotted = Bar()

def get_set_delete_fn(obj):
    def get_set_delete():
        obj.foo = "foo"
        obj.foo
        del obj.foo
    return get_set_delete
``````

and

``````>>> min(timeit.repeat(get_set_delete_fn(slotted)))
0.2846834529991611
>>> min(timeit.repeat(get_set_delete_fn(not_slotted)))
0.3664822799983085
``````

The slotted access is almost 30% faster in Python 3.5 on Ubuntu.

``````>>> 0.3664822799983085 / 0.2846834529991611
1.2873325658284342
``````

In Python 2 on Windows I have measured it about 15% faster.

## Why use `__slots__`: Memory Savings

Another purpose of `__slots__` is to reduce the space in memory that each object instance takes up.

The space saved over using `__dict__` can be significant.

SQLAlchemy attributes a lot of memory savings to `__slots__`.

To verify this, using the Anaconda distribution of Python 2.7 on Ubuntu Linux, with `guppy.hpy` (aka heapy) and `sys.getsizeof`, the size of a class instance without `__slots__` declared, and with nothing else, is 64 bytes. That does not include the `__dict__`: thanks to lazy evaluation, the `__dict__` is apparently not called into existence until it is referenced (though classes without data are usually useless). When called into existence, the `__dict__` attribute takes a minimum of 280 additional bytes.

In contrast, a class instance with `__slots__` declared to be `()` (no data) is only 16 bytes, and 56 total bytes with one item in slots, 64 with two.

For 64 bit Python, I illustrate the memory consumption in bytes in Python 2.7 and 3.6, for `__slots__` and `__dict__` (no slots defined) for each point where the dict grows in 3.6 (except for 0, 1, and 2 attributes):

``````       Python 2.7             Python 3.6
attrs  __slots__  __dict__*   __slots__  __dict__* | *(no slots defined)
none   16         56 + 272†   16         56 + 112† | †if __dict__ referenced
one    48         56 + 272    48         56 + 112
two    56         56 + 272    56         56 + 112
six    88         56 + 1040   88         56 + 152
11     128        56 + 1040   128        56 + 240
22     216        56 + 3344   216        56 + 408
43     384        56 + 3344   384        56 + 752
``````

So, in spite of smaller dicts in Python 3, we see how nicely `__slots__` scales for instances to save us memory, and that is a major reason you would want to use `__slots__`.

Just for completeness of my notes, note that there is a one-time cost per slot in the class's namespace of 64 bytes in Python 2, and 72 bytes in Python 3, because slots use data descriptors like properties, called "members".

``````>>> Foo.foo
<member 'foo' of 'Foo' objects>
>>> type(Foo.foo)
<class 'member_descriptor'>
>>> getsizeof(Foo.foo)
72
``````

## Demonstration of `__slots__`:

To deny the creation of a `__dict__`, you must subclass `object`. Everything subclasses `object` in Python 3, but in Python 2 you had to be explicit:

``````class Base(object):
    __slots__ = ()
``````

now:

``````>>> b = Base()
>>> b.a = "a"
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    b.a = "a"
AttributeError: 'Base' object has no attribute 'a'
``````

Or subclass another class that defines `__slots__`:

``````class Child(Base):
    __slots__ = ("a",)
``````

and now:

``````c = Child()
c.a = "a"
``````

but:

``````>>> c.b = "b"
Traceback (most recent call last):
  File "<pyshell#42>", line 1, in <module>
    c.b = "b"
AttributeError: 'Child' object has no attribute 'b'
``````

To allow `__dict__` creation while subclassing slotted objects, just add `"__dict__"` to the `__slots__` (note that slots are ordered, and you shouldn't repeat slots that are already in parent classes):

``````class SlottedWithDict(Child):
    __slots__ = ("__dict__", "b")

swd = SlottedWithDict()
swd.a = "a"
swd.b = "b"
swd.c = "c"
``````

and

``````>>> swd.__dict__
{'c': 'c'}
``````

Or you don't even need to declare `__slots__` in your subclass; you will still use slots from the parents, but not restrict the creation of a `__dict__`:

``````class NoSlots(Child): pass
ns = NoSlots()
ns.a = "a"
ns.b = "b"
``````

And:

``````>>> ns.__dict__
{'b': 'b'}
``````

However, `__slots__` may cause problems for multiple inheritance:

``````class BaseA(object):
    __slots__ = ("a",)

class BaseB(object):
    __slots__ = ("b",)
``````

Because creating a child class from parents with both non-empty slots fails:

``````>>> class Child(BaseA, BaseB): __slots__ = ()
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    class Child(BaseA, BaseB): __slots__ = ()
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict
``````

If you run into this problem, you could just remove `__slots__` from the parents; or, if you have control of the parents, give them empty slots, or refactor to abstractions:

``````from abc import ABC

class AbstractA(ABC):
    __slots__ = ()

class BaseA(AbstractA):
    __slots__ = ("a",)

class AbstractB(ABC):
    __slots__ = ()

class BaseB(AbstractB):
    __slots__ = ("b",)

class Child(AbstractA, AbstractB):
    __slots__ = ("a", "b")

c = Child()  # no problem!
``````

### Add `"__dict__"` to `__slots__` to get dynamic assignment:

``````class Foo(object):
    __slots__ = "bar", "baz", "__dict__"
``````

and now:

``````>>> foo = Foo()
>>> foo.boink = "boink"
``````

So with `"__dict__"` in slots we lose some of the size benefits with the upside of having dynamic assignment and still having slots for the names we do expect.

When you inherit from an object that isn't slotted, you get the same sort of semantics when you use `__slots__`: names that are in `__slots__` point to slotted values, while any other values are put in the instance's `__dict__`.

Avoiding `__slots__` because you want to be able to add attributes on the fly is actually not a good reason - just add `"__dict__"` to your `__slots__` if this is required.

You can similarly add `__weakref__` to `__slots__` explicitly if you need that feature.
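A minimal sketch of that (the class names here are made up for illustration): a class whose `__slots__` omits `"__weakref__"` denies weak references, and adding the entry re-enables them.

```python
import weakref

class NoWeak:
    # Slots without "__weakref__": weak references are denied.
    __slots__ = ("value",)

class WithWeak:
    # Adding "__weakref__" to __slots__ re-enables weak references.
    __slots__ = ("value", "__weakref__")

try:
    weakref.ref(NoWeak())   # raises TypeError
except TypeError as e:
    print(e)

w = WithWeak()
r = weakref.ref(w)
print(r() is w)  # True
```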

### Set to empty tuple when subclassing a namedtuple:

The namedtuple builtin makes immutable instances that are very lightweight (essentially, the size of tuples), but to get the benefits, you need to declare empty `__slots__` yourself if you subclass them:

``````from collections import namedtuple

class MyNT(namedtuple("MyNT", "bar baz")):
    """MyNT is an immutable and lightweight object"""
    __slots__ = ()
``````

usage:

``````>>> nt = MyNT("bar", "baz")
>>> nt.bar
'bar'
>>> nt.baz
'baz'
``````

And trying to assign an unexpected attribute raises an `AttributeError` because we have prevented the creation of `__dict__`:

``````>>> nt.quux = "quux"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyNT' object has no attribute 'quux'
``````

You can allow `__dict__` creation by leaving off `__slots__ = ()`, but you can't use non-empty `__slots__` with subtypes of tuple.

## Biggest Caveat: Multiple inheritance

Even when non-empty slots are the same for multiple parents, they cannot be used together:

``````class Foo(object):
    __slots__ = "foo", "bar"
class Bar(object):
    __slots__ = "foo", "bar"  # alas, would work if empty, i.e. ()

>>> class Baz(Foo, Bar): pass
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict
``````

Using an empty `__slots__` in the parent seems to provide the most flexibility, allowing the child to choose to prevent or allow (by adding `"__dict__"` to get dynamic assignment, see section above) the creation of a `__dict__`:

``````class Foo(object): __slots__ = ()
class Bar(object): __slots__ = ()
class Baz(Foo, Bar): __slots__ = ("foo", "bar")
b = Baz()
b.foo, b.bar = "foo", "bar"
``````

You don't have to have slots, so if you add them and remove them later, it shouldn't cause any problems.

Going out on a limb here: if you're composing mixins or using abstract base classes, which aren't intended to be instantiated, an empty `__slots__` in those parents seems to be the best way to go in terms of flexibility for subclassers.

To demonstrate, first let's create a class with code we'd like to use under multiple inheritance:

``````class AbstractBase:
    __slots__ = ()
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return f"{type(self).__name__}({repr(self.a)}, {repr(self.b)})"
``````

We could use the above directly by inheriting and declaring the expected slots:

``````class Foo(AbstractBase):
    __slots__ = "a", "b"
``````

But we don't care about that; that's trivial single inheritance. We need another class we might also inherit from, maybe with a noisy attribute:

``````class AbstractBaseC:
    __slots__ = ()
    @property
    def c(self):
        print("getting c!")
        return self._c
    @c.setter
    def c(self, arg):
        print("setting c!")
        self._c = arg
``````

Now if both bases had nonempty slots, we couldn't do the below. (In fact, if we wanted, we could have given `AbstractBase` nonempty slots `a` and `b` and left them out of the below declaration; leaving them in would be wrong):

``````class Concretion(AbstractBase, AbstractBaseC):
    __slots__ = "a b _c".split()
``````

And now we have functionality from both via multiple inheritance, and can still deny `__dict__` and `__weakref__` instantiation:

``````>>> c = Concretion("a", "b")
>>> c.c = c
setting c!
>>> c.c
getting c!
Concretion('a', 'b')
>>> c.d = "d"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Concretion' object has no attribute 'd'
``````

## Other cases to avoid slots:

• Avoid them when you want to perform `__class__` assignment with another class that doesn't have them (and you can't add them) unless the slot layouts are identical. (I am very interested in learning who is doing this and why.)
• Avoid them if you want to subclass variable length builtins like long, tuple, or str, and you want to add attributes to them.
• Avoid them if you insist on providing default values via class attributes for instance variables.

You may be able to tease out further caveats from the rest of the `__slots__` documentation (the 3.7 dev docs are the most current), which I have made significant recent contributions to.

## Critiques of other answers

The current top answers cite outdated information and are quite hand-wavy and miss the mark in some important ways.

### Do not "only use `__slots__` when instantiating lots of objects"

I quote:

"You would want to use `__slots__` if you are going to instantiate a lot (hundreds, thousands) of objects of the same class."

Abstract Base Classes, for example, from the `collections` module, are not instantiated, yet `__slots__` are declared for them.

Why?

If a user wishes to deny `__dict__` or `__weakref__` creation, those things must not be available in the parent classes.

`__slots__` contributes to reusability when creating interfaces or mixins.

It is true that many Python users aren't writing for reusability, but when you are, having the option to deny unnecessary space usage is valuable.

### `__slots__` doesn"t break pickling

When pickling a slotted object, you may find it complains with a misleading `TypeError`:

``````>>> pickle.loads(pickle.dumps(f))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
``````

This is actually incorrect. This message comes from the oldest protocol, which is the default. You can select the latest protocol with the `-1` argument. In Python 2.7 this would be `2` (which was introduced in 2.3), and in 3.6 it is `4`.

``````>>> pickle.loads(pickle.dumps(f, -1))
<__main__.Foo object at 0x1129C770>
``````

in Python 2.7:

``````>>> pickle.loads(pickle.dumps(f, 2))
<__main__.Foo object at 0x1129C770>
``````

in Python 3.6

``````>>> pickle.loads(pickle.dumps(f, 4))
<__main__.Foo object at 0x1129C770>
``````

So I would keep this in mind, as it is a solved problem.

## Critique of the (until Oct 2, 2016) accepted answer

The first paragraph is half short explanation, half predictive. Here's the only part that actually answers the question:

The proper use of `__slots__` is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at any time, there is a static structure which does not allow additions after creation. This saves the overhead of one dict for every object that uses slots.

The second half is wishful thinking, and off the mark:

While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.

Python actually does something similar to this, only creating the `__dict__` when it is accessed, but creating lots of objects with no data is fairly ridiculous.

The second paragraph oversimplifies and misses actual reasons to avoid `__slots__`. The below is not a real reason to avoid slots (for actual reasons, see the rest of my answer above):

They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies.

It then goes on to discuss other ways of accomplishing that perverse goal with Python, not discussing anything to do with `__slots__`.

The third paragraph is more wishful thinking. Together it is mostly off-the-mark content that the answerer didn't even author, and it contributes ammunition to critics of the site.

# Memory usage evidence

Create some normal objects and slotted objects:

``````>>> class Foo(object): pass
>>> class Bar(object): __slots__ = ()
``````

Instantiate a million of them:

``````>>> foos = [Foo() for f in xrange(1000000)]
>>> bars = [Bar() for b in xrange(1000000)]
``````

Inspect with `guppy.hpy().heap()`:

``````>>> guppy.hpy().heap()
Partition of a set of 2028259 objects. Total size = 99763360 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0 1000000  49 64000000  64  64000000  64 __main__.Foo
1     169   0 16281480  16  80281480  80 list
2 1000000  49 16000000  16  96281480  97 __main__.Bar
3   12284   1   987472   1  97268952  97 str
...
``````

Access the regular objects and their `__dict__` and inspect again:

``````>>> for f in foos:
...     f.__dict__
>>> guppy.hpy().heap()
Partition of a set of 3028258 objects. Total size = 379763480 bytes.
Index  Count   %      Size    % Cumulative  % Kind (class / dict of class)
0 1000000  33 280000000  74 280000000  74 dict of __main__.Foo
1 1000000  33  64000000  17 344000000  91 __main__.Foo
2     169   0  16281480   4 360281480  95 list
3 1000000  33  16000000   4 376281480  99 __main__.Bar
4   12284   0    987472   0 377268952  99 str
...
``````

This is consistent with the history of Python, from Unifying types and classes in Python 2.2

If you subclass a built-in type, extra space is automatically added to the instances to accommodate `__dict__` and `__weakrefs__`. (The `__dict__` is not initialized until you use it, though, so you shouldn't worry about the space occupied by an empty dictionary for each instance you create.) If you don't need this extra space, you can add the phrase "`__slots__ = []`" to your class.

# `os.listdir()` - list in the current directory

With `listdir` in the `os` module, you get the files and the folders in the current directory.

``````import os
arr = os.listdir()
print(arr)

>>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
``````

## Looking in a directory

``````arr = os.listdir(r"c:\files")
``````

# `glob` from glob

With `glob` you can specify a type of file to list, like this:

``````import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)
``````

## `glob` in a list comprehension

``````mylist = [f for f in glob.glob("*.txt")]
``````

## get the full path of only files in the current directory

``````import os
from os import listdir
from os.path import isfile, join

cwd = os.getcwd()
onlyfiles = [os.path.join(cwd, f) for f in os.listdir(cwd) if
             os.path.isfile(os.path.join(cwd, f))]
print(onlyfiles)

['G:\\getfilesname\\getfilesname.py', 'G:\\getfilesname\\example.txt']
``````

## Getting the full path name with `os.path.abspath`

You get the full path in return

`````` import os
files_path = [os.path.abspath(x) for x in os.listdir()]
print(files_path)

['F:\\documentiapplications.txt', 'F:\\documenticollections.txt']
``````

## Walk: going through sub directories

os.walk returns the root, the directories list, and the files list; that is why I unpacked them into r, d, f in the for loop. It then looks for other files and directories in the subfolders of the root, and so on, until there are no subfolders.

``````import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            print(os.path.join(r, file))
``````

### `os.listdir()`: get files in the current directory (Python 2)

In Python 2, if you want the list of the files in the current directory, you have to pass `"."` or `os.getcwd()` as the argument to `os.listdir`.

``````import os
arr = os.listdir(".")
print(arr)

>>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
``````

### To go up in the directory tree

``````# Method 1
x = os.listdir("..")

# Method 2
x = os.listdir("/")
``````

### Get files: `os.listdir()` in a particular directory (Python 2 and 3)

``````import os
arr = os.listdir(r"F:\python")
print(arr)

>>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
``````

### Get files of a particular subdirectory with `os.listdir()`

``````import os

x = os.listdir("./content")
``````

### `os.walk(".")` - current directory

``````import os
arr = next(os.walk("."))[2]
print(arr)

>>> ['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']
``````

### `next(os.walk("."))` and `os.path.join("dir", "file")`

``````import os
arr = []
# next(os.walk(...)) yields one (root, dirs, files) tuple
root, dirs, files = next(os.walk(r"F:\_python"))
for file in files:
    arr.append(os.path.join(root, file))

for f in arr:
    print(f)

>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt
``````

### `next(os.walk(...))` - get the full path - list comprehension

``````[os.path.join(r, file) for r, d, f in [next(os.walk(r"F:\_python"))] for file in f]

>>> ['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']
``````

### `os.walk` - get full path - all files in sub dirs

``````x = [os.path.join(r, file) for r, d, f in os.walk(r"F:\_python") for file in f]
print(x)

>>> ['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']
``````

### `os.listdir()` - get only txt files

``````arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
print(arr_txt)

>>> ['work.txt', '3ebooks.txt']
``````

## Using `glob` to get the full path of the files

If I should need the absolute path of the files:

``````from path import path  # third-party "path.py" package
from glob import glob
x = [path(f).abspath() for f in glob(r"F:\*.txt")]
for f in x:
    print(f)

>>> F:\acquistionline.txt
>>> F:\acquisti_2018.txt
>>> F:\bootstrap_jquery_ecc.txt
``````

## Using `os.path.isfile` to avoid directories in the list

``````import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ['a simple game.py', 'data.txt', 'decorator.py']
``````

## Using `pathlib` from Python 3.4

``````import pathlib

flist = []
for p in pathlib.Path(".").iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speak_gui2.py
>>> thumb.PNG
``````

With a list comprehension:

``````flist = [p for p in pathlib.Path(".").iterdir() if p.is_file()]
``````

Alternatively, use `pathlib.Path()` instead of `pathlib.Path(".")`

## Use the `glob` method of `pathlib.Path()`

``````import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py
``````

## Get all and only files with os.walk

``````import os
x = [i[2] for i in os.walk(".")]
y = []
for t in x:
    for f in t:
        y.append(f)
print(y)

>>> ['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']
``````

## Get only files with next and walk in a directory

``````import os
x = next(os.walk("F://python"))[2]
print(x)

>>> ['calculator.bat', 'calculator.py']
``````

## Get only directories with next and walk in a directory

``````import os
next(os.walk("F://python"))[1]  # for the current dir use (".")

>>> ['python3', 'others']
``````

## Get all the subdir names with `walk`

``````for r, d, f in os.walk(r"F:\_python"):
    for dirs in d:
        print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints
``````

## `os.scandir()` from Python 3.5 and greater

``````import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ['calculator.bat', 'calculator.py']

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG
``````

# Examples:

## Ex. 1: How many files are there in the subdirectories?

In this example, we count the number of files included in a directory and all of its subdirectories.

``````import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + " files"

print(count(r"F:\python"))

>>> F:\python : 12057 files
``````

## Ex.2: How to copy all files from a directory to another?

A script to tidy up your computer by finding all files of a given type (default: pptx) and copying them into a new folder.

``````import os
import shutil

destination = r"F:\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype="pptx", counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = os.path.join(pack[0], f)
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print("-" * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")

for dir in os.listdir():
    # searches for folders that start with `_`
    if dir[0] == "_":
        # copyfile(dir, filetype="pdf")
        copyfile(dir, filetype="txt")

>>> _compiti18\Compito Contabilità 1\conti.txt
>>> _compiti18\Compito Contabilità 1\modula4.txt
>>> _compiti18\Compito Contabilità 1\moduloa4.txt
>>> ------------------------------
>>> ==> Found in: `_compiti18` : 3 files
``````

## Ex. 3: How to get all the files in a txt file

In case you want to create a txt file with all the file names:

``````import os

mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)
``````

## Example: txt with all the files of an hard drive

``````"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding="utf-8") as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\\" + file)
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")
``````

## All the files of C: in one text file

This is a shorter version of the previous code. Change the starting folder if you need to begin the search from another position. On my computer this code generated a text file of about 50 MB, with a little under 500,000 lines listing files with their complete paths.

``````import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            filewrite.write(f"{os.path.join(r, file)}\n")
``````

## How to write a file with all paths in a folder of a type

With this function you can create a txt file named after the type of file you are looking for (e.g. `pngfile.txt`) that lists the full path of every file of that type. It can be useful sometimes, I think.

``````import os

def searchfiles(extension=".ttf", folder="H:\\"):
    "Create a txt file with all the files of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(f"{os.path.join(r, file)}\n")

# looking for png files in the hard disk H:
searchfiles(".png", "H:\\")

>>> H:\4bs_18\Dolphins5.png
>>> H:\4bs_18\Dolphins6.png
>>> H:\4bs_18\Dolphins7.png
>>> H:\5_18\marketing html\assets\images\logo2.png
>>> H:\7z\001.png
>>> H:\7z\002.png
``````

## (New) Find all files and open them with tkinter GUI

I just wanted to add, in 2019, a little app that searches for all files in a directory and lets you open them by double-clicking on the file name in the list.

``````import tkinter as tk
import os

def searchfiles(extension=".txt", folder="H:\\"):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, r + "\\" + file)

def open_file():
    os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda: searchfiles(".png", "H:\\"))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()
``````

TL;DR: If you are using Python 3.10 or later, it just works. As of today (2019), in 3.7+ you must turn this feature on using a future statement (`from __future__ import annotations`). In Python 3.6 or below, use a string.

I guess you got this exception:

``````NameError: name 'Position' is not defined
``````

This is because `Position` must be defined before you can use it in an annotation unless you are using Python 3.10 or later.

## Python 3.7+: `from __future__ import annotations`

Python 3.7 introduces PEP 563: postponed evaluation of annotations. A module that uses the future statement `from __future__ import annotations` will store annotations as strings automatically:

``````from __future__ import annotations

class Position:
    def __add__(self, other: Position) -> Position:
        ...
``````

This is scheduled to become the default in Python 3.10. Since Python is still a dynamically typed language and no type checking is done at runtime, typing annotations should have no performance impact, right? Wrong! Before Python 3.7, the typing module used to be one of the slowest Python modules in core, so if you `import typing` you may see up to a 7x increase in performance when you upgrade to 3.7.

## Python <3.7: use a string

According to PEP 484, you should use a string instead of the class itself:

``````class Position:
    ...
    def __add__(self, other: "Position") -> "Position":
        ...
``````

If you use the Django framework this may be familiar, as Django models also use strings for forward references (foreign key definitions where the foreign model is `self` or is not declared yet). This should work with PyCharm and other tools.

## Sources

The relevant parts of PEP 484 and PEP 563, to spare you the trip:

# Forward references

When a type hint contains names that have not been defined yet, that definition may be expressed as a string literal, to be resolved later.

A situation where this occurs commonly is the definition of a container class, where the class being defined occurs in the signature of some of the methods. For example, the following code (the start of a simple binary tree implementation) does not work:

``````class Tree:
    def __init__(self, left: Tree, right: Tree):
        self.left = left
        self.right = right
``````

To address this, we write:

``````class Tree:
    def __init__(self, left: "Tree", right: "Tree"):
        self.left = left
        self.right = right
``````

The string literal should contain a valid Python expression (i.e., compile(lit, "", "eval") should be a valid code object) and it should evaluate without errors once the module has been fully loaded. The local and global namespace in which it is evaluated should be the same namespaces in which default arguments to the same function would be evaluated.

and PEP 563:

# Implementation

In Python 3.10, function and variable annotations will no longer be evaluated at definition time. Instead, a string form will be preserved in the respective `__annotations__` dictionary. Static type checkers will see no difference in behavior, whereas tools using annotations at runtime will have to perform postponed evaluation.

...

## Enabling the future behavior in Python 3.7

The functionality described above can be enabled starting from Python 3.7 using the following special import:

``````from __future__ import annotations
``````

## Things that you may be tempted to do instead

### A. Define a dummy `Position`

Before the class definition, place a dummy definition:

``````class Position(object):
    pass

class Position(object):
    ...
``````

This will get rid of the `NameError` and may even look OK:

``````>>> Position.__add__.__annotations__
{"other": __main__.Position, "return": __main__.Position}
``````

But is it?

``````>>> for k, v in Position.__add__.__annotations__.items():
...     print(k, "is Position:", v is Position)
return is Position: False
other is Position: False
``````

### B. Monkey-patch in order to add the annotations:

You may want to try some Python meta programming magic and write a decorator to monkey-patch the class definition in order to add annotations:

``````class Position:
    ...
    def __add__(self, other):
        return self.__class__(self.x + other.x, self.y + other.y)
``````

The decorator should be responsible for the equivalent of this:

``````Position.__add__.__annotations__["return"] = Position
``````

At least it seems right:

``````>>> for k, v in Position.__add__.__annotations__.items():
...     print(k, "is Position:", v is Position)
return is Position: True
other is Position: True
``````

Probably too much trouble.

As you are on Python 3, use `dict.items()` instead of `dict.iteritems()`.

`iteritems()` was removed in Python 3, so you can't use this method anymore.

Take a look at the Python 3.0 Wiki, Built-in Changes section, where it is stated:

Removed `dict.iteritems()`, `dict.iterkeys()`, and `dict.itervalues()`.

Instead: use `dict.items()`, `dict.keys()`, and `dict.values()` respectively.
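For example, code that used `iteritems()` in Python 2 ports directly to `items()` in Python 3, which returns a lightweight live view rather than a copied list:

```python
d = {"a": 1, "b": 2}

# Python 2: for k, v in d.iteritems(): ...
# Python 3: items() returns a view, not a list copy
for k, v in d.items():
    print(k, v)

# The view reflects later changes to the dictionary
view = d.items()
d["c"] = 3
print(("c", 3) in view)  # True
```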

My quick & dirty JSON dump that eats dates and everything:

``````json.dumps(my_dictionary, indent=4, sort_keys=True, default=str)
``````

`default` is a function applied to objects that aren't serializable.
In this case it's `str`, so it just converts everything it doesn't know to strings. Which is great for serialization but not so great when deserializing (hence the "quick & dirty"), as anything might have been string-ified without warning, e.g. a function or a numpy array.

``````a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
unique, counts = numpy.unique(a, return_counts=True)
dict(zip(unique, counts))

# {0: 7, 1: 4, 2: 1, 3: 2, 4: 1}
``````

Non-numpy way:

``````import collections, numpy
a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
collections.Counter(a)

# Counter({0: 7, 1: 4, 3: 2, 2: 1, 4: 1})
``````

Because `[]` and `{}` are literal syntax, Python can emit dedicated bytecode just to build the list or dictionary objects:

``````>>> import dis
>>> dis.dis(compile("[]", "", "eval"))
  1           0 BUILD_LIST               0
              3 RETURN_VALUE
>>> dis.dis(compile("{}", "", "eval"))
  1           0 BUILD_MAP                0
              3 RETURN_VALUE
``````

`list()` and `dict()` are separate objects. Their names need to be resolved, the stack has to be involved to push the arguments, the frame has to be stored to retrieve later, and a call has to be made. That all takes more time.

For the empty case, that means you have at the very least a `LOAD_NAME` (which has to search through the global namespace as well as the `builtins` module) followed by a `CALL_FUNCTION`, which has to preserve the current frame:

``````>>> dis.dis(compile("list()", "", "eval"))
  1           0 LOAD_NAME                0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE
>>> dis.dis(compile("dict()", "", "eval"))
  1           0 LOAD_NAME                0 (dict)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE
``````

You can time the name lookup separately with `timeit`:

``````>>> import timeit
>>> timeit.timeit("list", number=10**7)
0.30749011039733887
>>> timeit.timeit("dict", number=10**7)
0.4215109348297119
``````

The time discrepancy there is probably a dictionary hash collision. Subtract those times from the times for calling those objects, and compare the result against the times for using literals:

``````>>> timeit.timeit("[]", number=10**7)
0.30478692054748535
>>> timeit.timeit("{}", number=10**7)
0.31482696533203125
>>> timeit.timeit("list()", number=10**7)
0.9991960525512695
>>> timeit.timeit("dict()", number=10**7)
1.0200958251953125
``````

So having to call the object takes an additional `1.00 - 0.31 - 0.30 == 0.39` seconds per 10 million calls.

You can avoid the global lookup cost by aliasing the global names as locals (using a `timeit` setup, everything you bind to a name is a local):

``````>>> timeit.timeit("_list", "_list = list", number=10**7)
0.1866450309753418
>>> timeit.timeit("_dict", "_dict = dict", number=10**7)
0.19016098976135254
>>> timeit.timeit("_list()", "_list = list", number=10**7)
0.841480016708374
>>> timeit.timeit("_dict()", "_dict = dict", number=10**7)
0.7233691215515137
``````

but you never can overcome that `CALL_FUNCTION` cost.

There are many ways to convert an instance to a dictionary, with varying degrees of corner case handling and closeness to the desired result.

## 1. `instance.__dict__`

``````instance.__dict__
``````

which returns

``````{"_foreign_key_cache": <OtherModel: OtherModel object>,
"_state": <django.db.models.base.ModelState at 0x7ff0993f6908>,
"auto_now_add": datetime.datetime(2018, 12, 20, 21, 34, 29, 494827, tzinfo=<UTC>),
"foreign_key_id": 2,
"id": 1,
"normal_value": 1,
``````

This is by far the simplest, but is missing `many_to_many`, `foreign_key` is misnamed, and it has two unwanted extra things in it.

## 2. `model_to_dict`

``````from django.forms.models import model_to_dict
model_to_dict(instance)
``````

which returns

``````{"foreign_key": 2,
"id": 1,
"many_to_many": [<OtherModel: OtherModel object>],
"normal_value": 1}
``````

This is the only one with `many_to_many`, but is missing the uneditable fields.

## 3. `model_to_dict(..., fields=...)`

``````from django.forms.models import model_to_dict
model_to_dict(instance, fields=[field.name for field in instance._meta.fields])
``````

which returns

``````{"foreign_key": 2, "id": 1, "normal_value": 1}
``````

This is strictly worse than the standard `model_to_dict` invocation.

## 4. `query_set.values()`

``````SomeModel.objects.filter(id=instance.id).values()[0]
``````

which returns

``````{"auto_now_add": datetime.datetime(2018, 12, 20, 21, 34, 29, 494827, tzinfo=<UTC>),
"foreign_key_id": 2,
"id": 1,
"normal_value": 1,
``````

This is the same output as `instance.__dict__` but without the extra fields. `foreign_key_id` is still wrong and `many_to_many` is still missing.

## 5. Custom Function

The code for Django's `model_to_dict` had most of the answer. It explicitly removed non-editable fields, so removing that check and getting the ids of the foreign keys for many-to-many fields results in the following code, which behaves as desired:

``````from itertools import chain

def to_dict(instance):
    opts = instance._meta
    data = {}
    for f in chain(opts.concrete_fields, opts.private_fields):
        data[f.name] = f.value_from_object(instance)
    for f in opts.many_to_many:
        data[f.name] = [i.id for i in f.value_from_object(instance)]
    return data
``````

While this is the most complicated option, calling `to_dict(instance)` gives us exactly the desired result:

``````{"auto_now_add": datetime.datetime(2018, 12, 20, 21, 34, 29, 494827, tzinfo=<UTC>),
"foreign_key": 2,
"id": 1,
"many_to_many": [2],
"normal_value": 1,
``````

## 6. Use Serializers

Django Rest Framework's `ModelSerializer` allows you to build a serializer automatically from a model.

``````from rest_framework import serializers

class SomeModelSerializer(serializers.ModelSerializer):
    class Meta:
        model = SomeModel
        fields = "__all__"

SomeModelSerializer(instance).data
``````

returns

``````{"auto_now_add": "2018-12-20T21:34:29.494827Z",
"foreign_key": 2,
"id": 1,
"many_to_many": [2],
"normal_value": 1,
``````

This is almost as good as the custom function, but `auto_now_add` is a string instead of a datetime object.

## Bonus Round: better model printing

If you want a Django model that has a better Python command-line display, have your models subclass the following:

``````from django.db import models
from itertools import chain

class PrintableModel(models.Model):
    def __repr__(self):
        return str(self.to_dict())

    def to_dict(instance):
        opts = instance._meta
        data = {}
        for f in chain(opts.concrete_fields, opts.private_fields):
            data[f.name] = f.value_from_object(instance)
        for f in opts.many_to_many:
            data[f.name] = [i.id for i in f.value_from_object(instance)]
        return data

    class Meta:
        abstract = True
``````

So, for example, if we define our models as such:

``````class OtherModel(PrintableModel): pass

class SomeModel(PrintableModel):
    normal_value = models.IntegerField()
    foreign_key = models.ForeignKey(OtherModel, related_name="ref1")
    many_to_many = models.ManyToManyField(OtherModel, related_name="ref2")
``````
``````

Calling `SomeModel.objects.first()` now gives output like this:

``````{"auto_now_add": datetime.datetime(2018, 12, 20, 21, 34, 29, 494827, tzinfo=<UTC>),
"foreign_key": 2,
"id": 1,
"many_to_many": [2],
"normal_value": 1,
``````

## Are dictionaries ordered in Python 3.6+?

They are insertion ordered[1]. As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use `OrderedDict` if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior[1]).

As of Python 3.7, this is no longer an implementation detail and instead becomes a language feature. From a python-dev message by GvR:

Make it so. "Dict keeps insertion order" is the ruling. Thanks!

This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.
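A quick sketch of what the guarantee means in practice (the keys here are arbitrary):

```python
d = {}
d["banana"] = 3
d["apple"] = 1
d["cherry"] = 2

# Keys come back in insertion order on 3.7+, with no OrderedDict needed
print(list(d))  # ['banana', 'apple', 'cherry']

# Unlike OrderedDict, plain dict equality is still order-insensitive
assert {"a": 1, "b": 2} == {"b": 2, "a": 1}
```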

How does the Python `3.6` dictionary implementation perform better[2] than the older one while preserving element order?

Essentially, by keeping two arrays.

• The first array, `dk_entries`, holds the entries (of type `PyDictKeyEntry`) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append-only array where new items are always inserted at the end (insertion order).

• The second, `dk_indices`, holds the indices into the `dk_entries` array (that is, values that indicate the position of the corresponding entry in `dk_entries`). This array acts as the hash table. When a key is hashed it leads to one of the indices stored in `dk_indices`, and the corresponding entry is fetched by indexing `dk_entries`. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from `int8_t` (`1` byte) to `int32_t`/`int64_t` (`4`/`8` bytes) on `32`/`64`-bit builds).

In the previous implementation, a sparse array of type `PyDictKeyEntry` and size `dk_size` had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than `2/3 * dk_size` full for performance reasons (and the empty space still had `PyDictKeyEntry` size!).

This is not the case now, since only the required entries are stored (those that have been inserted) and only a sparse array of type `intX_t` (`X` depending on dict size), `2/3 * dk_size` full, is kept. The empty space changed from type `PyDictKeyEntry` to `intX_t`.

So, obviously, creating a sparse array of type `PyDictKeyEntry` is much more memory-demanding than a sparse array for storing `int`s.

You can see the full conversation on Python-Dev regarding this feature if interested; it is a good read.

In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.

For example, the dictionary:

``````d = {"timmy": "red", "barry": "green", "guido": "blue"}
``````

is currently stored as [keyhash, key, value]:

``````entries = [["--", "--", "--"],
           [-8522787127447073495, "barry", "green"],
           ["--", "--", "--"],
           ["--", "--", "--"],
           ["--", "--", "--"],
           [-9092791511155847987, "timmy", "red"],
           ["--", "--", "--"],
           [-6480567542315338377, "guido", "blue"]]
``````

Instead, the data should be organized as follows:

``````indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, "timmy", "red"],
           [-8522787127447073495, "barry", "green"],
           [-6480567542315338377, "guido", "blue"]]
``````

As you can now see, in the original proposal a lot of space is essentially empty in order to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it's really required, into the indices.

[1]: I say "insertion ordered" and not "ordered" since, with the existence of OrderedDict, "ordered" suggests further behavior that the `dict` object *doesn't provide*. OrderedDicts are reversible, provide order-sensitive methods and, mainly, provide order-sensitive equality tests (`==`, `!=`). `dict`s currently don't offer any of those behaviors/methods.
[2]: The new dictionary implementation performs better **memory wise** by being designed more compactly; that's the main benefit here. Speed wise, the difference isn't so drastic; there are places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present. Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.

`json.dumps()` converts a dictionary to a `str` object, not a `json(dict)` object! So you have to load your `str` back into a `dict` by using the `json.loads()` method.

See `json.dumps()` as a save method and `json.loads()` as a retrieve method.

This is the code sample which might help you understand it more:

``````import json

r = {"is_claimed": "True", "rating": 3.5}
r = json.dumps(r)
type(r)  # Output: str
``````
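To get the dictionary back, pass the string through `json.loads()`; the two calls round-trip cleanly:

```python
import json

r = {"is_claimed": "True", "rating": 3.5}
s = json.dumps(r)     # dict -> str ("save")
back = json.loads(s)  # str -> dict ("retrieve")

print(type(s))         # <class 'str'>
print(back["rating"])  # 3.5
```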

## Convert JSON string to dict using Python

I"m a little bit confused with JSON in Python. To me, it seems like a dictionary, and for that reason I"m trying to do that:

``````{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}
``````

But when I do `print dict(json)`, it gives an error.

How can I transform this string into a structure and then call `json["title"]` to obtain "example glossary"?

## Convert Django Model object to dict with all of the fields intact

How does one convert a Django Model object to a dict with all of its fields? Ideally this includes foreign keys and fields with `editable=False`.

Let me elaborate. Let's say I have a Django model like the following:

``````from django.db import models

class OtherModel(models.Model): pass

class SomeModel(models.Model):
    normal_value = models.IntegerField()
    foreign_key = models.ForeignKey(OtherModel, related_name="ref1")
    many_to_many = models.ManyToManyField(OtherModel, related_name="ref2")
``````

In the terminal, I have done the following:

``````other_model = OtherModel()
other_model.save()
instance = SomeModel()
instance.normal_value = 1
instance.foreign_key = other_model
instance.save()
instance.save()
``````

I want to convert this to the following dictionary:

``````{"auto_now_add": datetime.datetime(2015, 3, 16, 21, 34, 14, 926738, tzinfo=<UTC>),
"foreign_key": 1,
"id": 1,
"many_to_many": [1],
"normal_value": 1,
``````

Questions with unsatisfactory answers:

Django: Converting an entire set of a Model's objects into a single dictionary

How can I turn Django Model objects into a dictionary and still have their foreign keys?

## Converting JSON String to Dictionary Not List

I am trying to pass in a JSON file and convert the data into a dictionary.

So far, this is what I have done:

``````import json
json1_file = open("json1")
json1_data = json.load(json1_file)
``````

I"m expecting `json1_data` to be a `dict` type but it actually comes out as a `list` type when I check it with `type(json1_data)`.

What am I missing? I need this to be a dictionary so I can access one of the keys.
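A common cause is that the top-level JSON value is an array: `json.load()` then correctly returns a `list`, and the dictionaries are its elements. A minimal sketch (the contents below are invented for illustration):

```python
import json

# Stand-in for the file contents: the top-level value is a JSON array
raw = '[{"datapoints": [1, 2]}, {"datapoints": [3]}]'
json1_data = json.loads(raw)

print(type(json1_data))     # <class 'list'>
first = json1_data[0]       # index into the list to reach a dict
print(first["datapoints"])  # [1, 2]
```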

## python tuple to dict

For the tuple `t = ((1, "a"), (2, "b"))`, `dict(t)` returns `{1: "a", 2: "b"}`.

Is there a good way to get `{"a": 1, "b": 2}` (keys and vals swapped)?

Ultimately, I want to be able to return `1` given `"a"` or `2` given `"b"`, perhaps converting to a dict is not the best way.
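A dict comprehension over the pairs does the swap in one line; a sketch:

```python
t = ((1, "a"), (2, "b"))

# Swap each (key, value) pair while building the dict
swapped = {v: k for k, v in t}
print(swapped)       # {'a': 1, 'b': 2}
print(swapped["a"])  # 1
```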

## String to Dictionary in Python

So I"ve spent way to much time on this, and it seems to me like it should be a simple fix. I"m trying to use Facebook"s Authentication to register users on my site, and I"m trying to do it server side. I"ve gotten to the point where I get my access token, and when I go to:

I get the information I"m looking for as a string that"s like this:

`{"id":"123456789";"name":"John Doe";"first_name":"John";"last_name":"Doe";"link":"http://www.facebook.com/jdoe";"gender":"male";"email":"jdoeu0040gmail.com";"timezone":-7,"locale":"en_US";"verified":true,"updated_time":"2011-01-12T02:43:35+0000"}`

It seems like I should just be able to use `dict(string)` on this but I"m getting this error:

`ValueError: dictionary update sequence element #0 has length 1; 2 is required`

So I tried using Pickle, but got this error:

`KeyError: "{"`

I tried using `django.serializers` to de-serialize it but had similar results. Any thoughts? I feel like the answer has to be simple, and I"m just being stupid. Thanks for any help!
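If the string is valid JSON, `json.loads()` is the simple fix; `dict()` only accepts an iterable of key-value pairs, not a string. A sketch with a shortened, hypothetical stand-in for the profile data:

```python
import json

# Shortened stand-in for the profile string
s = '{"id": "123456789", "name": "John Doe", "verified": true}'

profile = json.loads(s)     # parses JSON text into a dict
print(profile["name"])      # John Doe
print(profile["verified"])  # True (JSON true becomes Python True)
```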

## python pandas dataframe to dictionary

I"ve a two columns dataframe, and intend to convert it to python dictionary - the first column will be the key and the second will be the value. Thank you in advance.

Dataframe:

``````    id    value
0    0     10.2
1    1      5.7
2    2      7.4
``````
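Two common sketches for this, assuming the `id`/`value` frame above (reconstructed here for illustration): `zip` the columns, or index by the key column and convert the value `Series`:

```python
import pandas as pd

# Reconstruct the two-column frame from the question
df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# Option 1: zip the key column with the value column
d1 = dict(zip(df["id"], df["value"]))

# Option 2: make the key column the index, then convert the value Series
d2 = df.set_index("id")["value"].to_dict()

print(d1[2])  # 7.4
```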

## How to convert list of key-value tuples into dictionary?

I have a list that looks like:

``````[("A", 1), ("B", 2), ("C", 3)]
``````

I want to turn it into a dictionary that looks like:

``````{"A": 1, "B": 2, "C": 3}
``````

EDIT: My list of tuples is actually more like:

``````[(A, 12937012397), (BERA, 2034927830), (CE, 2349057340)]
``````

I am getting the error `ValueError: dictionary update sequence element #0 has length 1916; 2 is required`
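When every element really is a 2-item tuple, `dict()` alone does the conversion; the `length 1916` error suggests the elements are not pairs (for example, a long string being iterated character by character). A sketch with the sample data:

```python
pairs = [("A", 1), ("B", 2), ("C", 3)]
print(dict(pairs))  # {'A': 1, 'B': 2, 'C': 3}

# The reported error reproduces when an element is not a 2-item sequence,
# e.g. a 1916-character string:
try:
    dict(["x" * 1916])
except ValueError as e:
    print(e)  # dictionary update sequence element #0 has length 1916; 2 is required
```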

## URL query parameters to dict python

Is there a way to parse a URL (with some python library) and return a python dictionary with the keys and values of a query parameters part of the URL?

For example:

``````url = "http://www.example.org/default.html?ct=32&op=92&item=98"
``````

expected return:

``````{"ct":32, "op":92, "item":98}
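The standard library covers this: `urllib.parse.urlparse` splits the URL and `parse_qs` parses the query string. Note that `parse_qs` returns each value as a list of strings (a key may repeat), so a small comprehension is needed to match the expected output exactly:

```python
from urllib.parse import urlparse, parse_qs

url = "http://www.example.org/default.html?ct=32&op=92&item=98"

# parse_qs returns each value as a list of strings
query = parse_qs(urlparse(url).query)
print(query)  # {'ct': ['32'], 'op': ['92'], 'item': ['98']}

# Flatten single-valued parameters and convert to int
flat = {k: int(v[0]) for k, v in query.items()}
print(flat)   # {'ct': 32, 'op': 92, 'item': 98}
```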
``````

## List of tuples to dictionary

Here"s how I"m currently converting a list of tuples to dictionary in Python:

``````l = [("a",1),("b",2)]
h = {}
[h.update({k:v}) for k,v in l]
> [None, None]
h
> {"a": 1, "b": 2}
``````

Is there a better way? It seems like there should be a one-liner to do this.
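There is: `dict()` accepts an iterable of key-value pairs directly, so no throwaway list comprehension (which also allocates a useless list of `None`s) is needed:

```python
l = [("a", 1), ("b", 2)]
h = dict(l)
print(h)  # {'a': 1, 'b': 2}
```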

## python pandas dataframe columns convert to dict key and value

I have a pandas data frame with multiple columns and I would like to construct a dict from two columns: one as the dict's keys and the other as the dict's values. How can I do that?

Dataframe:

``````           area  count
co tp
DE Lake      10      7
Forest       20      5
FR Lake      30      2
Forest       40      3
``````

I need to define area as key, count as value in dict. Thank you in advance.
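Assuming `area` and `count` are ordinary columns of the frame shown above (the MultiIndex does not matter here), the same `zip` approach works; the frame is reconstructed below for illustration:

```python
import pandas as pd

# Reconstruct the example frame (values assumed from the question)
index = pd.MultiIndex.from_tuples(
    [("DE", "Lake"), ("DE", "Forest"), ("FR", "Lake"), ("FR", "Forest")],
    names=["co", "tp"],
)
df = pd.DataFrame({"area": [10, 20, 30, 40], "count": [7, 5, 2, 3]}, index=index)

# area as keys, count as values; the index is ignored entirely
d = dict(zip(df["area"], df["count"]))
print(d[10])  # 7
```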

Use `df.to_dict("records")` -- this gives the output without having to transpose externally.

``````In [2]: df.to_dict("records")
Out[2]:
[{"customer": 1L, "item1": "apple", "item2": "milk", "item3": "tomato"},
{"customer": 2L, "item1": "water", "item2": "orange", "item3": "potato"},
{"customer": 3L, "item1": "juice", "item2": "mango", "item3": "chips"}]
``````

## Edit

As John Galt mentions in his answer, you should probably instead use `df.to_dict("records")`. It's faster than transposing manually.

``````In [20]: timeit df.T.to_dict().values()
1000 loops, best of 3: 395 µs per loop

In [21]: timeit df.to_dict("records")
10000 loops, best of 3: 53 µs per loop
``````

Use `df.T.to_dict().values()`, like below:

``````In [1]: df
Out[1]:
   customer  item1   item2   item3
0         1  apple    milk  tomato
1         2  water  orange  potato
2         3  juice   mango   chips

In [2]: df.T.to_dict().values()
Out[2]:
[{"customer": 1.0, "item1": "apple", "item2": "milk", "item3": "tomato"},
 {"customer": 2.0, "item1": "water", "item2": "orange", "item3": "potato"},
 {"customer": 3.0, "item1": "juice", "item2": "mango", "item3": "chips"}]
``````

The currently selected solution produces incorrect results. To correctly solve this problem, we can perform a left-join from `df1` to `df2`, making sure to first get just the unique rows for `df2`.

First, we need to modify the original DataFrame to add the row with data [3, 10].

``````df1 = pd.DataFrame(data={"col1": [1, 2, 3, 4, 5, 3],
                         "col2": [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame(data={"col1": [1, 2, 3],
                         "col2": [10, 11, 12]})

df1

   col1  col2
0     1    10
1     2    11
2     3    12
3     4    13
4     5    14
5     3    10

df2

   col1  col2
0     1    10
1     2    11
2     3    12
``````

Perform a left-join, eliminating duplicates in `df2` so that each row of `df1` joins with exactly 1 row of `df2`. Use the parameter `indicator` to return an extra column indicating which table the row was from.

``````df_all = df1.merge(df2.drop_duplicates(), on=["col1", "col2"],
                   how="left", indicator=True)
df_all

   col1  col2     _merge
0     1    10       both
1     2    11       both
2     3    12       both
3     4    13  left_only
4     5    14  left_only
5     3    10  left_only
``````

Create a boolean condition:

``````df_all["_merge"] == "left_only"

0    False
1    False
2    False
3     True
4     True
5     True
Name: _merge, dtype: bool
``````
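The mask can then be used to select the anti-join rows, i.e. the rows of `df1` that have no matching row in `df2`. A runnable sketch of the full recipe:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 3],
                    "col2": [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({"col1": [1, 2, 3],
                    "col2": [10, 11, 12]})

# Left-join with an indicator column, then keep rows found only in df1.
df_all = df1.merge(df2.drop_duplicates(), on=["col1", "col2"],
                   how="left", indicator=True)
anti = df_all[df_all["_merge"] == "left_only"].drop(columns="_merge")
print(anti)  # rows [4, 13], [5, 14] and [3, 10]
```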

### Why other solutions are wrong

A few solutions make the same mistake - they only check that each value is independently in each column, not together in the same row. Adding the last row, which is unique but has the values from both columns of `df2`, exposes the mistake:

``````common = df1.merge(df2,on=["col1","col2"])
(~df1.col1.isin(common.col1))&(~df1.col2.isin(common.col2))
0    False
1    False
2    False
3     True
4     True
5    False
dtype: bool
``````

This solution gets the same wrong result:

``````df1.isin(df2.to_dict("list")).all(1)
``````
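To see the failure concretely: with the extra `[3, 10]` row, this check flags row 5 as common because 3 appears somewhere in `df2.col1` and 10 appears somewhere in `df2.col2`, even though the pair `(3, 10)` is not a row of `df2`. A runnable sketch (using the long orient spelling `"list"`):

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 3],
                    "col2": [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({"col1": [1, 2, 3],
                    "col2": [10, 11, 12]})

# Each column is checked independently, so row 5 is wrongly marked True.
mask = df1.isin(df2.to_dict("list")).all(axis=1)
print(mask.tolist())
```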

## How do I convert a list of dictionaries to a pandas DataFrame?

The other answers are correct, but not much has been explained in terms of advantages and limitations of these methods. The aim of this post will be to show examples of these methods under different situations, discuss when to use (and when not to use), and suggest alternatives.

## `DataFrame()`, `DataFrame.from_records()`, and `.from_dict()`

Depending on the structure and format of your data, there are situations where either all three methods work, or some work better than others, or some don't work at all.

Consider a very contrived example.

``````np.random.seed(0)
data = pd.DataFrame(
    np.random.choice(10, (3, 4)), columns=list("ABCD")).to_dict("records")

print(data)
[{"A": 5, "B": 0, "C": 3, "D": 3},
{"A": 7, "B": 9, "C": 3, "D": 5},
{"A": 2, "B": 4, "C": 7, "D": 6}]
``````

This list consists of "records" with every key present. This is the simplest case you could encounter.

``````# The following methods all produce the same output.
pd.DataFrame(data)
pd.DataFrame.from_dict(data)
pd.DataFrame.from_records(data)

A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6
``````

### Word on Dictionary Orientations: `orient="index"`/`"columns"`

Before continuing, it is important to make the distinction between the different types of dictionary orientations and how they are supported with pandas. There are two primary types: "columns" and "index".

`orient="columns"`
Dictionaries with the "columns" orientation will have their keys correspond to columns in the equivalent DataFrame.

For example, `data` above is in the "columns" orient.

``````data_c = [
    {"A": 5, "B": 0, "C": 3, "D": 3},
    {"A": 7, "B": 9, "C": 3, "D": 5},
    {"A": 2, "B": 4, "C": 7, "D": 6}]
``````
``````pd.DataFrame.from_dict(data_c, orient="columns")

A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6
``````

Note: If you are using `pd.DataFrame.from_records`, the orientation is assumed to be "columns" (you cannot specify otherwise), and the dictionaries will be loaded accordingly.

`orient="index"`
With this orient, keys are assumed to correspond to index values. This kind of data is best suited for `pd.DataFrame.from_dict`.

``````data_i = {
    0: {"A": 5, "B": 0, "C": 3, "D": 3},
    1: {"A": 7, "B": 9, "C": 3, "D": 5},
    2: {"A": 2, "B": 4, "C": 7, "D": 6}}
``````
``````pd.DataFrame.from_dict(data_i, orient="index")

A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6
``````

This case is not considered in the OP, but is still useful to know.

### Setting Custom Index

If you need a custom index on the resultant DataFrame, you can set it using the `index=...` argument.

``````pd.DataFrame(data, index=["a", "b", "c"])
# pd.DataFrame.from_records(data, index=["a", "b", "c"])

A  B  C  D
a  5  0  3  3
b  7  9  3  5
c  2  4  7  6
``````

This is not supported by `pd.DataFrame.from_dict`.
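If you do need a custom index with data you would otherwise feed to `from_dict`, one workaround is to assign the index after construction (a sketch; the labels here are arbitrary):

```python
import pandas as pd

data = [{"A": 5, "B": 0, "C": 3, "D": 3},
        {"A": 7, "B": 9, "C": 3, "D": 5},
        {"A": 2, "B": 4, "C": 7, "D": 6}]

df = pd.DataFrame(data)
df.index = ["a", "b", "c"]  # same effect as passing index=... up front
```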

### Dealing with Missing Keys/Columns

All methods work out-of-the-box when handling dictionaries with missing keys/column values. For example,

``````data2 = [
    {"A": 5, "C": 3, "D": 3},
    {"A": 7, "B": 9, "F": 5},
    {"B": 4, "C": 7, "E": 6}]
``````
``````# The methods below all produce the same output.
pd.DataFrame(data2)
pd.DataFrame.from_dict(data2)
pd.DataFrame.from_records(data2)

A    B    C    D    E    F
0  5.0  NaN  3.0  3.0  NaN  NaN
1  7.0  9.0  NaN  NaN  NaN  5.0
2  NaN  4.0  7.0  NaN  6.0  NaN
``````

### Reading Subset of Columns

"What if I don't want to read in every single column?" You can easily specify this using the `columns=...` parameter.

For example, from the example dictionary of `data2` above, if you wanted to read only columns "A", "D", and "F", you can do so by passing a list:

``````pd.DataFrame(data2, columns=["A", "D", "F"])
# pd.DataFrame.from_records(data2, columns=["A", "D", "F"])

A    D    F
0  5.0  3.0  NaN
1  7.0  NaN  5.0
2  NaN  NaN  NaN
``````

This is not supported by `pd.DataFrame.from_dict` with the default orient "columns".

``````pd.DataFrame.from_dict(data2, orient="columns", columns=["A", "B"])
``````
``````ValueError: cannot use columns parameter with orient="columns"
``````

### Reading Subset of Rows

Not supported by any of these methods directly. You will have to iterate over your data and perform a reverse delete in-place as you iterate. For example, to extract only the 0th and 2nd rows from `data2` above, you can use:

``````rows_to_select = {0, 2}
for i in reversed(range(len(data2))):
    if i not in rows_to_select:
        del data2[i]

pd.DataFrame(data2)
# pd.DataFrame.from_dict(data2)
# pd.DataFrame.from_records(data2)

A    B  C    D    E
0  5.0  NaN  3  3.0  NaN
1  NaN  4.0  7  NaN  6.0
``````
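If mutating `data2` in place is undesirable, a non-destructive alternative is to build the subset with a comprehension first (a sketch using the same `rows_to_select`):

```python
import pandas as pd

data2 = [{"A": 5, "C": 3, "D": 3},
         {"A": 7, "B": 9, "F": 5},
         {"B": 4, "C": 7, "E": 6}]

rows_to_select = {0, 2}
# Keep only the selected records; data2 itself is untouched.
subset = [row for i, row in enumerate(data2) if i in rows_to_select]
df = pd.DataFrame(subset)
```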

## The Panacea: `json_normalize` for Nested Data

A strong, robust alternative to the methods outlined above is the `json_normalize` function which works with lists of dictionaries (records), and in addition can also handle nested dictionaries.

``````pd.json_normalize(data)

A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6
``````
``````pd.json_normalize(data2)

A    B  C    D    E
0  5.0  NaN  3  3.0  NaN
1  NaN  4.0  7  NaN  6.0
``````

Again, keep in mind that the data passed to `json_normalize` needs to be in the list-of-dictionaries (records) format.

As mentioned, `json_normalize` can also handle nested dictionaries. Here's an example taken from the documentation.

``````data_nested = [
    {"counties": [{"name": "Dade", "population": 12345},
                  {"name": "Broward", "population": 40000},
                  {"name": "Palm Beach", "population": 60000}],
     "info": {"governor": "Rick Scott"},
     "shortname": "FL",
     "state": "Florida"},
    {"counties": [{"name": "Summit", "population": 1234},
                  {"name": "Cuyahoga", "population": 1337}],
     "info": {"governor": "John Kasich"},
     "shortname": "OH",
     "state": "Ohio"}
]
``````
``````pd.json_normalize(data_nested,
                  record_path="counties",
                  meta=["state", "shortname", ["info", "governor"]])

name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich
``````

For more information on the `meta` and `record_path` arguments, check out the documentation.

## Summarising

Here's a table of all the methods discussed above, along with supported features/functionality.

* Use `orient="columns"` and then transpose to get the same effect as `orient="index"`.
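That footnote can be sketched as follows: constructing with the "columns" orientation and transposing yields the same frame as `orient="index"`:

```python
import pandas as pd

data_i = {0: {"A": 5, "B": 0, "C": 3, "D": 3},
          1: {"A": 7, "B": 9, "C": 3, "D": 5},
          2: {"A": 2, "B": 4, "C": 7, "D": 6}}

via_transpose = pd.DataFrame.from_dict(data_i, orient="columns").T
via_index = pd.DataFrame.from_dict(data_i, orient="index")
```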

TLDR; No, `for` loops are not blanket "bad", at least, not always. It is probably more accurate to say that some vectorized operations are slower than iterating, versus saying that iteration is faster than some vectorized operations. Knowing when and why is key to getting the most performance out of your code. In a nutshell, these are the situations where it is worth considering an alternative to vectorized pandas functions:

1. When your data is small (...depending on what you're doing),
2. When dealing with `object`/mixed dtypes
3. When using the `str`/regex accessor functions

Let's examine these situations individually.

### Iteration v/s Vectorization on Small Data

Pandas follows a "Convention Over Configuration" approach in its API design. This means that the same API has been fitted to cater to a broad range of data and use cases.

When a pandas function is called, the following things (among others) must be handled internally by the function to ensure it works correctly:

1. Index/axis alignment
2. Handling mixed datatypes
3. Handling missing data

Almost every function will have to deal with these to varying extents, and this presents an overhead. The overhead is less for numeric functions (for example, `Series.add`), while it is more pronounced for string functions (for example, `Series.str.replace`).

`for` loops, on the other hand, are faster than you think. What's even better is that list comprehensions (which create lists through `for` loops) are faster still, as they are optimized iterative mechanisms for list creation.

List comprehensions follow the pattern

``````[f(x) for x in seq]
``````

Where `seq` is a pandas series or DataFrame column. Or, when operating over multiple columns,

``````[f(x, y) for x, y in zip(seq1, seq2)]
``````

Where `seq1` and `seq2` are columns.

Numeric Comparison
Consider a simple boolean indexing operation. The list comprehension method has been timed against `Series.ne` (`!=`) and `query`. Here are the functions:

``````# Boolean indexing with Numeric value comparison.
df[df.A != df.B]                            # vectorized !=
df.query("A != B")                          # query (numexpr)
df[[x != y for x, y in zip(df.A, df.B)]]    # list comp
``````

For simplicity, I have used the `perfplot` package to run all the timeit tests in this post. The timings for the operations above are below:

The list comprehension outperforms `query` for moderately sized N, and even outperforms the vectorized not equals comparison for tiny N. Unfortunately, the list comprehension scales linearly, so it does not offer much performance gain for larger N.

Note
It is worth mentioning that much of the benefit of list comprehensions comes from not having to worry about index alignment, but this means that if your code depends on index alignment, this approach will break. In some cases, vectorised operations over the underlying NumPy arrays can be considered as bringing in the "best of both worlds", allowing for vectorisation without all the unneeded overhead of the pandas functions. This means that you can rewrite the operation above as

``````df[df.A.values != df.B.values]
``````

Which outperforms both the pandas and list comprehension equivalents:

NumPy vectorization is out of the scope of this post, but it is definitely worth considering, if performance matters.

Value Counts
Taking another example - this time, with another vanilla python construct that is faster than a for loop - `collections.Counter`. A common requirement is to compute the value counts and return the result as a dictionary. This is done with `value_counts`, `np.unique`, and `Counter`:

``````# Value Counts comparison.
ser.value_counts(sort=False).to_dict()           # value_counts
dict(zip(*np.unique(ser, return_counts=True)))   # np.unique
Counter(ser)                                     # Counter
``````

The results are more pronounced: `Counter` wins out over both vectorized methods for a larger range of small N (~3500).

Note
More trivia (courtesy @user2357112). The `Counter` is implemented with a C accelerator, so while it still has to work with python objects instead of the underlying C datatypes, it is still faster than a `for` loop. Python power!
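A minimal runnable illustration of the three equivalent computations on a toy Series (the data here is assumed for demonstration):

```python
from collections import Counter

import numpy as np
import pandas as pd

ser = pd.Series(["a", "b", "a", "c", "a", "b"])

d1 = ser.value_counts(sort=False).to_dict()          # value_counts
d2 = dict(zip(*np.unique(ser, return_counts=True)))  # np.unique
d3 = dict(Counter(ser))                              # Counter

# All three produce the same counts: {'a': 3, 'b': 2, 'c': 1}
```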

Of course, the takeaway here is that performance depends on your data and use case. The point of these examples is to convince you not to rule out these solutions as legitimate options. If these still don't give you the performance you need, there are always Cython and Numba. Let's add this test into the mix.

``````from numba import njit, prange

@njit(parallel=True)
def get_mask(x, y):
    result = [False] * len(x)
    for i in prange(len(x)):
        result[i] = x[i] != y[i]
    return np.array(result)

df[get_mask(df.A.values, df.B.values)] # numba
``````

Numba offers JIT compilation of loopy python code to very powerful vectorized code. Understanding how to make numba work involves a learning curve.

### Operations with Mixed/`object` dtypes

String-based Comparison
Revisiting the filtering example from the first section, what if the columns being compared are strings? Consider the same 3 functions above, but with the input DataFrame cast to string.

``````# Boolean indexing with string value comparison.
df[df.A != df.B]                            # vectorized !=
df.query("A != B")                          # query (numexpr)
df[[x != y for x, y in zip(df.A, df.B)]]    # list comp
``````

So, what changed? The thing to note here is that string operations are inherently difficult to vectorize. Pandas treats strings as objects, and all operations on objects fall back to a slow, loopy implementation.

Now, because this loopy implementation is surrounded by all the overhead mentioned above, there is a constant magnitude difference between these solutions, even though they scale the same.

When it comes to operations on mutable/complex objects, there is no comparison. List comprehension outperforms all operations involving dicts and lists.

Accessing Dictionary Value(s) by Key
Here are timings for two operations that extract a value from a column of dictionaries: `map` and the list comprehension. The setup is in the Appendix, under the heading "Code Snippets".

``````# Dictionary value extraction.
ser.map(operator.itemgetter("value"))     # map
pd.Series([x.get("value") for x in ser])  # list comprehension
``````

Positional List Indexing
Timings for 3 operations that extract the 0th element from a column of lists (handling exceptions): `map`, the `str` accessor, and the list comprehension:

``````# List positional indexing.
def get_0th(lst):
    try:
        return lst[0]
    # Handle empty lists and NaNs gracefully.
    except (IndexError, TypeError):
        return np.nan
``````

``````ser.map(get_0th)                                          # map
ser.str[0]                                                # str accessor
pd.Series([x[0] if len(x) > 0 else np.nan for x in ser])  # list comp
pd.Series([get_0th(x) for x in ser])                      # list comp safe
``````

Note
If the index matters, you would want to do:

``````pd.Series([...], index=ser.index)
``````

When reconstructing the series.

List Flattening
A final example is flattening lists. This is another common problem, and demonstrates just how powerful pure python is here.

``````# Nested list flattening.
pd.DataFrame(ser.tolist()).stack().reset_index(drop=True)  # stack
pd.Series(list(chain.from_iterable(ser.tolist())))         # itertools.chain
pd.Series([y for x in ser for y in x])                     # nested list comp
``````

Both `itertools.chain.from_iterable` and the nested list comprehension are pure python constructs, and scale much better than the `stack` solution.

These timings are a strong indication of the fact that pandas is not equipped to work with mixed dtypes, and that you should probably refrain from using it to do so. Wherever possible, data should be present as scalar values (ints/floats/strings) in separate columns.

Lastly, the applicability of these solutions depends widely on your data. So, the best thing to do would be to test these operations on your data before deciding what to go with. Notice how I have not timed `apply` on these solutions, because it would skew the graph (yes, it's that slow).

### Regex Operations, and `.str` Accessor Methods

Pandas can apply regex operations such as `str.contains`, `str.extract`, and `str.extractall`, as well as other "vectorized" string operations (such as `str.split`, `str.find`, `str.translate`, and so on) on string columns. These functions are slower than list comprehensions, and are meant to be convenience functions more than anything else.

It is usually much faster to pre-compile a regex pattern and iterate over your data with `re.compile` (also see Is it worth using Python's re.compile?). The list comp equivalent to `str.contains` looks something like this:

``````p = re.compile(...)
ser2 = pd.Series([x for x in ser if p.search(x)])
``````

Or,

``````ser2 = ser[[bool(p.search(x)) for x in ser]]
``````

If you need to handle NaNs, you can do something like

``````ser[[bool(p.search(x)) if pd.notnull(x) else False for x in ser]]
``````

The list comp equivalent to `str.extract` (without groups) will look something like:

``````df["col2"] = [p.search(x).group(0) for x in df["col"]]
``````

If you need to handle no-matches and NaNs, you can use a custom function (still faster!):

``````def matcher(x):
    m = p.search(str(x))
    if m:
        return m.group(0)
    return np.nan

df["col2"] = [matcher(x) for x in df["col"]]
``````

The `matcher` function is very extensible. It can be fitted to return a list for each capture group, as needed. Just query the `group` or `groups` attribute of the match object.

For `str.extractall`, change `p.search` to `p.findall`.
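For example, a sketch of the `findall`-based equivalent (the pattern and data here are made up for illustration):

```python
import re

import pandas as pd

p = re.compile(r"\d+")
ser = pd.Series(["a1 b22", "c3", "no digits"])

# All matches per row, analogous to str.extractall.
out = pd.Series([p.findall(x) for x in ser], index=ser.index)
print(out.tolist())  # [['1', '22'], ['3'], []]
```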

String Extraction
Consider a simple filtering operation. The idea is to extract 4 digits if it is preceded by an upper case letter.

``````# Extracting strings.
p = re.compile(r"(?<=[A-Z])(\d{4})")
def matcher(x):
    m = p.search(x)
    if m:
        return m.group(0)
    return np.nan

ser.str.extract(r"(?<=[A-Z])(\d{4})", expand=False)   #  str.extract
pd.Series([matcher(x) for x in ser])                  #  list comprehension
``````

More Examples
Full disclosure - I am the author (in part or whole) of these posts listed below.

### Conclusion

As shown from the examples above, iteration shines when working with small rows of DataFrames, mixed datatypes, and regular expressions.

The speedup you get depends on your data and your problem, so your mileage may vary. The best thing to do is to carefully run tests and see if the payout is worth the effort.

The "vectorized" functions shine in their simplicity and readability, so if performance is not critical, you should definitely prefer those.

Another side note, certain string operations deal with constraints that favour the use of NumPy. Here are two examples where careful NumPy vectorization outperforms python:

Additionally, sometimes just operating on the underlying arrays via `.values` as opposed to on the Series or DataFrames can offer a healthy enough speedup for most usual scenarios (see the Note in the Numeric Comparison section above). So, for example `df[df.A.values != df.B.values]` would show instant performance boosts over `df[df.A != df.B]`. Using `.values` may not be appropriate in every situation, but it is a useful hack to know.

As mentioned above, it's up to you to decide whether these solutions are worth the trouble of implementing.

### Appendix: Code Snippets

``````import perfplot
import operator
import pandas as pd
import numpy as np
import re

from collections import Counter
from itertools import chain
``````

``````# Boolean indexing with Numeric value comparison.
perfplot.show(
    setup=lambda n: pd.DataFrame(np.random.choice(1000, (n, 2)), columns=["A","B"]),
    kernels=[
        lambda df: df[df.A != df.B],
        lambda df: df.query("A != B"),
        lambda df: df[[x != y for x, y in zip(df.A, df.B)]],
        lambda df: df[get_mask(df.A.values, df.B.values)]
    ],
    labels=["vectorized !=", "query (numexpr)", "list comp", "numba"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N"
)
``````

``````# Value Counts comparison.
perfplot.show(
    setup=lambda n: pd.Series(np.random.choice(1000, n)),
    kernels=[
        lambda ser: ser.value_counts(sort=False).to_dict(),
        lambda ser: dict(zip(*np.unique(ser, return_counts=True))),
        lambda ser: Counter(ser),
    ],
    labels=["value_counts", "np.unique", "Counter"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N",
    equality_check=lambda x, y: dict(x) == dict(y)
)
``````

``````# Boolean indexing with string value comparison.
perfplot.show(
    setup=lambda n: pd.DataFrame(np.random.choice(1000, (n, 2)), columns=["A","B"], dtype=str),
    kernels=[
        lambda df: df[df.A != df.B],
        lambda df: df.query("A != B"),
        lambda df: df[[x != y for x, y in zip(df.A, df.B)]],
    ],
    labels=["vectorized !=", "query (numexpr)", "list comp"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N",
    equality_check=None
)
``````

``````# Dictionary value extraction.
ser1 = pd.Series([{"key": "abc", "value": 123}, {"key": "xyz", "value": 456}])
perfplot.show(
    setup=lambda n: pd.concat([ser1] * n, ignore_index=True),
    kernels=[
        lambda ser: ser.map(operator.itemgetter("value")),
        lambda ser: pd.Series([x.get("value") for x in ser]),
    ],
    labels=["map", "list comprehension"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N",
    equality_check=None
)
``````

``````# List positional indexing.
ser2 = pd.Series([["a", "b", "c"], [1, 2], []])
perfplot.show(
    setup=lambda n: pd.concat([ser2] * n, ignore_index=True),
    kernels=[
        lambda ser: ser.map(get_0th),
        lambda ser: ser.str[0],
        lambda ser: pd.Series([x[0] if len(x) > 0 else np.nan for x in ser]),
        lambda ser: pd.Series([get_0th(x) for x in ser]),
    ],
    labels=["map", "str accessor", "list comprehension", "list comp safe"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N",
    equality_check=None
)
``````

``````# Nested list flattening.
ser3 = pd.Series([["a", "b", "c"], ["d", "e"], ["f", "g"]])
perfplot.show(
    setup=lambda n: pd.concat([ser3] * n, ignore_index=True),
    kernels=[
        lambda ser: pd.DataFrame(ser.tolist()).stack().reset_index(drop=True),
        lambda ser: pd.Series(list(chain.from_iterable(ser.tolist()))),
        lambda ser: pd.Series([y for x in ser for y in x]),
    ],
    labels=["stack", "itertools.chain", "nested list comp"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N",
    equality_check=None
)
``````

``````# Extracting strings.
ser4 = pd.Series(["foo xyz", "test A1234", "D3345 xtz"])
perfplot.show(
    setup=lambda n: pd.concat([ser4] * n, ignore_index=True),
    kernels=[
        lambda ser: ser.str.extract(r"(?<=[A-Z])(\d{4})", expand=False),
        lambda ser: pd.Series([matcher(x) for x in ser])
    ],
    labels=["str.extract", "list comprehension"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N",
    equality_check=None
)
``````

I'd like to shed a little bit more light on the interplay of `iter`, `__iter__` and `__getitem__` and what happens behind the curtains. Armed with that knowledge, you will be able to understand why the best you can do is

``````try:
    iter(maybe_iterable)
    print("iteration will probably work")
except TypeError:
    print("not iterable")
``````
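Wrapped as a helper (the name `is_iterable` is my own), the same check looks like this; note that it also accepts objects that only implement `__getitem__`:

```python
def is_iterable(obj):
    """True if iter(obj) succeeds, i.e. iteration will probably work."""
    try:
        iter(obj)
        return True
    except TypeError:
        return False

class GetitemOnly:
    def __getitem__(self, item):   # no __iter__ at all
        if item == 3:
            raise IndexError
        return item

print(is_iterable([1, 2]), is_iterable(42), is_iterable(GetitemOnly()))
```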

I will list the facts first and then follow up with a quick reminder of what happens when you employ a `for` loop in python, followed by a discussion to illustrate the facts.

# Facts

1. You can get an iterator from any object `o` by calling `iter(o)` if at least one of the following conditions holds true:

a) `o` has an `__iter__` method which returns an iterator object. An iterator is any object with an `__iter__` and a `__next__` (Python 2: `next`) method.

b) `o` has a `__getitem__` method.

2. Checking for an instance of `Iterable` or `Sequence`, or checking for the attribute `__iter__` is not enough.

3. If an object `o` implements only `__getitem__`, but not `__iter__`, `iter(o)` will construct an iterator that tries to fetch items from `o` by integer index, starting at index 0. The iterator will catch any `IndexError` (but no other errors) that is raised and then raises `StopIteration` itself.

4. In the most general sense, there's no way to check whether the iterator returned by `iter` is sane other than to try it out.

5. If an object `o` implements `__iter__`, the `iter` function will make sure that the object returned by `__iter__` is an iterator. There is no sanity check if an object only implements `__getitem__`.

6. `__iter__` wins. If an object `o` implements both `__iter__` and `__getitem__`, `iter(o)` will call `__iter__`.

7. If you want to make your own objects iterable, always implement the `__iter__` method.

# `for` loops

In order to follow along, you need an understanding of what happens when you employ a `for` loop in Python. Feel free to skip right to the next section if you already know.

When you use `for item in o` for some iterable object `o`, Python calls `iter(o)` and expects an iterator object as the return value. An iterator is any object which implements a `__next__` (or `next` in Python 2) method and an `__iter__` method.

By convention, the `__iter__` method of an iterator should return the object itself (i.e. `return self`). Python then calls `next` on the iterator until `StopIteration` is raised. All of this happens implicitly, but the following demonstration makes it visible:

``````import random

class DemoIterable(object):
    def __iter__(self):
        print("__iter__ called")
        return DemoIterator()

class DemoIterator(object):
    def __iter__(self):
        return self

    def __next__(self):
        print("__next__ called")
        r = random.randint(1, 10)
        if r == 5:
            print("raising StopIteration")
            raise StopIteration
        return r
``````

Iteration over a `DemoIterable`:

``````>>> di = DemoIterable()
>>> for x in di:
...     print(x)
...
__iter__ called
__next__ called
9
__next__ called
8
__next__ called
10
__next__ called
3
__next__ called
10
__next__ called
raising StopIteration
``````

# Discussion and illustrations

On point 1 and 2: getting an iterator and unreliable checks

Consider the following class:

``````class BasicIterable(object):
    def __getitem__(self, item):
        if item == 3:
            raise IndexError
        return item
``````

Calling `iter` with an instance of `BasicIterable` will return an iterator without any problems because `BasicIterable` implements `__getitem__`.

``````>>> b = BasicIterable()
>>> iter(b)
<iterator object at 0x7f1ab216e320>
``````

However, it is important to note that `b` does not have the `__iter__` attribute and is not considered an instance of `Iterable` or `Sequence`:

``````>>> from collections.abc import Iterable, Sequence
>>> hasattr(b, "__iter__")
False
>>> isinstance(b, Iterable)
False
>>> isinstance(b, Sequence)
False
``````

This is why Fluent Python by Luciano Ramalho recommends calling `iter` and handling the potential `TypeError` as the most accurate way to check whether an object is iterable. Quoting directly from the book:

As of Python 3.4, the most accurate way to check whether an object `x` is iterable is to call `iter(x)` and handle a `TypeError` exception if it isn't. This is more accurate than using `isinstance(x, abc.Iterable)`, because `iter(x)` also considers the legacy `__getitem__` method, while the `Iterable` ABC does not.

On point 3: Iterating over objects which only provide `__getitem__`, but not `__iter__`

Iterating over an instance of `BasicIterable` works as expected: Python constructs an iterator that tries to fetch items by index, starting at zero, until an `IndexError` is raised. The demo object's `__getitem__` method simply returns the `item` which was supplied as the argument to `__getitem__(self, item)` by the iterator returned by `iter`.

``````>>> b = BasicIterable()
>>> it = iter(b)
>>> next(it)
0
>>> next(it)
1
>>> next(it)
2
>>> next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
``````

Note that the iterator raises `StopIteration` when it cannot return the next item and that the `IndexError` which is raised for `item == 3` is handled internally. This is why looping over a `BasicIterable` with a `for` loop works as expected:

``````>>> for x in b:
...     print(x)
...
0
1
2
``````

Here's another example in order to drive home the concept of how the iterator returned by `iter` tries to access items by index. `WrappedDict` does not inherit from `dict`, which means instances won't have an `__iter__` method.

``````class WrappedDict(object):  # note: no inheritance from dict!
    def __init__(self, dic):
        self._dict = dic

    def __getitem__(self, item):
        try:
            return self._dict[item]  # delegate to dict.__getitem__
        except KeyError:
            raise IndexError
``````

Note that calls to `__getitem__` are delegated to `dict.__getitem__` for which the square bracket notation is simply a shorthand.

``````>>> w = WrappedDict({-1: "not printed",
...                   0: "hi", 1: "StackOverflow", 2: "!",
...                   4: "not printed",
...                   "x": "not printed"})
>>> for x in w:
...     print(x)
...
hi
StackOverflow
!
``````

On point 4 and 5: `iter` checks for an iterator when it calls `__iter__`:

When `iter(o)` is called for an object `o`, `iter` will make sure that the return value of `__iter__`, if the method is present, is an iterator. This means that the returned object must implement `__next__` (or `next` in Python 2) and `__iter__`. `iter` cannot perform any sanity checks for objects which only provide `__getitem__`, because it has no way to check whether the items of the object are accessible by integer index.

``````class FailIterIterable(object):
    def __iter__(self):
        return object()  # not an iterator

class FailGetitemIterable(object):
    def __getitem__(self, item):
        raise Exception
``````

Note that constructing an iterator from `FailIterIterable` instances fails immediately, while constructing an iterator from `FailGetitemIterable` succeeds, but will throw an Exception on the first call to `__next__`.

``````>>> fii = FailIterIterable()
>>> iter(fii)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: iter() returned non-iterator of type 'object'
>>>
>>> fgi = FailGetitemIterable()
>>> it = iter(fgi)
>>> next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/path/iterdemo.py", line 42, in __getitem__
raise Exception
Exception
``````

On point 6: `__iter__` wins

This one is straightforward. If an object implements `__iter__` and `__getitem__`, `iter` will call `__iter__`. Consider the following class

``````class IterWinsDemo(object):
    def __iter__(self):
        return iter(["__iter__", "wins"])

    def __getitem__(self, item):
        return ["__getitem__", "wins"][item]
``````

and the output when looping over an instance:

``````>>> iwd = IterWinsDemo()
>>> for x in iwd:
...     print(x)
...
__iter__
wins
``````

On point 7: your iterable classes should implement `__iter__`

You might ask yourself why most builtin sequences like `list` implement an `__iter__` method when `__getitem__` would be sufficient.

``````class WrappedList(object):  # note: no inheritance from list!
    def __init__(self, lst):
        self._list = lst

    def __getitem__(self, item):
        return self._list[item]
``````

After all, iteration over instances of the class above, which delegates calls to `__getitem__` to `list.__getitem__` (using the square bracket notation), will work fine:

``````>>> wl = WrappedList(["A", "B", "C"])
>>> for x in wl:
...     print(x)
...
A
B
C
``````

The reasons your custom iterables should implement `__iter__` are as follows:

1. If you implement `__iter__`, instances will be considered iterables, and `isinstance(o, collections.abc.Iterable)` will return `True`.
2. If the object returned by `__iter__` is not an iterator, `iter` will fail immediately and raise a `TypeError`.
3. The special handling of `__getitem__` exists for backwards compatibility reasons. Quoting again from Fluent Python:

That is why any Python sequence is iterable: they all implement `__getitem__` . In fact, the standard sequences also implement `__iter__`, and yours should too, because the special handling of `__getitem__` exists for backward compatibility reasons and may be gone in the future (although it is not deprecated as I write this).

• Parquet format is designed for long-term storage, while Arrow is more intended for short-term or ephemeral storage (Arrow may be more suitable for long-term storage after the 1.0.0 release happens, since the binary format will be stable then)

• Parquet is more expensive to write than Feather as it features more layers of encoding and compression. Feather is unmodified raw columnar Arrow memory. We will probably add simple compression to Feather in the future.

• Due to dictionary encoding, RLE encoding, and data page compression, Parquet files will often be much smaller than Feather files

• Parquet is a standard storage format for analytics that's supported by many different systems: Spark, Hive, Impala, various AWS services, in future by BigQuery, etc. So if you are doing analytics, Parquet is a good option as a reference storage format for query by multiple systems

The benchmarks you showed are going to be very noisy since the data you read and wrote is very small. You should try compressing at least 100MB or upwards of 1GB of data to get more informative benchmarks, see e.g. http://wesmckinney.com/blog/python-parquet-multithreading/

Hope this helps

There are at least six ways. The preferred way depends on what your use case is.

# Option 1:

Simply add an `asdict()` method.

Based on the problem description I would very much consider the `asdict` way of doing things suggested by other answers. This is because it does not appear that your object is really much of a collection:

``````class Wharrgarbl(object):

    ...

    def asdict(self):
        return {"a": self.a, "b": self.b, "c": self.c}
``````

Using the other options below could be confusing for others unless it is very obvious exactly which object members would and would not be iterated or specified as key-value pairs.
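As a sketch, assuming a hypothetical three-argument constructor (the original question's full class definition is not shown here), the `asdict` approach might be used like so:

```python
class Wharrgarbl:
    # Hypothetical constructor for illustration only.
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def asdict(self):
        # Only the members that belong in the dictionary.
        return {"a": self.a, "b": self.b, "c": self.c}

w = Wharrgarbl(1, 2, 3)
print(w.asdict())  # {'a': 1, 'b': 2, 'c': 3}
```

The explicit method makes it obvious to readers which members end up in the dictionary, at the cost of a slightly more verbose call site.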

# Option 1a:

Inherit your class from `typing.NamedTuple` (or the mostly equivalent `collections.namedtuple`), and use the `_asdict` method provided for you.

``````from typing import NamedTuple

class Wharrgarbl(NamedTuple):
    a: str
    b: str
    c: str
    sum: int = 6
    version: str = "old"
``````

Using a named tuple is a very convenient way to add lots of functionality to your class with a minimum of effort, including an `_asdict` method. However, a limitation is that, as shown above, the NT will include all the members in its `_asdict`.
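For instance, with the `Wharrgarbl` named tuple above, `_asdict` returns every field, including the defaulted `sum` and `version`:

```python
from typing import NamedTuple

class Wharrgarbl(NamedTuple):
    a: str
    b: str
    c: str
    sum: int = 6
    version: str = "old"

w = Wharrgarbl("x", "y", "z")
# All fields appear, defaults included:
print(w._asdict())
# {'a': 'x', 'b': 'y', 'c': 'z', 'sum': 6, 'version': 'old'}
```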

If there are members you don't want to include in your dictionary, you'll need to modify the `_asdict` result:

``````from typing import NamedTuple

class Wharrgarbl(NamedTuple):
    a: str
    b: str
    c: str
    sum: int = 6
    version: str = "old"

    def _asdict(self):
        d = super()._asdict()
        del d["sum"]
        del d["version"]
        return d
``````

Another limitation is that a named tuple is read-only. This may or may not be desirable.

# Option 2:

Implement `__iter__`.

Like this, for example:

``````def __iter__(self):
    yield "a", self.a
    yield "b", self.b
    yield "c", self.c
``````

Now you can just do:

``````dict(my_object)
``````

This works because the `dict()` constructor accepts an iterable of `(key, value)` pairs to construct a dictionary. Before doing this, ask yourself whether iterating the object as a series of key-value pairs in this manner, while convenient for creating a `dict`, might actually be surprising behavior in other contexts. For example, ask yourself what the behavior of `list(my_object)` should be.

Additionally, note that accessing values directly using the item-access syntax `obj["a"]` will not work, and keyword argument unpacking won't work. For those, you'd need to implement the mapping protocol.
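A short sketch illustrating both points, using a hypothetical `Pairs` class that defines only `__iter__`:

```python
class Pairs:
    # Hypothetical minimal class: iterating it yields (key, value) pairs.
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def __iter__(self):
        yield "a", self.a
        yield "b", self.b
        yield "c", self.c

p = Pairs(1, 2, 3)
print(dict(p))  # {'a': 1, 'b': 2, 'c': 3}

try:
    p["a"]  # no __getitem__, so subscripting fails
except TypeError as e:
    print("TypeError:", e)
```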

# Option 3:

Implement the mapping protocol. This allows access-by-key behavior, casting to a `dict` without using `__iter__`, and also provides unpacking behavior (`{**my_obj}`) and keyword unpacking behavior if all the keys are strings (`dict(**my_obj)`).

The mapping protocol requires that you provide (at minimum) two methods together: `keys()` and `__getitem__`.

``````class MyKwargUnpackable:
    def keys(self):
        return list("abc")

    def __getitem__(self, key):
        return dict(zip("abc", "one two three".split()))[key]
``````

Now you can do things like:

``````>>> m = MyKwargUnpackable()
>>> m["a"]
'one'
>>> dict(m)  # cast to dict directly
{'a': 'one', 'b': 'two', 'c': 'three'}
>>> dict(**m)  # unpack as kwargs
{'a': 'one', 'b': 'two', 'c': 'three'}
``````

As mentioned above, if you are using a new enough version of Python you can also unpack your mapping-protocol object directly into a dictionary literal like so (and in this case it is not required that your keys be strings):

``````>>> {**m}
{'a': 'one', 'b': 'two', 'c': 'three'}
``````

Note that the mapping protocol takes precedence over the `__iter__` method when casting an object to a `dict` directly (without using kwarg unpacking, i.e. `dict(m)`). So it is possible, and sometimes convenient, to make the object behave differently when used as an iterable (e.g., `list(m)`) than when cast to a `dict` (`dict(m)`).
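A minimal sketch of this precedence, using a hypothetical class that implements both protocols:

```python
class Both:
    # Hypothetical class implementing both the mapping protocol
    # (keys + __getitem__) and the iterator protocol (__iter__).
    def keys(self):
        return ["a", "b"]

    def __getitem__(self, key):
        return {"a": 1, "b": 2}[key]

    def __iter__(self):
        # Iteration yields just the keys, like a dict does.
        yield from ("a", "b")

m = Both()
print(dict(m))   # mapping protocol wins: {'a': 1, 'b': 2}
print(list(m))   # __iter__ is used: ['a', 'b']
```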

EMPHASIZED: Just because you CAN use the mapping protocol does NOT mean that you SHOULD do so. Does it actually make sense for your object to be passed around as a set of key-value pairs, or as keyword arguments and values? Does accessing it by key, just like a dictionary, really make sense?

If the answer to these questions is yes, it's probably a good idea to consider the next option.

# Option 4:

Look into using the `collections.abc` module.

Inheriting your class from `collections.abc.Mapping` or `collections.abc.MutableMapping` signals to other users that, for all intents and purposes, your class is a mapping* and can be expected to behave that way.

You can still cast your object to a `dict` just as you require, but there would probably be little reason to do so. Because of duck typing, bothering to cast your mapping object to a `dict` would just be an additional unnecessary step the majority of the time.

As noted in the comments below: it's worth mentioning that doing this the abc way essentially turns your object class into a `dict`-like class (assuming you use `MutableMapping` and not the read-only `Mapping` base class). Everything you would be able to do with `dict`, you could do with your own class object. This may be, or may not be, desirable.
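As a sketch, a hypothetical `AttrMapping` class: once the five abstract methods of `MutableMapping` are provided, mixin methods such as `get`, `items`, and `update` come for free:

```python
from collections.abc import MutableMapping

class AttrMapping(MutableMapping):
    # Hypothetical example: exposes a fixed set of attributes as mapping keys.
    _KEYS = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def __getitem__(self, key):
        if key not in self._KEYS:
            raise KeyError(key)
        return getattr(self, key)

    def __setitem__(self, key, value):
        if key not in self._KEYS:
            raise KeyError(key)
        setattr(self, key, value)

    def __delitem__(self, key):
        # The key set is fixed in this sketch.
        raise TypeError("keys are fixed")

    def __iter__(self):
        return iter(self._KEYS)

    def __len__(self):
        return len(self._KEYS)

m = AttrMapping(1, 2, 3)
print(dict(m))     # {'a': 1, 'b': 2, 'c': 3}
print(m.get("a"))  # 1 -- get() is a free mixin method
```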

Also consider looking at the numeric ABCs in the `numbers` module:

https://docs.python.org/3/library/numbers.html

Since you're also casting your object to an `int`, it might make more sense to essentially turn your class into a full fledged `int` so that casting isn't necessary.
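For the casting part specifically, a minimal sketch: defining `__int__` is enough for `int(obj)` to work (the `sum` attribute here is just an assumption mirroring the discussion above):

```python
class Wharrgarbl:
    # Hypothetical: only the sum attribute matters for int() here.
    def __init__(self, sum):
        self.sum = sum

    def __int__(self):
        # int(obj) calls this method.
        return self.sum

print(int(Wharrgarbl(6)))  # 6
```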

# Option 5:

Look into using the `dataclasses` module (Python 3.7+), which includes a convenient `asdict()` utility method.

``````from dataclasses import dataclass, asdict, field, InitVar

@dataclass
class Wharrgarbl(object):
    a: int
    b: int
    c: int
    sum: InitVar[int]  # note: InitVar will exclude this from the dict
    version: InitVar[str] = "old"

    def __post_init__(self, sum, version):
        self.sum = 6  # this looks like an OP mistake?
        self.version = str(version)
``````

Now you can do this:

``````>>> asdict(Wharrgarbl(1, 2, 3, 4, "X"))
{'a': 1, 'b': 2, 'c': 3}
``````

# Option 6:

Use `typing.TypedDict`, which was added in Python 3.8.

NOTE: option 6 is likely NOT what the OP, or other readers based on the title of this question, are looking for. See additional comments below.

``````from typing import TypedDict

class Wharrgarbl(TypedDict):
    a: str
    b: str
    c: str
``````

Using this option, the resulting object is a `dict` (emphasis: it is not a `Wharrgarbl`). There is no reason at all to "cast" it to a dict (unless you are making a copy).

And since the object is a `dict`, the initialization signature is identical to that of `dict` and as such it only accepts keyword arguments or another dictionary.

``````>>> w = Wharrgarbl(a=1, b=2, c=3)
>>> w
{'a': 1, 'b': 2, 'c': 3}
>>> type(w)
<class 'dict'>
``````

Emphasized: the above "class" `Wharrgarbl` isn't actually a new class at all. It is simply syntactic sugar for creating typed `dict` objects with fields of different types for the type checker.

As such this option can be pretty convenient for signaling to readers of your code (and also to a type checker such as mypy) that such a `dict` object is expected to have specific keys with specific value types.

But this means you cannot, for example, add other methods, although you can try:

``````from typing import TypedDict

class MyDict(TypedDict):
    def my_fancy_method(self):
        return "world changing result"
``````

...but it won't work:

``````>>> MyDict().my_fancy_method()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'my_fancy_method'
``````

* "Mapping" has become the standard name for the `dict`-like duck type

Or, if you are already using pandas, you can do it with `json_normalize()` like so:

``````import pandas as pd

d = {"a": 1,
     "c": {"a": 2, "b": {"x": 5, "y": 10}},
     "d": [1, 2, 3]}

df = pd.json_normalize(d, sep="_")

print(df.to_dict(orient="records")[0])
``````

Output:

``````{'a': 1, 'c_a': 2, 'c_b_x': 5, 'c_b_y': 10, 'd': [1, 2, 3]}
``````