Dictionary Methods in Python | Set 2 (update(), has_key(), fromkeys() …)

1. fromkeys(seq, value): creates a new dictionary with keys taken from seq, each mapped to value (None if value is omitted).

2. update(dict2): adds the key-value pairs of dict2 to the dictionary, overwriting keys that already exist.

# Python code to demonstrate the working of
# fromkeys() and update()

# Initializing dictionary 1
dic1 = {'Name': 'Nandini', 'Age': 19}

# Initializing dictionary 2
dic2 = {'ID': 2541997}

# Initializing sequence
sequ = ('Name', 'Age', 'ID')

# using update() to add dic2's pairs to dic1
dic1.update(dic2)

# printing updated dictionary values
print("The updated dictionary is:")
print(str(dic1))

# using fromkeys() to convert the sequence to a dictionary
# (a new name avoids shadowing the built-in dict)
new_dict = dict.fromkeys(sequ, 5)

# printing new dictionary values
print("The new dictionary values are:")
print(str(new_dict))

Output:

The updated dictionary is: {'Age': 19, 'Name': 'Nandini', 'ID': 2541997}
The new dictionary values are: {'Age': 5, 'Name': 5, 'ID': 5}

3. has_key(key): This function returns True if the specified key is in the dictionary, otherwise it returns False. (Python 2 only: has_key() was removed in Python 3, as discussed further below.)

4. get(key, def_val): This function returns the value associated with the key given in the arguments. If the key is missing, the default value is returned.

# Python code to demonstrate the working of
# has_key() and get()  (Python 2)

# Initializing dictionary
# (named dic to avoid shadowing the built-in dict)
dic = {'Name': 'Nandini', 'Age': 19}

# using has_key() to check whether dic contains a key
if dic.has_key('Name'):
    print("Name is a key")
else:
    print("Name is not a key")

# using get() to print the value of a key
print("The value associated with ID is:")
print(dic.get('ID', "Not Present"))

# printing dictionary values
print("The dictionary values are:")
print(str(dic))

Output:

Name is a key
The value associated with ID is: Not Present
The dictionary values are: {'Name': 'Nandini', 'Age': 19}
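Since has_key() was removed in Python 3 (see the answers further below), the same check there is written with the in operator. A minimal Python 3 rewrite of the block above:

dic = {'Name': 'Nandini', 'Age': 19}

# Python 3: the in operator replaces has_key()
if 'Name' in dic:
    print("Name is a key")
else:
    print("Name is not a key")

# get() is unchanged in Python 3
print("The value associated with ID is:")
print(dic.get('ID', "Not Present"))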

5. setdefault(key, def_value): This function looks for a key and, like get(), returns its value; but if the key is missing, it inserts the key with def_value and returns def_value.

# Python code to demonstrate the working of
# setdefault()

# Initializing the dictionary
dic = {'Name': 'Nandini', 'Age': 19}

# using setdefault() to print the value of a key
print("The value associated with ID is: ", end="")
print(dic.setdefault('ID', "No ID"))

# printing dictionary values
print("The dictionary values are:")
print(str(dic))

Output:

The value associated with ID is: No ID
The dictionary values are: {'Name': 'Nandini', 'Age': 19, 'ID': 'No ID'}
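The practical difference from get() is the side effect: get() leaves the dictionary untouched, while setdefault() inserts the missing key. A small sketch contrasting the two:

dic = {'Name': 'Nandini', 'Age': 19}

print(dic.get('ID', 'No ID'))         # 'No ID' - dic is left unchanged
print('ID' in dic)                    # False

print(dic.setdefault('ID', 'No ID'))  # 'No ID' - and the key is inserted
print('ID' in dic)                    # True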

This article is courtesy of Manjeet Singh.





Dictionary Methods in Python | Set 2 (update(), has_key(), fromkeys() …): StackOverflow Questions

Answer #1

I know object dtype columns make the data hard to convert with pandas functions. When I received data like this, the first thing that came to mind was to "flatten" or unnest the columns.

I am using pandas and Python functions for this type of question. If you are worried about the speed of the above solutions, check user3483203's answer, since it's using numpy and most of the time numpy is faster. I recommend Cython and numba if speed matters.


Method 0 [pandas >= 0.25]
Starting from pandas 0.25, if you only need to explode one column, you can use the pandas.DataFrame.explode function:

df.explode("B")

       A  B
    0  1  1
    0  1  2
    1  2  1
    1  2  2

Given a dataframe with an empty list or a NaN in the column: an empty list will not cause an issue, but a NaN needs to be filled with a list first

df = pd.DataFrame({"A": [1, 2, 3, 4],"B": [[1, 2], [1, 2], [], np.nan]})
df.B = df.B.fillna({i: [] for i in df.index})  # replace NaN with []
df.explode("B")

   A    B
0  1    1
0  1    2
1  2    1
1  2    2
2  3  NaN
3  4  NaN

Method 1
apply + pd.Series (easy to understand, but not recommended in terms of performance)

df.set_index("A").B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:"B"})
Out[463]: 
   A  B
0  1  1
1  1  2
0  2  1
1  2  2

Method 2
Using repeat with the DataFrame constructor to re-create your dataframe (good performance, but not good for multiple columns)

df=pd.DataFrame({"A":df.A.repeat(df.B.str.len()),"B":np.concatenate(df.B.values)})
df
Out[465]: 
   A  B
0  1  1
0  1  2
1  2  1
1  2  2

Method 2.1
Suppose that besides A we also have columns A.1, ..., A.n. If we still use Method 2 above, it is hard to re-create the columns one by one.

Solution: join or merge with the index after "unnesting" the single column

s=pd.DataFrame({"B":np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))
s.join(df.drop("B",1),how="left")
Out[477]: 
   B  A
0  1  1
0  2  1
1  1  2
1  2  2

If you need the column order exactly the same as before, add reindex at the end.

s.join(df.drop("B",1),how="left").reindex(columns=df.columns)

Method 3
recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)
Out[488]: 
   A  B
0  1  1
1  1  2
2  2  1
3  2  2

If more than two columns, use

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])
s.merge(df,left_on=0,right_index=True)
Out[491]: 
   0  1  A       B
0  0  1  1  [1, 2]
1  0  2  1  [1, 2]
2  1  1  2  [1, 2]
3  1  2  2  [1, 2]

Method 4
using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
Out[554]: 
   A  B
0  1  1
0  1  2
1  2  1
1  2  2

#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5
when the list only contains unique values:

df=pd.DataFrame({"A":[1,2],"B":[[1,2],[3,4]]})
from collections import ChainMap
d = dict(ChainMap(*map(dict.fromkeys, df["B"], df["A"])))
pd.DataFrame(list(d.items()),columns=df.columns[::-1])
Out[574]: 
   B  A
0  1  1
1  2  1
2  3  2
3  4  2

Method 6
using numpy for high performance:

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))
pd.DataFrame(data=newvalues[0],columns=df.columns)
   A  B
0  1  1
1  1  2
2  2  1
3  2  2

Method 7
using the base functions itertools.cycle and itertools.chain: a pure Python solution, just for fun

from itertools import cycle,chain
l=df.values.tolist()
l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)
   A  B
0  1  1
1  1  2
2  2  1
3  2  2

Generalizing to multiple columns

df=pd.DataFrame({"A":[1,2],"B":[[1,2],[3,4]],"C":[[1,2],[3,4]]})
df
Out[592]: 
   A       B       C
0  1  [1, 2]  [1, 2]
1  2  [3, 4]  [3, 4]

Self-defined function:

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(explode, 1), how="left")

        
unnesting(df,["B","C"])
Out[609]: 
   B  C  A
0  1  1  1
0  2  2  1
1  3  3  2
1  4  4  2

Column-wise Unnesting

All the methods above deal with vertical unnesting and exploding. If you need to expand a list horizontally, check the pd.DataFrame constructor:

df.join(pd.DataFrame(df.B.tolist(),index=df.index).add_prefix("B_"))
Out[33]: 
   A       B       C  B_0  B_1
0  1  [1, 2]  [1, 2]    1    2
1  2  [3, 4]  [3, 4]    3    4

Updated function

def unnesting(df, explode, axis):
    if axis==1:
        idx = df.index.repeat(df[explode[0]].str.len())
        df1 = pd.concat([
            pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
        df1.index = idx

        return df1.join(df.drop(explode, 1), how="left")
    else :
        df1 = pd.concat([
                         pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(explode, 1), how="left")

Test Output

unnesting(df, ["B","C"], axis=0)
Out[36]: 
   B0  B1  C0  C1  A
0   1   2   1   2  1
1   3   4   3   4  2

Update 2021-02-17 with original explode function

def unnesting(df, explode, axis):
    if axis==1:
        df1 = pd.concat([df[x].explode() for x in explode], axis=1)
        return df1.join(df.drop(explode, 1), how="left")
    else :
        df1 = pd.concat([
                         pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(explode, 1), how="left")
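A quick usage check of this updated helper on the three-column frame from above. Note this is a sketch: df.drop(explode, 1) relies on the positional axis argument, which was removed in pandas 2.0, so it assumes an older pandas.

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [[1, 2], [3, 4]], "C": [[1, 2], [3, 4]]})

print(unnesting(df, ["B", "C"], axis=1))  # vertical: one row per list element
print(unnesting(df, ["B", "C"], axis=0))  # horizontal: one column per list element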

Answer #2

The answer is no, but you can use collections.OrderedDict from the Python standard library with just keys (and values as None) for the same purpose.

Update: As of Python 3.7 (and CPython 3.6), standard dict is guaranteed to preserve order and is more performant than OrderedDict. (For backward compatibility and especially readability, however, you may wish to continue using OrderedDict.)

Here"s an example of how to use dict as an ordered set to filter out duplicate items while preserving order, thereby emulating an ordered set. Use the dict class method fromkeys() to create a dict, then simply ask for the keys() back.

>>> keywords = ["foo", "bar", "bar", "foo", "baz", "foo"]

>>> list(dict.fromkeys(keywords))
["foo", "bar", "baz"]

Answer #3

If we need to keep the element order, how about this:

used = set()
mylist = [u"nowplaying", u"PBS", u"PBS", u"nowplaying", u"job", u"debate", u"thenandnow"]
unique = [x for x in mylist if x not in used and (used.add(x) or True)]

And one more solution using reduce, without the temporary used var.

from functools import reduce  # needed on Python 3

mylist = [u"nowplaying", u"PBS", u"PBS", u"nowplaying", u"job", u"debate", u"thenandnow"]
unique = reduce(lambda l, x: l.append(x) or l if x not in l else l, mylist, [])

UPDATE - Dec, 2020 - Maybe the best approach!

Starting from python 3.7, the standard dict preserves insertion order.

Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6.

So this gives us the ability to use dict.fromkeys for de-duplication!

NOTE: Credits goes to @rlat for giving us this approach in the comments!

mylist = [u"nowplaying", u"PBS", u"PBS", u"nowplaying", u"job", u"debate", u"thenandnow"]
unique = list(dict.fromkeys(mylist))

In terms of speed - for me it's fast enough and readable enough to become my new favorite approach!

UPDATE - March, 2019

And a 3rd solution, which is a neat one, but kind of slow since .index is O(n).

mylist = [u"nowplaying", u"PBS", u"PBS", u"nowplaying", u"job", u"debate", u"thenandnow"]
unique = [x for i, x in enumerate(mylist) if i == mylist.index(x)]

UPDATE - Oct, 2016

Another solution with reduce, but this time without .append which makes it more human readable and easier to understand.

mylist = [u"nowplaying", u"PBS", u"PBS", u"nowplaying", u"job", u"debate", u"thenandnow"]
unique = reduce(lambda l, x: l+[x] if x not in l else l, mylist, [])
# which can also be written as:
unique = reduce(lambda l, x: l if x in l else l+[x], mylist, [])

NOTE: Keep in mind that the more human-readable we get, the less performant the script is, except for the dict.fromkeys approach, which is Python 3.7+ specific.

import timeit

setup = "mylist = [u"nowplaying", u"PBS", u"PBS", u"nowplaying", u"job", u"debate", u"thenandnow"]"

#10x to Michael for pointing out that we can get faster with set()
timeit.timeit("[x for x in mylist if x not in used and (used.add(x) or True)]", setup="used = set();"+setup)
0.2029558869980974

timeit.timeit("[x for x in mylist if x not in used and (used.append(x) or True)]", setup="used = [];"+setup)
0.28999493700030143

# 10x to rlat for suggesting this approach!   
timeit.timeit("list(dict.fromkeys(mylist))", setup=setup)
0.31227896199925453

timeit.timeit("reduce(lambda l, x: l.append(x) or l if x not in l else l, mylist, [])", setup="from functools import reduce;"+setup)
0.7149233570016804

timeit.timeit("reduce(lambda l, x: l+[x] if x not in l else l, mylist, [])", setup="from functools import reduce;"+setup)
0.7379565160008497

timeit.timeit("reduce(lambda l, x: l if x in l else l+[x], mylist, [])", setup="from functools import reduce;"+setup)
0.7400134069976048

timeit.timeit("[x for i, x in enumerate(mylist) if i == mylist.index(x)]", setup=setup)
0.9154880290006986

ANSWERING COMMENTS

Because @monica asked a good question about "how is this working?", and for everyone having problems figuring it out, I will try to give a deeper explanation of how this works and what sorcery is happening here ;)

So she first asked:

I try to understand why unique = [used.append(x) for x in mylist if x not in used] is not working.

Well it"s actually working

>>> used = []
>>> mylist = [u"nowplaying", u"PBS", u"PBS", u"nowplaying", u"job", u"debate", u"thenandnow"]
>>> unique = [used.append(x) for x in mylist if x not in used]
>>> print used
[u"nowplaying", u"PBS", u"job", u"debate", u"thenandnow"]
>>> print unique
[None, None, None, None, None]

The problem is that we are just not getting the desired results inside the unique variable, but only inside the used variable. This is because during the list comprehension .append modifies the used variable and returns None.

So in order to get the results into the unique variable, and still use the same logic with .append(x) if x not in used, we need to move this .append call on the right side of the list comprehension and just return x on the left side.

But if we are too naive and just go with:

>>> unique = [x for x in mylist if x not in used and used.append(x)]
>>> print unique
[]

We will get nothing in return.

Again, this is because the .append method returns None, which gives our logical expression the following look:

x not in used and None

This will basically always:

  1. evaluate to False when x is in used,
  2. evaluate to None when x is not in used.

And in both cases (False/None), the result is treated as a falsy value, so we will get an empty list as a result.

But why does this evaluate to None when x is not in used, someone may ask?

Well, it's because that is how Python's short-circuit operators work.

The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.

So when x is not in used (i.e. when x not in used is True), the next part of the expression will be evaluated (used.append(x)) and its value (None) will be returned.

But that's what we want: in order to get the unique elements from a list with duplicates, we want to .append them into a new list only when they come across for the first time.

So we really want to evaluate used.append(x) only when x is not in used, maybe if there is a way to turn this None value into a truthy one we will be fine, right?

Well, yes, and here is where the second type of short-circuit operator comes into play.

The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.

We know that .append(x) will always be falsy, so if we just add an or next to it, we will always get the next part. That's why we write:

x not in used and (used.append(x) or True)

so we can evaluate used.append(x) and get True as a result, only when the first part of the expression (x not in used) is True.
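Stepping through the full expression in a REPL makes the short-circuiting visible; a tiny sketch:

used = []
x = "foo"

# first pass: x not in used is True, append runs, (None or True) -> True
print(x not in used and (used.append(x) or True))  # True
# second pass: x not in used is False, so the and short-circuits
print(x not in used and (used.append(x) or True))  # False
print(used)  # ['foo']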

A similar fashion can be seen in the second approach, with the reduce method.

(l.append(x) or l) if x not in l else l
# similar to the above, but maybe more readable:
# we return l unchanged when x is in l
# we append x to l and return l when x is not in l
l if x in l else (l.append(x) or l)

where we:

  1. append x to l and return that l when x is not in l. Thanks to the or operator, .append(x) is evaluated and l is returned after that.
  2. return l untouched when x is in l.

Answer #4

In CPython 3.6+ (and all other Python implementations starting with Python 3.7+), dictionaries are ordered, so the way to remove duplicates from an iterable while keeping it in the original order is:

>>> list(dict.fromkeys("abracadabra"))
["a", "b", "r", "c", "d"]

In Python 3.5 and below (including Python 2.7), use the OrderedDict. My timings show that this is now both the fastest and shortest of the various approaches for Python 3.5.

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys("abracadabra"))
["a", "b", "r", "c", "d"]

Answer #5

How can I make as "perfect" a subclass of dict as possible?

The end goal is to have a simple dict in which the keys are lowercase.

  • If I override __getitem__/__setitem__, then get/set don't work. How do I make them work? Surely I don't need to implement them individually?

  • Am I preventing pickling from working, and do I need to implement __setstate__ etc?

  • Do I need repr, update and __init__?

  • Should I just use mutablemapping (it seems one shouldn't use UserDict or DictMixin)? If so, how? The docs aren't exactly enlightening.

The accepted answer would be my first approach, but since it has some issues, and since no one has addressed the alternative, actually subclassing a dict, I'm going to do that here.

What"s wrong with the accepted answer?

This seems like a rather simple request to me:

How can I make as "perfect" a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase.

The accepted answer doesn"t actually subclass dict, and a test for this fails:

>>> isinstance(MyTransformedDict([("Test", "test")]), dict)
False

Ideally, any type-checking code would be testing for the interface we expect, or an abstract base class, but if our data objects are being passed into functions that are testing for dict, and we can't "fix" those functions, this code will fail.

Other quibbles one might make:

  • The accepted answer is also missing the classmethod: fromkeys.
  • The accepted answer also has a redundant __dict__ - therefore taking up more space in memory:

    >>> s.foo = "bar"
    >>> s.__dict__
    {"foo": "bar", "store": {"test": "test"}}
    

Actually subclassing dict

We can reuse the dict methods through inheritance. All we need to do is create an interface layer that ensures keys are passed into the dict in lowercase form if they are strings.

If I override __getitem__/__setitem__, then get/set don't work. How do I make them work? Surely I don't need to implement them individually?

Well, implementing them each individually is the downside to this approach and the upside to using MutableMapping (see the accepted answer), but it's really not that much more work.

First, let"s factor out the difference between Python 2 and 3, create a singleton (_RaiseKeyError) to make sure we know if we actually get an argument to dict.pop, and create a function to ensure our string keys are lowercase:

from itertools import chain
try:              # Python 2
    str_base = basestring
    items = "iteritems"
except NameError: # Python 3
    str_base = str, bytes, bytearray
    items = "items"

_RaiseKeyError = object() # singleton for no-default behavior

def ensure_lower(maybe_str):
    """dict keys can be any hashable object - only call lower if str"""
    return maybe_str.lower() if isinstance(maybe_str, str_base) else maybe_str

Now we implement - I"m using super with the full arguments so that this code works for Python 2 and 3:

class LowerDict(dict):  # dicts take a mapping or iterable as their optional first argument
    __slots__ = () # no __dict__ - that would be redundant
    @staticmethod # because this doesn't make sense as a global function.
    def _process_args(mapping=(), **kwargs):
        if hasattr(mapping, items):
            mapping = getattr(mapping, items)()
        return ((ensure_lower(k), v) for k, v in chain(mapping, getattr(kwargs, items)()))
    def __init__(self, mapping=(), **kwargs):
        super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
    def __getitem__(self, k):
        return super(LowerDict, self).__getitem__(ensure_lower(k))
    def __setitem__(self, k, v):
        return super(LowerDict, self).__setitem__(ensure_lower(k), v)
    def __delitem__(self, k):
        return super(LowerDict, self).__delitem__(ensure_lower(k))
    def get(self, k, default=None):
        return super(LowerDict, self).get(ensure_lower(k), default)
    def setdefault(self, k, default=None):
        return super(LowerDict, self).setdefault(ensure_lower(k), default)
    def pop(self, k, v=_RaiseKeyError):
        if v is _RaiseKeyError:
            return super(LowerDict, self).pop(ensure_lower(k))
        return super(LowerDict, self).pop(ensure_lower(k), v)
    def update(self, mapping=(), **kwargs):
        super(LowerDict, self).update(self._process_args(mapping, **kwargs))
    def __contains__(self, k):
        return super(LowerDict, self).__contains__(ensure_lower(k))
    def copy(self): # don"t delegate w/ super - dict.copy() -> dict :(
        return type(self)(self)
    @classmethod
    def fromkeys(cls, keys, v=None):
        return super(LowerDict, cls).fromkeys((ensure_lower(k) for k in keys), v)
    def __repr__(self):
        return "{0}({1})".format(type(self).__name__, super(LowerDict, self).__repr__())

We use an almost boiler-plate approach for any method or special method that references a key, but otherwise, by inheritance, we get methods: len, clear, items, keys, popitem, and values for free. While this required some careful thought to get right, it is trivial to see that this works.

(Note that has_key was deprecated in Python 2 and removed in Python 3.)

Here"s some usage:

>>> ld = LowerDict(dict(foo="bar"))
>>> ld["FOO"]
"bar"
>>> ld["foo"]
"bar"
>>> ld.pop("FoO")
"bar"
>>> ld.setdefault("Foo")
>>> ld
{"foo": None}
>>> ld.get("Bar")
>>> ld.setdefault("Bar")
>>> ld
{"bar": None, "foo": None}
>>> ld.popitem()
("bar", None)

Am I preventing pickling from working, and do I need to implement __setstate__ etc?

pickling

And the dict subclass pickles just fine:

>>> import pickle
>>> pickle.dumps(ld)
b"x80x03c__main__
LowerDict
qx00)x81qx01Xx03x00x00x00fooqx02Ns."
>>> pickle.loads(pickle.dumps(ld))
{"foo": None}
>>> type(pickle.loads(pickle.dumps(ld)))
<class "__main__.LowerDict">

__repr__

Do I need repr, update and __init__?

We defined update and __init__, but you have a beautiful __repr__ by default:

>>> ld # without __repr__ defined for the class, we get this
{"foo": None}

However, it"s good to write a __repr__ to improve the debugability of your code. The ideal test is eval(repr(obj)) == obj. If it"s easy to do for your code, I strongly recommend it:

>>> ld = LowerDict({})
>>> eval(repr(ld)) == ld
True
>>> ld = LowerDict(dict(a=1, b=2, c=3))
>>> eval(repr(ld)) == ld
True

You see, it"s exactly what we need to recreate an equivalent object - this is something that might show up in our logs or in backtraces:

>>> ld
LowerDict({"a": 1, "c": 3, "b": 2})

Conclusion

Should I just use mutablemapping (it seems one shouldn't use UserDict or DictMixin)? If so, how? The docs aren't exactly enlightening.

Yeah, these are a few more lines of code, but they're intended to be comprehensive. My first inclination would be to use the accepted answer, and if there were issues with it, I'd then look at my answer - as it's a little more complicated, and there's no ABC to help me get my interface right.

Premature optimization is going for greater complexity in search of performance. MutableMapping is simpler - so it gets an immediate edge, all else being equal. Nevertheless, to lay out all the differences, let"s compare and contrast.

I should add that there was a push to put a similar dictionary into the collections module, but it was rejected. You should probably just do this instead:

my_dict[transform(key)]

It should be far more easily debuggable.

Compare and contrast

There are 6 interface functions implemented with the MutableMapping (which is missing fromkeys) and 11 with the dict subclass. I don't need to implement __iter__ or __len__, but instead I have to implement get, setdefault, pop, update, copy, __contains__, and fromkeys - but these are fairly trivial, since I can use inheritance for most of those implementations.

The MutableMapping implements some things in Python that dict implements in C - so I would expect a dict subclass to be more performant in some cases.

We get a free __eq__ in both approaches - both of which assume equality only if another dict is all lowercase - but again, I think the dict subclass will compare more quickly.
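For side-by-side reading, here is a minimal sketch of the MutableMapping approach being contrasted; the class name LowerDictMM and the helper _lower are illustrative, not the accepted answer's exact code:

from collections.abc import MutableMapping

class LowerDictMM(MutableMapping):
    """Lowercase-key mapping built on MutableMapping (wraps an inner dict)."""
    @staticmethod
    def _lower(key):
        return key.lower() if isinstance(key, str) else key
    def __init__(self, *args, **kwargs):
        self._store = {}
        self.update(dict(*args, **kwargs))  # update() comes free from the ABC
    def __getitem__(self, key):
        return self._store[self._lower(key)]
    def __setitem__(self, key, value):
        self._store[self._lower(key)] = value
    def __delitem__(self, key):
        del self._store[self._lower(key)]
    def __iter__(self):
        return iter(self._store)
    def __len__(self):
        return len(self._store)

print(isinstance(LowerDictMM({"Foo": 1}), dict))  # False - the isinstance quibble above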

Summary:

  • subclassing MutableMapping is simpler with fewer opportunities for bugs, but slower, takes more memory (see redundant dict), and fails isinstance(x, dict)
  • subclassing dict is faster, uses less memory, and passes isinstance(x, dict), but it has greater complexity to implement.

Which is more perfect? That depends on your definition of perfect.

Answer #6

Super simple in-place assignment: df["new"] = 0

For in-place modification, perform direct assignment. This assignment is broadcasted by pandas for each row.

df = pd.DataFrame("x", index=range(4), columns=list("ABC"))
df

   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x

df["new"] = "y"
# Same as,
# df.loc[:, "new"] = "y"
df

   A  B  C new
0  x  x  x   y
1  x  x  x   y
2  x  x  x   y
3  x  x  x   y

Note for object columns

If you want to add a column of empty lists, here is my advice:

  • Consider not doing this. object columns are bad news in terms of performance. Rethink how your data is structured.
  • Consider storing your data in a sparse data structure. More information: sparse data structures
  • If you must store a column of lists, ensure not to copy the same reference multiple times (see the demonstration after this list).

    # Wrong
    df["new"] = [[]] * len(df)
    # Right
    df["new"] = [[] for _ in range(len(df))]
    

Generating a copy: df.assign(new=0)

If you need a copy instead, use DataFrame.assign:

df.assign(new="y")

   A  B  C new
0  x  x  x   y
1  x  x  x   y
2  x  x  x   y
3  x  x  x   y

And, if you need to assign multiple such columns with the same value, this is as simple as,

c = ["new1", "new2", ...]
df.assign(**dict.fromkeys(c, "y"))

   A  B  C new1 new2
0  x  x  x    y    y
1  x  x  x    y    y
2  x  x  x    y    y
3  x  x  x    y    y

Multiple column assignment

Finally, if you need to assign multiple columns with different values, you can use assign with a dictionary.

c = {"new1": "w", "new2": "y", "new3": "z"}
df.assign(**c)

   A  B  C new1 new2 new3
0  x  x  x    w    y    z
1  x  x  x    w    y    z
2  x  x  x    w    y    z
3  x  x  x    w    y    z

Answer #7

For convenience of usage, I sum up the note on stripping punctuation from a string in both Python 2 and Python 3. Please refer to other answers for the detailed description.


Python 2

import string

s = "string. With. Punctuation?"
table = string.maketrans("", "")
new_s = s.translate(table, string.punctuation)      # Output: string without punctuation

Python 3

import string

s = "string. With. Punctuation?"
table = str.maketrans(dict.fromkeys(string.punctuation))  # OR {key: None for key in string.punctuation}
new_s = s.translate(table)                          # Output: string without punctuation

Answer #8

What is the difference between @staticmethod and @classmethod in Python?

You may have seen Python code like this pseudocode, which demonstrates the signatures of the various method types and provides a docstring to explain each:

class Foo(object):

    def a_normal_instance_method(self, arg_1, kwarg_2=None):
        """
        Return a value that is a function of the instance with its
        attributes, and other arguments such as arg_1 and kwarg2
        """

    @staticmethod
    def a_static_method(arg_0):
        """
        Return a value that is a function of arg_0. It does not know the 
        instance or class it is called from.
        """

    @classmethod
    def a_class_method(cls, arg1):
        """
        Return a value that is a function of the class and other arguments.
        respects subclassing, it is called with the class it is called from.
        """

The Normal Instance Method

First I"ll explain a_normal_instance_method. This is precisely called an "instance method". When an instance method is used, it is used as a partial function (as opposed to a total function, defined for all values when viewed in source code) that is, when used, the first of the arguments is predefined as the instance of the object, with all of its given attributes. It has the instance of the object bound to it, and it must be called from an instance of the object. Typically, it will access various attributes of the instance.

For example, this is an instance of a string:

", "

if we use the instance method, join on this string, to join another iterable, it quite obviously is a function of the instance, in addition to being a function of the iterable list, ["a", "b", "c"]:

>>> ", ".join(["a", "b", "c"])
"a, b, c"

Bound methods

Instance methods can be bound via a dotted lookup for use later.

For example, this binds the str.join method to the ":" instance:

>>> join_with_colons = ":".join 

And later we can use this as a function that already has the first argument bound to it. In this way, it works like a partial function on the instance:

>>> join_with_colons("abcde")
"a:b:c:d:e"
>>> join_with_colons(["FF", "FF", "FF", "FF", "FF", "FF"])
"FF:FF:FF:FF:FF:FF"

Static Method

The static method does not take the instance as an argument.

It is very similar to a module level function.

However, a module level function must live in the module and be specially imported to other places where it is used.

If it is attached to the object, however, it will follow the object conveniently through importing and inheritance as well.

An example of a static method is str.maketrans, moved from the string module in Python 3. It makes a translation table suitable for consumption by str.translate. It does seem rather silly when used from an instance of a string, as demonstrated below, but importing the function from the string module is rather clumsy, and it's nice to be able to call it from the class, as in str.maketrans.

# demonstrate same function whether called from instance or not:
>>> ", ".maketrans("ABC", "abc")
{65: 97, 66: 98, 67: 99}
>>> str.maketrans("ABC", "abc")
{65: 97, 66: 98, 67: 99}

In python 2, you have to import this function from the increasingly less useful string module:

>>> import string
>>> "ABCDEFG".translate(string.maketrans("ABC", "abc"))
"abcDEFG"

Class Method

A class method is similar to an instance method in that it takes an implicit first argument, but instead of taking the instance, it takes the class. Frequently these are used as alternative constructors for better semantic usage, and they support inheritance.

The most canonical example of a builtin classmethod is dict.fromkeys. It is used as an alternative constructor of dict (well suited for when you know what your keys are and want a default value for them).

>>> dict.fromkeys(["a", "b", "c"])
{"c": None, "b": None, "a": None}

When we subclass dict, we can use the same constructor, which creates an instance of the subclass.

>>> class MyDict(dict): "A dict subclass, used to demo classmethods"
>>> md = MyDict.fromkeys(["a", "b", "c"])
>>> md
{"a": None, "c": None, "b": None}
>>> type(md)
<class "__main__.MyDict">

See the pandas source code for other similar examples of alternative constructors, and see also the official Python documentation on classmethod and staticmethod.

Answer #9

Options to remove duplicates may include the following generic data structures:

  • set: unordered, unique elements
  • ordered set: ordered, unique elements

Here is a summary on quickly getting either one in Python.

Given

from collections import OrderedDict


seq = [u"nowplaying", u"PBS", u"PBS", u"nowplaying", u"job", u"debate", u"thenandnow"]

Code

Option 1 - A set (unordered):

list(set(seq))
# ["thenandnow", "PBS", "debate", "job", "nowplaying"]

Option 2 - Python doesn"t have ordered sets, but here are some ways to mimic one (insertion ordered):

list(OrderedDict.fromkeys(seq))
# ["nowplaying", "PBS", "job", "debate", "thenandnow"]

list(dict.fromkeys(seq))                               # py36
# ["nowplaying", "PBS", "job", "debate", "thenandnow"]

The last option is recommended if using Python 3.6+. See more details in this post.

Note: listed elements must be hashable. See details on the latter example in this blog post. Furthermore, see R. Hettinger's post on the same technique; the order-preserving dict is extended from one of his early implementations. See also more on total ordering.
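Regarding the hashability note, a one-line check of what happens with unhashable elements:

# dict.fromkeys requires hashable elements, just like set()
try:
    dict.fromkeys([[1, 2], [3, 4]])
except TypeError as e:
    print(e)  # unhashable type: 'list'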

Answer #10

How can I merge two Python dictionaries in a single expression?

For dictionaries x and y, z becomes a shallowly-merged dictionary with values from y replacing those from x.

  • In Python 3.9.0 or greater (released 17 October 2020): PEP-584, discussed here, was implemented and provides the simplest method:

    z = x | y          # NOTE: 3.9+ ONLY
    
  • In Python 3.5 or greater:

    z = {**x, **y}
    
  • In Python 2, (or 3.4 or lower) write a function:

    def merge_two_dicts(x, y):
        z = x.copy()   # start with keys and values of x
        z.update(y)    # modifies z with keys and values of y
        return z
    

    and now:

    z = merge_two_dicts(x, y)
    

Explanation

Say you have two dictionaries and you want to merge them into a new dictionary without altering the original dictionaries:

x = {"a": 1, "b": 2}
y = {"b": 3, "c": 4}

The desired result is to get a new dictionary (z) with the values merged, and the second dictionary's values overwriting those from the first.

>>> z
{"a": 1, "b": 3, "c": 4}

A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is

z = {**x, **y}

And it is indeed a single expression.

Note that we can merge in with literal notation as well:

z = {**x, "foo": 1, "bar": 2, **y}

and now:

>>> z
{"a": 1, "b": 3, "foo": 1, "bar": 2, "c": 4}

It is now showing as implemented in the release schedule for 3.5, PEP 478, and it has now made its way into the What's New in Python 3.5 document.

However, since many organizations are still on Python 2, you may wish to do this in a backward-compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:

z = x.copy()
z.update(y) # which returns None since it mutates z

In both approaches, y will come second and its values will replace x's values, thus b will point to 3 in our final result.

Not yet on Python 3.5, but want a single expression

If you are not yet on Python 3.5, or need to write backward-compatible code, and you want this in a single expression, the most performant correct approach is to put it in a function:

def merge_two_dicts(x, y):
    """Given two dictionaries, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z

and then you have a single expression:

z = merge_two_dicts(x, y)

You can also make a function to merge an arbitrary number of dictionaries, from zero to a very large number:

def merge_dicts(*dict_args):
    """
    Given any number of dictionaries, shallow copy and merge into a new dict,
    precedence goes to key-value pairs in latter dictionaries.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

This function will work in Python 2 and 3 for all dictionaries. e.g. given dictionaries a to g:

z = merge_dicts(a, b, c, d, e, f, g) 

and key-value pairs in g will take precedence over dictionaries a to f, and so on.

Critiques of Other Answers

Don"t use what you see in the formerly accepted answer:

z = dict(x.items() + y.items())

In Python 2, you create two lists in memory for each dict, create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you're adding two dict_items objects together, not two lists -

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: "dict_items" and "dict_items"

and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.

Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don't do this:

>>> c = dict(a.items() | b.items())

This example demonstrates what happens when values are unhashable:

>>> x = {"a": []}
>>> y = {"b": []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: "list"

Here"s an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

>>> x = {"a": 2}
>>> y = {"a": 1}
>>> dict(x.items() | y.items())
{"a": 2}

Another hack you should not use:

z = dict(x, **y)

This uses the dict constructor and is very fast and memory-efficient (even slightly more so than our two-step process), but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it's difficult to read, it's not the intended usage, and so it is not Pythonic.

Here"s an example of the usage being remediated in django.

Dictionaries are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

From the mailing list, Guido van Rossum, the creator of the language, wrote:

I am fine with declaring dict({}, **{1:3}) illegal, since after all it is abuse of the ** mechanism.

and

Apparently dict(x, **y) is going around as "cool hack" for "call x.update(y) and return x". Personally, I find it more despicable than cool.

It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dictionaries for readability purposes, e.g.:

dict(a=1, b=10, c=11)

instead of

{"a": 1, "b": 10, "c": 11}

Response to comments

Despite what Guido says, dict(x, **y) is in line with the dict specification, which, by the way, works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a shortcoming of dict. Nor is using the ** operator in this place an abuse of the mechanism; in fact, ** was designed precisely to pass dictionaries as keywords.

Again, it doesn"t work for 3 when keys are not strings. The implicit calling contract is that namespaces take ordinary dictionaries, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:

>>> foo(**{("a", "b"): None})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{("a", "b"): None})
{("a", "b"): None}

This inconsistency was bad given other implementations of Python (PyPy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.

I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.

More comments:

dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.

My response: merge_two_dicts(x, y) actually seems much clearer to me, if we're actually concerned about readability. And it is not forward compatible, as Python 2 is increasingly deprecated.

{**x, **y} does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged [...] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word "merging" these answers describe "updating one dict with another", and not merging.

Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first's values being overwritten by the second's - in a single expression.

Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dictionaries from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:

from copy import deepcopy

def dict_of_dicts_merge(x, y):
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z

Usage:

>>> x = {"a":{1:{}}, "b": {2:{}}}
>>> y = {"b":{10:{}}, "c": {11:{}}}
>>> dict_of_dicts_merge(x, y)
{"b": {2: {}, 10: {}}, "a": {1: {}}, "c": {11: {}}}

Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a "Dictionaries of dictionaries merge".

Less Performant But Correct Ad-hocs

These approaches are less performant, but they will provide correct behavior. They will be much less performant than copy and update or the new unpacking, because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (latter dictionaries have precedence).

You can also chain the dictionaries manually inside a dict comprehension:

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

or in Python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):

dict((k, v) for d in dicts for k, v in d.items()) # iteritems in Python 2

itertools.chain will chain the iterators over the key-value pairs in the correct order:

from itertools import chain
z = dict(chain(x.items(), y.items())) # iteritems in Python 2

Performance Analysis

I"m only going to do the performance analysis of the usages known to behave correctly. (Self-contained so you can copy and paste yourself.)

from timeit import repeat
from itertools import chain

x = dict.fromkeys("abcdefg")
y = dict.fromkeys("efghijk")

def merge_two_dicts(x, y):
    z = x.copy()
    z.update(y)
    return z

min(repeat(lambda: {**x, **y}))
min(repeat(lambda: merge_two_dicts(x, y)))
min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
min(repeat(lambda: dict(chain(x.items(), y.items()))))
min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))

In Python 3.8.1, NixOS:

>>> min(repeat(lambda: {**x, **y}))
1.0804965235292912
>>> min(repeat(lambda: merge_two_dicts(x, y)))
1.636518670246005
>>> min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
3.1779992282390594
>>> min(repeat(lambda: dict(chain(x.items(), y.items()))))
2.740647904574871
>>> min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
4.266070580109954
$ uname -a
Linux nixos 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux


Should I use "has_key()" or "in" on Python dicts?

I wonder which is better to do:

d = {"a": 1, "b": 2}
"a" in d
True

or:

d = {"a": 1, "b": 2}
d.has_key("a")
True

Answer #1

has_key was removed in Python 3. From the documentation:

  • Removed dict.has_key() – use the in operator instead.

Here"s an example:

if start not in graph:
    return None

Answer #2

os.environ behaves like a python dictionary, so all the common dictionary operations can be performed. In addition to the get and set operations mentioned in the other answers, we can also simply check if a key exists. The keys and values should be stored as strings.

Python 3

For python 3, dictionaries use the in keyword instead of has_key

>>> import os
>>> "HOME" in os.environ  # Check an existing env. variable
True
...

Python 2

>>> import os
>>> os.environ.has_key("HOME")  # Check an existing env. variable
True
>>> os.environ.has_key("FOO")   # Check for a non existing variable
False
>>> os.environ["FOO"] = "1"     # Set a new env. variable (String value)
>>> os.environ.has_key("FOO")
True
>>> os.environ.get("FOO")       # Retrieve the value
"1"

There is one important thing to note about using os.environ:

Although child processes inherit the environment from the parent process, I ran into an issue recently and figured out that if other scripts update the environment while your Python script is running, calling os.environ again will not reflect the latest values.

Excerpt from the docs:

This mapping is captured the first time the os module is imported, typically during Python startup as part of processing site.py. Changes to the environment made after this time are not reflected in os.environ, except for changes made by modifying os.environ directly.

os.environ.data, which stores all the environment variables, is a dict object containing all the environment values:

>>> type(os.environ.data)  # changed to _data since v3.2 (refer comment below)
<type "dict">

Answer #3

in is definitely more pythonic.

In fact has_key() was removed in Python 3.x.

Answer #4

As of today, there are four available approaches, two of them requiring a certain storage backend:

  1. Django-eav (the original package is no longer maintained but has some thriving forks)

    This solution is based on Entity Attribute Value data model, essentially, it uses several tables to store dynamic attributes of objects. Great parts about this solution is that it:

    • uses several pure and simple Django models to represent dynamic fields, which makes it simple to understand and database-agnostic;
    • allows you to effectively attach/detach dynamic attribute storage to Django model with simple commands like:

      eav.unregister(Encounter)
      eav.register(Patient)
      
    • Nicely integrates with Django admin;

    • At the same time being really powerful.

    Downsides:

    • Not very efficient. This is more of a criticism of the EAV pattern itself, which requires manually merging the data from a column format to a set of key-value pairs in the model.
    • Harder to maintain. Maintaining data integrity requires a multi-column unique key constraint, which may be inefficient on some databases.
    • You will need to select one of the forks, since the official package is no longer maintained and there is no clear leader.

    The usage is pretty straightforward:

    import eav
    from app.models import Patient, Encounter
    
    eav.register(Encounter)
    eav.register(Patient)
    Attribute.objects.create(name="age", datatype=Attribute.TYPE_INT)
    Attribute.objects.create(name="height", datatype=Attribute.TYPE_FLOAT)
    Attribute.objects.create(name="weight", datatype=Attribute.TYPE_FLOAT)
    Attribute.objects.create(name="city", datatype=Attribute.TYPE_TEXT)
    Attribute.objects.create(name="country", datatype=Attribute.TYPE_TEXT)
    
    self.yes = EnumValue.objects.create(value="yes")
    self.no = EnumValue.objects.create(value="no")
    self.unkown = EnumValue.objects.create(value="unkown")
    ynu = EnumGroup.objects.create(name="Yes / No / Unknown")
    ynu.enums.add(self.yes)
    ynu.enums.add(self.no)
    ynu.enums.add(self.unkown)
    
    Attribute.objects.create(name="fever", datatype=Attribute.TYPE_ENUM,
                                           enum_group=ynu)
    
    # When you register a model within EAV,
    # you can access all of EAV attributes:
    
    Patient.objects.create(name="Bob", eav__age=12,
                               eav__fever=no, eav__city="New York",
                               eav__country="USA")
    # You can filter queries based on their EAV fields:
    
    query1 = Patient.objects.filter(Q(eav__city__contains="Y"))
    query2 = Q(eav__city__contains="Y") |  Q(eav__fever=no)
    
  2. Hstore, JSON or JSONB fields in PostgreSQL

    PostgreSQL supports several more complex data types. Most are supported via third-party packages, but in recent years Django has adopted them into django.contrib.postgres.fields.

    HStoreField:

    Django-hstore was originally a third-party package, but Django 1.8 added HStoreField as a built-in, along with several other PostgreSQL-supported field types.

    This approach is good in the sense that it lets you have the best of both worlds: dynamic fields and a relational database. However, hstore is not ideal performance-wise, especially if you are going to end up storing thousands of items in one field. It also only supports strings for values.

    #app/models.py
    from django.contrib.postgres.fields import HStoreField
    class Something(models.Model):
        name = models.CharField(max_length=32)
        data = models.HStoreField(db_index=True)
    

    In Django"s shell you can use it like this:

    >>> instance = Something.objects.create(
                     name="something",
                     data={"a": "1", "b": "2"}
               )
    >>> instance.data["a"]
    "1"        
    >>> empty = Something.objects.create(name="empty")
    >>> empty.data
    {}
    >>> empty.data["a"] = "1"
    >>> empty.save()
    >>> Something.objects.get(name="something").data["a"]
    "1"
    

    You can issue indexed queries against hstore fields:

    # equivalence
    Something.objects.filter(data={"a": "1", "b": "2"})
    
    # subset by key/value mapping
    Something.objects.filter(data__a="1")
    
    # subset by list of keys
    Something.objects.filter(data__has_keys=["a", "b"])
    
    # subset by single key
    Something.objects.filter(data__has_key="a")    
    

    JSONField:

    JSON/JSONB fields support any JSON-encodable data type, not just key/value pairs, but also tend to be faster and (for JSONB) more compact than Hstore. Several packages implement JSON/JSONB fields including django-pgfields, but as of Django 1.9, JSONField is a built-in using JSONB for storage. JSONField is similar to HStoreField, and may perform better with large dictionaries. It also supports types other than strings, such as integers, booleans and nested dictionaries.

    #app/models.py
    from django.contrib.postgres.fields import JSONField
    class Something(models.Model):
        name = models.CharField(max_length=32)
        data = JSONField(db_index=True)
    

    Creating in the shell:

    >>> instance = Something.objects.create(
                     name="something",
                     data={"a": 1, "b": 2, "nested": {"c":3}}
               )
    

    Indexed queries are nearly identical to HStoreField, except nesting is possible. Complex indexes may require manual creation (or a scripted migration).

    >>> Something.objects.filter(data__a=1)
    >>> Something.objects.filter(data__nested__c=3)
    >>> Something.objects.filter(data__has_key="a")
    
  3. Django MongoDB

    Or other NoSQL Django adaptations -- with them you can have fully dynamic models.

    NoSQL Django libraries are great, but keep in mind that they are not 100% Django-compatible. For example, to migrate to Django-nonrel from standard Django you will need to replace ManyToMany with ListField, among other things.

    Check out this Django MongoDB example:

    from djangotoolbox.fields import DictField
    
    class Image(models.Model):
        exif = DictField()
    ...
    
    >>> image = Image.objects.create(exif=get_exif_data(...))
    >>> image.exif
    {u"camera_model" : "Spamcams 4242", "exposure_time" : 0.3, ...}
    

    You can even create embedded lists of any Django models:

    class Container(models.Model):
        stuff = ListField(EmbeddedModelField())
    
    class FooModel(models.Model):
        foo = models.IntegerField()
    
    class BarModel(models.Model):
        bar = models.CharField()
    ...
    
    >>> Container.objects.create(
        stuff=[FooModel(foo=42), BarModel(bar="spam")]
    )
    
  4. Django-mutant: Dynamic models based on syncdb and South-hooks

    Django-mutant implements fully dynamic Foreign Key and m2m fields, and is inspired by the incredible but somewhat hackish solutions by Will Hardy and Michael Hall.

    All of these are based on Django South hooks, which, according to Will Hardy's talk at DjangoCon 2011 (watch it!), are nevertheless robust and tested in production (relevant source code).

    First to implement this was Michael Hall.

    Yes, this is magic: with these approaches you can achieve fully dynamic Django apps, models and fields with any relational database backend. But at what cost? Will the stability of the application suffer under heavy use? These are questions to consider. You also need to maintain a proper lock to allow simultaneous database-altering requests.

    If you are using Michael Hall's lib, your code will look like this:

    from dynamo import models
    
    test_app, created = models.DynamicApp.objects.get_or_create(
                          name="dynamo"
                        )
    test, created = models.DynamicModel.objects.get_or_create(
                      name="Test",
                      verbose_name="Test Model",
                      app=test_app
                   )
    foo, created = models.DynamicModelField.objects.get_or_create(
                      name = "foo",
                      verbose_name = "Foo Field",
                      model = test,
                      field_type = "dynamiccharfield",
                      null = True,
                      blank = True,
                      unique = False,
                      help_text = "Test field for Foo",
                   )
    bar, created = models.DynamicModelField.objects.get_or_create(
                      name = "bar",
                      verbose_name = "Bar Field",
                      model = test,
                      field_type = "dynamicintegerfield",
                      null = True,
                      blank = True,
                      unique = False,
                      help_text = "Test field for Bar",
                   )
    

Answer #5

You can test for the presence of a key in a dictionary, using the in keyword:

d = {"a": 1, "b": 2}
"a" in d # <== evaluates to True
"c" in d # <== evaluates to False

A common use for checking the existence of a key in a dictionary before mutating it is to default-initialize the value (e.g. if your values are lists, for example, and you want to ensure that there is an empty list to which you can append when inserting the first value for a key). In cases such as those, you may find the collections.defaultdict() type to be of interest.
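A minimal sketch of that defaultdict pattern for the list case just described:

from collections import defaultdict

# missing keys are default-initialized to an empty list on first access
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)   # no existence check needed

print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana']}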

In older code, you may also find some uses of has_key(), a deprecated method for checking the existence of keys in dictionaries (just use key_name in dict_name, instead).

Answer #6

in wins hands-down, not just in elegance (and not being deprecated;-) but also in performance, e.g.:

$ python -mtimeit -s"d=dict.fromkeys(range(99))" "12 in d"
10000000 loops, best of 3: 0.0983 usec per loop
$ python -mtimeit -s"d=dict.fromkeys(range(99))" "d.has_key(12)"
1000000 loops, best of 3: 0.21 usec per loop

While the following observation is not always true, you'll notice that usually, in Python, the faster solution is more elegant and Pythonic; that's why -mtimeit is SO helpful - it's not just about saving a hundred nanoseconds here and there!-)

Answer #7

Your first example is perfectly fine. Even the official Python documentation recommends this style known as EAFP.

Personally, I prefer to avoid nesting when it's not necessary:

def __getattribute__(self, item):
    try:
        return object.__getattribute__(self, item)
    except AttributeError:
        pass  # Fall back to the dict
    try:
        return self.dict[item]
    except KeyError:
        raise AttributeError("The object doesn't have such an attribute") from None

PS. has_key() has been deprecated for a long time in Python 2. Use item in self.dict instead.
