Create An Empty Javascript List

| | | | | | | |

Contents

To initialize a list in Python, assign one with square brackets, initialize with the list () function, create an empty list with multiplication, or use a list comprehension. The most common way to declare a list in Python is to use square brackets.

A list is a data structure in Python that contains an ordered sequence of zero or more elements . Lists in Python are editable, which means they can be edited. They can contain any type of data, such as a string or a dictionary.

When you start working with lists you might ask yourself: how do I initialize a list in Python ? Or, in other words: how to create a Python list ?

This tutorial will answer that question and look at ways to initialize a list in Python. We’ll go over the basics of lists and how you can initialize a list using bracketed notation, list (), list multiplication, and understanding lists.


Updating the Python list

lists store zero or more items. Each element of a list is named element . Lists are often referred to as Python array .

In Python, the elements of a list are separated by commas and the list itself is surrounded by a square parenthesis. Here is an example of a list in Python:

Our list contains three values: Software Engineer, Web Developer and Data Analyst. We can reference our list using the "jobs" variable we declared above.


How to create a list in Python

You can initialize a list in Python using brackets, the list () method, multiplying lists and understanding lists.

The brackets allow you to initialize an empty list or a list that contains predefined values. The list () method works the same as square brackets. Although you need to add brackets and a list of items inside the list () method if you want to add some initial values.

List multiplication allows you to create a new list that repeats the elements of an existing list. Understanding Lists allows you to create a new list from the contents of an existing list.


Python Create List: brackets

You can use brackets or the () list method to initialize an empty list in Python. For the purposes of this tutorial, we’ll focus on how you can use square brackets or the list () method to initialize an empty list.

If you want to create an empty list list without values, you can declare your ad in two ways. First, you can declare a list with no values ‚Äã‚Äãby specifying a set of brackets with no component values. Here is an example of this syntax:

Our code returns: [].

Square brackets with nothing between them represent an empty list.

We can specify some default values ‚Äã‚Äãby appending them between our square brackets:

We have declared a list object with two initial values: "Software Engineer" and "Data Analyst".

Python Create List: list () method

Another way to create an empty list in Python with no values ‚Äã‚Äãis to use the list () method . Here is an example of the list () method in action:

Our code returns: [].

These first two approaches return the same answer: an empty list. There are no standards for when it is best to use one of these approaches. The empty square brackets ([]) approach is generally used because it is more concise.

Python Declare List: List Multiplication

One approach to initialize a list with multiple values ‚Äã‚Äãis list multiplication. This approach allows you to create a list with a number of predefined values.

So let’s say we create a program that asks us to name our top 10 favorite books. We want to initialize an array that will store the books that we insert. We could do this using the following code:

Our code returns:

Our program has created a list that contains 10 values.

Let’s break down our code. In the first line, we use the multiplication syntax to declare a list with 10 values. The value we use for each item in this list is "" or an empty string. Next, we use the Python print () function to print our list to the console.

We can use this syntax to initialize a list with whatever value we want. If we wanted our list to start with 10 values ‚Äã‚Äãequal to Choose a book., we could use the following code:

When we run our code, we create a list with 10 values. Each value in the list is equal to Choose a book.. We print this list on the console:

Python Declare List: List Understanding

In the previous section, we showed how to initialize a list using list multiplication. This approach is useful because it allows you to initialize a list with default values.

You can also initialize a list with default values ‚Äã‚Äãusing the list comprehension method. An Understanding Python List refers to a technique that allows you to create lists using an existing iterable object. The iterable object can be a list or an range () statement or some other type of iterable.

Understanding lists is a useful way to define an iterator-based list because it is elegant, simple, and broadly recognized.

Let’s say we want to create a list with 10 values ‚Äã‚Äãequal to Pick a book., as we did in the previous example. We could do this by understanding the lists using the following code:

Our code returns:

In the example above, we are using list comprehension to create a list.The list comprehension statement is the iterator for i in range ( 10), used to create a list that repeats the value Choose a book ten times.

Conclusion

Initializing a list is a fundamental part of working with lists This tutorial explains how to initialize empty lists using square brackets and list () This is also shows how to use list multiplication and understand the techniques to create a list containing a number of values.

You are now ready to start initializing lists in Python like a pro!

Read our How to learn Python for tips on learning more about the Python programming language.

Create An Empty Javascript List: StackOverflow Questions

How do I merge two dictionaries in a single expression (taking union of dictionaries)?

Question by Carl Meyer

I have two Python dictionaries, and I want to write a single expression that returns these two dictionaries, merged (i.e. taking the union). The update() method would be what I need, if it returned its result instead of modifying a dictionary in-place.

>>> x = {"a": 1, "b": 2}
>>> y = {"b": 10, "c": 11}
>>> z = x.update(y)
>>> print(z)
None
>>> x
{"a": 1, "b": 10, "c": 11}

How can I get that final merged dictionary in z, not x?

(To be extra-clear, the last-one-wins conflict-handling of dict.update() is what I"m looking for as well.)

Answer #1:

How can I merge two Python dictionaries in a single expression?

For dictionaries x and y, z becomes a shallowly-merged dictionary with values from y replacing those from x.

  • In Python 3.9.0 or greater (released 17 October 2020): PEP-584, discussed here, was implemented and provides the simplest method:

    z = x | y          # NOTE: 3.9+ ONLY
    
  • In Python 3.5 or greater:

    z = {**x, **y}
    
  • In Python 2, (or 3.4 or lower) write a function:

    def merge_two_dicts(x, y):
        z = x.copy()   # start with keys and values of x
        z.update(y)    # modifies z with keys and values of y
        return z
    

    and now:

    z = merge_two_dicts(x, y)
    

Explanation

Say you have two dictionaries and you want to merge them into a new dictionary without altering the original dictionaries:

x = {"a": 1, "b": 2}
y = {"b": 3, "c": 4}

The desired result is to get a new dictionary (z) with the values merged, and the second dictionary"s values overwriting those from the first.

>>> z
{"a": 1, "b": 3, "c": 4}

A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is

z = {**x, **y}

And it is indeed a single expression.

Note that we can merge in with literal notation as well:

z = {**x, "foo": 1, "bar": 2, **y}

and now:

>>> z
{"a": 1, "b": 3, "foo": 1, "bar": 2, "c": 4}

It is now showing as implemented in the release schedule for 3.5, PEP 478, and it has now made its way into the What"s New in Python 3.5 document.

However, since many organizations are still on Python 2, you may wish to do this in a backward-compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:

z = x.copy()
z.update(y) # which returns None since it mutates z

In both approaches, y will come second and its values will replace x"s values, thus b will point to 3 in our final result.

Not yet on Python 3.5, but want a single expression

If you are not yet on Python 3.5 or need to write backward-compatible code, and you want this in a single expression, the most performant while the correct approach is to put it in a function:

def merge_two_dicts(x, y):
    """Given two dictionaries, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z

and then you have a single expression:

z = merge_two_dicts(x, y)

You can also make a function to merge an arbitrary number of dictionaries, from zero to a very large number:

def merge_dicts(*dict_args):
    """
    Given any number of dictionaries, shallow copy and merge into a new dict,
    precedence goes to key-value pairs in latter dictionaries.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

This function will work in Python 2 and 3 for all dictionaries. e.g. given dictionaries a to g:

z = merge_dicts(a, b, c, d, e, f, g) 

and key-value pairs in g will take precedence over dictionaries a to f, and so on.

Critiques of Other Answers

Don"t use what you see in the formerly accepted answer:

z = dict(x.items() + y.items())

In Python 2, you create two lists in memory for each dict, create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you"re adding two dict_items objects together, not two lists -

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: "dict_items" and "dict_items"

and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.

Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don"t do this:

>>> c = dict(a.items() | b.items())

This example demonstrates what happens when values are unhashable:

>>> x = {"a": []}
>>> y = {"b": []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: "list"

Here"s an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

>>> x = {"a": 2}
>>> y = {"a": 1}
>>> dict(x.items() | y.items())
{"a": 2}

Another hack you should not use:

z = dict(x, **y)

This uses the dict constructor and is very fast and memory-efficient (even slightly more so than our two-step process) but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it"s difficult to read, it"s not the intended usage, and so it is not Pythonic.

Here"s an example of the usage being remediated in django.

Dictionaries are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

From the mailing list, Guido van Rossum, the creator of the language, wrote:

I am fine with declaring dict({}, **{1:3}) illegal, since after all it is abuse of the ** mechanism.

and

Apparently dict(x, **y) is going around as "cool hack" for "call x.update(y) and return x". Personally, I find it more despicable than cool.

It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dictionaries for readability purposes, e.g.:

dict(a=1, b=10, c=11)

instead of

{"a": 1, "b": 10, "c": 11}

Response to comments

Despite what Guido says, dict(x, **y) is in line with the dict specification, which btw. works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a short-coming of dict. Nor is using the ** operator in this place an abuse of the mechanism, in fact, ** was designed precisely to pass dictionaries as keywords.

Again, it doesn"t work for 3 when keys are not strings. The implicit calling contract is that namespaces take ordinary dictionaries, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:

>>> foo(**{("a", "b"): None})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{("a", "b"): None})
{("a", "b"): None}

This inconsistency was bad given other implementations of Python (PyPy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.

I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.

More comments:

dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.

My response: merge_two_dicts(x, y) actually seems much clearer to me, if we"re actually concerned about readability. And it is not forward compatible, as Python 2 is increasingly deprecated.

{**x, **y} does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged [...] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word "merging" these answers describe "updating one dict with another", and not merging.

Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first"s values being overwritten by the second"s - in a single expression.

Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dictionaries from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:

from copy import deepcopy

def dict_of_dicts_merge(x, y):
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z

Usage:

>>> x = {"a":{1:{}}, "b": {2:{}}}
>>> y = {"b":{10:{}}, "c": {11:{}}}
>>> dict_of_dicts_merge(x, y)
{"b": {2: {}, 10: {}}, "a": {1: {}}, "c": {11: {}}}

Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a "Dictionaries of dictionaries merge".

Less Performant But Correct Ad-hocs

These approaches are less performant, but they will provide correct behavior. They will be much less performant than copy and update or the new unpacking because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (latter dictionaries have precedence)

You can also chain the dictionaries manually inside a dict comprehension:

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

or in Python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):

dict((k, v) for d in dicts for k, v in d.items()) # iteritems in Python 2

itertools.chain will chain the iterators over the key-value pairs in the correct order:

from itertools import chain
z = dict(chain(x.items(), y.items())) # iteritems in Python 2

Performance Analysis

I"m only going to do the performance analysis of the usages known to behave correctly. (Self-contained so you can copy and paste yourself.)

from timeit import repeat
from itertools import chain

x = dict.fromkeys("abcdefg")
y = dict.fromkeys("efghijk")

def merge_two_dicts(x, y):
    z = x.copy()
    z.update(y)
    return z

min(repeat(lambda: {**x, **y}))
min(repeat(lambda: merge_two_dicts(x, y)))
min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
min(repeat(lambda: dict(chain(x.items(), y.items()))))
min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))

In Python 3.8.1, NixOS:

>>> min(repeat(lambda: {**x, **y}))
1.0804965235292912
>>> min(repeat(lambda: merge_two_dicts(x, y)))
1.636518670246005
>>> min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
3.1779992282390594
>>> min(repeat(lambda: dict(chain(x.items(), y.items()))))
2.740647904574871
>>> min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
4.266070580109954
$ uname -a
Linux nixos 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux

Resources on Dictionaries

Answer #2:

In your case, what you can do is:

z = dict(list(x.items()) + list(y.items()))

This will, as you want it, put the final dict in z, and make the value for key b be properly overridden by the second (y) dict"s value:

>>> x = {"a":1, "b": 2}
>>> y = {"b":10, "c": 11}
>>> z = dict(list(x.items()) + list(y.items()))
>>> z
{"a": 1, "c": 11, "b": 10}

If you use Python 2, you can even remove the list() calls. To create z:

>>> z = dict(x.items() + y.items())
>>> z
{"a": 1, "c": 11, "b": 10}

If you use Python version 3.9.0a4 or greater, then you can directly use:

x = {"a":1, "b": 2}
y = {"b":10, "c": 11}
z = x | y
print(z)
{"a": 1, "c": 11, "b": 10}

Answer #3:

An alternative:

z = x.copy()
z.update(y)

Answer #4:

Another, more concise, option:

z = dict(x, **y)

Note: this has become a popular answer, but it is important to point out that if y has any non-string keys, the fact that this works at all is an abuse of a CPython implementation detail, and it does not work in Python 3, or in PyPy, IronPython, or Jython. Also, Guido is not a fan. So I can"t recommend this technique for forward-compatible or cross-implementation portable code, which really means it should be avoided entirely.

Answer #5:

This probably won"t be a popular answer, but you almost certainly do not want to do this. If you want a copy that"s a merge, then use copy (or deepcopy, depending on what you want) and then update. The two lines of code are much more readable - more Pythonic - than the single line creation with .items() + .items(). Explicit is better than implicit.

In addition, when you use .items() (pre Python 3.0), you"re creating a new list that contains the items from the dict. If your dictionaries are large, then that is quite a lot of overhead (two large lists that will be thrown away as soon as the merged dict is created). update() can work more efficiently, because it can run through the second dict item-by-item.

In terms of time:

>>> timeit.Timer("dict(x, **y)", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.52571702003479
>>> timeit.Timer("temp = x.copy()
temp.update(y)", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.694622993469238
>>> timeit.Timer("dict(x.items() + y.items())", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
41.484580039978027

IMO the tiny slowdown between the first two is worth it for the readability. In addition, keyword arguments for dictionary creation was only added in Python 2.3, whereas copy() and update() will work in older versions.

Create An Empty Javascript List: StackOverflow Questions

Answer #6:

How can I merge two Python dictionaries in a single expression?

For dictionaries x and y, z becomes a shallowly-merged dictionary with values from y replacing those from x.

  • In Python 3.9.0 or greater (released 17 October 2020): PEP-584, discussed here, was implemented and provides the simplest method:

    z = x | y          # NOTE: 3.9+ ONLY
    
  • In Python 3.5 or greater:

    z = {**x, **y}
    
  • In Python 2, (or 3.4 or lower) write a function:

    def merge_two_dicts(x, y):
        z = x.copy()   # start with keys and values of x
        z.update(y)    # modifies z with keys and values of y
        return z
    

    and now:

    z = merge_two_dicts(x, y)
    

Explanation

Say you have two dictionaries and you want to merge them into a new dictionary without altering the original dictionaries:

x = {"a": 1, "b": 2}
y = {"b": 3, "c": 4}

The desired result is to get a new dictionary (z) with the values merged, and the second dictionary"s values overwriting those from the first.

>>> z
{"a": 1, "b": 3, "c": 4}

A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is

z = {**x, **y}

And it is indeed a single expression.

Note that we can merge in with literal notation as well:

z = {**x, "foo": 1, "bar": 2, **y}

and now:

>>> z
{"a": 1, "b": 3, "foo": 1, "bar": 2, "c": 4}

It is now showing as implemented in the release schedule for 3.5, PEP 478, and it has now made its way into the What"s New in Python 3.5 document.

However, since many organizations are still on Python 2, you may wish to do this in a backward-compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:

z = x.copy()
z.update(y) # which returns None since it mutates z

In both approaches, y will come second and its values will replace x"s values, thus b will point to 3 in our final result.

Not yet on Python 3.5, but want a single expression

If you are not yet on Python 3.5 or need to write backward-compatible code, and you want this in a single expression, the most performant while the correct approach is to put it in a function:

def merge_two_dicts(x, y):
    """Given two dictionaries, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z

and then you have a single expression:

z = merge_two_dicts(x, y)

You can also make a function to merge an arbitrary number of dictionaries, from zero to a very large number:

def merge_dicts(*dict_args):
    """
    Given any number of dictionaries, shallow copy and merge into a new dict,
    precedence goes to key-value pairs in latter dictionaries.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

This function will work in Python 2 and 3 for all dictionaries. e.g. given dictionaries a to g:

z = merge_dicts(a, b, c, d, e, f, g) 

and key-value pairs in g will take precedence over dictionaries a to f, and so on.

Critiques of Other Answers

Don"t use what you see in the formerly accepted answer:

z = dict(x.items() + y.items())

In Python 2, you create two lists in memory for each dict, create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you"re adding two dict_items objects together, not two lists -

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: "dict_items" and "dict_items"

and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.

Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don"t do this:

>>> c = dict(a.items() | b.items())

This example demonstrates what happens when values are unhashable:

>>> x = {"a": []}
>>> y = {"b": []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: "list"

Here"s an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

>>> x = {"a": 2}
>>> y = {"a": 1}
>>> dict(x.items() | y.items())
{"a": 2}

Another hack you should not use:

z = dict(x, **y)

This uses the dict constructor and is very fast and memory-efficient (even slightly more so than our two-step process) but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it"s difficult to read, it"s not the intended usage, and so it is not Pythonic.

Here"s an example of the usage being remediated in django.

Dictionaries are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

From the mailing list, Guido van Rossum, the creator of the language, wrote:

I am fine with declaring dict({}, **{1:3}) illegal, since after all it is abuse of the ** mechanism.

and

Apparently dict(x, **y) is going around as "cool hack" for "call x.update(y) and return x". Personally, I find it more despicable than cool.

It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dictionaries for readability purposes, e.g.:

dict(a=1, b=10, c=11)

instead of

{"a": 1, "b": 10, "c": 11}

Response to comments

Despite what Guido says, dict(x, **y) is in line with the dict specification, which btw. works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a short-coming of dict. Nor is using the ** operator in this place an abuse of the mechanism, in fact, ** was designed precisely to pass dictionaries as keywords.

Again, it doesn"t work for 3 when keys are not strings. The implicit calling contract is that namespaces take ordinary dictionaries, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:

>>> foo(**{("a", "b"): None})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{("a", "b"): None})
{("a", "b"): None}

This inconsistency was bad given other implementations of Python (PyPy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.

I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.

More comments:

dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.

My response: merge_two_dicts(x, y) actually seems much clearer to me, if we"re actually concerned about readability. And it is not forward compatible, as Python 2 is increasingly deprecated.

{**x, **y} does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged [...] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word "merging" these answers describe "updating one dict with another", and not merging.

Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first"s values being overwritten by the second"s - in a single expression.

Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dictionaries from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:

from copy import deepcopy

def dict_of_dicts_merge(x, y):
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z

Usage:

>>> x = {"a":{1:{}}, "b": {2:{}}}
>>> y = {"b":{10:{}}, "c": {11:{}}}
>>> dict_of_dicts_merge(x, y)
{"b": {2: {}, 10: {}}, "a": {1: {}}, "c": {11: {}}}

Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a "Dictionaries of dictionaries merge".

Less Performant But Correct Ad-hocs

These approaches are less performant, but they will provide correct behavior. They will be much less performant than copy and update or the new unpacking because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (latter dictionaries have precedence)

You can also chain the dictionaries manually inside a dict comprehension:

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

or in Python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):

dict((k, v) for d in dicts for k, v in d.items()) # iteritems in Python 2

itertools.chain will chain the iterators over the key-value pairs in the correct order:

from itertools import chain
z = dict(chain(x.items(), y.items())) # iteritems in Python 2

Performance Analysis

I"m only going to do the performance analysis of the usages known to behave correctly. (Self-contained so you can copy and paste yourself.)

from timeit import repeat
from itertools import chain

x = dict.fromkeys("abcdefg")
y = dict.fromkeys("efghijk")

def merge_two_dicts(x, y):
    z = x.copy()
    z.update(y)
    return z

min(repeat(lambda: {**x, **y}))
min(repeat(lambda: merge_two_dicts(x, y)))
min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
min(repeat(lambda: dict(chain(x.items(), y.items()))))
min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))

In Python 3.8.1, NixOS:

>>> min(repeat(lambda: {**x, **y}))
1.0804965235292912
>>> min(repeat(lambda: merge_two_dicts(x, y)))
1.636518670246005
>>> min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
3.1779992282390594
>>> min(repeat(lambda: dict(chain(x.items(), y.items()))))
2.740647904574871
>>> min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
4.266070580109954
$ uname -a
Linux nixos 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux

Resources on Dictionaries

Answer #7:

In your case, what you can do is:

z = dict(list(x.items()) + list(y.items()))

This will, as you want it, put the final dict in z, and make the value for key b be properly overridden by the second (y) dict"s value:

>>> x = {"a":1, "b": 2}
>>> y = {"b":10, "c": 11}
>>> z = dict(list(x.items()) + list(y.items()))
>>> z
{"a": 1, "c": 11, "b": 10}

If you use Python 2, you can even remove the list() calls. To create z:

>>> z = dict(x.items() + y.items())
>>> z
{"a": 1, "c": 11, "b": 10}

If you use Python version 3.9.0a4 or greater, then you can directly use:

x = {"a":1, "b": 2}
y = {"b":10, "c": 11}
z = x | y
print(z)
{"a": 1, "c": 11, "b": 10}

Answer #8:

An alternative:

z = x.copy()
z.update(y)

Answer #9:

Another, more concise, option:

z = dict(x, **y)

Note: this has become a popular answer, but it is important to point out that if y has any non-string keys, the fact that this works at all is an abuse of a CPython implementation detail, and it does not work in Python 3, or in PyPy, IronPython, or Jython. Also, Guido is not a fan. So I can"t recommend this technique for forward-compatible or cross-implementation portable code, which really means it should be avoided entirely.

Answer #10:

This probably won"t be a popular answer, but you almost certainly do not want to do this. If you want a copy that"s a merge, then use copy (or deepcopy, depending on what you want) and then update. The two lines of code are much more readable - more Pythonic - than the single line creation with .items() + .items(). Explicit is better than implicit.

In addition, when you use .items() (pre Python 3.0), you"re creating a new list that contains the items from the dict. If your dictionaries are large, then that is quite a lot of overhead (two large lists that will be thrown away as soon as the merged dict is created). update() can work more efficiently, because it can run through the second dict item-by-item.

In terms of time:

>>> timeit.Timer("dict(x, **y)", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.52571702003479
>>> timeit.Timer("temp = x.copy()
temp.update(y)", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.694622993469238
>>> timeit.Timer("dict(x.items() + y.items())", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
41.484580039978027

IMO the tiny slowdown between the first two is worth it for the readability. In addition, keyword arguments for dictionary creation was only added in Python 2.3, whereas copy() and update() will work in older versions.

Create An Empty Javascript List: StackOverflow Questions

How to insert newlines on argparse help text?

I"m using argparse in Python 2.7 for parsing input options. One of my options is a multiple choice. I want to make a list in its help text, e.g.

from argparse import ArgumentParser

parser = ArgumentParser(description="test")

parser.add_argument("-g", choices=["a", "b", "g", "d", "e"], default="a",
    help="Some option, where
"
         " a = alpha
"
         " b = beta
"
         " g = gamma
"
         " d = delta
"
         " e = epsilon")

parser.parse_args()

However, argparse strips all newlines and consecutive spaces. The result looks like

~/Downloads:52$ python2.7 x.py -h
usage: x.py [-h] [-g {a,b,g,d,e}]

test

optional arguments:
  -h, --help      show this help message and exit
  -g {a,b,g,d,e}  Some option, where a = alpha b = beta g = gamma d = delta e
                  = epsilon

How to insert newlines in the help text?

Answer #1:

Try using RawTextHelpFormatter:

from argparse import RawTextHelpFormatter
parser = ArgumentParser(description="test", formatter_class=RawTextHelpFormatter)

Is a Python list guaranteed to have its elements stay in the order they are inserted in?

If I have the following Python code

>>> x = []
>>> x = x + [1]
>>> x = x + [2]
>>> x = x + [3]
>>> x
[1, 2, 3]

Will x be guaranteed to always be [1,2,3], or are other orderings of the interim elements possible?

Answer #1:

Yes, the order of elements in a python list is persistent.

Inserting image into IPython notebook markdown

I am starting to depend heavily on the IPython notebook app to develop and document algorithms. It is awesome; but there is something that seems like it should be possible, but I can"t figure out how to do it:

I would like to insert a local image into my (local) IPython notebook markdown to aid in documenting an algorithm. I know enough to add something like <img src="image.png"> to the markdown, but that is about as far as my knowledge goes. I assume I could put the image in the directory represented by 127.0.0.1:8888 (or some subdirectory) to be able to access it, but I can"t figure out where that directory is. (I"m working on a mac.) So, is it possible to do what I"m trying to do without too much trouble?

Answer #1:

Most of the answers given so far go in the wrong direction, suggesting to load additional libraries and use the code instead of markup. In Ipython/Jupyter Notebooks it is very simple. Make sure the cell is indeed in markup and to display a image use:

![alt text](imagename.png "Title")

Further advantage compared to the other methods proposed is that you can display all common file formats including jpg, png, and gif (animations).

Answer #2:

Files inside the notebook dir are available under a "files/" url. So if it"s in the base path, it would be <img src="files/image.png">, and subdirs etc. are also available: <img src="files/subdir/image.png">, etc.

Update: starting with IPython 2.0, the files/ prefix is no longer needed (cf. release notes). So now the solution <img src="image.png"> simply works as expected.

how do I insert a column at a specific column index in pandas?

Can I insert a column at a specific column index in pandas?

import pandas as pd
df = pd.DataFrame({"l":["a","b","c","d"], "v":[1,2,1,2]})
df["n"] = 0

This will put column n as the last column of df, but isn"t there a way to tell df to put n at the beginning?

Answer #1:

see docs: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.insert.html

using loc = 0 will insert at the beginning

df.insert(loc, column, value)

df = pd.DataFrame({"B": [1, 2, 3], "C": [4, 5, 6]})

df
Out: 
   B  C
0  1  4
1  2  5
2  3  6

idx = 0
new_col = [7, 8, 9]  # can be a list, a Series, an array or a scalar   
df.insert(loc=idx, column="A", value=new_col)

df
Out: 
   A  B  C
0  7  1  4
1  8  2  5
2  9  3  6

Create An Empty Javascript List: StackOverflow Questions

JSON datetime between Python and JavaScript

Question by kevin

I want to send a datetime.datetime object in serialized form from Python using JSON and de-serialize in JavaScript using JSON. What is the best way to do this?

Answer #1:

You can add the "default" parameter to json.dumps to handle this:

date_handler = lambda obj: (
    obj.isoformat()
    if isinstance(obj, (datetime.datetime, datetime.date))
    else None
)
json.dumps(datetime.datetime.now(), default=date_handler)
""2010-04-20T20:08:21.634121""

Which is ISO 8601 format.

A more comprehensive default handler function:

def handler(obj):
    if hasattr(obj, "isoformat"):
        return obj.isoformat()
    elif isinstance(obj, ...):
        return ...
    else:
        raise TypeError, "Object of type %s with value of %s is not JSON serializable" % (type(obj), repr(obj))

Update: Added output of type as well as value.
Update: Also handle date

Javascript equivalent of Python"s zip function

Is there a javascript equivalent of Python"s zip function? That is, given multiple arrays of equal lengths create an array of pairs.

For instance, if I have three arrays that look like this:

var array1 = [1, 2, 3];
var array2 = ["a","b","c"];
var array3 = [4, 5, 6];

The output array should be:

var output array:[[1,"a",4], [2,"b",5], [3,"c",6]]

Answer #1:

2016 update:

Here"s a snazzier Ecmascript 6 version:

zip= rows=>rows[0].map((_,c)=>rows.map(row=>row[c]))

Illustration equiv. to Python{zip(*args)}:

> zip([["row0col0", "row0col1", "row0col2"],
       ["row1col0", "row1col1", "row1col2"]]);
[["row0col0","row1col0"],
 ["row0col1","row1col1"],
 ["row0col2","row1col2"]]

(and FizzyTea points out that ES6 has variadic argument syntax, so the following function definition will act like python, but see below for disclaimer... this will not be its own inverse so zip(zip(x)) will not equal x; though as Matt Kramer points out zip(...zip(...x))==x (like in regular python zip(*zip(*x))==x))

Alternative definition equiv. to Python{zip}:

> zip = (...rows) => [...rows[0]].map((_,c) => rows.map(row => row[c]))
> zip( ["row0col0", "row0col1", "row0col2"] ,
       ["row1col0", "row1col1", "row1col2"] );
             // note zip(row0,row1), not zip(matrix)
same answer as above

(Do note that the ... syntax may have performance issues at this time, and possibly in the future, so if you use the second answer with variadic arguments, you may want to perf test it. That said it"s been quite a while since it"s been in the standard.)

Make sure to note the addendum if you wish to use this on strings (perhaps there"s a better way to do it now with es6 iterables).


Here"s a oneliner:

function zip(arrays) {
    return arrays[0].map(function(_,i){
        return arrays.map(function(array){return array[i]})
    });
}

// > zip([[1,2],[11,22],[111,222]])
// [[1,11,111],[2,22,222]]]

// If you believe the following is a valid return value:
//   > zip([])
//   []
// then you can special-case it, or just do
//  return arrays.length==0 ? [] : arrays[0].map(...)

The above assumes that the arrays are of equal size, as they should be. It also assumes you pass in a single list of lists argument, unlike Python"s version where the argument list is variadic. If you want all of these "features", see below. It takes just about 2 extra lines of code.

The following will mimic Python"s zip behavior on edge cases where the arrays are not of equal size, silently pretending the longer parts of arrays don"t exist:

function zip() {
    var args = [].slice.call(arguments);
    var shortest = args.length==0 ? [] : args.reduce(function(a,b){
        return a.length<b.length ? a : b
    });

    return shortest.map(function(_,i){
        return args.map(function(array){return array[i]})
    });
}

// > zip([1,2],[11,22],[111,222,333])
// [[1,11,111],[2,22,222]]]

// > zip()
// []

This will mimic Python"s itertools.zip_longest behavior, inserting undefined where arrays are not defined:

function zip() {
    var args = [].slice.call(arguments);
    var longest = args.reduce(function(a,b){
        return a.length>b.length ? a : b
    }, []);

    return longest.map(function(_,i){
        return args.map(function(array){return array[i]})
    });
}

// > zip([1,2],[11,22],[111,222,333])
// [[1,11,111],[2,22,222],[null,null,333]]

// > zip()
// []

If you use these last two version (variadic aka. multiple-argument versions), then zip is no longer its own inverse. To mimic the zip(*[...]) idiom from Python, you will need to do zip.apply(this, [...]) when you want to invert the zip function or if you want to similarly have a variable number of lists as input.


addendum:

To make this handle any iterable (e.g. in Python you can use zip on strings, ranges, map objects, etc.), you could define the following:

function iterView(iterable) {
    // returns an array equivalent to the iterable
}

However if you write zip in the following way, even that won"t be necessary:

function zip(arrays) {
    return Array.apply(null,Array(arrays[0].length)).map(function(_,i){
        return arrays.map(function(array){return array[i]})
    });
}

Demo:

> JSON.stringify( zip(["abcde",[1,2,3,4,5]]) )
[["a",1],["b",2],["c",3],["d",4],["e",5]]

(Or you could use a range(...) Python-style function if you"ve written one already. Eventually you will be able to use ECMAScript array comprehensions or generators.)

What blocks Ruby, Python to get Javascript V8 speed?

Are there any Ruby / Python features that are blocking implementation of optimizations (e.g. inline caching) V8 engine has?

Python is co-developed by Google guys so it shouldn"t be blocked by software patents.

Or this is rather matter of resources put into the V8 project by Google.

Answer #1:

What blocks Ruby, Python to get Javascript V8 speed?

Nothing.

Well, okay: money. (And time, people, resources, but if you have money, you can buy those.)

V8 has a team of brilliant, highly-specialized, highly-experienced (and thus highly-paid) engineers working on it, that have decades of experience (I"m talking individually – collectively it"s more like centuries) in creating high-performance execution engines for dynamic OO languages. They are basically the same people who also created the Sun HotSpot JVM (among many others).

Lars Bak, the lead developer, has been literally working on VMs for 25 years (and all of those VMs have lead up to V8), which is basically his entire (professional) life. Some of the people writing Ruby VMs aren"t even 25 years old.

Are there any Ruby / Python features that are blocking implementation of optimizations (e.g. inline caching) V8 engine has?

Given that at least IronRuby, JRuby, MagLev, MacRuby and Rubinius have either monomorphic (IronRuby) or polymorphic inline caching, the answer is obviously no.

Modern Ruby implementations already do a great deal of optimizations. For example, for certain operations, Rubinius"s Hash class is faster than YARV"s. Now, this doesn"t sound terribly exciting until you realize that Rubinius"s Hash class is implemented in 100% pure Ruby, while YARV"s is implemented in 100% hand-optimized C.

So, at least in some cases, Rubinius can generate better code than GCC!

Or this is rather matter of resources put into the V8 project by Google.

Yes. Not just Google. The lineage of V8"s source code is 25 years old now. The people who are working on V8 also created the Self VM (to this day one of the fastest dynamic OO language execution engines ever created), the Animorphic Smalltalk VM (to this day one of the fastest Smalltalk execution engines ever created), the HotSpot JVM (the fastest JVM ever created, probably the fastest VM period) and OOVM (one of the most efficient Smalltalk VMs ever created).

In fact, Lars Bak, the lead developer of V8, worked on every single one of those, plus a few others.

Django Template Variables and Javascript

When I render a page using the Django template renderer, I can pass in a dictionary variable containing various values to manipulate them in the page using {{ myVar }}.

Is there a way to access the same variable in Javascript (perhaps using the DOM, I don"t know how Django makes the variables accessible)? I want to be able to lookup details using an AJAX lookup based on the values contained in the variables passed in.

Answer #1:

The {{variable}} is substituted directly into the HTML. Do a view source; it isn"t a "variable" or anything like it. It"s just rendered text.

Having said that, you can put this kind of substitution into your JavaScript.

<script type="text/javascript"> 
   var a = "{{someDjangoVariable}}";
</script>

This gives you "dynamic" javascript.

Web-scraping JavaScript page with Python

I"m trying to develop a simple web scraper. I want to extract text without the HTML code. In fact, I achieve this goal, but I have seen that in some pages where JavaScript is loaded I didn"t obtain good results.

For example, if some JavaScript code adds some text, I can"t see it, because when I call

response = urllib2.urlopen(request)

I get the original text without the added one (because JavaScript is executed in the client).

So, I"m looking for some ideas to solve this problem.

Answer #1:

EDIT 30/Dec/2017: This answer appears in top results of Google searches, so I decided to update it. The old answer is still at the end.

dryscape isn"t maintained anymore and the library dryscape developers recommend is Python 2 only. I have found using Selenium"s python library with Phantom JS as a web driver fast enough and easy to get the work done.

Once you have installed Phantom JS, make sure the phantomjs binary is available in the current path:

phantomjs --version
# result:
2.1.1

Example

To give an example, I created a sample page with following HTML code. (link):

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Javascript scraping test</title>
</head>
<body>
  <p id="intro-text">No javascript support</p>
  <script>
     document.getElementById("intro-text").innerHTML = "Yay! Supports javascript";
  </script> 
</body>
</html>

without javascript it says: No javascript support and with javascript: Yay! Supports javascript

Scraping without JS support:

import requests
from bs4 import BeautifulSoup
response = requests.get(my_url)
soup = BeautifulSoup(response.text)
soup.find(id="intro-text")
# Result:
<p id="intro-text">No javascript support</p>

Scraping with JS support:

from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get(my_url)
p_element = driver.find_element_by_id(id_="intro-text")
print(p_element.text)
# result:
"Yay! Supports javascript"

You can also use Python library dryscrape to scrape javascript driven websites.

Scraping with JS support:

import dryscrape
from bs4 import BeautifulSoup
session = dryscrape.Session()
session.visit(my_url)
response = session.body()
soup = BeautifulSoup(response)
soup.find(id="intro-text")
# Result:
<p id="intro-text">Yay! Supports javascript</p>

Create An Empty Javascript List: StackOverflow Questions

Meaning of @classmethod and @staticmethod for beginner?

Question by user1632861

Could someone explain to me the meaning of @classmethod and @staticmethod in python? I need to know the difference and the meaning.

As far as I understand, @classmethod tells a class that it"s a method which should be inherited into subclasses, or... something. However, what"s the point of that? Why not just define the class method without adding @classmethod or @staticmethod or any @ definitions?

tl;dr: when should I use them, why should I use them, and how should I use them?

Answer #1:

Though classmethod and staticmethod are quite similar, there"s a slight difference in usage for both entities: classmethod must have a reference to a class object as the first parameter, whereas staticmethod can have no parameters at all.

Example

class Date(object):

    def __init__(self, day=0, month=0, year=0):
        self.day = day
        self.month = month
        self.year = year

    @classmethod
    def from_string(cls, date_as_string):
        day, month, year = map(int, date_as_string.split("-"))
        date1 = cls(day, month, year)
        return date1

    @staticmethod
    def is_date_valid(date_as_string):
        day, month, year = map(int, date_as_string.split("-"))
        return day <= 31 and month <= 12 and year <= 3999

date2 = Date.from_string("11-09-2012")
is_date = Date.is_date_valid("11-09-2012")

Explanation

Let"s assume an example of a class, dealing with date information (this will be our boilerplate):

class Date(object):

    def __init__(self, day=0, month=0, year=0):
        self.day = day
        self.month = month
        self.year = year

This class obviously could be used to store information about certain dates (without timezone information; let"s assume all dates are presented in UTC).

Here we have __init__, a typical initializer of Python class instances, which receives arguments as a typical instancemethod, having the first non-optional argument (self) that holds a reference to a newly created instance.

Class Method

We have some tasks that can be nicely done using classmethods.

Let"s assume that we want to create a lot of Date class instances having date information coming from an outer source encoded as a string with format "dd-mm-yyyy". Suppose we have to do this in different places in the source code of our project.

So what we must do here is:

  1. Parse a string to receive day, month and year as three integer variables or a 3-item tuple consisting of that variable.
  2. Instantiate Date by passing those values to the initialization call.

This will look like:

day, month, year = map(int, string_date.split("-"))
date1 = Date(day, month, year)

For this purpose, C++ can implement such a feature with overloading, but Python lacks this overloading. Instead, we can use classmethod. Let"s create another "constructor".

    @classmethod
    def from_string(cls, date_as_string):
        day, month, year = map(int, date_as_string.split("-"))
        date1 = cls(day, month, year)
        return date1

date2 = Date.from_string("11-09-2012")

Let"s look more carefully at the above implementation, and review what advantages we have here:

  1. We"ve implemented date string parsing in one place and it"s reusable now.
  2. Encapsulation works fine here (if you think that you could implement string parsing as a single function elsewhere, this solution fits the OOP paradigm far better).
  3. cls is an object that holds the class itself, not an instance of the class. It"s pretty cool because if we inherit our Date class, all children will have from_string defined also.

Static method

What about staticmethod? It"s pretty similar to classmethod but doesn"t take any obligatory parameters (like a class method or instance method does).

Let"s look at the next use case.

We have a date string that we want to validate somehow. This task is also logically bound to the Date class we"ve used so far, but doesn"t require instantiation of it.

Here is where staticmethod can be useful. Let"s look at the next piece of code:

    @staticmethod
    def is_date_valid(date_as_string):
        day, month, year = map(int, date_as_string.split("-"))
        return day <= 31 and month <= 12 and year <= 3999

    # usage:
    is_date = Date.is_date_valid("11-09-2012")

So, as we can see from usage of staticmethod, we don"t have any access to what the class is---it"s basically just a function, called syntactically like a method, but without access to the object and its internals (fields and another methods), while classmethod does.

Answer #2:

Rostyslav Dzinko"s answer is very appropriate. I thought I could highlight one other reason you should choose @classmethod over @staticmethod when you are creating an additional constructor.

In the example above, Rostyslav used the @classmethod from_string as a Factory to create Date objects from otherwise unacceptable parameters. The same can be done with @staticmethod as is shown in the code below:

class Date:
  def __init__(self, month, day, year):
    self.month = month
    self.day   = day
    self.year  = year


  def display(self):
    return "{0}-{1}-{2}".format(self.month, self.day, self.year)


  @staticmethod
  def millenium(month, day):
    return Date(month, day, 2000)

new_year = Date(1, 1, 2013)               # Creates a new Date object
millenium_new_year = Date.millenium(1, 1) # also creates a Date object. 

# Proof:
new_year.display()           # "1-1-2013"
millenium_new_year.display() # "1-1-2000"

isinstance(new_year, Date) # True
isinstance(millenium_new_year, Date) # True

Thus both new_year and millenium_new_year are instances of the Date class.

But, if you observe closely, the Factory process is hard-coded to create Date objects no matter what. What this means is that even if the Date class is subclassed, the subclasses will still create plain Date objects (without any properties of the subclass). See that in the example below:

class DateTime(Date):
  def display(self):
      return "{0}-{1}-{2} - 00:00:00PM".format(self.month, self.day, self.year)


datetime1 = DateTime(10, 10, 1990)
datetime2 = DateTime.millenium(10, 10)

isinstance(datetime1, DateTime) # True
isinstance(datetime2, DateTime) # False

datetime1.display() # returns "10-10-1990 - 00:00:00PM"
datetime2.display() # returns "10-10-2000" because it"s not a DateTime object but a Date object. Check the implementation of the millenium method on the Date class for more details.

datetime2 is not an instance of DateTime? WTF? Well, that"s because of the @staticmethod decorator used.

In most cases, this is undesired. If what you want is a Factory method that is aware of the class that called it, then @classmethod is what you need.

Rewriting Date.millenium as (that"s the only part of the above code that changes):

@classmethod
def millenium(cls, month, day):
    return cls(month, day, 2000)

ensures that the class is not hard-coded but rather learnt. cls can be any subclass. The resulting object will rightly be an instance of cls.
Let"s test that out:

datetime1 = DateTime(10, 10, 1990)
datetime2 = DateTime.millenium(10, 10)

isinstance(datetime1, DateTime) # True
isinstance(datetime2, DateTime) # True


datetime1.display() # "10-10-1990 - 00:00:00PM"
datetime2.display() # "10-10-2000 - 00:00:00PM"

The reason is, as you know by now, that @classmethod was used instead of @staticmethod

Answer #3:

@classmethod means: when this method is called, we pass the class as the first argument instead of the instance of that class (as we normally do with methods). This means you can use the class and its properties inside that method rather than a particular instance.

@staticmethod means: when this method is called, we don"t pass an instance of the class to it (as we normally do with methods). This means you can put a function inside a class but you can"t access the instance of that class (this is useful when your method does not use the instance).

What is the meaning of single and double underscore before an object name?

Can someone please explain the exact meaning of having single and double leading underscores before an object"s name in Python, and the difference between both?

Also, does that meaning stay the same regardless of whether the object in question is a variable, a function, a method, etc.?

Answer #1:

Single Underscore

Names, in a class, with a leading underscore are simply to indicate to other programmers that the attribute or method is intended to be private. However, nothing special is done with the name itself.

To quote PEP-8:

_single_leading_underscore: weak "internal use" indicator. E.g. from M import * does not import objects whose name starts with an underscore.

Double Underscore (Name Mangling)

From the Python docs:

Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the identifier, so it can be used to define class-private instance and class variables, methods, variables stored in globals, and even variables stored in instances. private to this class on instances of other classes.

And a warning from the same page:

Name mangling is intended to give classes an easy way to define “private” instance variables and methods, without having to worry about instance variables defined by derived classes, or mucking with instance variables by code outside the class. Note that the mangling rules are designed mostly to avoid accidents; it still is possible for a determined soul to access or modify a variable that is considered private.

Example

>>> class MyClass():
...     def __init__(self):
...             self.__superprivate = "Hello"
...             self._semiprivate = ", world!"
...
>>> mc = MyClass()
>>> print mc.__superprivate
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: myClass instance has no attribute "__superprivate"
>>> print mc._semiprivate
, world!
>>> print mc.__dict__
{"_MyClass__superprivate": "Hello", "_semiprivate": ", world!"}

Answer #2:

__foo__: this is just a convention, a way for the Python system to use names that won"t conflict with user names.

_foo: this is just a convention, a way for the programmer to indicate that the variable is private (whatever that means in Python).

__foo: this has real meaning: the interpreter replaces this name with _classname__foo as a way to ensure that the name will not overlap with a similar name in another class.

No other form of underscores have meaning in the Python world.

There"s no difference between class, variable, global, etc in these conventions.

Create An Empty Javascript List: StackOverflow Questions

Create list of single item repeated N times

I want to create a series of lists, all of varying lengths. Each list will contain the same element e, repeated n times (where n = length of the list).

How do I create the lists, without using a list comprehension [e for number in xrange(n)] for each list?

Answer #1:

You can also write:

[e] * n

You should note that if e is for example an empty list you get a list with n references to the same list, not n independent empty lists.

Performance testing

At first glance it seems that repeat is the fastest way to create a list with n identical elements:

>>> timeit.timeit("itertools.repeat(0, 10)", "import itertools", number = 1000000)
0.37095273281943264
>>> timeit.timeit("[0] * 10", "import itertools", number = 1000000)
0.5577236771712819

But wait - it"s not a fair test...

>>> itertools.repeat(0, 10)
repeat(0, 10)  # Not a list!!!

The function itertools.repeat doesn"t actually create the list, it just creates an object that can be used to create a list if you wish! Let"s try that again, but converting to a list:

>>> timeit.timeit("list(itertools.repeat(0, 10))", "import itertools", number = 1000000)
1.7508119747063233

So if you want a list, use [e] * n. If you want to generate the elements lazily, use repeat.

Answer #2:

>>> [5] * 4
[5, 5, 5, 5]

Be careful when the item being repeated is a list. The list will not be cloned: all the elements will refer to the same list!

>>> x=[5]
>>> y=[x] * 4
>>> y
[[5], [5], [5], [5]]
>>> y[0][0] = 6
>>> y
[[6], [6], [6], [6]]

What is the best way to repeatedly execute a function every x seconds?

Question by DavidM

I want to repeatedly execute a function in Python every 60 seconds forever (just like an NSTimer in Objective C). This code will run as a daemon and is effectively like calling the python script every minute using a cron, but without requiring that to be set up by the user.

In this question about a cron implemented in Python, the solution appears to effectively just sleep() for x seconds. I don"t need such advanced functionality so perhaps something like this would work

while True:
    # Code executed here
    time.sleep(60)

Are there any foreseeable problems with this code?

Answer #1:

If your program doesn"t have a event loop already, use the sched module, which implements a general purpose event scheduler.

import sched, time
s = sched.scheduler(time.time, time.sleep)
def do_something(sc): 
    print("Doing stuff...")
    # do your stuff
    s.enter(60, 1, do_something, (sc,))

s.enter(60, 1, do_something, (s,))
s.run()

If you"re already using an event loop library like asyncio, trio, tkinter, PyQt5, gobject, kivy, and many others - just schedule the task using your existing event loop library"s methods, instead.

Answer #2:

Lock your time loop to the system clock like this:

import time
starttime = time.time()
while True:
    print "tick"
    time.sleep(60.0 - ((time.time() - starttime) % 60.0))

Answer #3:

If you want a non-blocking way to execute your function periodically, instead of a blocking infinite loop I"d use a threaded timer. This way your code can keep running and perform other tasks and still have your function called every n seconds. I use this technique a lot for printing progress info on long, CPU/Disk/Network intensive tasks.

Here"s the code I"ve posted in a similar question, with start() and stop() control:

from threading import Timer

class RepeatedTimer(object):
    def __init__(self, interval, function, *args, **kwargs):
        self._timer     = None
        self.interval   = interval
        self.function   = function
        self.args       = args
        self.kwargs     = kwargs
        self.is_running = False
        self.start()

    def _run(self):
        self.is_running = False
        self.start()
        self.function(*self.args, **self.kwargs)

    def start(self):
        if not self.is_running:
            self._timer = Timer(self.interval, self._run)
            self._timer.start()
            self.is_running = True

    def stop(self):
        self._timer.cancel()
        self.is_running = False

Usage:

from time import sleep

def hello(name):
    print "Hello %s!" % name

print "starting..."
rt = RepeatedTimer(1, hello, "World") # it auto-starts, no need of rt.start()
try:
    sleep(5) # your long-running job goes here...
finally:
    rt.stop() # better in a try/finally block to make sure the program ends!

Features:

  • Standard library only, no external dependencies
  • start() and stop() are safe to call multiple times even if the timer has already started/stopped
  • function to be called can have positional and named arguments
  • You can change interval anytime, it will be effective after next run. Same for args, kwargs and even function!

Create An Empty Javascript List: StackOverflow Questions

How to print number with commas as thousands separators?

I am trying to print an integer in Python 2.6.1 with commas as thousands separators. For example, I want to show the number 1234567 as 1,234,567. How would I go about doing this? I have seen many examples on Google, but I am looking for the simplest practical way.

It does not need to be locale-specific to decide between periods and commas. I would prefer something as simple as reasonably possible.

Answer #1:

Locale unaware

"{:,}".format(value)  # For Python ‚â•2.7
f"{value:,}"  # For Python ‚â•3.6

Locale aware

import locale
locale.setlocale(locale.LC_ALL, "")  # Use "" for auto, or force e.g. to "en_US.UTF-8"

"{:n}".format(value)  # For Python ‚â•2.7
f"{value:n}"  # For Python ‚â•3.6

Reference

Per Format Specification Mini-Language,

The "," option signals the use of a comma for a thousands separator. For a locale aware separator, use the "n" integer presentation type instead.

Answer #2:

I got this to work:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, "en_US")
"en_US"
>>> locale.format("%d", 1255000, grouping=True)
"1,255,000"

Sure, you don"t need internationalization support, but it"s clear, concise, and uses a built-in library.

P.S. That "%d" is the usual %-style formatter. You can have only one formatter, but it can be whatever you need in terms of field width and precision settings.

P.P.S. If you can"t get locale to work, I"d suggest a modified version of Mark"s answer:

def intWithCommas(x):
    if type(x) not in [type(0), type(0L)]:
        raise TypeError("Parameter must be an integer.")
    if x < 0:
        return "-" + intWithCommas(-x)
    result = ""
    while x >= 1000:
        x, r = divmod(x, 1000)
        result = ",%03d%s" % (r, result)
    return "%d%s" % (x, result)

Recursion is useful for the negative case, but one recursion per comma seems a bit excessive to me.

Answer #3:

I"m surprised that no one has mentioned that you can do this with f-strings in Python 3.6+ as easy as this:

>>> num = 10000000
>>> print(f"{num:,}")
10,000,000

... where the part after the colon is the format specifier. The comma is the separator character you want, so f"{num:_}" uses underscores instead of a comma. Only "," and "_" is possible to use with this method.

This is equivalent of using format(num, ",") for older versions of python 3.

Answer #4:

For inefficiency and unreadability it"s hard to beat:

>>> import itertools
>>> s = "-1234567"
>>> ",".join(["%s%s%s" % (x[0], x[1] or "", x[2] or "") for x in itertools.izip_longest(s[::-1][::3], s[::-1][1::3], s[::-1][2::3])])[::-1].replace("-,","-")

How would you make a comma-separated string from a list of strings?

Question by mweerden

What would be your preferred way to concatenate strings from a sequence such that between every two consecutive pairs a comma is added. That is, how do you map, for instance, ["a", "b", "c"] to "a,b,c"? (The cases ["s"] and [] should be mapped to "s" and "", respectively.)

I usually end up using something like "".join(map(lambda x: x+",",l))[:-1], but also feeling somewhat unsatisfied.

Answer #1:

my_list = ["a", "b", "c", "d"]
my_string = ",".join(my_list)
"a,b,c,d"

This won"t work if the list contains integers


And if the list contains non-string types (such as integers, floats, bools, None) then do:

my_string = ",".join(map(str, my_list)) 

Create An Empty Javascript List: StackOverflow Questions

How do I merge two dictionaries in a single expression (taking union of dictionaries)?

Question by Carl Meyer

I have two Python dictionaries, and I want to write a single expression that returns these two dictionaries, merged (i.e. taking the union). The update() method would be what I need, if it returned its result instead of modifying a dictionary in-place.

>>> x = {"a": 1, "b": 2}
>>> y = {"b": 10, "c": 11}
>>> z = x.update(y)
>>> print(z)
None
>>> x
{"a": 1, "b": 10, "c": 11}

How can I get that final merged dictionary in z, not x?

(To be extra-clear, the last-one-wins conflict-handling of dict.update() is what I"m looking for as well.)

Answer #1:

How can I merge two Python dictionaries in a single expression?

For dictionaries x and y, z becomes a shallowly-merged dictionary with values from y replacing those from x.

  • In Python 3.9.0 or greater (released 17 October 2020): PEP-584, discussed here, was implemented and provides the simplest method:

    z = x | y          # NOTE: 3.9+ ONLY
    
  • In Python 3.5 or greater:

    z = {**x, **y}
    
  • In Python 2, (or 3.4 or lower) write a function:

    def merge_two_dicts(x, y):
        z = x.copy()   # start with keys and values of x
        z.update(y)    # modifies z with keys and values of y
        return z
    

    and now:

    z = merge_two_dicts(x, y)
    

Explanation

Say you have two dictionaries and you want to merge them into a new dictionary without altering the original dictionaries:

x = {"a": 1, "b": 2}
y = {"b": 3, "c": 4}

The desired result is to get a new dictionary (z) with the values merged, and the second dictionary"s values overwriting those from the first.

>>> z
{"a": 1, "b": 3, "c": 4}

A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is

z = {**x, **y}

And it is indeed a single expression.

Note that we can merge in with literal notation as well:

z = {**x, "foo": 1, "bar": 2, **y}

and now:

>>> z
{"a": 1, "b": 3, "foo": 1, "bar": 2, "c": 4}

It is now showing as implemented in the release schedule for 3.5, PEP 478, and it has now made its way into the What"s New in Python 3.5 document.

However, since many organizations are still on Python 2, you may wish to do this in a backward-compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:

z = x.copy()
z.update(y) # which returns None since it mutates z

In both approaches, y will come second and its values will replace x"s values, thus b will point to 3 in our final result.

Not yet on Python 3.5, but want a single expression

If you are not yet on Python 3.5 or need to write backward-compatible code, and you want this in a single expression, the most performant while the correct approach is to put it in a function:

def merge_two_dicts(x, y):
    """Given two dictionaries, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z

and then you have a single expression:

z = merge_two_dicts(x, y)

You can also make a function to merge an arbitrary number of dictionaries, from zero to a very large number:

def merge_dicts(*dict_args):
    """
    Given any number of dictionaries, shallow copy and merge into a new dict,
    precedence goes to key-value pairs in latter dictionaries.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

This function will work in Python 2 and 3 for all dictionaries. e.g. given dictionaries a to g:

z = merge_dicts(a, b, c, d, e, f, g) 

and key-value pairs in g will take precedence over dictionaries a to f, and so on.

Critiques of Other Answers

Don"t use what you see in the formerly accepted answer:

z = dict(x.items() + y.items())

In Python 2, you create two lists in memory for each dict, create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you"re adding two dict_items objects together, not two lists -

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: "dict_items" and "dict_items"

and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.

Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don"t do this:

>>> c = dict(a.items() | b.items())

This example demonstrates what happens when values are unhashable:

>>> x = {"a": []}
>>> y = {"b": []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: "list"

Here"s an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

>>> x = {"a": 2}
>>> y = {"a": 1}
>>> dict(x.items() | y.items())
{"a": 2}

Another hack you should not use:

z = dict(x, **y)

This uses the dict constructor and is very fast and memory-efficient (even slightly more so than our two-step process) but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it"s difficult to read, it"s not the intended usage, and so it is not Pythonic.

Here"s an example of the usage being remediated in django.

Dictionaries are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

From the mailing list, Guido van Rossum, the creator of the language, wrote:

I am fine with declaring dict({}, **{1:3}) illegal, since after all it is abuse of the ** mechanism.

and

Apparently dict(x, **y) is going around as "cool hack" for "call x.update(y) and return x". Personally, I find it more despicable than cool.

It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dictionaries for readability purposes, e.g.:

dict(a=1, b=10, c=11)

instead of

{"a": 1, "b": 10, "c": 11}

Response to comments

Despite what Guido says, dict(x, **y) is in line with the dict specification, which btw. works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a short-coming of dict. Nor is using the ** operator in this place an abuse of the mechanism, in fact, ** was designed precisely to pass dictionaries as keywords.

Again, it doesn"t work for 3 when keys are not strings. The implicit calling contract is that namespaces take ordinary dictionaries, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:

>>> foo(**{("a", "b"): None})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{("a", "b"): None})
{("a", "b"): None}

This inconsistency was bad given other implementations of Python (PyPy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.

I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.

More comments:

dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.

My response: merge_two_dicts(x, y) actually seems much clearer to me, if we"re actually concerned about readability. And it is not forward compatible, as Python 2 is increasingly deprecated.

{**x, **y} does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged [...] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word "merging" these answers describe "updating one dict with another", and not merging.

Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first"s values being overwritten by the second"s - in a single expression.

Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dictionaries from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:

from copy import deepcopy

def dict_of_dicts_merge(x, y):
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z

Usage:

>>> x = {"a":{1:{}}, "b": {2:{}}}
>>> y = {"b":{10:{}}, "c": {11:{}}}
>>> dict_of_dicts_merge(x, y)
{"b": {2: {}, 10: {}}, "a": {1: {}}, "c": {11: {}}}

Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a "Dictionaries of dictionaries merge".

Less Performant But Correct Ad-hocs

These approaches are less performant, but they will provide correct behavior. They will be much less performant than copy and update or the new unpacking because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (latter dictionaries have precedence)

You can also chain the dictionaries manually inside a dict comprehension:

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

or in Python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):

dict((k, v) for d in dicts for k, v in d.items()) # iteritems in Python 2

itertools.chain will chain the iterators over the key-value pairs in the correct order:

from itertools import chain
z = dict(chain(x.items(), y.items())) # iteritems in Python 2

Performance Analysis

I"m only going to do the performance analysis of the usages known to behave correctly. (Self-contained so you can copy and paste yourself.)

from timeit import repeat
from itertools import chain

x = dict.fromkeys("abcdefg")
y = dict.fromkeys("efghijk")

def merge_two_dicts(x, y):
    z = x.copy()
    z.update(y)
    return z

min(repeat(lambda: {**x, **y}))
min(repeat(lambda: merge_two_dicts(x, y)))
min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
min(repeat(lambda: dict(chain(x.items(), y.items()))))
min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))

In Python 3.8.1, NixOS:

>>> min(repeat(lambda: {**x, **y}))
1.0804965235292912
>>> min(repeat(lambda: merge_two_dicts(x, y)))
1.636518670246005
>>> min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
3.1779992282390594
>>> min(repeat(lambda: dict(chain(x.items(), y.items()))))
2.740647904574871
>>> min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
4.266070580109954
$ uname -a
Linux nixos 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux

Resources on Dictionaries

Answer #2:

In your case, what you can do is:

z = dict(list(x.items()) + list(y.items()))

This will, as you want it, put the final dict in z, and make the value for key b be properly overridden by the second (y) dict"s value:

>>> x = {"a":1, "b": 2}
>>> y = {"b":10, "c": 11}
>>> z = dict(list(x.items()) + list(y.items()))
>>> z
{"a": 1, "c": 11, "b": 10}

If you use Python 2, you can even remove the list() calls. To create z:

>>> z = dict(x.items() + y.items())
>>> z
{"a": 1, "c": 11, "b": 10}

If you use Python version 3.9.0a4 or greater, then you can directly use:

x = {"a":1, "b": 2}
y = {"b":10, "c": 11}
z = x | y
print(z)
{"a": 1, "c": 11, "b": 10}

Answer #3:

An alternative:

z = x.copy()
z.update(y)

Answer #4:

Another, more concise, option:

z = dict(x, **y)

Note: this has become a popular answer, but it is important to point out that if y has any non-string keys, the fact that this works at all is an abuse of a CPython implementation detail, and it does not work in Python 3, or in PyPy, IronPython, or Jython. Also, Guido is not a fan. So I can"t recommend this technique for forward-compatible or cross-implementation portable code, which really means it should be avoided entirely.

Answer #5:

This probably won"t be a popular answer, but you almost certainly do not want to do this. If you want a copy that"s a merge, then use copy (or deepcopy, depending on what you want) and then update. The two lines of code are much more readable - more Pythonic - than the single line creation with .items() + .items(). Explicit is better than implicit.

In addition, when you use .items() (pre Python 3.0), you"re creating a new list that contains the items from the dict. If your dictionaries are large, then that is quite a lot of overhead (two large lists that will be thrown away as soon as the merged dict is created). update() can work more efficiently, because it can run through the second dict item-by-item.

In terms of time:

>>> timeit.Timer("dict(x, **y)", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.52571702003479
>>> timeit.Timer("temp = x.copy()
temp.update(y)", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
15.694622993469238
>>> timeit.Timer("dict(x.items() + y.items())", "x = dict(zip(range(1000), range(1000)))
y=dict(zip(range(1000,2000), range(1000,2000)))").timeit(100000)
41.484580039978027

IMO the tiny slowdown between the first two is worth it for the readability. In addition, keyword arguments for dictionary creation was only added in Python 2.3, whereas copy() and update() will work in older versions.

Create An Empty Javascript List: StackOverflow Questions

How to remove convexity defects in a Sudoku square?

I was doing a fun project: Solving a Sudoku from an input image using OpenCV (as in Google goggles etc). And I have completed the task, but at the end I found a little problem for which I came here.

I did the programming using Python API of OpenCV 2.3.1.

Below is what I did :

  1. Read the image
  2. Find the contours
  3. Select the one with maximum area, ( and also somewhat equivalent to square).
  4. Find the corner points.

    e.g. given below:

    enter image description here

    (Notice here that the green line correctly coincides with the true boundary of the Sudoku, so the Sudoku can be correctly warped. Check next image)

  5. warp the image to a perfect square

    eg image:

    enter image description here

  6. Perform OCR ( for which I used the method I have given in Simple Digit Recognition OCR in OpenCV-Python )

And the method worked well.

Problem:

Check out this image.

Performing the step 4 on this image gives the result below:

enter image description here

The red line drawn is the original contour which is the true outline of sudoku boundary.

The green line drawn is approximated contour which will be the outline of warped image.

Which of course, there is difference between green line and red line at the top edge of sudoku. So while warping, I am not getting the original boundary of the Sudoku.

My Question :

How can I warp the image on the correct boundary of the Sudoku, i.e. the red line OR how can I remove the difference between red line and green line? Is there any method for this in OpenCV?

Answer #1:

I have a solution that works, but you"ll have to translate it to OpenCV yourself. It"s written in Mathematica.

The first step is to adjust the brightness in the image, by dividing each pixel with the result of a closing operation:

src = ColorConvert[Import["http://davemark.com/images/sudoku.jpg"], "Grayscale"];
white = Closing[src, DiskMatrix[5]];
srcAdjusted = Image[ImageData[src]/ImageData[white]]

enter image description here

The next step is to find the sudoku area, so I can ignore (mask out) the background. For that, I use connected component analysis, and select the component that"s got the largest convex area:

components = 
  ComponentMeasurements[
    [email protected][srcAdjusted], {"ConvexArea", "Mask"}][[All, 
    2]];
largestComponent = Image[SortBy[components, First][[-1, 2]]]

enter image description here

By filling this image, I get a mask for the sudoku grid:

mask = FillingTransform[largestComponent]

enter image description here

Now, I can use a 2nd order derivative filter to find the vertical and horizontal lines in two separate images:

lY = ImageMultiply[MorphologicalBinarize[GaussianFilter[srcAdjusted, 3, {2, 0}], {0.02, 0.05}], mask];
lX = ImageMultiply[MorphologicalBinarize[GaussianFilter[srcAdjusted, 3, {0, 2}], {0.02, 0.05}], mask];

enter image description here

I use connected component analysis again to extract the grid lines from these images. The grid lines are much longer than the digits, so I can use caliper length to select only the grid lines-connected components. Sorting them by position, I get 2x10 mask images for each of the vertical/horizontal grid lines in the image:

verticalGridLineMasks = 
  SortBy[ComponentMeasurements[
      lX, {"CaliperLength", "Centroid", "Mask"}, # > 100 &][[All, 
      2]], #[[2, 1]] &][[All, 3]];
horizontalGridLineMasks = 
  SortBy[ComponentMeasurements[
      lY, {"CaliperLength", "Centroid", "Mask"}, # > 100 &][[All, 
      2]], #[[2, 2]] &][[All, 3]];

enter image description here

Next I take each pair of vertical/horizontal grid lines, dilate them, calculate the pixel-by-pixel intersection, and calculate the center of the result. These points are the grid line intersections:

centerOfGravity[l_] := 
 ComponentMeasurements[Image[l], "Centroid"][[1, 2]]
gridCenters = 
  Table[centerOfGravity[
    ImageData[Dilation[Image[h], DiskMatrix[2]]]*
     ImageData[Dilation[Image[v], DiskMatrix[2]]]], {h, 
    horizontalGridLineMasks}, {v, verticalGridLineMasks}];

enter image description here

The last step is to define two interpolation functions for X/Y mapping through these points, and transform the image using these functions:

fnX = ListInterpolation[gridCenters[[All, All, 1]]];
fnY = ListInterpolation[gridCenters[[All, All, 2]]];
transformed = 
 ImageTransformation[
  srcAdjusted, {fnX @@ Reverse[#], fnY @@ Reverse[#]} &, {9*50, 9*50},
   PlotRange -> {{1, 10}, {1, 10}}, DataRange -> Full]

enter image description here

All of the operations are basic image processing function, so this should be possible in OpenCV, too. The spline-based image transformation might be harder, but I don"t think you really need it. Probably using the perspective transformation you use now on each individual cell will give good enough results.

Answer #2:

Nikie"s answer solved my problem, but his answer was in Mathematica. So I thought I should give its OpenCV adaptation here. But after implementing I could see that OpenCV code is much bigger than nikie"s mathematica code. And also, I couldn"t find interpolation method done by nikie in OpenCV ( although it can be done using scipy, i will tell it when time comes.)

1. Image PreProcessing ( closing operation )

import cv2
import numpy as np

img = cv2.imread("dave.jpg")
img = cv2.GaussianBlur(img,(5,5),0)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
mask = np.zeros((gray.shape),np.uint8)
kernel1 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(11,11))

close = cv2.morphologyEx(gray,cv2.MORPH_CLOSE,kernel1)
div = np.float32(gray)/(close)
res = np.uint8(cv2.normalize(div,div,0,255,cv2.NORM_MINMAX))
res2 = cv2.cvtColor(res,cv2.COLOR_GRAY2BGR)

Result :

Result of closing

2. Finding Sudoku Square and Creating Mask Image

thresh = cv2.adaptiveThreshold(res,255,0,1,19,2)
contour,hier = cv2.findContours(thresh,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)

max_area = 0
best_cnt = None
for cnt in contour:
    area = cv2.contourArea(cnt)
    if area > 1000:
        if area > max_area:
            max_area = area
            best_cnt = cnt

cv2.drawContours(mask,[best_cnt],0,255,-1)
cv2.drawContours(mask,[best_cnt],0,0,2)

res = cv2.bitwise_and(res,mask)

Result :

enter image description here

3. Finding Vertical lines

kernelx = cv2.getStructuringElement(cv2.MORPH_RECT,(2,10))

dx = cv2.Sobel(res,cv2.CV_16S,1,0)
dx = cv2.convertScaleAbs(dx)
cv2.normalize(dx,dx,0,255,cv2.NORM_MINMAX)
ret,close = cv2.threshold(dx,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
close = cv2.morphologyEx(close,cv2.MORPH_DILATE,kernelx,iterations = 1)

contour, hier = cv2.findContours(close,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
for cnt in contour:
    x,y,w,h = cv2.boundingRect(cnt)
    if h/w > 5:
        cv2.drawContours(close,[cnt],0,255,-1)
    else:
        cv2.drawContours(close,[cnt],0,0,-1)
close = cv2.morphologyEx(close,cv2.MORPH_CLOSE,None,iterations = 2)
closex = close.copy()

Result :

enter image description here

4. Finding Horizontal Lines

kernely = cv2.getStructuringElement(cv2.MORPH_RECT,(10,2))
dy = cv2.Sobel(res,cv2.CV_16S,0,2)
dy = cv2.convertScaleAbs(dy)
cv2.normalize(dy,dy,0,255,cv2.NORM_MINMAX)
ret,close = cv2.threshold(dy,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
close = cv2.morphologyEx(close,cv2.MORPH_DILATE,kernely)

contour, hier = cv2.findContours(close,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
for cnt in contour:
    x,y,w,h = cv2.boundingRect(cnt)
    if w/h > 5:
        cv2.drawContours(close,[cnt],0,255,-1)
    else:
        cv2.drawContours(close,[cnt],0,0,-1)

close = cv2.morphologyEx(close,cv2.MORPH_DILATE,None,iterations = 2)
closey = close.copy()

Result :

enter image description here

Of course, this one is not so good.

5. Finding Grid Points

res = cv2.bitwise_and(closex,closey)

Result :

enter image description here

6. Correcting the defects

Here, nikie does some kind of interpolation, about which I don"t have much knowledge. And i couldn"t find any corresponding function for this OpenCV. (may be it is there, i don"t know).

Check out this SOF which explains how to do this using SciPy, which I don"t want to use : Image transformation in OpenCV

So, here I took 4 corners of each sub-square and applied warp Perspective to each.

For that, first we find the centroids.

contour, hier = cv2.findContours(res,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
centroids = []
for cnt in contour:
    mom = cv2.moments(cnt)
    (x,y) = int(mom["m10"]/mom["m00"]), int(mom["m01"]/mom["m00"])
    cv2.circle(img,(x,y),4,(0,255,0),-1)
    centroids.append((x,y))

But resulting centroids won"t be sorted. Check out below image to see their order:

enter image description here

So we sort them from left to right, top to bottom.

centroids = np.array(centroids,dtype = np.float32)
c = centroids.reshape((100,2))
c2 = c[np.argsort(c[:,1])]

b = np.vstack([c2[i*10:(i+1)*10][np.argsort(c2[i*10:(i+1)*10,0])] for i in xrange(10)])
bm = b.reshape((10,10,2))

Now see below their order :

enter image description here

Finally we apply the transformation and create a new image of size 450x450.

output = np.zeros((450,450,3),np.uint8)
for i,j in enumerate(b):
    ri = i/10
    ci = i%10
    if ci != 9 and ri!=9:
        src = bm[ri:ri+2, ci:ci+2 , :].reshape((4,2))
        dst = np.array( [ [ci*50,ri*50],[(ci+1)*50-1,ri*50],[ci*50,(ri+1)*50-1],[(ci+1)*50-1,(ri+1)*50-1] ], np.float32)
        retval = cv2.getPerspectiveTransform(src,dst)
        warp = cv2.warpPerspective(res2,retval,(450,450))
        output[ri*50:(ri+1)*50-1 , ci*50:(ci+1)*50-1] = warp[ri*50:(ri+1)*50-1 , ci*50:(ci+1)*50-1].copy()

Result :

enter image description here

The result is almost same as nikie"s, but code length is large. May be, better methods are available out there, but until then, this works OK.

Regards ARK.

Is there a library function for Root mean square error (RMSE) in python?

I know I could implement a root mean squared error function like this:

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

What I"m looking for if this rmse function is implemented in a library somewhere, perhaps in scipy or scikit-learn?

Answer #1:

sklearn >= 0.22.0

sklearn.metrics has a mean_squared_error function with a squared kwarg (defaults to True). Setting squared to False will return the RMSE.

from sklearn.metrics import mean_squared_error

rms = mean_squared_error(y_actual, y_predicted, squared=False)

sklearn < 0.22.0

sklearn.metrics has a mean_squared_error function. The RMSE is just the square root of whatever it returns.

from sklearn.metrics import mean_squared_error
from math import sqrt

rms = sqrt(mean_squared_error(y_actual, y_predicted))

Answer #2:

What is RMSE? Also known as MSE, RMD, or RMS. What problem does it solve?

If you understand RMSE: (Root mean squared error), MSE: (Mean Squared Error) RMD (Root mean squared deviation) and RMS: (Root Mean Squared), then asking for a library to calculate this for you is unnecessary over-engineering. All these metrics are a single line of python code at most 2 inches long. The three metrics rmse, mse, rmd, and rms are at their core conceptually identical.

RMSE answers the question: "How similar, on average, are the numbers in list1 to list2?". The two lists must be the same size. I want to "wash out the noise between any two given elements, wash out the size of the data collected, and get a single number feel for change over time".

Intuition and ELI5 for RMSE:

Imagine you are learning to throw darts at a dart board. Every day you practice for one hour. You want to figure out if you are getting better or getting worse. So every day you make 10 throws and measure the distance between the bullseye and where your dart hit.

You make a list of those numbers list1. Use the root mean squared error between the distances at day 1 and a list2 containing all zeros. Do the same on the 2nd and nth days. What you will get is a single number that hopefully decreases over time. When your RMSE number is zero, you hit bullseyes every time. If the rmse number goes up, you are getting worse.

Example in calculating root mean squared error in python:

import numpy as np
d = [0.000, 0.166, 0.333]   #ideal target distances, these can be all zeros.
p = [0.000, 0.254, 0.998]   #your performance goes here

print("d is: " + str(["%.8f" % elem for elem in d]))
print("p is: " + str(["%.8f" % elem for elem in p]))

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

rmse_val = rmse(np.array(d), np.array(p))
print("rms error is: " + str(rmse_val))

Which prints:

d is: ["0.00000000", "0.16600000", "0.33300000"]
p is: ["0.00000000", "0.25400000", "0.99800000"]
rms error between lists d and p is: 0.387284994115

The mathematical notation:

root mean squared deviation explained

Glyph Legend: n is a whole positive integer representing the number of throws. i represents a whole positive integer counter that enumerates sum. d stands for the ideal distances, the list2 containing all zeros in above example. p stands for performance, the list1 in the above example. superscript 2 stands for numeric squared. di is the i"th index of d. pi is the i"th index of p.

The rmse done in small steps so it can be understood:

def rmse(predictions, targets):

    differences = predictions - targets                       #the DIFFERENCEs.

    differences_squared = differences ** 2                    #the SQUAREs of ^

    mean_of_differences_squared = differences_squared.mean()  #the MEAN of ^

    rmse_val = np.sqrt(mean_of_differences_squared)           #ROOT of ^

    return rmse_val                                           #get the ^

How does every step of RMSE work:

Subtracting one number from another gives you the distance between them.

8 - 5 = 3         #absolute distance between 8 and 5 is +3
-20 - 10 = -30    #absolute distance between -20 and 10 is +30

If you multiply any number times itself, the result is always positive because negative times negative is positive:

3*3     = 9   = positive
-30*-30 = 900 = positive

Add them all up, but wait, then an array with many elements would have a larger error than a small array, so average them by the number of elements.

But wait, we squared them all earlier to force them positive. Undo the damage with a square root!

That leaves you with a single number that represents, on average, the distance between every value of list1 to it"s corresponding element value of list2.

If the RMSE value goes down over time we are happy because variance is decreasing.

RMSE isn"t the most accurate line fitting strategy, total least squares is:

Root mean squared error measures the vertical distance between the point and the line, so if your data is shaped like a banana, flat near the bottom and steep near the top, then the RMSE will report greater distances to points high, but short distances to points low when in fact the distances are equivalent. This causes a skew where the line prefers to be closer to points high than low.

If this is a problem the total least squares method fixes this: https://mubaris.com/posts/linear-regression

Gotchas that can break this RMSE function:

If there are nulls or infinity in either input list, then output rmse value is is going to not make sense. There are three strategies to deal with nulls / missing values / infinities in either list: Ignore that component, zero it out or add a best guess or a uniform random noise to all timesteps. Each remedy has its pros and cons depending on what your data means. In general ignoring any component with a missing value is preferred, but this biases the RMSE toward zero making you think performance has improved when it really hasn"t. Adding random noise on a best guess could be preferred if there are lots of missing values.

In order to guarantee relative correctness of the RMSE output, you must eliminate all nulls/infinites from the input.

RMSE has zero tolerance for outlier data points which don"t belong

Root mean squared error squares relies on all data being right and all are counted as equal. That means one stray point that"s way out in left field is going to totally ruin the whole calculation. To handle outlier data points and dismiss their tremendous influence after a certain threshold, see Robust estimators that build in a threshold for dismissal of outliers.

What"s the difference between lists enclosed by square brackets and parentheses in Python?

>>> x=[1,2]
>>> x[1]
2
>>> x=(1,2)
>>> x[1]
2

Are they both valid? Is one preferred for some reason?

Answer #1:

Square brackets are lists while parentheses are tuples.

A list is mutable, meaning you can change its contents:

>>> x = [1,2]
>>> x.append(3)
>>> x
[1, 2, 3]

while tuples are not:

>>> x = (1,2)
>>> x
(1, 2)
>>> x.append(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: "tuple" object has no attribute "append"

The other main difference is that a tuple is hashable, meaning that you can use it as a key to a dictionary, among other things. For example:

>>> x = (1,2)
>>> y = [1,2]
>>> z = {}
>>> z[x] = 3
>>> z
{(1, 2): 3}
>>> z[y] = 4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: "list"

Note that, as many people have pointed out, you can add tuples together. For example:

>>> x = (1,2)
>>> x += (3,)
>>> x
(1, 2, 3)

However, this does not mean tuples are mutable. In the example above, a new tuple is constructed by adding together the two tuples as arguments. The original tuple is not modified. To demonstrate this, consider the following:

>>> x = (1,2)
>>> y = x
>>> x += (3,)
>>> x
(1, 2, 3)
>>> y
(1, 2)

Whereas, if you were to construct this same example with a list, y would also be updated:

>>> x = [1, 2]
>>> y = x
>>> x += [3]
>>> x
[1, 2, 3]
>>> y
[1, 2, 3]

Shop

Best laptop for Sims 4

$

Best laptop for Zoom

$499

Best laptop for Minecraft

$590

Best laptop for engineering student

$

Best laptop for development

$

Best laptop for Cricut Maker

$

Best laptop for hacking

$890

Best laptop for Machine Learning

$950

Latest questions

NUMPYNUMPY

psycopg2: insert multiple rows with one query

12 answers

NUMPYNUMPY

How to convert Nonetype to int or string?

12 answers

NUMPYNUMPY

How to specify multiple return types using type-hints

12 answers

NUMPYNUMPY

Javascript Error: IPython is not defined in JupyterLab

12 answers

News

Wiki

Python OpenCV | cv2.putText () method

numpy.arctan2 () in Python

Python | os.path.realpath () method

Python OpenCV | cv2.circle () method

Python OpenCV cv2.cvtColor () method

Python - Move item to the end of the list

time.perf_counter () function in Python

Check if one list is a subset of another in Python

Python os.path.join () method