matrix operations | identity function ()

identity | NumPy | Python Methods and Functions

numpy.matlib.identity() — another function to do matrix operations in numpy. Returns the square matrix of identity of the given input size.

Syntax: numpy.matlib.identity (n, dtype = None)

Parameters :
n: [int] Number of rows and columns in the output matrix.
dtype: [optional] Desired output data -type.

Return: nxn matrix with its main diagonal set to one, and all other elements zero.

Code # 1:

# Python program explaining
# numpy.matlib .identity () function

 
# import matrix library from numpy

import numpy as geek

import n umpy.matlib

 
# desired output 3 x 3 square matrix

out_mat = geek.matlib.identity ( 3

print ( "Output matrix:" , out_mat) 

Output:

 Output matrix: [[1. 0. 0.] [0 . 1. 0.] [0. 0. 1.]] 

Code # 2:

# Python program explaining
# numpy.matlib.identity () function

 
# import numpy library and matr ocy

import numpy as geek

import numpy.matlib

 
# desired output 5 x 5 square matrix

out_mat = geek.matlib.identity (n = 5 , dtype = int

print ( "Output matrix:" , out_mat) 

Output:

 Output matrix: [[1 0 0 0 0] [0 1 0 0 0] [0 0 1 0 0] [0 0 0 1 0] [0 0 0 0 1]] 



matrix operations | identity function (): StackOverflow Questions

Is there a builtin identity function in python?

I"d like to point to a function that does nothing:

def identity(*args)
    return args

my use case is something like this

try:
    gettext.find(...)
    ...
    _ = gettext.gettext
else:
    _ = identity

Of course, I could use the identity defined above, but a built-in would certainly run faster (and avoid bugs introduced by my own).

Apparently, map and filter use None for the identity, but this is specific to their implementations.

>>> _=None
>>> _("hello")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: "NoneType" object is not callable

Answer #1

As I mentioned to David Wolever, there"s more to this than meets the eye; both methods dispatch to is; you can prove this by doing

min(Timer("x == x", setup="x = "a" * 1000000").repeat(10, 10000))
#>>> 0.00045456900261342525

min(Timer("x == y", setup="x = "a" * 1000000; y = "a" * 1000000").repeat(10, 10000))
#>>> 0.5256857610074803

The first can only be so fast because it checks by identity.

To find out why one would take longer than the other, let"s trace through execution.

They both start in ceval.c, from COMPARE_OP since that is the bytecode involved

TARGET(COMPARE_OP) {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *res = cmp_outcome(oparg, left, right);
    Py_DECREF(left);
    Py_DECREF(right);
    SET_TOP(res);
    if (res == NULL)
        goto error;
    PREDICT(POP_JUMP_IF_FALSE);
    PREDICT(POP_JUMP_IF_TRUE);
    DISPATCH();
}

This pops the values from the stack (technically it only pops one)

PyObject *right = POP();
PyObject *left = TOP();

and runs the compare:

PyObject *res = cmp_outcome(oparg, left, right);

cmp_outcome is this:

static PyObject *
cmp_outcome(int op, PyObject *v, PyObject *w)
{
    int res = 0;
    switch (op) {
    case PyCmp_IS: ...
    case PyCmp_IS_NOT: ...
    case PyCmp_IN:
        res = PySequence_Contains(w, v);
        if (res < 0)
            return NULL;
        break;
    case PyCmp_NOT_IN: ...
    case PyCmp_EXC_MATCH: ...
    default:
        return PyObject_RichCompare(v, w, op);
    }
    v = res ? Py_True : Py_False;
    Py_INCREF(v);
    return v;
}

This is where the paths split. The PyCmp_IN branch does

int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL)
        return (*sqm->sq_contains)(seq, ob);
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}

Note that a tuple is defined as

static PySequenceMethods tuple_as_sequence = {
    ...
    (objobjproc)tuplecontains,                  /* sq_contains */
};

PyTypeObject PyTuple_Type = {
    ...
    &tuple_as_sequence,                         /* tp_as_sequence */
    ...
};

So the branch

if (sqm != NULL && sqm->sq_contains != NULL)

will be taken and *sqm->sq_contains, which is the function (objobjproc)tuplecontains, will be taken.

This does

static int
tuplecontains(PyTupleObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyTuple_GET_ITEM(a, i),
                                           Py_EQ);
    return cmp;
}

...Wait, wasn"t that PyObject_RichCompareBool what the other branch took? Nope, that was PyObject_RichCompare.

That code path was short so it likely just comes down to the speed of these two. Let"s compare.

int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
    PyObject *res;
    int ok;

    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }

    ...
}

The code path in PyObject_RichCompareBool pretty much immediately terminates. For PyObject_RichCompare, it does

PyObject *
PyObject_RichCompare(PyObject *v, PyObject *w, int op)
{
    PyObject *res;

    assert(Py_LT <= op && op <= Py_GE);
    if (v == NULL || w == NULL) { ... }
    if (Py_EnterRecursiveCall(" in comparison"))
        return NULL;
    res = do_richcompare(v, w, op);
    Py_LeaveRecursiveCall();
    return res;
}

The Py_EnterRecursiveCall/Py_LeaveRecursiveCall combo are not taken in the previous path, but these are relatively quick macros that"ll short-circuit after incrementing and decrementing some globals.

do_richcompare does:

static PyObject *
do_richcompare(PyObject *v, PyObject *w, int op)
{
    richcmpfunc f;
    PyObject *res;
    int checked_reverse_op = 0;

    if (v->ob_type != w->ob_type && ...) { ... }
    if ((f = v->ob_type->tp_richcompare) != NULL) {
        res = (*f)(v, w, op);
        if (res != Py_NotImplemented)
            return res;
        ...
    }
    ...
}

This does some quick checks to call v->ob_type->tp_richcompare which is

PyTypeObject PyUnicode_Type = {
    ...
    PyUnicode_RichCompare,      /* tp_richcompare */
    ...
};

which does

PyObject *
PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
{
    int result;
    PyObject *v;

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))
        Py_RETURN_NOTIMPLEMENTED;

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)
        return NULL;

    if (left == right) {
        switch (op) {
        case Py_EQ:
        case Py_LE:
        case Py_GE:
            /* a string is equal to itself */
            v = Py_True;
            break;
        case Py_NE:
        case Py_LT:
        case Py_GT:
            v = Py_False;
            break;
        default:
            ...
        }
    }
    else if (...) { ... }
    else { ...}
    Py_INCREF(v);
    return v;
}

Namely, this shortcuts on left == right... but only after doing

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)

All in all the paths then look something like this (manually recursively inlining, unrolling and pruning known branches)

POP()                           # Stack stuff
TOP()                           #
                                #
case PyCmp_IN:                  # Dispatch on operation
                                #
sqm != NULL                     # Dispatch to builtin op
sqm->sq_contains != NULL        #
*sqm->sq_contains               #
                                #
cmp == 0                        # Do comparison in loop
i < Py_SIZE(a)                  #
v == w                          #
op == Py_EQ                     #
++i                             # 
cmp == 0                        #
                                #
res < 0                         # Convert to Python-space
res ? Py_True : Py_False        #
Py_INCREF(v)                    #
                                #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

vs

POP()                           # Stack stuff
TOP()                           #
                                #
default:                        # Dispatch on operation
                                #
Py_LT <= op                     # Checking operation
op <= Py_GE                     #
v == NULL                       #
w == NULL                       #
Py_EnterRecursiveCall(...)      # Recursive check
                                #
v->ob_type != w->ob_type        # More operation checks
f = v->ob_type->tp_richcompare  # Dispatch to builtin op
f != NULL                       #
                                #
!PyUnicode_Check(left)          # ...More checks
!PyUnicode_Check(right))        #
PyUnicode_READY(left) == -1     #
PyUnicode_READY(right) == -1    #
left == right                   # Finally, doing comparison
case Py_EQ:                     # Immediately short circuit
Py_INCREF(v);                   #
                                #
res != Py_NotImplemented        #
                                #
Py_LeaveRecursiveCall()         # Recursive check
                                #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

Now, PyUnicode_Check and PyUnicode_READY are pretty cheap since they only check a couple of fields, but it should be obvious that the top one is a smaller code path, it has fewer function calls, only one switch statement and is just a bit thinner.

TL;DR:

Both dispatch to if (left_pointer == right_pointer); the difference is just how much work they do to get there. in just does less.

Answer #2

Using not a to test whether a is None assumes that the other possible values of a have a truth value of True. However, most NumPy arrays don"t have a truth value at all, and not cannot be applied to them.

If you want to test whether an object is None, the most general, reliable way is to literally use an is check against None:

if a is None:
    ...
else:
    ...

This doesn"t depend on objects having a truth value, so it works with NumPy arrays.

Note that the test has to be is, not ==. is is an object identity test. == is whatever the arguments say it is, and NumPy arrays say it"s a broadcasted elementwise equality comparison, producing a boolean array:

>>> a = numpy.arange(5)
>>> a == None
array([False, False, False, False, False])
>>> if a == None:
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous.
 Use a.any() or a.all()

On the other side of things, if you want to test whether an object is a NumPy array, you can test its type:

# Careful - the type is np.ndarray, not np.array. np.array is a factory function.
if type(a) is np.ndarray:
    ...
else:
    ...

You can also use isinstance, which will also return True for subclasses of that type (if that is what you want). Considering how terrible and incompatible np.matrix is, you may not actually want this:

# Again, ndarray, not array, because array is a factory function.
if isinstance(a, np.ndarray):
    ...
else:
    ...    

Answer #3

None, Python"s null?

There"s no null in Python; instead there"s None. As stated already, the most accurate way to test that something has been given None as a value is to use the is identity operator, which tests that two variables refer to the same object.

>>> foo is None
True
>>> foo = "bar"
>>> foo is None
False

The basics

There is and can only be one None

None is the sole instance of the class NoneType and any further attempts at instantiating that class will return the same object, which makes None a singleton. Newcomers to Python often see error messages that mention NoneType and wonder what it is. It"s my personal opinion that these messages could simply just mention None by name because, as we"ll see shortly, None leaves little room to ambiguity. So if you see some TypeError message that mentions that NoneType can"t do this or can"t do that, just know that it"s simply the one None that was being used in a way that it can"t.

Also, None is a built-in constant. As soon as you start Python, it"s available to use from everywhere, whether in module, class, or function. NoneType by contrast is not, you"d need to get a reference to it first by querying None for its class.

>>> NoneType
NameError: name "NoneType" is not defined
>>> type(None)
NoneType

You can check None"s uniqueness with Python"s identity function id(). It returns the unique number assigned to an object, each object has one. If the id of two variables is the same, then they point in fact to the same object.

>>> NoneType = type(None)
>>> id(None)
10748000
>>> my_none = NoneType()
>>> id(my_none)
10748000
>>> another_none = NoneType()
>>> id(another_none)
10748000
>>> def function_that_does_nothing(): pass
>>> return_value = function_that_does_nothing()
>>> id(return_value)
10748000

None cannot be overwritten

In much older versions of Python (before 2.4) it was possible to reassign None, but not any more. Not even as a class attribute or in the confines of a function.

# In Python 2.7
>>> class SomeClass(object):
...     def my_fnc(self):
...             self.None = "foo"
SyntaxError: cannot assign to None
>>> def my_fnc():
        None = "foo"
SyntaxError: cannot assign to None

# In Python 3.5
>>> class SomeClass:
...     def my_fnc(self):
...             self.None = "foo"
SyntaxError: invalid syntax
>>> def my_fnc():
        None = "foo"
SyntaxError: cannot assign to keyword

It"s therefore safe to assume that all None references are the same. There isn"t any "custom" None.

To test for None use the is operator

When writing code you might be tempted to test for Noneness like this:

if value==None:
    pass

Or to test for falsehood like this

if not value:
    pass

You need to understand the implications and why it"s often a good idea to be explicit.

Case 1: testing if a value is None

Why do

value is None

rather than

value==None

?

The first is equivalent to:

id(value)==id(None)

Whereas the expression value==None is in fact applied like this

value.__eq__(None)

If the value really is None then you"ll get what you expected.

>>> nothing = function_that_does_nothing()
>>> nothing.__eq__(None)
True

In most common cases the outcome will be the same, but the __eq__() method opens a door that voids any guarantee of accuracy, since it can be overridden in a class to provide special behavior.

Consider this class.

>>> class Empty(object):
...     def __eq__(self, other):
...         return not other

So you try it on None and it works

>>> empty = Empty()
>>> empty==None
True

But then it also works on the empty string

>>> empty==""
True

And yet

>>> ""==None
False
>>> empty is None
False

Case 2: Using None as a boolean

The following two tests

if value:
    # Do something

if not value:
    # Do something

are in fact evaluated as

if bool(value):
    # Do something

if not bool(value):
    # Do something

None is a "falsey", meaning that if cast to a boolean it will return False and if applied the not operator it will return True. Note however that it"s not a property unique to None. In addition to False itself, the property is shared by empty lists, tuples, sets, dicts, strings, as well as 0, and all objects from classes that implement the __bool__() magic method to return False.

>>> bool(None)
False
>>> not None
True

>>> bool([])
False
>>> not []
True

>>> class MyFalsey(object):
...     def __bool__(self):
...         return False
>>> f = MyFalsey()
>>> bool(f)
False
>>> not f
True

So when testing for variables in the following way, be extra aware of what you"re including or excluding from the test:

def some_function(value=None):
    if not value:
        value = init_value()

In the above, did you mean to call init_value() when the value is set specifically to None, or did you mean that a value set to 0, or the empty string, or an empty list should also trigger the initialization? Like I said, be mindful. As it"s often the case, in Python explicit is better than implicit.

None in practice

None used as a signal value

None has a special status in Python. It"s a favorite baseline value because many algorithms treat it as an exceptional value. In such scenarios it can be used as a flag to signal that a condition requires some special handling (such as the setting of a default value).

You can assign None to the keyword arguments of a function and then explicitly test for it.

def my_function(value, param=None):
    if param is None:
        # Do something outrageous!

You can return it as the default when trying to get to an object"s attribute and then explicitly test for it before doing something special.

value = getattr(some_obj, "some_attribute", None)
if value is None:
    # do something spectacular!

By default a dictionary"s get() method returns None when trying to access a non-existing key:

>>> some_dict = {}
>>> value = some_dict.get("foo")
>>> value is None
True

If you were to try to access it by using the subscript notation a KeyError would be raised

>>> value = some_dict["foo"]
KeyError: "foo"

Likewise if you attempt to pop a non-existing item

>>> value = some_dict.pop("foo")
KeyError: "foo"

which you can suppress with a default value that is usually set to None

value = some_dict.pop("foo", None)
if value is None:
    # Booom!

None used as both a flag and valid value

The above described uses of None apply when it is not considered a valid value, but more like a signal to do something special. There are situations however where it sometimes matters to know where None came from because even though it"s used as a signal it could also be part of the data.

When you query an object for its attribute with getattr(some_obj, "attribute_name", None) getting back None doesn"t tell you if the attribute you were trying to access was set to None or if it was altogether absent from the object. The same situation when accessing a key from a dictionary, like some_dict.get("some_key"), you don"t know if some_dict["some_key"] is missing or if it"s just set to None. If you need that information, the usual way to handle this is to directly attempt accessing the attribute or key from within a try/except construct:

try:
    # Equivalent to getattr() without specifying a default
    # value = getattr(some_obj, "some_attribute")
    value = some_obj.some_attribute
    # Now you handle `None` the data here
    if value is None:
        # Do something here because the attribute was set to None
except AttributeError:
    # We"re now handling the exceptional situation from here.
    # We could assign None as a default value if required.
    value = None
    # In addition, since we now know that some_obj doesn"t have the
    # attribute "some_attribute" we could do something about that.
    log_something(some_obj)

Similarly with dict:

try:
    value = some_dict["some_key"]
    if value is None:
        # Do something here because "some_key" is set to None
except KeyError:
    # Set a default
    value = None
    # And do something because "some_key" was missing
    # from the dict.
    log_something(some_dict)

The above two examples show how to handle object and dictionary cases. What about functions? The same thing, but we use the double asterisks keyword argument to that end:

def my_function(**kwargs):
    try:
        value = kwargs["some_key"]
        if value is None:
            # Do something because "some_key" is explicitly
            # set to None
    except KeyError:
        # We assign the default
        value = None
        # And since it"s not coming from the caller.
        log_something("did not receive "some_key"")

None used only as a valid value

If you find that your code is littered with the above try/except pattern simply to differentiate between None flags and None data, then just use another test value. There"s a pattern where a value that falls outside the set of valid values is inserted as part of the data in a data structure and is used to control and test special conditions (e.g. boundaries, state, etc.). Such a value is called a sentinel and it can be used the way None is used as a signal. It"s trivial to create a sentinel in Python.

undefined = object()

The undefined object above is unique and doesn"t do much of anything that might be of interest to a program, it"s thus an excellent replacement for None as a flag. Some caveats apply, more about that after the code.

With function

def my_function(value, param1=undefined, param2=undefined):
    if param1 is undefined:
        # We know nothing was passed to it, not even None
        log_something("param1 was missing")
        param1 = None


    if param2 is undefined:
        # We got nothing here either
        log_something("param2 was missing")
        param2 = None

With dict

value = some_dict.get("some_key", undefined)
if value is None:
    log_something(""some_key" was set to None")

if value is undefined:
    # We know that the dict didn"t have "some_key"
    log_something(""some_key" was not set at all")
    value = None

With an object

value = getattr(obj, "some_attribute", undefined)
if value is None:
    log_something(""obj.some_attribute" was set to None")
if value is undefined:
    # We know that there"s no obj.some_attribute
    log_something("no "some_attribute" set on obj")
    value = None

As I mentioned earlier, custom sentinels come with some caveats. First, they"re not keywords like None, so Python doesn"t protect them. You can overwrite your undefined above at any time, anywhere in the module it"s defined, so be careful how you expose and use them. Next, the instance returned by object() is not a singleton. If you make that call 10 times you get 10 different objects. Finally, usage of a sentinel is highly idiosyncratic. A sentinel is specific to the library it"s used in and as such its scope should generally be limited to the library"s internals. It shouldn"t "leak" out. External code should only become aware of it, if their purpose is to extend or supplement the library"s API.

Answer #4

I think all of the answers here cover the core of what the lambda function does in the context of sorted() quite nicely, however I still feel like a description that leads to an intuitive understanding is lacking, so here is my two cents.

For the sake of completeness, I"ll state the obvious up front: sorted() returns a list of sorted elements and if we want to sort in a particular way or if we want to sort a complex list of elements (e.g. nested lists or a list of tuples) we can invoke the key argument.

For me, the intuitive understanding of the key argument, why it has to be callable, and the use of lambda as the (anonymous) callable function to accomplish this comes in two parts.

  1. Using lamba ultimately means you don"t have to write (define) an entire function, like the one sblom provided an example of. Lambda functions are created, used, and immediately destroyed - so they don"t funk up your code with more code that will only ever be used once. This, as I understand it, is the core utility of the lambda function and its application for such a role is broad. Its syntax is purely a convention, which is in essence the nature of programmatic syntax in general. Learn the syntax and be done with it.

Lambda syntax is as follows:

lambda input_variable(s): tasty one liner

where lambda is a python keyword.

e.g.

In [1]: f00 = lambda x: x/2

In [2]: f00(10)
Out[2]: 5.0

In [3]: (lambda x: x/2)(10)
Out[3]: 5.0

In [4]: (lambda x, y: x / y)(10, 2)
Out[4]: 5.0

In [5]: (lambda: "amazing lambda")() # func with no args!
Out[5]: "amazing lambda"
  1. The idea behind the key argument is that it should take in a set of instructions that will essentially point the "sorted()" function at those list elements which should be used to sort by. When it says key=, what it really means is: As I iterate through the list, one element at a time (i.e. for e in some_list), I"m going to pass the current element to the function specifed by the key argument and use that to create a transformed list which will inform me on the order of the final sorted list.

Check it out:

In [6]: mylist = [3, 6, 3, 2, 4, 8, 23]  # an example list
# sorted(mylist, key=HowToSort)  # what we will be doing

Base example:

# mylist = [3, 6, 3, 2, 4, 8, 23]
In [7]: sorted(mylist)
Out[7]: [2, 3, 3, 4, 6, 8, 23]  
# all numbers are in ascending order (i.e.from low to high).

Example 1:

# mylist = [3, 6, 3, 2, 4, 8, 23]
In [8]: sorted(mylist, key=lambda x: x % 2 == 0)

# Quick Tip: The % operator returns the *remainder* of a division
# operation. So the key lambda function here is saying "return True 
# if x divided by 2 leaves a remainer of 0, else False". This is a 
# typical way to check if a number is even or odd.

Out[8]: [3, 3, 23, 6, 2, 4, 8]  
# Does this sorted result make intuitive sense to you?

Notice that my lambda function told sorted to check if each element e was even or odd before sorting.

BUT WAIT! You may (or perhaps should) be wondering two things.

First, why are the odd numbers coming before the even numbers? After all, the key value seems to be telling the sorted function to prioritize evens by using the mod operator in x % 2 == 0.

Second, why are the even numbers still out of order? 2 comes before 6, right?

By analyzing this result, we"ll learn something deeper about how the "key" argument really works, especially in conjunction with the anonymous lambda function.

Firstly, you"ll notice that while the odds come before the evens, the evens themselves are not sorted. Why is this?? Lets read the docs:

Key Functions Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons.

We have to do a little bit of reading between the lines here, but what this tells us is that the sort function is only called once, and if we specify the key argument, then we sort by the value that key function points us to.

So what does the example using a modulo return? A boolean value: True == 1, False == 0. So how does sorted deal with this key? It basically transforms the original list to a sequence of 1s and 0s.

[3, 6, 3, 2, 4, 8, 23] becomes [0, 1, 0, 1, 1, 1, 0]

Now we"re getting somewhere. What do you get when you sort the transformed list?

[0, 0, 0, 1, 1, 1, 1]

Okay, so now we know why the odds come before the evens. But the next question is: Why does the 6 still come before the 2 in my final list? Well that"s easy - it is because sorting only happens once! Those 1s still represent the original list values, which are in their original positions relative to each other. Since sorting only happens once, and we don"t call any kind of sort function to order the original even numbers from low to high, those values remain in their original order relative to one another.

The final question is then this: How do I think conceptually about how the order of my boolean values get transformed back in to the original values when I print out the final sorted list?

Sorted() is a built-in method that (fun fact) uses a hybrid sorting algorithm called Timsort that combines aspects of merge sort and insertion sort. It seems clear to me that when you call it, there is a mechanic that holds these values in memory and bundles them with their boolean identity (mask) determined by (...!) the lambda function. The order is determined by their boolean identity calculated from the lambda function, but keep in mind that these sublists (of one"s and zeros) are not themselves sorted by their original values. Hence, the final list, while organized by Odds and Evens, is not sorted by sublist (the evens in this case are out of order). The fact that the odds are ordered is because they were already in order by coincidence in the original list. The takeaway from all this is that when lambda does that transformation, the original order of the sublists are retained.

So how does this all relate back to the original question, and more importantly, our intuition on how we should implement sorted() with its key argument and lambda?

That lambda function can be thought of as a pointer that points to the values we need to sort by, whether its a pointer mapping a value to its boolean transformed by the lambda function, or if its a particular element in a nested list, tuple, dict, etc., again determined by the lambda function.

Lets try and predict what happens when I run the following code.

In [9]: mylist = [(3, 5, 8), (6, 2, 8), (2, 9, 4), (6, 8, 5)]
In[10]: sorted(mylist, key=lambda x: x[1])

My sorted call obviously says, "Please sort this list". The key argument makes that a little more specific by saying, "for each element x in mylist, return the second index of that element, then sort all of the elements of the original list mylist by the sorted order of the list calculated by the lambda function. Since we have a list of tuples, we can return an indexed element from that tuple using the lambda function.

The pointer that will be used to sort would be:

[5, 2, 9, 8] # the second element of each tuple

Sorting this pointer list returns:

[2, 5, 8, 9]

Applying this to mylist, we get:

Out[10]: [(6, 2, 8), (3, 5, 8), (6, 8, 5), (2, 9, 4)]
# Notice the sorted pointer list is the same as the second index of each tuple in this final list

Run that code, and you"ll find that this is the order. Try sorting a list of integers using this key function and you"ll find that the code breaks (why? Because you cannot index an integer of course).

This was a long winded explanation, but I hope this helps to sort your intuition on the use of lambda functions - as the key argument in sorted(), and beyond.

Answer #5

There are three factors at play here which, combined, produce this surprising behavior.

First: the in operator takes a shortcut and checks identity (x is y) before it checks equality (x == y):

>>> n = float("nan")
>>> n in (n, )
True
>>> n == n
False
>>> n is n
True

Second: because of Python"s string interning, both "x"s in "x" in ("x", ) will be identical:

>>> "x" is "x"
True

(big warning: this is implementation-specific behavior! is should never be used to compare strings because it will give surprising answers sometimes; for example "x" * 100 is "x" * 100 ==> False)

Third: as detailed in Veedrac"s fantastic answer, tuple.__contains__ (x in (y, ) is roughly equivalent to (y, ).__contains__(x)) gets to the point of performing the identity check faster than str.__eq__ (again, x == y is roughly equivalent to x.__eq__(y)) does.

You can see evidence for this because x in (y, ) is significantly slower than the logically equivalent, x == y:

In [18]: %timeit "x" in ("x", )
10000000 loops, best of 3: 65.2 ns per loop

In [19]: %timeit "x" == "x"    
10000000 loops, best of 3: 68 ns per loop

In [20]: %timeit "x" in ("y", ) 
10000000 loops, best of 3: 73.4 ns per loop

In [21]: %timeit "x" == "y"    
10000000 loops, best of 3: 56.2 ns per loop

The x in (y, ) case is slower because, after the is comparison fails, the in operator falls back to normal equality checking (i.e., using ==), so the comparison takes about the same amount of time as ==, rendering the entire operation slower because of the overhead of creating the tuple, walking its members, etc.

Note also that a in (b, ) is only faster when a is b:

In [48]: a = 1             

In [49]: b = 2

In [50]: %timeit a is a or a == a
10000000 loops, best of 3: 95.1 ns per loop

In [51]: %timeit a in (a, )      
10000000 loops, best of 3: 140 ns per loop

In [52]: %timeit a is b or a == b
10000000 loops, best of 3: 177 ns per loop

In [53]: %timeit a in (b, )      
10000000 loops, best of 3: 169 ns per loop

(why is a in (b, ) faster than a is b or a == b? My guess would be fewer virtual machine instructions — a in (b, ) is only ~3 instructions, where a is b or a == b will be quite a few more VM instructions)

Veedrac"s answer — https://stackoverflow.com/a/28889838/71522 — goes into much more detail on specifically what happens during each of == and in and is well worth the read.

Answer #6

This is a solution without any external pip dependencies, but works only in Python 3+ (Python 2 won"t work):

from urllib.parse import urlencode
from urllib.request import Request, urlopen

url = "https://httpbin.org/post" # Set destination URL here
post_fields = {"foo": "bar"}     # Set POST fields here

request = Request(url, urlencode(post_fields).encode())
json = urlopen(request).read().decode()
print(json)

Sample output:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "foo": "bar"
  }, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Content-Length": "7", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "Python-urllib/3.3"
  }, 
  "json": null, 
  "origin": "127.0.0.1", 
  "url": "https://httpbin.org/post"
}

Answer #7

A one-line overview:

The behavior of execute() is same in all the cases, but they are 3 different methods, in Engine, Connection, and Session classes.

What exactly is execute():

To understand behavior of execute() we need to look into the Executable class. Executable is a superclass for all “statement” types of objects, including select(), delete(),update(), insert(), text() - in simplest words possible, an Executable is a SQL expression construct supported in SQLAlchemy.

In all the cases the execute() method takes the SQL text or constructed SQL expression i.e. any of the variety of SQL expression constructs supported in SQLAlchemy and returns query results (a ResultProxy - Wraps a DB-API cursor object to provide easier access to row columns.)


To clarify it further (only for conceptual clarification, not a recommended approach):

In addition to Engine.execute() (connectionless execution), Connection.execute(), and Session.execute(), it is also possible to use the execute() directly on any Executable construct. The Executable class has it"s own implementation of execute() - As per official documentation, one line description about what the execute() does is "Compile and execute this Executable". In this case we need to explicitly bind the Executable (SQL expression construct) with a Connection object or, Engine object (which implicitly get a Connection object), so the execute() will know where to execute the SQL.

The following example demonstrates it well - Given a table as below:

from sqlalchemy import MetaData, Table, Column, Integer

meta = MetaData()
users_table = Table("users", meta,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)))

Explicit execution i.e. Connection.execute() - passing the SQL text or constructed SQL expression to the execute() method of Connection:

engine = create_engine("sqlite:///file.db")
connection = engine.connect()
result = connection.execute(users_table.select())
for row in result:
    # ....
connection.close()

Explicit connectionless execution i.e. Engine.execute() - passing the SQL text or constructed SQL expression directly to the execute() method of Engine:

engine = create_engine("sqlite:///file.db")
result = engine.execute(users_table.select())
for row in result:
    # ....
result.close()

Implicit execution i.e. Executable.execute() - is also connectionless, and calls the execute() method of the Executable, that is, it calls execute() method directly on the SQL expression construct (an instance of Executable) itself.

engine = create_engine("sqlite:///file.db")
meta.bind = engine
result = users_table.select().execute()
for row in result:
    # ....
result.close()

Note: Stated the implicit execution example for the purpose of clarification - this way of execution is highly not recommended - as per docs:

“implicit execution” is a very old usage pattern that in most cases is more confusing than it is helpful, and its usage is discouraged. Both patterns seem to encourage the overuse of expedient “short cuts” in application design which lead to problems later on.


Your questions:

As I understand if someone use engine.execute it creates connection, opens session (Alchemy cares about it for you) and executes query.

You"re right for the part "if someone use engine.execute it creates connection " but not for "opens session (Alchemy cares about it for you) and executes query " - Using Engine.execute() and Connection.execute() is (almost) one the same thing, in formal, Connection object gets created implicitly, and in later case we explicitly instantiate it. What really happens in this case is:

`Engine` object (instantiated via `create_engine()`) -> `Connection` object (instantiated via `engine_instance.connect()`) -> `connection.execute({*SQL expression*})`

But is there a global difference between these three ways of performing such task?

At DB layer it"s exactly the same thing, all of them are executing SQL (text expression or various SQL expression constructs). From application"s point of view there are two options:

  • Direct execution - Using Engine.execute() or Connection.execute()
  • Using sessions - efficiently handles transaction as single unit-of-work, with ease via session.add(), session.rollback(), session.commit(), session.close(). It is the way to interact with the DB in case of ORM i.e. mapped tables. Provides identity_map for instantly getting already accessed or newly created/added objects during a single request.

Session.execute() ultimately uses Connection.execute() statement execution method in order to execute the SQL statement. Using Session object is SQLAlchemy ORM"s recommended way for an application to interact with the database.

An excerpt from the docs:

Its important to note that when using the SQLAlchemy ORM, these objects are not generally accessed; instead, the Session object is used as the interface to the database. However, for applications that are built around direct usage of textual SQL statements and/or SQL expression constructs without involvement by the ORM’s higher level management services, the Engine and Connection are king (and queen?) - read on.

Answer #8

Python, should I implement __ne__() operator based on __eq__?

Short Answer: Don"t implement it, but if you must, use ==, not __eq__

In Python 3, != is the negation of == by default, so you are not even required to write a __ne__, and the documentation is no longer opinionated on writing one.

Generally speaking, for Python 3-only code, don"t write one unless you need to overshadow the parent implementation, e.g. for a builtin object.

That is, keep in mind Raymond Hettinger"s comment:

The __ne__ method follows automatically from __eq__ only if __ne__ isn"t already defined in a superclass. So, if you"re inheriting from a builtin, it"s best to override both.

If you need your code to work in Python 2, follow the recommendation for Python 2 and it will work in Python 3 just fine.

In Python 2, Python itself does not automatically implement any operation in terms of another - therefore, you should define the __ne__ in terms of == instead of the __eq__. E.G.

class A(object):
    def __eq__(self, other):
        return self.value == other.value

    def __ne__(self, other):
        return not self == other # NOT `return not self.__eq__(other)`

See proof that

  • implementing __ne__() operator based on __eq__ and
  • not implementing __ne__ in Python 2 at all

provides incorrect behavior in the demonstration below.

Long Answer

The documentation for Python 2 says:

There are no implied relationships among the comparison operators. The truth of x==y does not imply that x!=y is false. Accordingly, when defining __eq__(), one should also define __ne__() so that the operators will behave as expected.

So that means that if we define __ne__ in terms of the inverse of __eq__, we can get consistent behavior.

This section of the documentation has been updated for Python 3:

By default, __ne__() delegates to __eq__() and inverts the result unless it is NotImplemented.

and in the "what"s new" section, we see this behavior has changed:

  • != now returns the opposite of ==, unless == returns NotImplemented.

For implementing __ne__, we prefer to use the == operator instead of using the __eq__ method directly so that if self.__eq__(other) of a subclass returns NotImplemented for the type checked, Python will appropriately check other.__eq__(self) From the documentation:

The NotImplemented object

This type has a single value. There is a single object with this value. This object is accessed through the built-in name NotImplemented. Numeric methods and rich comparison methods may return this value if they do not implement the operation for the operands provided. (The interpreter will then try the reflected operation, or some other fallback, depending on the operator.) Its truth value is true.

When given a rich comparison operator, if they"re not the same type, Python checks if the other is a subtype, and if it has that operator defined, it uses the other"s method first (inverse for <, <=, >= and >). If NotImplemented is returned, then it uses the opposite"s method. (It does not check for the same method twice.) Using the == operator allows for this logic to take place.


Expectations

Semantically, you should implement __ne__ in terms of the check for equality because users of your class will expect the following functions to be equivalent for all instances of A.:

def negation_of_equals(inst1, inst2):
    """always should return same as not_equals(inst1, inst2)"""
    return not inst1 == inst2

def not_equals(inst1, inst2):
    """always should return same as negation_of_equals(inst1, inst2)"""
    return inst1 != inst2

That is, both of the above functions should always return the same result. But this is dependent on the programmer.

Demonstration of unexpected behavior when defining __ne__ based on __eq__:

First the setup:

class BaseEquatable(object):
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        return isinstance(other, BaseEquatable) and self.x == other.x

class ComparableWrong(BaseEquatable):
    def __ne__(self, other):
        return not self.__eq__(other)

class ComparableRight(BaseEquatable):
    def __ne__(self, other):
        return not self == other

class EqMixin(object):
    def __eq__(self, other):
        """override Base __eq__ & bounce to other for __eq__, e.g. 
        if issubclass(type(self), type(other)): # True in this example
        """
        return NotImplemented

class ChildComparableWrong(EqMixin, ComparableWrong):
    """__ne__ the wrong way (__eq__ directly)"""

class ChildComparableRight(EqMixin, ComparableRight):
    """__ne__ the right way (uses ==)"""

class ChildComparablePy3(EqMixin, BaseEquatable):
    """No __ne__, only right in Python 3."""

Instantiate non-equivalent instances:

right1, right2 = ComparableRight(1), ChildComparableRight(2)
wrong1, wrong2 = ComparableWrong(1), ChildComparableWrong(2)
right_py3_1, right_py3_2 = BaseEquatable(1), ChildComparablePy3(2)

Expected Behavior:

(Note: while every second assertion of each of the below is equivalent and therefore logically redundant to the one before it, I"m including them to demonstrate that order does not matter when one is a subclass of the other.)

These instances have __ne__ implemented with ==:

assert not right1 == right2
assert not right2 == right1
assert right1 != right2
assert right2 != right1

These instances, testing under Python 3, also work correctly:

assert not right_py3_1 == right_py3_2
assert not right_py3_2 == right_py3_1
assert right_py3_1 != right_py3_2
assert right_py3_2 != right_py3_1

And recall that these have __ne__ implemented with __eq__ - while this is the expected behavior, the implementation is incorrect:

assert not wrong1 == wrong2         # These are contradicted by the
assert not wrong2 == wrong1         # below unexpected behavior!

Unexpected Behavior:

Note that this comparison contradicts the comparisons above (not wrong1 == wrong2).

>>> assert wrong1 != wrong2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

and,

>>> assert wrong2 != wrong1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

Don"t skip __ne__ in Python 2

For evidence that you should not skip implementing __ne__ in Python 2, see these equivalent objects:

>>> right_py3_1, right_py3_1child = BaseEquatable(1), ChildComparablePy3(1)
>>> right_py3_1 != right_py3_1child # as evaluated in Python 2!
True

The above result should be False!

Python 3 source

The default CPython implementation for __ne__ is in typeobject.c in object_richcompare:

case Py_NE:
    /* By default, __ne__() delegates to __eq__() and inverts the result,
       unless the latter returns NotImplemented. */
    if (Py_TYPE(self)->tp_richcompare == NULL) {
        res = Py_NotImplemented;
        Py_INCREF(res);
        break;
    }
    res = (*Py_TYPE(self)->tp_richcompare)(self, other, Py_EQ);
    if (res != NULL && res != Py_NotImplemented) {
        int ok = PyObject_IsTrue(res);
        Py_DECREF(res);
        if (ok < 0)
            res = NULL;
        else {
            if (ok)
                res = Py_False;
            else
                res = Py_True;
            Py_INCREF(res);
        }
    }
    break;

But the default __ne__ uses __eq__?

Python 3"s default __ne__ implementation detail at the C level uses __eq__ because the higher level == (PyObject_RichCompare) would be less efficient - and therefore it must also handle NotImplemented.

If __eq__ is correctly implemented, then the negation of == is also correct - and it allows us to avoid low level implementation details in our __ne__.

Using == allows us to keep our low level logic in one place, and avoid addressing NotImplemented in __ne__.

One might incorrectly assume that == may return NotImplemented.

It actually uses the same logic as the default implementation of __eq__, which checks for identity (see do_richcompare and our evidence below)

class Foo:
    def __ne__(self, other):
        return NotImplemented
    __eq__ = __ne__

f = Foo()
f2 = Foo()

And the comparisons:

>>> f == f
True
>>> f != f
False
>>> f2 == f
False
>>> f2 != f
True

Performance

Don"t take my word for it, let"s see what"s more performant:

class CLevel:
    "Use default logic programmed in C"

class HighLevelPython:
    def __ne__(self, other):
        return not self == other

class LowLevelPython:
    def __ne__(self, other):
        equal = self.__eq__(other)
        if equal is NotImplemented:
            return NotImplemented
        return not equal

def c_level():
    cl = CLevel()
    return lambda: cl != cl

def high_level_python():
    hlp = HighLevelPython()
    return lambda: hlp != hlp

def low_level_python():
    llp = LowLevelPython()
    return lambda: llp != llp

I think these performance numbers speak for themselves:

>>> import timeit
>>> min(timeit.repeat(c_level()))
0.09377292497083545
>>> min(timeit.repeat(high_level_python()))
0.2654011140111834
>>> min(timeit.repeat(low_level_python()))
0.3378178110579029

This makes sense when you consider that low_level_python is doing logic in Python that would otherwise be handled on the C level.

Response to some critics

Another answerer writes:

Aaron Hall’s implementation not self == other of the __ne__ method is incorrect as it can never return NotImplemented (not NotImplemented is False) and therefore the __ne__ method that has priority can never fall back on the __ne__ method that does not have priority.

Having __ne__ never return NotImplemented does not make it incorrect. Instead, we handle prioritization with NotImplemented via the check for equality with ==. Assuming == is correctly implemented, we"re done.

not self == other used to be the default Python 3 implementation of the __ne__ method but it was a bug and it was corrected in Python 3.4 on January 2015, as ShadowRanger noticed (see issue #21408).

Well, let"s explain this.

As noted earlier, Python 3 by default handles __ne__ by first checking if self.__eq__(other) returns NotImplemented (a singleton) - which should be checked for with is and returned if so, else it should return the inverse. Here is that logic written as a class mixin:

class CStyle__ne__:
    """Mixin that provides __ne__ functionality equivalent to 
    the builtin functionality
    """
    def __ne__(self, other):
        equal = self.__eq__(other)
        if equal is NotImplemented:
            return NotImplemented
        return not equal

This is necessary for correctness for C level Python API, and it was introduced in Python 3, making

redundant. All relevant __ne__ methods were removed, including ones implementing their own check as well as ones that delegate to __eq__ directly or via == - and == was the most common way of doing so.

Is Symmetry Important?

Our persistent critic provides a pathological example to make the case for handling NotImplemented in __ne__, valuing symmetry above all else. Let"s steel-man the argument with a clear example:

class B:
    """
    this class has no __eq__ implementation, but asserts 
    any instance is not equal to any other object
    """
    def __ne__(self, other):
        return True

class A:
    "This class asserts instances are equivalent to all other objects"
    def __eq__(self, other):
        return True

>>> A() == B(), B() == A(), A() != B(), B() != A()
(True, True, False, True)

So, by this logic, in order to maintain symmetry, we need to write the complicated __ne__, regardless of Python version.

class B:
    def __ne__(self, other):
        return True

class A:
    def __eq__(self, other):
        return True
    def __ne__(self, other):
        result = other.__eq__(self)
        if result is NotImplemented:
            return NotImplemented
        return not result

>>> A() == B(), B() == A(), A() != B(), B() != A()
(True, True, True, True)

Apparently we should give no mind that these instances are both equal and not equal.

I propose that symmetry is less important than the presumption of sensible code and following the advice of the documentation.

However, if A had a sensible implementation of __eq__, then we could still follow my direction here and we would still have symmetry:

class B:
    def __ne__(self, other):
        return True

class A:
    def __eq__(self, other):
        return False         # <- this boolean changed... 

>>> A() == B(), B() == A(), A() != B(), B() != A()
(False, False, True, True)

Conclusion

For Python 2 compatible code, use == to implement __ne__. It is more:

  • correct
  • simple
  • performant

In Python 3 only, use the low-level negation on the C level - it is even more simple and performant (though the programmer is responsible for determining that it is correct).

Again, do not write low-level logic in high level Python.

Answer #9

Talking about async/await and asyncio is not the same thing. The first is a fundamental, low-level construct (coroutines) while the later is a library using these constructs. Conversely, there is no single ultimate answer.

The following is a general description of how async/await and asyncio-like libraries work. That is, there may be other tricks on top (there are...) but they are inconsequential unless you build them yourself. The difference should be negligible unless you already know enough to not have to ask such a question.

1. Coroutines versus subroutines in a nut shell

Just like subroutines (functions, procedures, ...), coroutines (generators, ...) are an abstraction of call stack and instruction pointer: there is a stack of executing code pieces, and each is at a specific instruction.

The distinction of def versus async def is merely for clarity. The actual difference is return versus yield. From this, await or yield from take the difference from individual calls to entire stacks.

1.1. Subroutines

A subroutine represents a new stack level to hold local variables, and a single traversal of its instructions to reach an end. Consider a subroutine like this:

def subfoo(bar):
     qux = 3
     return qux * bar

When you run it, that means

  1. allocate stack space for bar and qux
  2. recursively execute the first statement and jump to the next statement
  3. once at a return, push its value to the calling stack
  4. clear the stack (1.) and instruction pointer (2.)

Notably, 4. means that a subroutine always starts at the same state. Everything exclusive to the function itself is lost upon completion. A function cannot be resumed, even if there are instructions after return.

root -
  :    - subfoo --
  :/--<---return --/
  |
  V

1.2. Coroutines as persistent subroutines

A coroutine is like a subroutine, but can exit without destroying its state. Consider a coroutine like this:

 def cofoo(bar):
      qux = yield bar  # yield marks a break point
      return qux

When you run it, that means

  1. allocate stack space for bar and qux
  2. recursively execute the first statement and jump to the next statement
    1. once at a yield, push its value to the calling stack but store the stack and instruction pointer
    2. once calling into yield, restore stack and instruction pointer and push arguments to qux
  3. once at a return, push its value to the calling stack
  4. clear the stack (1.) and instruction pointer (2.)

Note the addition of 2.1 and 2.2 - a coroutine can be suspended and resumed at predefined points. This is similar to how a subroutine is suspended during calling another subroutine. The difference is that the active coroutine is not strictly bound to its calling stack. Instead, a suspended coroutine is part of a separate, isolated stack.

root -
  :    - cofoo --
  :/--<+--yield --/
  |    :
  V    :

This means that suspended coroutines can be freely stored or moved between stacks. Any call stack that has access to a coroutine can decide to resume it.

1.3. Traversing the call stack

So far, our coroutine only goes down the call stack with yield. A subroutine can go down and up the call stack with return and (). For completeness, coroutines also need a mechanism to go up the call stack. Consider a coroutine like this:

def wrap():
    yield "before"
    yield from cofoo()
    yield "after"

When you run it, that means it still allocates the stack and instruction pointer like a subroutine. When it suspends, that still is like storing a subroutine.

However, yield from does both. It suspends stack and instruction pointer of wrap and runs cofoo. Note that wrap stays suspended until cofoo finishes completely. Whenever cofoo suspends or something is sent, cofoo is directly connected to the calling stack.

1.4. Coroutines all the way down

As established, yield from allows to connect two scopes across another intermediate one. When applied recursively, that means the top of the stack can be connected to the bottom of the stack.

root -
  :    -> coro_a -yield-from-> coro_b --
  :/ <-+------------------------yield ---/
  |    :
  : --+-- coro_a.send----------yield ---
  :                             coro_b <-/

Note that root and coro_b do not know about each other. This makes coroutines much cleaner than callbacks: coroutines still built on a 1:1 relation like subroutines. Coroutines suspend and resume their entire existing execution stack up until a regular call point.

Notably, root could have an arbitrary number of coroutines to resume. Yet, it can never resume more than one at the same time. Coroutines of the same root are concurrent but not parallel!

1.5. Python"s async and await

The explanation has so far explicitly used the yield and yield from vocabulary of generators - the underlying functionality is the same. The new Python3.5 syntax async and await exists mainly for clarity.

def foo():  # subroutine?
     return None

def foo():  # coroutine?
     yield from foofoo()  # generator? coroutine?

async def foo():  # coroutine!
     await foofoo()  # coroutine!
     return None

The async for and async with statements are needed because you would break the yield from/await chain with the bare for and with statements.

2. Anatomy of a simple event loop

By itself, a coroutine has no concept of yielding control to another coroutine. It can only yield control to the caller at the bottom of a coroutine stack. This caller can then switch to another coroutine and run it.

This root node of several coroutines is commonly an event loop: on suspension, a coroutine yields an event on which it wants resume. In turn, the event loop is capable of efficiently waiting for these events to occur. This allows it to decide which coroutine to run next, or how to wait before resuming.

Such a design implies that there is a set of pre-defined events that the loop understands. Several coroutines await each other, until finally an event is awaited. This event can communicate directly with the event loop by yielding control.

loop -
  :    -> coroutine --await--> event --
  :/ <-+----------------------- yield --/
  |    :
  |    :  # loop waits for event to happen
  |    :
  : --+-- send(reply) -------- yield --
  :        coroutine <--yield-- event <-/

The key is that coroutine suspension allows the event loop and events to directly communicate. The intermediate coroutine stack does not require any knowledge about which loop is running it, nor how events work.

2.1.1. Events in time

The simplest event to handle is reaching a point in time. This is a fundamental block of threaded code as well: a thread repeatedly sleeps until a condition is true. However, a regular sleep blocks execution by itself - we want other coroutines to not be blocked. Instead, we want tell the event loop when it should resume the current coroutine stack.

2.1.2. Defining an Event

An event is simply a value we can identify - be it via an enum, a type or other identity. We can define this with a simple class that stores our target time. In addition to storing the event information, we can allow to await a class directly.

class AsyncSleep:
    """Event to sleep until a point in time"""
    def __init__(self, until: float):
        self.until = until

    # used whenever someone ``await``s an instance of this Event
    def __await__(self):
        # yield this Event to the loop
        yield self
    
    def __repr__(self):
        return "%s(until=%.1f)" % (self.__class__.__name__, self.until)

This class only stores the event - it does not say how to actually handle it.

The only special feature is __await__ - it is what the await keyword looks for. Practically, it is an iterator but not available for the regular iteration machinery.

2.2.1. Awaiting an event

Now that we have an event, how do coroutines react to it? We should be able to express the equivalent of sleep by awaiting our event. To better see what is going on, we wait twice for half the time:

import time

async def asleep(duration: float):
    """await that ``duration`` seconds pass"""
    await AsyncSleep(time.time() + duration / 2)
    await AsyncSleep(time.time() + duration / 2)

We can directly instantiate and run this coroutine. Similar to a generator, using coroutine.send runs the coroutine until it yields a result.

coroutine = asleep(100)
while True:
    print(coroutine.send(None))
    time.sleep(0.1)

This gives us two AsyncSleep events and then a StopIteration when the coroutine is done. Notice that the only delay is from time.sleep in the loop! Each AsyncSleep only stores an offset from the current time.

2.2.2. Event + Sleep

At this point, we have two separate mechanisms at our disposal:

  • AsyncSleep Events that can be yielded from inside a coroutine
  • time.sleep that can wait without impacting coroutines

Notably, these two are orthogonal: neither one affects or triggers the other. As a result, we can come up with our own strategy to sleep to meet the delay of an AsyncSleep.

2.3. A naive event loop

If we have several coroutines, each can tell us when it wants to be woken up. We can then wait until the first of them wants to be resumed, then for the one after, and so on. Notably, at each point we only care about which one is next.

This makes for a straightforward scheduling:

  1. sort coroutines by their desired wake up time
  2. pick the first that wants to wake up
  3. wait until this point in time
  4. run this coroutine
  5. repeat from 1.

A trivial implementation does not need any advanced concepts. A list allows to sort coroutines by date. Waiting is a regular time.sleep. Running coroutines works just like before with coroutine.send.

def run(*coroutines):
    """Cooperatively run all ``coroutines`` until completion"""
    # store wake-up-time and coroutines
    waiting = [(0, coroutine) for coroutine in coroutines]
    while waiting:
        # 2. pick the first coroutine that wants to wake up
        until, coroutine = waiting.pop(0)
        # 3. wait until this point in time
        time.sleep(max(0.0, until - time.time()))
        # 4. run this coroutine
        try:
            command = coroutine.send(None)
        except StopIteration:
            continue
        # 1. sort coroutines by their desired suspension
        if isinstance(command, AsyncSleep):
            waiting.append((command.until, coroutine))
            waiting.sort(key=lambda item: item[0])

Of course, this has ample room for improvement. We can use a heap for the wait queue or a dispatch table for events. We could also fetch return values from the StopIteration and assign them to the coroutine. However, the fundamental principle remains the same.

2.4. Cooperative Waiting

The AsyncSleep event and run event loop are a fully working implementation of timed events.

async def sleepy(identifier: str = "coroutine", count=5):
    for i in range(count):
        print(identifier, "step", i + 1, "at %.2f" % time.time())
        await asleep(0.1)

run(*(sleepy("coroutine %d" % j) for j in range(5)))

This cooperatively switches between each of the five coroutines, suspending each for 0.1 seconds. Even though the event loop is synchronous, it still executes the work in 0.5 seconds instead of 2.5 seconds. Each coroutine holds state and acts independently.

3. I/O event loop

An event loop that supports sleep is suitable for polling. However, waiting for I/O on a file handle can be done more efficiently: the operating system implements I/O and thus knows which handles are ready. Ideally, an event loop should support an explicit "ready for I/O" event.

3.1. The select call

Python already has an interface to query the OS for read I/O handles. When called with handles to read or write, it returns the handles ready to read or write:

readable, writeable, _ = select.select(rlist, wlist, xlist, timeout)

For example, we can open a file for writing and wait for it to be ready:

write_target = open("/tmp/foo")
readable, writeable, _ = select.select([], [write_target], [])

Once select returns, writeable contains our open file.

3.2. Basic I/O event

Similar to the AsyncSleep request, we need to define an event for I/O. With the underlying select logic, the event must refer to a readable object - say an open file. In addition, we store how much data to read.

class AsyncRead:
    def __init__(self, file, amount=1):
        self.file = file
        self.amount = amount
        self._buffer = ""

    def __await__(self):
        while len(self._buffer) < self.amount:
            yield self
            # we only get here if ``read`` should not block
            self._buffer += self.file.read(1)
        return self._buffer

    def __repr__(self):
        return "%s(file=%s, amount=%d, progress=%d)" % (
            self.__class__.__name__, self.file, self.amount, len(self._buffer)
        )

As with AsyncSleep we mostly just store the data required for the underlying system call. This time, __await__ is capable of being resumed multiple times - until our desired amount has been read. In addition, we return the I/O result instead of just resuming.

3.3. Augmenting an event loop with read I/O

The basis for our event loop is still the run defined previously. First, we need to track the read requests. This is no longer a sorted schedule, we only map read requests to coroutines.

# new
waiting_read = {}  # type: Dict[file, coroutine]

Since select.select takes a timeout parameter, we can use it in place of time.sleep.

# old
time.sleep(max(0.0, until - time.time()))
# new
readable, _, _ = select.select(list(reads), [], [])

This gives us all readable files - if there are any, we run the corresponding coroutine. If there are none, we have waited long enough for our current coroutine to run.

# new - reschedule waiting coroutine, run readable coroutine
if readable:
    waiting.append((until, coroutine))
    waiting.sort()
    coroutine = waiting_read[readable[0]]

Finally, we have to actually listen for read requests.

# new
if isinstance(command, AsyncSleep):
    ...
elif isinstance(command, AsyncRead):
    ...

3.4. Putting it together

The above was a bit of a simplification. We need to do some switching to not starve sleeping coroutines if we can always read. We need to handle having nothing to read or nothing to wait for. However, the end result still fits into 30 LOC.

def run(*coroutines):
    """Cooperatively run all ``coroutines`` until completion"""
    waiting_read = {}  # type: Dict[file, coroutine]
    waiting = [(0, coroutine) for coroutine in coroutines]
    while waiting or waiting_read:
        # 2. wait until the next coroutine may run or read ...
        try:
            until, coroutine = waiting.pop(0)
        except IndexError:
            until, coroutine = float("inf"), None
            readable, _, _ = select.select(list(waiting_read), [], [])
        else:
            readable, _, _ = select.select(list(waiting_read), [], [], max(0.0, until - time.time()))
        # ... and select the appropriate one
        if readable and time.time() < until:
            if until and coroutine:
                waiting.append((until, coroutine))
                waiting.sort()
            coroutine = waiting_read.pop(readable[0])
        # 3. run this coroutine
        try:
            command = coroutine.send(None)
        except StopIteration:
            continue
        # 1. sort coroutines by their desired suspension ...
        if isinstance(command, AsyncSleep):
            waiting.append((command.until, coroutine))
            waiting.sort(key=lambda item: item[0])
        # ... or register reads
        elif isinstance(command, AsyncRead):
            waiting_read[command.file] = coroutine

3.5. Cooperative I/O

The AsyncSleep, AsyncRead and run implementations are now fully functional to sleep and/or read. Same as for sleepy, we can define a helper to test reading:

async def ready(path, amount=1024*32):
    print("read", path, "at", "%d" % time.time())
    with open(path, "rb") as file:
        result = await AsyncRead(file, amount)
    print("done", path, "at", "%d" % time.time())
    print("got", len(result), "B")

run(sleepy("background", 5), ready("/dev/urandom"))

Running this, we can see that our I/O is interleaved with the waiting task:

id background round 1
read /dev/urandom at 1530721148
id background round 2
id background round 3
id background round 4
id background round 5
done /dev/urandom at 1530721148
got 1024 B

4. Non-Blocking I/O

While I/O on files gets the concept across, it is not really suitable for a library like asyncio: the select call always returns for files, and both open and read may block indefinitely. This blocks all coroutines of an event loop - which is bad. Libraries like aiofiles use threads and synchronization to fake non-blocking I/O and events on file.

However, sockets do allow for non-blocking I/O - and their inherent latency makes it much more critical. When used in an event loop, waiting for data and retrying can be wrapped without blocking anything.

4.1. Non-Blocking I/O event

Similar to our AsyncRead, we can define a suspend-and-read event for sockets. Instead of taking a file, we take a socket - which must be non-blocking. Also, our __await__ uses socket.recv instead of file.read.

class AsyncRecv:
    def __init__(self, connection, amount=1, read_buffer=1024):
        assert not connection.getblocking(), "connection must be non-blocking for async recv"
        self.connection = connection
        self.amount = amount
        self.read_buffer = read_buffer
        self._buffer = b""

    def __await__(self):
        while len(self._buffer) < self.amount:
            try:
                self._buffer += self.connection.recv(self.read_buffer)
            except BlockingIOError:
                yield self
        return self._buffer

    def __repr__(self):
        return "%s(file=%s, amount=%d, progress=%d)" % (
            self.__class__.__name__, self.connection, self.amount, len(self._buffer)
        )

In contrast to AsyncRead, __await__ performs truly non-blocking I/O. When data is available, it always reads. When no data is available, it always suspends. That means the event loop is only blocked while we perform useful work.

4.2. Un-Blocking the event loop

As far as the event loop is concerned, nothing changes much. The event to listen for is still the same as for files - a file descriptor marked ready by select.

# old
elif isinstance(command, AsyncRead):
    waiting_read[command.file] = coroutine
# new
elif isinstance(command, AsyncRead):
    waiting_read[command.file] = coroutine
elif isinstance(command, AsyncRecv):
    waiting_read[command.connection] = coroutine

At this point, it should be obvious that AsyncRead and AsyncRecv are the same kind of event. We could easily refactor them to be one event with an exchangeable I/O component. In effect, the event loop, coroutines and events cleanly separate a scheduler, arbitrary intermediate code and the actual I/O.

4.3. The ugly side of non-blocking I/O

In principle, what you should do at this point is replicate the logic of read as a recv for AsyncRecv. However, this is much more ugly now - you have to handle early returns when functions block inside the kernel, but yield control to you. For example, opening a connection versus opening a file is much longer:

# file
file = open(path, "rb")
# non-blocking socket
connection = socket.socket()
connection.setblocking(False)
# open without blocking - retry on failure
try:
    connection.connect((url, port))
except BlockingIOError:
    pass

Long story short, what remains is a few dozen lines of Exception handling. The events and event loop already work at this point.

id background round 1
read localhost:25000 at 1530783569
read /dev/urandom at 1530783569
done localhost:25000 at 1530783569 got 32768 B
id background round 2
id background round 3
id background round 4
done /dev/urandom at 1530783569 got 4096 B
id background round 5

Addendum

Example code at github

Answer #10

Python"s “is” operator behaves unexpectedly with integers?

In summary - let me emphasize: Do not use is to compare integers.

This isn"t behavior you should have any expectations about.

Instead, use == and != to compare for equality and inequality, respectively. For example:

>>> a = 1000
>>> a == 1000       # Test integers like this,
True
>>> a != 5000       # or this!
True
>>> a is 1000       # Don"t do this! - Don"t use `is` to test integers!!
False

Explanation

To know this, you need to know the following.

First, what does is do? It is a comparison operator. From the documentation:

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.

And so the following are equivalent.

>>> a is b
>>> id(a) == id(b)

From the documentation:

id Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for id.

So what is the use-case for is? PEP8 describes:

Comparisons to singletons like None should always be done with is or is not, never the equality operators.

The Question

You ask, and state, the following question (with code):

Why does the following behave unexpectedly in Python?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result

It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.

But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.

>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?

Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let"s hope we"re not wasting a lot of it. We"re probably not. But this behavior is not guaranteed.

>>> 257 is 257
True           # Yet the literal numbers compare properly

Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the referent implementation of Python, which is CPython. Good for CPython.

It might be even better if CPython could do this globally, if it could do so cheaply (as there would a cost in the lookup), perhaps another implementation might.

But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.

What is does

is checks that the id of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:

>>> a is b

is the same as

>>> id(a) == id(b)

Why would we want to use is then?

This can be a very fast check relative to say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we thus have limited use-cases for it. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here"s an example (will work in Python 2 and 3) e.g.

SENTINEL_SINGLETON = object() # this will only be created one time.

def foo(keyword_argument=None):
    if keyword_argument is None:
        print("no argument given to foo")
    bar()
    bar(keyword_argument)
    bar("baz")

def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print("no argument given to bar")
    else:
        print("argument to bar: {0}".format(keyword_argument))

foo()

Which prints:

no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz

And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is - do not use it to test for equality of integers, strings, tuples, or other things like these.

Get Solution for free from DataCamp guru