Dictionary Methods in Python | Set 1 (cmp(), len(), items() …)



Some dictionary methods are discussed in this article.

1. str(dic): returns a string representation of the dictionary, showing every key with its value.

2. items(): returns all (key, value) pairs of the dictionary. In Python 3 the result is a dict_items view object rather than a list.

# Python code to demonstrate
# str() and items()

# Initializing the dictionary
dic = {'Name': 'Nandini', 'Age': 19}

# using str() to display dic as a string
print("The constituents of dictionary as string are:")
print(str(dic))

# using items() to display dic as (key, value) pairs
print("The constituents of dictionary as list are:")
print(dic.items())

Output:

The constituents of dictionary as string are:
{'Name': 'Nandini', 'Age': 19}
The constituents of dictionary as list are:
dict_items([('Name', 'Nandini'), ('Age', 19)])
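One detail the output above hints at: in Python 3, items() returns a view tied to the dictionary rather than a snapshot. The following is a minimal sketch of that behaviour, assuming Python 3; the variable names are illustrative only.

```python
# Sketch: dict.items() in Python 3 is a live view, not a snapshot list.
dic = {'Name': 'Nandini', 'Age': 19}

items_view = dic.items()          # dict_items view tied to dic
items_list = list(dic.items())    # independent snapshot taken now

dic['ID'] = 2541997               # mutate the dictionary afterwards

view_len = len(items_view)        # the view sees the new key
list_len = len(items_list)        # the snapshot does not
print(view_len, list_len)         # prints: 3 2
```

If a fixed list is needed (for example, to index into it), wrapping the view in list() as above is the usual approach.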

3. len(): returns the number of key-value pairs in the dictionary.

4. type(): returns the data type of its argument.

# Python code to demonstrate
# len() and type()

# Initializing the dictionary
dic = {'Name': 'Nandini', 'Age': 19, 'ID': 2541997}

# Initializing a list
li = [1, 3, 5, 6]

# using len() to display the size of dic
print("The size of dic is: ", end="")
print(len(dic))

# using type() to display the data type of dic
print("The data type of dic is: ", end="")
print(type(dic))

# using type() to display the data type of li
print("The data type of li is: ", end="")
print(type(li))

Output:

The size of dic is: 3
The data type of dic is: <class 'dict'>
The data type of li is: <class 'list'>

5. copy(): creates a shallow copy of the dictionary and returns it as a new dictionary.

6. clear(): removes all items from the dictionary.

# Python code to demonstrate
# clear() and copy()

# Initializing the dictionary
dic1 = {'Name': 'Nandini', 'Age': 19}

# using copy() to create a shallow copy of the dictionary
dic3 = dic1.copy()

# printing the new dictionary
print("The new copied dictionary is:")
print(dic3.items())

# clearing the original dictionary
dic1.clear()

# printing the now-empty dictionary
print("The contents of the cleared dictionary are: ", end="")
print(dic1.items())

Output:

The new copied dictionary is:
dict_items([('Name', 'Nandini'), ('Age', 19)])
The contents of the cleared dictionary are: dict_items([])
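The word "shallow" matters once the dictionary holds mutable values. The sketch below, using hypothetical data and the standard copy module for contrast, shows that copy() duplicates only the top-level mapping while nested objects stay shared.

```python
import copy

# Sketch: copy() duplicates the top-level mapping only; nested objects are shared.
original = {'Name': 'Nandini', 'Scores': [90, 85]}   # hypothetical data

shallow = original.copy()          # new dict, same inner list object
deep = copy.deepcopy(original)     # new dict and a new inner list

original['Scores'].append(70)      # mutate the nested list in place
original['Name'] = 'Changed'       # rebind a top-level key

print(shallow['Scores'])   # [90, 85, 70]  (shared list)
print(deep['Scores'])      # [90, 85]      (independent list)
print(shallow['Name'])     # Nandini       (top-level binding is independent)
```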

This article is courtesy of Manjeet Singh.

Dictionary Methods in Python | Set 1 (cmp(), len(), items() ...): StackOverflow Questions

__lt__ instead of __cmp__

Question by Roger Pate

Python 2.x has two ways to overload comparison operators, __cmp__ or the "rich comparison operators" such as __lt__. The rich comparison overloads are said to be preferred, but why is this so?

Each rich comparison operator is simple to implement on its own, but you must implement several of them with nearly identical logic. However, if you can use the builtin cmp and tuple ordering, then __cmp__ gets quite simple and fulfills all the comparisons:

class A(object):
  def __init__(self, name, age, other):
    self.name = name
    self.age = age
    self.other = other
  def __cmp__(self, other):
    assert isinstance(other, A) # assumption for this example
    return cmp((self.name, self.age, self.other),
               (other.name, other.age, other.other))

This simplicity seems to meet my needs much better than overloading all 6(!) of the rich comparisons. (However, you can get it down to "just" 4 if you rely on the "swapped argument"/reflected behavior, but that results in a net increase of complication, in my humble opinion.)

Are there any unforeseen pitfalls I need to be made aware of if I only overload __cmp__?

I understand the <, <=, ==, etc. operators can be overloaded for other purposes, and can return any object they like. I am not asking about the merits of that approach, but only about differences when using these operators for comparisons in the same sense that they mean for numbers.

Update: As Christopher pointed out, cmp is disappearing in 3.x. Are there any alternatives that make implementing comparisons as easy as the above __cmp__?
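One commonly used option in Python 3 is functools.total_ordering, which fills in the remaining comparisons from __eq__ and one ordering method. The sketch below is only an illustration of that approach, not the answer given in the thread; the _key helper is a hypothetical name introduced here.

```python
from functools import total_ordering

@total_ordering
class A:
    def __init__(self, name, age, other):
        self.name = name
        self.age = age
        self.other = other

    def _key(self):
        # Hypothetical helper: the tuple that defines equality and ordering.
        return (self.name, self.age, self.other)

    def __eq__(self, other):
        if not isinstance(other, A):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, A):
            return NotImplemented
        return self._key() < other._key()

a, b = A("ann", 30, 0), A("bob", 25, 0)
ordered = sorted([b, a])   # sorting only needs __lt__
```

With the tuple key, the two hand-written methods stay about as short as the single __cmp__ above, and total_ordering derives <=, > and >= from them.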

Answer #1

Use the source, Luke!

In CPython, range(...).__contains__ (a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is we're using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:

  1. Check that the number is between start and stop, and
  2. Check that the stride value doesn't "step over" our number.

For example, 994 is in range(4, 1000, 2) because:

  1. 4 <= 994 < 1000, and
  2. (994 - 4) % 2 == 0.

The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:

static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    PyObject *zero = NULL;
    int result = -1;

    zero = PyLong_FromLong(0);
    if (zero == NULL) /* MemoryError in int(0) */
        goto end;

    /* Check if the value can possibly be in the range. */

    cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }

    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;
    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
  end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    Py_XDECREF(zero);
    return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

The "meat" of the idea is mentioned in the line:

/* result = ((int(ob) - start) % step) == 0 */ 

As a final note - look at the range_contains function at the bottom of the code snippet. If the exact type check fails then we don't use the clever algorithm described, instead falling back to a dumb iteration search of the range using _PySequence_IterSearch! You can check this behaviour in the interpreter (I'm using v3.5.0 here):

>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
...     pass
...
>>> x_ = MyInt(x)
>>> x in r  # calculates immediately :)
True
>>> x_ in r  # iterates for ages.. :(
^\Quit (core dumped)

Answer #2

As I mentioned to David Wolever, there's more to this than meets the eye; both methods dispatch to is; you can prove this by doing

min(Timer("x == x", setup="x = 'a' * 1000000").repeat(10, 10000))
#>>> 0.00045456900261342525

min(Timer("x == y", setup="x = 'a' * 1000000; y = 'a' * 1000000").repeat(10, 10000))
#>>> 0.5256857610074803

The first can only be so fast because it checks by identity.

To find out why one would take longer than the other, let's trace through execution.

They both start in ceval.c, from COMPARE_OP since that is the bytecode involved

    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *res = cmp_outcome(oparg, left, right);
    if (res == NULL)
        goto error;

This pops the values from the stack (technically it only pops one)

PyObject *right = POP();
PyObject *left = TOP();

and runs the compare:

PyObject *res = cmp_outcome(oparg, left, right);

cmp_outcome is this:

static PyObject *
cmp_outcome(int op, PyObject *v, PyObject *w)
{
    int res = 0;
    switch (op) {
    case PyCmp_IS: ...
    case PyCmp_IS_NOT: ...
    case PyCmp_IN:
        res = PySequence_Contains(w, v);
        if (res < 0)
            return NULL;
        break;
    case PyCmp_NOT_IN: ...
    case PyCmp_EXC_MATCH: ...
    default:
        return PyObject_RichCompare(v, w, op);
    }
    v = res ? Py_True : Py_False;
    Py_INCREF(v);
    return v;
}

This is where the paths split. The PyCmp_IN branch does

int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL)
        return (*sqm->sq_contains)(seq, ob);
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}

Note that a tuple is defined as

static PySequenceMethods tuple_as_sequence = {
    ...
    (objobjproc)tuplecontains,                  /* sq_contains */
};

PyTypeObject PyTuple_Type = {
    ...
    &tuple_as_sequence,                         /* tp_as_sequence */
    ...
};

So the branch

if (sqm != NULL && sqm->sq_contains != NULL)

will be taken and *sqm->sq_contains, which is the function (objobjproc)tuplecontains, will be called.

This does

static int
tuplecontains(PyTupleObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyTuple_GET_ITEM(a, i),
                                           Py_EQ);
    return cmp;
}

...Wait, wasn't that PyObject_RichCompareBool what the other branch took? Nope, that was PyObject_RichCompare.

That code path was short so it likely just comes down to the speed of these two. Let's compare.

int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
    PyObject *res;
    int ok;

    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }

    ...
}


The code path in PyObject_RichCompareBool pretty much immediately terminates. For PyObject_RichCompare, it does

PyObject *
PyObject_RichCompare(PyObject *v, PyObject *w, int op)
{
    PyObject *res;

    assert(Py_LT <= op && op <= Py_GE);
    if (v == NULL || w == NULL) { ... }
    if (Py_EnterRecursiveCall(" in comparison"))
        return NULL;
    res = do_richcompare(v, w, op);
    Py_LeaveRecursiveCall();
    return res;
}

The Py_EnterRecursiveCall/Py_LeaveRecursiveCall combo is not taken in the previous path, but these are relatively quick macros that'll short-circuit after incrementing and decrementing some globals.

do_richcompare does:

static PyObject *
do_richcompare(PyObject *v, PyObject *w, int op)
{
    richcmpfunc f;
    PyObject *res;
    int checked_reverse_op = 0;

    if (v->ob_type != w->ob_type && ...) { ... }
    if ((f = v->ob_type->tp_richcompare) != NULL) {
        res = (*f)(v, w, op);
        if (res != Py_NotImplemented)
            return res;
        ...
    }
    ...
}

This does some quick checks to call v->ob_type->tp_richcompare which is

PyTypeObject PyUnicode_Type = {
    ...
    PyUnicode_RichCompare,      /* tp_richcompare */
    ...
};

which does

PyObject *
PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
{
    int result;
    PyObject *v;

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))
        Py_RETURN_NOTIMPLEMENTED;

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)
        return NULL;

    if (left == right) {
        switch (op) {
        case Py_EQ:
        case Py_LE:
        case Py_GE:
            /* a string is equal to itself */
            v = Py_True;
            break;
        case Py_NE:
        case Py_LT:
        case Py_GT:
            v = Py_False;
            break;
        }
    }
    else if (...) { ... }
    else { ...}
    Py_INCREF(v);
    return v;
}

Namely, this shortcuts on left == right... but only after doing

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))
        Py_RETURN_NOTIMPLEMENTED;

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)
        return NULL;

All in all the paths then look something like this (manually recursively inlining, unrolling and pruning known branches)

POP()                           # Stack stuff
TOP()                           #
case PyCmp_IN:                  # Dispatch on operation
sqm != NULL                     # Dispatch to builtin op
sqm->sq_contains != NULL        #
*sqm->sq_contains               #
cmp == 0                        # Do comparison in loop
i < Py_SIZE(a)                  #
v == w                          #
op == Py_EQ                     #
++i                             # 
cmp == 0                        #
res < 0                         # Convert to Python-space
res ? Py_True : Py_False        #
Py_INCREF(v)                    #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #


POP()                           # Stack stuff
TOP()                           #
default:                        # Dispatch on operation
Py_LT <= op                     # Checking operation
op <= Py_GE                     #
v == NULL                       #
w == NULL                       #
Py_EnterRecursiveCall(...)      # Recursive check
v->ob_type != w->ob_type        # More operation checks
f = v->ob_type->tp_richcompare  # Dispatch to builtin op
f != NULL                       #
!PyUnicode_Check(left)          # ...More checks
!PyUnicode_Check(right)         #
PyUnicode_READY(left) == -1     #
PyUnicode_READY(right) == -1    #
left == right                   # Finally, doing comparison
case Py_EQ:                     # Immediately short circuit
Py_INCREF(v);                   #
res != Py_NotImplemented        #
Py_LeaveRecursiveCall()         # Recursive check
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

Now, PyUnicode_Check and PyUnicode_READY are pretty cheap since they only check a couple of fields, but it should be obvious that the top one is a smaller code path, it has fewer function calls, only one switch statement and is just a bit thinner.


Both dispatch to if (left_pointer == right_pointer); the difference is just how much work they do to get there. in just does less.
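The identity shortcut in PyObject_RichCompareBool is observable from Python itself. A small sketch, assuming CPython: NaN is never equal to itself under ==, yet the same NaN object is found by in, because the pointer comparison succeeds before == is ever consulted.

```python
nan = float("nan")

equal_to_itself = (nan == nan)                # False: NaN never compares equal
found_in_tuple = (nan in (nan,))              # True: the identity check hits first
found_other_nan = (float("nan") in (nan,))    # False: different object, and == fails

print(equal_to_itself, found_in_tuple, found_other_nan)
```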

Answer #3

To add to Martijn’s answer, this is the relevant part of the source (in C, as the range object is written in native code):

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);

    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}

So for PyLong objects (which is int in Python 3), it will use the range_contains_long function to determine the result. And that function essentially checks if ob is in the specified range (although it looks a bit more complex in C).

If it’s not an int object, it falls back to iterating until it finds the value (or not).

The whole logic could be translated to pseudo-Python like this:

def range_contains(rangeObj, obj):
    if isinstance(obj, int):
        return range_contains_long(rangeObj, obj)

    # default logic by iterating
    return any(obj == x for x in rangeObj)

def range_contains_long(r, num):
    if r.step > 0:
        # positive step: r.start <= num < r.stop
        cmp2 = r.start <= num
        cmp3 = num < r.stop
    else:
        # negative step: r.start >= num > r.stop
        cmp2 = num <= r.start
        cmp3 = r.stop < num

    # outside of the range boundaries
    if not cmp2 or not cmp3:
        return False

    # num must be on a valid step inside the boundaries
    return (num - r.start) % r.step == 0

Answer #4

This function works on any OS (Unix, Linux, macOS, and Windows) with both Python 2 and Python 3.

As suggested by @radato, os.system was replaced by subprocess.call. This avoids a shell injection vulnerability in cases where the hostname string has not been validated.

import platform    # For getting the operating system name
import subprocess  # For executing a shell command

def ping(host):
    """
    Returns True if host (str) responds to a ping request.
    Remember that a host may not respond to a ping (ICMP) request even if the host name is valid.
    """

    # Option for the number of packets as a function of the operating system
    param = "-n" if platform.system().lower() == "windows" else "-c"

    # Building the command. Ex: "ping -c 1 google.com"
    command = ["ping", param, "1", host]

    return subprocess.call(command) == 0

Note that, according to @ikrase, on Windows this function will still return True if you get a Destination Host Unreachable error.


The command is ping in both Windows and Unix-like systems.
The option -n (Windows) or -c (Unix) controls the number of packets which in this example was set to 1.

platform.system() returns the platform name. Ex. "Darwin" on macOS.
subprocess.call() runs an external command and returns its exit code. Ex. subprocess.call(["ls","-l"]).
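A possible variation, sketched under the assumption of Python 3.3 or later: separate the command construction from the execution so the platform logic can be checked without touching the network, and pass a timeout so an unresponsive host cannot block the caller. The names build_ping_command and ping_quiet are hypothetical and do not come from the answer above.

```python
import platform
import subprocess

def build_ping_command(host, system_name=None, count=1):
    """Return the argument list for pinging host (hypothetical helper)."""
    system_name = system_name or platform.system()
    # Windows uses -n for the packet count, Unix-like systems use -c
    count_flag = "-n" if system_name.lower() == "windows" else "-c"
    return ["ping", count_flag, str(count), host]

def ping_quiet(host, timeout=5):
    """Like ping() above, but silences output and gives up after timeout seconds."""
    try:
        return subprocess.call(
            build_ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        ) == 0
    except subprocess.TimeoutExpired:
        return False
```

Because the command is still passed as a list, no shell is involved. If the ping executable is missing, subprocess.call raises FileNotFoundError, which this sketch deliberately leaves unhandled.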

Answer #5

This is more than a bit late, but you can extend the regular expression to account for scientific notation too.

import re

# Format is [(<string>, <expected output>), ...]
ss = [("apple-12.34 ba33na fanc-14.23e-2yapple+45e5+67.56E+3",
       ["-12.34", "33", "-14.23e-2", "+45e5", "+67.56E+3"]),
      ("hello X42 I'm a Y-32.35 string Z30",
       ["42", "-32.35", "30"]),
      ("he33llo 42 I'm a 32 string -30",
       ["33", "42", "32", "-30"]),
      ("h3110 23 cat 444.4 rabbit 11 2 dog", 
       ["3110", "23", "444.4", "11", "2"]),
      ("hello 12 hi 89", 
       ["12", "89"]),
      ("I like 74,600 commas not,500", 
       ["74,600", "500"]),
      ("I like bad math 1+2=.001", 
       ["1", "+2", ".001"])]

for s, r in ss:
    rr = re.findall(r"[-+]?[.]?[\d]+(?:,\d\d\d)*[.]?\d*(?:[eE][-+]?\d+)?", s)
    if rr == r:
        print("GOOD")
    else:
        print("WRONG", rr, "should be", r)

Gives all good!

Additionally, you can look at the AWS Glue built-in regex

Answer #6

Basic answer:

mylist = ["b", "C", "A"]
mylist.sort()

This modifies your original list (i.e. sorts in-place). To get a sorted copy of the list, without changing the original, use the sorted() function:

for x in sorted(mylist):
    print(x)

However, the examples above are a bit naive, because they don't take locale into account, and perform a case-sensitive sorting. You can take advantage of the optional parameter key to specify custom sorting order (the alternative, using cmp, is a deprecated solution, as it has to be evaluated multiple times - key is only computed once per element).

So, to sort according to the current locale, taking language-specific rules into account (cmp_to_key is a helper function from functools):

sorted(mylist, key=cmp_to_key(locale.strcoll))

And finally, if you need, you can specify a custom locale for sorting:

import locale
from functools import cmp_to_key
locale.setlocale(locale.LC_ALL, "en_US.UTF-8") # vary depending on your lang/locale
assert sorted((u"Ab", u"ad", u"aa"),
  key=cmp_to_key(locale.strcoll)) == [u"aa", u"Ab", u"ad"]

Last note: you will see examples of case-insensitive sorting which use the lower() method - those are incorrect, because they work only for the ASCII subset of characters. Those two are wrong for any non-English data:

# this is incorrect!
mylist.sort(key=lambda x: x.lower())
# alternative notation, a bit faster, but still wrong
mylist.sort(key=str.lower)
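As a related sketch (an assumption on my part, not something stated in the answer): locale.strxfrm can serve directly as a key function, which avoids the cmp_to_key wrapper while still following the current locale's collation rules. The helper name locale_sorted is hypothetical.

```python
import locale

def locale_sorted(items):
    """Sort strings by the current LC_COLLATE rules (hypothetical helper)."""
    # strxfrm transforms each string once, so it fits the key= protocol
    return sorted(items, key=locale.strxfrm)

# Use the user's default locale if available; otherwise keep the "C" locale.
try:
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    pass

result = locale_sorted(["pear", "apple", "fig"])
print(result)
```

How accented or mixed-case strings are ordered depends on the locales installed on the system, so results for non-ASCII data may differ between machines.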

Answer #7

You should implement the method __eq__:

class MyClass:
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar
    def __eq__(self, other): 
        if not isinstance(other, MyClass):
            # don't attempt to compare against unrelated types
            return NotImplemented

        return self.foo == other.foo and self.bar == other.bar

Now it outputs:

>>> x == y
True

Note that implementing __eq__ will automatically make instances of your class unhashable, which means they can't be stored in sets and dicts. If you're not modelling an immutable type (i.e. if the attributes foo and bar may change the value within the lifetime of your object), then it's recommended to just leave your instances as unhashable.

If you are modelling an immutable type, you should also implement the data model hook __hash__:

class MyClass:

    def __hash__(self):
        # necessary for instances to behave sanely in dicts and sets.
        return hash((self.foo, self.bar))

A general solution, like the idea of looping through __dict__ and comparing values, is not advisable - it can never be truly general because the __dict__ may have uncomparable or unhashable types contained within.

N.B.: be aware that before Python 3, you may need to use __cmp__ instead of __eq__. Python 2 users may also want to implement __ne__, since a sensible default behaviour for inequality (i.e. inverting the equality result) will not be automatically created in Python 2.
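Putting the two hooks together, here is a minimal self-contained sketch for Python 3. The class name Point is hypothetical, and the instances are treated as immutable by convention only.

```python
class Point:
    """Hypothetical value type; treated as immutable by convention."""

    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented   # let Python try the other operand
        return (self.foo, self.bar) == (other.foo, other.bar)

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash((self.foo, self.bar))

x, y = Point(1, 2), Point(1, 2)
same = (x == y)          # True: compared by value
unique = len({x, y})     # 1: the set treats them as one element
versus_int = (x == 5)    # False: both sides return NotImplemented
```

Returning NotImplemented instead of False lets the other operand's __eq__ have a say before Python falls back to an identity comparison.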

Answer #8


a = sorted(a, key=lambda x: x.modified, reverse=True)
#             ^^^^

On Python 2.x, the sorted function takes its arguments in this order:

sorted(iterable, cmp=None, key=None, reverse=False)

so without the key=, the function you pass in will be considered a cmp function which takes 2 arguments.
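In Python 3 the cmp parameter is gone and key is keyword-only, so the same mistake raises a TypeError instead of being silently misread. A short sketch with a hypothetical Doc record shows both the key form and, for legacy two-argument comparators, functools.cmp_to_key.

```python
from collections import namedtuple
from functools import cmp_to_key

Doc = namedtuple("Doc", ["name", "modified"])   # hypothetical record type
docs = [Doc("a", 3), Doc("b", 10), Doc("c", 7)]

# Preferred: a one-argument key function
newest_first = sorted(docs, key=lambda d: d.modified, reverse=True)

# Legacy style: a two-argument comparator wrapped with cmp_to_key
def by_modified_desc(left, right):
    # negative if left should come first, positive if right should
    return (right.modified > left.modified) - (right.modified < left.modified)

also_newest_first = sorted(docs, key=cmp_to_key(by_modified_desc))
```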

Answer #9

== is an equality test. It checks whether the right hand side and the left hand side are equal objects (according to their __eq__ or __cmp__ methods.)

is is an identity test. It checks whether the right hand side and the left hand side are the very same object. No method calls are done, objects can't influence the is operation.

You use is (and is not) for singletons, like None, where you don't care about objects that might want to pretend to be None or where you want to protect against objects breaking when being compared against None.
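A small sketch of the difference, using a deliberately misbehaving hypothetical class: == can be made to say anything, while is cannot be overridden.

```python
class AlwaysEqual:
    """Hypothetical class whose instances claim equality with everything."""
    def __eq__(self, other):
        return True

impostor = AlwaysEqual()
equals_none = (impostor == None)   # True: __eq__ decides
is_none = (impostor is None)       # False: identity cannot be faked

first, second = [1, 2], [1, 2]
equal_lists = (first == second)    # True: same contents
same_list = (first is second)      # False: two distinct objects
```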

Answer #10


Indeed there was a patch which included sign() in math, but it wasn't accepted, because they didn't agree on what it should return in all the edge cases (+/-0, +/-nan, etc).

So they decided to implement only copysign, which (although more verbose) can be used to delegate to the end user the desired behavior for edge cases - which sometimes might require the call to cmp(x,0).

I don't know why it's not a built-in, but I have some thoughts.

copysign(x, y):
Return x with the sign of y.

Most importantly, copysign is a superset of sign! Calling copysign with x=1 is the same as a sign function. So you could just use copysign and forget about it.

>>> math.copysign(1, -4)
-1.0
>>> math.copysign(1, 3)
1.0

If you get sick of passing two whole arguments, you can implement sign this way, and it will still be compatible with the IEEE stuff mentioned by others:

>>> sign = functools.partial(math.copysign, 1) # either of these
>>> sign = lambda x: math.copysign(1, x) # two will work
>>> sign(-4)
-1.0
>>> sign(3)
1.0
>>> sign(0)
1.0
>>> sign(-0.0)
-1.0
>>> sign(float("nan"))  # result depends on the sign bit of the NaN
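Note that the copysign-based sign never returns zero. If a three-valued result is what you want, one possible sketch (my own assumption about the desired edge-case behaviour, which is exactly what the rejected patch could not agree on) is:

```python
import math

def sign(x):
    """Three-valued sign: -1, 0 or 1; NaN is passed through unchanged."""
    if x != x:          # only NaN is unequal to itself
        return x
    return (x > 0) - (x < 0)

negative = sign(-4)      # -1
zero = sign(0.0)         # 0
positive = sign(3.5)     # 1
```

Here -0.0 maps to 0, whereas math.copysign(1, -0.0) keeps the sign bit and returns -1.0, which illustrates why the edge cases were contentious.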

Secondly, usually when you want the sign of something, you just end up multiplying it with another value. And of course that's basically what copysign does.

So, instead of:

s = sign(a)
b = b * s

You can just do:

b = copysign(b, a)

And yes, I'm surprised you've been using Python for 7 years and think cmp could be so easily removed and replaced by sign! Have you never implemented a class with a __cmp__ method? Have you never called cmp and specified a custom comparator function?

In summary, I've found myself wanting a sign function too, but copysign with the first argument being 1 will work just fine. I disagree that sign would be more useful than copysign, as I've shown that it's merely a subset of the same functionality.