Python floor() and ceil() functions


floor()

The floor() method in Python returns the floor of x, that is, the largest integer not greater than x.

Syntax:

import math
math.floor(x)

Parameter: x, a numeric expression.

Returns: the largest integer not greater than x.

Below is the Python implementation of the floor() method:

# Python program to demonstrate the use of the floor() method

  
# This imports the math module

import math 

 
# prints floor values using the floor() method

print("math.floor(-23.11):", math.floor(-23.11))

print("math.floor(300.16):", math.floor(300.16))

print("math.floor(300.72):", math.floor(300.72))

Output:

math.floor(-23.11): -24
math.floor(300.16): 300
math.floor(300.72): 300

(In Python 2, math.floor() returned a float, e.g. -24.0; in Python 3 it returns an int, as the questions below discuss.)

ceil()

The ceil() method in Python returns the ceiling of x, that is, the smallest integer not less than x.

Syntax:

import math
math.ceil(x)

Parameter: x, a numeric expression.

Returns: the smallest integer not less than x.

Below is the Python implementation of the ceil() method:

# Python program to demonstrate the use of the ceil() method

 
# This imports the math module

import math 

 
# prints ceiling values using the ceil() method

print("math.ceil(-23.11):", math.ceil(-23.11))

print("math.ceil(300.16):", math.ceil(300.16))

print("math.ceil(300.72):", math.ceil(300.72))

Output:

math.ceil(-23.11): -23
math.ceil(300.16): 301
math.ceil(300.72): 301

(As with floor(), Python 2 returned these as floats.)




Python floor() and ceil() functions: StackOverflow Questions

Why do Python's math.ceil() and math.floor() operations return floats instead of integers?

Can someone explain this (straight from the docs; emphasis mine):

math.ceil(x) Return the ceiling of x as a float, the smallest integer value greater than or equal to x.

math.floor(x) Return the floor of x as a float, the largest integer value less than or equal to x.

Why would .ceil and .floor return floats when they are by definition supposed to calculate integers?


EDIT:

Well this got some very good arguments as to why they should return floats, and I was just getting used to the idea, when @jcollado pointed out that they in fact do return ints in Python 3...

Is there a ceiling equivalent of // operator in Python?

I found out about the // operator in Python, which in Python 3 performs floor division.

Is there an operator which divides with ceil instead? (I know about the / operator which in Python 3 does floating point division.)

Answer #1

Python has no built-in encryption schemes, no. You should also take encrypted data storage seriously; a trivial encryption scheme that one developer understands to be an insecure toy may well be mistaken for a secure scheme by a less experienced developer. If you encrypt, encrypt properly.

You don’t need to do much work to implement a proper encryption scheme, however. First of all, don’t re-invent the cryptography wheel; use a trusted cryptography library to handle this for you. For Python 3, that trusted library is cryptography.

I also recommend that encryption and decryption apply to bytes; encode text messages to bytes first; stringvalue.encode() encodes to UTF-8, easily reverted again using bytesvalue.decode().

Last but not least, when encrypting and decrypting, we talk about keys, not passwords. A key should not be human-memorable; it is something machine-readable that you store in a secret location, whereas a password often can be human-readable and memorised. You can derive a key from a password, with a little care.

But for a web application or process running in a cluster without human attention to keep running it, you want to use a key. Passwords are for when only an end-user needs access to the specific information. Even then, you usually secure the application with a password, then exchange encrypted information using a key, perhaps one attached to the user account.

Symmetric key encryption

Fernet – AES CBC + HMAC, strongly recommended

The cryptography library includes the Fernet recipe, a best-practices recipe for using cryptography. Fernet is an open standard, with ready implementations in a wide range of programming languages and it packages AES CBC encryption for you with version information, a timestamp and an HMAC signature to prevent message tampering.

Fernet makes it very easy to encrypt and decrypt messages and keep you secure. It is the ideal method for encrypting data with a secret.

I recommend you use Fernet.generate_key() to generate a secure key. You can use a password too (next section), but a full 32-byte secret key (16 bytes to encrypt with, plus another 16 for the signature) is going to be more secure than most passwords you could think of.

The key that Fernet generates is a bytes object with URL and file safe base64 characters, so printable:

from cryptography.fernet import Fernet

key = Fernet.generate_key()  # store in a secure location
print("Key:", key.decode())

To encrypt or decrypt messages, create a Fernet() instance with the given key and call Fernet.encrypt() or Fernet.decrypt(); both the plaintext message to encrypt and the encrypted token are bytes objects.

The encrypt() and decrypt() functions would then look like:

from cryptography.fernet import Fernet

def encrypt(message: bytes, key: bytes) -> bytes:
    return Fernet(key).encrypt(message)

def decrypt(token: bytes, key: bytes) -> bytes:
    return Fernet(key).decrypt(token)

Demo:

>>> key = Fernet.generate_key()
>>> print(key.decode())
GZWKEhHGNopxRdOHS4H4IyKhLQ8lwnyU7vRLrM3sebY=
>>> message = "John Doe"
>>> encrypt(message.encode(), key)
b'gAAAAABciT3pFbbSihD_HZBZ8kqfAj94UhknamBuirZWKivWOukgKQ03qE2mcuvpuwCSuZ-X_Xkud0uWQLZ5e-aOwLC0Ccnepg=='
>>> token = _
>>> decrypt(token, key).decode()
'John Doe'

Fernet with password – key derived from password, weakens the security somewhat

You can use a password instead of a secret key, provided you use a strong key derivation method. You do then have to include the salt and the HMAC iteration count in the message, so the encrypted value is not Fernet-compatible anymore without first separating salt, count and Fernet token:

import secrets
from base64 import urlsafe_b64encode as b64e, urlsafe_b64decode as b64d

from cryptography.fernet import Fernet
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

backend = default_backend()
iterations = 100_000

def _derive_key(password: bytes, salt: bytes, iterations: int = iterations) -> bytes:
    """Derive a secret key from a given password and salt"""
    kdf = PBKDF2HMAC(
        algorithm=hashes.SHA256(), length=32, salt=salt,
        iterations=iterations, backend=backend)
    return b64e(kdf.derive(password))

def password_encrypt(message: bytes, password: str, iterations: int = iterations) -> bytes:
    salt = secrets.token_bytes(16)
    key = _derive_key(password.encode(), salt, iterations)
    return b64e(
        b"%b%b%b" % (
            salt,
            iterations.to_bytes(4, "big"),
            b64d(Fernet(key).encrypt(message)),
        )
    )

def password_decrypt(token: bytes, password: str) -> bytes:
    decoded = b64d(token)
    salt, iter, token = decoded[:16], decoded[16:20], b64e(decoded[20:])
    iterations = int.from_bytes(iter, "big")
    key = _derive_key(password.encode(), salt, iterations)
    return Fernet(key).decrypt(token)

Demo:

>>> message = "John Doe"
>>> password = "mypass"
>>> password_encrypt(message.encode(), password)
b'9Ljs-w8IRM3XT1NDBbSBuQABhqCAAAAAAFyJdhiCPXms2vQHO7o81xZJn5r8_PAtro8Qpw48kdKrq4vt-551BCUbcErb_GyYRz8SVsu8hxTXvvKOn9QdewRGDfwx'
>>> token = _
>>> password_decrypt(token, password).decode()
'John Doe'

Including the salt in the output makes it possible to use a random salt value, which in turn ensures the encrypted output is fully random regardless of password reuse or message repetition. Including the iteration count ensures that you can adjust for CPU performance increases over time without losing the ability to decrypt older messages.

A password alone can be as safe as a Fernet 32-byte random key, provided you generate a properly random password from a similar size pool. A 32-byte key gives you 256 ** 32 possible keys, so if you use an alphabet of 74 characters (26 upper, 26 lower, 10 digits and 12 possible symbols), your password should be at least math.ceil(math.log(256 ** 32, 74)) == 42 characters long. However, a well-selected larger number of HMAC iterations can mitigate the lack of entropy somewhat, as this makes it much more expensive for an attacker to brute force their way in.
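
As a quick check of that arithmetic (a throwaway snippet, not part of the encryption scheme itself):

import math

# characters from a 74-symbol alphabet needed to match the entropy
# of a 32-byte random key (256 ** 32 possible keys)
print(math.ceil(math.log(256 ** 32, 74)))  # -> 42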

Just know that choosing a shorter but still reasonably secure password won’t cripple this scheme; it just reduces the number of possible values a brute-force attacker would have to search through. Make sure to pick a strong enough password for your security requirements.

Alternatives

Obscuring

An alternative is not to encrypt. Don't be tempted to just use a low-security cipher, or a home-spun implementation of, say, Vigenère. There is no security in these approaches, but they may give an inexperienced developer who is later tasked with maintaining your code the illusion of security, which is worse than no security at all.

If all you need is obscurity, just base64 the data; for URL-safe requirements, the base64.urlsafe_b64encode() function is fine. Don't use a password here, just encode and you are done. At most, add some compression (like zlib):

import zlib
from base64 import urlsafe_b64encode as b64e, urlsafe_b64decode as b64d

def obscure(data: bytes) -> bytes:
    return b64e(zlib.compress(data, 9))

def unobscure(obscured: bytes) -> bytes:
    return zlib.decompress(b64d(obscured))

This turns b"Hello world!" into b"eNrzSM3JyVcozy_KSVEEAB0JBF4=".

Integrity only

If all you need is a way to make sure that the data can be trusted to be unaltered after having been sent to an untrusted client and received back, then you want to sign the data. You can use the hmac library for this, with SHA1 (still considered secure for HMAC signing) or better:

import hmac
import hashlib

def sign(data: bytes, key: bytes, algorithm=hashlib.sha256) -> bytes:
    assert len(key) >= algorithm().digest_size, (
        "Key must be at least as long as the digest size of the "
        "hashing algorithm"
    )
    return hmac.new(key, data, algorithm).digest()

def verify(signature: bytes, data: bytes, key: bytes, algorithm=hashlib.sha256) -> bool:
    expected = sign(data, key, algorithm)
    return hmac.compare_digest(expected, signature)

Use this to sign data, then attach the signature to the data and send that to the client. When you receive the data back, split data and signature and verify. I've set the default algorithm to SHA256, so you'll need a 32-byte key:

import secrets

key = secrets.token_bytes(32)
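
A minimal roundtrip sketch using the sign() and verify() functions and the key above (the payload and the signature-in-front framing are illustrative choices, not a fixed format):

data = b"user_id=42"

# attach the fixed-size signature in front of the payload
signature = sign(data, key)
packed = signature + data

# receiver side: split at the digest size (32 bytes for SHA256), then verify
received_sig, received_data = packed[:32], packed[32:]
assert verify(received_sig, received_data, key)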

You may want to look at the itsdangerous library, which packages this all up with serialisation and de-serialisation in various formats.

Using AES-GCM encryption to provide encryption and integrity

Fernet builds on AES-CBC with an HMAC signature to ensure integrity of the encrypted data; a malicious attacker can't feed your system nonsense data to keep your service busy running in circles with bad input, because the ciphertext is signed.

The Galois/Counter mode (GCM) block cipher produces ciphertext and a tag that serve the same purpose, so it can be used for the same ends. The downside is that, unlike Fernet, there is no easy-to-use one-size-fits-all recipe to reuse on other platforms. AES-GCM also doesn't use padding, so the ciphertext matches the length of the input message (whereas Fernet / AES-CBC encrypts messages to blocks of fixed length, obscuring the message length somewhat).

AES256-GCM takes the usual 32-byte secret as a key:

key = secrets.token_bytes(32)

then use

import binascii, secrets, time
from base64 import urlsafe_b64encode as b64e, urlsafe_b64decode as b64d

from cryptography.exceptions import InvalidTag
from cryptography.fernet import InvalidToken
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

backend = default_backend()

def aes_gcm_encrypt(message: bytes, key: bytes) -> bytes:
    current_time = int(time.time()).to_bytes(8, "big")
    algorithm = algorithms.AES(key)
    iv = secrets.token_bytes(algorithm.block_size // 8)
    cipher = Cipher(algorithm, modes.GCM(iv), backend=backend)
    encryptor = cipher.encryptor()
    encryptor.authenticate_additional_data(current_time)
    ciphertext = encryptor.update(message) + encryptor.finalize()        
    return b64e(current_time + iv + ciphertext + encryptor.tag)

def aes_gcm_decrypt(token: bytes, key: bytes, ttl=None) -> bytes:
    algorithm = algorithms.AES(key)
    try:
        data = b64d(token)
    except (TypeError, binascii.Error):
        raise InvalidToken
    timestamp, iv, tag = data[:8], data[8:algorithm.block_size // 8 + 8], data[-16:]
    if ttl is not None:
        current_time = int(time.time())
        time_encrypted = int.from_bytes(data[:8], "big")
        if time_encrypted + ttl < current_time or current_time + 60 < time_encrypted:
            # too old, or claims a creation time more than 60 seconds in
            # our future (a small allowance for clock skew)
            raise InvalidToken
    cipher = Cipher(algorithm, modes.GCM(iv, tag), backend=backend)
    decryptor = cipher.decryptor()
    decryptor.authenticate_additional_data(timestamp)
    ciphertext = data[8 + len(iv):-16]
    return decryptor.update(ciphertext) + decryptor.finalize()

I've included a timestamp to support the same time-to-live use-cases that Fernet supports.
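
A quick roundtrip sketch with the functions above (the token itself is different on every run because the IV is random):

key = secrets.token_bytes(32)
token = aes_gcm_encrypt(b"John Doe", key)
assert aes_gcm_decrypt(token, key) == b"John Doe"

# with a time-to-live, tokens older than ttl seconds raise InvalidToken
assert aes_gcm_decrypt(token, key, ttl=3600) == b"John Doe"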

Other approaches on this page, in Python 3

AES CFB - like CBC but without the need to pad

This is the approach that All Іѕ Vаиітy follows, albeit incorrectly. This is the cryptography version, but note that I include the IV in the ciphertext; it should not be stored as a global (reusing an IV weakens the security of the key, and storing it as a module global means it'll be re-generated on the next Python invocation, rendering all ciphertext undecryptable):

import secrets
from base64 import urlsafe_b64encode as b64e, urlsafe_b64decode as b64d

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend

backend = default_backend()

def aes_cfb_encrypt(message, key):
    algorithm = algorithms.AES(key)
    iv = secrets.token_bytes(algorithm.block_size // 8)
    cipher = Cipher(algorithm, modes.CFB(iv), backend=backend)
    encryptor = cipher.encryptor()
    ciphertext = encryptor.update(message) + encryptor.finalize()
    return b64e(iv + ciphertext)

def aes_cfb_decrypt(ciphertext, key):
    iv_ciphertext = b64d(ciphertext)
    algorithm = algorithms.AES(key)
    size = algorithm.block_size // 8
    iv, encrypted = iv_ciphertext[:size], iv_ciphertext[size:]
    cipher = Cipher(algorithm, modes.CFB(iv), backend=backend)
    decryptor = cipher.decryptor()
    return decryptor.update(encrypted) + decryptor.finalize()

This lacks the added armoring of an HMAC signature and there is no timestamp; you’d have to add those yourself.

The above also illustrates how easy it is to combine basic cryptography building blocks incorrectly; All Іѕ Vаиітy's incorrect handling of the IV value can lead to a data breach or to all encrypted messages becoming unreadable because the IV is lost. Using Fernet instead protects you from such mistakes.

AES ECB – not secure

If you previously implemented AES ECB encryption and still need to support it in Python 3, you can do so with cryptography too. The same caveats apply: ECB is not secure enough for real-life applications. Re-implementing that answer for Python 3, adding automatic handling of padding:

from base64 import urlsafe_b64encode as b64e, urlsafe_b64decode as b64d

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.backends import default_backend

backend = default_backend()

def aes_ecb_encrypt(message, key):
    cipher = Cipher(algorithms.AES(key), modes.ECB(), backend=backend)
    encryptor = cipher.encryptor()
    padder = padding.PKCS7(cipher.algorithm.block_size).padder()
    padded = padder.update(message) + padder.finalize()
    return b64e(encryptor.update(padded) + encryptor.finalize())

def aes_ecb_decrypt(ciphertext, key):
    cipher = Cipher(algorithms.AES(key), modes.ECB(), backend=backend)
    decryptor = cipher.decryptor()
    unpadder = padding.PKCS7(cipher.algorithm.block_size).unpadder()
    padded = decryptor.update(b64d(ciphertext)) + decryptor.finalize()
    return unpadder.update(padded) + unpadder.finalize()

Again, this lacks the HMAC signature, and you shouldn’t use ECB anyway. The above is there merely to illustrate that cryptography can handle the common cryptographic building blocks, even the ones you shouldn’t actually use.

Answer #2

The TensorFlow Convolution example gives an overview of the difference between SAME and VALID padding:

  • For the SAME padding, the output height and width are computed as:

    out_height = ceil(float(in_height) / float(strides[1]))
    out_width  = ceil(float(in_width) / float(strides[2]))
    

And

  • For the VALID padding, the output height and width are computed as:

    out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
    out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))
    
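To make those formulas concrete, here is a small worked example (the numbers are illustrative, not from the TensorFlow docs): with an input height of 5, a filter height of 3 and a stride of 2,

from math import ceil

in_height, filter_height, stride = 5, 3, 2

# SAME padding: output size depends only on input size and stride
print(ceil(in_height / stride))                        # -> 3

# VALID padding: only positions where the filter fits entirely count
print(ceil((in_height - filter_height + 1) / stride))  # -> 2
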

Answer #3

Most answers suggested round or format. round sometimes rounds up, and in my case I needed the value of my variable to be rounded down and not just displayed as such.

round(2.357, 2)  # -> 2.36

I found the answer here: How do I round a floating point number up to a certain decimal place?

import math
v = 2.357
print(math.ceil(v*100)/100)  # -> 2.36
print(math.floor(v*100)/100)  # -> 2.35

or:

from math import floor, ceil

def roundDown(n, d=8):
    d = int("1" + ("0" * d))
    return floor(n * d) / d

def roundUp(n, d=8):
    d = int("1" + ("0" * d))
    return ceil(n * d) / d
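
For example (note that these helpers are still subject to binary floating-point representation, so the result can be off by one unit in the last decimal place for some inputs):

print(roundDown(2.357, 2))  # -> 2.35
print(roundUp(2.357, 2))    # -> 2.36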

Answer #4

The ceil (ceiling) function:

import math
print(int(math.ceil(4.2)))

Answer #5

The difference between import module and from module import foo is mainly subjective. Pick the one you like best and be consistent in your use of it. Here are some points to help you decide.

import module

  • Pros:
    • Less maintenance of your import statements. Don't need to add any additional imports to start using another item from the module
  • Cons:
    • Typing module.foo in your code can be tedious and redundant (tedium can be minimized by using import module as mo then typing mo.foo)

from module import foo

  • Pros:
    • Less typing to use foo
    • More control over which items of a module can be accessed
  • Cons:
    • To use a new item from the module you have to update your import statement
    • You lose context about foo. For example, it's less clear what ceil() does compared to math.ceil()

Either method is acceptable, but don't use from module import *.

For any reasonably large set of code, if you import * you will likely be cementing it into the module, unable to be removed. This is because it is difficult to determine what items used in the code are coming from "module", making it easy to get to the point where you think you don't use the import any more but it's extremely difficult to be sure.
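
To make the trade-off concrete, both snippets below do the same thing; the first keeps the module context visible at the call site, the second is shorter:

import math
print(math.ceil(2.1))   # explicit: this is math's ceil

from math import ceil
print(ceil(2.1))        # shorter, but the origin of ceil is less obvious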

Answer #6

Warning: timeit results may vary due to differences in hardware or version of Python.

Below is a script which compares a number of implementations:

Many thanks to stephan for bringing sieve_wheel_30 to my attention. Credit goes to Robert William Hanks for primesfrom2to, primesfrom3to, rwh_primes, rwh_primes1, and rwh_primes2.

Of the plain Python methods tested with psyco, for n=1000000, rwh_primes1 was the fastest.

+---------------------+-------+
| Method              | ms    |
+---------------------+-------+
| rwh_primes1         | 43.0  |
| sieveOfAtkin        | 46.4  |
| rwh_primes          | 57.4  |
| sieve_wheel_30      | 63.0  |
| rwh_primes2         | 67.8  |    
| sieveOfEratosthenes | 147.0 |
| ambi_sieve_plain    | 152.0 |
| sundaram3           | 194.0 |
+---------------------+-------+

Of the plain Python methods tested, without psyco, for n=1000000, rwh_primes2 was the fastest.

+---------------------+-------+
| Method              | ms    |
+---------------------+-------+
| rwh_primes2         | 68.1  |
| rwh_primes1         | 93.7  |
| rwh_primes          | 94.6  |
| sieve_wheel_30      | 97.4  |
| sieveOfEratosthenes | 178.0 |
| ambi_sieve_plain    | 286.0 |
| sieveOfAtkin        | 314.0 |
| sundaram3           | 416.0 |
+---------------------+-------+

Of all the methods tested, allowing numpy, for n=1000000, primesfrom2to was the fastest.

+---------------------+-------+
| Method              | ms    |
+---------------------+-------+
| primesfrom2to       | 15.9  |
| primesfrom3to       | 18.4  |
| ambi_sieve          | 29.3  |
+---------------------+-------+

Timings were measured using the command:

python -mtimeit -s"import primes" "primes.{method}(1000000)"

with {method} replaced by each of the method names.

primes.py:

#!/usr/bin/env python
import psyco; psyco.full()
from math import sqrt, ceil
import numpy as np

def rwh_primes(n):
    # https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    """ Returns  a list of primes < n """
    sieve = [True] * n
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i]:
            sieve[i*i::2*i]=[False]*((n-i*i-1)/(2*i)+1)
    return [2] + [i for i in xrange(3,n,2) if sieve[i]]

def rwh_primes1(n):
    # https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    """ Returns  a list of primes < n """
    sieve = [True] * (n/2)
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i/2]:
            sieve[i*i/2::i] = [False] * ((n-i*i-1)/(2*i)+1)
    return [2] + [2*i+1 for i in xrange(1,n/2) if sieve[i]]

def rwh_primes2(n):
    # https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    """ Input n>=6, Returns a list of primes, 2 <= p < n """
    correction = (n%6>1)
    n = {0:n,1:n-1,2:n+4,3:n+3,4:n+2,5:n+1}[n%6]
    sieve = [True] * (n/3)
    sieve[0] = False
    for i in xrange(int(n**0.5)/3+1):
      if sieve[i]:
        k=3*i+1|1
        sieve[      ((k*k)/3)      ::2*k]=[False]*((n/6-(k*k)/6-1)/k+1)
        sieve[(k*k+4*k-2*k*(i&1))/3::2*k]=[False]*((n/6-(k*k+4*k-2*k*(i&1))/6-1)/k+1)
    return [2,3] + [3*i+1|1 for i in xrange(1,n/3-correction) if sieve[i]]

def sieve_wheel_30(N):
    # http://zerovolt.com/?p=88
    """ Returns a list of primes <= N using wheel criterion 2*3*5 = 30

Copyright 2009 by zerovolt.com
This code is free for non-commercial purposes, in which case you can just leave this comment as a credit for my work.
If you need this code for commercial purposes, please contact me by sending an email to: info [at] zerovolt [dot] com."""
    __smallp = ( 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
    61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139,
    149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227,
    229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311,
    313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401,
    409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491,
    499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599,
    601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683,
    691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797,
    809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887,
    907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997)

    wheel = (2, 3, 5)
    const = 30
    if N < 2:
        return []
    if N <= const:
        pos = 0
        while __smallp[pos] <= N:
            pos += 1
        return list(__smallp[:pos])
    # make the offsets list
    offsets = (7, 11, 13, 17, 19, 23, 29, 1)
    # prepare the list
    p = [2, 3, 5]
    dim = 2 + N // const
    tk1  = [True] * dim
    tk7  = [True] * dim
    tk11 = [True] * dim
    tk13 = [True] * dim
    tk17 = [True] * dim
    tk19 = [True] * dim
    tk23 = [True] * dim
    tk29 = [True] * dim
    tk1[0] = False
    # help dictionary d
    # d[a , b] = c  ==> if I want to find the smallest useful multiple of (30*pos)+a
    # on tkc, then I need the index given by the product of [(30*pos)+a][(30*pos)+b]
    # in general. If b < a, I need [(30*pos)+a][(30*(pos+1))+b]
    d = {}
    for x in offsets:
        for y in offsets:
            res = (x*y) % const
            if res in offsets:
                d[(x, res)] = y
    # another help dictionary: gives tkx calling tmptk[x]
    tmptk = {1:tk1, 7:tk7, 11:tk11, 13:tk13, 17:tk17, 19:tk19, 23:tk23, 29:tk29}
    pos, prime, lastadded, stop = 0, 0, 0, int(ceil(sqrt(N)))
    # inner functions definition
    def del_mult(tk, start, step):
        for k in xrange(start, len(tk), step):
            tk[k] = False
    # end of inner functions definition
    cpos = const * pos
    while prime < stop:
        # 30k + 7
        if tk7[pos]:
            prime = cpos + 7
            p.append(prime)
            lastadded = 7
            for off in offsets:
                tmp = d[(7, off)]
                start = (pos + prime) if off == 7 else (prime * (const * (pos + 1 if tmp < 7 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # 30k + 11
        if tk11[pos]:
            prime = cpos + 11
            p.append(prime)
            lastadded = 11
            for off in offsets:
                tmp = d[(11, off)]
                start = (pos + prime) if off == 11 else (prime * (const * (pos + 1 if tmp < 11 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # 30k + 13
        if tk13[pos]:
            prime = cpos + 13
            p.append(prime)
            lastadded = 13
            for off in offsets:
                tmp = d[(13, off)]
                start = (pos + prime) if off == 13 else (prime * (const * (pos + 1 if tmp < 13 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # 30k + 17
        if tk17[pos]:
            prime = cpos + 17
            p.append(prime)
            lastadded = 17
            for off in offsets:
                tmp = d[(17, off)]
                start = (pos + prime) if off == 17 else (prime * (const * (pos + 1 if tmp < 17 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # 30k + 19
        if tk19[pos]:
            prime = cpos + 19
            p.append(prime)
            lastadded = 19
            for off in offsets:
                tmp = d[(19, off)]
                start = (pos + prime) if off == 19 else (prime * (const * (pos + 1 if tmp < 19 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # 30k + 23
        if tk23[pos]:
            prime = cpos + 23
            p.append(prime)
            lastadded = 23
            for off in offsets:
                tmp = d[(23, off)]
                start = (pos + prime) if off == 23 else (prime * (const * (pos + 1 if tmp < 23 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # 30k + 29
        if tk29[pos]:
            prime = cpos + 29
            p.append(prime)
            lastadded = 29
            for off in offsets:
                tmp = d[(29, off)]
                start = (pos + prime) if off == 29 else (prime * (const * (pos + 1 if tmp < 29 else 0) + tmp) )//const
                del_mult(tmptk[off], start, prime)
        # now we go back to top tk1, so we need to increase pos by 1
        pos += 1
        cpos = const * pos
        # 30k + 1
        if tk1[pos]:
            prime = cpos + 1
            p.append(prime)
            lastadded = 1
            for off in offsets:
                tmp = d[(1, off)]
                start = (pos + prime) if off == 1 else (prime * (const * pos + tmp) )//const
                del_mult(tmptk[off], start, prime)
    # time to add remaining primes
    # if lastadded == 1, remove last element and start adding them from tk1
    # this way we don't need an 'if' within the last while
    if lastadded == 1:
        p.pop()
    # now complete for every other possible prime
    while pos < len(tk1):
        cpos = const * pos
        if tk1[pos]: p.append(cpos + 1)
        if tk7[pos]: p.append(cpos + 7)
        if tk11[pos]: p.append(cpos + 11)
        if tk13[pos]: p.append(cpos + 13)
        if tk17[pos]: p.append(cpos + 17)
        if tk19[pos]: p.append(cpos + 19)
        if tk23[pos]: p.append(cpos + 23)
        if tk29[pos]: p.append(cpos + 29)
        pos += 1
    # remove exceeding if present
    pos = len(p) - 1
    while p[pos] > N:
        pos -= 1
    if pos < len(p) - 1:
        del p[pos+1:]
    # return p list
    return p

def sieveOfEratosthenes(n):
    """sieveOfEratosthenes(n): return the list of the primes < n."""
    # Code from: <[email protected]>, Nov 30 2006
    # http://groups.google.com/group/comp.lang.python/msg/f1f10ced88c68c2d
    if n <= 2:
        return []
    sieve = range(3, n, 2)
    top = len(sieve)
    for si in sieve:
        if si:
            bottom = (si*si - 3) // 2
            if bottom >= top:
                break
            sieve[bottom::si] = [0] * -((bottom - top) // si)
    return [2] + [el for el in sieve if el]

def sieveOfAtkin(end):
    """sieveOfAtkin(end): return a list of all the prime numbers <end
    using the Sieve of Atkin."""
    # Code by Steve Krenzel, <[email protected]>, improved
    # Code: https://web.archive.org/web/20080324064651/http://krenzel.info/?p=83
    # Info: http://en.wikipedia.org/wiki/Sieve_of_Atkin
    assert end > 0
    lng = ((end-1) // 2)
    sieve = [False] * (lng + 1)

    x_max, x2, xd = int(sqrt((end-1)/4.0)), 0, 4
    for xd in xrange(4, 8*x_max + 2, 8):
        x2 += xd
        y_max = int(sqrt(end-x2))
        n, n_diff = x2 + y_max*y_max, (y_max << 1) - 1
        if not (n & 1):
            n -= n_diff
            n_diff -= 2
        for d in xrange((n_diff - 1) << 1, -1, -8):
            m = n % 12
            if m == 1 or m == 5:
                m = n >> 1
                sieve[m] = not sieve[m]
            n -= d

    x_max, x2, xd = int(sqrt((end-1) / 3.0)), 0, 3
    for xd in xrange(3, 6 * x_max + 2, 6):
        x2 += xd
        y_max = int(sqrt(end-x2))
        n, n_diff = x2 + y_max*y_max, (y_max << 1) - 1
        if not(n & 1):
            n -= n_diff
            n_diff -= 2
        for d in xrange((n_diff - 1) << 1, -1, -8):
            if n % 12 == 7:
                m = n >> 1
                sieve[m] = not sieve[m]
            n -= d

    x_max, y_min, x2, xd = int((2 + sqrt(4-8*(1-end)))/4), -1, 0, 3
    for x in xrange(1, x_max + 1):
        x2 += xd
        xd += 6
        if x2 >= end: y_min = (((int(ceil(sqrt(x2 - end))) - 1) << 1) - 2) << 1
        n, n_diff = ((x*x + x) << 1) - 1, (((x-1) << 1) - 2) << 1
        for d in xrange(n_diff, y_min, -8):
            if n % 12 == 11:
                m = n >> 1
                sieve[m] = not sieve[m]
            n += d

    primes = [2, 3]
    if end <= 3:
        return primes[:max(0,end-2)]

    for n in xrange(5 >> 1, (int(sqrt(end))+1) >> 1):
        if sieve[n]:
            primes.append((n << 1) + 1)
            aux = (n << 1) + 1
            aux *= aux
            for k in xrange(aux, end, 2 * aux):
                sieve[k >> 1] = False

    s  = int(sqrt(end)) + 1
    if s  % 2 == 0:
        s += 1
    primes.extend([i for i in xrange(s, end, 2) if sieve[i >> 1]])

    return primes

def ambi_sieve_plain(n):
    s = range(3, n, 2)
    for m in xrange(3, int(n**0.5)+1, 2): 
        if s[(m-3)/2]: 
            for t in xrange((m*m-3)/2,(n>>1)-1,m):
                s[t]=0
    return [2]+[t for t in s if t>0]

def sundaram3(max_n):
    # https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/2073279#2073279
    numbers = range(3, max_n+1, 2)
    half = (max_n)//2
    initial = 4

    for step in xrange(3, max_n+1, 2):
        for i in xrange(initial, half, step):
            numbers[i-1] = 0
        initial += 2*(step+1)

        if initial > half:
            return [2] + filter(None, numbers)

################################################################################
# Using Numpy:
def ambi_sieve(n):
    # http://tommih.blogspot.com/2009/04/fast-prime-number-generator.html
    s = np.arange(3, n, 2)
    for m in xrange(3, int(n ** 0.5)+1, 2): 
        if s[(m-3)/2]: 
            s[(m*m-3)/2::m]=0
    return np.r_[2, s[s>0]]

def primesfrom3to(n):
    # https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    """ Returns a array of primes, p < n """
    assert n>=2
    sieve = np.ones(n/2, dtype=np.bool)
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i/2]:
            sieve[i*i/2::i] = False
    return np.r_[2, 2*np.nonzero(sieve)[0][1::]+1]    

def primesfrom2to(n):
    # https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    """ Input n>=6, Returns a array of primes, 2 <= p < n """
    sieve = np.ones(n/3 + (n%6==2), dtype=np.bool)
    sieve[0] = False
    for i in xrange(int(n**0.5)/3+1):
        if sieve[i]:
            k=3*i+1|1
            sieve[      ((k*k)/3)      ::2*k] = False
            sieve[(k*k+4*k-2*k*(i&1))/3::2*k] = False
    return np.r_[2,3,((3*np.nonzero(sieve)[0]+1)|1)]

if __name__=="__main__":
    import itertools
    import sys

    def test(f1,f2,num):
        print("Testing {f1} and {f2} return same results".format(
            f1=f1.func_name,
            f2=f2.func_name))
        if not all([a==b for a,b in itertools.izip_longest(f1(num),f2(num))]):
            sys.exit("Error: %s(%s) != %s(%s)"%(f1.func_name,num,f2.func_name,num))

    n=1000000
    test(sieveOfAtkin,sieveOfEratosthenes,n)
    test(sieveOfAtkin,ambi_sieve,n)
    test(sieveOfAtkin,ambi_sieve_plain,n) 
    test(sieveOfAtkin,sundaram3,n)
    test(sieveOfAtkin,sieve_wheel_30,n)
    test(sieveOfAtkin,primesfrom3to,n)
    test(sieveOfAtkin,primesfrom2to,n)
    test(sieveOfAtkin,rwh_primes,n)
    test(sieveOfAtkin,rwh_primes1,n)         
    test(sieveOfAtkin,rwh_primes2,n)

Running the script tests that all implementations give the same result.

Answer #7

You can just do upside-down floor division:

def ceildiv(a, b):
    return -(-a // b)

This works because Python's division operator does floor division (unlike in C, where integer division truncates the fractional part).

This also works with Python's big integers, because there's no (lossy) floating-point conversion.

Here's a demonstration:

>>> from __future__ import division   # a/b is float division
>>> from math import ceil
>>> b = 3
>>> for a in range(-7, 8):
...     print(["%d/%d" % (a, b), int(ceil(a / b)), -(-a // b)])
... 
['-7/3', -2, -2]
['-6/3', -2, -2]
['-5/3', -1, -1]
['-4/3', -1, -1]
['-3/3', -1, -1]
['-2/3', 0, 0]
['-1/3', 0, 0]
['0/3', 0, 0]
['1/3', 1, 1]
['2/3', 1, 1]
['3/3', 1, 1]
['4/3', 2, 2]
['5/3', 2, 2]
['6/3', 2, 2]
['7/3', 3, 3]

Answer #8

Matplotlib uses a dictionary from its colors.py module.

To print the names use:

# python2:

import matplotlib
for name, hex in matplotlib.colors.cnames.iteritems():
    print(name, hex)

# python3:

import matplotlib
for name, hex in matplotlib.colors.cnames.items():
    print(name, hex)

This is the complete dictionary:

cnames = {
"aliceblue":            "#F0F8FF",
"antiquewhite":         "#FAEBD7",
"aqua":                 "#00FFFF",
"aquamarine":           "#7FFFD4",
"azure":                "#F0FFFF",
"beige":                "#F5F5DC",
"bisque":               "#FFE4C4",
"black":                "#000000",
"blanchedalmond":       "#FFEBCD",
"blue":                 "#0000FF",
"blueviolet":           "#8A2BE2",
"brown":                "#A52A2A",
"burlywood":            "#DEB887",
"cadetblue":            "#5F9EA0",
"chartreuse":           "#7FFF00",
"chocolate":            "#D2691E",
"coral":                "#FF7F50",
"cornflowerblue":       "#6495ED",
"cornsilk":             "#FFF8DC",
"crimson":              "#DC143C",
"cyan":                 "#00FFFF",
"darkblue":             "#00008B",
"darkcyan":             "#008B8B",
"darkgoldenrod":        "#B8860B",
"darkgray":             "#A9A9A9",
"darkgreen":            "#006400",
"darkkhaki":            "#BDB76B",
"darkmagenta":          "#8B008B",
"darkolivegreen":       "#556B2F",
"darkorange":           "#FF8C00",
"darkorchid":           "#9932CC",
"darkred":              "#8B0000",
"darksalmon":           "#E9967A",
"darkseagreen":         "#8FBC8F",
"darkslateblue":        "#483D8B",
"darkslategray":        "#2F4F4F",
"darkturquoise":        "#00CED1",
"darkviolet":           "#9400D3",
"deeppink":             "#FF1493",
"deepskyblue":          "#00BFFF",
"dimgray":              "#696969",
"dodgerblue":           "#1E90FF",
"firebrick":            "#B22222",
"floralwhite":          "#FFFAF0",
"forestgreen":          "#228B22",
"fuchsia":              "#FF00FF",
"gainsboro":            "#DCDCDC",
"ghostwhite":           "#F8F8FF",
"gold":                 "#FFD700",
"goldenrod":            "#DAA520",
"gray":                 "#808080",
"green":                "#008000",
"greenyellow":          "#ADFF2F",
"honeydew":             "#F0FFF0",
"hotpink":              "#FF69B4",
"indianred":            "#CD5C5C",
"indigo":               "#4B0082",
"ivory":                "#FFFFF0",
"khaki":                "#F0E68C",
"lavender":             "#E6E6FA",
"lavenderblush":        "#FFF0F5",
"lawngreen":            "#7CFC00",
"lemonchiffon":         "#FFFACD",
"lightblue":            "#ADD8E6",
"lightcoral":           "#F08080",
"lightcyan":            "#E0FFFF",
"lightgoldenrodyellow": "#FAFAD2",
"lightgreen":           "#90EE90",
"lightgray":            "#D3D3D3",
"lightpink":            "#FFB6C1",
"lightsalmon":          "#FFA07A",
"lightseagreen":        "#20B2AA",
"lightskyblue":         "#87CEFA",
"lightslategray":       "#778899",
"lightsteelblue":       "#B0C4DE",
"lightyellow":          "#FFFFE0",
"lime":                 "#00FF00",
"limegreen":            "#32CD32",
"linen":                "#FAF0E6",
"magenta":              "#FF00FF",
"maroon":               "#800000",
"mediumaquamarine":     "#66CDAA",
"mediumblue":           "#0000CD",
"mediumorchid":         "#BA55D3",
"mediumpurple":         "#9370DB",
"mediumseagreen":       "#3CB371",
"mediumslateblue":      "#7B68EE",
"mediumspringgreen":    "#00FA9A",
"mediumturquoise":      "#48D1CC",
"mediumvioletred":      "#C71585",
"midnightblue":         "#191970",
"mintcream":            "#F5FFFA",
"mistyrose":            "#FFE4E1",
"moccasin":             "#FFE4B5",
"navajowhite":          "#FFDEAD",
"navy":                 "#000080",
"oldlace":              "#FDF5E6",
"olive":                "#808000",
"olivedrab":            "#6B8E23",
"orange":               "#FFA500",
"orangered":            "#FF4500",
"orchid":               "#DA70D6",
"palegoldenrod":        "#EEE8AA",
"palegreen":            "#98FB98",
"paleturquoise":        "#AFEEEE",
"palevioletred":        "#DB7093",
"papayawhip":           "#FFEFD5",
"peachpuff":            "#FFDAB9",
"peru":                 "#CD853F",
"pink":                 "#FFC0CB",
"plum":                 "#DDA0DD",
"powderblue":           "#B0E0E6",
"purple":               "#800080",
"red":                  "#FF0000",
"rosybrown":            "#BC8F8F",
"royalblue":            "#4169E1",
"saddlebrown":          "#8B4513",
"salmon":               "#FA8072",
"sandybrown":           "#FAA460",
"seagreen":             "#2E8B57",
"seashell":             "#FFF5EE",
"sienna":               "#A0522D",
"silver":               "#C0C0C0",
"skyblue":              "#87CEEB",
"slateblue":            "#6A5ACD",
"slategray":            "#708090",
"snow":                 "#FFFAFA",
"springgreen":          "#00FF7F",
"steelblue":            "#4682B4",
"tan":                  "#D2B48C",
"teal":                 "#008080",
"thistle":              "#D8BFD8",
"tomato":               "#FF6347",
"turquoise":            "#40E0D0",
"violet":               "#EE82EE",
"wheat":                "#F5DEB3",
"white":                "#FFFFFF",
"whitesmoke":           "#F5F5F5",
"yellow":               "#FFFF00",
"yellowgreen":          "#9ACD32"}

You could plot them like this:

import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.colors as colors
import math


fig = plt.figure()
ax = fig.add_subplot(111)

ratio = 1.0 / 3.0
count = math.ceil(math.sqrt(len(colors.cnames)))
x_count = count * ratio
y_count = count / ratio
x = 0
y = 0
w = 1 / x_count
h = 1 / y_count

for c in colors.cnames:
    pos = (x / x_count, y / y_count)
    ax.add_patch(patches.Rectangle(pos, w, h, color=c))
    ax.annotate(c, xy=pos)
    if y >= y_count-1:
        x += 1
        y = 0
    else:
        y += 1

plt.show()

Answer #9

I have an approach which I think is interesting and a bit different from the rest. The main difference in my approach, compared to some of the others, is in how the image segmentation step is performed--I used the DBSCAN clustering algorithm from Python's scikit-learn; it's optimized for finding somewhat amorphous shapes that may not necessarily have a single clear centroid.

At the top level, my approach is fairly simple and can be broken down into about 3 steps. First I apply a threshold (or actually, the logical "or" of two separate and distinct thresholds). As with many of the other answers, I assumed that the Christmas tree would be one of the brighter objects in the scene, so the first threshold is just a simple monochrome brightness test; any pixels with values above 220 on a 0-255 scale (where black is 0 and white is 255) are saved to a binary black-and-white image. The second threshold tries to look for red and yellow lights, which are particularly prominent in the trees in the upper left and lower right of the six images, and stand out well against the blue-green background which is prevalent in most of the photos. I convert the rgb image to hsv space, and require that the hue is either less than 0.2 on a 0.0-1.0 scale (corresponding roughly to the border between yellow and green) or greater than 0.95 (corresponding to the border between purple and red) and additionally I require bright, saturated colors: saturation and value must both be above 0.7. The results of the two threshold procedures are logically "or"-ed together, and the resulting matrix of black-and-white binary images is shown below:

Christmas trees, after thresholding on HSV as well as monochrome brightness

You can clearly see that each image has one large cluster of pixels roughly corresponding to the location of each tree, plus a few of the images also have some other small clusters corresponding either to lights in the windows of some of the buildings, or to a background scene on the horizon. The next step is to get the computer to recognize that these are separate clusters, and label each pixel correctly with a cluster membership ID number.

For this task I chose DBSCAN. There is a pretty good visual comparison of how DBSCAN typically behaves, relative to other clustering algorithms, available here. As I said earlier, it does well with amorphous shapes. The output of DBSCAN, with each cluster plotted in a different color, is shown here:

DBSCAN clustering output

There are a few things to be aware of when looking at this result. First is that DBSCAN requires the user to set a "proximity" parameter in order to regulate its behavior, which effectively controls how separated a pair of points must be in order for the algorithm to declare a new separate cluster rather than agglomerating a test point onto an already pre-existing cluster. I set this value to be 0.04 times the size along the diagonal of each image. Since the images vary in size from roughly VGA up to about HD 1080, this type of scale-relative definition is critical.

Another point worth noting is that the DBSCAN algorithm as it is implemented in scikit-learn has memory limits which are fairly challenging for some of the larger images in this sample. Therefore, for a few of the larger images, I actually had to "decimate" (i.e., retain only every 3rd or 4th pixel and drop the others) each cluster in order to stay within this limit. As a result of this culling process, the remaining individual sparse pixels are difficult to see on some of the larger images. Therefore, for display purposes only, the color-coded pixels in the above images have been effectively "dilated" just slightly so that they stand out better. It's purely a cosmetic operation for the sake of the narrative; although there are comments mentioning this dilation in my code, rest assured that it has nothing to do with any calculations that actually matter.

Once the clusters are identified and labeled, the third and final step is easy: I simply take the largest cluster in each image (in this case, I chose to measure "size" in terms of the total number of member pixels, although one could have just as easily instead used some type of metric that gauges physical extent) and compute the convex hull for that cluster. The convex hull then becomes the tree border. The six convex hulls computed via this method are shown below in red:

Christmas trees with their calculated borders

The source code is written for Python 2.7.6 and it depends on numpy, scipy, matplotlib and scikit-learn. I've divided it into two parts. The first part is responsible for the actual image processing:

from PIL import Image
import numpy as np
import scipy as sp
import scipy.spatial  # load the submodule so sp.spatial.ConvexHull resolves
import matplotlib.colors as colors
from sklearn.cluster import DBSCAN
from math import ceil, sqrt

"""
Inputs:

    rgbimg:         [M,N,3] numpy array containing (uint, 0-255) color image

    hueleftthr:     Scalar constant to select maximum allowed hue in the
                    yellow-green region

    huerightthr:    Scalar constant to select minimum allowed hue in the
                    blue-purple region

    satthr:         Scalar constant to select minimum allowed saturation

    valthr:         Scalar constant to select minimum allowed value

    monothr:        Scalar constant to select minimum allowed monochrome
                    brightness

    maxpoints:      Scalar constant maximum number of pixels to forward to
                    the DBSCAN clustering algorithm

    proxthresh:     Proximity threshold to use for DBSCAN, as a fraction of
                    the diagonal size of the image

Outputs:

    borderseg:      [K,2,2] Nested list containing K pairs of x- and y- pixel
                    values for drawing the tree border

    X:              [P,2] List of pixels that passed the threshold step

    labels:         [Q,2] List of cluster labels for points in Xslice (see
                    below)

    Xslice:         [Q,2] Reduced list of pixels to be passed to DBSCAN

"""

def findtree(rgbimg, hueleftthr=0.2, huerightthr=0.95, satthr=0.7, 
             valthr=0.7, monothr=220, maxpoints=5000, proxthresh=0.04):

    # Convert rgb image to monochrome for the brightness threshold below
    gryimg = np.asarray(Image.fromarray(rgbimg).convert("L"))
    # Convert rgb image (uint, 0-255) to hsv (float, 0.0-1.0)
    hsvimg = colors.rgb_to_hsv(rgbimg.astype(float)/255)

    # Initialize binary thresholded image
    binimg = np.zeros((rgbimg.shape[0], rgbimg.shape[1]))
    # Find pixels with hue<0.2 or hue>0.95 (red or yellow) and saturation/value
    # both greater than 0.7 (saturated and bright)--tends to coincide with
    # ornamental lights on trees in some of the images
    boolidx = np.logical_and(
                np.logical_and(
                  np.logical_or((hsvimg[:,:,0] < hueleftthr),
                                (hsvimg[:,:,0] > huerightthr)),
                                (hsvimg[:,:,1] > satthr)),
                                (hsvimg[:,:,2] > valthr))
    # Find pixels that meet hsv criterion
    binimg[np.where(boolidx)] = 255
    # Add pixels that meet grayscale brightness criterion
    binimg[np.where(gryimg > monothr)] = 255

    # Prepare thresholded points for DBSCAN clustering algorithm
    X = np.transpose(np.where(binimg == 255))
    Xslice = X
    nsample = len(Xslice)
    if nsample > maxpoints:
        # Make sure number of points does not exceed DBSCAN maximum capacity
        Xslice = X[range(0,nsample,int(ceil(float(nsample)/maxpoints)))]

    # Translate DBSCAN proximity threshold to units of pixels and run DBSCAN
    pixproxthr = proxthresh * sqrt(binimg.shape[0]**2 + binimg.shape[1]**2)
    db = DBSCAN(eps=pixproxthr, min_samples=10).fit(Xslice)
    labels = db.labels_.astype(int)

    # Find the largest cluster (i.e., with most points) and obtain convex hull   
    unique_labels = set(labels)
    maxclustpt = 0
    for k in unique_labels:
        class_members = [index[0] for index in np.argwhere(labels == k)]
        if len(class_members) > maxclustpt:
            points = Xslice[class_members]
            hull = sp.spatial.ConvexHull(points)
            maxclustpt = len(class_members)
            borderseg = [[points[simplex,0], points[simplex,1]] for simplex
                          in hull.simplices]

    return borderseg, X, labels, Xslice

and the second part is a user-level script which calls the first file and generates all of the plots above:

#!/usr/bin/env python

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from findtree import findtree

# Image files to process
fname = ["nmzwj.png", "aVZhC.png", "2K9EF.png",
         "YowlH.png", "2y4o5.png", "FWhSP.png"]

# Initialize figures
fgsz = (16,7)        
figthresh = plt.figure(figsize=fgsz, facecolor="w")
figclust  = plt.figure(figsize=fgsz, facecolor="w")
figcltwo  = plt.figure(figsize=fgsz, facecolor="w")
figborder = plt.figure(figsize=fgsz, facecolor="w")
figthresh.canvas.set_window_title("Thresholded HSV and Monochrome Brightness")
figclust.canvas.set_window_title("DBSCAN Clusters (Raw Pixel Output)")
figcltwo.canvas.set_window_title("DBSCAN Clusters (Slightly Dilated for Display)")
figborder.canvas.set_window_title("Trees with Borders")

for ii, name in zip(range(len(fname)), fname):
    # Open the file and convert to rgb image
    rgbimg = np.asarray(Image.open(name))

    # Get the tree borders as well as a bunch of other intermediate values
    # that will be used to illustrate how the algorithm works
    borderseg, X, labels, Xslice = findtree(rgbimg)

    # Display thresholded images
    axthresh = figthresh.add_subplot(2,3,ii+1)
    axthresh.set_xticks([])
    axthresh.set_yticks([])
    binimg = np.zeros((rgbimg.shape[0], rgbimg.shape[1]))
    for v, h in X:
        binimg[v,h] = 255
    axthresh.imshow(binimg, interpolation="nearest", cmap="Greys")

    # Display color-coded clusters
    axclust = figclust.add_subplot(2,3,ii+1) # Raw version
    axclust.set_xticks([])
    axclust.set_yticks([])
    axcltwo = figcltwo.add_subplot(2,3,ii+1) # Dilated slightly for display only
    axcltwo.set_xticks([])
    axcltwo.set_yticks([])
    axcltwo.imshow(binimg, interpolation="nearest", cmap="Greys")
    clustimg = np.ones(rgbimg.shape)    
    unique_labels = set(labels)
    # Generate a unique color for each cluster 
    plcol = cm.rainbow_r(np.linspace(0, 1, len(unique_labels)))
    for lbl, pix in zip(labels, Xslice):
        for col, unqlbl in zip(plcol, unique_labels):
            if lbl == unqlbl:
                # Cluster label of -1 indicates no cluster membership;
                # override default color with black
                if lbl == -1:
                    col = [0.0, 0.0, 0.0, 1.0]
                # Raw version
                for ij in range(3):
                    clustimg[pix[0],pix[1],ij] = col[ij]
                # Dilated just for display
                axcltwo.plot(pix[1], pix[0], "o", markerfacecolor=col, 
                    markersize=1, markeredgecolor=col)
    axclust.imshow(clustimg)
    axcltwo.set_xlim(0, binimg.shape[1]-1)
    axcltwo.set_ylim(binimg.shape[0], -1)

    # Plot original images with red borders around the trees
    axborder = figborder.add_subplot(2,3,ii+1)
    axborder.set_axis_off()
    axborder.imshow(rgbimg, interpolation="nearest")
    for vseg, hseg in borderseg:
        axborder.plot(hseg, vseg, "r-", lw=3)
    axborder.set_xlim(0, binimg.shape[1]-1)
    axborder.set_ylim(binimg.shape[0], -1)

plt.show()

Answer #10

The following are rough guidelines and educated guesses based on experience. You should timeit or profile your concrete use case to get hard numbers, and those numbers may occasionally disagree with the below.

A list comprehension is usually a tiny bit faster than the precisely equivalent for loop (that actually builds a list), most likely because it doesn't have to look up the list and its append method on every iteration. However, a list comprehension still does a bytecode-level loop:

>>> dis.dis(<the code object for `[x for x in range(10)]`>)
 1           0 BUILD_LIST               0
             3 LOAD_FAST                0 (.0)
       >>    6 FOR_ITER                12 (to 21)
             9 STORE_FAST               1 (x)
            12 LOAD_FAST                1 (x)
            15 LIST_APPEND              2
            18 JUMP_ABSOLUTE            6
       >>   21 RETURN_VALUE

Using a list comprehension in place of a loop that doesn't build a list, nonsensically accumulating a list of meaningless values and then throwing the list away, is often slower because of the overhead of creating and extending the list. List comprehensions aren't magic that is inherently faster than a good old loop.

As for functional list processing functions: while these are written in C and probably outperform equivalent functions written in Python, they are not necessarily the fastest option. Some speed-up is expected if the function is written in C too. But in most cases using a lambda (or another Python function), the overhead of repeatedly setting up Python stack frames etc. eats up any savings. Simply doing the same work in-line, without function calls (e.g. a list comprehension instead of map or filter), is often slightly faster.
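
As a rough illustration of that last point (timings are machine- and version-dependent, so treat the output as indicative only):

import timeit

setup = "xs = list(range(1000))"

# map with a lambda pays a Python-level function call per element
print(timeit.timeit("list(map(lambda x: x + 1, xs))", setup=setup, number=1000))

# the same work done in-line by a list comprehension
print(timeit.timeit("[x + 1 for x in xs]", setup=setup, number=1000))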

Suppose that in a game that I'm developing I need to draw complex and huge maps using for loops. This question would definitely be relevant, for if a list comprehension, for example, is indeed faster, it would be a much better option in order to avoid lags (despite the visual complexity of the code).

Chances are, if code like this isn't already fast enough when written in good non-"optimized" Python, no amount of Python-level micro-optimization is going to make it fast enough, and you should start thinking about dropping to C. While extensive micro-optimizations can often speed up Python code considerably, there is a low (in absolute terms) limit to this. Moreover, even before you hit that ceiling, it becomes simply more cost efficient (15% speedup vs. 300% speedup with the same effort) to bite the bullet and write some C.


Answer #11

Cong Ma does a good job of explaining what __getitem__ is used for - but I want to give you an example which might be useful. Imagine a class which models a building. Within the data for the building it includes a number of attributes, including descriptions of the companies that occupy each floor:

Without using __getitem__ we would have a class like this :

class Building(object):
    def __init__(self, floors):
        self._floors = [None] * floors
    def occupy(self, floor_number, data):
        self._floors[floor_number] = data
    def get_floor_data(self, floor_number):
        return self._floors[floor_number]

building1 = Building(4) # Construct a building with 4 floors
building1.occupy(0, "Reception")
building1.occupy(1, "ABC Corp")
building1.occupy(2, "DEF Inc")
print( building1.get_floor_data(2) )

We could however use __getitem__ (and its counterpart __setitem__) to make the usage of the Building class "nicer".

class Building(object):
    def __init__(self, floors):
        self._floors = [None] * floors
    def __setitem__(self, floor_number, data):
        self._floors[floor_number] = data
    def __getitem__(self, floor_number):
        return self._floors[floor_number]

building1 = Building(4) # Construct a building with 4 floors
building1[0] = "Reception"
building1[1] = "ABC Corp"
building1[2] = "DEF Inc"
print( building1[2] )

Whether you use __setitem__ like this really depends on how you plan to abstract your data. In this case we have decided to treat a building as a container of floors (and you could also implement an iterator for the Building, and maybe even the ability to slice, i.e. get more than one floor's data at a time); it depends on what you need. A sketch of those extensions follows.
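
As a hedged sketch of those optional extensions (the __len__/__iter__ additions and the slicing behavior shown here are one plausible design, not part of the original answer):

class Building(object):
    def __init__(self, floors):
        self._floors = [None] * floors
    def __setitem__(self, floor_number, data):
        self._floors[floor_number] = data
    def __getitem__(self, floor_number):
        # delegating to the underlying list means slices work for free
        return self._floors[floor_number]
    def __len__(self):
        return len(self._floors)
    def __iter__(self):
        return iter(self._floors)

building1 = Building(4)
building1[0] = "Reception"
building1[1] = "ABC Corp"
print(building1[0:2])                    # ['Reception', 'ABC Corp']
print([floor for floor in building1])    # iterates all four floors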

Answer #2

What you have is a float literal without the trailing zero, on which you then access the __truediv__ method. It's not an operator in itself; the first dot is part of the float value, and the second is the dot operator used to access the object's properties and methods.

You can reach the same point by doing the following.

>>> f = 1.
>>> f
1.0
>>> f.__floordiv__
<method-wrapper "__floordiv__" of float object at 0x7f9fb4dc1a20>

Another example

>>> 1..__add__(2.)
3.0

Here we add 1.0 to 2.0, which obviously yields 3.0.

Answer #3

How can I force division to be floating point in Python?

I have two integer values a and b, but I need their ratio in floating point. I know that a < b and I want to calculate a/b, so if I use integer division I'll always get 0 with a remainder of a.

How can I force c to be a floating point number in Python in the following?

c = a / b

What is really being asked here is:

"How do I force true division such that a / b will return a fraction?"

Upgrade to Python 3

In Python 3, to get true division, you simply do a / b.

>>> 1/2
0.5

Floor division, the classic division behavior for integers, is now a // b:

>>> 1//2
0
>>> 1//2.0
0.0

However, you may be stuck using Python 2, or you may be writing code that must work in both 2 and 3.

If Using Python 2

In Python 2, it"s not so simple. Some ways of dealing with classic Python 2 division are better and more robust than others.

Recommendation for Python 2

You can get Python 3 division behavior in any given module with the following import at the top:

from __future__ import division

which then applies Python 3 style division to the entire module. It also works in a Python shell at any given point. In Python 2:

>>> from __future__ import division
>>> 1/2
0.5
>>> 1//2
0
>>> 1//2.0
0.0

This is really the best solution as it ensures the code in your module is more forward compatible with Python 3.

Other Options for Python 2

If you don"t want to apply this to the entire module, you"re limited to a few workarounds. The most popular is to coerce one of the operands to a float. One robust solution is a / (b * 1.0). In a fresh Python shell:

>>> 1/(2 * 1.0)
0.5

Also robust is truediv from the operator module, operator.truediv(a, b), but this is likely slower because it's a function call:

>>> from operator import truediv
>>> truediv(1, 2)
0.5

Not Recommended for Python 2

Commonly seen is a / float(b). This will raise a TypeError if b is a complex number. Since division with complex numbers is defined, it makes sense to me for division not to fail when passed a complex number as the divisor.

>>> 1 / float(2)
0.5
>>> 1 / float(2j)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can"t convert complex to float

It doesn"t make much sense to me to purposefully make your code more brittle.

You can also run Python with the -Qnew flag, but this has the downside of executing all modules with the new Python 3 behavior, and some of your modules may expect classic division, so I don"t recommend this except for testing. But to demonstrate:

$ python -Qnew -c "print 1/2"
0.5
$ python -Qnew -c "print 1/2j"
-0.5j

Answer #4

This seems to be because multiplication of small numbers is optimized in CPython 3.5, in a way that left shifts by small numbers are not. Positive left shifts always create a larger integer object to store the result, as part of the calculation, while for multiplications of the sort you used in your test, a special optimization avoids this and creates an integer object of the correct size. This can be seen in the source code of Python's integer implementation.

Because integers in Python are arbitrary-precision, they are stored as arrays of integer "digits", with a limit on the number of bits per integer digit. So in the general case, operations involving integers are not single operations, but instead need to handle the case of multiple "digits". In pyport.h, this bit limit is defined as 30 bits on 64-bit platforms, or 15 bits otherwise. (I'll just call this 30 from here on to keep the explanation simple. But note that if you were using Python compiled for 32-bit, your benchmark's result would depend on whether x were less than 32,768 or not.)
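
A minimal sketch of the kind of benchmark being discussed (the exact timings, of course, depend on your CPython build and platform):

import timeit

print(timeit.timeit('x * 2', setup='x = 5'))   # can hit the single-digit fast path
print(timeit.timeit('x << 1', setup='x = 5'))  # always allocates a 2-digit result first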

When an operation"s inputs and outputs stay within this 30-bit limit, the operation can be handled in an optimized way instead of the general way. The beginning of the integer multiplication implementation is as follows:

static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
    PyLongObject *z;

    CHECK_BINOP(a, b);

    /* fast path for single-digit multiplication */
    if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
        stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
#ifdef HAVE_LONG_LONG
        return PyLong_FromLongLong((PY_LONG_LONG)v);
#else
        /* if we don"t have long long then we"re almost certainly
           using 15-bit digits, so v will fit in a long.  In the
           unlikely event that we"re using 30-bit digits on a platform
           without long long, a large v will just cause us to fall
           through to the general multiplication code below. */
        if (v >= LONG_MIN && v <= LONG_MAX)
            return PyLong_FromLong((long)v);
#endif
    }

So when multiplying two integers where each fits in a 30-bit digit, this is done as a direct multiplication by the CPython interpreter, instead of working with the integers as arrays. (MEDIUM_VALUE() called on a positive integer object simply gets its first 30-bit digit.) If the result fits in a single 30-bit digit, PyLong_FromLongLong() will notice this in a relatively small number of operations, and create a single-digit integer object to store it.

In contrast, left shifts are not optimized this way, and every left shift deals with the integer being shifted as an array. In particular, if you look at the source code for long_lshift(), in the case of a small but positive left shift, a 2-digit integer object is always created, if only to have its length truncated to 1 later: (my comments in /*** ***/)

static PyObject *
long_lshift(PyObject *v, PyObject *w)
{
    /*** ... ***/

    wordshift = shiftby / PyLong_SHIFT;   /*** zero for small w ***/
    remshift  = shiftby - wordshift * PyLong_SHIFT;   /*** w for small w ***/

    oldsize = Py_ABS(Py_SIZE(a));   /*** 1 for small v > 0 ***/
    newsize = oldsize + wordshift;
    if (remshift)
        ++newsize;   /*** here newsize becomes at least 2 for w > 0, v > 0 ***/
    z = _PyLong_New(newsize);

    /*** ... ***/
}

Integer division

You didn"t ask about the worse performance of integer floor division compared to right shifts, because that fit your (and my) expectations. But dividing a small positive number by another small positive number is not as optimized as small multiplications, either. Every // computes both the quotient and the remainder using the function long_divrem(). This remainder is computed for a small divisor with a multiplication, and is stored in a newly-allocated integer object, which in this situation is immediately discarded.

Answer #5

What does the “at” (@) symbol do in Python?

In short, it is used in decorator syntax and for matrix multiplication.

In the context of decorators, this syntax:

@decorator
def decorated_function():
    """this function is decorated"""

is equivalent to this:

def decorated_function():
    """this function is decorated"""

decorated_function = decorator(decorated_function)

In the context of matrix multiplication, a @ b invokes a.__matmul__(b) - making this syntax:

a @ b

equivalent to

dot(a, b)

and

a @= b

equivalent to

a = dot(a, b)

where dot is, for example, the numpy matrix multiplication function and a and b are matrices.

How could you discover this on your own?

I also do not know what to search for, as searching Python docs or Google does not return relevant results when the @ symbol is included.

If you want to have a rather complete view of what a particular piece of Python syntax does, look directly at the grammar file. For the Python 3 branch:

~$ grep -C 1 "@" cpython/Grammar/Grammar 

decorator: "@" dotted_name [ "(" [arglist] ")" ] NEWLINE
decorators: decorator+
--
testlist_star_expr: (test|star_expr) ("," (test|star_expr))* [","]
augassign: ("+=" | "-=" | "*=" | "@=" | "/=" | "%=" | "&=" | "|=" | "^=" |
            "<<=" | ">>=" | "**=" | "//=")
--
arith_expr: term (("+"|"-") term)*
term: factor (("*"|"@"|"/"|"%"|"//") factor)*
factor: ("+"|"-"|"~") factor | power

We can see here that @ is used in three contexts:

  • decorators
  • an operator between factors
  • an augmented assignment operator

Decorator Syntax:

A google search for "decorator python docs" gives as one of the top results, the "Compound Statements" section of the "Python Language Reference." Scrolling down to the section on function definitions, which we can find by searching for the word, "decorator", we see that... there"s a lot to read. But the word, "decorator" is a link to the glossary, which tells us:

decorator

A function returning another function, usually applied as a function transformation using the @wrapper syntax. Common examples for decorators are classmethod() and staticmethod().

The decorator syntax is merely syntactic sugar, the following two function definitions are semantically equivalent:

def f(...):
    ...
f = staticmethod(f)

@staticmethod
def f(...):
    ...

The same concept exists for classes, but is less commonly used there. See the documentation for function definitions and class definitions for more about decorators.

So, we see that

@foo
def bar():
    pass

is semantically the same as:

def bar():
    pass

bar = foo(bar)

They are not exactly the same because Python evaluates the foo expression (which could be a dotted lookup and a function call) before bar with the decorator (@) syntax, but evaluates the foo expression after bar in the other case.

(If this difference makes a difference in the meaning of your code, you should reconsider what you're doing with your life, because that would be pathological.)
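
A minimal sketch showing that ordering (make_decorator is an invented name, used only for illustration):

def make_decorator():
    print("decorator expression evaluated")   # runs before bar is defined
    def deco(fn):
        print("decorator applied to", fn.__name__)
        return fn
    return deco

@make_decorator()
def bar():
    pass

With @, "decorator expression evaluated" prints before bar exists; with the manual bar = make_decorator()(bar) spelling, bar is defined first.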

Stacked Decorators

If we go back to the function definition syntax documentation, we see:

@f1(arg)
@f2
def func(): pass

is roughly equivalent to

def func(): pass
func = f1(arg)(f2(func))

This is a demonstration that we can call a function that returns a decorator (f1(arg)), as well as stack decorators. Functions, in Python, are first class objects - which means you can pass a function as an argument to another function, and return functions. Decorators do both of these things.

If we stack decorators, the function, as defined, gets passed first to the decorator immediately above it, then the next, and so on, as the sketch below demonstrates.
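
A minimal sketch of stacking (shout and exclaim are invented decorators for illustration):

def shout(fn):
    def wrapper():
        return fn().upper()
    return wrapper

def exclaim(fn):
    def wrapper():
        return fn() + "!"
    return wrapper

@shout
@exclaim
def greet():
    return "hello"

print(greet())  # HELLO! -- exclaim wraps greet first, then shout wraps the result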

That about sums up the usage for @ in the context of decorators.

The Operator, @

In the lexical analysis section of the language reference, we have a section on operators, which includes @, which makes it also an operator:

The following tokens are operators:

+       -       *       **      /       //      %      @
<<      >>      &       |       ^       ~
<       >       <=      >=      ==      !=

and on the next page, the Data Model, we have the section Emulating Numeric Types,

object.__add__(self, other)
object.__sub__(self, other) 
object.__mul__(self, other) 
object.__matmul__(self, other) 
object.__truediv__(self, other) 
object.__floordiv__(self, other)

[...] These methods are called to implement the binary arithmetic operations (+, -, *, @, /, //, [...]

And we see that __matmul__ corresponds to @. If we search the documentation for "matmul" we get a link to What's New in Python 3.5 with "matmul" under a heading "PEP 465 - A dedicated infix operator for matrix multiplication".

it can be implemented by defining __matmul__(), __rmatmul__(), and __imatmul__() for regular, reflected, and in-place matrix multiplication.

(So now we learn that @= is the in-place version). It further explains:

Matrix multiplication is a notably common operation in many fields of mathematics, science, engineering, and the addition of @ allows writing cleaner code:

S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)

instead of:

S = dot((dot(H, beta) - r).T,
        dot(inv(dot(dot(H, V), H.T)), dot(H, beta) - r))

While this operator can be overloaded to do almost anything, in numpy, for example, we would use this syntax to calculate the inner and outer product of arrays and matrices:

>>> from numpy import array, matrix
>>> array([[1,2,3]]).T @ array([[1,2,3]])
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
>>> array([[1,2,3]]) @ array([[1,2,3]]).T
array([[14]])
>>> matrix([1,2,3]).T @ matrix([1,2,3])
matrix([[1, 2, 3],
        [2, 4, 6],
        [3, 6, 9]])
>>> matrix([1,2,3]) @ matrix([1,2,3]).T
matrix([[14]])
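
Because @ simply dispatches to __matmul__, any class can opt in to the operator. A minimal sketch (the Vec class is invented for illustration; its "matrix multiplication" is just a dot product):

class Vec:
    def __init__(self, xs):
        self.xs = xs
    def __matmul__(self, other):
        # dot product, purely to demonstrate the hook
        return sum(a * b for a, b in zip(self.xs, other.xs))

print(Vec([1, 2, 3]) @ Vec([4, 5, 6]))  # 32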

Inplace matrix multiplication: @=

While researching the prior usage, we learn that there is also the inplace matrix multiplication. If we attempt to use it, we may find it is not yet implemented for numpy:

>>> m = matrix([1,2,3])
>>> m @= m.T
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: In-place matrix multiplication is not (yet) supported. Use 'a = a @ b' instead of 'a @= b'.

When it is implemented, I would expect the result to look like this:

>>> m = matrix([1,2,3])
>>> m @= m.T
>>> m
matrix([[14]])

Answer #6

The golden spiral method

You said you couldn’t get the golden spiral method to work and that’s a shame because it’s really, really good. I would like to give you a complete understanding of it so that maybe you can understand how to keep this away from being “bunched up.”

So here’s a fast, non-random way to create a lattice that is approximately correct; as discussed above, no lattice will be perfect, but this may be good enough. It is compared to other methods e.g. at BendWavy.org but it just has a nice and pretty look as well as a guarantee about even spacing in the limit.

Primer: sunflower spirals on the unit disk

To understand this algorithm, I first invite you to look at the 2D sunflower spiral algorithm. This is based on the fact that the most irrational number is the golden ratio (1 + sqrt(5))/2, and if one emits points by the approach “stand at the center, turn a golden ratio of whole turns, then emit another point in that direction,” one naturally constructs a spiral which, as you get to higher and higher numbers of points, nevertheless refuses to have well-defined ‘bars’ that the points line up on. (Note 1.)

The algorithm for even spacing on a disk is,

from numpy import pi, cos, sin, sqrt, arange
import matplotlib.pyplot as pp

num_pts = 100
indices = arange(0, num_pts, dtype=float) + 0.5

r = sqrt(indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

pp.scatter(r*cos(theta), r*sin(theta))
pp.show()

and it produces results that look like this for n = 100 and n = 1000:

[figure: scatter plots of the sunflower spiral for n = 100 and n = 1000]

Spacing the points radially

The key strange thing is the formula r = sqrt(indices / num_pts); how did I come to that one? (Note 2.)

Well, I am using the square root here because I want these to have even-area spacing around the disk. That is the same as saying that in the limit of large N I want a little region R ∈ (r, r + dr), Θ ∈ (θ, θ + dθ) to contain a number of points proportional to its area, which is r dr dθ. Now if we pretend that we are talking about a random variable here, this has a straightforward interpretation as saying that the joint probability density for (R, Θ) is just c r for some constant c. Normalization on the unit disk would then force c = 1/π.

Now let me introduce a trick. It comes from probability theory where it’s known as sampling the inverse CDF: suppose you wanted to generate a random variable with a probability density f(z) and you have a random variable U ~ Uniform(0, 1), just like comes out of random() in most programming languages. How do you do this?

  1. First, turn your density into a cumulative distribution function or CDF, which we will call F(z). A CDF, remember, increases monotonically from 0 to 1 with derivative f(z).
  2. Then calculate the CDF’s inverse function F⁻¹(z).
  3. You will find that Z = F⁻¹(U) is distributed according to the target density. (Note 3.)

Now the golden-ratio spiral trick spaces the points out in a nicely even pattern for θ, so let’s integrate that out; for the unit disk we are left with F(r) = r². So the inverse function is F⁻¹(u) = u^(1/2), and therefore we would generate random points on the disk in polar coordinates with r = sqrt(random()); theta = 2 * pi * random().

Now instead of randomly sampling this inverse function we’re uniformly sampling it, and the nice thing about uniform sampling is that our results about how points are spread out in the limit of large N will behave as if we had randomly sampled it. This combination is the trick. Instead of random() we use (arange(0, num_pts, dtype=float) + 0.5)/num_pts, so that, say, if we want to sample 10 points the uniform samples are u = 0.05, 0.15, 0.25, ..., 0.95, and r = sqrt(u). We sample r through the inverse CDF to get equal-area spacing, and we use the sunflower increment to avoid awful “bars” of points in the output.
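
For contrast with the deterministic version, a minimal sketch of the random inverse-CDF sampling just described:

import random
from math import sqrt, cos, sin, pi

def random_disk_point():
    # inverse-CDF sampling: F(r) = r**2 on the unit disk, so r = sqrt(u)
    r = sqrt(random.random())
    theta = 2 * pi * random.random()
    return r * cos(theta), r * sin(theta)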

Now doing the sunflower on a sphere

The changes that we need to make to dot the sphere with points merely involve switching out the polar coordinates for spherical coordinates. The radial coordinate of course doesn't enter into this because we're on a unit sphere. To keep things a little more consistent here, even though I was trained as a physicist I'll use mathematicians' coordinates where 0 ≤ φ ≤ π is latitude coming down from the pole and 0 ≤ θ ≤ 2π is longitude. So the difference from above is that we are basically replacing the variable r with φ.

Our area element, which was r dr dθ, now becomes the not-much-more-complicated sin(φ) dφ dθ. So our joint density for uniform spacing is sin(φ)/4π. Integrating out θ, we find f(φ) = sin(φ)/2, thus F(φ) = (1 − cos(φ))/2. Inverting this we can see that a uniform random variable would look like acos(1 - 2 u), but we sample uniformly instead of randomly, so we instead use φₖ = acos(1 − 2 (k + 0.5)/N). And the rest of the algorithm is just projecting this onto the x, y, and z coordinates:

from numpy import pi, cos, sin, arccos, arange
import mpl_toolkits.mplot3d
import matplotlib.pyplot as pp

num_pts = 1000
indices = arange(0, num_pts, dtype=float) + 0.5

phi = arccos(1 - 2*indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

x, y, z = cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi)

pp.figure().add_subplot(111, projection="3d").scatter(x, y, z)
pp.show()

Again, for n = 100 and n = 1000 the results look like:

[figure: points evenly dotted over the sphere for n = 100 and n = 1000]

Further research

I wanted to give a shout out to Martin Roberts’s blog. Note that above I created an offset of my indices by adding 0.5 to each index. This was just visually appealing to me, but it turns out that the choice of offset matters a lot, is not constant over the interval, and can mean getting as much as 8% better accuracy in packing if chosen correctly. There should also be a way to get his R2 sequence to cover a sphere, and it would be interesting to see if this also produced a nice even covering, perhaps as-is, but perhaps needing to be, say, taken from only a half of the unit square cut diagonally and stretched around to get a circle.

Notes

  1. Those “bars” are formed by rational approximations to a number, and the best rational approximations to a number come from its continued fraction expression, z + 1/(n_1 + 1/(n_2 + 1/(n_3 + ...))) where z is an integer and n_1, n_2, n_3, ... is either a finite or infinite sequence of positive integers:

    from math import floor

    def continued_fraction(r):
        while True:
            n = floor(r)
            yield n
            if r == n:  # r reached an exact integer, so the expansion terminates
                return
            r = 1/(r - n)
    

    Since the fraction part 1/(...) is always between zero and one, a large integer in the continued fraction allows for a particularly good rational approximation: “one divided by something between 100 and 101” is better than “one divided by something between 1 and 2.” The most irrational number is therefore the one which is 1 + 1/(1 + 1/(1 + ...)) and has no particularly good rational approximations; one can solve φ = 1 + 1/φ by multiplying through by φ to get the formula for the golden ratio.

  2. For folks who are not so familiar with NumPy -- all of the functions are “vectorized,” so that sqrt(array) is the same as what other languages might write map(sqrt, array). So this is a component-by-component sqrt application. The same also holds for division by a scalar or addition with scalars -- those apply to all components in parallel.

  3. The proof is simple once you know that this is the result. If you ask what's the probability that z < Z < z + dz, this is the same as asking what's the probability that z < F⁻¹(U) < z + dz. Apply F to all three expressions, noting that it is a monotonically increasing function, hence F(z) < U < F(z + dz). Expand the right-hand side to find F(z) + f(z) dz, and since U is uniform this probability is just f(z) dz, as promised.

Answer #7

Let"s visualize (you gonna remember always), enter image description here

In Pandas:

  1. axis=0 means along "indexes". It"s a row-wise operation.

Suppose we perform a concat() operation on dataframe1 and dataframe2. We take the first row of dataframe1 and place it into the new DataFrame, then take another row from dataframe1 and put it into the new DataFrame, repeating this process until we reach the bottom of dataframe1. Then, we do the same process for dataframe2.

Basically, stacking dataframe2 on top of dataframe1, or vice versa.

E.g. making a pile of books on a table or floor.

  1. axis=1 means along "columns". It"s a column-wise operation.

Suppose, to perform concat() operation on dataframe1 & dataframe2, we will take out the 1st complete column(a.k.a 1st series) of dataframe1 and place into new DF, then we take out the second column of dataframe1 and keep adjacent to it (sideways), we have to repeat this operation until all columns are finished. Then, we repeat the same process on dataframe2. Basically, stacking dataframe2 sideways.

E.g arranging books on a bookshelf.
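
A tiny sketch of both cases (the frames and column name are invented for illustration; requires pandas):

import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})

print(pd.concat([df1, df2], axis=0))  # 4 rows, 1 column: frames piled up
print(pd.concat([df1, df2], axis=1))  # 2 rows, 2 columns: frames side by side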

Moreover, arrays are a better representation for nested n-dimensional structures than matrices, so the picture below can help you visualize the important role the axis plays when you generalize to more than one dimension. You can print/write/draw/visualize any n-dimensional array, but writing or visualizing the same in a matrix representation (3-dim) on paper is impossible beyond 3 dimensions.

[figure: visualizing axes on an n-dimensional array]

Answer #8

You can make the observation that for a string to be considered repeating, its length must be divisible by the length of its repeated sequence. Given that, here is a solution that generates the divisors of the length from 1 to n / 2 inclusive, slices the leading substring of each divisor length, and tests whether repeating it reproduces the original string:

from math import sqrt, floor

def divquot(n):
    # yield (divisor, quotient) pairs for every divisor of n below n,
    # in ascending divisor order
    if n > 1:
        yield 1, n
    swapped = []
    for d in range(2, int(floor(sqrt(n))) + 1):
        q, r = divmod(n, d)
        if r == 0:
            yield d, q
            if d != q:  # don't yield the square root twice for perfect squares
                swapped.append((q, d))
    while swapped:
        yield swapped.pop()

def repeats(s):
    n = len(s)
    for d, q in divquot(n):
        sl = s[0:d]
        if sl * q == s:
            return sl
    return None
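
A quick usage sketch:

print(repeats("abcabcabc"))  # abc
print(repeats("aaaa"))       # a
print(repeats("abcd"))       # None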

EDIT: In Python 3, the / operator performs float division by default. To get the integer division behavior of Python 2, use the // operator instead. Thank you to @TigerhawkT3 for bringing this to my attention.

The // operator performs integer division in both Python 2 and Python 3, so I've updated the answer to support both versions. The test for whether all the substrings are equal is a single short-circuiting comparison, sl * q == s.

UPDATE: In response to a change in the original question, the code has now been updated to return the smallest repeating substring if it exists and None if it does not. @godlygeek has suggested using divmod to reduce the number of iterations on the divisors generator, and the code has been updated to match that as well. It now returns all positive divisors of n in ascending order, exclusive of n itself.

Further update for high performance: After multiple tests, I've come to the conclusion that simply testing for string equality has the best performance out of any slicing or iterator solution in Python. Thus, I've taken a leaf out of @TigerhawkT3's book and updated my solution. It's now over 6x as fast as before, noticeably faster than Tigerhawk's solution but slower than David's.

Answer #9

float → float: math.log2(x)

import math

log2 = math.log(x, 2.0)
log2 = math.log2(x)   # python 3.3 or later

float → int: math.frexp(x)

If all you need is the integer part of log base 2 of a floating point number, extracting the exponent is pretty efficient:

log2int_slow = int(math.floor(math.log(x, 2.0)))    # these give the
log2int_fast = math.frexp(x)[1] - 1                 # same result

  • Python frexp() calls the C function frexp() which just grabs and tweaks the exponent.

  • Python frexp() returns a tuple (mantissa, exponent). So [1] gets the exponent part.

  • For integral powers of 2 the exponent is one more than you might expect. For example 32 is stored as 0.5×2⁶. This explains the - 1 above. It also works for 1/32, which is stored as 0.5×2⁻⁴.

  • Floors toward negative infinity, so log₂31 computed this way is 4 not 5. log₂(1/17) is -5 not -4.


int → int: x.bit_length()

If both input and output are integers, this native integer method could be very efficient:

log2int_faster = x.bit_length() - 1

  • - 1 because 2ⁿ requires n+1 bits. Works for very large integers, e.g. 2**10000.

  • Floors toward negative infinity, so log₂31 computed this way is 4 not 5.

Answer #10

Most answers suggested round or format. round sometimes rounds up, and in my case I needed the value of my variable to be rounded down and not just displayed as such.

round(2.357, 2)  # -> 2.36

I found the answer here: How do I round a floating point number up to a certain decimal place?

import math
v = 2.357
print(math.ceil(v*100)/100)  # -> 2.36
print(math.floor(v*100)/100)  # -> 2.35

or:

from math import floor, ceil

def roundDown(n, d=8):
    factor = 10 ** d
    return floor(n * factor) / factor

def roundUp(n, d=8):
    factor = 10 ** d
    return ceil(n * factor) / factor
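
A quick check of both helpers:

print(roundDown(2.357, 2))  # 2.35
print(roundUp(2.357, 2))    # 2.36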
