Python math function | SQRT ()

Python Methods and Functions | sqrt

Function sqrt () — is a built-in function in the Python programming language that returns the square root of any number.

  Syntax:   math.sqrt (x)   Parameter:  x is any number such that x & gt; = 0  Returns:  It returns the square root of the number passed in the parameter. 


 0.0 2.0 1.8708286933869707 

Error: when x & lt; 0 , it fails due to a runtime error.

# Python3 demo program
# sqrt () method

# import math module

import math 

# print the square root of 0

print (math.sqrt ( 0 )) 

# print the square root of 4

print (math. sqrt ( 4 )) 

# print the square root of 3.5

print (math. sqrt ( 3.5 )) 

# Python3 program to show the error in
# sqrt () method

# import math module

import math 

# throw an error when x & lt; 0

print (math.sqrt ( - 1 )) 


 Traceback (most recent call last): File "/home/", line 8, in print (math.sqrt (-1)) ValueError: math domain error 

Practical application: given a number, check if it is simple or not. 
Approach: run a loop from 2 to sqrt (n) and check if any number in the range (2-sqrt (n)) n divides.

# Python program for practical use of the sqrt () function

# import math module

import math

# function to check if simple or not

def check (n):

if n = = 1 :

return False


# 1 to sqrt (n)

for x in range ( 2 , ( int ) (math.sqrt (n)) + 1 ):

if n % x = = 0 :

return False  

return True

# driver code

n = 23

if check (n):

print ( "prime"

else :

print ( " not prime " )

Exit :


Python math function | SQRT (): StackOverflow Questions

Which is faster in Python: x**.5 or math.sqrt(x)?

I"ve been wondering this for some time. As the title say, which is faster, the actual function or simply raising to the half power?


This is not a matter of premature optimization. This is simply a question of how the underlying code actually works. What is the theory of how Python code works?

I sent Guido van Rossum an email cause I really wanted to know the differences in these methods.

My email:

There are at least 3 ways to do a square root in Python: math.sqrt, the "**" operator and pow(x,.5). I"m just curious as to the differences in the implementation of each of these. When it comes to efficiency which is better?

His response:

pow and ** are equivalent; math.sqrt doesn"t work for complex numbers, and links to the C sqrt() function. As to which one is faster, I have no idea...

Answer #1

I"ve tested all suggested methods plus np.array(map(f, x)) with perfplot (a small project of mine).

Message #1: If you can use numpy"s native functions, do that.

If the function you"re trying to vectorize already is vectorized (like the x**2 example in the original post), using that is much faster than anything else (note the log scale):

enter image description here

If you actually need vectorization, it doesn"t really matter much which variant you use.

enter image description here

Code to reproduce the plots:

import numpy as np
import perfplot
import math

def f(x):
    # return math.sqrt(x)
    return np.sqrt(x)

vf = np.vectorize(f)

def array_for(x):
    return np.array([f(xi) for xi in x])

def array_map(x):
    return np.array(list(map(f, x)))

def fromiter(x):
    return np.fromiter((f(xi) for xi in x), x.dtype)

def vectorize(x):
    return np.vectorize(f)(x)

def vectorize_without_init(x):
    return vf(x)
    n_range=[2 ** k for k in range(20)],
    kernels=[f, array_for, array_map, fromiter,
             vectorize, vectorize_without_init],

Answer #2

This is kind of overkill but let"s give it a go. First lets use statsmodel to find out what the p-values should be

import pandas as pd
import numpy as np
from sklearn import datasets, linear_model
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
from scipy import stats

diabetes = datasets.load_diabetes()
X =
y =

X2 = sm.add_constant(X)
est = sm.OLS(y, X2)
est2 =

and we get

                         OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.518
Model:                            OLS   Adj. R-squared:                  0.507
Method:                 Least Squares   F-statistic:                     46.27
Date:                Wed, 08 Mar 2017   Prob (F-statistic):           3.83e-62
Time:                        10:08:24   Log-Likelihood:                -2386.0
No. Observations:                 442   AIC:                             4794.
Df Residuals:                     431   BIC:                             4839.
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
const        152.1335      2.576     59.061      0.000     147.071     157.196
x1           -10.0122     59.749     -0.168      0.867    -127.448     107.424
x2          -239.8191     61.222     -3.917      0.000    -360.151    -119.488
x3           519.8398     66.534      7.813      0.000     389.069     650.610
x4           324.3904     65.422      4.958      0.000     195.805     452.976
x5          -792.1842    416.684     -1.901      0.058   -1611.169      26.801
x6           476.7458    339.035      1.406      0.160    -189.621    1143.113
x7           101.0446    212.533      0.475      0.635    -316.685     518.774
x8           177.0642    161.476      1.097      0.273    -140.313     494.442
x9           751.2793    171.902      4.370      0.000     413.409    1089.150
x10           67.6254     65.984      1.025      0.306     -62.065     197.316
Omnibus:                        1.506   Durbin-Watson:                   2.029
Prob(Omnibus):                  0.471   Jarque-Bera (JB):                1.404
Skew:                           0.017   Prob(JB):                        0.496
Kurtosis:                       2.726   Cond. No.                         227.

Ok, let"s reproduce this. It is kind of overkill as we are almost reproducing a linear regression analysis using Matrix Algebra. But what the heck.

lm = LinearRegression(),y)
params = np.append(lm.intercept_,lm.coef_)
predictions = lm.predict(X)

newX = pd.DataFrame({"Constant":np.ones(len(X))}).join(pd.DataFrame(X))
MSE = (sum((y-predictions)**2))/(len(newX)-len(newX.columns))

# Note if you don"t want to use a DataFrame replace the two lines above with
# newX = np.append(np.ones((len(X),1)), X, axis=1)
# MSE = (sum((y-predictions)**2))/(len(newX)-len(newX[0]))

var_b = MSE*(np.linalg.inv(,newX)).diagonal())
sd_b = np.sqrt(var_b)
ts_b = params/ sd_b

p_values =[2*(1-stats.t.cdf(np.abs(i),(len(newX)-len(newX[0])))) for i in ts_b]

sd_b = np.round(sd_b,3)
ts_b = np.round(ts_b,3)
p_values = np.round(p_values,3)
params = np.round(params,4)

myDF3 = pd.DataFrame()
myDF3["Coefficients"],myDF3["Standard Errors"],myDF3["t values"],myDF3["Probabilities"] = [params,sd_b,ts_b,p_values]

And this gives us.

    Coefficients  Standard Errors  t values  Probabilities
0       152.1335            2.576    59.061         0.000
1       -10.0122           59.749    -0.168         0.867
2      -239.8191           61.222    -3.917         0.000
3       519.8398           66.534     7.813         0.000
4       324.3904           65.422     4.958         0.000
5      -792.1842          416.684    -1.901         0.058
6       476.7458          339.035     1.406         0.160
7       101.0446          212.533     0.475         0.635
8       177.0642          161.476     1.097         0.273
9       751.2793          171.902     4.370         0.000
10       67.6254           65.984     1.025         0.306

So we can reproduce the values from statsmodel.

Answer #3

What is RMSE? Also known as MSE, RMD, or RMS. What problem does it solve?

If you understand RMSE: (Root mean squared error), MSE: (Mean Squared Error) RMD (Root mean squared deviation) and RMS: (Root Mean Squared), then asking for a library to calculate this for you is unnecessary over-engineering. All these metrics are a single line of python code at most 2 inches long. The three metrics rmse, mse, rmd, and rms are at their core conceptually identical.

RMSE answers the question: "How similar, on average, are the numbers in list1 to list2?". The two lists must be the same size. I want to "wash out the noise between any two given elements, wash out the size of the data collected, and get a single number feel for change over time".

Intuition and ELI5 for RMSE:

Imagine you are learning to throw darts at a dart board. Every day you practice for one hour. You want to figure out if you are getting better or getting worse. So every day you make 10 throws and measure the distance between the bullseye and where your dart hit.

You make a list of those numbers list1. Use the root mean squared error between the distances at day 1 and a list2 containing all zeros. Do the same on the 2nd and nth days. What you will get is a single number that hopefully decreases over time. When your RMSE number is zero, you hit bullseyes every time. If the rmse number goes up, you are getting worse.

Example in calculating root mean squared error in python:

import numpy as np
d = [0.000, 0.166, 0.333]   #ideal target distances, these can be all zeros.
p = [0.000, 0.254, 0.998]   #your performance goes here

print("d is: " + str(["%.8f" % elem for elem in d]))
print("p is: " + str(["%.8f" % elem for elem in p]))

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

rmse_val = rmse(np.array(d), np.array(p))
print("rms error is: " + str(rmse_val))

Which prints:

d is: ["0.00000000", "0.16600000", "0.33300000"]
p is: ["0.00000000", "0.25400000", "0.99800000"]
rms error between lists d and p is: 0.387284994115

The mathematical notation:

root mean squared deviation explained

Glyph Legend: n is a whole positive integer representing the number of throws. i represents a whole positive integer counter that enumerates sum. d stands for the ideal distances, the list2 containing all zeros in above example. p stands for performance, the list1 in the above example. superscript 2 stands for numeric squared. di is the i"th index of d. pi is the i"th index of p.

The rmse done in small steps so it can be understood:

def rmse(predictions, targets):

    differences = predictions - targets                       #the DIFFERENCEs.

    differences_squared = differences ** 2                    #the SQUAREs of ^

    mean_of_differences_squared = differences_squared.mean()  #the MEAN of ^

    rmse_val = np.sqrt(mean_of_differences_squared)           #ROOT of ^

    return rmse_val                                           #get the ^

How does every step of RMSE work:

Subtracting one number from another gives you the distance between them.

8 - 5 = 3         #absolute distance between 8 and 5 is +3
-20 - 10 = -30    #absolute distance between -20 and 10 is +30

If you multiply any number times itself, the result is always positive because negative times negative is positive:

3*3     = 9   = positive
-30*-30 = 900 = positive

Add them all up, but wait, then an array with many elements would have a larger error than a small array, so average them by the number of elements.

But wait, we squared them all earlier to force them positive. Undo the damage with a square root!

That leaves you with a single number that represents, on average, the distance between every value of list1 to it"s corresponding element value of list2.

If the RMSE value goes down over time we are happy because variance is decreasing.

RMSE isn"t the most accurate line fitting strategy, total least squares is:

Root mean squared error measures the vertical distance between the point and the line, so if your data is shaped like a banana, flat near the bottom and steep near the top, then the RMSE will report greater distances to points high, but short distances to points low when in fact the distances are equivalent. This causes a skew where the line prefers to be closer to points high than low.

If this is a problem the total least squares method fixes this:

Gotchas that can break this RMSE function:

If there are nulls or infinity in either input list, then output rmse value is is going to not make sense. There are three strategies to deal with nulls / missing values / infinities in either list: Ignore that component, zero it out or add a best guess or a uniform random noise to all timesteps. Each remedy has its pros and cons depending on what your data means. In general ignoring any component with a missing value is preferred, but this biases the RMSE toward zero making you think performance has improved when it really hasn"t. Adding random noise on a best guess could be preferred if there are lots of missing values.

In order to guarantee relative correctness of the RMSE output, you must eliminate all nulls/infinites from the input.

RMSE has zero tolerance for outlier data points which don"t belong

Root mean squared error squares relies on all data being right and all are counted as equal. That means one stray point that"s way out in left field is going to totally ruin the whole calculation. To handle outlier data points and dismiss their tremendous influence after a certain threshold, see Robust estimators that build in a threshold for dismissal of outliers.

Answer #4

For anyone interested in computing multiple distances at once, I"ve done a little comparison using perfplot (a small project of mine).

The first advice is to organize your data such that the arrays have dimension (3, n) (and are C-contiguous obviously). If adding happens in the contiguous first dimension, things are faster, and it doesn"t matter too much if you use sqrt-sum with axis=0, linalg.norm with axis=0, or

a_min_b = a - b
numpy.sqrt(numpy.einsum("ij,ij->j", a_min_b, a_min_b))

which is, by a slight margin, the fastest variant. (That actually holds true for just one row as well.)

The variants where you sum up over the second axis, axis=1, are all substantially slower.

enter image description here

Code to reproduce the plot:

import numpy
import perfplot
from scipy.spatial import distance

def linalg_norm(data):
    a, b = data[0]
    return numpy.linalg.norm(a - b, axis=1)

def linalg_norm_T(data):
    a, b = data[1]
    return numpy.linalg.norm(a - b, axis=0)

def sqrt_sum(data):
    a, b = data[0]
    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=1))

def sqrt_sum_T(data):
    a, b = data[1]
    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=0))

def scipy_distance(data):
    a, b = data[0]
    return list(map(distance.euclidean, a, b))

def sqrt_einsum(data):
    a, b = data[0]
    a_min_b = a - b
    return numpy.sqrt(numpy.einsum("ij,ij->i", a_min_b, a_min_b))

def sqrt_einsum_T(data):
    a, b = data[1]
    a_min_b = a - b
    return numpy.sqrt(numpy.einsum("ij,ij->j", a_min_b, a_min_b))

def setup(n):
    a = numpy.random.rand(n, 3)
    b = numpy.random.rand(n, 3)
    out0 = numpy.array([a, b])
    out1 = numpy.array([a.T, b.T])
    return out0, out1
    n_range=[2 ** k for k in range(22)],
    xlabel="len(x), len(y)",

Answer #5

The golden spiral method

You said you couldn’t get the golden spiral method to work and that’s a shame because it’s really, really good. I would like to give you a complete understanding of it so that maybe you can understand how to keep this away from being “bunched up.”

So here’s a fast, non-random way to create a lattice that is approximately correct; as discussed above, no lattice will be perfect, but this may be good enough. It is compared to other methods e.g. at but it just has a nice and pretty look as well as a guarantee about even spacing in the limit.

Primer: sunflower spirals on the unit disk

To understand this algorithm, I first invite you to look at the 2D sunflower spiral algorithm. This is based on the fact that the most irrational number is the golden ratio (1 + sqrt(5))/2 and if one emits points by the approach “stand at the center, turn a golden ratio of whole turns, then emit another point in that direction,” one naturally constructs a spiral which, as you get to higher and higher numbers of points, nevertheless refuses to have well-defined ‘bars’ that the points line up on.(Note 1.)

The algorithm for even spacing on a disk is,

from numpy import pi, cos, sin, sqrt, arange
import matplotlib.pyplot as pp

num_pts = 100
indices = arange(0, num_pts, dtype=float) + 0.5

r = sqrt(indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

pp.scatter(r*cos(theta), r*sin(theta))

and it produces results that look like (n=100 and n=1000):

enter image description here

Spacing the points radially

The key strange thing is the formula r = sqrt(indices / num_pts); how did I come to that one? (Note 2.)

Well, I am using the square root here because I want these to have even-area spacing around the disk. That is the same as saying that in the limit of large N I want a little region R ∈ (r, r + dr), Θ ∈ (θ, θ + dθ) to contain a number of points proportional to its area, which is r dr dθ. Now if we pretend that we are talking about a random variable here, this has a straightforward interpretation as saying that the joint probability density for (R, Θ) is just c r for some constant c. Normalization on the unit disk would then force c = 1/π.

Now let me introduce a trick. It comes from probability theory where it’s known as sampling the inverse CDF: suppose you wanted to generate a random variable with a probability density f(z) and you have a random variable U ~ Uniform(0, 1), just like comes out of random() in most programming languages. How do you do this?

  1. First, turn your density into a cumulative distribution function or CDF, which we will call F(z). A CDF, remember, increases monotonically from 0 to 1 with derivative f(z).
  2. Then calculate the CDF’s inverse function F-1(z).
  3. You will find that Z = F-1(U) is distributed according to the target density. (Note 3).

Now the golden-ratio spiral trick spaces the points out in a nicely even pattern for θ so let’s integrate that out; for the unit disk we are left with F(r) = r2. So the inverse function is F-1(u) = u1/2, and therefore we would generate random points on the disk in polar coordinates with r = sqrt(random()); theta = 2 * pi * random().

Now instead of randomly sampling this inverse function we’re uniformly sampling it, and the nice thing about uniform sampling is that our results about how points are spread out in the limit of large N will behave as if we had randomly sampled it. This combination is the trick. Instead of random() we use (arange(0, num_pts, dtype=float) + 0.5)/num_pts, so that, say, if we want to sample 10 points they are r = 0.05, 0.15, 0.25, ... 0.95. We uniformly sample r to get equal-area spacing, and we use the sunflower increment to avoid awful “bars” of points in the output.

Now doing the sunflower on a sphere

The changes that we need to make to dot the sphere with points merely involve switching out the polar coordinates for spherical coordinates. The radial coordinate of course doesn"t enter into this because we"re on a unit sphere. To keep things a little more consistent here, even though I was trained as a physicist I"ll use mathematicians" coordinates where 0 ≤ φ ≤ π is latitude coming down from the pole and 0 ≤ θ ≤ 2π is longitude. So the difference from above is that we are basically replacing the variable r with φ.

Our area element, which was r dr dθ, now becomes the not-much-more-complicated sin(φ) dφ dθ. So our joint density for uniform spacing is sin(φ)/4π. Integrating out θ, we find f(φ) = sin(φ)/2, thus F(φ) = (1 − cos(φ))/2. Inverting this we can see that a uniform random variable would look like acos(1 - 2 u), but we sample uniformly instead of randomly, so we instead use φk = acos(1 − 2 (k + 0.5)/N). And the rest of the algorithm is just projecting this onto the x, y, and z coordinates:

from numpy import pi, cos, sin, arccos, arange
import mpl_toolkits.mplot3d
import matplotlib.pyplot as pp

num_pts = 1000
indices = arange(0, num_pts, dtype=float) + 0.5

phi = arccos(1 - 2*indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

x, y, z = cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi);

pp.figure().add_subplot(111, projection="3d").scatter(x, y, z);

Again for n=100 and n=1000 the results look like: enter image description here enter image description here

Further research

I wanted to give a shout out to Martin Roberts’s blog. Note that above I created an offset of my indices by adding 0.5 to each index. This was just visually appealing to me, but it turns out that the choice of offset matters a lot and is not constant over the interval and can mean getting as much as 8% better accuracy in packing if chosen correctly. There should also be a way to get his R2 sequence to cover a sphere and it would be interesting to see if this also produced a nice even covering, perhaps as-is but perhaps needing to be, say, taken from only a half of the unit square cut diagonally or so and stretched around to get a circle.


  1. Those “bars” are formed by rational approximations to a number, and the best rational approximations to a number come from its continued fraction expression, z + 1/(n_1 + 1/(n_2 + 1/(n_3 + ...))) where z is an integer and n_1, n_2, n_3, ... is either a finite or infinite sequence of positive integers:

    def continued_fraction(r):
        while r != 0:
            n = floor(r)
            yield n
            r = 1/(r - n)

    Since the fraction part 1/(...) is always between zero and one, a large integer in the continued fraction allows for a particularly good rational approximation: “one divided by something between 100 and 101” is better than “one divided by something between 1 and 2.” The most irrational number is therefore the one which is 1 + 1/(1 + 1/(1 + ...)) and has no particularly good rational approximations; one can solve φ = 1 + 1/φ by multiplying through by φ to get the formula for the golden ratio.

  2. For folks who are not so familiar with NumPy -- all of the functions are “vectorized,” so that sqrt(array) is the same as what other languages might write map(sqrt, array). So this is a component-by-component sqrt application. The same also holds for division by a scalar or addition with scalars -- those apply to all components in parallel.

  3. The proof is simple once you know that this is the result. If you ask what"s the probability that z < Z < z + dz, this is the same as asking what"s the probability that z < F-1(U) < z + dz, apply F to all three expressions noting that it is a monotonically increasing function, hence F(z) < U < F(z + dz), expand the right hand side out to find F(z) + f(z) dz, and since U is uniform this probability is just f(z) dz as promised.

Answer #6

For people (like me) coming here via search engine and just looking for a solution which works out of the box, I recommend installing mpu. Install it via pip install mpu --user and use it like this to get the haversine distance:

import mpu

# Point one
lat1 = 52.2296756
lon1 = 21.0122287

# Point two
lat2 = 52.406374
lon2 = 16.9251681

# What you were looking for
dist = mpu.haversine_distance((lat1, lon1), (lat2, lon2))
print(dist)  # gives 278.45817507541943.

An alternative package is gpxpy.

If you don"t want dependencies, you can use:

import math

def distance(origin, destination):
    Calculate the Haversine distance.

    origin : tuple of float
        (lat, long)
    destination : tuple of float
        (lat, long)

    distance_in_km : float

    >>> origin = (48.1372, 11.5756)  # Munich
    >>> destination = (52.5186, 13.4083)  # Berlin
    >>> round(distance(origin, destination), 1)
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 6371  # km

    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) * math.sin(dlat / 2) +
         math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) *
         math.sin(dlon / 2) * math.sin(dlon / 2))
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    d = radius * c

    return d

if __name__ == "__main__":
    import doctest

The other alternative package is haversine

from haversine import haversine, Unit

lyon = (45.7597, 4.8422) # (lat, lon)
paris = (48.8567, 2.3508)

haversine(lyon, paris)
>> 392.2172595594006  # in kilometers

haversine(lyon, paris, unit=Unit.MILES)
>> 243.71201856934454  # in miles

# you can also use the string abbreviation for units:
haversine(lyon, paris, unit="mi")
>> 243.71201856934454  # in miles

haversine(lyon, paris, unit=Unit.NAUTICAL_MILES)
>> 211.78037755311516  # in nautical miles

They claim to have performance optimization for distances between all points in two vectors

from haversine import haversine_vector, Unit

lyon = (45.7597, 4.8422) # (lat, lon)
paris = (48.8567, 2.3508)
new_york = (40.7033962, -74.2351462)

haversine_vector([lyon, lyon], [paris, new_york], Unit.KILOMETERS)

>> array([ 392.21725956, 6163.43638211])

Answer #7

You can make the observation that for a string to be considered repeating, its length must be divisible by the length of its repeated sequence. Given that, here is a solution that generates divisors of the length from 1 to n / 2 inclusive, divides the original string into substrings with the length of the divisors, and tests the equality of the result set:

from math import sqrt, floor

def divquot(n):
    if n > 1:
        yield 1, n
    swapped = []
    for d in range(2, int(floor(sqrt(n))) + 1):
        q, r = divmod(n, d)
        if r == 0:
            yield d, q
            swapped.append((q, d))
    while swapped:
        yield swapped.pop()

def repeats(s):
    n = len(s)
    for d, q in divquot(n):
        sl = s[0:d]
        if sl * q == s:
            return sl
    return None

EDIT: In Python 3, the / operator has changed to do float division by default. To get the int division from Python 2, you can use the // operator instead. Thank you to @TigerhawkT3 for bringing this to my attention.

The // operator performs integer division in both Python 2 and Python 3, so I"ve updated the answer to support both versions. The part where we test to see if all the substrings are equal is now a short-circuiting operation using all and a generator expression.

UPDATE: In response to a change in the original question, the code has now been updated to return the smallest repeating substring if it exists and None if it does not. @godlygeek has suggested using divmod to reduce the number of iterations on the divisors generator, and the code has been updated to match that as well. It now returns all positive divisors of n in ascending order, exclusive of n itself.

Further update for high performance: After multiple tests, I"ve come to the conclusion that simply testing for string equality has the best performance out of any slicing or iterator solution in Python. Thus, I"ve taken a leaf out of @TigerhawkT3 "s book and updated my solution. It"s now over 6x as fast as before, noticably faster than Tigerhawk"s solution but slower than David"s.

Answer #8

For those trying to make the connection between SNR and a normal random variable generated by numpy:

[1] SNR ratio, where it"s important to keep in mind that P is average power.

Or in dB:
[2] SNR dB2

In this case, we already have a signal and we want to generate noise to give us a desired SNR.

While noise can come in different flavors depending on what you are modeling, a good start (especially for this radio telescope example) is Additive White Gaussian Noise (AWGN). As stated in the previous answers, to model AWGN you need to add a zero-mean gaussian random variable to your original signal. The variance of that random variable will affect the average noise power.

For a Gaussian random variable X, the average power Ep, also known as the second moment, is
[3] Ex

So for white noise, Ex and the average power is then equal to the variance Ex.

When modeling this in python, you can either
1. Calculate variance based on a desired SNR and a set of existing measurements, which would work if you expect your measurements to have fairly consistent amplitude values.
2. Alternatively, you could set noise power to a known level to match something like receiver noise. Receiver noise could be measured by pointing the telescope into free space and calculating average power.

Either way, it"s important to make sure that you add noise to your signal and take averages in the linear space and not in dB units.

Here"s some code to generate a signal and plot voltage, power in Watts, and power in dB:

# Signal Generation
# matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(1, 100, 1000)
x_volts = 10*np.sin(t/(2*np.pi))
plt.plot(t, x_volts)
plt.ylabel("Voltage (V)")
plt.xlabel("Time (s)")

x_watts = x_volts ** 2
plt.plot(t, x_watts)
plt.title("Signal Power")
plt.ylabel("Power (W)")
plt.xlabel("Time (s)")

x_db = 10 * np.log10(x_watts)
plt.plot(t, x_db)
plt.title("Signal Power in dB")
plt.ylabel("Power (dB)")
plt.xlabel("Time (s)")

Generated Signal

Here"s an example for adding AWGN based on a desired SNR:

# Adding noise using target SNR

# Set a target SNR
target_snr_db = 20
# Calculate signal power and convert to dB 
sig_avg_watts = np.mean(x_watts)
sig_avg_db = 10 * np.log10(sig_avg_watts)
# Calculate noise according to [2] then convert to watts
noise_avg_db = sig_avg_db - target_snr_db
noise_avg_watts = 10 ** (noise_avg_db / 10)
# Generate an sample of white noise
mean_noise = 0
noise_volts = np.random.normal(mean_noise, np.sqrt(noise_avg_watts), len(x_watts))
# Noise up the original signal
y_volts = x_volts + noise_volts

# Plot signal with noise
plt.plot(t, y_volts)
plt.title("Signal with noise")
plt.ylabel("Voltage (V)")
plt.xlabel("Time (s)")
# Plot in dB
y_watts = y_volts ** 2
y_db = 10 * np.log10(y_watts)
plt.plot(t, 10* np.log10(y_volts**2))
plt.title("Signal with noise (dB)")
plt.ylabel("Power (dB)")
plt.xlabel("Time (s)")

Signal with target SNR

And here"s an example for adding AWGN based on a known noise power:

# Adding noise using a target noise power

# Set a target channel noise power to something very noisy
target_noise_db = 10

# Convert to linear Watt units
target_noise_watts = 10 ** (target_noise_db / 10)

# Generate noise samples
mean_noise = 0
noise_volts = np.random.normal(mean_noise, np.sqrt(target_noise_watts), len(x_watts))

# Noise up the original signal (again) and plot
y_volts = x_volts + noise_volts

# Plot signal with noise
plt.plot(t, y_volts)
plt.title("Signal with noise")
plt.ylabel("Voltage (V)")
plt.xlabel("Time (s)")
# Plot in dB
y_watts = y_volts ** 2
y_db = 10 * np.log10(y_watts)
plt.plot(t, 10* np.log10(y_volts**2))
plt.title("Signal with noise")
plt.ylabel("Power (dB)")
plt.xlabel("Time (s)")

Signal with target noise level

Answer #9

We compare different mode of weight-initialization using the same neural-network(NN) architecture.

All Zeros or Ones

If you follow the principle of Occam"s razor, you might think setting all the weights to 0 or 1 would be the best solution. This is not the case.

With every weight the same, all the neurons at each layer are producing the same output. This makes it hard to decide which weights to adjust.

    # initialize two NN"s with 0 and 1 constant weights
    model_0 = Net(constant_weight=0)
    model_1 = Net(constant_weight=1)
  • After 2 epochs:

plot of training loss with weight initialization to constant

Validation Accuracy
9.625% -- All Zeros
10.050% -- All Ones
Training Loss
2.304  -- All Zeros
1552.281  -- All Ones

Uniform Initialization

A uniform distribution has the equal probability of picking any number from a set of numbers.

Let"s see how well the neural network trains using a uniform weight initialization, where low=0.0 and high=1.0.

Below, we"ll see another way (besides in the Net class code) to initialize the weights of a network. To define weights outside of the model definition, we can:

  1. Define a function that assigns weights by the type of network layer, then
  2. Apply those weights to an initialized model using model.apply(fn), which applies a function to each model layer.
    # takes in a module and applies the specified weight initialization
    def weights_init_uniform(m):
        classname = m.__class__.__name__
        # for every Linear layer in a model..
        if classname.find("Linear") != -1:
            # apply a uniform distribution to the weights and a bias=0
  , 1.0)

    model_uniform = Net()
  • After 2 epochs:

enter image description here

Validation Accuracy
36.667% -- Uniform Weights
Training Loss
3.208  -- Uniform Weights

General rule for setting weights

The general rule for setting the weights in a neural network is to set them to be close to zero without being too small.

Good practice is to start your weights in the range of [-y, y] where y=1/sqrt(n)
(n is the number of inputs to a given neuron).

    # takes in a module and applies the specified weight initialization
    def weights_init_uniform_rule(m):
        classname = m.__class__.__name__
        # for every Linear layer in a model..
        if classname.find("Linear") != -1:
            # get the number of the inputs
            n = m.in_features
            y = 1.0/np.sqrt(n)
  , y)

    # create a new model with these weights
    model_rule = Net()

below we compare performance of NN, weights initialized with uniform distribution [-0.5,0.5) versus the one whose weight is initialized using general rule

  • After 2 epochs:

plot showing performance of uniform initialization of weight versus general rule of initialization

Validation Accuracy
75.817% -- Centered Weights [-0.5, 0.5)
85.208% -- General Rule [-y, y)
Training Loss
0.705  -- Centered Weights [-0.5, 0.5)
0.469  -- General Rule [-y, y)

normal distribution to initialize the weights

The normal distribution should have a mean of 0 and a standard deviation of y=1/sqrt(n), where n is the number of inputs to NN

    ## takes in a module and applies the specified weight initialization
    def weights_init_normal(m):
        """Takes in a module and initializes all linear layers with weight
           values taken from a normal distribution."""

        classname = m.__class__.__name__
        # for every Linear layer in a model
        if classname.find("Linear") != -1:
            y = m.in_features
        # shoud be taken from a normal distribution
        # should be 0

below we show the performance of two NN one initialized using uniform-distribution and the other using normal-distribution

  • After 2 epochs:

performance of weight initialization using uniform-distribution versus the normal distribution

Validation Accuracy
85.775% -- Uniform Rule [-y, y)
84.717% -- Normal Distribution
Training Loss
0.329  -- Uniform Rule [-y, y)
0.443  -- Normal Distribution

Answer #10

scikit-learn"s LinearRegression doesn"t calculate this information but you can easily extend the class to do it:

from sklearn import linear_model
from scipy import stats
import numpy as np

class LinearRegression(linear_model.LinearRegression):
    LinearRegression class after sklearn"s, but calculate t-statistics
    and p-values for model coefficients (betas).
    Additional attributes available after .fit()
    are `t` and `p` which are of the shape (y.shape[1], X.shape[1])
    which is (n_features, n_coefs)
    This class sets the intercept to 0 by default, since usually we include it
    in X.

    def __init__(self, *args, **kwargs):
        if not "fit_intercept" in kwargs:
            kwargs["fit_intercept"] = False
        super(LinearRegression, self)
                .__init__(*args, **kwargs)

    def fit(self, X, y, n_jobs=1):
        self = super(LinearRegression, self).fit(X, y, n_jobs)

        sse = np.sum((self.predict(X) - y) ** 2, axis=0) / float(X.shape[0] - X.shape[1])
        se = np.array([
            np.sqrt(np.diagonal(sse[i] * np.linalg.inv(, X))))
                                                    for i in range(sse.shape[0])

        self.t = self.coef_ / se
        self.p = 2 * (1 - stats.t.cdf(np.abs(self.t), y.shape[0] - X.shape[1]))
        return self

Stolen from here.

You should take a look at statsmodels for this kind of statistical analysis in Python.