Function sqrt () — is a builtin function in the Python programming language that returns the square root of any number.
Syntax: math.sqrt (x) Parameter: x is any number such that x & gt; = 0 Returns: It returns the square root of the number passed in the parameter.
Output:
0.0 2.0 1.8708286933869707
Error: when x & lt; 0 , it fails due to a runtime error.
# Python3 demo program
# sqrt () method
# import math module
import
math
# print the square root of 0
print
(math.sqrt (
0
))
# print the square root of 4
print
(math. sqrt (
4
))
# print the square root of 3.5
print
(math. sqrt (
3.5
))
# Python3 program to show the error in
# sqrt () method
# import math module
import
math
# throw an error when x & lt; 0
print
(math.sqrt (

1
))
Exit:
Traceback (most recent call last): File "/home/67438f8df14f0e41df1b55c6c21499ef.py", line 8, in print (math.sqrt (1)) ValueError: math domain error
Practical application: given a number, check if it is simple or not.
Approach: run a loop from 2 to sqrt (n) and check if any number in the range (2sqrt (n)) n divides.

Exit :
prime
I"ve been wondering this for some time. As the title say, which is faster, the actual function or simply raising to the half power?
UPDATE
This is not a matter of premature optimization. This is simply a question of how the underlying code actually works. What is the theory of how Python code works?
I sent Guido van Rossum an email cause I really wanted to know the differences in these methods.
There are at least 3 ways to do a square root in Python: math.sqrt, the "**" operator and pow(x,.5). I"m just curious as to the differences in the implementation of each of these. When it comes to efficiency which is better?
pow and ** are equivalent; math.sqrt doesn"t work for complex numbers, and links to the C sqrt() function. As to which one is faster, I have no idea...
I"ve tested all suggested methods plus np.array(map(f, x))
with perfplot
(a small project of mine).
Message #1: If you can use numpy"s native functions, do that.
If the function you"re trying to vectorize already is vectorized (like the x**2
example in the original post), using that is much faster than anything else (note the log scale):
If you actually need vectorization, it doesn"t really matter much which variant you use.
Code to reproduce the plots:
import numpy as np
import perfplot
import math
def f(x):
# return math.sqrt(x)
return np.sqrt(x)
vf = np.vectorize(f)
def array_for(x):
return np.array([f(xi) for xi in x])
def array_map(x):
return np.array(list(map(f, x)))
def fromiter(x):
return np.fromiter((f(xi) for xi in x), x.dtype)
def vectorize(x):
return np.vectorize(f)(x)
def vectorize_without_init(x):
return vf(x)
perfplot.show(
setup=np.random.rand,
n_range=[2 ** k for k in range(20)],
kernels=[f, array_for, array_map, fromiter,
vectorize, vectorize_without_init],
xlabel="len(x)",
)
This is kind of overkill but let"s give it a go. First lets use statsmodel to find out what the pvalues should be
import pandas as pd
import numpy as np
from sklearn import datasets, linear_model
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
from scipy import stats
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target
X2 = sm.add_constant(X)
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())
and we get
OLS Regression Results
==============================================================================
Dep. Variable: y Rsquared: 0.518
Model: OLS Adj. Rsquared: 0.507
Method: Least Squares Fstatistic: 46.27
Date: Wed, 08 Mar 2017 Prob (Fstatistic): 3.83e62
Time: 10:08:24 LogLikelihood: 2386.0
No. Observations: 442 AIC: 4794.
Df Residuals: 431 BIC: 4839.
Df Model: 10
Covariance Type: nonrobust
==============================================================================
coef std err t P>t [0.025 0.975]

const 152.1335 2.576 59.061 0.000 147.071 157.196
x1 10.0122 59.749 0.168 0.867 127.448 107.424
x2 239.8191 61.222 3.917 0.000 360.151 119.488
x3 519.8398 66.534 7.813 0.000 389.069 650.610
x4 324.3904 65.422 4.958 0.000 195.805 452.976
x5 792.1842 416.684 1.901 0.058 1611.169 26.801
x6 476.7458 339.035 1.406 0.160 189.621 1143.113
x7 101.0446 212.533 0.475 0.635 316.685 518.774
x8 177.0642 161.476 1.097 0.273 140.313 494.442
x9 751.2793 171.902 4.370 0.000 413.409 1089.150
x10 67.6254 65.984 1.025 0.306 62.065 197.316
==============================================================================
Omnibus: 1.506 DurbinWatson: 2.029
Prob(Omnibus): 0.471 JarqueBera (JB): 1.404
Skew: 0.017 Prob(JB): 0.496
Kurtosis: 2.726 Cond. No. 227.
==============================================================================
Ok, let"s reproduce this. It is kind of overkill as we are almost reproducing a linear regression analysis using Matrix Algebra. But what the heck.
lm = LinearRegression()
lm.fit(X,y)
params = np.append(lm.intercept_,lm.coef_)
predictions = lm.predict(X)
newX = pd.DataFrame({"Constant":np.ones(len(X))}).join(pd.DataFrame(X))
MSE = (sum((ypredictions)**2))/(len(newX)len(newX.columns))
# Note if you don"t want to use a DataFrame replace the two lines above with
# newX = np.append(np.ones((len(X),1)), X, axis=1)
# MSE = (sum((ypredictions)**2))/(len(newX)len(newX[0]))
var_b = MSE*(np.linalg.inv(np.dot(newX.T,newX)).diagonal())
sd_b = np.sqrt(var_b)
ts_b = params/ sd_b
p_values =[2*(1stats.t.cdf(np.abs(i),(len(newX)len(newX[0])))) for i in ts_b]
sd_b = np.round(sd_b,3)
ts_b = np.round(ts_b,3)
p_values = np.round(p_values,3)
params = np.round(params,4)
myDF3 = pd.DataFrame()
myDF3["Coefficients"],myDF3["Standard Errors"],myDF3["t values"],myDF3["Probabilities"] = [params,sd_b,ts_b,p_values]
print(myDF3)
And this gives us.
Coefficients Standard Errors t values Probabilities
0 152.1335 2.576 59.061 0.000
1 10.0122 59.749 0.168 0.867
2 239.8191 61.222 3.917 0.000
3 519.8398 66.534 7.813 0.000
4 324.3904 65.422 4.958 0.000
5 792.1842 416.684 1.901 0.058
6 476.7458 339.035 1.406 0.160
7 101.0446 212.533 0.475 0.635
8 177.0642 161.476 1.097 0.273
9 751.2793 171.902 4.370 0.000
10 67.6254 65.984 1.025 0.306
So we can reproduce the values from statsmodel.
If you understand RMSE: (Root mean squared error), MSE: (Mean Squared Error) RMD (Root mean squared deviation) and RMS: (Root Mean Squared), then asking for a library to calculate this for you is unnecessary overengineering. All these metrics are a single line of python code at most 2 inches long. The three metrics rmse, mse, rmd, and rms are at their core conceptually identical.
RMSE answers the question: "How similar, on average, are the numbers in list1
to list2
?". The two lists must be the same size. I want to "wash out the noise between any two given elements, wash out the size of the data collected, and get a single number feel for change over time".
Imagine you are learning to throw darts at a dart board. Every day you practice for one hour. You want to figure out if you are getting better or getting worse. So every day you make 10 throws and measure the distance between the bullseye and where your dart hit.
You make a list of those numbers list1
. Use the root mean squared error between the distances at day 1 and a list2
containing all zeros. Do the same on the 2nd and nth days. What you will get is a single number that hopefully decreases over time. When your RMSE number is zero, you hit bullseyes every time. If the rmse number goes up, you are getting worse.
import numpy as np
d = [0.000, 0.166, 0.333] #ideal target distances, these can be all zeros.
p = [0.000, 0.254, 0.998] #your performance goes here
print("d is: " + str(["%.8f" % elem for elem in d]))
print("p is: " + str(["%.8f" % elem for elem in p]))
def rmse(predictions, targets):
return np.sqrt(((predictions  targets) ** 2).mean())
rmse_val = rmse(np.array(d), np.array(p))
print("rms error is: " + str(rmse_val))
Which prints:
d is: ["0.00000000", "0.16600000", "0.33300000"]
p is: ["0.00000000", "0.25400000", "0.99800000"]
rms error between lists d and p is: 0.387284994115
Glyph Legend: n
is a whole positive integer representing the number of throws. i
represents a whole positive integer counter that enumerates sum. d
stands for the ideal distances, the list2
containing all zeros in above example. p
stands for performance, the list1
in the above example. superscript 2 stands for numeric squared. d_{i} is the i"th index of d
. p_{i} is the i"th index of p
.
The rmse done in small steps so it can be understood:
def rmse(predictions, targets):
differences = predictions  targets #the DIFFERENCEs.
differences_squared = differences ** 2 #the SQUAREs of ^
mean_of_differences_squared = differences_squared.mean() #the MEAN of ^
rmse_val = np.sqrt(mean_of_differences_squared) #ROOT of ^
return rmse_val #get the ^
Subtracting one number from another gives you the distance between them.
8  5 = 3 #absolute distance between 8 and 5 is +3
20  10 = 30 #absolute distance between 20 and 10 is +30
If you multiply any number times itself, the result is always positive because negative times negative is positive:
3*3 = 9 = positive
30*30 = 900 = positive
Add them all up, but wait, then an array with many elements would have a larger error than a small array, so average them by the number of elements.
But wait, we squared them all earlier to force them positive. Undo the damage with a square root!
That leaves you with a single number that represents, on average, the distance between every value of list1 to it"s corresponding element value of list2.
If the RMSE value goes down over time we are happy because variance is decreasing.
Root mean squared error measures the vertical distance between the point and the line, so if your data is shaped like a banana, flat near the bottom and steep near the top, then the RMSE will report greater distances to points high, but short distances to points low when in fact the distances are equivalent. This causes a skew where the line prefers to be closer to points high than low.
If this is a problem the total least squares method fixes this: https://mubaris.com/posts/linearregression
If there are nulls or infinity in either input list, then output rmse value is is going to not make sense. There are three strategies to deal with nulls / missing values / infinities in either list: Ignore that component, zero it out or add a best guess or a uniform random noise to all timesteps. Each remedy has its pros and cons depending on what your data means. In general ignoring any component with a missing value is preferred, but this biases the RMSE toward zero making you think performance has improved when it really hasn"t. Adding random noise on a best guess could be preferred if there are lots of missing values.
In order to guarantee relative correctness of the RMSE output, you must eliminate all nulls/infinites from the input.
Root mean squared error squares relies on all data being right and all are counted as equal. That means one stray point that"s way out in left field is going to totally ruin the whole calculation. To handle outlier data points and dismiss their tremendous influence after a certain threshold, see Robust estimators that build in a threshold for dismissal of outliers.
For anyone interested in computing multiple distances at once, I"ve done a little comparison using perfplot (a small project of mine).
The first advice is to organize your data such that the arrays have dimension (3, n)
(and are Ccontiguous obviously). If adding happens in the contiguous first dimension, things are faster, and it doesn"t matter too much if you use sqrtsum
with axis=0
, linalg.norm
with axis=0
, or
a_min_b = a  b
numpy.sqrt(numpy.einsum("ij,ij>j", a_min_b, a_min_b))
which is, by a slight margin, the fastest variant. (That actually holds true for just one row as well.)
The variants where you sum up over the second axis, axis=1
, are all substantially slower.
Code to reproduce the plot:
import numpy
import perfplot
from scipy.spatial import distance
def linalg_norm(data):
a, b = data[0]
return numpy.linalg.norm(a  b, axis=1)
def linalg_norm_T(data):
a, b = data[1]
return numpy.linalg.norm(a  b, axis=0)
def sqrt_sum(data):
a, b = data[0]
return numpy.sqrt(numpy.sum((a  b) ** 2, axis=1))
def sqrt_sum_T(data):
a, b = data[1]
return numpy.sqrt(numpy.sum((a  b) ** 2, axis=0))
def scipy_distance(data):
a, b = data[0]
return list(map(distance.euclidean, a, b))
def sqrt_einsum(data):
a, b = data[0]
a_min_b = a  b
return numpy.sqrt(numpy.einsum("ij,ij>i", a_min_b, a_min_b))
def sqrt_einsum_T(data):
a, b = data[1]
a_min_b = a  b
return numpy.sqrt(numpy.einsum("ij,ij>j", a_min_b, a_min_b))
def setup(n):
a = numpy.random.rand(n, 3)
b = numpy.random.rand(n, 3)
out0 = numpy.array([a, b])
out1 = numpy.array([a.T, b.T])
return out0, out1
perfplot.save(
"norm.png",
setup=setup,
n_range=[2 ** k for k in range(22)],
kernels=[
linalg_norm,
linalg_norm_T,
scipy_distance,
sqrt_sum,
sqrt_sum_T,
sqrt_einsum,
sqrt_einsum_T,
],
xlabel="len(x), len(y)",
)
You said you couldn‚Äôt get the golden spiral method to work and that‚Äôs a shame because it‚Äôs really, really good. I would like to give you a complete understanding of it so that maybe you can understand how to keep this away from being ‚Äúbunched up.‚Äù
So here‚Äôs a fast, nonrandom way to create a lattice that is approximately correct; as discussed above, no lattice will be perfect, but this may be good enough. It is compared to other methods e.g. at BendWavy.org but it just has a nice and pretty look as well as a guarantee about even spacing in the limit.
To understand this algorithm, I first invite you to look at the 2D sunflower spiral algorithm. This is based on the fact that the most irrational number is the golden ratio (1 + sqrt(5))/2
and if one emits points by the approach ‚Äústand at the center, turn a golden ratio of whole turns, then emit another point in that direction,‚Äù one naturally constructs a spiral which, as you get to higher and higher numbers of points, nevertheless refuses to have welldefined ‚Äòbars‚Äô that the points line up on.^{(Note 1.)}
The algorithm for even spacing on a disk is,
from numpy import pi, cos, sin, sqrt, arange
import matplotlib.pyplot as pp
num_pts = 100
indices = arange(0, num_pts, dtype=float) + 0.5
r = sqrt(indices/num_pts)
theta = pi * (1 + 5**0.5) * indices
pp.scatter(r*cos(theta), r*sin(theta))
pp.show()
and it produces results that look like (n=100 and n=1000):
The key strange thing is the formula r = sqrt(indices / num_pts)
; how did I come to that one? ^{(Note 2.)}
Well, I am using the square root here because I want these to have evenarea spacing around the disk. That is the same as saying that in the limit of large N I want a little region R ‚àà (r, r + dr), Œò ‚àà (Œ∏, Œ∏ + dŒ∏) to contain a number of points proportional to its area, which is r dr dŒ∏. Now if we pretend that we are talking about a random variable here, this has a straightforward interpretation as saying that the joint probability density for (R, Œò) is just c r for some constant c. Normalization on the unit disk would then force c = 1/œÄ.
Now let me introduce a trick. It comes from probability theory where it‚Äôs known as sampling the inverse CDF: suppose you wanted to generate a random variable with a probability density f(z) and you have a random variable U ~ Uniform(0, 1), just like comes out of random()
in most programming languages. How do you do this?
Now the goldenratio spiral trick spaces the points out in a nicely even pattern for Œ∏ so let‚Äôs integrate that out; for the unit disk we are left with F(r) = r^{2}. So the inverse function is F^{1}(u) = u^{1/2}, and therefore we would generate random points on the disk in polar coordinates with r = sqrt(random()); theta = 2 * pi * random()
.
Now instead of randomly sampling this inverse function we‚Äôre uniformly sampling it, and the nice thing about uniform sampling is that our results about how points are spread out in the limit of large N will behave as if we had randomly sampled it. This combination is the trick. Instead of random()
we use (arange(0, num_pts, dtype=float) + 0.5)/num_pts
, so that, say, if we want to sample 10 points they are r = 0.05, 0.15, 0.25, ... 0.95
. We uniformly sample r to get equalarea spacing, and we use the sunflower increment to avoid awful ‚Äúbars‚Äù of points in the output.
The changes that we need to make to dot the sphere with points merely involve switching out the polar coordinates for spherical coordinates. The radial coordinate of course doesn"t enter into this because we"re on a unit sphere. To keep things a little more consistent here, even though I was trained as a physicist I"ll use mathematicians" coordinates where 0 ‚â§ œÜ ‚â§ œÄ is latitude coming down from the pole and 0 ‚â§ Œ∏ ‚â§ 2œÄ is longitude. So the difference from above is that we are basically replacing the variable r with œÜ.
Our area element, which was r dr dŒ∏, now becomes the notmuchmorecomplicated sin(œÜ) dœÜ dŒ∏. So our joint density for uniform spacing is sin(œÜ)/4œÄ. Integrating out Œ∏, we find f(œÜ) = sin(œÜ)/2, thus F(œÜ) = (1 ‚àí cos(œÜ))/2. Inverting this we can see that a uniform random variable would look like acos(1  2 u), but we sample uniformly instead of randomly, so we instead use œÜ_{k} = acos(1 ‚àí 2 (k + 0.5)/N). And the rest of the algorithm is just projecting this onto the x, y, and z coordinates:
from numpy import pi, cos, sin, arccos, arange
import mpl_toolkits.mplot3d
import matplotlib.pyplot as pp
num_pts = 1000
indices = arange(0, num_pts, dtype=float) + 0.5
phi = arccos(1  2*indices/num_pts)
theta = pi * (1 + 5**0.5) * indices
x, y, z = cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi);
pp.figure().add_subplot(111, projection="3d").scatter(x, y, z);
pp.show()
Again for n=100 and n=1000 the results look like:
I wanted to give a shout out to Martin Roberts‚Äôs blog. Note that above I created an offset of my indices by adding 0.5 to each index. This was just visually appealing to me, but it turns out that the choice of offset matters a lot and is not constant over the interval and can mean getting as much as 8% better accuracy in packing if chosen correctly. There should also be a way to get his R_{2} sequence to cover a sphere and it would be interesting to see if this also produced a nice even covering, perhaps asis but perhaps needing to be, say, taken from only a half of the unit square cut diagonally or so and stretched around to get a circle.
Those ‚Äúbars‚Äù are formed by rational approximations to a number, and the best rational approximations to a number come from its continued fraction expression, z + 1/(n_1 + 1/(n_2 + 1/(n_3 + ...)))
where z
is an integer and n_1, n_2, n_3, ...
is either a finite or infinite sequence of positive integers:
def continued_fraction(r):
while r != 0:
n = floor(r)
yield n
r = 1/(r  n)
Since the fraction part 1/(...)
is always between zero and one, a large integer in the continued fraction allows for a particularly good rational approximation: ‚Äúone divided by something between 100 and 101‚Äù is better than ‚Äúone divided by something between 1 and 2.‚Äù The most irrational number is therefore the one which is 1 + 1/(1 + 1/(1 + ...))
and has no particularly good rational approximations; one can solve œÜ = 1 + 1/œÜ by multiplying through by œÜ to get the formula for the golden ratio.
For folks who are not so familiar with NumPy  all of the functions are ‚Äúvectorized,‚Äù so that sqrt(array)
is the same as what other languages might write map(sqrt, array)
. So this is a componentbycomponent sqrt
application. The same also holds for division by a scalar or addition with scalars  those apply to all components in parallel.
The proof is simple once you know that this is the result. If you ask what"s the probability that z < Z < z + dz, this is the same as asking what"s the probability that z < F^{1}(U) < z + dz, apply F to all three expressions noting that it is a monotonically increasing function, hence F(z) < U < F(z + dz), expand the right hand side out to find F(z) + f(z) dz, and since U is uniform this probability is just f(z) dz as promised.
For people (like me) coming here via search engine and just looking for a solution which works out of the box, I recommend installing mpu
. Install it via pip install mpu user
and use it like this to get the haversine distance:
import mpu
# Point one
lat1 = 52.2296756
lon1 = 21.0122287
# Point two
lat2 = 52.406374
lon2 = 16.9251681
# What you were looking for
dist = mpu.haversine_distance((lat1, lon1), (lat2, lon2))
print(dist) # gives 278.45817507541943.
An alternative package is gpxpy
.
If you don"t want dependencies, you can use:
import math
def distance(origin, destination):
"""
Calculate the Haversine distance.
Parameters

origin : tuple of float
(lat, long)
destination : tuple of float
(lat, long)
Returns

distance_in_km : float
Examples

>>> origin = (48.1372, 11.5756) # Munich
>>> destination = (52.5186, 13.4083) # Berlin
>>> round(distance(origin, destination), 1)
504.2
"""
lat1, lon1 = origin
lat2, lon2 = destination
radius = 6371 # km
dlat = math.radians(lat2  lat1)
dlon = math.radians(lon2  lon1)
a = (math.sin(dlat / 2) * math.sin(dlat / 2) +
math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) *
math.sin(dlon / 2) * math.sin(dlon / 2))
c = 2 * math.atan2(math.sqrt(a), math.sqrt(1  a))
d = radius * c
return d
if __name__ == "__main__":
import doctest
doctest.testmod()
The other alternative package is haversine
from haversine import haversine, Unit
lyon = (45.7597, 4.8422) # (lat, lon)
paris = (48.8567, 2.3508)
haversine(lyon, paris)
>> 392.2172595594006 # in kilometers
haversine(lyon, paris, unit=Unit.MILES)
>> 243.71201856934454 # in miles
# you can also use the string abbreviation for units:
haversine(lyon, paris, unit="mi")
>> 243.71201856934454 # in miles
haversine(lyon, paris, unit=Unit.NAUTICAL_MILES)
>> 211.78037755311516 # in nautical miles
They claim to have performance optimization for distances between all points in two vectors
from haversine import haversine_vector, Unit
lyon = (45.7597, 4.8422) # (lat, lon)
paris = (48.8567, 2.3508)
new_york = (40.7033962, 74.2351462)
haversine_vector([lyon, lyon], [paris, new_york], Unit.KILOMETERS)
>> array([ 392.21725956, 6163.43638211])
You can make the observation that for a string to be considered repeating, its length must be divisible by the length of its repeated sequence. Given that, here is a solution that generates divisors of the length from 1
to n / 2
inclusive, divides the original string into substrings with the length of the divisors, and tests the equality of the result set:
from math import sqrt, floor
def divquot(n):
if n > 1:
yield 1, n
swapped = []
for d in range(2, int(floor(sqrt(n))) + 1):
q, r = divmod(n, d)
if r == 0:
yield d, q
swapped.append((q, d))
while swapped:
yield swapped.pop()
def repeats(s):
n = len(s)
for d, q in divquot(n):
sl = s[0:d]
if sl * q == s:
return sl
return None
EDIT: In Python 3, the /
operator has changed to do float division by default. To get the int
division from Python 2, you can use the //
operator instead. Thank you to @TigerhawkT3 for bringing this to my attention.
The //
operator performs integer division in both Python 2 and Python 3, so I"ve updated the answer to support both versions. The part where we test to see if all the substrings are equal is now a shortcircuiting operation using all
and a generator expression.
UPDATE: In response to a change in the original question, the code has now been updated to return the smallest repeating substring if it exists and None
if it does not. @godlygeek has suggested using divmod
to reduce the number of iterations on the divisors
generator, and the code has been updated to match that as well. It now returns all positive divisors of n
in ascending order, exclusive of n
itself.
Further update for high performance: After multiple tests, I"ve come to the conclusion that simply testing for string equality has the best performance out of any slicing or iterator solution in Python. Thus, I"ve taken a leaf out of @TigerhawkT3 "s book and updated my solution. It"s now over 6x as fast as before, noticably faster than Tigerhawk"s solution but slower than David"s.
For those trying to make the connection between SNR and a normal random variable generated by numpy:
[1] , where it"s important to keep in mind that P is average power.
Or in dB:
[2]
In this case, we already have a signal and we want to generate noise to give us a desired SNR.
While noise can come in different flavors depending on what you are modeling, a good start (especially for this radio telescope example) is Additive White Gaussian Noise (AWGN). As stated in the previous answers, to model AWGN you need to add a zeromean gaussian random variable to your original signal. The variance of that random variable will affect the average noise power.
For a Gaussian random variable X, the average power , also known as the second moment, is
[3]
So for white noise, and the average power is then equal to the variance .
When modeling this in python, you can either
1. Calculate variance based on a desired SNR and a set of existing measurements, which would work if you expect your measurements to have fairly consistent amplitude values.
2. Alternatively, you could set noise power to a known level to match something like receiver noise. Receiver noise could be measured by pointing the telescope into free space and calculating average power.
Either way, it"s important to make sure that you add noise to your signal and take averages in the linear space and not in dB units.
Here"s some code to generate a signal and plot voltage, power in Watts, and power in dB:
# Signal Generation
# matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(1, 100, 1000)
x_volts = 10*np.sin(t/(2*np.pi))
plt.subplot(3,1,1)
plt.plot(t, x_volts)
plt.title("Signal")
plt.ylabel("Voltage (V)")
plt.xlabel("Time (s)")
plt.show()
x_watts = x_volts ** 2
plt.subplot(3,1,2)
plt.plot(t, x_watts)
plt.title("Signal Power")
plt.ylabel("Power (W)")
plt.xlabel("Time (s)")
plt.show()
x_db = 10 * np.log10(x_watts)
plt.subplot(3,1,3)
plt.plot(t, x_db)
plt.title("Signal Power in dB")
plt.ylabel("Power (dB)")
plt.xlabel("Time (s)")
plt.show()
Here"s an example for adding AWGN based on a desired SNR:
# Adding noise using target SNR
# Set a target SNR
target_snr_db = 20
# Calculate signal power and convert to dB
sig_avg_watts = np.mean(x_watts)
sig_avg_db = 10 * np.log10(sig_avg_watts)
# Calculate noise according to [2] then convert to watts
noise_avg_db = sig_avg_db  target_snr_db
noise_avg_watts = 10 ** (noise_avg_db / 10)
# Generate an sample of white noise
mean_noise = 0
noise_volts = np.random.normal(mean_noise, np.sqrt(noise_avg_watts), len(x_watts))
# Noise up the original signal
y_volts = x_volts + noise_volts
# Plot signal with noise
plt.subplot(2,1,1)
plt.plot(t, y_volts)
plt.title("Signal with noise")
plt.ylabel("Voltage (V)")
plt.xlabel("Time (s)")
plt.show()
# Plot in dB
y_watts = y_volts ** 2
y_db = 10 * np.log10(y_watts)
plt.subplot(2,1,2)
plt.plot(t, 10* np.log10(y_volts**2))
plt.title("Signal with noise (dB)")
plt.ylabel("Power (dB)")
plt.xlabel("Time (s)")
plt.show()
And here"s an example for adding AWGN based on a known noise power:
# Adding noise using a target noise power
# Set a target channel noise power to something very noisy
target_noise_db = 10
# Convert to linear Watt units
target_noise_watts = 10 ** (target_noise_db / 10)
# Generate noise samples
mean_noise = 0
noise_volts = np.random.normal(mean_noise, np.sqrt(target_noise_watts), len(x_watts))
# Noise up the original signal (again) and plot
y_volts = x_volts + noise_volts
# Plot signal with noise
plt.subplot(2,1,1)
plt.plot(t, y_volts)
plt.title("Signal with noise")
plt.ylabel("Voltage (V)")
plt.xlabel("Time (s)")
plt.show()
# Plot in dB
y_watts = y_volts ** 2
y_db = 10 * np.log10(y_watts)
plt.subplot(2,1,2)
plt.plot(t, 10* np.log10(y_volts**2))
plt.title("Signal with noise")
plt.ylabel("Power (dB)")
plt.xlabel("Time (s)")
plt.show()
If you follow the principle of Occam"s razor, you might think setting all the weights to 0 or 1 would be the best solution. This is not the case.
With every weight the same, all the neurons at each layer are producing the same output. This makes it hard to decide which weights to adjust.
# initialize two NN"s with 0 and 1 constant weights
model_0 = Net(constant_weight=0)
model_1 = Net(constant_weight=1)
Validation Accuracy
9.625%  All Zeros
10.050%  All Ones
Training Loss
2.304  All Zeros
1552.281  All Ones
A uniform distribution has the equal probability of picking any number from a set of numbers.
Let"s see how well the neural network trains using a uniform weight initialization, where low=0.0
and high=1.0
.
Below, we"ll see another way (besides in the Net class code) to initialize the weights of a network. To define weights outside of the model definition, we can:
 Define a function that assigns weights by the type of network layer, then
 Apply those weights to an initialized model using
model.apply(fn)
, which applies a function to each model layer.
# takes in a module and applies the specified weight initialization
def weights_init_uniform(m):
classname = m.__class__.__name__
# for every Linear layer in a model..
if classname.find("Linear") != 1:
# apply a uniform distribution to the weights and a bias=0
m.weight.data.uniform_(0.0, 1.0)
m.bias.data.fill_(0)
model_uniform = Net()
model_uniform.apply(weights_init_uniform)
Validation Accuracy
36.667%  Uniform Weights
Training Loss
3.208  Uniform Weights
The general rule for setting the weights in a neural network is to set them to be close to zero without being too small.
Good practice is to start your weights in the range of [y, y] where
y=1/sqrt(n)
(n is the number of inputs to a given neuron).
# takes in a module and applies the specified weight initialization
def weights_init_uniform_rule(m):
classname = m.__class__.__name__
# for every Linear layer in a model..
if classname.find("Linear") != 1:
# get the number of the inputs
n = m.in_features
y = 1.0/np.sqrt(n)
m.weight.data.uniform_(y, y)
m.bias.data.fill_(0)
# create a new model with these weights
model_rule = Net()
model_rule.apply(weights_init_uniform_rule)
below we compare performance of NN, weights initialized with uniform distribution [0.5,0.5) versus the one whose weight is initialized using general rule
Validation Accuracy
75.817%  Centered Weights [0.5, 0.5)
85.208%  General Rule [y, y)
Training Loss
0.705  Centered Weights [0.5, 0.5)
0.469  General Rule [y, y)
The normal distribution should have a mean of 0 and a standard deviation of
y=1/sqrt(n)
, where n is the number of inputs to NN
## takes in a module and applies the specified weight initialization
def weights_init_normal(m):
"""Takes in a module and initializes all linear layers with weight
values taken from a normal distribution."""
classname = m.__class__.__name__
# for every Linear layer in a model
if classname.find("Linear") != 1:
y = m.in_features
# m.weight.data shoud be taken from a normal distribution
m.weight.data.normal_(0.0,1/np.sqrt(y))
# m.bias.data should be 0
m.bias.data.fill_(0)
below we show the performance of two NN one initialized using uniformdistribution and the other using normaldistribution
Validation Accuracy
85.775%  Uniform Rule [y, y)
84.717%  Normal Distribution
Training Loss
0.329  Uniform Rule [y, y)
0.443  Normal Distribution
scikitlearn"s LinearRegression doesn"t calculate this information but you can easily extend the class to do it:
from sklearn import linear_model
from scipy import stats
import numpy as np
class LinearRegression(linear_model.LinearRegression):
"""
LinearRegression class after sklearn"s, but calculate tstatistics
and pvalues for model coefficients (betas).
Additional attributes available after .fit()
are `t` and `p` which are of the shape (y.shape[1], X.shape[1])
which is (n_features, n_coefs)
This class sets the intercept to 0 by default, since usually we include it
in X.
"""
def __init__(self, *args, **kwargs):
if not "fit_intercept" in kwargs:
kwargs["fit_intercept"] = False
super(LinearRegression, self)
.__init__(*args, **kwargs)
def fit(self, X, y, n_jobs=1):
self = super(LinearRegression, self).fit(X, y, n_jobs)
sse = np.sum((self.predict(X)  y) ** 2, axis=0) / float(X.shape[0]  X.shape[1])
se = np.array([
np.sqrt(np.diagonal(sse[i] * np.linalg.inv(np.dot(X.T, X))))
for i in range(sse.shape[0])
])
self.t = self.coef_ / se
self.p = 2 * (1  stats.t.cdf(np.abs(self.t), y.shape[0]  X.shape[1]))
return self
Stolen from here.
You should take a look at statsmodels for this kind of statistical analysis in Python.