👻 *Check our latest review to choose the best laptop for Machine Learning engineers and Deep learning tasks!*

I am trying to use pip behind a proxy at work.

One of the answers from this post suggested using CNTLM. I installed and configured it per this other post, but running `cntlm.exe -c cntlm.ini -I -M http://google.com`

gave the error `Connection to proxy failed, bailing out`

.

I also tried `pip install -‚Äìproxy=user:[email protected]:3128`

(the default CNTLM port) but that raised `Cannot fetch index base URL http://pypi.python.org/simple/`

. Clearly something"s up with the proxy.

Does anyone know how to check more definitively whether CNTLM is set up right, or if there"s another way around this altogether? I know you can also set the `http_proxy`

environment variable as described here but I"m not sure what credentials to put in. The ones from `cntlm.ini`

?

👻 *Read also: what is the best laptop for engineering students?*

## Using pip behind a proxy with CNTLM around: Questions

Removing white space around a saved image in matplotlib

2 answers

I need to take an image and save it after some process. The figure looks fine when I display it, but after saving the figure, I got some white space around the saved image. I have tried the `"tight"`

option for `savefig`

method, did not work either. The code:

```
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
fig = plt.figure(1)
img = mpimg.imread(path)
plt.imshow(img)
ax=fig.add_subplot(1,1,1)
extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
plt.savefig("1.png", bbox_inches=extent)
plt.axis("off")
plt.show()
```

I am trying to draw a basic graph by using NetworkX on a figure and save it. I realized that without a graph it works, but when added a graph I get white space around the saved image;

```
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import networkx as nx
G = nx.Graph()
G.add_node(1)
G.add_node(2)
G.add_node(3)
G.add_edge(1,3)
G.add_edge(1,2)
pos = {1:[100,120], 2:[200,300], 3:[50,75]}
fig = plt.figure(1)
img = mpimg.imread("image.jpg")
plt.imshow(img)
ax=fig.add_subplot(1,1,1)
nx.draw(G, pos=pos)
extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
plt.savefig("1.png", bbox_inches = extent)
plt.axis("off")
plt.show()
```

Answer #1

You can remove the white space padding by setting `bbox_inches="tight"`

in `savefig`

:

```
plt.savefig("test.png",bbox_inches="tight")
```

You"ll have to put the argument to `bbox_inches`

as a string, perhaps this is why it didn"t work earlier for you.

**Possible duplicates:**

Matplotlib plots: removing axis, legends and white spaces

Answer #2

I cannot claim I know exactly why or how my “solution” works, but this is what I had to do when I wanted to plot the outline of a couple of aerofoil sections — without white margins — to a PDF file. (Note that I used matplotlib inside an IPython notebook, with the -pylab flag.)

```
plt.gca().set_axis_off()
plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0,
hspace = 0, wspace = 0)
plt.margins(0,0)
plt.gca().xaxis.set_major_locator(plt.NullLocator())
plt.gca().yaxis.set_major_locator(plt.NullLocator())
plt.savefig("filename.pdf", bbox_inches = "tight",
pad_inches = 0)
```

I have tried to deactivate different parts of this, but this always lead to a white margin somewhere. You may even have modify this to keep fat lines near the limits of the figure from being shaved by the lack of margins.

Is there a list of Pytz Timezones?

3 answers

I would like to know what are all the possible values for the timezone argument in the Python library pytz. How to do it?

Answer #1

You can list all the available timezones with `pytz.all_timezones`

:

```
In [40]: import pytz
In [41]: pytz.all_timezones
Out[42]:
["Africa/Abidjan",
"Africa/Accra",
"Africa/Addis_Ababa",
...]
```

There is also `pytz.common_timezones`

:

```
In [45]: len(pytz.common_timezones)
Out[45]: 403
In [46]: len(pytz.all_timezones)
Out[46]: 563
```

Python strptime() and timezones?

3 answers

I have a CSV dumpfile from a Blackberry IPD backup, created using IPDDump.
The date/time strings in here look something like this
(where `EST`

is an Australian time-zone):

```
Tue Jun 22 07:46:22 EST 2010
```

I need to be able to parse this date in Python. At first, I tried to use the `strptime()`

function from datettime.

```
>>> datetime.datetime.strptime("Tue Jun 22 12:10:20 2010 EST", "%a %b %d %H:%M:%S %Y %Z")
```

However, for some reason, the `datetime`

object that comes back doesn"t seem to have any `tzinfo`

associated with it.

I did read on this page that apparently `datetime.strptime`

silently discards `tzinfo`

, however, I checked the documentation, and I can"t find anything to that effect documented here.

I have been able to get the date parsed using a third-party Python library, dateutil, however I"m still curious as to how I was using the in-built `strptime()`

incorrectly? Is there any way to get `strptime()`

to play nicely with timezones?

Answer #1

I recommend using python-dateutil. Its parser has been able to parse every date format I"ve thrown at it so far.

```
>>> from dateutil import parser
>>> parser.parse("Tue Jun 22 07:46:22 EST 2010")
datetime.datetime(2010, 6, 22, 7, 46, 22, tzinfo=tzlocal())
>>> parser.parse("Fri, 11 Nov 2011 03:18:09 -0400")
datetime.datetime(2011, 11, 11, 3, 18, 9, tzinfo=tzoffset(None, -14400))
>>> parser.parse("Sun")
datetime.datetime(2011, 12, 18, 0, 0)
>>> parser.parse("10-11-08")
datetime.datetime(2008, 10, 11, 0, 0)
```

and so on. No dealing with `strptime()`

format nonsense... just throw a date at it and it Does The Right Thing.

**Update**: Oops. I missed in your original question that you mentioned that you used `dateutil`

, sorry about that. But I hope this answer is still useful to other people who stumble across this question when they have date parsing questions and see the utility of that module.

Fitting empirical distribution to theoretical ones with Scipy (Python)?

3 answers

**INTRODUCTION**: I have a list of more than 30,000 integer values ranging from 0 to 47, inclusive, e.g.`[0,0,0,0,..,1,1,1,1,...,2,2,2,2,...,47,47,47,...]`

sampled from some continuous distribution. The values in the list are not necessarily in order, but order doesn"t matter for this problem.

**PROBLEM**: Based on my distribution I would like to calculate p-value (the probability of seeing greater values) for any given value. For example, as you can see p-value for 0 would be approaching 1 and p-value for higher numbers would be tending to 0.

I don"t know if I am right, but to determine probabilities I think I need to fit my data to a theoretical distribution that is the most suitable to describe my data. I assume that some kind of goodness of fit test is needed to determine the best model.

Is there a way to implement such an analysis in Python (`Scipy`

or `Numpy`

)?
Could you present any examples?

Thank you!

Answer #1

# Distribution Fitting with Sum of Square Error (SSE)

This is an update and modification to Saullo"s answer, that uses the full list of the current `scipy.stats`

distributions and returns the distribution with the least SSE between the distribution"s histogram and the data"s histogram.

## Example Fitting

Using the El Ni√±o dataset from `statsmodels`

, the distributions are fit and error is determined. The distribution with the least error is returned.

### All Distributions

### Best Fit Distribution

### Example Code

```
%matplotlib inline
import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
from scipy.stats._continuous_distns import _distn_names
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams["figure.figsize"] = (16.0, 12.0)
matplotlib.style.use("ggplot")
# Create models from data
def best_fit_distribution(data, bins=200, ax=None):
"""Model data by finding best fit distribution to data"""
# Get histogram of original data
y, x = np.histogram(data, bins=bins, density=True)
x = (x + np.roll(x, -1))[:-1] / 2.0
# Best holders
best_distributions = []
# Estimate distribution parameters from data
for ii, distribution in enumerate([d for d in _distn_names if not d in ["levy_stable", "studentized_range"]]):
print("{:>3} / {:<3}: {}".format( ii+1, len(_distn_names), distribution ))
distribution = getattr(st, distribution)
# Try to fit the distribution
try:
# Ignore warnings from data that can"t be fit
with warnings.catch_warnings():
warnings.filterwarnings("ignore")
# fit dist to data
params = distribution.fit(data)
# Separate parts of parameters
arg = params[:-2]
loc = params[-2]
scale = params[-1]
# Calculate fitted PDF and error with fit in distribution
pdf = distribution.pdf(x, loc=loc, scale=scale, *arg)
sse = np.sum(np.power(y - pdf, 2.0))
# if axis pass in add to plot
try:
if ax:
pd.Series(pdf, x).plot(ax=ax)
end
except Exception:
pass
# identify if this distribution is better
best_distributions.append((distribution, params, sse))
except Exception:
pass
return sorted(best_distributions, key=lambda x:x[2])
def make_pdf(dist, params, size=10000):
"""Generate distributions"s Probability Distribution Function """
# Separate parts of parameters
arg = params[:-2]
loc = params[-2]
scale = params[-1]
# Get sane start and end points of distribution
start = dist.ppf(0.01, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.01, loc=loc, scale=scale)
end = dist.ppf(0.99, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.99, loc=loc, scale=scale)
# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = dist.pdf(x, loc=loc, scale=scale, *arg)
pdf = pd.Series(y, x)
return pdf
# Load data from statsmodels datasets
data = pd.Series(sm.datasets.elnino.load_pandas().data.set_index("YEAR").values.ravel())
# Plot for comparison
plt.figure(figsize=(12,8))
ax = data.plot(kind="hist", bins=50, density=True, alpha=0.5, color=list(matplotlib.rcParams["axes.prop_cycle"])[1]["color"])
# Save plot limits
dataYLim = ax.get_ylim()
# Find best fit distribution
best_distibutions = best_fit_distribution(data, 200, ax)
best_dist = best_distibutions[0]
# Update plots
ax.set_ylim(dataYLim)
ax.set_title(u"El Ni√±o sea temp.
All Fitted Distributions")
ax.set_xlabel(u"Temp (¬∞C)")
ax.set_ylabel("Frequency")
# Make PDF with best params
pdf = make_pdf(best_dist[0], best_dist[1])
# Display
plt.figure(figsize=(12,8))
ax = pdf.plot(lw=2, label="PDF", legend=True)
data.plot(kind="hist", bins=50, density=True, alpha=0.5, label="Data", legend=True, ax=ax)
param_names = (best_dist[0].shapes + ", loc, scale").split(", ") if best_dist[0].shapes else ["loc", "scale"]
param_str = ", ".join(["{}={:0.2f}".format(k,v) for k,v in zip(param_names, best_dist[1])])
dist_str = "{}({})".format(best_dist[0].name, param_str)
ax.set_title(u"El Ni√±o sea temp. with best fit distribution
" + dist_str)
ax.set_xlabel(u"Temp. (¬∞C)")
ax.set_ylabel("Frequency")
```