Change language

# NumPy Tutorial – Basics in 20 Minutes! [Updated]

In Python, we gain access to the library with import numpy as NP.

The as NP is actually a convention and you could actually stop at import numpy, although we usually do it as NP.

The fundamental object in numpy is a numpy array.

We create them with NP dot array brackets and then square brackets and pass them a list.

So we will pass this the list of one two three and then that will create the numpy array of one two three.

This is the output.

We can assign a variable name to an array.

So we will call this one A with A is equal to the numpy array.

And then outputting A should get the same result.

NumPy arrays have an important attribute called a shape.

If we are to write A dot shape that returns the shape of A which in this case is three, that is because it has three values in it.

A has one, two and three.

So maybe one, one, two, three.

We would see now that As shape changes to four because theres four things in it.

At this point, you might be wondering why we need a numpy array.

Since Python lists can do both of these different things, we can put elements in them and we can ask how long they are.

We need them because of multidimensional data.

The easiest example of multidimensional data is a table.

For example, you could have one row of data.

Say the first row is one two three, and then the second row is, say, 4 5 6. in order to make a numpy array out of this, We will make this into a list of lists.

We just need to enclose this whole thing in brackets and then place a comma here so that this output says the list of the list of one, two, three, and the list of four, five, six.

Generally, for readability, we also add a space here so that all the brackets line up nicely.

Of course, if these numbers were to be longer, like 52, the right side would not line up, but at least the left side still would.

Now were ready to turn this into a numpy array.

We could simply copy this piece of information and write NP dot array, a circular bracket, and then paste that information.

If we fixed the formatting, we close that to see the array of one, two, three and 4 5 6.

Like before we can attach a variable to this, B, set, B equal to that array.

Again, fix our formatting.

You can always just back up and then press enter and itll light it up properly for you.

If we output B, we should see the same information like we did with A.

We want to know what the shape is of B: B.shape, and it will output the tuple of 2 3.

Breaking this down, its because first we have two things.

We have this thing and then we have this thing.

And this is a three.

Because in each of these two things in that and in that we have three items, one two three in the first, and four, five, six in the second.

We would say that B is two dimensional because there is two components in its shape, specifically two rows and three columns.

Arrays are technically called N dimensional arrays because you can have an infinite number of these dimensions.

It allowed for you to have one dimension in A and it allowed for two dimensions in B and you could have three, four, five, six, or an infinite number of dimensions.

Something that we cannot do with a NumPy array is have an inconsistent number of elements.

For example, if we remove this two, wed have the first row has two elements, just one and three, and the second row has three elements, four five and six.

Well, its not going to like that.

And it says youre creating something that it calls a ragged sequence because it has an inconsistent number of elements in the dimensions.

If this has two, well then this has to have two, and if this has three, then this has to have three.

So far weve only used integers: one, two, three and four or five, six, for example, we dont have to do that.

We can also use floating point numbers as well as true and false values or booleans.

So we will copy this and replace this with a variable called C, and we will make this simply 1.1 and then 5.5 so that this has, now everything is the decimal What would happen if we replaced one of these numbers with a true here? This actually makes it a decimal because the true would convert to a one and the one would convert to a 1.0 to make it a decimal.

So this will be the string of six instead.

All of them will convert to strings.

It has to be consistent.

This is mostly being done for tutorial purposes: Most of the time we will have arrays of all decimals, All integers are all boolean values.

You can technically have more complex data types, but most of the time thats a bad idea.

So Im just going to delete this array and well go back to just b.

Now, currently this shape is two by three, but thats not the only possible shape for B.

It could also be maybe three by two.

For example, if we were to do B dot reshape and then pass that shape, say three by two.

Well, then now its the same information but with three rows and two columns instead.

Thats also not the only possible shape for B, it could also just be six and then nothing after it at all.

It could be one dimensional.

All of those shapes worked because they assumed that B has six elements.

If we instead passed the shape of, say, five: that doesnt make any sense, you cannot reshape an array of size six into shape five, the shape has to imply that it has the same number of elements that B does.

It doesnt really matter what its original shape was.

You cant shape that into five things.

And so were going to leave it as this one.

Now, Id like to go through some quick ways to build up numpy arrays, because as weve been writing it so far, actually making the Python list and passing that to numpy is very slow.

And if we had arrays of, say, a million numbers, that would take a very, very long time. Two numbers that we very commonly want an array of is zeros and ones.

Luckily, we can do this in numpy with np dot zeros and then pass a shape, say two by three and that will get a np array where everything is a zero in the shape of 2 3.

If we instead wanted to get ones instead of zeros would simply copy this and replace the zeros with the word ones.

And now this is a np array of shape, two by three where everything is a one.

If you look closely youll see that everything here is actually a decimal or a floating point number.

If we instead wanted integers, we can always specify the data type of an array with passing the argument d type and setting that equal to a type itself.

Ill just pass it int.

And what that will do is make all of these zeroes integers.

Analogous to pythons range function, Theres also an np.arange function which will take a starting value well choose one, a stop value, well choose ten.

And we can call that to get the array of one through nine note just like pythons range function that it actually goes up until the stopping value -1.

If we added in a third value here, well do two.

That mean s it goes up by twos.

So its going to start at one go up by two until it hits ten -1, which is nine.

Thats exactly what we see here with one, then three, then five, then seven, then nine The last function Im going to show for generating numpy arrays is my favorite, which is called and peaked within space.

This still takes a start.

Well, choose one again.

It takes it and will choose 11 again.

But instead of specifying the step here, we are going to specify something called None and set that equal to for our example, itll be 11.

So what that does is it determines itself what the step is needed to evenly space.

11 numbers between one and 11.

Note that it actually does include the end value here because this works well with floating point values.

And not just integers.

This is really just for integers to slightly change it up.

If I made this a 12 instead youll see the step it needs is actually 1.1 every time.

So it says you need 11 numbers between one and 12 evenly spaced.

So Ill say 1.11.11.1 give you all of those numbers.

Now Im going to go back to looking at the array B, which is the array of one, two, three, four, five, six, with the shape of two by three, sometimes in numpy arrays you want to access individual items.

So maybe you want to ask what is in the first row, what is in the second row, what is in the first column? What is in the last column? What is in the second row? The second item, theres so many things you might want to access, and we do this through something called indexing.

Its easiest to get used to this with some practice.

So if we were to do B Sub-Zero, you might be able to guess that this is the first row of information.

Note that it returns it as an umpire, Ray.

And so its really the numpy array of just the first row.

One, two, three. It has the shape of three.

The way this indexing works is very similar to Python lists syntax.

If we are to do a piece of one or that would be the second row.

But if we try to do B sub to one that doesnt exist, that is out of bounds because the size is two.

We have two different rows, but you try to get the third one which doesnt exist just like Python indexing the colon.

Operator mean s you want everything.

So if we just specify this, well, that says give me all of the rows, which is really just the same array back.

This allows us to get a particular column like 1425 or 36 back with Colon and then we place a comma and give an index corresponding to the column.

This is column zero. This is column one.

This is column two.

So if we specify columns zero, that gets us back 14.

If we instead wanted a particular item, then we just specify both the row and the column.

So if we do zeros zero, that corresponds to the very first element.

And we can see that as this one, just the number one and not the numpy array of one.

Now, the reason that numpy is so popular is because it does math very, very quickly.

If you instead want it be but to have all of its elements times too.

So this would be two, this would be four, this would be six and so on.

Well, we can write that with just be times two.

This was an example of something very important called pairwise Asian.

Parallel station mean s that operations were independent of each other.

So this one, we needed to multiply it by two to get two.

We could have put that on one computer, but then on a different computer, put two times two and got four and somewhere else did three times two to get six and two times four to get eight and so on.

These are parallel or independent operations, and you want to do it whenever you possibly can because its so much faster that way.

If we were to instead write the same code that this does, but in a different way, or one by one, we go through each element and we change it to whatever it was multiplied by two.

It is so much slower both to write and for the computer to calculate.

So to show you what not to do, you could actually overwrite be with four I in the range of two because theres two rows and then four j in the range of three because theres three columns and then set each element B sub I sub J equal to whatever it was before.

So well grab that and set it equal to that times two.

This will work. And if we output B, it does actually overwrite B itself and set it equal to itself.

Times two, but its way longer to write and its way slower for the computer to run.

By the way, these two are very similar except theyre actually slightly different because the first one only made a copy and the second one actually rewrote B.

Arithmetic is not the only thing that nampai can parallel lives.

We can also do comparison like B is greater than seven, and that asks every individual element if it is greater than seven and if it is not like two, four and six are, itll place a false there and itll place it true with anything where it is true like eight, ten and 12 This is very useful because we can actually index an umpire res with these questions.

So if we did B and then we did the index and text with square brackets and pasted be greater than seven, or that returns specifically the umpire free of eight, ten and 12 because those are the ones where that question was true.

An important and potentially confusing idea in NAMPAI is the idea of the axis or multiple axes.

It really involves a direction and Ill just show you through an example.

One way it comes up is through the NPR two mean function, which is for taking averages.

If we call and dont mean on B, or that takes the average of the whole array, which is two plus four is six, 12, 20, 30, 42 divided by six we do get seven.

But what if instead we wanted the average for each row, so this average would be four and this average would be ten, or we can do that by specifying that the axis is equal to one.

And so we get the array of four and ten, which is the average of each row.

If instead we wanted the average of each column while wed switch the axis to be zero.

And so we get the averages five averages seven and the average is nine instead theres one axis for every single dimension.

And so if we tried to do axis equals two here, that is out of bounds for dimension two.

We dont have an axis for that.

And so this fails.

But similar to General Python stuff, if you do -1 here, that would assume the axis is the last one And so this goes back to the average of each row.

Theres also similar functions like calculating the standard deviation, and this would do the same thing.

But with that function instead, somewhere else it comes up is if you want to stack NumPy arrays and join them together, we would do this through the Nampai talking cannot function and so we join them together by saying B and B times two.

Those are the two that will join together.

And we want to specify an axis by default.

Its actually axis equals zero.

And what happens here is we stack them.

So this is B and this B times two.

So the concatenation of them is B, and then B times to vertically.

Well, we also could stack them a different way if we specify that axis equals one or that specifies that this is going to be B and this is going to be B times too.

Id now like to finish off this tutorial by showing you how numpy arrays often arise in practice.

One way is to the front TensorFlow Dot Keras Datasets Library, and we can import amnesty, for example.

Amnesty is a particular dataset on handwritten digits and we can turn that into num pire raids or call them x train.

So X train is one, y train is another one.

We write it like brackets like this just because of how this function requires it.

X train, y train and x test and y test.

These things are for arrays if we do equals amnesty dot load data, you can see that it had to load this data, but afterwards we should see that each of these things are numpy series, which mean s they should have this shape attribute.

So if we ask what the shape is, we can see its 60,000 by 28 by 28.

We havent dealt with three dimensional arrays yet, but often they come up when theyre pictures.

So here we have 60,028 by 28 pictures.

A second reason they come up is through the pandas library.

If we do import pandas as PD and then we loaded dataframe with DF is equal to PD dot read CC and then we pass a file name.

Were going to pass the sample data slash California housing trained to use fee name and we get that as a data frame and it should output just like this which tells us we have 17,000 rows and nine columns of longitude, latitude and other data on housing the details really dont matter here, but this should look a lot to you like a numpy array and we can get that as a no higher rate with simply df dot to numpy and that converted to its array form.

We can of course ask this shape because its a non pira and it has 17,000 rows and nine columns just like the data frame does.

Im going to finish this off with a quick homework assignment and if you dont want to do it, thats fine, but I would recommend doing it because it is helpful.

So get array a, I dont care what it is, any array.

And then what I want you to do is save that to file.

So save a into a file.

You have to look up how to do that. I didnt teach you.

The last thing is load it back.

So load a back into memory from the file if you want to drop the like and subscribe to the channel Id greatly appreciate that and I hope you enjoyed the video.

Ill see you next time, guys.

## Shop

Learn programming in R: courses

\$

Best Python online courses for 2022

\$

Best laptop for Fortnite

\$

Best laptop for Excel

\$

Best laptop for Solidworks

\$

Best laptop for Roblox

\$

Best computer for crypto mining

\$

Best laptop for Sims 4

\$

Latest questions

NUMPYNUMPY

Common xlabel/ylabel for matplotlib subplots

NUMPYNUMPY

How to specify multiple return types using type-hints

NUMPYNUMPY

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

NUMPYNUMPY

Flake8: Ignore specific warning for entire file

NUMPYNUMPY

glob exclude pattern

NUMPYNUMPY

How to avoid HTTP error 429 (Too Many Requests) python

NUMPYNUMPY

Python CSV error: line contains NULL byte

NUMPYNUMPY

csv.Error: iterator should return strings, not bytes

## Wiki

Python | How to copy data from one Excel sheet to another

Common xlabel/ylabel for matplotlib subplots

Check if one list is a subset of another in Python

How to specify multiple return types using type-hints

Printing words vertically in Python

Python Extract words from a given string

Cyclic redundancy check in Python

Finding mean, median, mode in Python without libraries