In this article, we will create random datasets using the NumPy library in Python.
Libraries required:
- NumPy: sudo pip install numpy
- Pandas: sudo pip install pandas
- Matplotlib: sudo pip install matplotlib
Normal distribution:
In probability theory, the normal or Gaussian distribution is a very common continuous probability distribution. It is symmetric about the mean, so data near the mean are more common than data far from the mean. Normal distributions are widely used in statistics and are often used to represent real-valued random variables.
The normal distribution is the most common type of distribution in statistical analysis. It has two parameters: the mean and the standard deviation (the standard normal distribution is the special case with mean 0 and standard deviation 1). The mean is the central tendency of the distribution. The standard deviation is a measure of variability: it defines the width of the distribution and determines how far from the mean the values tend to fall, representing the typical distance between an observation and the mean. Many natural phenomena, such as height, blood pressure, measurement error and IQ scores, approximately follow a normal distribution.
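The role of the two parameters can be checked numerically. The sketch below is only an illustration: the mean of 50, the standard deviations of 2 and 10, the sample size and the seed are all assumed values, not taken from the article. It draws two samples with the same mean but different widths and reports how much of each sample lies within one standard deviation of its mean, which for a normal distribution should be roughly 68%.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(seed=0)

# Same mean, different standard deviations (illustrative values).
narrow = rng.normal(loc=50.0, scale=2.0, size=100_000)
wide = rng.normal(loc=50.0, scale=10.0, size=100_000)

for name, data in (("narrow", narrow), ("wide", wide)):
    # Fraction of observations within one standard deviation of the sample mean.
    within_one_std = np.mean(np.abs(data - data.mean()) <= data.std())
    print(
        f"{name}: mean={data.mean():.2f}, "
        f"std={data.std():.2f}, within one std={within_one_std:.1%}"
    )
```

Both samples are centred on the same value, but the second one is spread over a much wider range, which is what the `scale` argument controls.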
Normal distribution graph:
Example:
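A minimal sketch of such an example, assuming a mean of 0, a standard deviation of 1 and 1000 samples (illustrative values, not taken from the original listing). It draws the samples with NumPy and plots their histogram with Matplotlib, overlaying the theoretical density curve for comparison.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative parameters (assumed, not from the original example).
mean, std, n_samples = 0.0, 1.0, 1000

rng = np.random.default_rng(seed=1)
samples = rng.normal(loc=mean, scale=std, size=n_samples)

# Histogram of the samples, normalised so the bar areas sum to 1.
fig, ax = plt.subplots()
ax.hist(samples, bins=30, density=True, alpha=0.6, label="samples")

# Overlay the theoretical probability density function.
x = np.linspace(mean - 4 * std, mean + 4 * std, 200)
pdf = np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))
ax.plot(x, pdf, label="normal pdf")

ax.set_xlabel("value")
ax.set_ylabel("density")
ax.legend()
plt.show()
```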
Output:
Let’s see a better example.
We will generate a dataset with 4 feature columns, where each column represents one feature. The 5th column of the dataset is the output label, which takes integer values from 0 to 3. This dataset can be used to train a classifier such as logistic regression, a neural network or a support vector machine.
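One way such a dataset might be built is sketched below. The column names, the number of rows per label, the class centres and the seed are all assumptions made for illustration. Each label gets its own normally distributed cluster of points, so that a classifier has something to separate.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

n_per_class = 100   # assumed number of rows per label
n_features = 4      # the four feature columns
n_classes = 4       # labels 0, 1, 2 and 3
columns = [f"feature_{i + 1}" for i in range(n_features)]  # hypothetical names

frames = []
for label in range(n_classes):
    # Assumed class centres: every feature of class k is centred at 3 * k.
    centre = np.full(n_features, 3.0 * label)
    features = rng.normal(loc=centre, scale=1.0, size=(n_per_class, n_features))
    frame = pd.DataFrame(features, columns=columns)
    frame["label"] = label  # 5th column: the output label
    frames.append(frame)

# Combine the per-class blocks and shuffle the rows.
dataset = pd.concat(frames, ignore_index=True)
dataset = dataset.sample(frac=1.0, random_state=42).reset_index(drop=True)

print(dataset.head())
print(dataset["label"].value_counts().sort_index())
```

Moving the class centres closer together, or increasing `scale`, makes the classes overlap more and the classification task harder.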
Output: