
# ML | Forecasting rainfall using linear regression


Rainfall forecasting is the application of science and technology to predict the amount of precipitation in a region. Accurate rainfall estimates are important for efficient water use, crop productivity, and advance planning of water structures.

In this article, we will use linear regression to predict rainfall. The regression model outputs a number: how many inches of precipitation we can expect.

The dataset is a publicly available weather dataset for Austin, Texas, hosted on Kaggle.

Data cleansing:
Data comes in many forms, much of it messy and unstructured, and it is rarely ready to use. Datasets large and small come with problems: invalid fields, missing or optional values, and values in a form other than the one we need. To bring the data into a workable, structured form, we need to "cleanse" it and prepare it for use. Common cleaning steps include parsing, one-hot encoding, and removing unnecessary data.

In our case, the data has several days on which some values were not recorded, and the precipitation amount was marked as T when there were only traces of precipitation. Our algorithm requires numbers, so it cannot work with letters and symbols in the data. We therefore need to clean the data before passing it to our model.
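The replacement described above can be tried on a tiny table first. The following is a minimal sketch, not the article's script: the `raw` frame and its values are made up for illustration, and it converts the columns to floats directly rather than relying on a CSV round trip.

```python
import pandas as pd

# Hypothetical miniature of the weather table (values are made up).
raw = pd.DataFrame({
    "TempAvgF": ["60", "62", "-"],
    "PrecipitationSumInches": ["0.46", "T", "0"],
})

# Replace the trace marker 'T' and the not-available marker '-' with "0"
# (whole-value matches only), then convert every column to float.
cleaned = raw.replace({"T": "0", "-": "0"}).astype(float)

print(cleaned.dtypes)
print(cleaned)
```

Treating both markers as zero follows the choice made in this article; other strategies, such as dropping the affected rows, are possible.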

Clean up the data in Python:

```python
# import libraries
import pandas as pd

# read the data into a pandas DataFrame
data = pd.read_csv("austin_weather.csv")

# drop the columns we do not need
data = data.drop(['Events', 'Date', 'SeaLevelPressureHighInches',
                  'SeaLevelPressureLowInches'], axis=1)

# some values are 'T', which denotes trace precipitation;
# replace every occurrence of 'T' with 0 so the model gets numbers
data = data.replace('T', 0.0)

# the data also contains '-', which marks a value that is not available;
# replace these values with 0 as well
data = data.replace('-', 0.0)

# save the cleaned data to a CSV file
# (by default, to_csv also writes the row index as an unnamed first column)
data.to_csv('austin_final.csv')
```

Once the data has been cleaned, it can be used as input to our linear regression model. Linear regression is a linear approach to modeling the relationship between a dependent variable and a set of independent explanatory variables. It fits the line that best matches the scattered data points, that is, the line with the least error. Predictions (here, how many inches of precipitation) are then obtained by substituting the independent values into the equation of that line.

We will use scikit-learn's linear regression model and train it on our dataset. Once the model is trained, we can supply our own values for the columns, such as temperature, dew point, and pressure, to predict the precipitation for those conditions.

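To make the idea of a best-fit line concrete, here is a small sketch of ordinary least squares using only NumPy. The data is synthetic and the names (`features`, `target`, `design`, `weights`) are chosen for illustration; it is not part of the article's script.

```python
import numpy as np

# Synthetic data, made up for illustration: y = 2*x1 + 3*x2 + 1 exactly.
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
target = 2.0 * features[:, 0] + 3.0 * features[:, 1] + 1.0

# Append a column of ones so the last fitted weight acts as the intercept.
design = np.column_stack([features, np.ones(len(features))])

# Ordinary least squares: weights that minimize the sum of squared errors.
weights, *_ = np.linalg.lstsq(design, target, rcond=None)

# Predict by substituting new feature values into the fitted equation.
new_point = np.array([3.0, 2.0, 1.0])  # x1, x2, and the constant 1
prediction = new_point @ weights

print(weights)
print(prediction)
```

Because the synthetic data has no noise, the fitted weights come out approximately equal to the coefficients used to generate it. Real weather data is noisy, so the fit there is only the line with the smallest squared error.
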
```python
# import libraries
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# read the cleaned data
data = pd.read_csv("austin_final.csv")

# features (x values): these columns are used to train the model;
# the precipitation column serves as the label
X = data.drop(['PrecipitationSumInches'], axis=1)

# output or label, converted to a 2D array
Y = data['PrecipitationSumInches']
Y = Y.values.reshape(-1, 1)

# pick one day in the dataset to highlight in the plots
day_index = 798
days = [i for i in range(Y.size)]

# initialize the linear regression model and train it on our data
clf = LinearRegression()
clf.fit(X, Y)

# a sample input to test the model: one value for each column of X
inp = np.array([[74], [60], [45], [67], [49], [43], [33], [45],
                [57], [29.68], [10], [7], [2], [0], [20], [4], [31]])
inp = inp.reshape(1, -1)

# print the prediction
print('The precipitation in inches for the input is:', clf.predict(inp))

# plot the precipitation level against the day number;
# the highlighted day (in red) had approximately 2 inches of precipitation
print("the precipitation trend graph: ")
plt.scatter(days, Y, color='g')
plt.scatter(days[day_index], Y[day_index], color='r')
plt.title("Precipitation level")
plt.xlabel("Days")
plt.ylabel("Precipitation in inches")
plt.show()

x_vis = X.filter(['TempAvgF', 'DewPointAvgF', 'HumidityAvgPercent',
                  'SeaLevelPressureAvgInches', 'VisibilityAvgMiles',
                  'WindAvgMPH'], axis=1)

# plot several features (x values) against the day number
# to observe trends, with the same day highlighted in red
print("Precipitation vs selected attributes graph:")
for i in range(x_vis.columns.size):
    plt.subplot(3, 2, i + 1)
    plt.scatter(days, x_vis[x_vis.columns.values[i]], color='g')
    plt.scatter(days[day_index], x_vis[x_vis.columns.values[i]][day_index], color='r')
    plt.title(x_vis.columns.values[i])

plt.show()
```

Output:

`The precipitation in inches for the input is: [[1.33868402]]`

`The precipitation trend graph:`

Precipitation graph against selected attributes:

One day (in red), with about 2 inches of precipitation, is tracked across several parameters such as temperature and pressure. The x-axis denotes days, and the y-axis denotes the magnitude of the attribute, such as temperature or pressure. The graphs suggest that precipitation can be high when both the temperature and the humidity are high.
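A visual impression like this can be cross-checked numerically. The sketch below computes the Pearson correlation between each feature and the precipitation column; the `sample` frame holds made-up values for illustration only, and a real check would load the cleaned Austin data instead.

```python
import pandas as pd

# Hypothetical values, made up for illustration; a real analysis would
# use the cleaned Austin dataset instead.
sample = pd.DataFrame({
    "TempAvgF": [55, 60, 70, 75, 80, 85],
    "HumidityAvgPercent": [40, 45, 60, 65, 80, 85],
    "PrecipitationSumInches": [0.0, 0.0, 0.1, 0.3, 1.2, 2.0],
})

# Pearson correlation of each feature with the precipitation column.
correlations = sample.corr()["PrecipitationSumInches"].drop("PrecipitationSumInches")
print(correlations)
```

A positive value means the feature tends to rise together with precipitation. Correlation only captures linear association, so it complements the plots rather than replacing them.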
