Let's use a real dataset from TRAI to analyze mobile data speeds and find the average speeds for a particular operator or state in a given month. This also shows how easily Pandas can be applied to real data to produce interesting results.
About the dataset:
The Telecom Regulatory Authority of India (TRAI) releases a monthly dataset of internet speeds that it measures through its MySpeed (TRAI) app. The dataset includes user-initiated speed tests as well as periodic background tests performed by the application. We will analyze this dataset and find the average speeds for a specific operator or state in that month.
Checking the raw data structure:
1st column is the Network Operator – JIO, Airtel etc.
2nd column is the Network Technology – 3G or 4G.
3rd column is the Type of Test initiated – upload or download.
4th column is the Speed Measured in Kilobytes per second.
5th column is the Signal Strength during the measurement.
6th column is the Local Service Area (LSA), or the circle where the test was done – Delhi, Orissa etc. We will refer to these simply as 'states'.
NOTE: Signal strength can be na (Not Available) because some devices cannot capture the signal. We will ignore this parameter in our calculations to keep things simple. However, it can easily be added as a condition when filtering.
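As a rough sketch of what such a filtering condition might look like, the snippet below keeps only the rows where a signal strength was actually recorded. The column name 'Signal Strength' and the sample rows are assumptions made for illustration; they are not taken from the real file.

```python
import pandas as pd

# Hypothetical rows; 'Signal Strength' is an assumed column name
df = pd.DataFrame({
    'Service Provider': ['JIO', 'JIO', 'AIRTEL'],
    'Data Speed': [26000, 5200, 3100],
    'Signal Strength': ['-85', 'na', '-97'],
})

# Convert to numbers; anything that is not numeric (such as 'na') becomes NaN
signal = pd.to_numeric(df['Signal Strength'], errors='coerce')

# Keep only the rows where a signal strength was actually recorded
with_signal = df[signal.notna()]
print(with_signal)
```

The same boolean mask can be combined with other conditions using `&` when filtering by operator or state.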
Packages required:
Pandas – a popular data analysis toolkit, very powerful for crunching large sets of data.
NumPy – provides fast and efficient operations on arrays of homogeneous data. We will use it along with Pandas and Matplotlib.
Matplotlib – a plotting library. We will use its bar plotting function to make bar charts.
Let's start analyzing the data.
Step #1: Import packages and define some constants.
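A minimal sketch of this step could look like the following. The constant names and their values are assumptions chosen for illustration; only the file name sept18_publish.csv comes from later in this article.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Name of the CSV dataset (the September 2018 file used later in this article)
DATASET_FILENAME = 'sept18_publish.csv'

# Hypothetical filter constants; change them to whatever you want to analyze
CONST_OPERATOR = 'JIO'
CONST_STATE = 'Delhi'
CONST_TECHNOLOGY = '4G'
```

Keeping these values in one place means the rest of the script can be rerun for another operator, state or month by editing only these lines.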

Step #2: Define lists that will store the final computed results so they can easily be passed to the bar-plotting function. The state (or operator), download speed, and upload speed are stored in matching order, so the state (or operator) at a given index and its corresponding download and upload speeds can be accessed together.
For example, final_states[2], final_download_speeds[2] and final_upload_speeds[2] will give the corresponding values for the 3rd state.
# define lists
final_download_speeds = []
final_upload_speeds = []
final_states = []
final_operators = []
Step #3: Import the file using Pandas' read_csv() and save it to 'df'. This will create a DataFrame of the CSV content that we will work on.
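The sketch below shows one way this step might be written. So that it runs without the real file, it reads a tiny in-memory stand-in. It assumes the first row of the file is a header row, and the column names other than 'Service Provider' and 'State' are assumptions based on the column descriptions above.

```python
import io
import pandas as pd

# Tiny stand-in for the real file so the sketch runs on its own.
# To load the real dataset, pass the path 'sept18_publish.csv' instead.
sample_csv = io.StringIO(
    "operator,technology,test_type,speed,signal_strength,lsa\n"
    "JIO,4G,download,26000,-85,Kerala\n"
    "JIO,4G,upload,5200,-85,Kerala\n"
    "AIRTEL,3G,download,3100,na,Delhi\n"
)

df = pd.read_csv(sample_csv)

# Assign readable headers so the columns are easy to access
# (names other than 'Service Provider' and 'State' are assumed)
df.columns = ['Service Provider', 'Technology', 'Test Type',
              'Data Speed', 'Signal Strength', 'State']

print(df.head())
```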

Step #4: First, let's find all the unique states and operators in this dataset and store them in their respective lists.
# find and display unique states
states = df['State'].unique()
print('STATES Found:', states)

# find and display unique operators
operators = df['Service Provider'].unique()
print('OPERATORS Found:', operators)
Output:
STATES Found: ['Kerala' 'Rajasthan' 'Maharashtra' 'UP East' 'Karnataka' nan 'Madhya Pradesh' 'Kolkata' 'Bihar' 'Gujarat' 'UP West' 'Orissa' 'Tamil Nadu' 'Delhi' 'Assam' 'Andhra Pradesh' 'Haryana' 'Punjab' 'North East' 'Mumbai' 'Chennai' 'Himachal Pradesh' 'Jammu & Kashmir' 'West Bengal']
OPERATORS Found: ['IDEA' 'JIO' 'AIRTEL' 'VODAFONE' 'CELLONE']
Step #5: Define the fixed_operator function, which keeps the operator constant and iterates over all the available states for that operator. We can build a similar function for a fixed state.
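One possible shape for such a function is sketched below. It is not necessarily the original implementation: the column names 'Technology', 'Test Type' and 'Data Speed' are assumptions, and the small DataFrame is made-up data used only so the example runs on its own.

```python
import pandas as pd

def fixed_operator(df, operator, technology):
    """Average download/upload speed per state for one operator and technology."""
    final_states, final_download_speeds, final_upload_speeds = [], [], []

    # Operator and technology are common to every state, so filter on them once
    filtered = df[(df['Service Provider'] == operator)
                  & (df['Technology'] == technology)]

    # Iterate over the states present for this operator, ignoring missing ones
    for state in filtered['State'].dropna().unique():
        base = filtered[filtered['State'] == state]
        avg_down = base.loc[base['Test Type'] == 'download', 'Data Speed'].mean()
        avg_up = base.loc[base['Test Type'] == 'upload', 'Data Speed'].mean()

        # Skip states that lack either kind of test (the mean is then NaN)
        if pd.isnull(avg_down) or pd.isnull(avg_up):
            continue

        final_states.append(state)
        final_download_speeds.append(avg_down)
        final_upload_speeds.append(avg_up)
        print(f'{state}  Avg. Download: {avg_down:.2f} Avg. Upload: {avg_up:.2f}')

    return final_states, final_download_speeds, final_upload_speeds

# Made-up rows for illustration only
df = pd.DataFrame({
    'Service Provider': ['JIO', 'JIO', 'JIO', 'JIO', 'AIRTEL', 'JIO'],
    'Technology': ['4G'] * 6,
    'Test Type': ['download', 'download', 'upload', 'download', 'download', 'upload'],
    'Data Speed': [20000, 30000, 5000, 10000, 9000, 4000],
    'State': ['Kerala', 'Kerala', 'Kerala', 'Delhi', 'Delhi', None],
})

states, down, up = fixed_operator(df, 'JIO', '4G')
```

Returning the three lists, rather than appending to module-level lists, is a design choice made here so the function can be reused for a second month without clearing any shared state.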

Output:
Kerala  Avg. Download: 26129.27 Avg. Upload: 5193.46
Rajasthan  Avg. Download: 27784.86 Avg. Upload: 5736.18
Maharashtra  Avg. Download: 20707.88 Avg. Upload: 4130.46
UP East  Avg. Download: 22451.35 Avg. Upload: 5727.95
Karnataka  Avg. Download: 16950.36 Avg. Upload: 4720.68
Madhya Pradesh  Avg. Download: 23594.85 Avg. Upload: 4802.89
Kolkata  Avg. Download: 26747.80 Avg. Upload: 5655.55
Bihar  Avg. Download: 31730.54 Avg. Upload: 6599.45
Gujarat  Avg. Download: 16377.43 Avg. Upload: 3642.89
UP West  Avg. Download: 23720.82 Avg. Upload: 5280.46
Orissa  Avg. Download: 31502.05 Avg. Upload: 6895.46
Tamil Nadu  Avg. Download: 16689.28 Avg. Upload: 4107.44
Delhi  Avg. Download: 20308.30 Avg. Upload: 4877.40
Assam  Avg. Download: 5653.49 Avg. Upload: 2864.47
Andhra Pradesh  Avg. Download: 32444.07 Avg. Upload: 5755.95
Haryana  Avg. Download: 7170.63 Avg. Upload: 2680.02
Punjab  Avg. Download: 14454.45 Avg. Upload: 4981.15
North East  Avg. Download: 6702.29 Avg. Upload: 2966.84
Mumbai  Avg. Download: 14070.97 Avg. Upload: 4118.21
Chennai  Avg. Download: 20054.47 Avg. Upload: 4602.35
Himachal Pradesh  Avg. Download: 7436.99 Avg. Upload: 4020.09
Jammu & Kashmir  Avg. Download: 8759.20 Avg. Upload: 4418.21
West Bengal  Avg. Download: 16821.17 Avg. Upload: 3628.78
To plot the results, use NumPy's arange() method, which returns evenly spaced values within a given interval. Passing it the length of the final_states list gives values from 0 up to one less than the number of states, for example [0, 1, 2, 3, ...].
We can then use these indices to place a bar at each position. The second bar is placed by offsetting the position of the first bar by the bar width.
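The positioning described above can be sketched roughly as follows. The three states and their averages are taken from the first rows of the output shown earlier; the bar width, labels and layout choices are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# First three rows of the output above, standing in for the full lists
final_states = ['Kerala', 'Rajasthan', 'Maharashtra']
final_download_speeds = [26129.27, 27784.86, 20707.88]
final_upload_speeds = [5193.46, 5736.18, 4130.46]

bar_width = 0.4
# One index per state: 0, 1, 2
positions = np.arange(len(final_states))

fig, ax = plt.subplots()
# The first bar of each pair sits at the index itself
ax.bar(positions, final_download_speeds, bar_width, label='Avg. Download')
# The second bar is shifted right by one bar width
ax.bar(positions + bar_width, final_upload_speeds, bar_width, label='Avg. Upload')

# Centre each state label between its two bars
ax.set_xticks(positions + bar_width / 2)
ax.set_xticklabels(final_states, rotation=90)
ax.set_ylabel('Average speed')
ax.legend()
fig.tight_layout()
```

Call plt.show() afterwards to display the chart in an interactive session.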

Let's also take data from another month and plot the two together to see the difference in data speeds.
In this example, the previous month's dataset is the same sept18_publish.csv, and the next month's dataset is oct18_publish.csv.
We just need to follow the same steps again: read the data for the other month, filter it into the corresponding DataFrames, and then plot it using a slightly different method. When drawing the bars, we offset the 3rd and 4th bars (the upload and download bars of the second file) by 2 and 3 times the bar width so that they land in their correct positions.
Below is the implementation for comparing data for 2 months:
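The original listing is not reproduced here, so what follows is only a sketch of the plotting part under stated assumptions: the function name plot_two_months is hypothetical, both months are assumed to list the same states in the same order, and the numbers in the usage example are placeholders rather than real measurements.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_two_months(states, down_a, up_a, down_b, up_b,
                    label_a='Sept 2018', label_b='Oct 2018', bar_width=0.2):
    """Draw four bars per state: download/upload for month A, then month B."""
    positions = np.arange(len(states))
    fig, ax = plt.subplots()

    # First month: bars at offsets of 0 and 1 bar widths
    ax.bar(positions, down_a, bar_width, label=f'{label_a} download')
    ax.bar(positions + bar_width, up_a, bar_width, label=f'{label_a} upload')

    # Second month: bars at offsets of 2 and 3 bar widths
    ax.bar(positions + 2 * bar_width, down_b, bar_width, label=f'{label_b} download')
    ax.bar(positions + 3 * bar_width, up_b, bar_width, label=f'{label_b} upload')

    # Centre each state label under its group of four bars
    ax.set_xticks(positions + 1.5 * bar_width)
    ax.set_xticklabels(states, rotation=90)
    ax.set_ylabel('Average speed')
    ax.legend()
    fig.tight_layout()
    return fig, ax

# Placeholder numbers for illustration only, not real measurements
fig, ax = plot_two_months(
    ['State A', 'State B'],
    down_a=[20000, 15000], up_a=[5000, 4000],
    down_b=[22000, 16000], up_b=[5500, 4200],
)
```

In practice the four lists would come from running the filtering step once on each month's DataFrame, keeping only the states that appear in both results.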
