An Example—Meteorological Data

Nelli, Fabio

doi:10.1007/978-1-4842-9532-8_10

Fabio Nelli²

2510 Accesses

Abstract

One type of data that’s easier to find on the Internet is meteorological data. Many sites provide historical data on many meteorological parameters, such as pressure, temperature, humidity, rain, and so on. You only need to specify the location and the date to get a file with datasets of measurements collected by weather stations. These data are a source of a wide range of information. As you read in the first chapter of this book, the purpose of data analysis is to transform the raw data into information and then convert it into knowledge.

Access provided by Autonomous University of Puebla. Download chapter PDF

One type of data that’s easier to find on the Internet is meteorological data. Many sites provide historical data on many meteorological parameters, such as pressure, temperature, humidity, rain, and so on. You only need to specify the location and the date to get a file with datasets of measurements collected by weather stations. These data are a source of a wide range of information. As you read in the first chapter of this book, the purpose of data analysis is to transform the raw data into information and then convert it into knowledge.

In this chapter, you will see a simple example of how to use meteorological data. This example is useful for getting a general idea of how to apply many of the techniques seen in the previous chapters.

A Hypothesis to Be Tested: The Influence of the Proximity of the Sea

At the time of writing of this chapter, I find myself at the beginning of summer and temperatures rising. On the weekend, many inland people travel to mountain villages or cities close to the sea, in order to enjoy a little refreshment and get away from the sultry weather of the inland cities. This has always made me wonder what effect the proximity of the sea has on the climate.

This simple question can be a good starting point for data analysis. I don’t want to pass this chapter off as something scientific; it’s just a way for someone passionate about data analysis to put knowledge into practice in order to answer this question—what influence, if any, does the proximity of the sea have on local climate?

The System in the Study: The Adriatic Sea and the Po Valley

Now that the problem has been defined, it is necessary to look for a system that is well suited to the study of the data and to provide a setting suitable for this question.

First you need a sea. Well, I’m in Italy and I have many seas to choose from, since Italy is a peninsula surrounded by seas. Why limit myself to Italy? Well, the problem involves a behavior typical of the Italians, that is, they take refuge in places close to the sea during the summer to avoid the heat of the hinterland. Not knowing if this behavior is the same for people of other nations, I will only consider Italy as a system of study.

But what areas of Italy might we consider studying? Can we assess the effects of the sea at various distances? This creates a lot of problems. In fact, Italy is rich in mountainous areas and doesn’t have a lot territory that uniformly extends for many kilometers inland. So, to assess the effects of the sea, I exclude the mountains, as they may introduce many other factors that also affect climate, such as altitude, for example.

A part of Italy that is well suited to this assessment is the Po Valley. This plain starts from the Adriatic Sea and spreads inland for hundreds of kilometers (see Figure 10-1). It is surrounded by mountains, but the width of the valley mitigates any mountain effects. It also has many towns and so it is easy to choose a set of cities increasingly distant from the sea, to cover a distance of almost 400 km in this evaluation.

A screenshot of Google Maps highlighting the regions in the northern part of Italy, which include Milano, Torino, Genova, Bologna, Venezia, and other locations. — **Figure 10-1**

The first step is to choose a set of ten cities that will serve as reference standards. These cities are selected in order to cover the entire range of the plain (see Figure 10-2).

A screenshot of Google Maps highlights the regions Milano, Torino, Asti, Piacenza, Mantova, Ferrara, Bologna, Faenza, Cesena, Ravenna, and Comacchio in the map of Italy. — **Figure 10-2**

In Figure 10-2, you can see the ten cities that were chosen to analyze weather data: five cities within the first 100 km and the other five distributed in the remaining 300 km.

Here are the chosen cities:

Ferrara
Torino
Mantova
Milano
Ravenna
Asti
Bologna
Piacenza
Cesena
Faenza

Now you have to determine the distances of these cities from the sea. You can follow many procedures to obtain these values. In this case, you can use the service provided by the site TheDistanceNow (www.thedistancenow.com/), which is available in many languages (see Figure 10-3).

A screenshot of the Distance Now web page. The title reads measure the distance between cities. It provides two text boxes to input the name of the cities. A button labeled go is there to measure the distance. — **Figure 10-3**

Thanks to this service, it is possible to calculate the approximate distances of the cities from the sea. You can do this by selecting a city on the coast as the destination. For many of them, you can choose the city of Comacchio as a reference to calculate the distance from the sea (see Figure 10-2). Once you have determined the distances from the ten cities, you will get the values shown in Table 10-1.

Table 10-1 The Distances from the Sea of the Ten Cities

Full size table

Finding the Data Source

Once you have defined the system under study, you need to establish a data source from which to obtain the needed data. By browsing the Internet, you can discover many sites that provide meteorological data measured from various locations around the world. One such site is OpenWeather, available at openweathermap.org (see Figure 10-4).

A screenshot denotes the weather details of London, along with a line graph of the hourly forecast of the rise in temperature and a table for the 8-day forecast. A Map of London is at the top, indicating no precipitation within an hour. — **Figure 10-4**

After you’ve signed up for an account and received an app ID code, this site enables you to capture data by specifying the city through a request via an URL.

https://api.openweathermap.org/data/2.5/weather?q=Atlanta,US&appid=5807ad2a45eb6bf4e81d137dafe74e15

This request will return a JSON file containing all the information about the current weather situation in the city in question (see Figure 10-5). This JSON file will be submitted for data analysis using the Python pandas library.

A screenshot denotes a U R L in the search bar, and a J son file at the bottom. The J son file represents the weather in Atlanta which include the details of mist, pressure, temperature, humidity, wind, pressure, and clouds. — **Figure 10-5**

Data Analysis on Jupyter Notebook

This chapter addresses data analysis using Jupyter Notebook. It allows you to enter and study portions of code gradually.

After the service has started, create a new Notebook.

Start by importing the necessary libraries:

import numpy as np import pandas as pd import datetime

The first step is to study the structure of the data received from the site through a specific request.

Choose a city from those chosen for the study, for example Ferrara, and make a request for its current meteorological data, using the URL specified. Without a browser, you can get the text content of a page by using the request.get() text function. Because the content obtained is in JSON format, you can directly read the text received following this format with the json.load() function.

import json import requests ferrara = json.loads(requests.get('https://api.openweathermap.org/data/2.5/weather?q=Ferrara,IT&appid=5807ad2a45eb6bf4e81d137dafe74e15').text)

Now you can see the contents of the JSON file with the meteorological data related to the city of Ferrara.

ferrara Out [ ]: {'coord': {'lon': 11.8333, 'lat': 44.8}, 'weather': [{'id': 804, 'main': 'Clouds', 'description': 'overcast clouds', 'icon': '04d'}], 'base': 'stations', 'main': {'temp': 292.51, 'feels_like': 292.22, 'temp_min': 292.04, 'temp_max': 293.74, 'pressure': 1017, 'humidity': 66, 'sea_level': 1017, 'grnd_level': 1017}, 'visibility': 10000, 'wind': {'speed': 2.68, 'deg': 50, 'gust': 7.76}, 'clouds': {'all': 100}, 'dt': 1685511558, 'sys': {'type': 2, 'id': 2007888, 'country': 'IT', 'sunrise': 1685503859, 'sunset': 1685558996}, 'timezone': 7200, 'id': 3177088, 'name': 'Provincia di Ferrara', 'cod': 200}

When you want to analyze the structure of a JSON file, the following command is useful:

list(ferrara.keys()) Out [ ]: ['coord', 'weather', 'base', 'main', 'visibility', 'wind', 'clouds', 'dt', 'sys', 'id', 'name', 'cod']

This way, you can have a list of all the keys that make up the internal structure of the JSON file. Once you know the name of these keys, you can easily access internal data.

print('Coordinates = ', ferrara['coord']) print('Weather = ', ferrara['weather']) print('base = ', ferrara['base']) print('main = ', ferrara['main']) print('visibility = ', ferrara['visibility']) print('wind = ', ferrara['wind']) print('clouds = ', ferrara['clouds']) print('dt = ', ferrara['dt']) print('sys = ', ferrara['sys']) print('id = ', ferrara['id']) print('name = ', ferrara['name']) print('cod = ', ferrara['cod']) Out [ ]: Coordinates = {'lon': 11.8333, 'lat': 44.8} Weather = [{'id': 804, 'main': 'Clouds', 'description': 'overcast clouds', 'icon': '04d'}] base = stations main = {'temp': 292.51, 'feels_like': 292.22, 'temp_min': 292.04, 'temp_max': 293.74, 'pressure': 1017, 'humidity': 66, 'sea_level': 1017, 'grnd_level': 1017} visibility = 10000 wind = {'speed': 2.68, 'deg': 50, 'gust': 7.76} clouds = {'all': 100} dt = 1685511558 sys = {'type': 2, 'id': 2007888, 'country': 'IT', 'sunrise': 1685503859, 'sunset': 1685558996} id = 3177088 name = Provincia di Ferrara cod = 200

Now choose the values that you consider most interesting or useful for this type of analysis. For example, an important value is temperature:

ferrara['main']['temp'] Out [ ]: 292.51

The purpose of this analysis of the initial structure is to identify the data that could be most important in the JSON structure. These data must be processed for analysis. That is, the data must be extracted from the structure, cleaned or modified according to your needs, and ordered in a dataframe. This way, you can apply all the data analysis techniques presented in this book.

A convenient way to avoid repeating the same code is to insert some extraction procedures into a function, such as the following:

def prepare(city,city_name): temp = [ ] humidity = [ ] pressure = [ ] description = [ ] dt = [ ] wind_speed = [ ] wind_deg = [ ] temp.append(city['main']['temp']-273.15) humidity.append(city['main']['humidity']) pressure.append(city['main']['pressure']) description.append(city['weather'][0]['description']) dt.append(city['dt']) wind_speed.append(city['wind']['speed']) wind_deg.append(city['wind']['deg']) headings = ['temp','humidity','pressure','description','dt','wind_speed','wind_deg'] data = [temp,humidity,pressure,description,dt,wind_speed,wind_deg] df = pd.DataFrame(data,index=headings) city = df.T city['city'] = city_name city['day'] = city['dt'].apply(datetime.datetime.fromtimestamp) return city

This function does nothing more than take the meteorological data you are interested in from the JSON structure, and once they are cleaned or modified (for example, dates and times), that data are collected in a row of a dataframe (as shown in Figure 10-6).

t1 = prepare(ferrara,'ferrara') t1

A table represents the values of temperature, humidity, pressure, description, wind speed, wind degree, city, and day. — **Figure 10-6**

Among all the parameters described in the JSON structure in the list column, these are the most appropriate for the study:

Temperature
Humidity
Pressure
Description
Wind speed
Wind degree

All these properties will be related to the time of acquisition expressed from the dt column, which contains a timestamp as the type of data. This value is difficult to read, so you can convert it into a datetime format that allows you to express the date and time in a manner more familiar to you. The new column will be called day.

city['day'] = city['dt'].apply(datetime.datetime.fromtimestamp)

Temperature is expressed in degrees Kelvin, and you can convert these values to Celsius by subtracting 273.15 from each value.

Finally, add the name of the city passed as a second argument of the prepare() function.

Data is collected at regular intervals, during different times of the day. For example, you could use a program that executes these requests every hour. Each acquisition will have a row of the dataframe structure that will be added to a general dataframe related to the city, called for example, df_ferrara (as shown in Figure 10-7).

df_ferrara = t1 t2 = prepare(ferrara,'ferrara') df_ferrara = pd.concat([df_ferrara, t2]) df_ferrara

A table represents the values of temperature, humidity, pressure, description, wind speed, wind degree, city, and day. It indicates the same values two times for the city Ferrara. — **Figure 10-7**

It often happens that data that’s useful to this analysis is not present in the JSON source. In that case, you have to resort to other data sources and import the missing data into the structure. In this example, the distances of the cities to the sea are indispensable. You repeat the procedure just described for all the cities in the list that you want to analyze. Then you add the distance values to the dataframe you obtain.

. df_ravenna['dist'] = 8 df_cesena['dist'] = 14 df_faenza['dist'] = 37 df_ferrara['dist'] = 47 df_bologna['dist'] = 71 df_mantova['dist'] = 121 df_piacenza['dist'] = 200 df_milano['dist'] = 250 df_asti['dist'] = 315 df_torino['dist'] = 357 .

Analysis of Processed Meteorological Data

For practical purposes, I have already collected data from all the cities involved in the analysis. I have already processed and collected them in a dataframe, which I saved as a CSV file.

If you want to refer to the data used in this chapter, you have to load the ten CSV files that I saved at the time of writing. These files contain data already processed to be used for this analysis.

df_ferrara=pd.read_csv('ferrara_270615.csv') df_milano=pd.read_csv('milano_270615.csv') df_mantova=pd.read_csv('mantova_270615.csv') df_ravenna=pd.read_csv('ravenna_270615.csv') df_torino=pd.read_csv('torino_270615.csv') df_asti=pd.read_csv('asti_270615.csv') df_bologna=pd.read_csv('bologna_270615.csv') df_piacenza=pd.read_csv('piacenza_270615.csv') df_cesena=pd.read_csv('cesena_270615.csv') df_faenza=pd.read_csv('faenza_270615.csv')

Thanks to the read_csv() function of pandas, you can convert CSV files to the dataframe in just one step.

Once you have uploaded data for each city as a dataframe, you can easily see the content.

df_cesena

As you can see in Figure 10-8, Jupyter Notebook makes it much easier to read dataframes with the generation of graphical tables. Furthermore, you can see that each row shows the measured values for each hour of the day, covering a timeline of about 20 hours in the past.

A table has 11 columns and 20 rows. The column headers are unnamed 0, temperature, humidity, pressure, description, wind speed, wind degree, city, date, and distance. It denotes the values for the city of Cesena. — **Figure 10-8**

In the case shown in Figure 10-8, note that there are only 19 rows. In fact, observing other cities, it looks like the meteorological measurement systems sometimes failed during the measuring process, leaving holes in the acquisition. If the data collected make up 19 rows, as in this case, they are sufficient to describe the trend of the meteorological properties during the day. However, it is good practice to check the size of all ten dataframes. If a city provides insufficient data to describe the daily trend, you need to replace it with another city.

There is an easy way to check the size, without having to put one table after another. Thanks to the shape() function, you can determine the number of data acquired (lines) for each city.

print(df_ferrara.shape) print(df_milano.shape) print(df_mantova.shape) print(df_ravenna.shape) print(df_torino.shape) print(df_asti.shape) print(df_bologna.shape) print(df_piacenza.shape) print(df_cesena.shape) print(df_faenza.shape)

This will give the following result:

(20, 9) (18, 9) (20, 9) (18, 9) (20, 9) (20, 9) (20, 9) (20, 9) (20, 9) (19, 9)

As you can see, the choice of ten cities is optimal, since the control units have provided enough data to continue with the data analysis.

A normal way to analyze the data you just collected is to use data visualization. You saw that the matplotlib library includes a set of tools to generate charts on which to display data. In fact, data visualization helps you during data analysis to discover features of the system you are studying.

Next, activate the necessary libraries:

%matplotlib inline import matplotlib.pyplot as plt import matplotlib.dates as mdates

For example, there is a simple way to analyze the trend of the temperature during the day. Consider the city of Milan.

y1 = df_milano['temp'] df_milano['day'] = pd.to_datetime(df_milano['day']) x1 = df_milano['day'] fig, ax = plt.subplots() plt.xticks(rotation=70) hours = mdates.DateFormatter('%H:%M') ax.xaxis.set_major_formatter(hours) ax.plot(x1,y1,'r')

Executing this code, you get the graph shown in Figure 10-9. As you can see, the temperature trend follows a nearly sinusoidal pattern characterized by a temperature that rises in the morning, to reach the maximum value during the heat of the afternoon (between 2:00 and 6:00 pm). Then the temperature decreases to a minimum value corresponding to just before dawn, that is, at 6:00 am.

A line graph denotes a decreasing trend for temperature after an initial increment up to 6 p m. The peak temperature floats around 30 to 32 and decreases to the least by 6 a m. — **Figure 10-9**

Because the purpose of your analysis is to try to interpret the weather, it is possible to assess how and if the sea influences this trend. This time, you evaluate the trends of different cities simultaneously. This is the only way to see if the analysis is going in the right direction. Thus, choose the three cities closest to the sea and the three cities farthest from it.

y1 = df_ravenna['temp'] df_ravenna['day'] = pd.to_datetime(df_ravenna['day']) x1 = df_ravenna['day'] y2 = df_faenza['temp'] df_faenza['day'] = pd.to_datetime(df_faenza['day']) x2 = df_faenza['day'] y3 = df_cesena['temp'] df_cesena['day'] = pd.to_datetime(df_cesena['day']) x3 = df_cesena['day'] y4 = df_milano['temp'] df_milano['day'] = pd.to_datetime(df_milano['day']) x4 = df_milano['day'] y5 = df_asti['temp'] df_asti['day'] = pd.to_datetime(df_asti['day']) x5 = df_asti['day'] y6 = df_torino['temp'] df_torino['day'] = pd.to_datetime(df_torino['day']) x6 = df_torino['day'] fig, ax = plt.subplots() plt.xticks(rotation=70) hours = mdates.DateFormatter('%H:%M') ax.xaxis.set_major_formatter(hours) plt.plot(x1,y1,'r',x2,y2,'r',x3,y3,'r') plt.plot(x4,y4,'g',x5,y5,'g',x6,y6,'g')

This code will produce the chart shown in Figure 10-10. The temperatures of the three cities closest to the sea are shown in red, while the temperatures of the three cities farthest away are in green.

A line graph denotes a decreasing trend for temperature after an initial increment up to 5 p m. The peak temperature reaches nearer to 32 and decreases to the least by 5 a m. There are 6 lines dived into 2 groups and each group has a nearly similar trend of lines. — **Figure 10-10**

Looking at Figure 10-10, the results seem promising. In fact, the three closest cities have maximum temperatures much lower than those farthest away, whereas there seems to be little difference in the minimum temperatures.

In order to go deep into this aspect, you can collect the maximum and minimum temperatures of all ten cities and display a line chart that charts these temperatures compared to their distances from the sea.

dist = [df_ravenna['dist'][0], df_cesena['dist'][0], df_faenza['dist'][0], df_ferrara['dist'][0], df_bologna['dist'][0], df_mantova['dist'][0], df_piacenza['dist'][0], df_milano['dist'][0], df_asti['dist'][0], df_torino['dist'][0] ]temp_max = [df_ravenna['temp'].max(), df_cesena['temp'].max(), df_faenza['temp'].max(), df_ferrara['temp'].max(), df_bologna['temp'].max(), df_mantova['temp'].max(), df_piacenza['temp'].max(), df_milano['temp'].max(), df_asti['temp'].max(), df_torino['temp'].max() ] temp_min = [df_ravenna['temp'].min(), df_cesena['temp'].min(), df_faenza['temp'].min(), df_ferrara['temp'].min(), df_bologna['temp'].min(), df_mantova['temp'].min(), df_piacenza['temp'].min(), df_milano['temp'].min(), df_asti['temp'].min(), df_torino['temp'].min() ]

Start by representing the maximum temperatures.

plt.plot(dist,temp_max,'ro')

The result is shown in Figure 10-11.

A scatterplot denotes an increasing trend of temperature with respect to the distance from the sea that ranges from 0 to 400 kilometers. The temperature stays the least between the distance 0 to 50, which increases and fluctuates as the distance rises. — **Figure 10-11**

As shown in Figure 10-11, you can affirm the hypothesis that the presence of the sea somehow influences meteorological parameters is true (at least in the day today ☺).

Furthermore, you can see that the effect of the sea decreases rapidly, and after about 60-70 km, the maximum temperatures reach a plateau.

An interesting idea would be to represent the two different trends with two straight lines obtained by linear regression. To do this, you can use the SVR method provided by the scikit-learn library.

If you haven’t installed the scikit-learn library yet, do so now. If you are working with Anaconda, you can install it via Anaconda Navigator by selecting it from the packages available in your virtual environment (see Figure 10-12), or from the CMD.exe Prompt console, by entering this command:

conda install scikit-learn

A screenshot of the Anaconda navigator represents the settings under the environments options. It highlights the option of edition 3 which leads to a table comprising name and description. It highlights the name 101 scikit learn from the table. — **Figure 10-12**

At this point, you can continue with the example by inserting the following code into a cell in the Notebook:

x = np.array(dist) y = np.array(temp_max) x1 = x[x<100] x1 = x1.reshape((x1.size,1)) y1 = y[x<100] x2 = x[x>50] x2 = x2.reshape((x2.size,1)) y2 = y[x>50] from sklearn.svm import SVR svr_lin1 = SVR(kernel='linear', C=1e3) svr_lin2 = SVR(kernel='linear', C=1e3) svr_lin1.fit(x1, y1) svr_lin2.fit(x2, y2) xp1 = np.arange(10,100,10).reshape((9,1)) xp2 = np.arange(50,400,50).reshape((7,1)) yp1 = svr_lin1.predict(xp1) yp2 = svr_lin2.predict(xp2) plt.plot(xp1, yp1, c='r', label='Strong sea effect') plt.plot(xp2, yp2, c='b', label='Light sea effect') plt.axis((0,400,27,32)) plt.scatter(x, y, c='k', label='data')

This code will produce the chart shown in Figure 10-13.

A scatterplot with lines represents the trend of temperature with respect to the distance from the sea. It denotes a steep increase as the distance rises from 0 to 100, which decreases slightly and fluctuates up to the range of 400 kilometers. — **Figure 10-13**

As you can see, temperature increase in the first 60 km is very rapid, rising from 28 to 31 degrees. It then increases very mildly (if at all) over longer distances. The two trends are described by two straight lines that have the following expression

$$ \mathrm{x}=\mathrm{ax}+\mathrm{b} $$

where a is the slope and the b is the intercept.

print( svr_lin1.coef_) print( svr_lin1.intercept_) print( svr_lin2.coef_) print( svr_lin2.intercept_) Out [ ]: [[-0.04794118]] [ 27.65617647] [[-0.00317797]] [ 30.2854661]

You might consider the intersection point of the two lines as the point between the area where the sea exerts its influence and the area where it doesn’t, or at least not as strongly.

from scipy.optimize import fsolve def line1(x): a1 = svr_lin1.coef_[0][0] b1 = svr_lin1.intercept_[0] return a1*x + b1 def line2(x): a2 = svr_lin2.coef_[0][0] b2 = svr_lin2.intercept_[0] return a2*x + b2 def findIntersection(fun1,fun2,x0): return fsolve(lambda x : fun1(x) - fun2(x),x0) result = findIntersection(line1,line2,0.0) print("[x,y] = [ %d , %d ]" % (result,line1(result))) x = np.linspace(0,300,31) plt.plot(x,line1(x),x,line2(x),result,line1(result),'ro')

Executing the code, you can find the point of intersection as follows:

Out [ ]: [x,y] = [ 58, 30 ]

This point is represented in the chart shown in Figure 10-14.

A line graph denotes the variation in temperature with respect to the distance. It denotes two intersecting lines and highlights the point of intersection, which is approximately at (58, 30). — **Figure 10-14**

You can say that the average distance in which the effects of the sea vanish is 58 km.

Now you can analyze the minimum temperatures.

plt.axis((0,400,15,25)) plt.plot(dist,temp_min,'bo')

Doing this, you’ll obtain the chart shown in Figure 10-15.

A scatterplot denotes a linear trend for temperature initially, that fluctuates and increases later. The temperature starts at the value of 18 and stays mostly around it. It peaks at the distance rises from 300 to 350. — **Figure 10-15**

In this case, it appears very clear that the sea has no effect on minimum temperatures recorded during the night, or rather, around six in the morning. If I remember well, when I was a child I was taught that the sea mitigated the cold temperatures, or that the sea released the heat absorbed during the day. This does not seem to be the case. This case tracks summer in Italy; it would be interesting to see if this hypothesis is true in the winter or somewhere else.

Another meteorological measure contained in the ten dataframes is the humidity. Even for this measure, you can see the trend of the humidity during the day for the three cities closest to the sea and for the three farthest away.

y1 = df_ravenna['humidity'] x1 = df_ravenna['day'] y2 = df_faenza['humidity'] x2 = df_faenza['day'] y3 = df_cesena['humidity'] x3 = df_cesena['day'] y4 = df_milano['humidity'] x4 = df_milano['day'] y5 = df_asti['humidity'] x5 = df_asti['day'] y6 = df_torino['humidity'] x6 = df_torino['day'] fig, ax = plt.subplots() plt.xticks(rotation=70) hours = mdates.DateFormatter('%H:%M') ax.xaxis.set_major_formatter(hours) plt.plot(x1,y1,'r',x2,y2,'r',x3,y3,'r') plt.plot(x4,y4,'g',x5,y5,'g',x6,y6,'g')

This code will create the chart shown in Figure 10-16.

A line graph denotes the increasing trend of humidity at different times of the day. There are 5 lines, grouped in 2 different colors. Cities nearer the sea have higher humidity. The humidity is less around 2 p m and peaks around 6 a m for most cities. values are approximate. — **Figure 10-16**

At first glance, it would seem that the cities closest to the sea experience more humidity than those farthest away and that this difference in moisture (about 20 percent) extends throughout the day. You can see if this remains true when you report the maximum and minimum humidity with respect to the distances from the sea.

hum_max = [df_ravenna['humidity'].max(), df_cesena['humidity'].max(), df_faenza['humidity'].max(), df_ferrara['humidity'].max(), df_bologna['humidity'].max(), df_mantova['humidity'].max(), df_piacenza['humidity'].max(), df_milano['humidity'].max(), df_asti['humidity'].max(), df_torino['humidity'].max() ] plt.plot(dist,hum_max,'bo')

The maximum humidity of ten cities according to their distance from the sea is represented in the chart in Figure 10-17.

A scatterplot denotes a decreasing trend of humidity with respect to the distance from the sea. The value of humidity decreases from 95 to 76 as the distance rises from 0 to 250. It increases and stays between 80 to 85 for the distance between 300 to 350. Values are approximate. — **Figure 10-17**

hum_min = [df_ravenna['humidity'].min(), df_cesena['humidity'].min(), df_faenza['humidity'].min(), df_ferrara['humidity'].min(), df_bologna['humidity'].min(), df_mantova['humidity'].min(), df_piacenza['humidity'].min(), df_milano['humidity'].min(), df_asti['humidity'].min(), df_torino['humidity'].min() ] plt.plot(dist,hum_min,'bo')

The minimum humidity of ten cities according to their distance from the sea is represented in the chart in Figure 10-18.

A scatterplot denotes a fluctuating trend of humidity with respect to the distance from the sea. The value of humidity decreases from 61 to 28 as the distance rises from 0 to 300. It increases and stays around 45 for a distance between 350 to 400. Values are approximate. — **Figure 10-18**

Looking at Figures 10-17 and 10-18, you can certainly see that the humidity, both the minimum and maximum, is greater in the cities closest to the sea. However, in my opinion, it is not possible to say that there is a linear relationship or some other kind of relationship to draw a curve. The collected points (ten) are too few to highlight a trend in this case.

The RoseWind

Among the various meteorological data that were collected for each city are those related to the wind:

Wind degree (direction)
Wind speed

If you analyze the dataframe, you will notice that the wind speed is relative to the direction it blows and the time of day. For instance, each measurement shows the direction in which the wind blows (see Figure 10-19).

A table represents the data on wind degrees and wind speed for 18 different time instances between 27 and 28 June 2015. — **Figure 10-19**

To better analyze this kind of data, it is necessary to visualize them. In this case, a linear chart in Cartesian coordinates is not the most optimal approach.

If you use the classic scatterplot with the points contained in a single dataframe:

plt.plot(df_ravenna['wind_deg'],df_ravenna['wind_speed'],'ro')

You get a chart like the one shown in Figure 10-20, which certainly is not very educational.

A scatterplot represents the distribution of points in 360 degrees. The numbers on the y-axis range from 1 to 6, while the x-axis ranges from 75 to 250. The plots have a steep decline initially and progress with fluctuation towards the end. — **Figure 10-20**

To represent a distribution of points in 360 degrees, it’s best to use another type of visualization: the polar chart. You have already seen this kind of chart in Chapter 8.

First you need to create a histogram, whereby the data are distributed over the interval of 360 degrees divided into eight bins, each of which is 45 degrees.

hist, bins = np.histogram(df_ravenna['wind_deg'],8,[0,360]) print(hist) print(bins)

The values returned are occurrences within each bin expressed by an array called hist:

Out [ ]: [ 0 5 11 1 0 1 0 0]

and an array called bins, which defines the edges of each bin within the range of 360 degrees.

Out [ ]: [ 0. 45. 90. 135. 180. 225. 270. 315. 360.]

These arrays will be useful to correctly define the polar chart to be drawn. For this purpose, you have to create a function in part by using the code contained in Chapter 8. This function is called showRoseWind(), and it will need three different arguments: values is the array containing the values to be displayed, which in this case is the hist array; city_name is a string containing the name of the city to be shown as the chart title; and max_value is an integer that establishes the maximum value for presenting the blue color.

Defining a function of this kind helps you avoid rewriting the same code many times, and it produces more modular code, which allows you to focus on the concepts related to a particular operation within a function.

def showRoseWind(values,city_name,max_value): N = 8 theta = np.arange(0.,2 * np.pi, 2 * np.pi / N) radii = np.array(values) plt.axes([0.025, 0.025, 0.95, 0.95], polar=True) colors = [(1-x/max_value, 1-x/max_value, 0.75) for x in radii] plt.bar(theta +np.pi/8, radii, width=(2*np.pi/N), bottom=0.0, color=colors) plt.title(city_name,x=0.2, fontsize=20)

One thing that changed is the color map. In this case, the closer to blue the slice is, the greater the value it represents.

Once you define a function, you can use it:

showRoseWind(hist,'Ravenna',max(hist))

Executing this code, you will obtain a polar chart like the one shown in Figure 10-21.

A polar chart denotes the distribution of values within the range of 0 to 360 degrees for Ravenna. There are 5 concentric circles having the values 2, 4, 6, 8, and 10 in the outward direction. The region between the angle of 90 to 135 degrees is shaded. — **Figure 10-21**

As you can see in Figure 10-20, you have a range of 360 degrees divided into eight areas of 45 degrees each (bin), in which a scale of values is represented radially. In each of the eight areas, a slice is represented with a variable length that corresponds precisely to the corresponding value. The more radially extended the slice is, the greater the value represented. In order to increase the readability of the chart, a color scale has been entered that corresponds to the extension of its slice. The wider the slice is, the more the color tends to a deep blue.

This polar chart provides you with information about how the wind direction will be distributed radially. In this case, the wind has blown purely toward the southwest/west most of the day.

Once you have defined the showRoseWind function, it is very easy to observe the winds with respect to any of the ten sample cities.

hist, bin = np.histogram(df_ferrara['wind_deg'],8,[0,360]) print(hist) showRoseWind(hist,'Ferrara', 15.0) Out [ ]: [7 2 3 3 3 2 0 0]

Figure 10-22 shows the polar charts of the ten cities.

A set of 10 polar charts represents the wind direction in the cities of Torino, Cesena, Faenza, Ferrara, Piacenza, Mantova, Bologna, Asti, Milano, and Ravenna. Each chart highlights different sections of angles in darker shades. — **Figure 10-22**

Calculating the Mean Distribution of the Wind Speed

Even the other quantity that relates the speed of the winds can be represented as a distribution on 360 degrees.

Now define a feature called RoseWind_Speed that will allow you to calculate the mean wind speeds for each of the eight bins into which 360 degrees are divided.

def RoseWind_Speed(df_city): degs = np.arange(45,361,45) tmp = [] for deg in degs: tmp.append(df_city[(df_city['wind_deg']>(deg-46)) & (df_city['wind_deg']<deg)]['wind_speed'].mean()) return np.nan_to_num(tmp)

This function returns a NumPy array containing the eight mean wind speeds. This array will be used as the first argument of the ShowRoseWind_Speed() function, which is an improved version of the previous ShowRoseWind() function used to represent the polar chart.

def showRoseWind_Speed(speeds,city_name): N = 8 theta = np.arange(0,2 * np.pi, 2 * np.pi / N) radii = np.array(speeds) plt.axes([0.025, 0.025, 0.95, 0.95], polar=True) colors = [(1-x/10.0, 1-x/10.0, 0.75) for x in radii] bars = plt.bar(theta+np.pi/8, radii, width=(2*np.pi/N), bottom=0.0, color=colors) plt.title(city_name,x=0.2, fontsize=20) showRoseWind_Speed(RoseWind_Speed(df_ravenna),'Ravenna')

Figure 10-23 represents the RoseWind corresponding to the wind speeds distributed around 360 degrees.

A polar chart for Ravenna represents the distribution of angles from 0 to 360 at an interval of 45 degrees, There are 4 concentric circles with the values 1 to 4, from inside out. The region between 0 to 90 degrees is shaded darker, while other sections have lighter shades. — **Figure 10-23**

At the end of all this work, you can save the dataframe as a CSV file, thanks to the to_csv () function of the pandas library.

df_ferrara.to_csv('ferrara.csv') df_milano.to_csv('milano.csv') df_mantova.to_csv('mantova.csv') df_ravenna.to_csv('ravenna.csv') df_torino.to_csv('torino.csv') df_asti.to_csv('asti.csv') df_bologna.to_csv('bologna.csv') df_piacenza.to_csv('piacenza.csv') df_cesena.to_csv('cesena.csv') df_faenza.to_csv('faenza.csv')

Conclusions

The purpose of this chapter was mainly to show how you can get information from raw data. Some of this information will not lead to important conclusions, while other information will lead to the confirmation of a hypothesis, thus increasing your state of knowledge. These are the cases in which data analysis has led to a success.

In the next chapter, you see another case related to real data obtained from an open data source. You also see how you can further enhance the graphical representation of the data using the D3 JavaScript library. This library, although not Python, can be easily integrated into Python.

Author information

Authors and Affiliations

Rome, Italy
Fabio Nelli

Authors

Fabio Nelli
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Nelli, F. (2023). An Example—Meteorological Data. In: Python Data Analytics. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-9532-8_10

Download citation

DOI: https://doi.org/10.1007/978-1-4842-9532-8_10
Published: 02 September 2023
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-9531-1
Online ISBN: 978-1-4842-9532-8
eBook Packages: Professional and Applied ComputingApress Access BooksProfessional and Applied Computing (R0)

Publish with us

Policies and ethics