### ENGG1811 Lab07: numpy data analysis

Objectives

After completing this lab, students should be able to use numpy to solve some data analysis problems.

Assessment

This lab consists of Part A only.

You are expected to complete all the questions in Part A and show them to your tutor. Remember that you must be ready for assessment half an hour before the end of the class. The lab exercises are worth 2 marks.

For all the programs, we expect you to choose informative variable names and to document your program. There is also an online assessment question, worth 1 mark; we suggest that you attempt it after completing Part A.

Getting Started

We suggest that you create a folder called lab07 to keep all the files for this lab.

Part A: Data analysis using numpy

Among the many data sets used in climate change research, the Sea Ice Index, i.e. the area covered by sea ice in the northern hemisphere, shows particularly marked changes. There are seasonal variations within each year, and trends that span a number of years.

The file sea_ice.txt (download it by right-clicking the link and choosing “Download Linked File”) contains northern hemisphere total sea ice data, in units of millions of square km, from 1979 to 2013 (i.e. 35 years in total). There are 24 measurements per year, taken once every half-month. The data in sea_ice.txt has 35 rows (one row per year) and 25 columns: the first column contains the year in which the measurements in that row were taken, and the half-monthly measurements are stored in the second to the last columns. After you have downloaded sea_ice.txt, make sure that it is in the lab07 folder you created earlier.

You will need to do a number of pre-processing steps to get the data ready. The steps are shown in the code below. One of the first steps of any data analysis is to plot the data, so you will do that too.

You are asked to run these lines one by one and observe the results. The comments will also help you to understand what those commands are doing.

```python
# Import packages
import numpy as np
import matplotlib.pyplot as plt

# Load data and store it as a numpy array called data_sea_ice
data_sea_ice = np.loadtxt(fname="sea_ice.txt")

# Check the shape of the numpy array
print("The shape of the numpy array is: ", data_sea_ice.shape)
# The shape of the array is 35 x 25
# number of rows = number of years
# Each row has 25 elements:
# the first element is the year, followed by 24 measurements per year (i.e. 1 measurement per half-month)

# Print out the first row to confirm the data format
print("The first row of data is \n", data_sea_ice[0, :])

# Move the years into a separate variable, then remove the years (first column) from the data
years = data_sea_ice[:, 0]  # first column, easy
data_sea_ice = np.delete(data_sea_ice, 0, axis=1)  # remove the first column
print("The shape of the numpy array is: ", data_sea_ice.shape)  # should be 35 (years) x 24 (half-monthly samples)

# The following line uses a numpy function to produce the array: array([0.5, 1, 1.5, 2, …, 11.5, 12])
months = np.linspace(0.5, 12, 24)
# We haven't discussed linspace yet, but for this lab it is sufficient for you to know what the contents of the numpy array months are
# If you want to know more about the numpy function linspace, its manual page is at
# https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linspace.html#numpy.linspace

# Plot the data: you should add the xlabel, ylabel and title
# You will see 35 curves; each curve depicts the variation of sea ice extent in one year
# The seasonal variation should be obvious; don't forget, the data came from the northern hemisphere
# You will need to plot more figures later on, so put this one in Figure 1
plt.figure(1)
plt.plot(months, np.transpose(data_sea_ice))  # transposed so that each column (one year) becomes one curve
plt.xticks([2, 4, 6, 8, 10, 12], ["Feb", "Apr", "Jun", "Aug", "Oct", "Dec"])  # This shows you how to use xticks
plt.show()
```
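If you want to convince yourself of what two of the pre-processing steps do without the data file, the sketch below checks them on a tiny made-up array (`raw` is a hypothetical stand-in, not real sea ice data). It assumes only numpy: `np.linspace(0.5, 12, 24)` should give the same values as the half-month positions `np.arange(1, 25) / 2`, and `np.delete(..., 0, axis=1)` should give the same result as slicing off the first column with `[:, 1:]`.

```python
import numpy as np

# months from the lab: 24 evenly spaced values from 0.5 to 12, both ends included
months = np.linspace(0.5, 12, 24)
# the same values written as "half-month number / 2"
months_alt = np.arange(1, 25) / 2
print("linspace matches arange/2:", np.allclose(months, months_alt))

# Hypothetical 2-year stand-in for the raw file: year, then two made-up measurements
raw = np.array([[1979.0, 15.1, 15.6],
                [1980.0, 14.9, 15.4]])
years = raw[:, 0]                          # first column holds the years
data_deleted = np.delete(raw, 0, axis=1)   # remove column 0, as in the lab code
data_sliced = raw[:, 1:]                   # keep every column from index 1 onwards
print("years:", years)
print("delete matches slicing:", np.array_equal(data_deleted, data_sliced))
```

One difference worth knowing: `np.delete` returns a new array, whereas `raw[:, 1:]` is a view that shares memory with `raw`. For this lab either approach works.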

The following questions on the data set should be answered using numpy. You should not use any loops. Most questions can be answered with just one line of Python code, but you do not have to use as few lines as possible: you can break the steps up into multiple lines of code if you want, especially for Questions 7 and 8, which have several steps. There is only one restriction: no loops.

You should save your Python code in a script so that you can show it to your tutor later on. We will also work on this data set again in a later lab.

If you want to check your answers, some of them are on this page.

Question 1. Determine the annual average sea ice extent, i.e. for each year, compute the average of all the half-monthly measurements in that year. Store the answer in the variable avg_sea_ice_annual, and plot the annual average sea ice extent in Figure 2 to have a look.
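A minimal sketch of the idea behind Question 1, using a made-up 3 x 4 array as a stand-in for data_sea_ice (3 "years" by 4 "half-months"; the numbers are invented for illustration). Averaging along `axis=1` collapses the columns, leaving one value per row, i.e. one per year; here the rows average to 13, 12 and 11.

```python
import numpy as np

# Made-up stand-in for data_sea_ice: rows are years, columns are half-months
data_sea_ice = np.array([[14.0, 15.0, 13.0, 10.0],
                         [13.0, 14.0, 12.0,  9.0],
                         [12.0, 13.0, 11.0,  8.0]])

# axis=1 averages across each row, giving one result per year
avg_sea_ice_annual = np.mean(data_sea_ice, axis=1)
print(avg_sea_ice_annual)
```

On the real data the result should have 35 elements, one per year, which you can plot against years in Figure 2.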

Question 2. Determine the average sea ice extent for each half-month, i.e. for each half-month, average over all the years.
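For Question 2 the only change is the axis. The sketch below uses the same made-up 3 x 4 stand-in array; the result name avg_sea_ice_half_monthly is hypothetical, since the question does not name a variable. Averaging along `axis=0` collapses the rows, leaving one value per column, i.e. one per half-month.

```python
import numpy as np

# Made-up stand-in for data_sea_ice: rows are years, columns are half-months
data_sea_ice = np.array([[14.0, 15.0, 13.0, 10.0],
                         [13.0, 14.0, 12.0,  9.0],
                         [12.0, 13.0, 11.0,  8.0]])

# axis=0 averages down each column, giving one result per half-month
avg_sea_ice_half_monthly = np.mean(data_sea_ice, axis=0)  # hypothetical variable name
print(avg_sea_ice_half_monthly)
```

On the real data this should give 24 values, which you could plot against months.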

Question 3. Determine the average sea ice extent over the entire data collection. Store the answer in the variable avg_sea_ice, which you will need later.
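A sketch for Question 3, again on the made-up 3 x 4 stand-in array. Calling `np.mean` without an `axis` argument averages every element. Because every year has the same number of measurements, this overall average also equals the average of the annual averages, which is a handy cross-check.

```python
import numpy as np

# Made-up stand-in for data_sea_ice: rows are years, columns are half-months
data_sea_ice = np.array([[14.0, 15.0, 13.0, 10.0],
                         [13.0, 14.0, 12.0,  9.0],
                         [12.0, 13.0, 11.0,  8.0]])

# No axis argument: average over all elements of the 2-D array
avg_sea_ice = np.mean(data_sea_ice)

# Cross-check: with equal-length rows, the mean of the row means is the same number
avg_of_annual = np.mean(np.mean(data_sea_ice, axis=1))
print(avg_sea_ice, avg_of_annual)
```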

Question 4. For each year, determine the number of half-monthly measurements that exceed the overall average calculated in Question 3.
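One way to count without a loop, sketched on the made-up 3 x 4 stand-in array (the name n_above_per_year is hypothetical): comparing the whole array with a single number produces a Boolean array of the same shape, and summing it along `axis=1` counts the True entries in each row, because True is treated as 1. For this toy array the overall average is 12, so the counts per year are 3, 2 and 1 (note that a value equal to 12 is not counted).

```python
import numpy as np

# Made-up stand-in for data_sea_ice: rows are years, columns are half-months
data_sea_ice = np.array([[14.0, 15.0, 13.0, 10.0],
                         [13.0, 14.0, 12.0,  9.0],
                         [12.0, 13.0, 11.0,  8.0]])
avg_sea_ice = np.mean(data_sea_ice)            # overall average (Question 3)

above_avg = data_sea_ice > avg_sea_ice         # Boolean array, same shape as the data
n_above_per_year = np.sum(above_avg, axis=1)   # count of True values in each row
print(n_above_per_year)
```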

Question 5. Determine the number of years whose annual average is less than the overall average calculated in Question 3.
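Question 5 combines the earlier pieces: compare the 1-D array of annual averages with the overall average, then sum the resulting Boolean array (no axis is needed because it is 1-D). The sketch uses the same made-up 3 x 4 stand-in array, whose annual averages are 13, 12 and 11 against an overall average of 12; n_years_below is a hypothetical name.

```python
import numpy as np

# Made-up stand-in for data_sea_ice: rows are years, columns are half-months
data_sea_ice = np.array([[14.0, 15.0, 13.0, 10.0],
                         [13.0, 14.0, 12.0,  9.0],
                         [12.0, 13.0, 11.0,  8.0]])
avg_sea_ice_annual = np.mean(data_sea_ice, axis=1)   # one value per year
avg_sea_ice = np.mean(data_sea_ice)                  # overall average

# Count the years whose annual average is below the overall average
n_years_below = np.sum(avg_sea_ice_annual < avg_sea_ice)
print(n_years_below)
```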

Question 6. Determine the number of years, within the last 10 years of data, whose annual average is less than the overall average calculated in Question 3. (Note: Questions 5 and 6 together tell you something important about the status of ice extents.) Hint: you learnt about list slicing in Week 3B, and you can slice a numpy array in the same way. This question requires you to slice from the end of the array rather than from the beginning. Check the Week 3B lecture, where we covered this, if you need a refresher; there are also examples from Week 5C.
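A sketch of slicing from the end, using made-up annual averages for 6 "years" (oldest first) and a made-up overall average of 12.0 rather than the real data. A negative start index counts from the end, so `a[-n:]` keeps the last n elements. To keep the toy example small it takes the last 3 years; the question itself asks for the last 10.

```python
import numpy as np

# Made-up annual averages (oldest year first) and a made-up overall average
avg_sea_ice_annual = np.array([13.0, 12.5, 12.2, 11.8, 11.5, 12.4])
avg_sea_ice = 12.0

n_last = 3                                # the lab uses the last 10 years
recent = avg_sea_ice_annual[-n_last:]     # negative start: slice counted from the end
n_recent_below = np.sum(recent < avg_sea_ice)
print(recent, n_recent_below)
```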

Question 7. Determine the 10 years with the 10 lowest annual average sea ice extents. Arrange the 10 years so that their corresponding annual averages are in ascending order. That is, the first year in the list is the year with the lowest annual average sea ice extent, the second year has the second lowest annual average, and so on. Hint: use the numpy.argsort() function (see the argsort() manual page) to get the indices that would sort an array. Think about what you want to sort by (one of the variables you created earlier in the problem), call argsort() on it, and then use those indices on the array years. If you're confused, check out the first example on the argsort() page and understand what it's doing (the other examples are a bit more complicated).
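The argsort approach from the hint, sketched with 5 made-up years and annual averages (not the real data set), picking the lowest 3 instead of 10. `np.argsort` returns the indices that would sort the averages in ascending order; using those indices on years reorders the years to match, and slicing the first n gives the lowest n.

```python
import numpy as np

# Made-up years and annual averages (not the real data set)
years = np.array([2008, 2009, 2010, 2011, 2012])
avg_sea_ice_annual = np.array([11.6, 11.9, 11.4, 11.2, 10.9])

order = np.argsort(avg_sea_ice_annual)   # indices of the averages, smallest first
n_lowest = 3                             # the lab asks for 10
lowest_years = years[order[:n_lowest]]   # years matching the n smallest averages
print(order, lowest_years)
```

In the lab, years comes from `np.loadtxt`, which produces floats by default, so the years will print with a decimal point; `lowest_years.astype(int)` converts them if you prefer whole numbers.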

Question 8. The 2-dimensional array data_sea_ice contains half-monthly data, but you want to work with monthly data. You want to obtain a 2-dimensional array sea_ice_monthly which contains monthly data. The array should have 35 rows and 12 columns: Column 1 contains the average of the two measurements in January, Column 2 the average of the two measurements in February, and so on. The problem is to compute sea_ice_monthly from data_sea_ice without using any loops. You may find the reshape function useful. If you want some hints, click here.
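A sketch of one reshape-based approach, on a made-up 2 x 6 array (2 "years" of 6 half-months, i.e. 3 months). numpy reshapes in row-major order, so within each row the two half-months of a month are adjacent. Reshaping to (years, months, 2) therefore groups each month's pair along the last axis, and averaging over `axis=2` gives the monthly values. For the real data this would be `reshape(35, 12, 2)`. The slicing version at the end is an equivalent alternative, not taken from the lab hints.

```python
import numpy as np

# Made-up stand-in: 2 "years" x 6 half-months (= 3 months per year)
data_sea_ice = np.array([[1.0, 3.0, 5.0, 7.0,  9.0, 11.0],
                         [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]])
n_years, n_half_months = data_sea_ice.shape
n_months = n_half_months // 2

# Row-major order keeps each month's two half-month values next to each other,
# so shape (years, months, 2) puts every pair along the last axis
pairs = data_sea_ice.reshape(n_years, n_months, 2)
sea_ice_monthly = np.mean(pairs, axis=2)   # average each pair -> shape (years, months)
print(sea_ice_monthly)

# Equivalent alternative: average the even-indexed and odd-indexed columns
alt = (data_sea_ice[:, 0::2] + data_sea_ice[:, 1::2]) / 2
print(np.allclose(sea_ice_monthly, alt))
```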

At the End of the Lab

You should be able to show your tutor your solutions to the exercises. You should be comfortable with writing functions with default parameter values and with using numpy.

Finally, do not forget to complete your online multiple-choice question if you have not done it yet.

If you have completed everything, please do not forget to log out. Simply double-click on the “Log Out” icon.