1811 Lab9 ENGG1811 Lab 09: File handling, numpy Objectives

Objectives

After completing this lab, students should be able to:

  • Read files and extract data from them
  • Use numpy broadcasting
  • Selecting elements in a numpy array using the colon notation, the double colon notation and Boolean indexing
  • Use numpy to do data analysis

Assessment

This lab has three parts: Parts A to C. You need to show your tutors all three parts.

For all the programs, we expect that you choose informative variable names and document your program.

There is also an online multiple choice question which is worth 1 mark. We suggest that you attempt this question after completing Parts A-C.

Organising your work

You should make a directory called lab09 to store your files for this lab.


Part A: File handling

The aim of the exercise is practice using Python to read data files. For this exercise, you need to download the zip-file lab09A_files.zip [Note: right click to download  and move it into the directory lab09 that you created. This file contains 30 files zipped together. You will need to extract these files into a directory call lab09A under lab09. The steps to do this in the CSE lab computers can be found here.

After completing the extraction, you will find 30 files in the directory lab09A. These files have names temp00.txt, temp01.txt, temp02.txt, …, temp29.txt. All these 30 files have the same format. If you double click on a file, a text editor window will pop up and you will be able to see the contents of the file. You need to take care not to edit the files because you do not wish to change the format of the file. The contents of temp00.txt are:

A    4
23.31    20.89    27.04    27.50    29.70

The contents of temp01.txt are:

B    2
29.65    25.46    29.44    21.81    28.75

The format of the files are:

  • Each file contains two lines.
  • The first line contains a letter and an integer separated by a tab. The letter can be either A or B. The integer has a value between 2 and 5, inclusively.
  • The second line has five floating point numbers separated by tabs.

We ask you to write a Python program, with the name file_proc.py, whose task is to read in these files and then use their contents to:

  • Determine the number of files that has the letter A in it
  • Compute a numpy array of float type with 30 entries. We will refer to this array as mean_data.
    • The entry mean_data[0] is determined from the contents of temp00.txt. Similarly, mean_data[1]is determined from the contents of temp01.txt and so on.
    • In order to compute mean_data[0], you first look up the integer in the first line of the corresponding file temp00.txt which is 4 in this case. The value of mean_data[0] is the mean of the first 4 numbers (i.e. 23.31, 20.89, 27.04, 27.50) in the second line of the file.
    • Similar, in order to compute mean_data[1], you first look up the integer in the first line of temp01.txt which is 2 in this case. The value of mean_data[1] is the mean of the first 2 numbers (i.e. 29.65, 25.46) in the second line of the file.
    • Other entries of mean_data are determined in the same way.

Note that you will need to put your Python file file_proc.py in the directory lab09A so that it can read the files temp00.txt, temp01.txt etc. This is because, by default, a Python program will look for files in the directory that it is in. However, it is possible to put the Python program file in one directory and the data files in a different directory. If you want to learn how to do this, you can consult this text, in particular Section 11.2.

If you want to check some answers for this question, see here.

Hints:

  • For this exercise, you will need to open a number of similarly named files one after another, you can use a loop and update the string filename every iteration. You can get some idea from the write_file_ex_final.py example in Week 8’s lecture.
  • Note that the numbering of the filenames is 00, 01, .., 09, 10, 11 etc. It means that you need to pad a zero when the index is a single digit number. You can use a if-statement to handle that.
  • You can find some examples on reading files in file_read.py that we discussed in the lecture.

Part B: numpy computation

This exercise is based on the Python file distance.py. We suggest that you download the file distance.py and open it in Spyder because it will make it easier for you to follow the description below.

The Python file distance.py defines two numpy arrays with the name pos and ref. The array pos has a shape of (6,2). You can consider each row of pos is used to store the position of an object. For example, pos[0,0] and pos[0,1] store, respectively, the x- and y-coordinates of the first object.

The array elements ref[0] and ref[1] store, respectively, the x- and y-coordinates of a reference point.

Your task is to compute the distance between each of the 6 objects in pos from the reference point. If the co-ordinates of the object is (x,y) and those of the reference point are (a,b), then their distance is given by the formula:

We require that you complete this task using numpy functions and arithmetic operators. You are not allowed to use any loops. You can also find the expected answers in the file distance.py.

Hints: You will need to use numpy broadcasting discussed in Week 8’s lecture and some numpy mathematical functions (link to numpy manual page on maths functions)

Part C: numpy data analysis

In Part B of Week 7’s lab, you used numpy to analyse the data on the sea ice extent. In this exercise, you will use the same data set to perform some additional data analysis using numpy.

You will need to download these two files: sea_ice.txt (the data file, which was also used in Week 7) and sea_ice_lab09.py which is the Python file that you will use to complete this exercise.

The existing code in sea_ice_lab09.py does exactly the same preliminary processing on the data in sea_ice.txt as we asked you to do in Week 7. You had to type the code yourselves in Week 7 but we have done it for you this time. If you run the existing code in sea_ice_lab09.py, it will create three numpy arrays and plot a graph. The numpy arrays created are: years, months, data_sea_ice. Some information that you need to know are:

  • The data set contains data that began in year 1979 and ended in 2013, i.e. over 35 years. The numpy arrays years is [1979, 1980, … , 2013].
  • For each year, 24 measurements were made at half-monthly interval. The numpy array months is [0.5, 1, 1.5, …, 12] which has 24 elements.
  • The measured sea ice extent is stored in the 2-dimensional numpy array data_sea_ice whose shape is (35,24). Each row of data_sea_ice contains the 24 half-monthly measurements taken over a year. For example, the element data_sea_ice[2,5] contains the sea-ice extend in Year 1981 (= years[2]) at Month 3 (= months[5]).

The graph plotted by sea_ice_lab09.py contains 35 curves which show the annual variation of sea ice extent. You can see that the sea ice extent is larger in the beginning of the year (which corresponds to the Northern winter) than the later part of the year.

The following is a number of questions on the data set which you should answer using numpy. There is a restriction that you must not use any loops in your answer. If you want to check your answers, some of them are on this page.

  1. Compute the mean sea ice extent using all the measurements in years 1987 to 1999 inclusively. There are multiple methods to do this but we ask you to use Boolean indexing.
    • Note: You will be using 13 years of data, so you will be averaging 13 x 24 measurements.
  2. Compute the mean sea ice extent using all the data in the last 3 months of all the years in the data set. We would like you to use two different methods to answer this question. The first method is to use the colon (:) notation to select the appropriate columns and the second one is to use Boolean indexing. Both methods should give you the same answer. If you get the answer 10.663 or 9.477, then something is incorrect.
    • Note: You will be using 35 years of data and 6 measurements per year, so you will be averaging 35 x 6 measurements.
  3. Compute the mean sea ice extent using all the data in the first 6 months of the years 2000-2009 inclusively. We ask you to do it with Boolean indexing.
    • Note: You will be using 10 years of data and 12 measurements per year, so you will be averaging 10 x 12 measurements.
    • Hint: For this question, you will need to select some rows and some columns. The numpy function ix_(), which was discussed in Week 8’s lecture in the file numpy_slicing_2.py, will be useful.
  4. The 2-dimensional array data_sea_ice contains half-monthly data, but you want to work with monthly data. You want to obtain a 2-dimensional array sea_ice_monthly which contains monthly data. The matrix should have 35 rows and 12 columns. Column 1 contains the average of the two measurements in January, Column 2 for February, etc. The problem is to compute sea_ice_monthlyfrom data_sea_ice without using any loops. This question is identical to Question 8 in Week 7’s lab on sea ice. You were asked to use the reshape function in that exercise. In this question, we ask you to use the double colon (::) notation to select the correct columns.
    • Hint: You will need to form two sub-arrays by selecting the appropriate columns. One sub-array has the columns corresponding to months 0.5, 1.5, 2.5, … 11.5. The other sub-array has the columns corresponding to months 1, 2, 3, …, 12.
    • The double colon notation was discussed in numpy_slicing_1.py
  5. What is the largest decrease in sea ice extent between two consecutive half-monthly measurements?
    • It is not possible to directly use the array data_sea_ice to answer this question. This is because it is not easy to compute the difference between the sea ice extents in the second half of December (last column of data) and first half of January (first column of data). It is easier if the half-monthly data are arranged chronologically in a 1-dimensional array. In numpy, the process of turning a 2-dimensional array into a 1-dimensional array is call “ravel” which we discuss in Week 6’s lecture. The “ravelled” array will be useful for answering this question. Hint: You may also need some of these numpy functions: diff, max, min, abs.
    • The manual page for the numpy function ravel is here.
    • Note that if your answer is 2.824 then you probably have forgotten that we are looking for the largest decrease, i.e. you need to ignore the increases.
  6. Each year, the sea ice extent peaked in a certain half-month. Which half-month was the annual peak most frequently found?
    • Note that you can get a good guess of what the answer is from the plot.
    • Hint: These numpy functions will be useful argmax, unique. For the numpy function unique, if you use it with the two outputs and the option return_counts=True then you can get the frequency of occurrence. See the following example.

At the End of the Lab

You should be able to show your tutor the exercises. You should be comfortable with using: Python file processing; numpy broadcasting, indexing.

Finally, do not forget to complete your online multiple choice question if you have not done it yet.

If you have completed everything, please do not forget to logout. Simply double click on the “Log Out” icon


Answer:


HelpYourPaper此处内容已经被作者无情隐藏,请输入验证码查看内容。
验证码:
请关注本站微信公众号,回复“验证码”,获取验证码。在微信里搜索“HelpYourPaper”或者“helpyourpaper”或者微信扫描右侧二维码都可以关注本站微信公众号。

 

ENGG1811 Sample Mid-term ExaminationSession 2, 2018

August 20, 2018
Instructions
(a) This exam consists of 4 questions. Questions are of equal value. Answer all questions.
(b) Time allowed: 50 minutes including 5 minutes of reading time.
(c) Each question requires you to submit a separate Python program file for marking. Note the
following:
(i) Submission must be made using the Dragon icon located on the Desktop.
(ii) There is a tab for each question.
(iii) Each question requires a specific filename and the submission system will only accept that
particular filename.
(iv) Ensure that you
save your file before submission. If the submission system accepts your
file, it will also display the contents of your submitted file. This allows you to check.
(v) You can make multiple submissions during the exam. Only the last submitted file will be
marked.
(d) For each question, an associated test file or testing code has been provided to help you to test
your code. The testing code is rudimentary and is not thorough. The file or code is provided so
that you can write your own tests. It is your responsibility to test your code thoroughly before
submission.
(e) During the mid-term exam, you can use only these applications: Spyder3, a PDF file reader and
a calculator app.
(f) You are allowed to consult the following Python documentation in PDF format:
The Python Tutorial (Release 3.6.4).
The Python Tutorial is available from the background menu, which can be accessed by right-
clicking on the blue background.
(g) If you have accidentally killed Spyder3, the submission box, PDF file reader or file manager,
you can re-start them from the background menu. The calculator app is available on the back-
ground menu.
Note: The above instructions are almost the same as what you will see in the mid-term. For this
sample paper, you can access the The Python Tutorial (Release 3.6.4) using this clickable link.
Question 1
You have two sensor readings which are stored in two Python variables x and y of float type. Your
task is to write a Python code segment which determines a Python variable output_string, which
is a variable of string type, based on the values of x and y. The table below shows the value of
output_string for different values of x and y.
x 50 x > 50
y < 100 normal
y 100 alert poor-performance
Requirements and testing:
You must write the code segment in the template file q1.py provided. The submission system
will only accept this filename.
You must write your code segment to determine output_string between the two #-
*
*
*
lines in the file q1.py
You must use the variable names x, y and output_string
The string variable output_string can only take the values of normal, alert or poor-performance
The file q1.py contains only one test. You need to test your code for different values of x
and y before submission. You need to make sure that your test code are
not between the two
#-
*
*
*
lines.
Make sure that you
save your file before submission.
Question 2
Let x and y be two real numbers. Consider the mathematical functions g(x, y) and h(x, y):
g(x, y) = cos(x)
h(x, y) =
3
xy
Your task is to write a Python function q2_func, which has two inputs and two outputs, and has
the following def line:
def q2_func(x,y):
where you can assume the inputs x and y are variables of float type. You can associate the Python
variables x and y with, respectively, the real numbers x and y. If you call the Python function with
the following statement:
fg, fh = q2_func(x,y)
then the variables fg and fh should take on, respectively, the values of g(x, y) and h(x, y).
Requirements and testing:
You must write the function q2_func in a file with the filename q2.py. The submission
system will only accept this filename. A template file q2.py has been provided.
The file test_q2.py contains a rudimentary test. You need to test your function for different
values of x and y before submission. Note that you can assume that both x and y are non-zero.
You do not need to submit test_q2.py.
Make sure that you
save your file before submission

Question 2
Let x and y be two real numbers. Consider the mathematical functions g(x, y) and h(x, y):
g(x, y) = cos(x)
h(x, y) =
3
xy
Your task is to write a Python function q2_func, which has two inputs and two outputs, and has
the following def line:
def q2_func(x,y):
where you can assume the inputs x and y are variables of float type. You can associate the Python
variables x and y with, respectively, the real numbers x and y. If you call the Python function with
the following statement:
fg, fh = q2_func(x,y)
then the variables fg and fh should take on, respectively, the values of g(x, y) and h(x, y).
Requirements and testing:
You must write the function q2_func in a file with the filename q2.py. The submission
system will only accept this filename. A template file q2.py has been provided.
The file test_q2.py contains a rudimentary test. You need to test your function for different
values of x and y before submission. Note that you can assume that both x and y are non-zero.
You do not need to submit test_q2.py.
Make sure that you
save your file before submission
Question 4
Your task is to write a Python function q4_func, which has one input and one output, and has the
following def line:
def q4_func(x):
where you can assume the input x is a Python list whose entries are of the type float. The output of
the function is equal to
(sum of the 1
st
, 3
rd
, … entries of x) (sum of the 2
nd
, 4
th
, … entries of x)
For examples,
(a) If x is [2, -3.1, 4.2, 6.7, 5.5, 3.4], then the output is (2 +4.2 + 5.5) (3.1+
6.7 + 3.4) = 4.7
(b) If x is [2, -3.1, 4.2, 6.7, 5.5], then the output is (2 + 4.2 + 5.5) (3.1 + 6.7) =
8.1
Requirements and testing:
You must write the function q4_func in a file with the filename q4.py. The submission
system will only accept this filename. A template file q4.py has been provided.
The file test_q4.py checks the second example above. You need to test your function for
different values of x before submission.
You do not need to submit test_q4.py.
Make sure that you
save your file before submission.
————- End of exam ——————

COMP9417 18s1 Assignment 2 – project topics

COMP9417 18s1 Assignment 2 – project topics


0: Self-proposed

The objective of this topic is to propose a machine learning problem, source the dataset(s) and implement a method to solve it. This will typically come from an area of work or research of which you have some previous experience.


Topic 0: Propose your own topic

Task description
Up to you. The basic conditions are: Read More