ENGG1811 Assignment 1: Fault detection
Due date: 5pm, Friday 3 April (week 7). Late submissions will be penalised at the rate of 10% per day. The penalty applies to the maximum available mark. Submissions will generally not be accepted after 5pm, Monday 6 April, 2020.
Version: v1.03 on 11 March 2020.
(06/03/20) Please note that any updates or corrections will be summarized here.
(07/03/20) A minor correction on the section on “Determining the false alarms”. The correction is typeset in maroon.
(11/03/20) We used the word “positive” in the section on “Validity Checks” in earlier versions. There were questions on whether the number 0 would be considered to be positive. We have added the word strictly in front of positive to indicate the exclusion of zero. The update is typeset in maroon.
(11/03/20) In the section on “Validity Checks”, the sentence “The value of irradiance_sampling_time must to an integral multiple of the value of power_sampling_time.” has been replaced by “The value of power_sampling_time must to an integral multiple of the value of irradiance_sampling_time.” The update is typeset in maroon.
Automatic detection of faults can be found in many engineering systems. There are systems to automatically diagnose faults in engines, chemical plants, power generation plants, robotic arms and so on.
This assignment is inspired by a fault detection system in a photovoltaic (PV) plant [1]. A PV plant (Wikipedia page on PV power station) is a collection of solar panels which converts solar energy into electrical power. However, sometimes the plant does not work correctly, which results in, for example, less electrical power being generated than expected. If this is the case, the plant technicians should be alerted automatically so that they can fix the faults as soon as possible.
In this assignment, you will write Python programs to perform fault detection. The aim of your program is to process data sequences of solar irradiance and power to determine whether there are faults and if so, when they have occurred.
Note that we chose the word inspired earlier because we have adapted the fault detection problem in [1] as a programming assignment by simplifying and liberally changing a few aspects of the original problem. In particular, we have made changes so that, in this assignment, you will have to use the various Python constructs that you have learnt. This means a few details of this assignment may not be realistic in engineering terms, but on the whole, you will still get a taste of how programming can be used to perform automatic fault detection.
By completing this assignment, you will learn:
To apply basic programming concepts of variable declaration, assignment, conditional, functions, loops and import.
To use the Python data types: list, float, int and Boolean
To translate an algorithm described in a natural language to a computer language.
To organize programs into modules by using functions
To use good program style including comments and documentation
To practise software development, which includes incremental development, testing and debugging.
You are not allowed to use numpy for this assignment. This is an individual assignment, so no group work.
Key ideas behind the fault detection algorithm
The algorithm uses two sets of measurements. The first is the amount of solar irradiance which is the quantity of solar radiation falling on the solar panels. The second is the amount of electrical power generated by the solar panels; we will simply refer to that as power or power generated.
The key idea of the fault detection algorithm is to use the measured irradiance and power to determine whether a fault has occurred. For a given amount of irradiance, the algorithm uses a model (which in this case is a formula) to predict what the expected amount of power the PV plant should generate. After that, the algorithm compares the power predicted by the formula against the measured power. If the difference between these two quantities is too big then the algorithm will decide that a fault has occurred.
Requirements for fault detection
This section describes the requirements on the fault detection algorithm that you will be programming in this assignment. You should be able to implement these requirements by using only the Python skills that you have learnt in the first four weeks of lectures in this course.
We begin by describing the data that the algorithm will operate on, using the following Python code as an example. We will refer to this code as the sample code. Note that the data and parameter values in the sample code are for illustration only; your code should work with any allowed input data and parameter values.
# Data: irradiance and power
# Irradiance measurements in W/m^2
irradiance_time_series = [ 240.2, 220.1, 260.2, 280.7, 256.5,
320.3, 300.7, 267.1, 321.2, 234.5,
421.7, 476.2, 321.6, 329.7, 323.4,
407.9, 456.7, 489.3, 521.5, 534.6,
# Generated power measured in kW
power_time_series = [31.2, 27.5, 55.5, 44.2, 58.38, 53.52]
# Parameters for the fault detection algorithm
# Data sampling times in minutes
irradiance_sampling_time = 12
power_sampling_time = 60
# Parameters of the model to predict the power generated for
# a given level of irradiance
a0 = 0.086
a1 = 3.44e-5
a2 = 3e-3
model_para = [a0, a1, a2]
# Margin in power measurement to decide whether it is a fault or not
margin = 10.0 # in kW
# Call the fault detection function
import fault_detection_main as fd
fault_status_output = fd.fault_detection_main(irradiance_time_series, power_time_series, irradiance_sampling_time, power_sampling_time, model_para, margin)
In the sample code, there are two data series which contain, respectively, the irradiance and power measurements. Both series are Python lists whose entries are of the float type. Their variable names are irradiance_time_series and power_time_series. The irradiance is measured in Watts per square metre and power generated is measured in kilowatts.
In the sample code, the irradiance and power measurements were collected once every 12 and 60 minutes respectively. These values are stored in the variables irradiance_sampling_time and power_sampling_time.
(Remark: In [1], the irradiance measurements were taken once every 5 s, which is a more realistic sampling time. We have chosen a sampling time of 12 minutes for irradiance so that the list irradiance_time_series will not be excessively long in this example.)
We break the algorithm down into a number of steps. The first step is to compute the average of the irradiance data.
(Averaging the irradiance data)
Since irradiance and power were measured every 12 and 60 minutes respectively, there are 5 irradiance samples within the duration of one power sample. We assume that the first power measurement power_time_series[0] corresponds to the first 5 irradiance measurements:
irradiance_time_series[0], irradiance_time_series[1], irradiance_time_series[2], irradiance_time_series[3], irradiance_time_series[4].
Similarly, the second power measurement power_time_series[1] corresponds to the next 5 irradiance measurements:
irradiance_time_series[5], irradiance_time_series[6], irradiance_time_series[7], irradiance_time_series[8], irradiance_time_series[9].
The same applies to power_time_series[2] and power_time_series[3].
Although we can make a correspondence between power_time_series[4] and the last two irradiance measurements, the correspondence is incomplete and therefore these data are not usable. Also, there are no irradiance measurements corresponding to the last power measurement, power_time_series[5], which means this power measurement is not usable.
Since we can only make complete correspondences between the first 4 power measurements and the first 20 irradiance measurements, we will only use these measurements for fault detection.
We will divide the first 20 irradiance measurements into non-overlapping segments of 5 data points and compute the average of each segment. This is so that each segment of irradiance measurements corresponds to one power measurement. The table below illustrates the computation. We have added a segment number so that we can refer to them later on. Note that the segment number also corresponds to the indices in the variable power_time_series.
Segment number | Data in the segment from irradiance_time_series | Average | Average (rounded)
0 | 240.2, 220.1, 260.2, 280.7, 256.5 | (240.2 + 220.1 + 260.2 + 280.7 + 256.5) / 5 | 251.54
1 | 320.3, 300.7, 267.1, 321.2, 234.5 | (320.3 + 300.7 + 267.1 + 321.2 + 234.5) / 5 | 288.76
2 | 421.7, 476.2, 321.6, 329.7, 323.4 | (421.7 + 476.2 + 321.6 + 329.7 + 323.4) / 5 | 374.52
3 | 407.9, 456.7, 489.3, 521.5, 534.6 | (407.9 + 456.7 + 489.3 + 521.5 + 534.6) / 5 | 482.00
For the irradiance_time_series given in the sample code, we can summarize this averaging as returning a list whose entries are [251.54, 288.76, 374.52, 482.00]. For ease of reference, we will refer to this list by using the name irradiance_time_series_average later on.
Note that we rounded the numbers in the last column to 2 decimal places for display only. You should not be rounding any of your calculations in this assignment.
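The averaging described above can be sketched as follows. This is only an illustration: the helper name segment_averages is hypothetical (it is not one of the template functions), and the sketch assumes that the number of entries is an exact multiple of the segment length.

```python
# Illustrative sketch only. segment_averages is a hypothetical name and
# the input length is assumed to be an exact multiple of segment_length.
def segment_averages(values, segment_length):
    """Average consecutive, non-overlapping segments of a list."""
    averages = []
    for start in range(0, len(values), segment_length):
        segment = values[start:start + segment_length]
        averages.append(sum(segment) / segment_length)
    return averages


# The first 20 irradiance measurements from the sample code
first_20 = [240.2, 220.1, 260.2, 280.7, 256.5,
            320.3, 300.7, 267.1, 321.2, 234.5,
            421.7, 476.2, 321.6, 329.7, 323.4,
            407.9, 456.7, 489.3, 521.5, 534.6]

irradiance_time_series_average = segment_averages(first_20, 5)
```

The four entries of the resulting list agree with the averages in the table once they are rounded to 2 decimal places; the unrounded values are the ones to carry into later steps.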
(Use the average irradiance and model to predict the expected power generated)
The next step is to use the average irradiance in each segment to predict the expected amount of power generated. To do that, we use a model (which in this case is a formula) to calculate the expected power from irradiance. We first define some notation:
Let G and P denote, respectively, average irradiance and power generated.
a0, a1 and a2 are three coefficients of the model.
The formula is:
P = G (a0 + a1 G + a2 log(G))
where log is the natural logarithm.
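As a quick check of the formula, the sketch below evaluates it for the first average irradiance of the sample code. The function name predicted_power is hypothetical and is used for illustration only; math.log is the natural logarithm in Python.

```python
import math


# Illustrative sketch only. predicted_power is a hypothetical name.
def predicted_power(g, a0, a1, a2):
    """Evaluate P = G (a0 + a1 G + a2 log(G)) with the natural logarithm."""
    return g * (a0 + a1 * g + a2 * math.log(g))


# Model coefficients from the sample code and the average irradiance of
# segment 0
p0 = predicted_power(251.54, 0.086, 3.44e-5, 3e-3)
```

Here p0 comes out at about 27.98 kW when rounded to 2 decimal places for display.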
By using the values of a0, a1 and a2 from the sample code, and the average irradiance calculated earlier, we can calculate the expected power generated for each time segment:
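Segment number | Average irradiance | Predicted power generated (rounded to 2 decimal places for display only)
0 | 251.54 | 27.98
1 | 288.76 | 32.61
2 | 374.52 | 43.69
3 | 482.00 | 58.38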
(Compare the predicted power generated against the measured power to determine whether there is a fault – FOR ONE POWER SAMPLE)
The next step is to compare the predicted power against the measured power. We will use the algorithmic parameter margin which is defined in the sample code.
If the value of measured power minus predicted power is less than or equal to margin and bigger than or equal to -margin, then the decision is that there is no fault because the measured power is sufficiently close to the predicted power; otherwise, there is a fault. For example, by using the values of margin from the sample code, we have:
For segment number 0, the average irradiance is 251.54 W/m^2 which gives a predicted power generation of 27.98 kW. The measured power is 31.2 kW, which is 3.22 kW higher than the predicted value. Since the difference is within the margin, there is no fault. We will use the Boolean value of False to denote the absence of a fault.
For segment number 2, the average irradiance is 374.52 W/m^2 which gives a predicted power generation of 43.69 kW. The measured power is 55.5 kW, which is 11.81 kW higher than the predicted value. Since the difference is higher than the margin, there is a fault. We will use the Boolean value of True to denote the presence of a fault.
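The decision rule for one sample can be sketched as a single comparison. The name is_fault is hypothetical, and the predicted powers used below are the rounded values quoted in the two examples, so this is an illustration rather than a reference implementation.

```python
# Illustrative sketch only. is_fault is a hypothetical name.
def is_fault(measured_power, predicted_power, margin):
    """Return True when measured minus predicted lies outside [-margin, margin]."""
    difference = measured_power - predicted_power
    return difference > margin or difference < -margin


margin = 10.0  # in kW, as in the sample code

segment_0 = is_fault(31.2, 27.98, margin)  # difference of 3.22 kW
segment_2 = is_fault(55.5, 43.69, margin)  # difference of 11.81 kW
```

With these inputs segment_0 is False and segment_2 is True. A difference exactly equal to margin or -margin counts as no fault, because the rule uses "less than or equal to" and "bigger than or equal to".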
(Performing fault detection for a time-series)
The above examples show how the fault detection is to be performed for two power measurements. The following table summarizes the result of fault detection for the time series.
Segment number | Average irradiance | Predicted power generated | Measured power | Measured power minus predicted power | Fault (True if it is a fault)
0 | 251.54 | 27.98 | 31.2 | 3.22 | False
1 | 288.76 | 32.61 | 27.5 | -5.11 | False
2 | 374.52 | 43.69 | 55.5 | 11.81 | True
3 | 482.00 | 58.38 | 44.2 | -14.18 | True
We will use a list to indicate when the faults occurred. For the above example, we will represent the faults in the data series using [2,3] because the measurements power_time_series[2] and power_time_series[3] are determined to be faults.
In the case where there are no faults, we will indicate that by an empty list [ ].
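A minimal sketch of how the list of fault indices can be built is given below. It uses the rounded predicted powers quoted in this section for illustration only; the variable names measured, predicted and fault_indices are hypothetical.

```python
# Illustrative sketch only, using rounded values quoted in this section.
measured = [31.2, 27.5, 55.5, 44.2]       # first 4 power measurements, kW
predicted = [27.98, 32.61, 43.69, 58.38]  # rounded predicted powers, kW
margin = 10.0                             # in kW

fault_indices = []
for index in range(len(measured)):
    difference = measured[index] - predicted[index]
    # A fault is declared when the difference is outside [-margin, margin]
    if difference > margin or difference < -margin:
        fault_indices.append(index)
```

For these values fault_indices is [2, 3]. If no difference falls outside the margin, the list simply stays empty.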
The following figure illustrates the fault detection decision making. The solid blue dots show the predicted power generated for the average irradiance. The vertical lines are centred at the predicted power and have a height of 2*margin. The power measurements are plotted with crosses. If the cross is within the vertical line, then it is not a fault; otherwise, it is.
(Determining the false alarms)
After a fault detection algorithm has been designed, the engineers will want to check how good the algorithm is at catching the faults. One way that they can do that is to monitor the PV plant manually to determine whether actual faults have occurred. There are two possible types of error:
We say a false alarm has occurred when the fault detection algorithm decides it is a fault but in reality it is not.
We say a missed detection has occurred when the fault detection algorithm decides it is not a fault but in reality it is.
Let us follow on from the above example. The fault detection algorithm says the power measurements [2,3] are faults. Let us, for the sake of illustration, say that the real faults are [1,2]. In this case, the real fault at 1 is a missed detection because it is not detected by the detection algorithm. On the other hand, the fault detection algorithm claims that there is a fault at 3 but it is in fact a false alarm. Suppose we store the results from the fault detection in a list called your_fault_list and the real faults in a list called real_fault_list. For this example:
your_fault_list = [2,3]
real_fault_list = [1,2]
A task for this assignment is to determine the false alarms from the given your_fault_list and real_fault_list. For this assignment, you will store the false alarms in a list. In this example, it is [3]. In the case where there are no false alarms, that should be indicated by an empty list [ ].
Note that the engineers should also be interested in missed detection, but the calculation is very similar to false alarms, so we will not ask you to do that.
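The two kinds of error can be illustrated with Python's membership test. The sketch below is one possible way to express the idea, not a prescribed solution; the names false_alarms and missed_detections are hypothetical.

```python
# Illustrative sketch only.
your_fault_list = [2, 3]
real_fault_list = [1, 2]

# False alarms: reported by the algorithm but not real faults
false_alarms = []
for index in your_fault_list:
    if index not in real_fault_list:
        false_alarms.append(index)

# Missed detections: real faults the algorithm did not report
# (shown for comparison only)
missed_detections = []
for index in real_fault_list:
    if index not in your_fault_list:
        missed_detections.append(index)
```

For this example false_alarms is [3] and missed_detections is [1].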
Validity Checks
The description above shows how the data (irradiance_time_series, power_time_series) and algorithmic parameters (irradiance_sampling_time, power_sampling_time, model_para, margin) are used to determine when the faults occur. Note that the algorithmic parameters must be valid so that the computation can be carried out. We require that your code performs a number of validity checks before determining if there are any faults. For example, we assume that the algorithmic parameter irradiance_sampling_time is required to be a strictly positive integer. The following table states the requirements for the algorithmic parameters to be valid and what assumptions you can make when testing.
Algorithmic parameters | Requirements for the parameter to be valid | Assumptions you can make when testing or further explanation
irradiance_sampling_time | Data type must be int and its value must be strictly positive | Examples of invalid parameter values are -5, -5.2, 5.7. You can assume that, when we test your code, irradiance_sampling_time is always a number.
power_sampling_time | Data type must be int and its value must be strictly positive | You can assume that, when we test your code, power_sampling_time is always a number.
power_sampling_time | The value of power_sampling_time must be an integral multiple of the value of irradiance_sampling_time | For example, if power_sampling_time is 12 and irradiance_sampling_time is 7, then the given parameters are invalid because 12 is not an integral multiple of 7. You can also assume that power_sampling_time and irradiance_sampling_time are given in the same unit.
model_para | Must have exactly 3 entries in the list | You can assume that the given model_para is always a list and its entries are always numbers (int or float). For example, if the given model_para has four entries, then it is invalid.
margin | Must be a strictly positive number | You can assume that the given margin is always a number (int or float).
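The requirements in the table can be sketched as a sequence of checks. This is a hedged illustration: the function name parameters_look_valid is hypothetical, and the sketch reads "data type must be int" literally, so a float such as 5.0 is treated as invalid.

```python
# Illustrative sketch only. parameters_look_valid is a hypothetical name.
def parameters_look_valid(irradiance_sampling_time, power_sampling_time,
                          model_para, margin):
    """Return True when every requirement in the table is met."""
    # type(x) is int rejects floats such as 5.7 (and also 5.0)
    if type(irradiance_sampling_time) is not int or irradiance_sampling_time <= 0:
        return False
    if type(power_sampling_time) is not int or power_sampling_time <= 0:
        return False
    # power_sampling_time must be an integral multiple of
    # irradiance_sampling_time
    if power_sampling_time % irradiance_sampling_time != 0:
        return False
    if len(model_para) != 3:
        return False
    if margin <= 0:
        return False
    return True


model = [0.086, 3.44e-5, 3e-3]
sample_ok = parameters_look_valid(12, 60, model, 10.0)
not_multiple = parameters_look_valid(7, 12, model, 10.0)
```

Here sample_ok is True and not_multiple is False. The sampling times are checked before the modulo operation so that a zero or negative value never reaches it.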
Dealing with different amounts of data
The above sample code shows the situation where the overall duration of power measurements (6 samples times 60 minutes = 360 minutes) is more than that of the irradiance measurements (22 samples times 12 minutes = 264 minutes). The above example shows that we should only be using the first 4 power measurements and the first 20 irradiance measurements.
Another situation is when the overall duration of power measurements is less than that of the irradiance measurements. Consider the following code:
# Irradiance measurements in W/m^2
irradiance_time_series = [ 240.2, 220.1, 260.2, 280.7,
320.3, 300.7, 267.1, 321.2,
421.7, 476.2, 321.6, 329.7,
407.9, 456.7, 489.3, 521.5,
# Generated power measured in kW
power_time_series = [31.2, 27.5]
# Data sampling times in minutes
irradiance_sampling_time = 15
power_sampling_time = 60
From the sampling times, we know that 1 power measurement corresponds to 4 irradiance measurements. In this case, all the power measurements and the first 8 irradiance measurements should be used to determine the faults.
When the overall duration of power measurement is equal to that of irradiance measurements, you should use all the measurements.
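The three situations above reduce to one calculation: count the complete irradiance segments, then keep the smaller of that count and the number of power measurements. The sketch below illustrates this; the function name usable_counts is hypothetical, and the count of 16 irradiance values in the second call is an assumption based on the values visible in the listing.

```python
# Illustrative sketch only. usable_counts is a hypothetical name.
def usable_counts(num_irradiance, num_power,
                  irradiance_sampling_time, power_sampling_time):
    """Return (usable power samples, usable irradiance samples)."""
    samples_per_segment = power_sampling_time // irradiance_sampling_time
    complete_segments = num_irradiance // samples_per_segment
    num_segments = min(num_power, complete_segments)
    return num_segments, num_segments * samples_per_segment


# Sample code: 22 irradiance and 6 power measurements, 12 and 60 minutes
first_case = usable_counts(22, 6, 12, 60)

# Second listing: 2 power measurements, 15 and 60 minutes
# (16 irradiance values assumed)
second_case = usable_counts(16, 2, 15, 60)
```

first_case is (4, 20) and second_case is (2, 8), matching the counts stated in the text.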
Checking whether there are enough data
In order for the fault detection algorithm to run, there must be enough power and irradiance measurements. The requirements are:
There must be at least one entry in power_time_series, and,
The overall duration of the irradiance_time_series must be longer than or equal to one power_sampling_time.
You can assume that, when we test your assignment, both irradiance_time_series and power_time_series are lists, and their entries are always of the float type. You can assume that the entries in irradiance_time_series are greater than or equal to one, so that you will not have problems computing the logarithm.
You need to implement the following six functions. The first five functions, working together, implement the fault detection algorithm. The sixth function finds the false alarms.
The requirement is that you implement each function in a separate file. This is so that we can test them independently; how we do this is explained later in this document. We have provided template files, see Getting Started.
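The two requirements on the amount of data can be sketched as follows. The function name has_enough_data is hypothetical, and the sketch takes "overall duration" to mean the number of samples times the sampling time, as in the durations computed earlier for the sample code.

```python
# Illustrative sketch only. has_enough_data is a hypothetical name.
def has_enough_data(num_irradiance, num_power,
                    irradiance_sampling_time, power_sampling_time):
    """Check both data requirements listed above."""
    if num_power < 1:
        return False
    irradiance_duration = num_irradiance * irradiance_sampling_time
    return irradiance_duration >= power_sampling_time


enough = has_enough_data(22, 6, 12, 60)     # 264 minutes against 60 minutes
too_short = has_enough_data(4, 1, 12, 60)   # 48 minutes against 60 minutes
```

Here enough is True and too_short is False.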
1. def calc_average(time_series, segment_length):
This function has 2 inputs. The input time_series is a list and the input segment_length is an integer.
This function should divide the input time_series into a number of non-overlapping segments where the number of entries in each segment is given by the input segment_length. It should compute the average of each segment and store the results in a list.
For example, consider the example described under the heading (Averaging the irradiance data).
The input time_series is a list containing the first 20 entries of irradiance_time_series and segment_length is 5.
The output to be returned is the list irradiance_time_series_average mentioned there.
This function can be tested using the file test_calc_average.py
2. def power_prediction(irradiance_average_one_sample, model_para):
The aim of this function is to compute the expected power generated for one value of average irradiance. An explanation of this computation is given earlier under the heading Use the average irradiance and model to predict the expected power generated where a formula in the irradiance G and model parameters a0, a1 and a2 are used to predict the power generated.
The function has 2 inputs. The input irradiance_average_one_sample is a number which has the same meaning as the symbol G. The input model_para is the list [a0, a1, a2] which contains the model parameters.
The function should return one output which is the predicted power. The output is a float.
This function can be tested using the file test_power_prediction.py.
3. def fault_detection_one_sample(irradiance_average_one_sample, power_one_sample, model_para, margin):
The aim of this function is to use one value of average irradiance, one power measurement, the model parameters and the margin to determine whether the given power measurement is a fault or not.
An explanation of this computation is given earlier under the heading Compare the predicted power generated against the measured power to determine whether there is a fault – FOR ONE POWER SAMPLE.
The function should return one output which is of Python bool (Boolean) type.
This function requires power_prediction(). An import line has been included in the template file for you. Please do not change it.
This function can be tested using the file test_fault_detection_one_sample.py.
4. def fault_detection_time_series(irradiance_time_series_average, power_time_series, model_para, margin):
The aim of this function is to use the time series of average irradiance and power measurements to determine whether there are faults.
An example of the computation is given earlier under the heading Performing fault detection for a time-series
The function should return one output which is a list. The list should contain the indices in power_time_series that correspond to faults. The list should be empty if there are no faults.
You are expected to use fault_detection_one_sample() to complete this function. An import line has been included in the template file for you. Please do not change it.
This function can be tested using the file test_fault_detection_time_series.py.
5. def fault_detection_main(irradiance_time_series, power_time_series, irradiance_sampling_time, power_sampling_time, model_para, margin):
This function is called after all the input data have been specified, see the last line in the sample code.
The function has 6 inputs. The names for the inputs have been chosen to match their roles in the description earlier.
The function should return one output which can be a list (possibly empty) or a string depending on the situation
The expected steps within the function fault_detection_main() are:
The function should first check whether all algorithmic parameters are valid. If any algorithmic parameter is invalid, the function should return the string ‘Corrupted input’. It should not proceed to execute the next two steps. See the section with heading Validity Checks for the requirements on the algorithmic parameters.
If all algorithmic parameters are valid, the function should determine whether there are enough data for the calculations. If there are not enough data, the function should return the string ‘Not enough data’. It should not proceed to execute the next step. See the section with heading Checking whether there are enough data for the requirements.
If all algorithmic parameters are valid and there are enough data, then the function should proceed to determine the faults. The function should return a list. The list should contain the indices in power_time_series that correspond to faults. The list should be empty if there are no faults.
You can use the following test files: test_fault_detection_main_1.py and test_fault_detection_main_2.py.
For the tests in test_fault_detection_main_1.py, there are enough data and all algorithmic parameters are valid. Your code should proceed to determine the faults.
Test 1 in test_fault_detection_main_1.py is based on the sample code.
The test file test_fault_detection_main_2.py contains a number of test cases where the algorithmic parameters are invalid and/or there are not enough data. For all the test cases, the function should return a string.
This function requires the functions calc_average() and fault_detection_time_series(). Two import lines have been included in the template file for you. Please do not change them.
6. def find_false_alarms(your_fault_list, true_fault_list):
The aim of this function is to determine the false alarms.
An example of computing the false alarms is given earlier under the heading (Determining the false alarms). The names for the inputs have been chosen to match their roles in the description earlier.
This function should return one output which is a list of false alarms. The list should be empty when there are no false alarms.
This function can be tested using the file test_find_false_alarms.py.
Hint: The Python keyword in is useful here. You have seen how in is used with for, but there is another usage of in. You can type in the following lines of code in the Python console to see what the answers are:
6 in [2,6,7]
3 in [2,6,7]
Additional requirements: In order to facilitate testing, you need to make sure that within each submitted file, you only have the code required for that function. Do not include test code in your submitted file.
Getting Started
Download the zip file assign1_prelim.zip (which contains 6 template files and 7 test files) and unzip it. This will create the directory (folder) named ‘assign1_prelim’.
Rename/move the directory (folder) you just created named ‘assign1_prelim’ to ‘assign1’. The name is different to avoid possibly overwriting your work if you were to download the ‘assign1_prelim.zip’ file again later.
First browse through all the files provided including the test files.
(Incremental development) Do not try to implement too much at once, just one function at a time and test that it is working before moving on.
Start by implementing the first function, test it properly using the given test file, and once you are happy, move on to the second function, and so on.
Please do not use ‘print’ or ‘input’ statements. We won’t be able to assess your program properly if you do. Remember, all the required values are part of the parameters, and your function needs to return the required answer. Do not ‘print’ your answers.
Test your functions thoroughly before submission.
You can use the provided Python programs (files like test_calc_average.py etc.) to test your functions. Please note that each file covers a limited number of test cases. We have purposely not included all the cases because we want you to think about how you should be testing your code. You are welcome to use the forum to discuss additional tests that you should use to test your code.
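When writing your own tests, keep in mind that floating-point results rarely match a hand-typed expected value exactly, so a test usually compares within a tolerance. The sketch below shows one way to do this with math.isclose from the standard library; the helper name lists_are_close and the tolerance value are assumptions made for illustration, and the provided test files may compare results differently.

```python
import math


# Illustrative sketch only. lists_are_close is a hypothetical name and
# the tolerance is an arbitrary choice.
def lists_are_close(actual, expected, tolerance=1e-6):
    """Return True if two lists of floats agree entry by entry."""
    if len(actual) != len(expected):
        return False
    for a, e in zip(actual, expected):
        if not math.isclose(a, e, abs_tol=tolerance):
            return False
    return True


exact_match = [0.1 + 0.2] == [0.3]                   # exact comparison
tolerant_match = lists_are_close([0.1 + 0.2], [0.3])  # within tolerance
```

Here exact_match is False while tolerant_match is True, because 0.1 + 0.2 is not stored as exactly 0.3.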
We will test each of your files independently. Let us give you an example. Let us assume we are testing three files: prog_a.py, prog_b.py and prog_c.py. These files contain one function each and they are: prog_a(), prog_b() and prog_c(). Let us say prog_b() calls prog_a(); and prog_c() calls both prog_b() and prog_a(). We will test your files as follows:
We will first test your prog_a().
When we test your prog_b(), we will test your prog_b() together with our working version of prog_a(). In this way, if your prog_a() does not work for some reason, there is a chance that your prog_b() may work and you may still receive marks for prog_b().
When we test your prog_c(), we will test your prog_c() together with our working version of prog_a() and prog_b().
You need to submit the following six files. Do not submit any other files. For example, you do not need to submit your modified test files.
To submit this assignment, go to the Assignment 1 page and click the tab named “Make Submission”.
We will test your program thoroughly and objectively. This assignment will be marked out of 27 where 21 marks are for correctness and 6 marks are for style.
The 21 marks for correctness are awarded according to these criteria.
Criteria | Nominal marks
Function calc_average.py | 3
Function power_prediction.py | 3
Function fault_detection_one_sample.py | 3
Function fault_detection_time_series.py | 3
Function fault_detection_main.py, Case 1: Expected output is the string ‘Corrupted input’ | 2
Function fault_detection_main.py, Case 2: Expected output is the string ‘Not enough data’ | 1
Function fault_detection_main.py, Case 3: Expected output is a list or an empty list | 3
Six (6) marks are awarded by your tutor for style and complexity of your solution. The style assessment includes the following, in no particular order:
Use of meaningful variable names where applicable
Use of sensible comments to explain what you’re doing
Use of docstring for documentation to identify purpose, author, date, data dictionary, parameters, return value(s) and program description at the top of the file
You are reminded that work submitted for assessment must be your own. It’s OK to discuss approaches to solutions with other students, and to get help from tutors, but you must write the Python code yourself. Sophisticated software is used to identify submissions that are unreasonably similar, and marks will be reduced or removed in such cases.
We will run Help Sessions for this assignment during Weeks 4-7. These are face-to-face consultations in the lab on a first-come, first-served basis. The timetable for the Help Sessions can be found on the course website.
Use the forum to ask general questions about the assignment, but take specific ones to Help Sessions.
Keep an eye on the course webpage notice board for updates and responses.
Remarks and reference:
Note that some aspects of this assignment are not realistic. We mentioned the sampling time of irradiance earlier. Also, we have neglected the dependence on temperature, which is considered in [1].
[1] R. Platon et al., Online Fault Detection in PV Systems. IEEE Transactions on Sustainable Energy, Vol. 6, No. 4, Pages 1200-1207, October 2015. https://ieeexplore.ieee.org/document/7098398