### ECMT6002: Econometric Applications

THE UNIVERSITY OF SYDNEY
FACULTY OF ARTS AND SOCIAL SCIENCES
SCHOOL OF ECONOMICS
ECMT6002: Econometric Applications
Final exam
Due on 19 June 2020, by 11:59 PM
Instructions:
- This final exam consists of five questions. Each question is worth 20 marks, so the entire exam is
worth 100 marks. This exam contributes 50% towards your final mark for this unit.
- Please answer all parts of all questions. There is no minimum or maximum number of words.
- When performing statistical tests, always state the null and alternative hypotheses, the test statistic
and its distribution under the null hypothesis, the rejection rule, the level of significance, and the
conclusion of the test. If no significance level is specified, please assume that it is 5%.
- Do not worry too much about making formulas look pretty; “alpha4hat / (gamma1 – gamma2)” is
a perfectly acceptable way to type “$\hat{\alpha}_4 / (\gamma_1 - \gamma_2)$”. Handwritten answers can unfortunately not be
accepted for this exam.
- Submit your answers no later than 11:59 PM on Friday 19 June 2020, using the TurnItIn tool on
the Canvas Final Exam page for this unit. (Note that this is a different page than the regular Canvas
page for this unit.) In order to comply with our anonymous marking policy, please ensure that your
- This exam is not timed and may obviously be completed with open book/notes/anything. In particular,
you can find any critical values that you need in Wooldridge’s textbook.
- However, this is still an individual exam. I trust you to do the right thing, but any suspected
plagiarism or collaboration will be referred to the Academic Integrity Board.
Good luck!
Question 1. We have collected seventy years of annual data on the average price of all houses sold
in Sydney during each year, $y_t$ in dollars, and the average temperature at a research station in Central
Greenland during the same year, $x_t$ in hundredths of degrees Celsius. We decide to use OLS to estimate
the model $\ln(y_t) = \alpha + \beta x_t + u_t$, and we find that $\hat{\beta} = 0.04$ with standard error $0.01$.
(a) (2 marks) What is the economic interpretation of the number 0.04 here?
(b) (2 marks) Show that $H_0: \beta = 0$ can be rejected here, against the two-sided alternative. Also argue
why this is a strange result.
(c) (6 marks) Discuss which econometric problem could have caused this strange result.
(d) (5 marks) How would you test whether the problem in part (c) is in fact present?
(e) (5 marks) If the problem from parts (c) and (d) was found to be present and you still wanted to
estimate $\beta$, how would you do that?
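
The arithmetic behind a two-sided test of a zero slope can be sketched as follows. This is only an illustration of the mechanics, not a model answer: it plugs in the estimate and standard error reported in Question 1, and it assumes a standard normal approximation to the t distribution (with seventy observations and two estimated parameters, the exact critical value from the t table would be slightly larger).

```python
from statistics import NormalDist

slope_hat = 0.04      # OLS slope estimate reported in Question 1
se_slope = 0.01       # its reported standard error
alpha_level = 0.05    # default significance level from the exam instructions

# Test statistic for H0: slope = 0 against the two-sided alternative
t_stat = slope_hat / se_slope

# Assumption: normal approximation instead of the exact t(68) critical value
critical_value = NormalDist().inv_cdf(1 - alpha_level / 2)

reject_null = abs(t_stat) > critical_value
print(f"t = {t_stat:.2f}, critical value = {critical_value:.3f}, reject H0: {reject_null}")
```

The rejection rule is the usual one: reject when the absolute value of the statistic exceeds the critical value.
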
Question 2. In this question, we study the economy of two small neighbouring countries. Country 1
has GDP growth $y_{1t}$, which it mainly earns by exporting gold; the inflation of the price of its gold is
$x_{1t}$. Similarly, country 2 has GDP growth $y_{2t}$, which it mainly earns by exporting oil; the inflation of the
price of its oil is $x_{2t}$. Both countries are so small that they have no real market power in the international
commodities markets, so we can take $x_{1t}$ and $x_{2t}$ as exogenous. Two macroeconomic shocks $e_{1t}$ and $e_{2t}$
influence these economies, and they have all the nice properties that we want: normal, homoskedastic,
not autocorrelated, and independent of each other. Long story short, we are studying the model

$$
\begin{aligned}
y_{1t} &= \alpha_1 + \beta_1 x_{1t} + e_{1t} + \gamma e_{2t} \\
y_{2t} &= \alpha_2 + \beta_2 x_{2t} + e_{2t}.
\end{aligned}
$$

We have been hired by the government of country 1, which is mainly interested in estimating $\beta_1$.
(a) (4 marks) Describe an economic argument for the presence of a “$+\gamma e_{2t}$” term in the first equation.
(b) (2 marks) Suppose that country 2 is unwilling to share its data, so all you have are the time series
$y_{1t}$ and $x_{1t}$. What is the best estimator $\hat{\beta}_1$ that you can produce? Motivate your answer.
For the remaining parts of this question, assume that you have managed to convince country 2 to share
its data, so you now have access to $y_{2t}$ and $x_{2t}$ in addition to $y_{1t}$ and $x_{1t}$.
(c) (5 marks) Argue why it should now be possible to improve on the estimator that you suggested
in part (b). In your answer, use the terms “(un)biased”, “(in)consistent”, and “(in)efficient”.
(d) (5 marks) Assume that $\gamma$ is known, but the remaining parameters are not. Write a regression model
that exploits all the data that you have, that can be estimated by OLS, and that provides
an estimator $\hat{\beta}_1$ that benefits from the improvements that you suggested in part (c).
(e) (4 marks) In reality, $\gamma$ is not known, so we will need to turn the GLS procedure from part (d) into
an FGLS procedure by coming up with an estimator $\hat{\gamma}$. Describe how to compute this $\hat{\gamma}$.
Question 3. We have collected data on a random sample of homeowners with a mortgage. For each of
them, we know their annual income at the time the house was bought (in thousands of dollars), the initial
size of the loan (also in thousands of dollars), as well as whether they have defaulted on their repayments
(dummy variable). We use Stata to estimate the following model.
. generate ln_income=ln(income)
. generate ln_loan=ln(loan)
. logit default ln_income ln_loan

Logistic regression: Number of obs = 117, LR chi2(2) = 7.03, Prob > chi2 = 0.0297, Log likelihood = -24.472186, Pseudo R2 = 0.0765

| default | Coef. | Std. Err. | z | P>\|z\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| ln_income | -57.47886 | 34.62703 | -1.66 | 0.097 | -125.3466 | 10.38887 |
| ln_loan | 56.38969 | 34.44848 | 1.64 | 0.102 | -11.12809 | 123.9075 |
| _cons | -126.8349 | 78.20496 | -1.62 | 0.105 | -280.1138 | 26.444 |

. margins, dydx(ln_income)

Average marginal effects: Number of obs = 117

| | dy/dx | Std. Err. | z | P>\|z\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| ln_income | -3.115679 | 2.044844 | -1.52 | 0.128 | -7.123499 | .8921409 |

. margins, dydx(ln_income) atmeans

Conditional marginal effects: Number of obs = 117

| | dy/dx | Std. Err. | z | P>\|z\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| ln_income | -2.428992 | 1.216568 | -2.00 | 0.046 | -4.813421 | -.0445629 |

(a) (3 marks) The last two tables report estimated marginal effects of –3.12 and –2.43, respectively.
Describe carefully how each of these numbers should be interpreted.
(b) (5 marks) A person has an annual income of 70 000 dollars and applies for a home loan of 700 000
dollars. Predict how likely it is that this person will default on their loan.
(c) (6 marks) Compute the marginal effect of a small change in income, holding the loan size constant,
on the probability in part (b).
(d) (3 marks) As you can see from the first table, ln_income and ln_loan are both individually
insignificant. Show that they are jointly significant.
(e) (3 marks) What is the economic intuition behind the contradictory results in part (d)?
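
The calculations that Question 3 relies on can be sketched in a few lines. This is a hedged illustration rather than a model answer: the coefficients are copied from the logit table, income and loan are entered in thousands of dollars as in the data description, and the variable names are introduced here for illustration only. The last step uses the fact that the upper tail probability of a chi-squared distribution with 2 degrees of freedom is exp(-x/2).

```python
import math

# Coefficients copied from the logit table in Question 3
b_cons, b_income, b_loan = -126.8349, -57.47886, 56.38969

def logistic(z):
    """Logistic CDF, which maps the logit index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Applicant from part (b); both amounts are in thousands of dollars
income, loan = 70.0, 700.0
index = b_cons + b_income * math.log(income) + b_loan * math.log(loan)
p_default = logistic(index)

# Marginal effect with respect to ln(income): coefficient times p(1 - p)
me_ln_income = b_income * p_default * (1.0 - p_default)
# Chain rule: d ln(income) / d income = 1 / income (per thousand dollars)
me_income = me_ln_income / income

# Joint significance: LR statistic with 2 degrees of freedom
lr_stat = 7.03
lr_p_value = math.exp(-lr_stat / 2.0)

print(f"P(default) = {p_default:.4f}")
print(f"marginal effect per extra thousand dollars of income = {me_income:.4f}")
print(f"LR test p-value = {lr_p_value:.4f}")
```

The computed LR p-value agrees with the `Prob > chi2 = 0.0297` line in the Stata output, which is below the default 5% level.
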
Question 4. Suppose that we have obtained survey data on the new price of the most expensive car
owned by each household. However, we do not know the exact price, but we do know whether it is \$0
(no car), in (0, 20k], in (20k, 50k], in (50k, 100k], or greater than \$100k. We are trying to model these
car prices as a function of household income, the number of children living in the household, and a set
of dummies indicating whether the household is located in an urban, suburban, regional, or rural area.
There are various models that we could use for this task.
(a) (4 marks) Between the multinomial logit and multinomial probit models, which one would you prefer?
(b) (4 marks) Between the multinomial probit and ordered probit models, which one would you prefer?
(c) (4 marks) Between the ordered probit and interval regression models, which one would you prefer?
For the remainder of this question, we will be interested in the marginal effect of an additional child
being born, holding everything else constant, on the price of a household’s main car. (They may want to
buy a larger car to make sure the entire family still fits inside; or they may want to downsize to a smaller
car because kids are expensive.) That is, we are trying to estimate $\partial E[\text{price}_i] / \partial \text{children}_i$.
(d) (4 marks) The ordered probit model includes a “$\ldots + \beta\,\text{children}_i + \ldots$” term. Does this
represent the desired marginal effect? Why or why not?
(e) (4 marks) The interval regression model also includes a “$\ldots + \beta\,\text{children}_i + \ldots$” term. Does this
represent the desired marginal effect? Why or why not?
Question 5. In this final question, we investigate how many times people visit a medical doctor over the
course of a year, based on whether or not they have a chronic disease, a disability, and health insurance,
as well as their gender and the size of their family. We obtain the following results; note that family size
is included in logarithms, so a person who lives alone would have ln_family $= \ln(1) = 0$.
. summarize visits

| Variable | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| visits | 20,190 | 2.860426 | 4.504365 | 0 | 77 |

. regress visits chron_dis disabled insured female ln_family, robust

| visits | Coef. | Robust Std. Err. | t | P>\|t\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| chron_dis | .1058531 | .006246 | 16.95 | 0.000 | .0936105 | .1180957 |
| disabled | 1.254728 | .1311765 | 9.57 | 0.000 | .9976112 | 1.511845 |
| insured | 1.595272 | .0611246 | 26.10 | 0.000 | 1.475463 | 1.715081 |
| female | .511249 | .0609712 | 8.39 | 0.000 | .3917405 | .6307575 |
| ln_family | -.2548986 | .0592984 | -4.30 | 0.000 | -.3711284 | -.1386688 |
| _cons | .263288 | .1121838 | 2.35 | 0.019 | .0433987 | .4831774 |

. poisson visits chron_dis disabled insured female ln_family, robust

| visits | Coef. | Robust Std. Err. | z | P>\|z\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| chron_dis | .0288251 | .001523 | 18.93 | 0.000 | .0258402 | .03181 |
| disabled | .320908 | .0325414 | 9.86 | 0.000 | .2571281 | .3846878 |
| insured | .7544583 | .0354006 | 21.31 | 0.000 | .6850744 | .8238423 |
| female | .1914363 | .0221097 | 8.66 | 0.000 | .148102 | .2347705 |
| ln_family | -.0749335 | .0197642 | -3.79 | 0.000 | -.1136706 | -.0361965 |
| _cons | -.0232276 | .0471662 | -0.49 | 0.622 | -.1156717 | .0692165 |

(a) (3 marks) Show that the number of doctor visits is overdispersed, and argue why one would have
expected this to be the case even before looking at any data.
(b) (2 marks) Consider a man who is part of a family of five, has no insurance (these are US data)
and no chronic diseases or disabilities. How often do we expect him to visit a doctor,
according to the OLS results?
(c) (7 marks) Repeat part (b) using the Poisson regression results. Also compute the expected probability
that this person will visit a doctor at least once during this year.
(d) (3 marks) Consider the first estimated regression coefficient in each table, so 0.1058531 for OLS
and 0.0288251 for Poisson regression. Describe carefully how each of these two numbers
should be interpreted.
(e) (5 marks) According to the Poisson regression model, what is the average marginal effect of having
a disability? And finally, what does this number mean?
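
The plug-in calculations that Question 5 asks about can be sketched as follows. This is an illustration under stated assumptions, not a model answer: the numbers are copied from the Stata output above, the variable names are hypothetical, and the probability in the last step assumes that visits are exactly Poisson distributed given the regressors, which the overdispersion check itself calls into question.

```python
import math

# Summary statistics of visits, copied from the summarize output
mean_visits, sd_visits = 2.860426, 4.504365
variance_visits = sd_visits ** 2
overdispersed = variance_visits > mean_visits  # Poisson would imply equality

# Man, family of five, uninsured, no chronic disease, no disability:
# only the constant and ln_family contribute to the fitted values.
ln_family = math.log(5)

# OLS: fitted value is linear in the regressors
ols_prediction = 0.263288 + (-0.2548986) * ln_family

# Poisson regression: conditional mean is exp(linear index)
poisson_mean = math.exp(-0.0232276 + (-0.0749335) * ln_family)

# Assumption: exact Poisson distribution, so P(visits = 0) = exp(-mean)
p_at_least_one = 1.0 - math.exp(-poisson_mean)

print(f"variance = {variance_visits:.3f} vs mean = {mean_visits:.3f}")
print(f"OLS prediction = {ols_prediction:.4f}")
print(f"Poisson mean = {poisson_mean:.4f}, P(at least one visit) = {p_at_least_one:.4f}")
```

Note that the OLS fitted value for this person comes out negative, which is one reason a count model with an exponential mean is attractive here.
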