### ECMT6002: Econometric Applications

THE UNIVERSITY OF SYDNEY

FACULTY OF ARTS AND SOCIAL SCIENCES

SCHOOL OF ECONOMICS

Final exam

Due on 19 June 2020, by 11:59 PM

Instructions:

This final exam consists of five questions. Each question is worth 20 marks, so the entire exam is worth 100 marks. This exam contributes 50% towards your final mark for this unit.

Please answer all parts of all questions. There is no minimum or maximum number of words.

When performing statistical tests, always state the null and alternative hypotheses, the test statistic and its distribution under the null hypothesis, the rejection rule, the level of significance, and the conclusion of the test. If no significance level is specified, please assume that it is 5%.

Please type your answers to all questions into one (Word/PDF/whatever works) document. Do not worry too much about making formulas look pretty; “alpha4hat / (gamma1 – gamma2)” is a perfectly acceptable way to type “$\frac{\hat{\alpha}_4}{\gamma_1 - \gamma_2}$”. Handwritten answers can unfortunately not be accepted for this exam.

Submit your answers no later than 11:59 PM on Friday 19 June 2020, using the TurnItIn tool on the Canvas Final Exam page for this unit. (Note that this is a different page than the regular Canvas page for this unit.) In order to comply with our anonymous marking policy, please ensure that your submission includes your SID but not your name.

This exam is not timed and may obviously be completed with open book/notes/anything. In particular, you can find any critical values that you need in Wooldridge’s textbook.

However, this is still an individual exam. I trust you to do the right thing, but any suspected plagiarism or collaboration will be referred to the Academic Integrity Board.

Good luck!

Question 1. We have collected seventy years of annual data on the average price of all houses sold in Sydney during each year, $y_t$ in dollars, and the average temperature at a research station in Central Greenland during the same year, $x_t$ in hundredths of degrees Celsius. We decide to use OLS to estimate the model $\ln(y_t) = \alpha + \beta x_t + u_t$, and we find that $\hat{\beta} = 0.04$ with standard error $0.01$.

(a) (2 marks) What is the economic interpretation of the number 0.04 here?

(b) (2 marks) Show that $H_0: \beta = 0$ can be rejected here, against the two-sided alternative. Also argue why this is a strange result.

(c) (6 marks) Discuss which econometric problem could have caused this strange result.

(d) (5 marks) How would you test whether the problem in part (c) is in fact present?

(e) (5 marks) If the problem from parts (c) and (d) was found to be present and you still wanted to estimate $\beta$, how would you do that?

Question 2. In this question, we study the economy of two small neighbouring countries. Country 1 has GDP growth $y_{1t}$, which it mainly earns by exporting gold; the inflation of the price of its gold is $x_{1t}$. Similarly, country 2 has GDP growth $y_{2t}$, which it mainly earns by exporting oil; the inflation of the price of its oil is $x_{2t}$. Both countries are so small that they have no real market power in the international commodities markets, so we can take $x_{1t}$ and $x_{2t}$ as exogenous. Two macroeconomic shocks $e_{1t}$ and $e_{2t}$ influence these economies, and they have all the nice properties that we want: normal, homoskedastic, not autocorrelated, and independent of each other. Long story short, we are studying the model

$$y_{1t} = \alpha_1 + \beta_1 x_{1t} + e_{1t} + \lambda e_{2t}$$

$$y_{2t} = \alpha_2 + \beta_2 x_{2t} + e_{2t}.$$

We have been hired by the government of country 1, which is mainly interested in estimating $\beta_1$.

(a) (4 marks) Describe an economic argument for the presence of a “$+ \lambda e_{2t}$” term in the first equation.

(b) (2 marks) Suppose that country 2 is unwilling to share its data, so all you have are the time series $y_{1t}$ and $x_{1t}$. What is the best estimator $\hat{\beta}_1$ that you can produce? Motivate your answer.

For the remaining parts of this question, assume that you have managed to convince country 2 to share its data, so you now have access to $y_{2t}$ and $x_{2t}$ in addition to $y_{1t}$ and $x_{1t}$.

(c) (5 marks) Argue why it should now be possible to improve on the estimator that you suggested in part (b). In addition to some intuition, your answer should also contain the words “(un)biased”, “(in)consistent”, and “(in)efficient”.

(d) (5 marks) Assume that $\lambda$ is known, but the remaining parameters are not. Write a regression model that exploits all the data that you have, that can be estimated by OLS, and that provides an estimator $\hat{\beta}_1$ that benefits from the improvements that you suggested in part (c).

(e) (4 marks) In reality, $\lambda$ is not known, so we will need to turn the GLS procedure from part (d) into an FGLS procedure by coming up with an estimator $\hat{\lambda}$. Describe how to compute this $\hat{\lambda}$.

Question 3. We have collected data on a random sample of homeowners with a mortgage. For each of them, we know their annual income at the time the house was bought (in thousands of dollars), the initial size of the loan (also in thousands of dollars), as well as whether they have defaulted on their repayments (dummy variable). We use Stata to estimate the following model.

    . generate ln_income=ln(income)

    . generate ln_loan=ln(loan)

    . logit default ln_income ln_loan

    Logistic regression                             Number of obs     =        117
                                                    LR chi2(2)        =       7.03
                                                    Prob > chi2       =     0.0297
    Log likelihood = -24.472186                     Pseudo R2         =     0.0765

    ------------------------------------------------------------------------------
         default |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       ln_income |  -57.47886   34.62703    -1.66   0.097    -125.3466    10.38887
         ln_loan |   56.38969   34.44848     1.64   0.102    -11.12809    123.9075
           _cons |  -126.8349   78.20496    -1.62   0.105    -280.1138      26.444
    ------------------------------------------------------------------------------

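    . margins, dydx(ln_income)

    Average marginal effects                        Number of obs     =        117

    ------------------------------------------------------------------------------
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       ln_income |  -3.115679   2.044844    -1.52   0.128    -7.123499    .8921409
    ------------------------------------------------------------------------------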

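    . margins, dydx(ln_income) atmeans

    Conditional marginal effects                    Number of obs     =        117

    ------------------------------------------------------------------------------
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       ln_income |  -2.428992   1.216568    -2.00   0.046    -4.813421   -.0445629
    ------------------------------------------------------------------------------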
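As a reading aid for the Stata tables above, the following minimal Python sketch reproduces the `z`, `P>|z|`, and confidence interval columns for the `ln_income` row of the logit output. It assumes the usual Wald construction (coefficient divided by standard error, compared with a standard normal distribution); the variable names are illustrative and not part of the exam.

```python
from statistics import NormalDist

# Values copied from the ln_income row of the logit table.
coef = -57.47886
std_err = 34.62703

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Wald z statistic for H0: coefficient = 0 (assumed construction).
z = coef / std_err

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# 95% confidence interval: coefficient +/- critical value * standard error.
crit = std_normal.inv_cdf(0.975)
ci_low = coef - crit * std_err
ci_high = coef + crit * std_err

print(f"z = {z:.2f}, p = {p_value:.3f}, 95% CI = [{ci_low:.4f}, {ci_high:.5f}]")
```

The same arithmetic applies to every row of the coefficient tables in this exam; only the reference distribution changes for the OLS output in Question 5, which reports `t` rather than `z`.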

(a) (3 marks) The last two tables report estimated marginal effects of -3.12 and -2.43, respectively. Describe carefully how each of these numbers should be interpreted.

(b) (5 marks) A person has an annual income of 70 000 dollars and applies for a home loan of 700 000 dollars. Predict how likely it is that this person will default on their loan.

(c) (6 marks) Compute the marginal effect of a small change in income, holding the loan size constant, on the probability in part (b).

(d) (3 marks) As you can see from the first table, ln_income and ln_loan are both individually insignificant. Show that they are jointly significant.

(e) (3 marks) What is the economic intuition behind the contradictory results in part (d)?

Question 4. Suppose that we have obtained survey data on the new price of the most expensive car owned by each household. However, we do not know the exact price, but we do know whether it is \$0 (no car), in (0, 20k], in (20k, 50k], in (50k, 100k], or greater than \$100k. We are trying to model these car prices as a function of household income, the number of children living in the household, and a set of dummies indicating whether the household is located in an urban, suburban, regional, or rural area. There are various models that we could use for this task.

(a) (4 marks) Between the multinomial logit and multinomial probit models, which one would you prefer to use here? Motivate your answer.

(b) (4 marks) Between the multinomial probit and ordered probit models, which one would you prefer to use here? Motivate your answer.

(c) (4 marks) Between the ordered probit and interval regression models, which one would you prefer to use here? Motivate your answer.

For the remainder of this question, we will be interested in the marginal effect of an additional child being born, holding everything else constant, on the price of a household’s main car. (They may want to buy a larger car to make sure the entire family still fits inside; or they may want to downsize to a smaller car because kids are expensive.) That is, we are trying to estimate $\frac{\partial E[\text{price}_i]}{\partial \text{children}_i}$.

(d) (4 marks) The ordered probit model includes a “$\ldots + \beta\,\text{children}_i + \ldots$” term. Does this $\beta$ represent the desired marginal effect? Why or why not?

(e) (4 marks) The interval regression model also includes a “$\ldots + \beta\,\text{children}_i + \ldots$” term. Does this $\beta$ represent the desired marginal effect? Why or why not?

Question 5. In this final question, we investigate how many times people visit a medical doctor over the course of a year, based on whether or not they have a chronic disease, a disability, and health insurance, as well as their gender and the size of their family. We obtain the following results; note that family size is included in logarithms, so a person who lives alone would have ln_family = ln(1) = 0.

    . summarize visits

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          visits |     20,190    2.860426    4.504365          0         77

    . regress visits chron_dis disabled insured female ln_family, robust

    ------------------------------------------------------------------------------
                 |               Robust
          visits |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       chron_dis |   .1058531    .006246    16.95   0.000     .0936105    .1180957
        disabled |   1.254728   .1311765     9.57   0.000     .9976112    1.511845
         insured |   1.595272   .0611246    26.10   0.000     1.475463    1.715081
          female |    .511249   .0609712     8.39   0.000     .3917405    .6307575
       ln_family |  -.2548986   .0592984    -4.30   0.000    -.3711284   -.1386688
           _cons |    .263288   .1121838     2.35   0.019     .0433987    .4831774
    ------------------------------------------------------------------------------

    . poisson visits chron_dis disabled insured female ln_family, robust

    ------------------------------------------------------------------------------
                 |               Robust
          visits |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       chron_dis |   .0288251    .001523    18.93   0.000     .0258402      .03181
        disabled |    .320908   .0325414     9.86   0.000     .2571281    .3846878
         insured |   .7544583   .0354006    21.31   0.000     .6850744    .8238423
          female |   .1914363   .0221097     8.66   0.000      .148102    .2347705
       ln_family |  -.0749335   .0197642    -3.79   0.000    -.1136706   -.0361965
           _cons |  -.0232276   .0471662    -0.49   0.622    -.1156717    .0692165
    ------------------------------------------------------------------------------

(a) (3 marks) Show that the number of doctor visits is overdispersed, and argue why one would have expected this to be the case even before looking at any data.

(b) (2 marks) Consider a man who is part of a family of five, has no insurance (these are US data) and no chronic diseases or disabilities. How often do we expect him to visit a doctor, according to the OLS results?

(c) (7 marks) Repeat part (b) using the Poisson regression results. Also compute the expected probability that this person will visit a doctor at least once during this year.

(d) (3 marks) Consider the first estimated regression coefficient in each table, so 0.1058531 for OLS and 0.0288251 for Poisson regression. Describe carefully how each of these two numbers should be interpreted.

(e) (5 marks) According to the Poisson regression model, what is the average marginal effect of having a disability? And finally, what does this number mean?