---
title: "Recitation 8: Time Series and Big Data"
author: "Matthew Davis"
date: July 18, 2023
output:
  pdf_document:
    keep_tex: true
header-includes:
  - \usepackage{booktabs}
  - \usepackage{tikz}
  - \usetikzlibrary{fit}
  - \usepackage{colortbl}
---

```{r, include = F}
library(tidyverse)
library(magrittr)
library(haven)
library(fixest)
library(ggplot2)
library(Quandl) # To download FRED data
library(readxl)
```

# Practice Problem 1: The St. Louis Model

A model that attracted quite a bit of interest in macroeconomics in the 1970s was the St. Louis model. The underlying idea was to calculate fiscal and monetary policy impact and long-run *cumulative dynamic multipliers*, by relating real output (growth) to real government expenditure (growth) and real money supply (growth). The assumption was that both government expenditures and the money supply were exogenous.

In order to investigate the effect of a fiscal and monetary policies on output, you want to estimate a St. Louis type model using your quarterly data (i.e., make sure to use HAC standard errors) and report your results.

## (a) and (b) Download FRED Data

#### Visit the Federal Reserve Bank of St. Louis where you have access to the FRED and download the data for the required three variables (i.e., real GDP (GDPC1), real money supply (M2REAL), and real government expenditure (GCEC1)).

Don't need to learn this, but a convenient way of doing this is using the Quandl package to download the data directly

```{r}
Quandl.api_key('xHu2y3xExQ6bGkGqcYEi') # This is my personal API key
gdp <- Quandl('FRED/GDPC1')
m2 <- Quandl('FRED/M2REAL')
gov <- Quandl('FRED/GCEC1')

head(gdp)
head(m2)
head(gov)
```
#### The real money supply *m2* is available on a monthly frequency basis: don't forget to convert it into a quarterly frequency variable to match it with the other two variables. Hint: Is *m2* a stock or flow variable? Although, you may take the last month of each quarter or the three-month average as your quarterly value, here use the first month of each quarter.

Since M2 is a stock variable, you can take the value of this series corresponding to the first month of each quarter to get the quarterly values of this variable.

```{r}
fred <- left_join(gdp, m2, by = 'Date') %>%
  left_join(gov, by = 'Date')
head(fred)
fred %<>% rename(gdp = Value.x,
                 m2 = Value.y,
                 gov = Value)
head(fred)
```
Notice that our time variable, Date, is not just a number variable but a 'Date' variable.

#### The sample period should be from first quarter of 1960 to the fourth quarter of 2019 (i.e., 1960q1 to 2019q4). 

Quarters in this dataset are defined by the first day of the first month in that quarter, i.e. the 1st of January, April, July, and October

```{r}
fred %<>% filter(Date >= '1960-01-01') %>%
  filter(Date <= '2019-12-31')
```

As a sanity check, let's plot one of these variables as a line graph:

```{r}
ggplot(fred, aes(x = Date, y = m2)) +
  theme_bw() +
  geom_line() +
  xlab('Year') +
  ylab('') +
  ggtitle('Time series of M2 money supply, quarterly')
```

We can also derive the autocorrelations for this series either as a plot of a corellogram...

```{r}
acf(fred$m2,
    na.action = na.pass) # ignore any missing data/gaps in the time series
```

Or by displaying the estimated autocorrelations, say up to 15 lags:

```{r}
acf(fred$m2,
    na.action = na.exclude, # Ignore any missing data
    lag.max = 15, # Number of lags to include
    plot = FALSE)
```
#### Compute the growth rates of these three variables after you first transform them into natural logarithm and name/label them *ygrowth*, *mgrowth*, and *ggrowth*

For a variable $X_t$, we'll compute the growth rate $x_t$ using the following transformation:

$$
x_t = \log(X_t)-\log(X_{t-1})
$$

```{r}
fred %<>% mutate(ygrowth = log(gdp)-lag(log(gdp)),
                 mgrowth = log(m2)-lag(log(m2)),
                 ggrowth = log(gov)-lag(log(gov)))
```

## (c) Estimate a distributed lag model of *ygrowth* on current-period *mgrowth*, the effect of a monetary policy on current quarter's output growth
## (d) Estimate a distributed lag model of *ygrowth* on current-period *ggrowth*, the effect of a fiscal policy on current quarter's output growth


We'll use our familiar feols package to estimate a series of distributed lag models. We'll find it convenient to use the same panel methods for time series, allowing us to use mostly the same commands we've used before. This will require us to define our unit and and time variables (since fixest is intended for panel datasets). Since time series are essentially just panel datasets with $T$ time periods but only $N=1$ units, let's just create a unit variable that takes on an arbitrary value for all observations:

```{r}
fred %<>% mutate(unit = 'same')
```

This will also be necessary to make use of the other new component: new standard errors for time series and panel datasets which are robust to both heteroskedasticity and autocorrelation (HAC). We call these Newey-West (NW) standard errors and they are convenientliy calculated using the $NW(m)$ function.

These standard errors require us to determine the appropriate lag truncation parameter $m$. This is a matter of applying the rule-of-thumb formula from the textbook and rounding up:

```{r}
m <- 0.75*nrow(fred)^(1/3)
m
# Round up
m <- ceiling(m)
m
```

Last note: we can of course use the same functions to estimate autoregressive models AR(p). In this case, we do not want to use HAC standard errors and in addition, we'll want to specify iid standard errors so we don't cluster standard errors

Now we can estimate the desired models:

```{r, results = 'asis'}
mod.c <- feols(ygrowth ~ mgrowth, fred,
               panel.id = c('unit', 'Date'),
               vcov = NW(5))
mod.d <- feols(ygrowth ~ ggrowth, fred,
               panel.id = c('unit', 'Date'),
               vcov = NW(5))
etable(mod.c, mod.d, markdown = T)
```
## (e) Estimate a distributed lag model of *ygrowth* on current and next quarter's monetary policy on output growth
## (f) Estimate a distributed lag model of *ygrowth* on current and next quarter's fiscal policy

To clarify the ambiguous language of the question, we are being asked to regress GDP growth between period t and t+1 to changes to monetary/fiscal policy between periods t and t+1 and changes to monetary/fiscal policy between periods t-1 and t.

Other than the standard errors, the other new component here is that our models are "dynamic", meaning they relate observations from different time periods to one another. Here, we'll find it very convenient to make use of the fixest package, which has functions l(), d(), and f(), which allow us to refer to lagged, differenced, and lead terms in a regression formula without having to manually create new variables.

Estimating the above models:

```{r, results = 'asis'}
mod.e <- feols(ygrowth ~ l(mgrowth, 0:1), fred,
               panel.id = c('unit', 'Date'),
               vcov = NW(5))
mod.f <- feols(ygrowth ~ l(ggrowth, 0:1),
               fred,panel.id = c('unit', 'Date'),
               vcov = NW(5))
# Or combining them into one command
mods.ef <- feols(ygrowth ~ sw(l(mgrowth, 0:1),
                                 l(ggrowth, 0:1)),
                 fred,panel.id = c('unit', 'Date'),
                 vcov = NW(5))
etable(mods.ef, markdown = T)
```
Notice the second argument in the lag function l() can take on a vector of numbers if we want to include multiple lags (including lag 0 here).

#### What is the impact multiplier? Explain the meaning.
#### What is the cumulative multiplier? Explain the meaning.

## (g) Estimate a distributed lag model of *ygrowth* on the change (i.e., the *first difference*) in current and four lags of *mgrowth* and *ggrowth* (to mimic the original St. Louis Equation)

```{r, results = 'asis'}
# Create first-differenced versions of our regressors
fred %<>% mutate(diff.mgrowth = mgrowth-lag(mgrowth),
                 diff.ggrowth = ggrowth-lag(ggrowth))

mod.g <- feols(ygrowth ~ l(diff.mgrowth, 0:3) + l(mgrowth, 4) + l(diff.ggrowth, 0:3) + l(ggrowth, 4),
               fred, panel.id = c('unit', 'Date'),
               vcov = NW(5))
etable(mod.g, markdown = T)
```
## (h) Assuming that money and government expenditures are exogenous, what do the coefficients represent? Calculate the *h*-period cumulative dynamic multipliers from these.

```{r}
stl.est <- coeftable(mod.g)
stl.est
```

Assuming money and government expenditures are exogenous, the coefficients estimated represent the (monetary/fiscal) cumulative multipliers. If you differenced them out, they would represent the dynamic multipliers. The coefficients are easy to plot when estimated through fixest:

```{r}
# Collect the cumulative multipliers
cumu <- data.frame(Estimate = stl.est[-1,'Estimate']) %>%
  mutate(Lag = rep(0:4, 2),
         Policy = rep(c('Monetary', 'Fiscal'), each = 5))
dyna <- group_by(cumu, Policy) %>%
  mutate(Estimate = ifelse(Lag == 0, Estimate, Estimate-lag(Estimate)))

multipliers <- rbind(cumu, dyna) %>%
  mutate(Multiplier = rep(c('Cumulative', 'Dynamic'), each = 10))

multi.table <- pivot_wider(multipliers,
                           names_from = c(Policy, Multiplier),
                           values_from = Estimate)
print(multi.table)
```

#### How can you test for the statistical significance of the cumulative dynamic multipliers and the long-run cumulative dynamic multiplier?

## (i) Sketch the estimated dynamic and cumulative dynamic fiscal and monetary multipliers (similar to Fig 16.2 in the textbook)

```{r}
ggplot(multipliers, aes(x = Lag, y = Estimate, color = Multiplier)) +
  theme_bw() +
  geom_line() +
  theme(legend.position = 'top') +
  ggtitle('Estimated multipliers') +
  xlab('Lags') + ylab('') + facet_grid(~ Policy)
```

## (j) For these coefficients to represent dynamic ultipliers, the money supply and government expenditures must be exogenous variables. Explain why this is unlikely to be the case. As a result, what importance should you attach to the above results?

There is little reason to believe that these government instruments are exogenous. Even if the monetary base and those components of government expenditures which do not respond to business cycle fluctuations had been chosen rather than the above regressors, these instruments respond to changes in the growth rate of GDP.

In fact, government reaction functions were also estimated at the time to capture how government instruments respond to changes in target variables. As a result, the regressors will be correlated with the error term, OLS estimation is inconsistent, and inference not dependable. It is hard to imagine how useable information can be retrieved from these numbers.

# Practice Problem 2: Stock-Watson Empirical Exercise 16.1

Note that the textbook leads you to the data hosted on the official textbook website. This data does not match what's in the textbook or the solutions! It doesn't even have the variables you need to answer any of these subquestions. I had to find the original dataset on an unofficial website so download it from my recitation folder instead:

```{r}
macro <- read_excel('data/UsMacro_Monthly.xlsx')
head(macro)
```

## Part a: Compute the monthly growth rate in IP, expressed in percentage points. What are the mean and standard deviation of IP growth over the 1960:M1--2017:M12 sample period? What are the units for IP growth (percent, percent per annum, percent per month, or something else)?

```{r}
macro %<>% mutate(ip.growth = 100*(log(IP)-log(lag(IP))))

macro.sample <- filter(macro, Year >= 1960 & Year <= 2017)
mean(macro.sample$ip.growth)
sd(macro.sample$ip.growth)
```

So we have a mean of about 0.22 and a standard deviation of about 0.75. This is slightly different from the provided solutions, which seem to be using a sample from the period 1952-2009 for some reason (different textbook?) rather than 1960-2017.

## Part b: Plot the value of $O_t$. Why are so many values of $O_t$ equal to 0? Why aren't some values of $O_t$ negative?

We want to plot the value of oil. Let's create a Date variable that combines the info in Year and Month first:

```{r}
macro %<>% mutate(Date = as.Date(ISOdate(Year, Month, 1)))
```

Here, the 1 just tells R to assign it the first day of the month.

Now let's plot the oil time series:

```{r}
ggplot(macro, aes(x = Date, y = Oil)) +
  theme_bw() + # A black-and-white theme I like over the gray default
  geom_line() + # Draw a line graph using Date on the x and Oil on the y
  xlab('Date') + ggtitle('Time series of O') + ylab('')
```

From Exercise 16.1, O here is defined as "the greater of 0 or the percentage point difference between oil prices at date t and their maximum value during the past three years." Thus it equals 0 whenever the price of oil is less than the maximum during the previous three years.

## Part c: Estimate a distributed lag model regressing IP growth against the current value and 18 lagged values of O, including an intercept. What value of the HAC standard error truncation parameter did you choose? Why?

We are estimating a distributed lag model of IP growth onto the current and 18 lagged values of the variable $O_t$.

First, let's calculate the HAC truncation parameter m:

```{r}
m <-  0.75*nrow(macro)^(1/3)
m <- ceiling(m)
m
```

Estimating the requested distributed lag model:

```{r}
# Create an id variable like before
macro %<>% mutate(id = 999)
oil.dyn <- feols(ip.growth ~ l(Oil, 0:18), macro,
                   panel.id = ~ id + Date,
                   vcov = NW(m))
```

## Part d: Taken as a group, are the coefficients on $O_t$ statistically significantly different from 0?

Printing the regression output, including an F-test on all these regressors:

```{r, results = 'asis'}
etable(oil.dyn, fitstat = ~ wald + wald.p, markdown = T)
```

Or alternatively:

```{r}
wald(oil.dyn)
```

With a p-value of 0.026, we can reject the hypothesis of nullity in the Oil regressors at a 5% significance level

## Part e: Construct graphs like those in Figure 16.2, showing the estimated dynamic multipliers, cumulative multipliers, and 95% confidence intervals. Comment on the real-world size of the multipliers.

The above model captured dynamic (non-cumulative) effects. To capture cumulative effects, we would run a related model of up to *seventeen* lags of *differenced* Oil values and then include the 18th lag of *non-differenced* Oil. This means we'll need to create a new variable that is the first difference of the Oil variable:

```{r}
macro %<>% mutate(diff.Oil = Oil-lag(Oil))
oil.cumdyn <- feols(ip.growth ~ l(diff.Oil, 0:17) + l(Oil, 18),
                        macro,
                        panel.id = ~ id + Date,
                        vcov = NW(m))
etable(oil.dyn, oil.cumdyn)
```

Then it is easy to draw the desired graphs using fixest's coefplot function:

```{r}
coefplot(oil.dyn, drop = 'Constant',
         main = 'Dynamic multipliers', xlab = 'Lag')
coefplot(oil.cumdyn, drop = c('Constant', 'lag(Oil, 18)'),
         main = 'Cumulative multipliers', xlab = 'Lag')
```

Or with some customization:

```{r}
dyn <- coefplot(oil.dyn, drop = 'Constant',
         main = 'The effect of oil prices on IP growth rate\nDynamic multipliers',
         xlab = 'Lag',
         ci.join = T, ci.lty = 0, ci.fill = T, ci.fill.par = list(col = 'lightblue'),
         pt.join = T)
cumdyn <- coefplot(oil.cumdyn, drop = 'Constant', ci.join = T,
         main = 'The effect of oil prices on IP growth rate\nCumulative dynamic multipliers',
         xlab = 'Lag',
         ci.lty = 0, ci.fill = T, ci.fill.par = list(col = 'red'),
         pt.join = T)
```

# Practice Problems 3-5: Stock-Watson non-empirical exercises 14.1, 14.2, 14.5 on iPad