### General

- Cheatsheet of R commands for data work.
  - Data cleaning: **dplyr** and **tidyr**
  - Figures: **ggplot2**
  - Working with character/word vectors: **stringr**
  - If you plan on using R Notebooks: **rmarkdown**
  - Original source

- Cheatsheet of Stata commands for data work
  - Data cleaning

### Workshop 1: script-writing

- See my econometrics notes below for a refresher on R, R Notebooks, and introductory econometrics.
- R: **fixest** for running regressions (as opposed to lm or plm)
  - faster estimation
  - flexible formula writing allows simultaneous estimation of many similar regression models, much simpler and more convenient inclusion of leads, lags, and differences of variables, and intuitive specification of standard errors
  - its *etable* function provides a nice pipeline for creating very customizable tables automatically (better, in my opinion, than the usual suggestion, which is the stargazer package)

- R: *magrittr* for piping, which makes script-writing very intuitive and readable, especially for data processing. An intro here.
- Stata: an intro to data cleaning functions
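
To make the two R bullets above concrete, here is a minimal sketch that pipes a cleaning step into a **fixest** regression. It assumes the magrittr, dplyr, and fixest packages are installed; the built-in `airquality` data, the variable `log_ozone`, and the particular specifications are my own illustrative choices, not taken from the workshop materials.

```r
# Assumes the magrittr, dplyr, and fixest packages are installed.
library(magrittr)  # provides the %>% pipe
library(dplyr)     # filter() and mutate()
library(fixest)    # feols() and etable()

# Illustrative cleaning step on the built-in airquality data:
# drop rows with missing Ozone or Solar.R, then add a logged outcome.
aq <- airquality %>%
  filter(!is.na(Ozone), !is.na(Solar.R)) %>%
  mutate(log_ozone = log(Ozone))

# Two specifications in one call: sw() swaps in Wind, then Wind + Temp.
# Month fixed effects go after the "|"; standard errors are clustered by Month.
models <- feols(log_ozone ~ Solar.R + sw(Wind, Wind + Temp) | Month,
                data = aq, cluster = ~Month)

# Side-by-side regression table for both specifications
etable(models)
```

Reading the pipeline top to bottom mirrors the order in which the steps happen, which is most of the appeal of piping for data processing.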

### Workshop 2: project-oriented workflows

R-specific:

- Why use R Projects? (the other chapters here are great too)
  - A more basic guide here

- Relative filepaths using the *here* package
- A guide to R Notebooks
- A bit more advanced: R profiles
- Setting seeds for replicability
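
As a rough illustration of two items in the list above, the sketch below sets a seed so that random draws repeat exactly, and builds a file path relative to the project root with *here*. The seed value, the `data` folder, and the file name `survey.csv` are hypothetical; the *here* part only runs if the package is installed.

```r
# Setting a seed makes random draws repeatable across runs.
set.seed(1234)             # 1234 is an arbitrary choice
draw_1 <- rnorm(5)

set.seed(1234)             # same seed again
draw_2 <- rnorm(5)

identical(draw_1, draw_2)  # TRUE: same seed, same draws

# Relative file paths with here. "data" and "survey.csv" are hypothetical
# names; the path is built from the project root, not the working directory.
if (requireNamespace("here", quietly = TRUE)) {
  survey_path <- here::here("data", "survey.csv")
  print(survey_path)
}
```

Because the path is assembled from the project root, the same script should run unchanged on a coauthor’s machine as long as the folder layout inside the project matches.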

Stata-specific:

- Stata’s equivalent of R’s *here* package for relative filepaths
- Workflows for automating tables
- More advanced: Stata’s equivalent of R Profiles
- More advanced: Jupyter Notebooks for Stata
- Setting seeds for replicability

### Additional topics

Topics I’d cover with more time

- Debugging tips: how to identify bugs in your code
- Using ChatGPT as a coding resource. It is error-prone but really valuable if you ask questions well. It also generates reproducible examples if you ask. Some things I use it for:
  - “How do I do [task] in Stata?”
  - “How do I implement [task] in R using [package, e.g. the tidyverse]?”
  - Copy and paste a chunk of code and then ask what each line is doing
  - “[pasted code] What’s a more efficient way of accomplishing the same thing?”
  - “This is my code: [pasted code]. I get an error that says [error message]. Where is my mistake?”

- Writing your own functions
- Implementing different kinds of regressions (I do this a little bit in my econometrics notes)
- Customizing regression tables
- Data visualization
  - R: see the ggplot2 cheatsheet
  - Stata: see the commands in the cheatsheet, and here’s a guided introduction
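
Since writing functions and debugging are on the list above without a worked example, here is a small base R sketch of both. The helper `standardize()` is hypothetical and written only for illustration: it checks its inputs up front so that mistakes surface with a clear message, and the `tryCatch()` call shows one way to inspect an error without halting the script.

```r
# Hypothetical helper: rescale a numeric vector to mean 0 and sd 1.
standardize <- function(x, na.rm = TRUE) {
  # Fail early with a clear message when the input is not what we expect
  stopifnot(is.numeric(x), length(x) > 1)
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

z <- standardize(c(2, 4, 6, NA))
print(z)

# Catch an error and report its message instead of stopping the script
result <- tryCatch(
  standardize("not a number"),
  error = function(e) {
    message("standardize() failed: ", conditionMessage(e))
    NA
  }
)
```

When an error is harder to locate, calling `traceback()` right after it, or placing `browser()` inside the function to step through it interactively, are two base R tools worth trying.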

### Other possibly helpful research resources:

- Trello for keeping notes, maintaining to-do lists, storing relevant documents, summarizing research meetings, etc. Synchronizes across all devices.
- How to present an applied micro paper
- Browser extensions
  - *EZProxy Redirect* to access online resources that Columbia has subscriptions to when away from university internet and without using a VPN
  - *Simple Mass Downloader* to download all files contained in a web page

- Literature review
  - Google Scholar search is obvious, but also click on the “Cited by” link under a search result to find other relevant and possibly more up-to-date papers and methods, or to see if someone’s already done what you want to do