General
- Cheatsheet of R commands for data work
- Data cleaning: dplyr and tidyr
- Figures: ggplot2
- Working with character/word vectors: stringr
- If you plan on using R Notebooks: rmarkdown
- Original source
- Cheatsheet of Stata commands for data work
Workshop 1: script-writing
- See my econometrics notes below for a refresher on R, R Notebooks, and introductory econometrics.
- R: fixest for running regressions (as opposed to lm or plm)
- faster estimation
- flexible formula syntax lets you estimate many similar regression models at once, include leads, lags, and differences of variables much more simply, and specify standard errors intuitively
- its etable function provides a nice pipeline for creating very customizable tables automatically (better in my opinion than the usual suggestion, which is the stargazer package)
- R: magrittr for very intuitive and readable script-writing, especially for data processing. An intro here.
- Stata: an intro to data cleaning functions
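As a rough illustration of why the magrittr pipe mentioned above makes scripts easier to read, here is a minimal sketch. The `scores` data frame and its columns are invented for this example; `%>%` and `extract2()` come from magrittr, and everything else is base R.

```r
library(magrittr)  # provides the %>% pipe and extract2()

# Hypothetical toy data, made up for this example
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  score = c(1, 3, 10, 20)
)

# Nested call: has to be read from the inside out
nested_mean <- mean(subset(scores, group == "b")$score)

# Piped version: reads top to bottom, one step per line
piped_mean <- scores %>%
  subset(group == "b") %>%   # keep only rows in group "b"
  extract2("score") %>%      # pull out the score column as a vector
  mean()                     # average it
```

Both versions compute the same number; the piped one just lists the steps in the order they happen, which tends to matter more as data-processing chains get longer.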
Workshop 2: project-oriented workflows
R-specific:
- Why use R Projects? (The other chapters here are great too.)
- A more basic guide here
- Relative filepaths using the here package
- A guide to R Notebooks
- A bit more advanced: R profiles
- Setting seeds for replicability
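For the last item, setting a seed, a minimal base-R sketch of the idea is below. The seed value 2024 and the variable names are arbitrary choices for this example, not something prescribed by the linked guide.

```r
set.seed(2024)           # arbitrary seed value chosen for this example
first_draw <- rnorm(3)   # three "random" normal draws

set.seed(2024)           # resetting the same seed restarts the generator
second_draw <- rnorm(3)  # reproduces first_draw exactly

identical(first_draw, second_draw)  # TRUE

# Drawing again without resetting the seed continues the random stream,
# so these values differ from first_draw
third_draw <- rnorm(3)
```

In practice this usually means calling `set.seed()` once near the top of a script, so that anyone rerunning it from start to finish gets the same simulated or resampled results.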
Stata-specific:
- Stata’s equivalent of R’s here package for relative filepaths
- Workflows for automating tables
- More advanced: Stata’s equivalent of R Profiles
- More advanced: Jupyter Notebooks for Stata
- Setting seeds for replicability
Additional topics
Topics I’d cover with more time
- Debugging tips: how to identify bugs in your code
- Using ChatGPT as a coding resource. It is quite error-prone (especially for Stata, since Stata isn’t open source) but invaluable when it does work. Some things I use it for:
- “How do I do [task] in Stata?”
- “How do I implement [task] in R using [package, e.g. the tidyverse]?”
- Copy and paste a chunk of code and then ask what each line is doing
- “[pasted code] What’s a more efficient way of accomplishing the same thing?”
- “This is my code: [pasted code]. I get an error that says [error]. Where is my mistake?”
- Writing your own functions
- Implementing different kinds of regressions (I do this a little bit in my econometrics notes)
- Customizing regression tables
- Data visualization
- R: see the ggplot2 cheatsheet
- Stata: see the commands in the cheatsheet, and here’s a guided introduction
Other possibly helpful research resources:
- Trello for keeping notes, maintaining to-do lists, storing relevant documents, summarizing research meetings, etc. Synchronizes across all devices.
- How to present an applied micro paper
- Browser extensions
- EZProxy Redirect to access online resources that Columbia has subscriptions to when away from university internet and without using a VPN
- Simple Mass Downloader to download all files contained in a web page
- Literature review
- Google Scholar search is obvious, but also click the “Cited by” link under a search result to find other relevant and possibly more up-to-date papers and methods, or to see whether someone has already done what you want to do