Automate your workflow from the start
We often include manual steps in our workflow, thinking we will only do something once or twice. However, we invariably end up doing it more often than anticipated. It is very likely that before submitting a paper you will need to prepare a reproducibility package: if not because the journal requires it, then because you will want someone outside the project team to do at least a computational reproducibility check. Writing code at that point to re-run years of work will be a daunting task. So do yourself a favor and think about an automated workflow at the outset of your analytical work. It will save you a lot of time over the project's life cycle. There are two key steps to follow when setting up this workflow:
- Create a main script to run all the other scripts for a project. This is easiest for projects that use a single software package, but a main script can also run one program from another. There are functions to call R and Python from Stata, for example, and vice versa. In the unlikely case that it is not possible to write such a script, consider using a Makefile. If that is also not an option, you should at the very least have a README file that explains exactly how to run the code and in which order.
- Automate the importing of code outputs such as tables and figures into final outputs such as papers and presentations. A common practice that breaks the automation principle is to export graphs from statistical software, save them as images, and manually load them into a paper or presentation. This practice greatly increases the risk that images will not be updated in the final document when making last-minute changes. Fortunately, there are many tools to automate this step. One option is to use markdown documents in the same statistical software used for data processing and analysis. Another is to create a Makefile that not only runs your statistical code but also compiles any LaTeX documents, making sure that if one exhibit changes, the latest version is included in the final document.
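As a rough illustration of the first step, a main script can be little more than an ordered list of commands that stops as soon as one of them fails. The sketch below is written in Python and is only one possible shape for such a script. The file names (`code/01_clean_data.py`, `code/02_analysis.R`, `paper.tex`) and the function name `run_pipeline` are hypothetical placeholders, and it assumes `Rscript` and `pdflatex` are available on the command line.

```python
"""main.py: a hypothetical main script that runs every project step in order."""
import subprocess
import sys

# Hypothetical project layout; replace these commands with your own scripts.
STEPS = [
    [sys.executable, "code/01_clean_data.py"],              # data cleaning in Python
    ["Rscript", "code/02_analysis.R"],                      # analysis in R
    ["pdflatex", "-interaction=nonstopmode", "paper.tex"],  # compile the paper
]


def run_pipeline(steps, cwd="."):
    """Run each command in order and stop at the first one that fails.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for command in steps:
        print("Running:", " ".join(command))
        # check=True raises subprocess.CalledProcessError on a non-zero
        # exit code, so later steps never run on top of a failed one.
        subprocess.run(command, cwd=cwd, check=True)
        completed.append(command)
    return completed
```

Calling `run_pipeline(STEPS)` under an `if __name__ == "__main__":` guard at the bottom of the file would turn this into the single entry point for the project. Because the last step recompiles the paper after the analysis has run, any exhibit the analysis rewrites is picked up in the final document without manual copying.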
Learning how to use these tools can be time-consuming, but remember that not everyone involved in a project needs to know all the nitty-gritty of this setup. The key takeaway is to know what you should aim for and where to ask for help when it's time to put it into practice.
- Quarto: dynamic documents in Python, R, Julia, and Observable
- LaTeX manual
- Template main.R
- Template main.do
- Calling R from Stata
- Calling Stata from R
- Basics of makefiles
- Template project README