C Managing Code and Scientific Communication

C.1 Best Practices

  1. Use “R projects” and the here package to manage the working directory and paths. See below in C.3 and C.4 for some details. This is how we promote replication and collaboration (including with our future selves).

  2. Use high-quality names for objects, files, and directories, like df_resume, 03-analysis.R, and /code/ (not finalTHING or more stuff.R or /Next Try/). Names should be human- and machine-readable. File names should

    • start with numbers, so that default ordering is correct, as in 00-prelims.R, 01-power-calcs.R, etc.,
    • be informative,
    • not have spaces, and
    • separate words with - or _ to enable human reading and globbing.

Object and directory names also should be informative, not have spaces, and use _ to separate words. See these slides for more detail and painful counterexamples.
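As a rough illustration of why these naming rules help, the sketch below creates a few empty files in a temporary directory and then lists and globs them. The file names are hypothetical, and only base R functions are used.

```r
# Hypothetical file names, created in a throwaway temporary directory
tmp_dir <- file.path(tempdir(), "naming-demo")
dir.create(tmp_dir, showWarnings = FALSE)
file_names <- c("02-analysis.R", "00-prelims.R",
                "01-power-calcs.R", "01-power-plot.pdf")
file.create(file.path(tmp_dir, file_names))

# Default (alphabetical) ordering matches the intended run order
all_files <- list.files(tmp_dir)

# Globbing: everything that belongs to step 01
step_01 <- basename(Sys.glob(file.path(tmp_dir, "01-*")))

# Machine-readable: recover the step number from each .R file name
r_files <- list.files(tmp_dir, pattern = "\\.R$")
steps <- sub("-.*$", "", r_files)
```

Because the names start with numbers and contain no spaces, the listing, the glob, and the regular expression all work without any special quoting.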

  3. Use GitHub. But do not put sensitive data there.

  4. Begin your code file with a library() command for every package required for that file to run. Keep these in alphabetical order.

library(here)
library(tidyverse)
  5. Do not include your “workflow” in your code files. Anything that’s interactive or changes the environment should be excluded. Do not include install.packages(), View(), or setwd() in your .R file. Do not start your .R file with rm(list = ls()); it is both too strong and too weak (see footnote 9).

  6. Don’t open or change the original data. Read it in programmatically with, e.g., read_csv() or read_excel(). See here for examples in R.

  7. Use the assignment arrow for assignment. The structure:

new_obj_name <- value_of_that_object
  8. Use space around operators (write 3 + 4 = 7, not 3+4=7) and after a comma, as in English (write f(x, y), not f(x,y)).

  9. Use the native pipe, |>. (If you use RStudio keyboard shortcuts, you can toggle the default pipe shortcut to use the native pipe. See Tools, Global Options, Code, “Use native pipe operator”.)
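Several of the practices above can be sketched together in one short script. To keep the sketch runnable anywhere, it uses only base R: read.csv() stands in for read_csv(), and a small hypothetical file written to a temporary directory stands in for the original data. The native pipe requires R 4.1 or later.

```r
# Stand-in for an original data file (hypothetical contents)
raw_path <- file.path(tempdir(), "00-raw-data.csv")
writeLines(c("id,z,y", "1,0,2.5", "2,1,4.0", "3,1,5.0"), raw_path)

# Read the data programmatically; the file itself is never edited
df_raw <- read.csv(raw_path)

# Assignment arrow, spaces around operators and after commas, native pipe
df_treated <- df_raw |>
  subset(z == 1) |>
  transform(y_centered = y - mean(y))

mean_y_treated <- df_treated$y |> mean()
```

All changes live in new objects (df_treated, mean_y_treated), so the raw file on disk stays exactly as it was received.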

C.2 Style

C.2.1 R

When we write in R, we tend to prefer the tidyverse style. See https://style.tidyverse.org for the full treatment with many examples. Of course, there is a package, styler, with functions that will style your code for you. See here for more detail.
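As a small sketch of what styler does, the code below passes a badly spaced line to styler::style_text(), which restyles code supplied as text. The input line is a made-up example, and the styling step only runs if styler is already installed.

```r
# A deliberately badly spaced line of R code (hypothetical example)
ugly_code <- "y=f(x,3+4)"

# Style the text only if styler is available; nothing is installed here
if (requireNamespace("styler", quietly = TRUE)) {
  styled_code <- styler::style_text(ugly_code)
  print(styled_code)
}
```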

C.2.2 Equations

When we use mathematical notation, we strive to make it as accurate and representative as possible. For example, we may write, “we will estimate the model below via least squares, where \(\beta_1\) is our treatment effect estimate:

\[y_i = \alpha + \beta_1 Z_i + \beta_2 X_i + \beta_{[3-20]} B_i^{[3-20]} + \epsilon_i.\]”

Some stylistic notes embodied by this example:

  1. Quantities that vary by unit are indexed with \(i\).
  2. For each unit \(i\), for \(j \in \{3, \ldots, 20\}\), each \(B_i^j\) represents a different variable, perhaps one of 18 block indicators (though see here before including such terms). Each \(\beta_j\) represents the coefficient on one such variable. Each \(B_i^j\) should be accompanied by a \(\beta_j\). Another way to represent this would be to use a bold vector \(\boldsymbol\beta\) and a bold \(\boldsymbol B\). For another way that groups by variable type, see equation 4.1 on page 9 here.
  3. For each variable, there should be a coefficient if we are going to estimate one.
  4. If there is only one coefficient represented by a particular Greek letter, it should not be numbered. See \(\alpha\) above.
  5. It can be difficult to write good notation in some tools, such as Google Drive. (For example, I don’t see a way to make a bold \(\beta\).) One suggestion is to write the equation in LaTeX and export the equation or an image of the equation into your document. (You can do this via Pages on macOS, e.g.)
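The alternatives mentioned in note 2 could be written out as follows. This is only a sketch: the explicit sum and the transpose notation are assumptions about how one might render the same model, not notation taken from this SOP.

```latex
% Explicit sum over the 18 block terms
\[ y_i = \alpha + \beta_1 Z_i + \beta_2 X_i + \sum_{j=3}^{20} \beta_j B_i^j + \epsilon_i \]

% Bold-vector form, with the 18 coefficients and variables stacked
\[ y_i = \alpha + \beta_1 Z_i + \beta_2 X_i + \boldsymbol{\beta}^\top \boldsymbol{B}_i + \epsilon_i,
   \quad \boldsymbol{\beta} = (\beta_3, \ldots, \beta_{20})^\top,
   \quad \boldsymbol{B}_i = (B_i^3, \ldots, B_i^{20})^\top \]
```

In both forms every variable still has exactly one coefficient, and the single intercept \(\alpha\) remains unnumbered.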

C.2.3 Statistical Quantities

  • Where we report statistical significance, we do so relative to an \(\alpha\) level on \([0, 1]\). E.g., “\(\hat{\beta}_1\) is statistically significant at \(\alpha = 0.05\).”
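A minimal sketch of such a report, using simulated data and base R only. The sample size, coefficients, and seed are arbitrary assumptions chosen for illustration. Note that alpha_level below is the significance level, not the intercept \(\alpha\) from C.2.2.

```r
set.seed(20240101)

# Simulated experiment: z is treatment, x is a covariate (made-up values)
n <- 200
df_sim <- data.frame(
  z = rep(c(0, 1), each = n / 2),
  x = rnorm(n)
)
df_sim$y <- 1 + 2 * df_sim$z + 0.5 * df_sim$x + rnorm(n)

# Estimate the model via least squares
fit <- lm(y ~ z + x, data = df_sim)

# State the alpha level explicitly, then compare the p-value to it
alpha_level <- 0.05
p_value_z <- summary(fit)$coefficients["z", "Pr(>|t|)"]
is_significant <- p_value_z < alpha_level
```

Fixing alpha_level in the code before looking at p_value_z keeps the reported threshold explicit.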

C.3 Projects

Create a .Rproj file in your project’s top-level directory, making your project an “R project”. In RStudio, choose File > New Project > Existing Directory (or New Directory, if no project folder exists).

Then, to start work, always open the .Rproj file. This ensures that you have a fresh instance of R and RStudio, and the working directory is always the same. The working directory is the top-level directory (that is, the directory within which the .Rproj file lives).

C.4 Working Directory and Relative Paths

The “working directory” is the directory where R will look for data, code files, etc. and save your output objects (a new .csv or a plot .pdf) by default.

Opening the .Rproj file ensures that the working directory always starts at the top-level directory of your project.

We create paths to our objects using the here package. This ensures that a) we are not hard-coding a path that no one else has (like ~/Me/My Docs/my_special_folder/my_subfolder/, etc.), and b) our code is platform-independent. For more on the here package, see here.

The code below requires the following packages to be loaded and attached:

library(here)

C.4.1 See the working directory

To see the current working directory, type getwd().

C.4.2 See the project directory

The “project directory” is the top-level directory of your project. It should also be the working directory if you follow the advice in C.3 and start work by opening the .Rproj file. To see the project directory:

here()
## [1] "/Users/ryanmoore/Documents/github/thelab/LAB-SOP-experiments"

C.4.3 Create a path with here()

I have a file called 02-01-df.RData in a subdirectory called /data/. The dir() function shows what is in that subdirectory:

dir("data")
## [1] "02-01-df.RData"

To see its full path,

here("data", "02-01-df.RData")
## [1] "/Users/ryanmoore/Documents/github/thelab/LAB-SOP-experiments/data/02-01-df.RData"

To use that path to read in the data, I first create the path, then use it to read in the object:

my_rdata_path <- here("data", "02-01-df.RData")

# Use load() for an .RData object:
load(my_rdata_path) 

(These could be combined into a single line.)
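The single-line version might look like the sketch below. So that it runs outside this project, file.path() and tempdir() stand in for here(), and a tiny hypothetical data frame is saved first. None of these stand-ins come from the SOP itself.

```r
# Set-up for the sketch: save a small data frame to a temporary data/ folder
demo_dir <- file.path(tempdir(), "data")
dir.create(demo_dir, showWarnings = FALSE)
df <- data.frame(x = c(1.5, 2.5), z = c(0, 1))
save(df, file = file.path(demo_dir, "02-01-df.RData"))
rm(df)

# Single line: build the path and load the file in one step.
# load() invisibly returns the names of the objects it restored.
loaded_names <- load(file.path(demo_dir, "02-01-df.RData"))
```

Inside the project, with the here package attached, the equivalent single line would be load(here("data", "02-01-df.RData")).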

The file 02-01-df.RData contains a single data frame called df. I can now see that data frame in my environment:

ls()
## [1] "df"            "my_rdata_path"

and examine it:

head(df)
##           x z         y
## 1 5.8873201 0 5.6125511
## 2 2.3428286 0 2.0830483
## 3 3.9439262 0 2.5861546
## 4 3.3373187 1 2.6527890
## 5 2.7946122 0 0.6784908
## 6 0.6943772 0 2.8216648

C.5 What Packages are Installed?

To see what packages are installed,

library()

or

lapply(.libPaths(), dir)
## [[1]]
##   [1] "abind"         "askpass"       "assertthat"    "backports"    
##   [5] "bandit"        "base"          "base64enc"     "bit"          
##   [9] "bit64"         "blob"          "blockTools"    "bookdown"     
##  [13] "boot"          "boxr"          "brew"          "brio"         
##  [17] "broom"         "bslib"         "cachem"        "callr"        
##  [21] "car"           "carData"       "cellranger"    "class"        
##  [25] "cli"           "clipr"         "clock"         "cluster"      
##  [29] "codetools"     "coefplot"      "colorspace"    "commonmark"   
##  [33] "compiler"      "config"        "conflicted"    "cpp11"        
##  [37] "crayon"        "credentials"   "crosstalk"     "curl"         
##  [41] "dagitty"       "data.table"    "datasets"      "DBI"          
##  [45] "dbplyr"        "DeclareDesign" "desc"          "devtools"     
##  [49] "diagram"       "dials"         "DiceDesign"    "diffobj"      
##  [53] "digest"        "doFuture"      "downlit"       "dplyr"        
##  [57] "DT"            "dtplyr"        "dygraphs"      "ellipsis"     
##  [61] "estimatr"      "evaluate"      "fabricatr"     "fansi"        
##  [65] "farver"        "fastmap"       "fontawesome"   "forcats"      
##  [69] "foreach"       "foreign"       "Formula"       "fs"           
##  [73] "furrr"         "future"        "future.apply"  "gam"          
##  [77] "gargle"        "generics"      "gert"          "ggdag"        
##  [81] "ggforce"       "ggplot2"       "ggraph"        "ggrepel"      
##  [85] "gh"            "gitcreds"      "globals"       "glue"         
##  [89] "googledrive"   "googlesheets4" "gower"         "GPfit"        
##  [93] "graphics"      "graphlayouts"  "grDevices"     "grid"         
##  [97] "gridExtra"     "gtable"        "hardhat"       "haven"        
## [101] "here"          "highr"         "hms"           "htmltools"    
## [105] "htmlwidgets"   "httpuv"        "httr"          "httr2"        
## [109] "ids"           "igraph"        "infer"         "ini"          
## [113] "ipred"         "isoband"       "iterators"     "janitor"      
## [117] "jquerylib"     "jsonlite"      "kableExtra"    "KernSmooth"   
## [121] "knitr"         "labeling"      "later"         "lattice"      
## [125] "lava"          "lazyeval"      "lhs"           "lifecycle"    
## [129] "listenv"       "lme4"          "lubridate"     "magrittr"     
## [133] "maps"          "markdown"      "MASS"          "Matching"     
## [137] "Matrix"        "MatrixModels"  "memoise"       "methods"      
## [141] "mgcv"          "mime"          "miniUI"        "minqa"        
## [145] "modeldata"     "modelenv"      "modelr"        "munsell"      
## [149] "nlme"          "nloptr"        "nnet"          "numDeriv"     
## [153] "openssl"       "parallel"      "parallelly"    "parsnip"      
## [157] "patchwork"     "pbkrtest"      "pillar"        "pkgbuild"     
## [161] "pkgconfig"     "pkgdown"       "pkgload"       "plotly"       
## [165] "plyr"          "png"           "polyclip"      "praise"       
## [169] "prettyunits"   "processx"      "prodlim"       "profvis"      
## [173] "progress"      "progressr"     "promises"      "ps"           
## [177] "purrr"         "quantreg"      "R.cache"       "R.methodsS3"  
## [181] "R.oo"          "R.utils"       "R6"            "ragg"         
## [185] "randomizr"     "rappdirs"      "rcmdcheck"     "RColorBrewer" 
## [189] "Rcpp"          "RcppArmadillo" "RcppEigen"     "RcppTOML"     
## [193] "readr"         "readxl"        "recipes"       "rematch"      
## [197] "rematch2"      "remotes"       "renv"          "reprex"       
## [201] "reshape2"      "reticulate"    "rio"           "rlang"        
## [205] "rmarkdown"     "roxygen2"      "rpart"         "rprojroot"    
## [209] "rsample"       "rstudioapi"    "rversions"     "rvest"        
## [213] "sass"          "scales"        "selectr"       "sessioninfo"  
## [217] "shape"         "shiny"         "slider"        "snakecase"    
## [221] "sourcetools"   "SparseM"       "spatial"       "splines"      
## [225] "SQUAREM"       "stargazer"     "stats"         "stats4"       
## [229] "stringi"       "stringr"       "styler"        "survival"     
## [233] "svglite"       "sys"           "systemfonts"   "tcltk"        
## [237] "testthat"      "textshaping"   "tibble"        "tidygraph"    
## [241] "tidymodels"    "tidyr"         "tidyselect"    "tidyverse"    
## [245] "timechange"    "timeDate"      "tinytex"       "tools"        
## [249] "translations"  "tune"          "tweenr"        "tzdb"         
## [253] "urlchecker"    "useful"        "usethis"       "utf8"         
## [257] "utils"         "uuid"          "V8"            "vctrs"        
## [261] "viridis"       "viridisLite"   "vroom"         "waldo"        
## [265] "warp"          "whisker"       "withr"         "workflows"    
## [269] "workflowsets"  "writexl"       "xfun"          "xml2"         
## [273] "xopen"         "xtable"        "xts"           "yaml"         
## [277] "yardstick"     "zip"           "zoo"
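A related check is whether the specific packages a script needs are installed. The sketch below uses installed.packages() from the utils package. The vector of needed packages is hypothetical, and "notARealPackage123" is a made-up name included to show a miss.

```r
# Packages a script might need (hypothetical list; the last name is made up)
needed <- c("stats", "utils", "notARealPackage123")

# Names of all installed packages across the library paths
installed <- rownames(installed.packages())

# Which needed packages are missing?
missing_pkgs <- setdiff(needed, installed)
```

This only reports what is missing. In keeping with C.1, any install.packages() call stays out of the .R file.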

  9. It is “too weak” in that removing objects from the workspace does nothing to attached packages, session options, graphical par() settings, etc. To ensure replicability, start a new R session regularly instead.
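The point of this note can be checked with a short demonstration, sketched below with base R only. The option value and the choice of the tools package are arbitrary assumptions, and the code is meant for a scratch session rather than an analysis file.

```r
options(digits = 4)   # change a session option (the default is 7)
library(tools)        # attach a package that ships with R
scratch <- 1:3        # create an object in the workspace

rm(list = ls())       # the "clean-up" that C.1 advises against

# The objects are gone, but the option and the attached package remain
digits_after <- getOption("digits")
tools_still_attached <- "package:tools" %in% search()

# Undo the option change made for this demonstration
options(digits = 7)
```

Restarting R clears the objects, the option, and the attached package, which is why a fresh session is the more reliable reset.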