There are a lot of questions and answers on the internet. This page lists resources that I have found useful, so I thought you might find them useful too.
Starting out with R
Step one: download R for your operating system (Windows, Mac, or Linux).
Step two: download RStudio. RStudio has invested a lot of time and money to make using R much, much easier. It’s also free!
Step three: Read R for Data Science. This is a great guide written by Hadley Wickham and Garrett Grolemund. It introduces the principles of the tidyverse - a set of R packages that play really well together. They cover tools for reading data in and out, manipulating data, visualising data, modelling, summarising models, and communicating results.
If you are having problems with getting R or RStudio, check out this guide (I can’t find the author’s name on their website). It covers installing R and RStudio in a little more detail.
Learning and Troubleshooting with R
To learn R, you need to learn how to get unstuck with R. This teaches you a really good process to work through whenever you get stuck.
For all my other problems, I usually google the error message, or try my darndest to ask Google a reasonable question that describes my current dilemma, and then read the appropriate blog post or Stack Overflow answer. RSeek is also basically a Google search that filters by R-related content.
A new community really worth checking out is the RStudio community, which aims to provide a nice, friendly space to ask questions that don’t necessarily fit into GitHub issues, bug reports, or Stack Overflow. You can read more about it here in their announcement of the community.
I would also recommend checking out RStudio’s list of resources for learning R, and this blog post, which describes learning R from a social sciences background.
To stay up to date with what other people around the world are doing with R, I recommend checking up on r-bloggers every other day, and checking out the #rstats hashtag on twitter. The R and statistics community on twitter is both excellent and friendly.
Learning Advanced R
Got a basic handle on R and are hankering for more? I recommend these free, online books by Hadley Wickham:
There is also a book, Ramarro, by quantide which seems similar(ish) to Hadley’s books.
Advanced R stuff: S3 Classes
R’s S3 classes are this really awesome minimal class of functions that can be super handy in R. They are described nicely in Hadley’s book, but I have also found these to be helpful:
This blog post, which also has such a suave blog layout.
I have also written a blog post about S3 methods, and have a preprint on arxiv.
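To make the idea concrete, here is a minimal sketch of an S3 class in base R. The names `new_temperature` and `to_fahrenheit` are made up for illustration and do not come from any of the resources above.

```r
# Constructor: an S3 object is just a base object with a class attribute.
new_temperature <- function(celsius) {
  stopifnot(is.numeric(celsius))
  structure(list(celsius = celsius), class = "temperature")
}

# print() is a generic, so print.temperature() is dispatched for any
# object whose class is "temperature".
print.temperature <- function(x, ...) {
  cat("<temperature>", format(x$celsius), "degrees C\n")
  invisible(x)
}

# A new generic of our own, plus a method for our (hypothetical) class.
to_fahrenheit <- function(x, ...) UseMethod("to_fahrenheit")
to_fahrenheit.temperature <- function(x, ...) x$celsius * 9 / 5 + 32

temp <- new_temperature(c(0, 100))
print(temp)
fahrenheit <- to_fahrenheit(temp)
```

The whole system is a naming convention (`generic.class`) plus `UseMethod()`, which is why S3 feels so lightweight.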
If you are going to do a plot in R, it should be in ggplot. It takes about 5 minutes to get the hang of, and once you’ve got it down you can create plots that make sense, behave how you expect, and look fantastic.
ggplot follows a logical syntax adapted from the book “The Grammar of Graphics”. It makes visualisation make sense. And there are lots of other packages that build upon it to make it more awesome, such as GGally, ggalt, ggExtra, ggforce, gganimate, and ggbeeswarm, to name a few!
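As a rough illustration of that syntax, here is a small sketch using the `mtcars` data that ships with R. It assumes the ggplot2 package is installed; the choice of variables is arbitrary.

```r
library(ggplot2)

# Map data columns to aesthetics, then add layers one at a time.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +                            # one point per car
  geom_smooth(method = "lm", se = FALSE) +  # a linear fit per cylinder group
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders"
  )

print(p)
```

Each `+` adds a layer or a setting, so a plot is built up piece by piece rather than specified all at once.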
Here are some ggplot resources, in order of usefulness:
- The RStudio ggplot cheatsheet sits pinned up above my desk.
- The official documentation
- The R Graphics Cookbook usually has the answers for what I’m after.
- I also recently discovered the ggplot2 wiki, which has some great case studies and examples.
- This handout provides an introduction to ggplot.
Plotly for R, written and maintained by Carson Sievert, is a very powerful and flexible interactive plotting engine in R. It has a fully fledged API for writing interactive graphics in R, as well as a fantastic function that gives the user a lot for free: ggplotly. You can read more about plotly for R in Carson’s free and online book.
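A minimal sketch of what `ggplotly` gives you, assuming both the ggplot2 and plotly packages are installed: build an ordinary ggplot, then hand it over.

```r
library(ggplot2)
library(plotly)

# An ordinary static ggplot.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# ggplotly() converts it into an interactive htmlwidget
# (hover tooltips, zooming, panning).
interactive_plot <- ggplotly(p)
interactive_plot
```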
ggvis is another great package written by Hadley Wickham, which builds upon the structure of ggplot but allows for more interactive, reactive plot building. Examples can be found here, here, and here.
More serious development on ggvis will apparently begin in 2018, as Hadley and his team at RStudio will be spending 2017 making everything in the tidyverse work well together. For the moment I would recommend using plotly to do your interactive graphics, although ggvis is still great!
shiny is a really awesome way to enhance your R script, package, or method. Shiny turns these into ‘apps’, that people can interact with.
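To give a feel for what an ‘app’ looks like, here is a small sketch, assuming the shiny package is installed. The slider and histogram are arbitrary choices for illustration, using the `faithful` data that ships with R.

```r
library(shiny)

# The UI describes what the user sees: a slider and a plot.
ui <- fluidPage(
  titlePanel("Old Faithful waiting times"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# The server describes how outputs react to inputs.
server <- function(input, output) {
  output$hist <- renderPlot({
    # `breaks` is treated by hist() as a suggestion, not an exact count.
    hist(faithful$waiting, breaks = input$bins,
         main = NULL, xlab = "Waiting time (minutes)")
  })
}

app <- shinyApp(ui = ui, server = server)
# To launch the app in a browser, run: shiny::runApp(app)
```

Moving the slider re-runs the `renderPlot()` block automatically, which is the reactive part that makes shiny pleasant to work with.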
- Use dplyr to manipulate data in R. Here is a helpful lesson
- Use tidyr to change the data format; gathering data into long format, and spreading them into wide format, etc. It also has heaps of other little handy tools, like
- Use broom to create tidy dataframes of statistical models. Here is a helpful lesson
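Here is a quick sketch of the three packages side by side, assuming dplyr, tidyr, and broom are installed. The small `wide` data frame is invented for illustration. `gather()` and `spread()` match the wording above; newer versions of tidyr also offer `pivot_longer()` and `pivot_wider()` for the same job.

```r
library(dplyr)
library(tidyr)
library(broom)

# dplyr: filter rows, group, and summarise.
heavy_by_cyl <- mtcars %>%
  filter(wt > 3) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

# tidyr: gather wide data into long format, then spread it back.
wide <- data.frame(id = 1:2, before = c(5.1, 6.0), after = c(4.8, 5.5))
long <- gather(wide, key = "time", value = "score", before, after)
wide_again <- spread(long, key = time, value = score)

# broom: turn a fitted model into a tidy data frame, one row per term.
fit <- lm(mpg ~ wt, data = mtcars)
model_summary <- tidy(fit)
model_summary
```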
The RStudio webinars page is in my opinion an untapped resource of the R community!
Probably the coolest thing ever.
knitr is this amazing package that allows the user to combine their code and document text, making research easier to reproduce, and it does this while looking slick and classy. The idea is essentially to let the human do the writing, and the computer handle displaying the results, so that reports can be easily constructed, and most importantly, reproduced easily.
Check out some really nice guides here and here, and from the awesome dude who created knitr here.
You can also augment your rmarkdown documents with templates. For example, rticles is an R package that adds loads of rmarkdown templates. Currently, there are templates for the R Journal, the UseR Conference, Journal of Statistical Software, PLoS Computational Biology, and more!
Learning Statistics Using R
If you want to learn statistics using R, check out this website containing 15 hours of an applied R statistics course from Stanford. They also have an excellent (and free!) book.
I use decision trees a lot in R, and I even wrote a little package that helps take care of some common tasks in interrogating decision trees. Here is a list of resources that I recommend using to learn about them:
This book from James et al - chapter 8 specifically refers to decision trees. They’ve also made the book free! Also their videos on decision trees are very useful. You can find a comprehensive list of all their videos and material at this website
This book chapter from the Handbook of Statistics is broad and general.
This video on introduction to boosting trees for regression and classification by statsoft.
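If you want something to poke at while working through those resources, here is a minimal sketch of fitting a classification tree. It assumes the rpart package (one of several tree packages, and not necessarily the one the resources above use) and the `iris` data that ships with R.

```r
library(rpart)

# Fit a classification tree predicting species from the four measurements.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Print the splits the tree has chosen.
print(fit)

# Predict on the training data and cross-tabulate against the truth.
# Note: this is in-sample performance, so it will look optimistic.
pred <- predict(fit, newdata = iris, type = "class")
confusion <- table(predicted = pred, actual = iris$Species)
confusion
```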
Using R for Spatial Data Wrangling, Analysis, and Visualisation.
Spatial data analysis can be really different to anything else that you’ve done in R. Well, it was for me. Fortunately, recent awesome progress has been made on the simple features R package, officially supported by the RConsortium, and authored by Edzer Pebesma. The format of simple features is to adopt a standard dataframe format, where every row is a spatial feature, and the spatial features are described in a geometry list column. This is really fantastic, because it means that (for the most part), working with spatial data is very similar to working with regular dataframes, which is the bread and butter of analysis and data wrangling in R.
In particular, simple features is designed to play nicely with the tidyverse, and accordingly plays well with ggplot2, dplyr, purrr, and so on. It’s amazing.
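A small sketch of that structure, assuming the sf package is installed. The two cities and their approximate coordinates are made up for illustration.

```r
library(sf)

# Two point features; coordinates are (longitude, latitude) and approximate.
cities <- st_sf(
  name = c("Melbourne", "Brisbane"),
  geometry = st_sfc(
    st_point(c(144.96, -37.81)),
    st_point(c(153.03, -27.47)),
    crs = 4326  # WGS84 longitude/latitude
  )
)

# An sf object is still a data frame, with one row per feature.
class(cities)

# The geometry lives in a list column.
st_geometry(cities)

# Ordinary data frame subsetting works, and the geometry comes along.
melbourne <- cities[cities$name == "Melbourne", ]
```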
Here is a list of resources on using spatial data in R:
The R Spatial Blog is a great way to stay updated with the latest changes in simple features.
A blog post by Matt Strimas-Mackey on how to use simple features with dplyr, tidyr, and ggplot2.
For more thoughts on R for spatial data analysis:
- Michael Sumner’s overview of R’s spatial capabilities for 2017.
ggmap is also great, as it produces static maps.
If you write code or plain text (LaTeX, RMarkdown, Markdown, R, C++ or even .txt), you should really consider using git to help manage your workflow. It’s like Dropbox on steroids.
To get started, I would recommend two things:
Read Jenny Bryan’s awesome book, Happy git with R.
Download GitKraken; it’s the best free GUI for interacting with git.
Some other great resources include:
I would also really recommend reading an article by Karthik Ram, “Git can facilitate greater reproducibility and increased transparency in science”.
STATA Related Resources
STATA do a great job of explaining multilevel and hierarchical models on their blog. I found these two blogs and video really helpful:
Just as it is important to have strong data visualisation skills, it is important to understand what makes a good looking document, poster, business card, and whatnot. To this end, you should read typography in ten minutes, and the summary of key rules of typography. One day I will purchase some fonts to pay him back.