Sunday, 3 February 2013

Scatterplot with marginal boxplots

Using R and ggplot2 to draw a scatterplot with the two marginal boxplots

Drawing a scatterplot with marginal boxplots (or marginal histograms or marginal density plots) has always been a bit tricky (well, for me anyway). The approach I take here is, first, to draw the three separate plots using ggplot2:
  • the scatterplot;
  • the horizontal boxplot to appear in the top margin;
  • the vertical boxplot to appear in the right margin;
then second, to set widths and heights of the spaces used for axis and tick mark labels, and to combine the three plots using functions from the gtable package. The difficulty has been to ensure that the tick mark labels on the vertical axis in the scatterplot panel and in the top marginal boxplot panel take up the same space. Functions from the gtable package make this a reasonably straightforward process.
To draw the following chart, I borrowed and modified code from here and here. The final code and data are available on GitHub.

[Figure: scatterplot with marginal boxplots (chunk ScatterBoxPlot)]
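
The three-plot approach described above might be sketched roughly as follows. This is a minimal illustration and not the code from GitHub: it uses the built-in mtcars data as a stand-in, assumes ggplot2 3.3.0 or later (so that a boxplot can be drawn horizontally without coord_flip), and aligns the panels by copying the scatterplot's column widths and row heights onto the marginal plots. The 4:1 size ratio and the object names are arbitrary choices.

```r
library(ggplot2)
library(gtable)
library(grid)

# 1. The three separate plots (mtcars is a stand-in data set)
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_bw()

# Horizontal boxplot of the x variable, for the top margin
p_top <- ggplot(mtcars, aes(x = wt, y = "")) +
  geom_boxplot() +
  theme_void()

# Vertical boxplot of the y variable, for the right margin
p_right <- ggplot(mtcars, aes(x = "", y = mpg)) +
  geom_boxplot() +
  theme_void()

# 2. Convert each plot to a gtable
g_scatter <- ggplotGrob(p_scatter)
g_top     <- ggplotGrob(p_top)
g_right   <- ggplotGrob(p_right)

# Give the top boxplot the same column widths as the scatterplot, so the
# space reserved for the vertical axis labels is identical in both.
# Do the same with row heights for the right boxplot.
# (Assumes the three gtables share the same layout dimensions, which
# should hold for simple, unfaceted plots built by one ggplot2 version.)
g_top$widths    <- g_scatter$widths
g_right$heights <- g_scatter$heights

# 3. Combine the three gtables in a 2 x 2 layout
gt <- gtable(widths  = unit(c(4, 1), "null"),
             heights = unit(c(1, 4), "null"))
gt <- gtable_add_grob(gt, g_scatter, t = 2, l = 1, name = "scatter")
gt <- gtable_add_grob(gt, g_top,     t = 1, l = 1, name = "top")
gt <- gtable_add_grob(gt, g_right,   t = 2, l = 2, name = "right")

grid.newpage()
grid.draw(gt)
```

Because the marginal plots use theme_void, the scatterplot's axis spaces are always the larger ones, which is why a plain copy of the widths and heights is enough here. With labelled marginal plots, taking the element-wise maximum of the widths (for example with grid::unit.pmax) would be the safer choice.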

Thursday, 10 January 2013

Getting Access data into R

1. Introduction

These notes give the steps to configure a Windows machine so that R can communicate with Microsoft Access databases. It turns out that the same mechanism can be used to connect with Microsoft Excel workbooks, so the notes include R to Excel communication as well. In R, there are two main ways to connect with Access databases: using the ODBC (Open DataBase Connectivity) facility available on many computers, and using the DBI (DataBase Interface) package in R. These notes deal with ODBC only. The notes also include some details on how the set-up differs on a Mac, but none of the steps have been tested on a Mac. (It is worth mentioning a commercial product, Stat/Transfer, which simplifies the task of transferring data between data formats, including Access to R.)

ODBC allows a connection to a database to be opened, but that is only half the process. The second half of the process requires the use of SQL (Structured Query Language) to import database tables into R. Thus the notes also provide a brief introduction to SQL, and show how to formulate SQL requests within R and then to send the request through the open connection to the database.
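
The two halves of the process (open a connection, then send an SQL request through it) might look roughly like this. The sketch assumes the RODBC package on a Windows machine with the Access ODBC driver installed; the file path, table name and column name are hypothetical, and the database calls are skipped when the package or the file is not available.

```r
# Hypothetical file, table and column names, for illustration only
db_path <- "C:/data/survey.accdb"
query   <- "SELECT * FROM Respondents WHERE Age >= 18"

if (requireNamespace("RODBC", quietly = TRUE) && file.exists(db_path)) {
  # First half: open the ODBC connection to the Access database
  con <- RODBC::odbcConnectAccess2007(db_path)

  # List the tables the connection can see
  print(RODBC::sqlTables(con)$TABLE_NAME)

  # Second half: send the SQL request and receive a data frame
  respondents <- RODBC::sqlQuery(con, query, stringsAsFactors = FALSE)

  # Close the connection when finished
  RODBC::odbcClose(con)
}
```

RODBC also provides odbcConnectExcel2007 for workbooks on Windows, so the same pattern should carry over to Excel with a different connection call.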

Monday, 19 November 2012

R and SQLite: Part 1

Creating SQLite databases from R

1. Introduction

These notes show how to create an SQLite database from within R. The notes outline two ways in which R can communicate with SQLite databases: using the RSQLite package and using the sqldf package. Both packages use reasonably standard versions of SQL to administer and manage the database, but the two packages differ in the way meta statements are constructed.

Management of SQLite databases requires the use of SQL (Structured Query Language). These notes show how to formulate relevant SQL requests within R and then to send the requests through the open connection to an SQLite database. But for a comprehensive treatment of SQL, and in particular, SQLite's flavour of SQL, readers should consult texts such as Allen & Owens (2010) and van der Lans (2009).
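
As a rough sketch of the RSQLite route, the following creates a database file, defines a table with an SQL statement, fills it from a data frame, and queries it. It assumes current versions of the DBI and RSQLite packages, whose interface may differ from the one available when these notes were written; the table and column names are made up for illustration, and the built-in mtcars data stands in for real data.

```r
library(DBI)

# Create (or open) an SQLite database in a temporary file
db_file <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_file)

# Formulate an SQL request in R and send it through the open connection
dbExecute(con, "CREATE TABLE cars (
                  model TEXT PRIMARY KEY,
                  mpg   REAL,
                  cyl   INTEGER)")

# Fill the table from a data frame whose column names match the table
cars <- data.frame(model = rownames(mtcars),
                   mpg   = mtcars$mpg,
                   cyl   = as.integer(mtcars$cyl),
                   stringsAsFactors = FALSE)
dbWriteTable(con, "cars", cars, append = TRUE)

# Query the database; the result comes back as a data frame
res <- dbGetQuery(con, "SELECT cyl, COUNT(*) AS n
                        FROM cars
                        GROUP BY cyl
                        ORDER BY cyl")
print(res)

dbDisconnect(con)
```

The sqldf package takes a different route, running SQL against data frames directly, and is not shown here.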