Sunday, 3 February 2013

Scatterplot with marginal boxplots


Using R and ggplot2 to draw a scatterplot with the two marginal boxplots

Drawing a scatterplot with marginal boxplots (or marginal histograms or marginal density plots) has always been a bit tricky (well, for me anyway). The approach I take here is, first, to draw the three separate plots using ggplot2:
  • the scatterplot;
  • the horizontal boxplot to appear in the top margin;
  • the vertical boxplot to appear in the right margin;
then, second, to set the widths and heights of the spaces used for axis titles and tick mark labels, and to combine the three plots using functions from the gtable package. The difficulty has been to ensure that the tick mark labels on the vertical axis in the scatterplot panel and in the top marginal boxplot panel take up the same width (and, likewise, that the rows below the scatterplot and below the right marginal boxplot take up the same height). Functions from the gtable package make this a reasonably straightforward process.
To draw the following chart, I borrowed and modified code from here and here. The final code and data are available on GitHub.

[Figure: scatterplot with marginal boxplots]

Drawing the plot

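This example uses the mtcars data frame from R's built-in datasets package. For convenience, the file mtcars marginal boxplots.R on GitHub contains all the code. First, load the ggplot2, gtable and grid packages (grid supplies unit(), gpar() and the grid.newpage(), grid.draw() and grid.rect() functions used below) and the mtcars data frame.
library(ggplot2)
library(gtable)
library(grid)
data(mtcars)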

Draw the scatterplot.

The plot margins are adjusted to reduce the space between the panels. Also, there is an ever-so-slight mismatch of the gridlines across the panels. The fix is to remove ggplot2's default expansion on each axis (expand = c(0, 0)), then add back an offset of your choice with expand_limits(...); here, 10% of the data range at each end. Similar adjustments are made to the marginal plots.
p1 <- ggplot(mtcars, aes(mpg, hp)) + 
   geom_point() + 
   scale_x_continuous(expand = c(0, 0)) + 
   scale_y_continuous(expand = c(0, 0)) + 
   expand_limits(y = c(min(mtcars$hp) - 0.1 * diff(range(mtcars$hp)), 
      max(mtcars$hp) + 0.1 * diff(range(mtcars$hp)))) + 
   expand_limits(x = c(min(mtcars$mpg) - 0.1 * diff(range(mtcars$mpg)), 
      max(mtcars$mpg) + 0.1 * diff(range(mtcars$mpg)))) + 
   theme(plot.margin = unit(c(0, 0, 0.5, 0.5), "lines"))
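The padded range (the data range widened by 10% of its width at each end) is written out by hand several times in this post. A minimal sketch that computes it once; `pad_range()` is a hypothetical helper, not part of ggplot2, and the single `expand_limits()` call sets both axes at once:

```r
library(ggplot2)
library(grid)
data(mtcars)

# Hypothetical helper (not part of ggplot2): the range of x, widened at
# each end by f times its width (f = 0.1 is the 10% used in this post)
pad_range <- function(x, f = 0.1) {
  r <- range(x)
  r + c(-1, 1) * f * diff(r)
}

# Default expansion switched off, then the padded limits for both axes
# added back in a single expand_limits() call
p1 <- ggplot(mtcars, aes(mpg, hp)) +
  geom_point() +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  expand_limits(x = pad_range(mtcars$mpg), y = pad_range(mtcars$hp)) +
  theme(plot.margin = unit(c(0, 0, 0.5, 0.5), "lines"))

pad_range(mtcars$mpg)  # 8.05 36.25
pad_range(mtcars$hp)   # 23.7 363.3
```

Since a continuous scale's `expand` argument holds a multiplicative and an additive constant, `scale_x_continuous(expand = c(0.1, 0))` should give the same limits without `expand_limits()`; the explicit version simply makes the padding visible.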

Draw the marginal boxplots

Note that the margins and axis offsets are adjusted to match those in the scatterplot. Also, the tick marks, tick mark labels and axis titles are removed from both axes.
# To remove all axis labelling and marks from the two marginal plots
theme_remove_all <- theme(axis.text = element_blank(),
  axis.title = element_blank(),
  axis.ticks =  element_blank(),
  axis.ticks.margin = unit(0, "lines"),
  axis.ticks.length = unit(0, "cm"))
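`axis.ticks.margin` was deprecated in ggplot2 2.0.0, so recent versions warn about it. A sketch of the same theme for current ggplot2: because the tick mark labels are blank, the gap that `axis.ticks.margin` controlled has nothing to separate, and the element can simply be dropped.

```r
library(ggplot2)
library(grid)

# Strip every axis element from the marginal plots; axis.ticks.margin is
# omitted because ggplot2 2.0.0 deprecated it
theme_remove_all <- theme(axis.text = element_blank(),
  axis.title = element_blank(),
  axis.ticks = element_blank(),
  axis.ticks.length = unit(0, "cm"))
```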

# Horizontal marginal boxplot - to appear at the top of the chart
p2 <- ggplot(mtcars, aes(x = factor(1), y = mpg)) + 
  geom_boxplot(outlier.colour = NA) +
  # height = 0: jitter across the box only, so points keep their true mpg values
  geom_jitter(position = position_jitter(width = 0.05, height = 0)) +
  scale_y_continuous(expand = c(0, 0)) + 
  expand_limits(y = c(min(mtcars$mpg) - 0.1 * diff(range(mtcars$mpg)), 
                      max(mtcars$mpg) + 0.1 * diff(range(mtcars$mpg)))) + 
  coord_flip() +
  theme_remove_all +
  theme(plot.margin= unit(c(0.5, 0, 0, 0.5), "lines"))
                 
# Vertical marginal boxplot - to appear at the right of the chart
p3 <- ggplot(mtcars, aes(x = factor(1), y = hp)) + 
  geom_boxplot(outlier.colour = NA) +
  # height = 0: jitter across the box only, so points keep their true hp values
  geom_jitter(position = position_jitter(width = 0.05, height = 0)) +
  scale_y_continuous(expand = c(0, 0)) + 
  expand_limits(y = c(min(mtcars$hp) - 0.1 * diff(range(mtcars$hp)), 
                      max(mtcars$hp) + 0.1 * diff(range(mtcars$hp)))) + 
  theme_remove_all +
  theme(plot.margin= unit(c(0, 0.5, 0.5, 0), "lines"))
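The two marginal boxplots differ only in the variable, the orientation and the plot margins. A sketch that folds them into one function; `marginal_boxplot()` is a hypothetical helper, not part of ggplot2, and it assumes the same 10% padding as the scatterplot:

```r
library(ggplot2)
library(grid)
data(mtcars)

# Same as theme_remove_all above, minus axis.ticks.margin (deprecated in ggplot2 2.0.0)
theme_remove_all <- theme(axis.text = element_blank(),
  axis.title = element_blank(),
  axis.ticks = element_blank(),
  axis.ticks.length = unit(0, "cm"))

# Hypothetical helper: marginal boxplot of the numeric vector x, padded by
# 10% at each end. flip = TRUE gives the horizontal (top) boxplot;
# flip = FALSE gives the vertical (right) one.
marginal_boxplot <- function(x, flip, margin) {
  pad <- range(x) + c(-1, 1) * 0.1 * diff(range(x))
  p <- ggplot(data.frame(v = x), aes(x = factor(1), y = v)) +
    geom_boxplot(outlier.colour = NA) +
    # height = 0: spread points across the box only, not along the data axis
    geom_jitter(position = position_jitter(width = 0.05, height = 0)) +
    scale_y_continuous(expand = c(0, 0)) +
    expand_limits(y = pad)
  if (flip) p <- p + coord_flip()
  p + theme_remove_all + theme(plot.margin = unit(margin, "lines"))
}

p2 <- marginal_boxplot(mtcars$mpg, flip = TRUE,  margin = c(0.5, 0, 0, 0.5))
p3 <- marginal_boxplot(mtcars$hp,  flip = FALSE, margin = c(0, 0.5, 0.5, 0))
```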

Get the gtables for the three plots


gt1 <- ggplot_gtable(ggplot_build(p1))
gt2 <- ggplot_gtable(ggplot_build(p2))
gt3 <- ggplot_gtable(ggplot_build(p3))
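Printing a gtable's layout shows which cell holds which piece of the plot: each grob's name with the rows (t to b) and columns (l to r) it spans, alongside the column widths and row heights. A sketch for inspecting the scatterplot's gtable; the names and positions it reports depend on the installed version of ggplot2.

```r
library(ggplot2)
library(gtable)
data(mtcars)

p1 <- ggplot(mtcars, aes(mpg, hp)) + geom_point()
gt1 <- ggplot_gtable(ggplot_build(p1))

# One row per grob: its name and the rows (t, b) and columns (l, r) it spans
print(gt1$layout[, c("name", "t", "b", "l", "r")])

# Size of every column and row; the panel's are in "null" (relative) units
print(gt1$widths)
print(gt1$heights)
```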

Set the maximum widths and heights for x-axis and y-axis titles and text

The gtables store the information required to draw the plots, including the widths of the columns occupied by the y-axis title and tick mark labels (columns 2 and 3 in the layout produced by the version of ggplot2 used here) and the heights of the rows occupied by the x-axis tick mark labels and title (rows 4 and 5). The code gets the maximum of these widths for the scatterplot and the horizontal marginal boxplot (gt1 and gt2), then sets that maximum as the width in both gtables. So that the scatterplot and the vertical marginal boxplot also line up, the heights are set in the same way for gt1 and gt3.
# Get maximum widths and heights
maxWidth <- unit.pmax(gt1$widths[2:3], gt2$widths[2:3])
maxHeight <- unit.pmax(gt1$heights[4:5], gt3$heights[4:5])

# Set the maximums in the gtables for gt1, gt2 and gt3
gt1$widths[2:3] <- as.list(maxWidth)
gt2$widths[2:3] <- as.list(maxWidth)

gt1$heights[4:5] <- as.list(maxHeight)
gt3$heights[4:5] <- as.list(maxHeight)
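The positions 2:3 and 4:5 are tied to the gtable layout of a particular ggplot2 version, and later versions changed that layout. A sketch, assuming only that each gtable has a single cell whose name starts with "panel", that locates the columns to the left of the panel and the rows below it instead of hard-coding them:

```r
library(ggplot2)
library(gtable)
library(grid)
data(mtcars)

# Simplified stand-ins for p1, p2 and p3
p1 <- ggplot(mtcars, aes(mpg, hp)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(1), y = mpg)) + geom_boxplot() + coord_flip()
p3 <- ggplot(mtcars, aes(x = factor(1), y = hp)) + geom_boxplot()

gt1 <- ggplot_gtable(ggplot_build(p1))
gt2 <- ggplot_gtable(ggplot_build(p2))
gt3 <- ggplot_gtable(ggplot_build(p3))

# Columns to the left of the panel (margin, y-axis title, y-axis labels, ...)
left_cols <- function(gt) {
  seq_len(min(gt$layout$l[grepl("^panel", gt$layout$name)]) - 1)
}
# Rows below the panel (x-axis labels, x-axis title, margin, ...)
below_rows <- function(gt) {
  which(seq_len(nrow(gt)) > max(gt$layout$b[grepl("^panel", gt$layout$name)]))
}

cols <- left_cols(gt1)
rows <- below_rows(gt1)
# The approach relies on the plots sharing the same layout structure
stopifnot(identical(cols, left_cols(gt2)), identical(rows, below_rows(gt3)))

maxWidth <- unit.pmax(gt1$widths[cols], gt2$widths[cols])
maxHeight <- unit.pmax(gt1$heights[rows], gt3$heights[rows])

# Units assigned directly, without as.list() (assumes a recent R)
gt1$widths[cols] <- maxWidth
gt2$widths[cols] <- maxWidth
gt1$heights[rows] <- maxHeight
gt3$heights[rows] <- maxHeight
```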

Combine the scatterplot with the two marginal boxplots

The following code creates a new gtable (gt), inserts the modified gt1, gt2 and gt3 into the new gtable, then renders the plot according to the information stored in the new gtable. Finally, a box is drawn around the combined plot.

# Create a new gtable
gt <- gtable(widths = unit(c(7, 1), "null"), heights = unit(c(1, 7), "null"))

# Insert gt1, gt2 and gt3 into the new gtable (arguments: top row, left column)
gt <- gtable_add_grob(gt, gt1, 2, 1)
gt <- gtable_add_grob(gt, gt2, 1, 1)
gt <- gtable_add_grob(gt, gt3, 2, 2)

# And render the plot
grid.newpage()
grid.draw(gt)

grid.rect(x = 0.5, y = 0.5, height = 0.995, width = 0.995, default.units = "npc", 
    gp = gpar(col = "black", fill = NA, lwd = 1))
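Because the combined chart is a gtable rather than a ggplot object, it is drawn with grid.draw() rather than printed, and it can be saved by opening a graphics device, drawing, then closing the device. A minimal sketch, assuming a PDF written to R's temporary directory (the file name and size are illustrative); the three gtables are unaligned stand-ins so that the block runs on its own:

```r
library(ggplot2)
library(gtable)
library(grid)
data(mtcars)

# Stand-ins for the aligned gtables gt1, gt2 and gt3
gt1 <- ggplot_gtable(ggplot_build(ggplot(mtcars, aes(mpg, hp)) + geom_point()))
gt2 <- ggplot_gtable(ggplot_build(
  ggplot(mtcars, aes(factor(1), mpg)) + geom_boxplot() + coord_flip()))
gt3 <- ggplot_gtable(ggplot_build(
  ggplot(mtcars, aes(factor(1), hp)) + geom_boxplot()))

# 2 x 2 layout; "null" units share the space in the ratio 7:1
gt <- gtable(widths = unit(c(7, 1), "null"), heights = unit(c(1, 7), "null"))
gt <- gtable_add_grob(gt, gt1, t = 2, l = 1)
gt <- gtable_add_grob(gt, gt2, t = 1, l = 1)
gt <- gtable_add_grob(gt, gt3, t = 2, l = 2)

# Open the device, draw the chart and its border, close the device
out_file <- file.path(tempdir(), "mtcars_marginal_boxplots.pdf")
pdf(out_file, width = 7, height = 7)
grid.newpage()
grid.draw(gt)
grid.rect(width = 0.995, height = 0.995, gp = gpar(col = "black", fill = NA, lwd = 1))
invisible(dev.off())
```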

[Figure: scatterplot of hp against mpg with marginal boxplots]

Similar logic applies to the drawing of marginal density plots. The code shown below is also available in the file mtcars marginal density plots.R on GitHub.
# Main scatterplot
p1 <- ggplot(mtcars, aes(mpg, hp)) + 
  geom_point() +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  expand_limits(y = c(min(mtcars$hp) - .1*diff(range(mtcars$hp)), 
                      max(mtcars$hp) + .1*diff(range(mtcars$hp))))  +
  expand_limits(x = c(min(mtcars$mpg) - .1*diff(range(mtcars$mpg)), 
                      max(mtcars$mpg) + .1*diff(range(mtcars$mpg))))  +
  theme(plot.margin= unit(c(0, 0, 0.5, 0.5), "lines"))

# To remove all axis labelling and marks from the two marginal plots
theme_remove_all <- theme(axis.text = element_blank(),
  axis.title = element_blank(),
  axis.ticks =  element_blank(),
  axis.ticks.margin = unit(0, "lines"),
  axis.ticks.length = unit(0, "cm"))

# Horizontal marginal density plot - to appear at the top of the chart
p2 <- ggplot(mtcars, aes(x = mpg)) + 
  geom_density() +
  scale_x_continuous(expand = c(0, 0)) +
  expand_limits(x = c(min(mtcars$mpg) - .1*diff(range(mtcars$mpg)), 
                      max(mtcars$mpg) + .1*diff(range(mtcars$mpg))))  +
  theme_remove_all +
  theme(plot.margin= unit(c(0.5, 0, 0, 0.5), "lines"))
               
# Vertical marginal density plot - to appear at the right of the chart
p3 <- ggplot(mtcars, aes(x = hp)) + 
  geom_density() +
  scale_x_continuous(expand = c(0, 0)) +
  expand_limits(x = c(min(mtcars$hp) - .1*diff(range(mtcars$hp)), 
                      max(mtcars$hp) + .1*diff(range(mtcars$hp))))  +
  coord_flip() +
  theme_remove_all +
  theme(plot.margin= unit(c(0, 0.5, 0.5, 0), "lines"))

# Get the gtables
gt1 <- ggplot_gtable(ggplot_build(p1))
gt2 <- ggplot_gtable(ggplot_build(p2))
gt3 <- ggplot_gtable(ggplot_build(p3))

# Get maximum widths and heights for x-axis and y-axis title and text
maxWidth <- unit.pmax(gt1$widths[2:3], gt2$widths[2:3])
maxHeight <- unit.pmax(gt1$heights[4:5], gt3$heights[4:5])

# Set the maximums in the gtables for gt1, gt2 and gt3
gt1$widths[2:3] <- as.list(maxWidth)
gt2$widths[2:3] <- as.list(maxWidth)

gt1$heights[4:5] <- as.list(maxHeight)
gt3$heights[4:5] <- as.list(maxHeight)

# Combine the scatterplot with the two marginal density plots
# Create a new gtable
gt <- gtable(widths = unit(c(7, 2), "null"), heights = unit(c(2, 7), "null"))

# Insert gt1, gt2 and gt3 into the new gtable (arguments: top row, left column)
gt <- gtable_add_grob(gt, gt1, 2, 1)
gt <- gtable_add_grob(gt, gt2, 1, 1)
gt <- gtable_add_grob(gt, gt3, 2, 2)

# And render the plot
grid.newpage()
grid.draw(gt)

grid.rect(x = 0.5, y = 0.5, height = 0.995, width = 0.995, default.units = "npc", 
    gp = gpar(col = "black", fill = NA, lwd = 1))
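The same scaffolding should also work for marginal histograms, the third option mentioned at the start. A sketch of the two marginal plots only, assuming everything else (the gtables, the maximum widths and heights, the combined gtable) stays as in the density example. The binwidths of 2 mpg and 20 hp are illustrative; they are kept below the 10% padding (2.35 mpg and 28.3 hp) because the outermost bars can extend up to one binwidth beyond the data, and bars beyond the padded limits would widen the axis and break the alignment with the scatterplot.

```r
library(ggplot2)
library(grid)
data(mtcars)

theme_remove_all <- theme(axis.text = element_blank(),
  axis.title = element_blank(),
  axis.ticks = element_blank(),
  axis.ticks.length = unit(0, "cm"))

# 10% padding at each end, as for the scatterplot
pad <- function(x) range(x) + c(-1, 1) * 0.1 * diff(range(x))

# Horizontal marginal histogram - to appear at the top of the chart
p2 <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, colour = "white") +
  scale_x_continuous(expand = c(0, 0)) +
  expand_limits(x = pad(mtcars$mpg)) +
  theme_remove_all +
  theme(plot.margin = unit(c(0.5, 0, 0, 0.5), "lines"))

# Vertical marginal histogram - to appear at the right of the chart
p3 <- ggplot(mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 20, colour = "white") +
  scale_x_continuous(expand = c(0, 0)) +
  expand_limits(x = pad(mtcars$hp)) +
  coord_flip() +
  theme_remove_all +
  theme(plot.margin = unit(c(0, 0.5, 0.5, 0), "lines"))
```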

[Figure: scatterplot of hp against mpg with marginal density plots]


1 comment:

  1. Very interesting. I might try it with gridExtra

    Or maybe we need to post some patches to ggplot2 to make this a default option, like geom_rug!
