Shape plots

Introduction

The shape_plot() function creates a plot of estimates and confidence intervals using the ggplot2 graphics package. The function returns both a plot and the ggplot2 code used to create the plot. In RStudio, the code used to create the plot will be shown in the Viewer pane (see [Plot code example] for an example).

Basic usage

Supply a data frame of estimates and standard errors to the shape_plot() function. Use col.x to name the column that contains the x-axis values, and use xlims and ylims to set the axis limits.

my_results <- data.frame(
  risk_factor = c(  17,    20,  23.5,    25,    29),
  est         = c(   0, 0.069, 0.095, 0.182, 0.214),
  se          = c(0.05, 0.048, 0.045, 0.045, 0.081)
)

shape_plot(my_results,
           col.x = "risk_factor",
           xlims = c(15, 30),
           ylims = c(-0.25, 0.5))

If your estimates and standard errors are on the log scale (e.g. log hazard ratios), then set exponentiate = TRUE. This will plot exp(estimates) and use a log scale for the y-axis.

shape_plot(my_results,
           col.x        = "risk_factor",
           xlims        = c(15, 30),
           ylims        = c(0.8, 1.6),
           exponentiate = TRUE)
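
As a rough illustration of what is drawn when exponentiate = TRUE, the values can be reproduced in base R. This is only a sketch: it assumes 95% intervals from a normal approximation (est ± 1.96 × SE), and the object names (z, hr) are made up for this example rather than taken from the package.

```r
# Log hazard ratios and standard errors from the example above
est <- c(0, 0.069, 0.095, 0.182, 0.214)
se  <- c(0.05, 0.048, 0.045, 0.045, 0.081)

# Assumption: 95% confidence intervals from a normal approximation
z <- qnorm(0.975)

# Back-transform the estimate and interval limits to the hazard ratio scale
hr <- data.frame(
  estimate = exp(est),
  lower    = exp(est - z * se),
  upper    = exp(est + z * se)
)
round(hr, 3)
```

The first row has an estimate of exactly 1, since exp(0) = 1. The intervals are symmetric on the log scale but not on the hazard ratio scale.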

Set axis titles using xlab and ylab.

shape_plot(my_results,
           col.x        = "risk_factor",
           xlims        = c(15, 30),
           ylims        = c(0.8, 1.6),
           exponentiate = TRUE,
           xlab         = "BMI (kg/m\u00B2)",
           ylab         = "Hazard Ratio (95% CI)")

Using groups

Use col.group to plot results for different groups (using shades of grey for the fill colour).

my_results <- data.frame(
  risk_factor = c(17, 20, 23.5, 25, 29,
                  18, 20.5, 22.7, 24.5, 30),
  est         = c(0, 0.069, 0.095, 0.182, 0.214,
                  0.32, 0.369, 0.395, 0.482, 0.514),
  se          = c(0.05, 0.048, 0.045, 0.045, 0.061,
                  0.04, 0.049, 0.045, 0.042, 0.063),
  group       = factor(rep(c("Women", "Men"), each = 5))
)

shape_plot(my_results,
           col.x        = "risk_factor",
           xlims        = c(15, 30),
           ylims        = c(0.8, 2),
           exponentiate = TRUE,
           xlab         = "BMI (kg/m\u00B2)",
           ylab         = "Hazard Ratio (95% CI)",
           col.group    = "group",
           ciunder      = TRUE)

Adding lines

Set lines = TRUE to add a fitted line for each group. Each line is a linear fit through the estimates on the plotted scale, weighted by inverse variance.

shape_plot(my_results,
           col.x        = "risk_factor",
           xlims        = c(15, 30),
           ylims        = c(0.8, 2),
           exponentiate = TRUE,
           xlab         = "BMI (kg/m\u00B2)",
           ylab         = "Hazard Ratio (95% CI)",
           col.group    = "group",
           ciunder      = TRUE,
           lines        = TRUE)
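
To see roughly what such a line represents, an inverse-variance weighted linear fit can be run directly with lm(). This is a sketch under assumptions, not the package's internal code: it fits the log-scale estimates (est) against the risk factor separately for each group, with weights of 1/SE². The object name fits is hypothetical.

```r
my_results <- data.frame(
  risk_factor = c(17, 20, 23.5, 25, 29,
                  18, 20.5, 22.7, 24.5, 30),
  est         = c(0, 0.069, 0.095, 0.182, 0.214,
                  0.32, 0.369, 0.395, 0.482, 0.514),
  se          = c(0.05, 0.048, 0.045, 0.045, 0.061,
                  0.04, 0.049, 0.045, 0.042, 0.063),
  group       = factor(rep(c("Women", "Men"), each = 5))
)

# One weighted linear fit per group; weights are the inverse variances
fits <- lapply(split(my_results, my_results$group), function(d) {
  lm(est ~ risk_factor, data = d, weights = 1 / se^2)
})

# Intercept and slope for each group (one column per group)
sapply(fits, coef)
```

Estimates with smaller standard errors pull the fitted line more strongly towards themselves.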

Categorical risk factor

The risk factor can be a factor. In this case, the x-axis coordinates are 1, 2, 3, ... so suitable x-axis limits are 0.5 and the number of categories plus 0.5. You may need to add position arguments so that points, intervals and text do not overlap.

smoking_results <- data.frame(
  smk_cat = factor(c("Never", "Ex", "Current"),
                   levels = c("Never", "Ex", "Current")),
  est     = c(0, 0.362, 0.814),
  se      = c(0.05, 0.09, 0.041)
)

shape_plot(smoking_results,
           col.x        = "smk_cat",
           xlims        = c(0.5, 3.5),
           ylims        = c(0.5, 4),
           ybreaks      = c(0.5, 1, 2, 4), 
           xlab         = "Smoking",
           ylab         = "Hazard Ratio (95% CI)",
           exponentiate = TRUE)
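
The x-axis limits for a factor can be worked out from its levels rather than typed by hand. A minimal base R sketch, where x_pos and xlims are names chosen for this illustration:

```r
smk_cat <- factor(c("Never", "Ex", "Current"),
                  levels = c("Never", "Ex", "Current"))

# Integer codes of the factor: the x-axis coordinates of the categories
x_pos <- as.integer(smk_cat)

# Half a unit of padding either side of the first and last category
xlims <- c(0.5, nlevels(smk_cat) + 0.5)

x_pos  # 1 2 3
xlims  # 0.5 3.5
```

The resulting xlims vector can then be passed to the xlims argument, which avoids updating the limits by hand when categories are added or removed.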

Scaling point size

Set scalepoints = TRUE to have point size (area) proportional to the inverse of the variance (1/SE²) of the estimate.

my_results <- data.frame(
  risk_factor = c(19, 24, 29),
  est         = c(0, 0.095, 0.214),
  se          = c(0.02, 0.018, 0.1)
)

shape_plot(my_results,
           col.x        = "risk_factor",
           xlims        = c(15, 30),
           ylims        = c(0.8, 2),
           exponentiate = TRUE,
           xlab         = "BMI (kg/m\u00B2)",
           ylab         = "Hazard Ratio (95% CI)",
           scalepoints  = TRUE)
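
The relative sizes implied by inverse-variance scaling can be checked by hand. The sketch below only illustrates the proportionality stated above and does not reproduce the package's exact size calculation. The names inv_var, rel_area and rel_side are invented for this example.

```r
se <- c(0.02, 0.018, 0.1)

# Inverse variance of each estimate
inv_var <- 1 / se^2

# Area of each point relative to the largest point
rel_area <- inv_var / max(inv_var)

# Side length of a square point scales with the square root of its area
rel_side <- sqrt(rel_area)

round(rel_area, 4)  # 0.81 1 0.0324
round(rel_side, 2)  # 0.9 1 0.18
```

The estimate with SE = 0.1 gets a point with about 3% of the area of the largest point, which is why imprecise estimates appear much smaller.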

To use the same point size scaling across several plots, set minse to the same value in each call. The value must be smaller than the smallest SE being plotted.
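
One way to choose such a value is to take the smallest SE across all of the data frames to be plotted and shrink it slightly. This is a sketch only: plot_a, plot_b and shared_minse are hypothetical names, and the factor of 0.9 is an arbitrary choice that just keeps the value below every SE.

```r
# Standard errors for two plots that should share the same size scaling
plot_a <- data.frame(se = c(0.02, 0.018, 0.1))
plot_b <- data.frame(se = c(0.05, 0.048, 0.045, 0.045, 0.081))

# A value strictly smaller than the smallest SE in either data frame
shared_minse <- 0.9 * min(plot_a$se, plot_b$se)

shared_minse  # 0.0162
```

The same shared_minse would then be supplied as minse, together with scalepoints = TRUE, in each shape_plot() call.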

Confidence intervals

Narrow confidence interval lines can be hidden by points. Set the height argument to change the appearance of short confidence interval lines. The function will by default try to change the colour and plotting order of confidence intervals so that they are not hidden. You can also supply vectors and lists to the cicolour argument to have more control.

Note that the calculations for identifying narrow confidence intervals have been designed to work for shapes 15/‘square’ (the default) and 22/‘square filled’, and for symmetric confidence intervals. These may not be completely accurate in all scenarios, so check your final output carefully.

my_results <- data.frame(
  risk_factor = c(19, 24, 29),
  est         = c(0, 0.095, 0.214),
  se          = c(0.02, 0.018, 0.1)
)

shape_plot(my_results,
           col.x        = "risk_factor",
           xlims        = c(15, 30),
           ylims        = c(0.8, 2),
           exponentiate = TRUE,
           xlab         = "BMI (kg/m\u00B2)",
           ylab         = "Hazard Ratio (95% CI)",
           scalepoints  = TRUE,
           pointsize    = 6,
           height       = unit(5, "cm"))

Customisation

See Customising plots for more ways to customise shape plots.

Notes

Stroke

The stroke argument sets the stroke aesthetic for plotted shapes. See https://ggplot2.tidyverse.org/articles/ggplot2-specs.html for more details. The stroke size adds to the total size of a shape, so unless stroke = 0 the scaling of size by inverse variance will be slightly inaccurate (but there are probably more important things to worry about).