Statistics
source: rstudio/statistics.md
As always links first
Some commonly used statistics calculations for use in RStudio.
Load data
We'll load some data on Tasmanian devils and presence of DFTD (cancer) and Trypanosoma presence. The data is coded so that 0 is negative and 1 is positive.
library(readr)
# Load data
tasdevil <- read_csv("data/tasdevil-parasite.csv")
Basics
Sructure of data set
str(tasdevil)
How many had DFTD
sum(tasdevil$DFTDStatus)
Proportion and CIs
Load libraries
# Install first if you need
#
# install.packages("DescTools")
# install.packages("PropCIs")
# install.packages("binom")
# install.package("rcompanion")
# install.package("tidyverse")
library(DescTools)
library(PropCIs)
library(binom)
library(rcompanion)
library(tidyverse)
Calculate different in Trypanosoma prevalence between males and females with 95% CIs
groupwiseMean(TrypStatus ~ Sex,
data = tasdevil,
conf = 0.95,
digits = 3)
We could plot the values above using this…
#save values to a data.frame
CI <- groupwiseMean(TrypStatus ~ Sex,
data = tasdevil,
conf = 0.95,
digits = 3)
#plot
qplot(x= Sex,
y = Mean,
data = CI,
shape= Sex) +
geom_point(size=2.5) +
geom_errorbar(aes(
ymin = Trad.lower,
ymax = Trad.upper,
width = 0.15)) + theme_bw() + ylim(0,1)
Calculate different in Trypanosoma prevalence between males and females and 4 different sites with 95% CIs
groupwiseMean(TrypStatus ~ Sex + Site_code,
data = tasdevil,
conf = 0.95,
digits = 3)
Simple stuff
If you need here are some simple bits of code where you have basic numbers such as…7 positive out of sample size of 21.
binom.test(7, 21,
0.5,
alternative="two.sided",
conf.level=0.95)
Now we'll calculate the 95% CIs using the Jeffreys method.
BinomCI(7, 21,
conf.level=0.95,
method="jeffreys")
Odds ratio & Relative risk
Using epitools - manual here
Reminder: If you need more information on the tests use the help command in the console (e.g. ?riskratio
, ?oddsratio
).
Library
library(epitools)
# if you don't have this package, first install using `install.packages("epitools")`
Create a simple dataframe. In this case we'll test effect of gender on parasite presence with a simple positive/negative summary. Of course if you have a your raw data in a spreadsheet you could make your own by summarising the releavnt information into a dataframe. (Need help tidying and summarising your data…check out this tutorial to check you hooked on the dplyr
and tidyr
packages
factor1 <- c("Female", "Male")
factor2 <- c("Positive", "Negative")
dat <- matrix(c(16, 30, 15, 34), nrow = 2, ncol = 2, byrow = TRUE)
dimnames(dat) <- list("Sex" = factor1, "Parasite present" = factor2)
Your dataframe should look like this
dat
Now lets calculate our odds ratio
oddsratio(dat)
and relative risk
riskratio(dat)
Modelling
Manual here and webpage