Instructor: Erika A. Sudderth, PhD
E-mail: Erika_Sudderth@brown.edu
Note: This course will be taught at MBL in Woods Hole in January 2013 (Jan 7th – Jan 22nd).
The primary aim of the course is to learn methods in R for 1) data manipulation, 2) exploratory data analysis, 3) data analysis using standard statistical methods, and 4) graphical presentation of data and results. This is a twelve-day intensive course (6 hr/day), with additional problem sets to be completed in R. Each day will consist of 2-3 modules, each with an introductory lecture followed by a computer lab exercise. Participants will work through the lab exercises for each module independently while the instructor circulates to answer questions. The problem sets will reinforce the methods learned in the daily modules by applying them to new datasets. Participants may use the provided datasets or bring their own datasets to analyze.
Familiarity with basic statistical analysis (e.g., hypothesis testing, analysis of variance, correlation, regression) is expected. The focus of the course will be on how to use R to analyze real datasets and present results. Suggested statistics references for review are listed below. More advanced methods (generalized linear models, time series, and bioinformatics) will also be introduced using specific examples. An additional objective of the course is to demonstrate approaches and available resources for learning new statistical methods in R when they are needed for specific applications.
No prior programming experience is required. The course will start by introducing the basics of command-line programming and R scripts.
Enrollment is limited to 15 students.
Preliminary Course Schedule (Jan 7th – Jan 22nd)
Day 1: Introduction to R and data manipulation. Reading and writing data, dates, factors, subscripting, data aggregation, reshaping data, introduction to R scripts.
Day 2: Exploratory data analysis. Basic graphics, frequency histograms, quantile plots, boxplots, graphical parameters, introduction to lattice (graphics package).
Day 3: Summary statistics and resampling. Sample size, degrees of freedom, standard error, confidence intervals, bootstrap, writing functions.
Day 4: Hypothesis testing. Review probability distributions, visualizing univariate distributions, testing for normality, comparing variances and means, non-parametric tests.
Day 5: Correlation and data visualization. Correlation analysis, covariance, scatter plots and extensions (lattice), graphics (labels and legends).
Day 6: Regression I. Model formulae in R, linear regression, interactions, measuring model fit, plotting residuals, model checking, model simplification.
Day 7: Regression II. Polynomial regression, non-linear regression, multiple regression, model simplification, visualizing multivariate data.
Day 8: Analysis of variance and covariance. Effect sizes, nested and split-plot designs, longitudinal data, plots for interpreting ANOVA.
Day 9: Generalized linear models. Logistic regression, ordinal regression, error structure, linear predictor, link function, plotting observed and fitted values.
Day 10: Managing workflow, advanced graphics. Scripts and functions, customizing plots.
Day 11: Time series. R packages for decomposing time series, filling data gaps, simple autoregressive and ARMA models for serial correlation.
Day 12: Bioinformatics. Examples of analysis and visualization of microbial community data using phyloseq.
R Reference Texts:
1. Introductory Statistics with R. Peter Dalgaard. 1st edition available from Brown.
2. Data Manipulation with R. Phil Spector.
3. Lattice: Multivariate Data Visualization with R. Deepayan Sarkar.
Statistics Reference Texts:
1. Using Statistics to Understand the Environment (2000). Wheater & Cook.
Basic and readable text for statistics review.
2. Statistics: An Introduction Using R (2005). Crawley.
Concise review of basic and more advanced statistical methods.
3. Environmental and Ecological Statistics with R (2009). Qian.
Applied statistical analysis with data and code available online.