Developing an Understanding of Statistics Using R

Rob Feissner PhD, Laboratory Instructor in Biology, SUNY Geneseo

For the Fall 2013 semester of my college freshman-level General Biology Laboratory class, I overhauled the way statistics was taught. The reason for the overhaul was twofold: I wanted to move toward an inquiry-based experience to replace a lab-on-rails style tutorial, and I wanted to migrate to the free and open-source ‘R’ statistical programming environment as the platform for data analysis. As with any change in curriculum, questions arose. Would the introduction of a new and complex software environment (R) challenge students to explore HOW statistics work, or simply overwhelm them? Would students succeed in independently using collaborative peer interactions and freely available internet resources to solve problems? To answer these questions, I spent one semester carrying out an action research project.

The decision to change software was prompted by the adoption of ‘R’ as the primary tool for data analysis in upper-level courses, as well as the fact that it is free and usable on any computer platform. The old laboratory was a walkthrough experience that led students through the procedures for completing statistical tests. Because the lab simply had students enter numbers in a spreadsheet-like interface and select options from drop-down menus, the reasoning behind what was being accomplished was lost on many students.

I designed the new laboratory to be an integrated component of the entire semester rather than a single, one-time activity. The main goals were to emphasize statistical literacy and develop statistical thinking, use real data in an authentic manner, and stress conceptual understanding rather than rote memorization of procedures. The first exposure to statistics was through a lab entitled “Introducing R” that provided a self-guided lesson on how to acquire, install, and become familiar with the R software itself. The second lab, entitled “Using R for Biological Statistical Analysis”, was an inquiry-based lab in which students solved progressively harder problems by searching R's internal help system and the internet, developing an understanding of data analysis using ‘R’ and of how to employ statistics to test hypotheses.

After switching to the R-based lab, grades from all graded lab assignments (n=248) were tabulated and compared to the grades from a similar assignment from the prior four semesters. My hypothesis in changing the lab experience was that student achievement would increase with an inquiry-based approach as compared to the old recipe-book style approach. The median grade, however, dropped upon switching labs. A number of explanations for the grade drop are discussed in my research report.

Despite the small drop in grades on the statistics assignment, I noticed some positive changes in lab reports through the semester. Whereas students in the past relied on graphs alone to present data in lab reports, I started seeing students voluntarily using ‘R’ to analyze their data statistically. For their final project, between 50% and 80% of student groups chose to use some sort of statistics to analyze some or all of their data. With one semester of data to draw upon, my impression is that while students were challenged to put together the concepts of a statistical analysis with little guidance, the struggle led to a deeper understanding of why statistics are useful. It will be interesting to follow these students through their college careers to see if the groundwork established in freshman general biology leads to higher achievement in upper-level classes that depend heavily on ‘R’ usage.
