Fall reproducibility and data science workshop series

Jan. 12, 2023

Jessica and Kristina wrapped up the third iteration of our workshop series on data science skills in Fall 2022. Students were enthusiastic and learned a variety of applicable skills.

This series initially came from a request by the ESA SEEDS program to teach data science skills to ecology graduate students from across the country. We redesigned the materials for a UA audience and taught the inaugural cohort in spring 2022. Based on participant feedback, we added sessions on intermediate GitHub skills and extended the curriculum time on project documentation. We also held the follow-up sessions for help, review, and demos sooner after instruction to minimize loss of learning momentum.

We have thoroughly enjoyed getting to know this cohort, which included graduate students, postdocs, and UA staff working on projects in environmental genomics, natural language processing, and public health. It was a pleasure to watch the project demonstrations, where we saw improvements great and small toward reproducibility. Participants also reported improvement in both their ability to complete and their understanding of the data science skills we covered, as summarized in the survey responses below.


Summary of workshop participant responses to a survey about their skills. Participants were asked to assess their ability to complete and their understanding of 13 primary skills before and after the workshop series. They self-assessed on a scale of 1 to 5, from being unable to complete or understand a skill (1) to being able to complete it without help or understand it (5). The x-axis of the figure shows the change from before to after; because both ratings fall between 1 and 5, the change can range from -4 to 4.

CCT Data Science will offer this foundational workshop series every fall semester going forward, focusing on researchers who want to deepen their practice of R for data science. If you are interested, please sign up for our quarterly email blast to receive updates about application deadlines.

One participant suggestion was to incorporate more data visualization with ‘ggplot2’. On reflection, this content may be better suited to a short, standalone workshop series. Please let us know what you think it should include!