Getting Inclusive with Data Science: ERG Students Initiate Data & Diversity Course at Cal

 [Cecilia Han Springer, ERG graduate student] 

“What our students found was pretty shocking. In just a few short weeks, and without fancy statistics, they were able to highlight some major diversity-related problems on campus.” 

ERG students, in collaboration with groups across Cal, have started a new data science course focused on diversity. ERG grad student Cecilia Han Springer shares their fascinating results and how they made this happen. For more information, please go to http://datadiversity.berkeley.edu.

Cecilia and Pierce at Synberc's Expanding Potential Workshop

Our course started as a fledgling idea talked about in the ERG Reading Room. Before we knew it, Pierce and I were presenting before dozens of inspiring change-makers from all over the country. We never knew that our Data and Diversity project would go so far. 

Why data and diversity?

The first seeds of the Data and Diversity project were sown in an ERG Student Diversity Committee meeting. We were discussing the idea of “quantitative privilege,” wherein attention and funding tend to flow towards those who possess more quantitative skills and focus on quantitative methods.

Unfortunately, though, fluency with quantitative concepts often falls along lines of race, gender, and other identities, based on entrenched societal norms that affect students early on in their schooling. By graduate school, this disparity was already clear to us. Thus we asked the question: could we do anything to shore up these leaks earlier in the educational pipeline? Could we make sure that people of all backgrounds and identities feel comfortable with quantitative skills?

Around this time, Synberc announced a call for Seed Projects focused on increasing diversity in STEM fields, and it was clear to us that we might be able to do something impactful. We focused on data science for two reasons. First, it is a critical skill for both interdisciplinary and STEM research. Second, for better or for worse, quantitative data is very persuasive, and we thought we could use it to bring attention to diversity issues on campus.

Pierce leading a design thinking session

We began brainstorming new ways to teach data science to a diverse group of undergraduates. Introductory computer science classes can be intimidating, and we saw the need to create a collaborative classroom environment that would foster diverse perspectives instead of driving students out of the pipeline. We also wanted to promote hands-on learning through analyzing data on diversity here at UC Berkeley, using new and innovative sources of data.

That was in spring 2015. We worked all summer to design the class, with Yang Ruan (MS/MPP ‘15) building us a web platform and helping us choose a thought-provoking reading list. Current ERGies Grace Wu, Michaelangelo Tabone, and Pierce Gordon talked to groups all over campus, from the Berkeley Center for New Media to the Office of Faculty Equity and Welfare, to get their input on how to design the class. We recruited graduate students from Physics (Jesse Livezey), Integrative Biology (Dax Vivid), and ESPM (Guillermo Douglass-Jaimes) to help us mentor the students who would ultimately take the class. Each graduate student brought some unique skill to the class, from experience teaching Python to Pierce’s design thinking skills.

Ultimately, the Berkeley Institute for Data Science (BIDS) and the Berkeley Division of Equity and Inclusion (E&I) were happy to work with us as clients for the students’ data analysis projects. Anthony Suen, the Data Science Fellow from BIDS, was interested in promoting diversity and inclusivity for BIDS programs across campus, while E&I’s research analyst Andrew Eppig advised us on Cal diversity data sources and analysis.

The class

Our first class begins!

With seven extraordinarily bright undergraduates from a wide range of backgrounds signed up for the class, we started with the basic building blocks of data analysis using Python. Later we taught introductions to data scraping and data visualization. Equally critical to the course were discussion sections on themes around diversity, including microaggressions, stereotype threat, and unconscious bias. The Geoff Marcy scandal in the middle of the semester provoked a heated discussion on gender in STEM fields.

Once we laid these foundations, we drew from the field of design thinking to have brainstorming sessions on project ideas to apply our data and diversity skills to issues on campus. We went through several rounds of brainstorming — and hundreds of post-it notes — to generate as many creative and diverse ideas as possible. We then narrowed our ideas down to match the issues students cared about to the tools and data they had access to within the time we had in class.

Student presentation

Toward the end of the semester, we let the students use class time to work on the projects they chose:
(1) the effect of “weeder” courses on diversity in STEM classes and
(2) where female applicants are leaking out of the faculty hiring pipeline.
What our students found was pretty shocking. In just a few short weeks, and without fancy statistics, they were able to highlight some major diversity-related problems on campus.

Student Project 1: Student diversity and “weeder” courses

The group analyzing weeder courses developed a list of criteria to identify the most notorious STEM weeder classes at Cal. They then used both quantitative analysis of enrollment data and qualitative interviews to paint a stark picture of how weeder courses flatten diversity. Female underrepresented minority students had the highest dropout rates, while male students who were not underrepresented minorities had the lowest, a trend that was consistent across all the weeder courses they analyzed.
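
For readers curious what this kind of breakdown looks like in practice, here is a minimal sketch in plain Python. It is not the students' actual code or data: the record layout (course, gender, underrepresented-minority flag, dropped flag), the course names, and every row in `TOY_RECORDS` are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical toy records, NOT the students' enrollment data.
# Each row: (course, gender, is_urm, dropped)
TOY_RECORDS = [
    ("COURSE_A", "F", True, True),
    ("COURSE_A", "F", True, False),
    ("COURSE_A", "M", False, False),
    ("COURSE_A", "M", False, False),
    ("COURSE_B", "F", False, False),
    ("COURSE_B", "F", False, True),
    ("COURSE_B", "M", True, True),
    ("COURSE_B", "M", True, False),
    ("COURSE_B", "M", True, False),
]


def dropout_rates(records):
    """Return {(gender, is_urm): fraction of enrolled students who dropped}."""
    enrolled = defaultdict(int)
    dropped = defaultdict(int)
    for _course, gender, is_urm, did_drop in records:
        key = (gender, is_urm)
        enrolled[key] += 1
        if did_drop:
            dropped[key] += 1
    # Every key in `enrolled` has at least one student, so no division by zero.
    return {key: dropped[key] / enrolled[key] for key in enrolled}


if __name__ == "__main__":
    for (gender, is_urm), rate in sorted(dropout_rates(TOY_RECORDS).items()):
        label = "URM" if is_urm else "non-URM"
        print(f"{gender} {label}: {rate:.0%} dropped")
```

To check whether a pattern holds course by course, as the students did, the same function could be called on the subset of records for each course.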

They hypothesized that students from underrepresented backgrounds, in addition to being a visible minority in classes, also faced large gaps in prior experience with the subject and in experience taking advantage of resources on campus (office hours, tutoring, etc.). To counter this problem, our students proposed mini introductory STEM courses, a well-developed resource on finding fellow minorities in STEM courses, and increased diversity amongst GSIs.

"Weeder" course project graphs

Student Project 2: Gender and faculty

The group doing the gender and faculty analysis had similarly salient results. In developing their research question, they quickly found data to show that bias in hiring practices is NOT what is driving the gender disparity in faculty positions, as they had initially hypothesized. Rather, they saw a major difference in the number of female Ph.D. candidates graduating each year and the number of female applicants for faculty positions.

Women were self-selecting out of the academic pipeline. Our students designed a survey to find out why and sent it out to graduate departments across campus. They pulled in an impressive 478 responses, which they then analyzed qualitatively and quantitatively. They disaggregated responses by self-reported gender.

Gender and faculty graph

They found that, even when men and women expressed the same amount of dissatisfaction with a certain aspect of academia, far fewer women within that group intended to pursue a career in academia. Women reported higher dissatisfaction in the following areas: failure to connect with peers, lack of confidence, fear of consequences of having children, lack of representation in graduate school, stress, and unfair salary compensation.
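
The comparison described above, looking only at respondents who reported dissatisfaction with a given aspect and then asking what share of each gender still intends to pursue academia, can be sketched as follows. This is an illustrative sketch, not the students' analysis: the field names (`gender`, `dissatisfied_with`, `intends_academia`), the aspect labels, and all of the responses are made up.

```python
# Hypothetical toy survey responses, NOT the 478 real responses.
TOY_RESPONSES = [
    {"gender": "woman", "dissatisfied_with": {"stress"}, "intends_academia": False},
    {"gender": "woman", "dissatisfied_with": {"stress", "salary"}, "intends_academia": True},
    {"gender": "man", "dissatisfied_with": {"stress"}, "intends_academia": True},
    {"gender": "man", "dissatisfied_with": set(), "intends_academia": True},
    {"gender": "nonbinary", "dissatisfied_with": {"salary"}, "intends_academia": False},
]


def academia_intent_by_gender(responses, aspect):
    """Among respondents dissatisfied with `aspect`, return
    {gender: share who still intend an academic career}."""
    counts = {}  # gender -> [number dissatisfied, number intending academia]
    for response in responses:
        if aspect not in response["dissatisfied_with"]:
            continue  # keep only respondents dissatisfied with this aspect
        tally = counts.setdefault(response["gender"], [0, 0])
        tally[0] += 1
        tally[1] += int(response["intends_academia"])
    return {gender: intending / total for gender, (total, intending) in counts.items()}


if __name__ == "__main__":
    for gender, share in academia_intent_by_gender(TOY_RESPONSES, "stress").items():
        print(f"{gender}: {share:.0%} of those dissatisfied with stress intend academia")
```

Genders with no dissatisfied respondents for the chosen aspect simply do not appear in the result, which avoids reporting a share based on zero people.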

Gender and faculty diagrams

The students’ research was well received.

Synberc, BIDS, the Division of Equity and Inclusion, and others who attended their final presentations were quite pleased with the work by our students. The faculty and gender study group went on to present their results at the Expanding Potential conference organized by Synberc in a presentation and a poster. Pierce and I also presented general lessons from the class at the conference.

The weeder courses study group is continuing to work with BIDS. At the Expanding Potential conference, Pierce also ran a design thinking workshop to get conference attendees to apply our methods to analyze diversity issues in their own institutions. The workshop was a great demonstration of how our class could be scaled to different settings.

Pierce at the design thinking workshop

We all learned a lot from running the course. We tackled a massive range of topics and methods, and while we may not have knocked every single one out of the park, the stellar reception that the students’ projects received suggests the course succeeded overall.

Still, if we do it again, we want to do it better.

It turns out that our original goals and our sub-goals flipped over the course of the semester. We originally focused on teaching data science; however, the diversity data projects ended up becoming the highlight because of how much our students were interested in them. From the feedback they gave us, students generally felt that readings and discussions were covered well, but the programming sessions could have been better planned. In the future, we intend to teach the data section better without sacrificing any of the perspective gained from exploring diversity issues in our discussions.

Do you want to be a part of the next round of Data and Diversity?

Or, do you want the course materials and syllabus for your own organization? If so, let us know! We’re in the midst of figuring out next steps and new ideas for the course, such as pairing students with private sector clients to analyze diversity within companies. We welcome your time, energy, and ideas!

Please email Cecilia.h.springer@berkeley.edu if you want to get involved.

Special thanks to all those mentioned in the article who helped with the course, as well as Shaila Kotadia and Kevin Costa at Synberc, and Duncan Callaway, our faculty sponsor at ERG.


  1. A really interesting and important project on a topic that is crucial to making higher education more inclusive and more relevant. Go ERGies! Keep up the good work!
