A Coursera MOOC specialization co-created with Christopher Brooks, University of Michigan School of Information
The first two courses aim to explain what Bayesian thinking is and to work through fundamental Bayesian estimation techniques in Python. The last course uses coding lectures and step-guided projects to simulate AI-assisted toolkits including, but not limited to, an online advertisement system (click-through rate, conversion), a MOOClet system (personalized message-assignment policy), Bayesian knowledge tracing, and a recommender system.
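To give a flavor of the estimation techniques the courses cover, here is a minimal sketch (illustrative only, not actual course material) of Bayesian click-through-rate estimation with a Beta-Bernoulli conjugate model; all numbers are made up:

```python
import numpy as np

# Minimal sketch: Bayesian click-through-rate (CTR) estimation with a
# Beta-Bernoulli conjugate model. All values here are illustrative.
rng = np.random.default_rng(42)

true_ctr = 0.04                        # hypothetical "true" click probability
clicks = rng.random(1000) < true_ctr   # simulated click/no-click outcomes

alpha, beta = 1.0, 1.0                 # uniform Beta(1, 1) prior
alpha += clicks.sum()                  # one count per click
beta += (~clicks).sum()                # one count per non-click

posterior_mean = alpha / (alpha + beta)
print(f"Posterior mean CTR: {posterior_mean:.4f}")
```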
This series of courses aims to engage novice programmers who are new to Bayesian statistics and want to improve their programming skills while learning statistics and nurturing analytic mindsets. We also train conversational Bayesian programmers, who learn the core Bayesian framework and the probabilistic mindset behind it through industrial applications.
So far, we have designed 50+ step-guided coding lectures and projects that encourage students to interact seamlessly with Bayesian models and develop an experimental mindset during their studies.
Learnersourcing is a highly effective way to encourage higher-order thinking, foster spaced repetition, strengthen students' critical thinking and retrieval practice, and invite timely feedback, thereby enhancing students' motivation to engage in acquiring course-related skill sets.
In response to the successful implementation of learnersourcing in MOOCs, we are building a learnersourcing academic community among all students who attend the specialization. We incentivize students to create research questions, generate interesting case studies, reinterpret concepts, and more. Meanwhile, each student receives other students' work and provides peer feedback and reflection.
Character-by-character Comments: Traditionally, live coding videos are presented via screen sharing. As instructors write code in an IDE (e.g., a Jupyter notebook), they synchronize real-time narration to explain how the code works. However, most instructors find it difficult to type and annotate code while simultaneously explaining the logic under the hood. This motivated me to use character-by-character comments when recording lecture videos, a technique developed by the Educational Technology Collective lab that reproduces every character of the Python code and comments at the timestamp when the instructor's corresponding auditory explanation is played. By applying character-by-character comments to live coding videos, I invoked the dual-coding effect and reduced the split-attention effect, lowering the cognitive load students need to navigate the material and leading to higher learning satisfaction and learning gains.
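To make the idea concrete, here is a minimal sketch — not the lab's actual tooling — of replaying code character-by-character against hypothetical timestamps so the text appears in sync with narration:

```python
import time

# Illustrative sketch only: replay code character-by-character, pacing each
# character to a timestamp so it appears in sync with spoken narration.
# (Timestamps are hypothetical; the real tooling differs.)
typed = [
    (0.0, "p"), (0.1, "r"), (0.2, "i"), (0.3, "n"), (0.4, "t"),
    (0.6, "("), (0.8, "'"), (1.0, "h"), (1.1, "i"), (1.3, "'"), (1.5, ")"),
]

start = time.monotonic()
for timestamp, char in typed:
    # Sleep until this character's timestamp relative to the start.
    time.sleep(max(0.0, timestamp - (time.monotonic() - start)))
    print(char, end="", flush=True)
print()
```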
Main Study: I designed a quasi-experiment crossing three playback-speed options with three commonly used video design conditions, varying the comment design and whether the instructor's face appears on screen. The main objective of the experiment is to examine how the educational quality of a video is influenced by playback speed and video design. In particular, the project asks whether a small increase in lecture speed can save time and boost engagement without sacrificing learning outcomes, and whether the presence of the instructor makes classes more engaging for students.
Autograded assignments remain a continued source of challenge for students, and we are studying how best to scale feedback and improve course content. We are building a "porous" autograder, where autograding is not just an evaluation of student submissions with pre-filled feedback, but also lets student errors and attempts be seen rapidly (and in aggregated clusters) by course support staff (instructors, course liaisons, etc.). On the evaluative side, we invite collaboration on AI-assisted models that assess code quality and capture the misconceptions underlying students' code errors.
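As a rough illustration of the "porous" idea (a hypothetical sketch, not the production autograder), errors from submissions can be collected and aggregated into clusters for course staff:

```python
from collections import Counter

# Hypothetical sketch: instead of only returning pass/fail feedback, record
# the error type raised by each submission so course staff can see
# aggregated clusters of failures.
def run_submission(submission, test_input):
    """Run one (toy) student-submitted function and record its error type."""
    try:
        submission(test_input)
        return None                   # no error: the test ran
    except Exception as exc:
        return type(exc).__name__     # cluster key: the exception class

# Toy stand-ins for student submissions.
submissions = [
    lambda x: x / 0,                  # ZeroDivisionError
    lambda x: x + "1",                # TypeError
    lambda x: [1, 2][x + 5],          # IndexError
    lambda x: x / 0,                  # ZeroDivisionError again
]

clusters = Counter(
    err for s in submissions if (err := run_submission(s, 1)) is not None
)
print(clusters)  # e.g. Counter({'ZeroDivisionError': 2, 'TypeError': 1, ...})
```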
Building on work in the Applied Data Science with Python MOOCs examining how learners in the US and learners from other countries may respond differently to culturally relevant learning materials, we explore how sustainable learning communities can be created at the intersection of culturally relevant content, psychological sense of ownership, and learning analytics.
In this era of fake news and polarization, data sources and algorithmic fairness are rising concerns that warrant dialogue among data science educators and practitioners. In many applications, an algorithm's expectations about a user can leave the user unhappy and feeling discriminated against. This research explores biases arising from the contexts of (a) societal narratives and (b) personal beliefs in authentic datasets, and proposes potential mitigation methods that address bias in socio-cultural settings.
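As one concrete starting point (a sketch on synthetic data, not this study's method), a simple fairness check compares an algorithm's positive-prediction rate across groups — the demographic parity difference:

```python
import numpy as np

# Illustrative sketch with synthetic data: compare positive-prediction rates
# across two groups (demographic parity difference).
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)                       # 0/1 membership
predictions = rng.random(1000) < np.where(group == 1, 0.6, 0.4)

rate_g0 = predictions[group == 0].mean()
rate_g1 = predictions[group == 1].mean()
print(f"Demographic parity difference: {abs(rate_g1 - rate_g0):.3f}")
```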
- University of Michigan, School of Public Health, Michigan Diabetes Research
Modeling: For the Michigan Model for Diabetes (MMD), I simulated the 30-year progression of six complications of type 2 diabetes by calibrating risk factors and disease-state transition probabilities.
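The flavor of the approach can be sketched as a Markov chain over yearly cycles; the transition probabilities below are made up for illustration, not the calibrated MMD values:

```python
import numpy as np

# Minimal sketch: simulate disease-state trajectories for one complication as
# a Markov chain over yearly cycles. Transition probabilities are fabricated.
states = ["no complication", "complication", "death"]
P = np.array([            # hypothetical annual transition probabilities
    [0.95, 0.04, 0.01],   # from "no complication"
    [0.00, 0.93, 0.07],   # from "complication"
    [0.00, 0.00, 1.00],   # "death" is absorbing
])

rng = np.random.default_rng(1)
n_patients, n_years = 10_000, 30
state = np.zeros(n_patients, dtype=int)     # everyone starts healthy
trajectory = np.empty((n_years + 1, n_patients), dtype=int)
trajectory[0] = state
for year in range(1, n_years + 1):
    # Draw each patient's next state from their current state's row of P.
    state = np.array([rng.choice(3, p=P[s]) for s in state])
    trajectory[year] = state

survival = (trajectory != 2).mean(axis=1)   # fraction alive each year
print(survival[[0, 10, 20, 30]])
```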
User Interface: I visualized MMD outputs such as (but not limited to) the 30-year survival curves for the six type 2 diabetes complications, the effects of medical treatments, and changes in quality of life.
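A minimal plotting sketch of the kind of survival curve the interface displays (placeholder values, not MMD output):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sketch: draw a 30-year survival curve from placeholder
# survival fractions (not MMD output).
years = np.arange(31)
survival = np.exp(-0.03 * years)   # placeholder exponential decay
plt.plot(years, survival)
plt.xlabel("Years since baseline")
plt.ylabel("Fraction of cohort surviving")
plt.title("Simulated 30-year survival curve (illustrative)")
plt.show()
```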
User Manual: I documented the simulation process, the developmental phases of type 2 diabetes illnesses, the derivation of the disease-state transition probabilities, and the effects of treatment interventions in MMD through a 50-page user manual that supports clinical evaluation at the University of Michigan Health System and disease prevention work in Michigan public health departments.