Networks, Community Detection, and Robustness: Statistical Inference on Student Enrollment Data
General Material Designation
[Thesis]
First Statement of Responsibility
Israel, Uriah
Subsequent Statement of Responsibility
McKay, Timothy A.
PUBLICATION, DISTRIBUTION, ETC.
Name of Publisher, Distributor, etc.
University of Michigan
Date of Publication, Distribution, etc.
2020
GENERAL NOTES
Text of Note
146 p.
DISSERTATION (THESIS) NOTE
Dissertation or thesis details and type of degree
Ph.D.
Body granting the degree
University of Michigan
Text preceding or following the note
2020
SUMMARY OR ABSTRACT
Text of Note
At the heart of higher education is the student experience, which depends on the courses students take, the people they interact with, their extracurricular activities, and much more. Developing methods to measure the student experience will help university leaders such as presidents, provosts, deans, department chairs, and faculty design better curricula and allocate resources; it will give students more context about the courses they select; and it will help employers better understand the graduates they hire. In this study, we demonstrate how high-resolution student enrollment data can be used to better quantify the student experience. The methods described in this thesis are not unique to the institution studied and are scalable: they can be applied at other institutions where student enrollment is recorded. This thesis introduces a dataset provided by the University of Michigan Information and Technology Services staff. These data contain enrollment records dating back to 2000. We demonstrate how these data are implicitly networked. The connections between students and courses are explored and analyzed using methods from network science. Student enrollment is represented as a bipartite network, and common network measures are computed on these networks to gain insight into the structure of the university based on how students enroll in courses. Questions about how to characterize connections lie at the core of social network analysis. How are edges defined? Are they directed? Do they receive different weights, and if so, how? In this thesis, we introduce three measures for defining a connection between students: unique connections, weighted connections, and intensity connections. The first, unique, answers the question: who did you take courses with? The second, weighted, answers: how many courses did you take with an individual?
The third measure, intensity, combines the question of how many with the question: what was the enrollment size of the courses you took with an individual? The relationship between these measures varies depending on the subset of students examined. For example, there is zero correlation between the unique connections and intensity connections that Mechanical Engineering BSE students make; however, there is a relatively high correlation between these two measures for History BA students. Using network analysis, we can draw comparisons to traditional categories and measures. For example, how effective, or informative, are the typical categorizations (or labels) used to describe students? The first typical categorization splits students into bachelor of science and bachelor of arts (BS/BA); the second splits students into humanities, social sciences, biological sciences, and natural sciences; and the final categorization splits students by major. We introduce concepts such as label coherence, strong and weak recoverability, and robustness. Through this analysis, we find that the BA/BS split is not a good representation of the courses students take. We also show that majors perform best of the legacy labels; however, there is a significant difference in performance between majors. Finally, we explore the link between how connections are defined in a network and the recoverability of a labeling in that network. We see little correlation between strong recoverability and unique connections, and high correlation between strong recoverability and intensity connections.