Crime Reproduction Study
# Summary and Reflection on Crime Study Reproduction
This study replicates Yunliang Meng’s 2021 analysis of crime rates and contextual characteristics in Connecticut using spatial statistics and geographically weighted regression (GWR). Based on the original study, we focused on county subdivisions as the unit of analysis and drew on publicly available data from the American Community Survey (ACS) and the FBI’s Uniform Crime Reporting (UCR) program for the years 2013–2017. While we closely followed the original study’s methods, our replication revealed both methodological ambiguities and interpretive challenges, prompting us to make several analytical decisions independently.
Through this replication attempt, we encountered several deviations from the original study. Some were forced by missing methodological details, while others emerged through attempts to validate and interpret the results. On the methodological side, for example, our Shannon Equitability Index results differed substantially from those reported in Meng’s study. After rigorous testing and external validation using independent diversity maps, we concluded the original study likely miscalculated the metric. This discovery not only affected our regression results (where the Shannon index was not a significant predictor of violent crime) but also highlighted the importance of transparency in computational workflows.
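To make the metric concrete, here is a minimal sketch of the Shannon Equitability Index under one common definition: Shannon entropy divided by its maximum, the natural log of the number of groups. The function name and the group counts are hypothetical and for illustration only. Neither this write-up nor, as far as we can tell, the original study spells out the exact group definitions, so this should not be read as the calculation used in either analysis.

```python
import math

def shannon_equitability(counts):
    """Shannon entropy of the group shares divided by ln(number of groups).

    Assumption: the number of groups is the length of `counts`, including
    groups with zero members. Other conventions count only non-empty groups.
    """
    k = len(counts)
    total = sum(counts)
    if k < 2 or total <= 0:
        return 0.0
    # H = -sum(p * ln p); empty groups contribute nothing (0 * ln 0 taken as 0)
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    # ln(k) is the entropy of a perfectly even split across k groups
    return h / math.log(k)

# Made-up counts for two hypothetical subdivisions with four groups each
even_town = [250, 250, 250, 250]
uneven_town = [970, 10, 10, 10]

print(round(shannon_equitability(even_town), 3))
print(round(shannon_equitability(uneven_town), 3))
```

The evenly split subdivision should score at or very near 1, and the dominated one about 0.12. A result outside the 0 to 1 range, or a map that contradicts independent diversity maps, is the kind of signal that prompted our closer look at the original values.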
OLS regression, which does not account for spatial relationships, produced results that largely mirrored the original findings, affirming relationships between property crime and urban density, renter-occupancy, and housing type, as well as between violent crime and poverty, education, and population density. However, these relationships varied regionally, which the GWR helped to uncover. In our study, we treated GWR less as a predictive model and more as an exploratory tool for detecting spatial heterogeneity. Still, the model’s high sensitivity to local context makes it difficult to generalize or extend its findings.
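The idea behind using GWR as an exploratory tool can be illustrated with a stripped-down sketch: fit a separate kernel-weighted regression at every location and compare the local slopes. The version below assumes a single predictor, a Gaussian kernel, and a fixed bandwidth chosen by hand. The function name, the data, and the bandwidth are all hypothetical. A real GWR workflow handles multiple predictors and selects the bandwidth from the data, so this is not the model we ran.

```python
import math

def local_regressions(coords, x, y, bandwidth):
    """Return one (intercept, slope) pair per location.

    Each fit is a weighted least-squares line in which observations closer
    to the focal location receive larger Gaussian kernel weights.
    """
    fits = []
    for cx, cy in coords:
        # Gaussian kernel weight based on distance to the focal location
        w = [math.exp(-0.5 * (math.hypot(cx - px, cy - py) / bandwidth) ** 2)
             for px, py in coords]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        if sxx == 0:
            raise ValueError("predictor has no local variation")
        slope = sxy / sxx
        fits.append((ybar - slope * xbar, slope))
    return fits

# Synthetic example: ten locations on a west-to-east line. The true effect of
# the predictor is 1 in the western half and 3 in the eastern half.
coords = [(i, 0) for i in range(10)]
x = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
y = [(1 if i < 5 else 3) * xi for i, xi in enumerate(x)]

fits = local_regressions(coords, x, y, bandwidth=1.5)
for (px, _), (_, slope) in zip(coords, fits):
    print(px, round(slope, 2))
```

In this toy setup the local slope should sit near 1 at the western end and near 3 at the eastern end, while a single global OLS line would blur the two. The same mechanism explains the sensitivity noted above: each local estimate rests on a small, heavily weighted neighborhood, so changing the bandwidth can change the picture.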
Another deviation arose when performing the Local Moran’s I analysis, where we visualized clusters of high and low crime values. Here, we found it difficult to reproduce the exact classification scheme used in ArcGIS, demonstrating the limitations of proprietary or GUI-based analysis environments when replicating research. We were ultimately forced to rely on secondary documentation, peer discussion, and tools like ChatGPT to fill in methodological gaps, an experience that underscored the importance of open science, code sharing, and reproducibility in spatial analysis.
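For readers unfamiliar with how cluster labels arise, the sketch below computes Local Moran’s I with row-standardized weights and assigns each unit to a quadrant (High-High, Low-Low, High-Low, Low-High) from the signs of its own deviation and its neighbors’ average deviation. It deliberately omits the permutation-based significance test that cluster maps normally apply before labeling a unit, and it is not a reconstruction of the ArcGIS classification scheme, whose exact settings we could not recover. The function name, values, and neighbor lists are hypothetical.

```python
def local_morans_i(values, neighbors):
    """Return a list of (local_I, quadrant) tuples, one per unit.

    `neighbors[i]` lists the indices of unit i's neighbors. Weights are
    row-standardized, so the spatial lag is the neighbors' mean deviation.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # variance of the values (divisor n)
    if m2 == 0:
        raise ValueError("all values are identical")
    results = []
    for i in range(n):
        nbrs = neighbors[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        local_i = z[i] * lag / m2
        # First letter: the unit itself; second letter: its neighbors.
        # Deviations of exactly zero are treated as "L" in this sketch.
        quadrant = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((local_i, quadrant))
    return results

# Five hypothetical units along a line, each adjacent to the next
rates = [10, 12, 11, 2, 1]
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3]]

for local_i, quadrant in local_morans_i(rates, adjacency):
    print(round(local_i, 3), quadrant)
```

Here the first two units fall in the High-High quadrant, the last two in Low-Low, and the middle unit in High-Low because its neighbors average slightly below the mean. Which of these labels would survive as mapped clusters depends on the significance threshold, the number of permutations, and the neighbor definition, which are exactly the settings that are hard to recover from a GUI-based workflow.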
Personally, this project deepened my understanding of both the technical and epistemological limits of spatial crime analysis. It’s easy to fall into the trap of viewing crime data as a direct reflection of social reality, which the original authors did with some minor caveats. This replication study reaffirmed that crime statistics are shaped by social processes, including policing practices, reporting patterns, and racialized definitions of criminality. Our findings, especially those concerning high-poverty or high-density areas, must be interpreted with a critical awareness of these biases. Spatial statistics can help us visualize and interrogate patterns, but they cannot tell us why those patterns exist without deeper contextual and historical analysis.
More broadly, this project reminded me how much of data science and spatial analysis involves judgment calls, especially when replicating under-reported methods. Reproducing work in this field is not just about re-running code; it’s about understanding the logic of the analysis, validating data decisions, and being transparent about the assumptions and limits of what we can infer. This study was as much an ethical and academic exercise as it was a technical one, and I walk away from it with a deeper appreciation for the rigor and transparency required to do meaningful spatial research.