Welcome to the online version of Public Policy Analytics: Code & Context for Data Science in Government, a book set to be published by CRC Press as part of its Data Science Series. The data for this book can be found here.

The goal of this book is to make data science accessible to social scientists and, in particular, to city planners. I hope to convince readers that someone with strong domain expertise and intermediate data skills can have a greater impact in government than the sharpest computer scientist who has never studied economics, sociology, public health, political science, criminology, etc.

Public Policy Analytics was written to pass along the knowledge I have gained from so many gifted educators over the last 20 years. They are too many to name individually, but their impression on me has been so lasting and so monumental that, somewhere along the line, I decided to become an educator myself. This book is a reflection of all that these individuals have given me.

I am incredibly grateful to my colleague Sydney Goldstein, without whom this book would not have been possible. Sydney was instrumental in helping me edit and compile the text. Additionally, she and I co-authored an initial version of Chapter 7 as a white paper. Dr. Tony Smith, a most cherished mentor and friend, edited nearly every machine learning chapter in this book. Dr. Maria Cuellar (Ch. 5), Michael Fichman (Intro), Matt Harris (review of functions), Dr. George Kikuchi (Ch. 5), and Dr. Jordan Purdy (Chs. 6 & 7) each generously provided their time and expertise in review. I thank them wholeheartedly. All errors are mine alone. Finally, this book is dedicated to my wife, Diana, and my sons, Emil and Malcolm, who always keep me focused on love and positivity.

I hope both non-technical policymakers and budding public-sector data scientists find this book useful, and I thank you for taking a look.


Spring, 2021

West Philadelphia, PA.

Table of Contents

| Chapter | Description | Data |
|---|---|---|
| Chapter 1: Indicators for Transit Oriented Development | Following the Introduction, Chapter 1 introduces indicators as an important tool for simplifying and communicating complex processes to non-technical decision makers. Introducing the tidyverse, tidycensus, and sf packages, this chapter analyzes whether Philadelphia renters are willing to pay a premium for transit amenities. | link |
| Chapter 2: Expanding the Urban Growth Boundary | Chapter 2 explores the discontinuous nature of boundaries to understand how an Urban Growth Area in Lancaster County, PA affects suburban sprawl. | link |
| Chapters 3 & 4: Intro to Geospatial Machine Learning | Chapters 3 and 4 provide a first look at geospatial predictive modeling, forecasting home prices in Boston, MA. Chapter 3 introduces linear regression, goodness-of-fit metrics, and cross-validation, with the goal of assessing model accuracy and generalizability. Chapter 4 builds on the initial analysis to account for the 'spatial process' or pattern of home prices. | link |
| Chapter 5: Geospatial Risk Modeling - Predictive Policing | Chapter 5 tackles the controversial topic of Predictive Policing, forecasting burglary risk in Chicago. The argument is made that converting Broken Windows theory into Broken Windows policing can bake bias directly into a predictive model and lead to a discriminatory resource allocation tool. The concept of generalizability remains key. | link |
| Chapter 6: People-Based ML Models | Chapter 6 introduces the use of machine learning in estimating risk/opportunity for individuals. The resulting intelligence is then used to develop a cost/benefit analysis for Bounce to Work!, a pogo-transit start-up. The goal is to predict the probability that a client will 'churn', that is, not re-up their membership. This is valuable for public-sector data scientists working with individuals and families. | link |
| Chapter 7: People-Based ML Models: Algorithmic Fairness | Chapter 7 evaluates people-based algorithms for 'disparate impact': the idea that even if an algorithm is not designed to discriminate on its face, it may still have a discriminatory effect. This chapter returns to a criminal justice use case, estimating the social costs and benefits. | link |
| Chapter 8: Predicting Rideshare Demand | Chapter 8 builds a space/time predictive model of rideshare demand in Chicago. New R functionality is introduced along with functions unique to time series data. | link |