1. Introduction

A great deal of child welfare research is focused on the causal relationship between maltreatment and associated exogenous factors or neighborhood effects (Daley et al., 2016). Conclusions from causal studies are often used to drive major programmatic or budgetary decisions.

In contrast, predictive modeling is not a tool for causal inference but for operational decision-making. The goal is to help stakeholders more effectively allocate their limited child welfare resources. While a causal model is judged on its external validity, a predictive model is judged on whether it leads to more effective decision-making relative to the current resource allocation process.

To date, researchers have built both people-based and place-based predictive models of child maltreatment. Jurisdictions like Allegheny County, Pennsylvania use people-based algorithms to estimate the risk of abuse for a child who is the subject of a child welfare hotline call. This risk score is then used alongside other metrics to decide whether a social worker should be dispatched to the child's home (Hurley, 2018). People-level predictions depend on person- and household-level data, which often requires a jurisdiction to overcome the significant legal, bureaucratic, and financial costs of integrating highly private, cross-agency administrative data. The benefit of people-based models is their ability to guide interventions to specific individuals or households, potentially preempting a maltreatment event.

Place-based predictive models also predict risk, but do so for small geographic areas. Although place-based models make predictions at a geographic scale greater than the household, the only private data required is the geocoded locations of maltreatment events. Most of the variables used to predict risk are environmental factors such as crime, blight, nuisance land uses, etc. - public data often available for bulk download on city or state Open Data websites. Not only does this lessen the concern over the mishandling of private data, it also means that building, testing, and deploying place-based predictive models is more cost effective.

This document provides the technical details behind the Urban Spatial/Predict Align Prevent geospatial risk prediction model. Section 2 reviews the literature that situates the child maltreatment spatial risk prediction framework presented here, and Section 3 introduces the codebase hosted on GitHub. These two sections provide the reader with additional context on our methodology.

Sections 4 through 8 are designed to be read alongside the county-level child maltreatment reports Urban Spatial has developed. Section 4 presents the Extract, Transform, and Load (ETL) tools created for data ingestion, cleaning, and standardization. Section 5 discusses several Exploratory Data Analysis routines created to describe key relationships in the data, many of which can be visualized and explained to non-technical policy makers. Sections 6 and 7 explain the comprehensive, machine learning-specific tools generated for prediction and prediction assessment, respectively. Finally, Section 8 presents the Align workflow, which converts the risk predictions into a strategic planning framework.

2. Methodology Literature

The child maltreatment spatial risk prediction framework presented here was developed in the spirit of criminological platforms like RTMDx and HunchLab, with some critical differences. First, it is free and open source, released with the hope that other jurisdictions will replicate and improve our work. Second, it is more than just a predictive algorithm: the exploratory data analysis functionality, algorithmic fairness tools, and suite of comprehensive planning tools ('The Align Phase') should make it easier for analysts to communicate data-driven insights to non-technical policy makers.

Spatial prediction has been a tool in the spatial analysis toolbox for decades. Predicting unknown quantities across space from observed sample data is typically known as ‘interpolation.’ In their exhaustive review, Li and Heap (2008) situate this brand of spatial prediction in the realm of ‘spatial interpolation’ suggesting, ‘nearly all spatial interpolation methods can be represented as weighted averages of sampled data.’

Methodological literature at the intersection of interpolative predictive modeling and geography includes simple interpolation (Silverman, 1986), geostatistical methods such as Kriging (Cressie, 1993), Gaussian Process Regression (Dearmon and Smith, 2016), Geographically Weighted Regression (Brunsdon et al., 1998), Generalized Linear Models (Haining, Law & Griffith, 2009), and non-linear models like Random Forests (Oliveira et al., 2012).

These models have been applied across a variety of disciplines including forestry (Oliveira et al., 2012), soil science (Leib, Glaser & Huwe, 2012), biology (Knudby, Brenning & LeDrew, 2010), archaeology (Harris, Kingsley & Sewell, 2015), criminology (Caplan, Kennedy & Miller, 2011), and public health (Steif, Butz & Streetman, 2018). There is also a compelling thread of literature on the validation of spatial prediction methods, including Congalton (1991), Vicente-Serrano et al. (2003), Li and Heap (2008), and Brenning (2012). There are also decades of literature on machine learning techniques, which differ from geostatistics but are mechanically similar, including Kuhn and Johnson (2013) and Hastie, Tibshirani, and Friedman (2008).

The first application of these techniques to predict child maltreatment employed the Risk Terrain Model (RTM; Caplan, Kennedy & Miller, 2011). The original RTM used a weighted raster overlay methodology rooted in 'Map Algebra,' methods first presented by Tomlin (1990). While this is a useful descriptive approach for generating spatial hotspots, it does not provide measures of statistical confidence or goodness of fit.

The RTM approach matured from geographical overlay into a more prescriptive machine learning workflow as part of a partnership with Jeremy Heffner and software development firm Azavea. The Risk Terrain Modeling Diagnostics Utility (RTMDx) used regularization to reduce potential collinearity when predicting crime counts (Caplan et al., 2013). Heffner developed a novel feature selection approach combining penalized regression and stepwise regression to estimate a final Poisson regression model that predicts crime counts. RTMDx is similar to other 'predictive policing' applications, including HunchLab and Predpol. RTMDx ranges from $100/year for a student license to $4,995/year for 'Agency Users.' HunchLab, the predictive policing application developed by Azavea, can cost $15,000/year for every 50,000 people in a jurisdiction, and Predpol, another predictive policing application, is reported to cost in the six figures as well.

3. Codebase and Licensing

The code created for this analysis is licensed under the GNU General Public License (GPL). The suite of tools we developed and document on GitHub is written in the R Statistical Language, using custom functions of our own creation as well as a number of open source packages shared freely by their authors.

The codebase written for this project is designed with standardization and transparency in mind. The public sector is still building the internal capacity to develop machine learning algorithms; typically, governments procure sophisticated, closed-source software at significant expense for both initial development and ongoing maintenance. We believe that by creating the foundations of an open source suite of spatial machine learning tools as part of this project, we can lay the groundwork for standards that other jurisdictions can adopt with minimal startup costs in the future.

The proliferation of standards will make it easier for the policy community to coalesce around a set of best practices for developing and deploying these models. Perhaps most critical is the ability to interpret the accuracy of these models and identify potential sources of bias. The reader will find all the source code used to develop this analysis on the associated GitHub repository.

4. Feature Engineering and ETL Procedures

Feature engineering is the creation of standardized observations from which predictive relationships can be mined. The process of feature engineering includes the steps of data ingestion, aggregation, and transformation. This pipeline allows for a single process where the locations and attributes of each dataset are reshaped consistently to meet the modeling goals. As Table 1 shows, this study employs several datasets, converting each into a series of 'predictive features.' Jurisdictions can use Table 1 as a guide; however, other variables may be available.

Table 1. Datasets and their descriptions

  Crime: Locations of reported violent and nonviolent incidents that represent risk in the city
  Points of Interest: Places that characterize environmental factors, categorized as either risk (e.g. pawnbrokers, payday loan locations, ABC stores) or protective (e.g. libraries, community centers, grocery stores) features
  Businesses: Businesses that characterize either nuisance or protective land uses
  Code Violations: Violations that indicate the safety and overall quality of structures
  CPS Accepted Events: Locations of child maltreatment incidents
  Administrative Boundaries: Shapefiles of key boundaries, including the study area, neighborhoods, political wards, etc.

4.1 Data ingestion

Routines have been developed that load data from various files and formats into the R programming environment for standardization and reshaping. This step begins with moving all data files to a common folder and then using R to list the files and check each for the presence of ID and spatial coordinate fields, the number of rows, and the number of columns. A custom R function handles these steps automatically regardless of file type or file names. Once checked and entered into this format, summary statistics are generated for each dataset to better understand the range, magnitude, and sparsity of its characteristics.
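
A minimal sketch of what such a function might look like, assuming comma-separated inputs in a common folder; the folder path and the id, lon, and lat field names are hypothetical:

  library(sf)

  # Sketch of the ingestion checks described above; field names are assumed.
  ingest_file <- function(path) {
    dat <- read.csv(path, stringsAsFactors = FALSE)
    has_coords <- all(c("lon", "lat") %in% names(dat))
    message(sprintf("%s: %s rows, %s cols, id: %s, coords: %s",
                    basename(path), nrow(dat), ncol(dat),
                    "id" %in% names(dat), has_coords))
    if (has_coords) dat <- st_as_sf(dat, coords = c("lon", "lat"), crs = 4326)
    dat
  }

  raw_files <- list.files("data/raw", pattern = "\\.csv$", full.names = TRUE)
  datasets  <- lapply(raw_files, ingest_file)
  lapply(datasets, summary)   # range, magnitude, and sparsity of each dataset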

Special measures are taken to clean the dataset containing the locations of maltreatment events. Because the count of maltreatment events in a given unit area is the dependent variable for this modeling process, it is important that the data cleaning process align with the modeling objective: to identify trends of elevated maltreatment for more efficient deployment of resources.

The additional cleaning steps include the removal of events that are spatial and temporal duplicates. Maltreatment events can happen repeatedly at the same location, but records of incidents at the same time and place likely indicate data entry duplicates and are removed. Including duplicates of the same event introduces an undesirable bias by overemphasizing the features of that geographic area.
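
A minimal sketch of this rule, assuming a flat table of events with hypothetical lon, lat, and incident_date fields:

  # Drop records that share both a location and a timestamp, keeping the
  # first occurrence; repeat events at the same place on different dates remain.
  mt_events <- mt_events[!duplicated(mt_events[, c("lon", "lat", "incident_date")]), ]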

4.2 Feature engineering

Data aggregation involves the counting of mapped maltreatment events within each cell of the 'fishnet' - the lattice grid overlaid on the study area. The lattice grid provides a standardized unit of analysis that does not conform to a predefined political or social boundary (e.g. census tracts or neighborhoods) and from which 'exposure' to geographic risk and protective factors can be measured. Further, with an arbitrary grid, the geographic dimensions of the analysis can be controlled.

The fishnet is made using the st_make_grid function from the sf package, specifying the study area and cell size. Multiple grid cell sizes are tested to determine which yields the most appropriate distribution of data to model. Maps and histograms that show the count of maltreatment events per cell are created at each cell dimension. If the grid cell size is too small, the result is many grid cells with zero counts and a lower maximum of incidents per cell. When the grid cell size is too large, there are only a few hundred zero count cells and a much higher maximum count of events per cell. The appropriate cell size should produce a distribution of data somewhere in between these two extremes. The appropriate cell size should also be at a scale suitable for targeting community interventions in high risk areas.
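
A sketch of the fishnet construction, assuming study_area is an sf polygon in a projected coordinate system; the 1,500-unit cell size is purely illustrative and would be one of several candidates tested:

  library(sf)

  # Build the lattice grid over the study area, give each cell an id,
  # and keep only cells that intersect the study area.
  fishnet <- st_make_grid(study_area, cellsize = 1500, square = TRUE)
  fishnet <- st_sf(net_id = seq_along(fishnet), geometry = fishnet)
  fishnet <- fishnet[study_area, ]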

Once the fishnet is created, the number of maltreatment event locations is counted for each grid cell. Further, the population-normalized rate of maltreatment events per 100 residents is calculated for each grid cell. This is achieved by overlaying the fishnet grid onto 2010 census tract total populations and attributing to each cell the portion of the population corresponding to the size of the grid cell's spatial intersection. The result is that each grid cell is ascribed a population consistent with the density of population within the census tracts that it intersects.
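
A sketch of these steps, assuming mt_events holds the cleaned maltreatment points and tracts holds 2010 tract polygons with a pop field (both names hypothetical):

  # Count events per cell, then area-weight tract population onto the grid.
  fishnet$count <- lengths(st_intersects(fishnet, mt_events))
  pop_net <- st_interpolate_aw(tracts["pop"], fishnet, extensive = TRUE)
  fishnet$pop  <- pop_net$pop   # assumes every cell intersects a tract
  fishnet$rate <- 100 * fishnet$count / fishnet$pop   # events per 100 residents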

The objective of this study is to predict the count of maltreatment events per fishnet grid cell; maltreatment rates, however, provide important context for model assessment.

Three different ‘feature engineering’ approaches are used to measure exposure from each grid cell to nearby risk and protective factors:

  1. The count or sum of exposure events per grid cell
  2. The Euclidean distance from the center of each grid cell to the nearest exposure event
  3. The average Euclidean distance from the center of each grid cell to the five nearest event neighbors.

Each feature engineering method measures exposure differently. Approach 2, visualized as the left image in Figure 1 below, measures the distance from each grid cell to its single nearest risk/protective factor; however, it may be biased if that one nearest neighbor is an outlier. Thus, as illustrated by the right image in Figure 1, the third approach measures the average distance from each grid cell to its n nearest risk/protective factors. In this example, n = 3.
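
The three measures might be computed as follows for a single factor; the crime_pts layer is hypothetical and distances are taken from grid cell centroids:

  # Distance matrix from cell centroids to the factor's event points.
  cents <- st_centroid(st_geometry(fishnet))
  d <- st_distance(cents, crime_pts)
  fishnet$crime_count <- lengths(st_intersects(fishnet, crime_pts))      # approach 1
  fishnet$crime_nn1   <- apply(d, 1, min)                                # approach 2
  fishnet$crime_nn5   <- apply(d, 1, function(x) mean(sort(x)[1:5]))     # approach 3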

How many nearest neighbors is the 'correct' number to use? Consider the following three notions: First, for a given risk/protective factor, the 'correct' number of nearest neighbors is the one that leads to the best prediction. An optimal (but computationally intensive) predictive algorithm would iteratively test each combination of risk/protective factors with each value of n, selecting the combination that yields the best model.

Second, assumptions about n can be made according to how rare a given risk/protective event occurs in space. For instance, a feature describing exposure to crime would likely have a higher n than a feature describing distance to the nearest school. This is because crime occurs more frequently.

Finally, if the scale of phenomena varies across the study area, there is likely no one correct nearest neighbor parameter. For example, imagine a city composed of a dense downtown and a less dense neighborhood along the periphery. Despite being in the same jurisdiction, the scale of the social relationships occurring in each place is fundamentally different because the scale of the built environment is fundamentally different.

Figure 1

The final step in the feature engineering phase is to normalize the contributing risk and protective features by putting all of the measurements on a relative scale. Data are normalized in two steps:

  1. Centering the range of measurements for each feature on zero by subtracting the mean
  2. Scaling the measurements of each feature by dividing by that feature's standard deviation

The end result is that each feature has a mean of zero and each value is a z-score. A primary benefit of putting each of the measurements on the same scale is to help the machine learning algorithms compute more efficiently. Additionally, it leads to scaled model coefficients that are relative to the average measurement of the feature.
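
A minimal sketch of this standardization using base R's scale function; net_df and its column names are assumptions:

  # Z-score each engineered feature: center on zero, divide by its sd.
  feature_cols <- setdiff(names(net_df), c("net_id", "count"))
  net_df[feature_cols] <- scale(net_df[feature_cols])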

The result of the feature engineering phase is a data table containing a row for each fishnet grid cell and columns for each of the normalized features (as aggregate counts, distance to nearest event, and average nearest neighbor distance) and the count of maltreatment events. This data table is created for both the risk factor and protective factor groups.

5. Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a principled exploration of the data that allows natural patterns, structures, and relationships in the data to come to the surface (Tukey, 1977). The tables and visuals created for this stage highlight outliers and anomalies, test assumptions, and uncover underlying structures.

A desired outcome of EDA is to present data in such a way that stakeholders ask new and interesting questions that were not previously realized. This approach yields clearer goal setting and an atmosphere of collaboration between policy makers, technical specialists, and analysts.

EDA is employed here to gain a deeper understanding of the intersection of the contributing features, maltreatment outcomes, and subject matter expertise. This process is all the more important given the complicating dimensions of space and time. The EDA methods developed here consist of:

  1. Calculating summary statistics for each of the features
  2. Plotting histograms and counts of categorical variables
  3. Mapping the distribution and density of each feature
  4. Visualizing the correlation between each variable
  5. Computing Global and Local Moran’s I statistics
  6. Simulating random Nearest Neighbor distance
  7. Estimating the distribution of the outcome variable

5.1 Calculating summary statistics for each of the features

Summary statistics (e.g. minimum, maximum, deciles, counts, etc.) provide quantitative descriptions of each feature and are computed using standard techniques in the R coding environment.

5.2 Visualizing the correlation between each variable

Correlations are an initial view into the relationships between features and outcomes. A Pearson correlation is calculated between each feature and the maltreatment count. The result of this process is a correlation matrix. The coefficients of this matrix describe both the degree of correlation between features and the relationship between the features and the outcome.

Features that are highly correlated with each other are said to be 'collinear,' and it is often not useful to include collinear features in the model. Correlation is used to choose the best feature from each set of three for a given measure of exposure. For instance, exposure to aggravated assault has three features - sum, nearest neighbor distance, and average nearest neighbor distance. Of those, the feature that has the greatest correlation with maltreatment is selected for the predictive model.
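
A sketch of this selection rule for one hypothetical exposure (aggravated assault), with column names assumed:

  # Keep the engineered version most correlated with maltreatment counts.
  candidates <- c("assault_count", "assault_nn1", "assault_nn5")
  cors <- sapply(candidates, function(v) cor(net_df[[v]], net_df$count))
  best_assault_feature <- candidates[which.max(abs(cors))]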

5.3 Computing Global and Local Moran’s I statistics

Moran’s I is a measure of spatial autocorrelation and used here to estimate the pattern of spatial association for maltreatment counts. The values of Moran’s I generally range between -1 and +1, but more extreme values are possible. A value of zero typically indicates spatial randomness, a value of -1 indicates negative spatial autocorrelation (i.e. a regular pattern), and a value of +1 indicates positive spatial autocorrelation (i.e. clustering).

For this study, Moran’s I is calculated on both the global scale and the local scale. Global Moran’s I results in a single value describing the universal autocorrelation. This study uses the moran.mc function of the spdep R package to derive the global I value through Monte Carlo simulation (Bivand and Piras, 2015).

Local Moran’s I describes the spatial autocorrelation within a local neighborhood of eight fishnet grid cells surrounding a given fishnet grid cell. This neighborhood configuration is called the ‘Queen case’ because it is the same eight directions as the Queen moves in chess. The local measure of Moran’s I also computes the p-value indicating the significance of the local autocorrelation (Anselin, 1995). Both the Moran’s I value and the p-value are plotted to visualize significant positive and negative local clusters of maltreatment event counts.
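
Both statistics might be computed as follows with spdep, building the Queen-case neighbors directly on the fishnet polygons:

  library(spdep)

  # Queen-case neighbors and row-standardized weights on the fishnet.
  nb <- poly2nb(fishnet, queen = TRUE)
  lw <- nb2listw(nb, style = "W", zero.policy = TRUE)

  # Global Moran's I via Monte Carlo simulation; local I and p per cell.
  global_i <- moran.mc(fishnet$count, lw, nsim = 999, zero.policy = TRUE)
  local_i  <- localmoran(fishnet$count, lw, zero.policy = TRUE)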

5.4 A test for measuring whether maltreatment is 'close' to a given risk/protective factor

A second simulation method is used to better understand whether risk and protective factors are more closely or distantly associated with maltreatment event locations (Smith, 2016). This is achieved by simulating a set of random points equal in quantity to the number of events for a specific feature and averaging the nearest neighbor distance to n maltreatment events. This is repeated 1,000 times, resulting in a distribution of randomly permuted nearest neighbor distances. Finally, this distribution is compared to the observed global average nearest neighbor distance of each contributing feature. If the observed distance of a given feature is closer than 95% of the randomly permuted distances (a p-value of 0.05), we can conclude that the feature is 'close' to maltreatment citywide. This is a powerful tool for understanding which aspects of risk and protective factors are in the same geographic space as maltreatment events and whether specific features (e.g. churches, fire houses, community centers, etc.) are better suited to address maltreatment needs.
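
A sketch of the permutation test, assuming factor_pts and mt_events are sf point layers and study_area is the jurisdiction polygon:

  # Observed mean nearest neighbor distance from the factor to maltreatment.
  obs_d  <- mean(apply(st_distance(factor_pts, mt_events), 1, min))

  # 1,000 random point sets of the same size, each yielding a permuted distance.
  perm_d <- replicate(1000, {
    rand <- st_sample(study_area, size = nrow(factor_pts))
    mean(apply(st_distance(rand, mt_events), 1, min))
  })

  p_val <- mean(perm_d <= obs_d)   # small p: the factor is 'close' to maltreatment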

5.5 Estimating the distribution of the outcome variable

The final step in EDA is to gain a better understanding of the statistical distribution of maltreatment event counts. Inherently, the number of maltreatment events per fishnet cell is a count data type, which is typically modeled by either the Poisson or Negative Binomial distributions.

The intuition for the Poisson distribution is that maltreatment is a relatively rare event and the probability of its occurrence is independent after accounting for contributing factors. Further, it is assumed that the variance in maltreatment event counts is approximately equal to the mean of the counts. This assumption limits the potential for drastic outliers in maltreatment event counts relative to the mean of all counts. By contrast, the Negative Binomial distribution can be conceptualized in a similar manner, except that the variance need not equal the mean.

The fitdist function in the fitdistrplus package is employed to simulate samples from the empirical maltreatment event count distribution and compare the goodness of fit to both the Poisson and Negative Binomial distributions (Delignette-Muller and Dutang, 2015). The results of this test are visualized and used to choose between the two distributions.
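
A minimal sketch of the comparison with fitdistrplus:

  library(fitdistrplus)

  # Fit both candidate count distributions and compare goodness of fit.
  fit_pois <- fitdist(fishnet$count, "pois")
  fit_nb   <- fitdist(fishnet$count, "nbinom")
  gofstat(list(fit_pois, fit_nb), fitnames = c("Poisson", "Negative Binomial"))
  plot(fit_nb)   # diagnostic plots for the preferred candidate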

6. Model Fitting and Stacking

Model ‘fitting’ describes the process by which a statistical algorithm learns about maltreatment risk by relating the interaction of risk/protective factors to maltreatment events across space. Once a model is fit and validated, the learned pattern is applied back to the contributing features in each fishnet cell to predict the count of maltreatment events across space. This prediction highlights areas where maltreatment risk may be present but not reported.

6.1 Feature selection

The first step in the model building process is to select the 15 most statistically important risk and protective features. We select across the different feature types (Euclidean distance, average nearest neighbor distance, and aggregate counts) based upon statistical correlation. These features make up the final feature sets, which then serve as inputs to our models.

6.2 Model fitting

Three different algorithms are fit, each modeling a different aspect of the spatial process, and then combined into a fourth 'meta-model.' The three individual models are a Poisson Generalized Linear Model (Poisson GLM), a Random Forest model, and a Spatial Durbin Model (SDM). The final prediction of maltreatment events is produced from the meta-model, which is created by applying the Random Forest algorithm to the predictions made by the sub-models. The use of three different model algorithms is an effort to understand different aspects of the highly complex system that contributes to the observation of a maltreatment event.

At each stage in this process, models are fit using a ‘leave one group out cross validation’ routine (LOGOCV). LOGOCV splits the data into spatially explicit groups (i.e. neighborhoods), fits the models to all but one of the groups, and predicts maltreatment event counts for the left out group. This process, explained in greater detail below, tests how well the models generalize across neighborhoods. Below we explain the three sub-models and their inclusion in the final meta-model.

6.2.1 Poisson GLM

The Poisson GLM model, fit with the base R glm function, is an adaptation of linear regression that accounts for the characteristics of count data. The adaptations include modeling the residuals as Poisson distributed and transforming the linear model with a natural log function. As a result, the predictions from a Poisson model are positive and represent mean expected counts conditional on the contributing risk or protective features for each fishnet grid cell. In the meta-model, Poisson GLM predictions represent a linear model of a Poisson distributed count process.
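
A sketch of this sub-model fit with base R's glm, assuming net_train and net_test are LOGOCV training and hold-out tables containing the count outcome and the selected features:

  # Poisson GLM on the selected features; predictions are expected counts.
  pois_fit  <- glm(count ~ ., family = poisson, data = net_train)
  pois_pred <- predict(pois_fit, newdata = net_test, type = "response")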

6.2.2 Random Forest

In this study, the Random Forest algorithm is fit using the ranger library (Wright and Ziegler, 2017). The Random Forest algorithm builds a series of decision tree models to learn the relationship between maltreatment and exposure variables. The stochastic approach to sampling from observed data ensures that each individual tree is different from the next. The Random Forest provides a ‘wisdom of many’ approach, contributing non-linear interactions between maltreatment and the corresponding features to the final meta-model.
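
A parallel sketch with ranger, under the same assumed data objects:

  library(ranger)

  # Random Forest regression on the same features.
  rf_fit  <- ranger(count ~ ., data = net_train, num.trees = 500)
  rf_pred <- predict(rf_fit, data = net_test)$predictions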

6.2.3 Spatial Durbin Model

To model spatial interrelationships - also referred to as 'spatial autocorrelation' - a Spatial Durbin Model (SDM) is fit using the errorsarlm function of the spdep package in R (Bivand and Piras, 2015). In the setting of this study, the interpretation of this model is that the rate of maltreatment events is affected by both the exogenous exposure factors and neighboring rates of maltreatment. Further, this model assumes that there may be latent features that impact the model errors but are not accounted for by the exposure features.

The key model input of spatial autocorrelation is a spatial weights matrix relating maltreatment in a given grid cell to its neighbors. Modeling the underlying spatial maltreatment process provides a powerful predictive story when input into the final meta-model. It is important to note that the SDM is not fit with the LOGOCV method due to the complications of subsetting a spatial weights matrix in a cross-validation setting.
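
One plausible call is sketched below; note that errorsarlm has since been split out of spdep into the spatialreg package, and the Durbin ('emixed') error specification is an assumption on our part:

  library(spatialreg)   # errorsarlm now lives here, split out of spdep

  # Spatial Durbin specification using the Queen-case weights from Section 5.3.
  sdm_fit  <- errorsarlm(count ~ ., data = net_df, listw = lw, etype = "emixed")
  sdm_pred <- fitted(sdm_fit)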

6.2.4 Meta-Model

The final maltreatment count predictions are generated from a meta-model which combines predictions from the three sub-models. The process to combine the three models is straightforward: the predicted counts from each sub-model are used as input features of a new model fit with the Random Forest algorithm. Often referred to as model 'stacking,' this technique seeks to average out the variance in the three separate models. To reduce the risk of over-fitting, the stacked meta-model is fit and predicted using the same LOGOCV routine as the sub-models.
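
A sketch of the stacking step, assuming pois_pred, rf_pred, and sdm_pred are cell-level predictions from the three sub-models (names hypothetical):

  library(ranger)

  # Sub-model predictions become the features of the meta-model.
  stack_df <- data.frame(count = net_df$count,
                         glm_pred = pois_pred, rf_pred = rf_pred,
                         sdm_pred = sdm_pred)
  meta_fit  <- ranger(count ~ glm_pred + rf_pred + sdm_pred, data = stack_df)
  meta_pred <- predict(meta_fit, data = stack_df)$predictions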

6.3 Leave One Group Out Cross Validation

LOGOCV is a technique for assuring that model predictions are generalizable across spatially explicit geographic groups (i.e. neighborhoods). The first step in LOGOCV is to assign each fishnet cell to the geography that encompasses it. From a policy perspective, LOGOCV evaluates whether the maltreatment experience as we’ve modelled it is relevant to varying geographic contexts. From a modeling perspective, this approach helps ensure our models are not overfit. Each geography takes a turn as the hold out, generating individual sub-models and separate estimates of goodness of fit for each explicit group.

Once each cell is assigned to a geography, the model fitting process can use this information to learn from the data of one group and then project onto another group. The benefits of this, sketched in code after the list below, are:

  1. The process can incorporate a stratification, such as neighborhoods, with endogenous local characteristics known to those making and implementing policy
  2. The models make estimations of maltreatment events for geographies that are not part of the model fitting process, thereby reducing the risk of over-fitting
  3. The ability to calculate an accuracy and error rate for each group
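
A minimal sketch of the routine with the Poisson sub-model, assuming a nbhd field holding each cell's neighborhood assignment:

  # Hold out one neighborhood at a time; predict its cells from the rest.
  logocv_pred <- rep(NA_real_, nrow(net_df))
  for (g in unique(net_df$nbhd)) {
    in_group <- net_df$nbhd == g
    fit <- glm(count ~ . - nbhd, family = poisson, data = net_df[!in_group, ])
    logocv_pred[in_group] <- predict(fit, newdata = net_df[in_group, ],
                                     type = "response")
  }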

7. Model Validation

Assessing the accuracy and spatial generalizability of model predictions is crucial when considering how to embed this model in the provision of child welfare services. A variety of approaches are used for model validation. Some of these metrics are statistical in nature, while others measure goodness of fit across space.

7.1 Goodness of fit metrics

Model error is defined simply as the difference between the observed count of maltreatment events and the predicted count for each grid cell. Complicating matters is that the LOGOCV routine generates GLM and Random Forest models for each individual geographic area, while one Spatial Durbin Model is estimated for the study area. This leads to numerous grid cell level predictions. We derive several statistics to summarize and aggregate these errors in order to judge models and compare across them. We describe each below.

7.1.1 Mean Absolute Error

The Mean Absolute Error, or MAE, measures the average absolute difference between the observed and predicted values. For example, if the predicted number of events is 5 and the observed number of recorded events is 7, then the MAE is 2. The 'absolute' part means the error counts the same whether it is positive or negative: the same prediction of 5 events against an observed value of 3 also has an MAE of 2. The MAE returns a single value no matter how many estimates are made. If the predictions for three different fishnet cells are 5, 8, and 10 with corresponding observed values of 3, 9, and 12, then the absolute errors are 2, 1, and 2, leading to an MAE of 1.67.

An example interpretation of MAE is that, 'on average, the model is off by plus or minus 1.67 events.' MAE is simple to interpret in a policy context; however, it comes with some drawbacks, namely that the direction of the error is unknown and that every error is assumed to have the same severity. The MAE assumes that an error between a predicted count of 5 and an observed count of 7 events should be considered the same as a prediction of 23 and an observed value of 25 events. This metric is used here due to its straightforward interpretation and common usage in the assessment of predictive models.
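
In code, the metric reduces to a single line; the vectors below reproduce the worked example:

  # MAE for predictions 5, 8, 10 against observed counts 3, 9, 12.
  mean(abs(c(3, 9, 12) - c(5, 8, 10)))   # 1.67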

7.1.2 Logarithmic Score

The second goodness of fit metric used in this study is the Logarithmic Score. This metric is not as straightforward as MAE, but it has qualities that make it well-suited to count-based predictions. The intuition of the Logarithmic Score is as follows: what is the likelihood of the observed count given the predicted count? More descriptively stated: if the model predicts 10 events and the observed count is 7 events, then what is the probability of observing those 7 events if the prediction of 10 is indeed the correct number? In this way, the Logarithmic Score measures the deviance between the predicted and observed counts. Specifically, this is measured by calculating the probability density of the observed value from a Poisson distribution centered on the predicted value. The goodness of fit measures report the negative log of the probability density so that the value should always be minimized. When presented in the results section, the Logarithmic Score is converted back to a probability and aggregated. The result is a score that is interpreted as the 'average likelihood that the observed counts are true given the predicted maltreatment counts.' A value closer to one means a higher relative likelihood, 0.5 equates to maximum uncertainty, and a value near zero signifies very little likelihood.
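
A sketch of the calculation, assuming observed counts in net_df$count and meta-model predictions in meta_pred:

  # Negative log Poisson density of each observed count given its prediction
  # (lower is better), then the back-transformed average probability.
  log_score <- -dpois(net_df$count, lambda = meta_pred, log = TRUE)
  avg_prob  <- mean(exp(-log_score))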

Figure 2 gives an example of how the log density and probability density measurements of the Logarithmic Score relate. In the plot, the two measurements are calculated for the same scenario in which the predicted value of maltreatment events is 50 (at the red vertical line) and the observed number of events is represented by each of the black dots from zero to 100. On the left hand side of each is the case where the recorded number of events is zero or close to zero. At this location the log density is very high relative to the baseline of zero and the probability density is near zero. In the middle of each chart where the estimated and expected values are close to the same, the log density is near zero and the probability is at its maximum. As the observed values move to the right away from the estimate, the log density again goes up and the probability goes down.

The important takeaway from Figure 2 is that when measured as a log density, the best estimate is always the one closest to zero which makes it very easy to compare models. However, the scale of the log density is not very intuitive. On the other hand, measuring on the probability side can lead to difficulties in comparing models if they are fit to different sets of data. However, it is on an intuitive scale of probability between zero and one. Given that all of the models here are fit to the same dataset, the probability density measurement is selected to represent the Logarithmic Score.

Figure 2

7.2 Accuracy and generalizability tradeoff

The purpose of the LOGOCV and the goodness of fit metrics is to assess model errors on average and across space. A model that perfectly predicts the observed event counts for each fishnet grid cell would be very accurate, but would not generalize well to other cells because exposure changes across the city. Conversely, a model that predicts the same count of maltreatment events for every cell would generalize well, but not be relevant to the conditions of any one cell. LOGOCV and associated metrics help establish a balance between model accuracy and model generalization. Given the purpose of this study, it is important to create a model that is accurate enough to give confidence, but general enough to be applicable in areas where few cases are documented.

7.3 Comparing meta-model predictions to kernel density

Kernel Density Estimation (KDE) is a common tool used to convert point data into hot spot maps. The ubiquity of KDE as a tool for spatial targeting makes it a useful baseline to which model predictions can be compared. KDE is a simple spatial interpolation technique that calculates ‘predicted risk’ by taking a weighted local density of maltreatment events. No risk/protective factors are used and no measures of statistical significance can be calculated.

To compare between the meta-model predictions and KDE, predictions from both are divided into five risk categories. Held out maltreatment events that were not used to fit the original model are overlaid and the percent of observed maltreatment events that fall into each predicted risk category is calculated.

The comparison is visualized by mapping the five risk categories for both the KDE and the meta-model with the held-out maltreatment events overlaid. A grouped bar plot is also created showing the percent of held-out maltreatment events in each risk category for the KDE and the meta-model. If the meta-model better targets high risk areas, the bar plot should show that the meta-model captures more events in the highest risk category.
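
A sketch of the comparison, assuming kde_pred and meta_pred are per-cell scores and holdout_pts are the held-out events:

  # Split each prediction surface into five rank-based risk categories,
  # then tally held-out events per category as a percent of all events.
  fishnet$kde_cat  <- cut(rank(kde_pred),  breaks = 5, labels = 1:5)
  fishnet$meta_cat <- cut(rank(meta_pred), breaks = 5, labels = 1:5)
  ev <- lengths(st_intersects(fishnet, holdout_pts))
  tapply(ev, fishnet$meta_cat, sum) / sum(ev) * 100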

8. Align Metrics

This work extends beyond the creation of an algorithm for predicting maltreatment risk, toward a workflow for generating actionable intelligence. The purpose of this phase is to harness the predictions generated above in the context of strategic planning. Specifically, the goal is to understand the extent to which the supply of child welfare services is properly aligned with the demand for those services. To define supply, a host of administrative datasets that either proxy or directly relate to service delivery are employed. To define demand, maltreatment risk predictions are used.

Aside from direct child welfare provisions like social services, other important service delivery types include community resources such as churches and recreation centers. The purpose of the Align phase, in part, is to identify resources that are optimally located relative to maltreatment risk. It is at these locations where stakeholders may wish to deploy education, outreach, and treatment programs. The reader should consider the analysis outlined below to be a framework that can be adopted and replicated by both public and non-profit stakeholders. All one needs is a digital map file (e.g. a shapefile) of risk predictions and the various supply-side oriented datasets. We developed analytics that:

  1. Look at the number of people living in areas of high maltreatment risk
  2. Explore the relationship between poverty rate and predicted maltreatment risk
  3. Examine the spatial relationship between current program services and predicted maltreatment risk
  4. Assign risk scores to protective land uses (churches, rec centers, etc.) to get a better sense of existing resources in the community
  5. Perform a gap analysis to understand where and to what extent the supply of child welfare services co-locate with the demand for those services

8.1 Risk population totals

In Section 4.2, the weighted population per grid cell was calculated. This population estimate is used to determine the number of people living in areas of high maltreatment risk. The population per grid cell is aggregated for each risk category and the percent is calculated by dividing the sum of the population per risk category by the total population for the study area. The percent of population by risk category is then plotted in a bar plot to illustrate how the population is distributed across risk categories.

8.2 Exploring the relationship between poverty rate and predicted maltreatment risk

The weighted poverty rate is calculated in a similar fashion to the weighted population described in Section 4.2. The fishnet grid is overlaid on 2010 census tract totals for the number of people whose incomes were below the poverty line in the last 12 months. The portion of that population corresponding to the size of each grid cell's spatial intersection is attributed to the cell. The result is that each grid cell is assigned a number of people in poverty consistent with the density of people in poverty within the census tracts that it intersects. The number of people in poverty in each cell is then divided by the weighted population in each cell to calculate the weighted poverty rate.

The rate is mapped for the study area and a scatter plot showing the relationship between predicted count of maltreatment events and weighted poverty rate is created.

8.3 The relationship between current program services and predicted maltreatment risk

There are a variety of intervention programs, such as home visits by social workers, and metrics related to child maltreatment, like removals of a child from the home, that can be evaluated to determine how current services relate to predicted maltreatment risk.

Geocoded locations of where these events occur are used to assess this relationship. Events that have incomplete geometries are filtered out, and the study area (dissolved into a single boundary with the st_union function in the sf package) is used to select the events that intersect it. The number of events in each grid cell is counted and aggregated to the risk category level. The percent of events per risk category is then calculated and plotted as a bar plot.

8.4 Assigning risk scores to protective land uses

Protective land uses are locations of existing resources that can aid the community in combating child maltreatment, such as churches and community centers. Stakeholders may wish to deploy additional education, treatment, or prevention resources in these places. Similar to the metric described above in Section 8.3, geocoded locations of these resources are used.

The locations with incomplete geometries are filtered out of the dataset and only those that fall within the study area boundary are selected. A quarter mile buffer around each protective land use is created using st_buffer from the sf package, and the mean predicted count of maltreatment events is calculated for each buffer.
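
A sketch of this step, assuming resources is an sf point layer and the fishnet carries meta-model predictions in a meta_pred field; the 1,320 ft distance assumes a feet-based projected CRS:

  # Quarter-mile buffers, then the mean predicted count within each buffer.
  buffers <- st_buffer(resources, dist = 1320)
  buffers$mean_pred <- sapply(st_intersects(buffers, fishnet),
                              function(ix) mean(fishnet$meta_pred[ix]))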

Mapping these buffers shows whether the protective land uses are located in areas predicted as having high maltreatment risk. If the average predicted count of maltreatment is low, it is likely that these resources are misaligned with the populations at risk.

8.5 Gap analysis

The gap analysis brings the above metrics together in one spatial analytic that compares relative maltreatment risk to the relative supply of protective resources by neighborhood.

To define relative risk, a density metric is created, defined as the count of highest-risk-category grid cells in a neighborhood divided by the total number of grid cells in that neighborhood.

To define the relative supply of protective resources, the density of protective resources from above is taken and divided by the total area of the neighborhood. Both metrics are scaled into percentiles to run from 1 to 100, where 100 equates to a relatively high density.

The gap is calculated using the following formula:

gap = RelativeProtectiveSupply - RelativeRisk 

For example, a neighborhood with a protective score of 100 (lots of protective resources) and a risk score of 10 (relatively little risk) would have a gap score of 90, suggesting more resources are present than needed.

Conversely, a neighborhood with a protective score of 10 (few protective resources) and a risk score of 90 (relatively high maltreatment risk) would have a gap score of -80, suggesting a tremendous misalignment of resources given the predicted need.
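
A sketch of the gap calculation, assuming a nbhds table with the two density metrics described above (field names hypothetical):

  # Percentile-scale risk and supply (1-100), then take the difference.
  nbhds$risk_pctile   <- dplyr::ntile(nbhds$high_risk_density, 100)
  nbhds$supply_pctile <- dplyr::ntile(nbhds$protective_density, 100)
  nbhds$gap <- nbhds$supply_pctile - nbhds$risk_pctile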

The gap is then mapped using a color scale of red to green. Light red and light green neighborhoods are those where there is a strong alignment between protective resources and risk. Dark green places are those where there are more protective resources than risk, whereas dark red areas are places where there is far more risk than protective resources.

References

  1. Anselin, Luc. 'Local Indicators of Spatial Association - LISA.' Geographical Analysis (1995): vol. 27, no. 2, p.93-115
  2. Bivand, Roger and Gianfranco Piras. 'Comparing Implementations of Estimation Methods for Spatial Econometrics.' Journal of Statistical Software (2015): vol. 63, no. 18, p.1-36
  3. Brenning, Alexander. ‘Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest.’ In 2012 IEEE International Geoscience and Remote Sensing Symposium, July 2012.
  4. Brunsdon, Chris, Stewart Fotheringham, and Martin Charlton. 'Geographically Weighted Regression - Modelling Spatial Non-Stationarity.' Journal of the Royal Statistical Society, Series D (The Statistician) (1998): vol. 47, no. 3, p.431-443
  5. Caplan, Joel M., Leslie W. Kennedy, and Joel Miller. ‘Risk Terrain Modeling: Brokering Criminological Theory and GIS Methods for Crime Forecasting’. Justice Quarterly (2011): vol. 28, no. 2
  6. Caplan, Joel M., Leslie W. Kennedy, and Eric L. Piza. ‘Risk Terrain Modeling Diagnostics Utility User Manual.’ Rutgers Center on Public Security, 2013
  7. Congalton, Russell G. ‘A Review of Assessing the Accuracy of Classifications of Remotely Sensed Data.’ Remote Sensing of Environment (1991): vol. 37, p.35-46
  8. Cressie, Noel A.C. ‘Statistics for spatial data.’ 1993
  9. Daley, Dyann, Michael Bachmann, Brittany A. Bachmann, Christian Pedigo, Minh-Thuy Bui, and Jamye Coffman. 'Risk terrain modeling predicts child maltreatment.' Child Abuse & Neglect (2016): vol. 62, p.29-38
  10. Dearmon, Jacob and Tony E. Smith. ‘Gaussian Process Regression and Bayesian Model Averaging: An Alternative Approach to Modeling Spatial Phenomena.’ Geographical Analysis (2016): vol. 48, p.82-111
  11. Delignette-Muller, Marie Laure and Christophe Dutang. 'fitdistrplus: An R Package for Fitting Distributions.' Journal of Statistical Software (2015): vol. 64, no. 4, p.1-34
  12. Haining, Robert, Jane Law, and Daniel Griffith. 'Modelling small area counts in the presence of overdispersion and spatial autocorrelation.' Computational Statistics & Data Analysis (2009): vol. 53, no. 8, p.2923-2937
  13. Harris, Matthew D., Robert G. Kingsley, and Andrew R. Sewell. ‘Archeological Predictive Model Set.’ Pennsylvania Department of Transportation, March 2015
  14. Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. ‘The Elements of Statistical Learning: Data Mining, Inference, and Prediction.’ 2008
  15. Hurley, Dan. ‘Can an Algorithm Tell When Kids Are in Danger?’, The New York Times Magazine, Jan. 2, 2018
  16. Knudby, Anders, Alexander Brenning, and Ellsworth LeDrew. 'New approaches to modelling fish - habitat relationships.' Ecological Modelling (2010): vol. 221, no. 3, p.503-511
  17. Kuhn, Max and Kjell Johnson. ‘Applied Predictive Modeling.’ 2013
  18. Leib, Mareike, Bruno Glaser, and Bernd Huwe. ‘Uncertainty in the spatial prediction of soil texture: Comparison of regression tree and Random Forest models.’ Geoderma (2012): vol. 170, p. 70-79
  19. Li, Jin and Andrew D. Heap. ‘A Review of Spatial Interpolation Methods for Environmental Scientists.’ Geoscience Australia, 2008
  20. Oliveira, Sandra, Friderike Oehler, Jesus San-Miguel-Ayanz, Andrea Camia, and Jose M.C. Pereira. 'Modeling spatial patterns of fire occurrence in Mediterranean Europe using Multiple Regression and Random Forest.' Forest Ecology and Management (2012): vol. 275, p.117-129
  21. Silverman, B.W. ‘Density Estimation for Statistics and Data Analysis.’ In Monographs on Statistics and Applied Probability, London: Chapman and Hall, 1986
  22. Smith, Tony E. ‘Notebook on Spatial Data Analysis.’ 2016, https://www.seas.upenn.edu/~ese502/#notebook
  23. Steif, Kenneth, Jordan Butz, and Annie Streetman. 'Predicting Spatial Risk of Opioid Overdoses in Providence, RI.' May 2018
  24. Tomlin, Dana C. ‘Geographic Information Systems and Cartographic Modeling.’ 1990
  25. Tukey, John W. ‘Exploratory Data Analysis.’ 1977
  26. Vicente-Serrano, Sergio M., M. Angel Saz-Sanchez, and Jose M. Cuadrat. 'Comparative analysis of interpolation methods in the middle Ebro Valley (Spain): application to annual precipitation and temperature.' Climate Research (2003): vol. 24, p.161-180
  27. Wright, Marvin N. and Andreas Ziegler. ‘ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.’ Journal of Statistical Software (2017): vol. 77 no. 1, p.1-17