Conclusion - Algorithmic Governance
In these last few pages, I would like to conclude by considering how some of the key themes of this book pertain to the future of public-sector data science.
The most critical is the importance of evidence-based decision-making in government. While the evidence-based approach is now celebrated far and wide, most of its support comes from the program evaluation community. Evaluation is absolutely critical, and can actually complement prediction, but at times, some researchers get defensive about the role of machine learning in this movement.
Their beef typically reflects one of two concerns. First, that a data scientist with little domain expertise should not attempt to draw meaningful policy prescriptions from hastily prepared maps or regressions. Second, that machine learning is useless because it is not driven by theory.
To the first point - I too am wary of computer scientists, physicists, and other engineers who are drawn to complex urban systems but may lack domain expertise. There are some use cases where an engineer can have an impact - counting cars from satellite imagery, say, or forecasting ride-share demand. However, when social costs and benefits are present, more public policy context is required.
Having said that, data science is a new and exciting field and many young people are eager to learn how these skills can be used to have impact. If this is you, it is with great enthusiasm that I urge you to experiment, learn, and have fun.
The world is full of gatekeepers - forget them. Be curious, make mistakes, and problem-solve - just know that most problems are more complex than the solution from your weekend hackathon project. Go talk to a domain expert or a policy maker; learn about the business-as-usual approach; and consider the relevant costs and benefits.
To the second point on machine learning driven not by theory: In 2008, Wired Magazine published "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete,"70 claiming:
This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.
Rightfully, this idea generated tremendous backlash from academics who generate evidence by testing theory. One critique was that machine learning models are based on correlation, not causation, and while the latter requires theory, the former does not.
A bad causal model suffers from selection bias and sampling bias, and is ultimately just as useless as a bad machine learning model. A machine learning model is based on correlation, but if it leads to a better cost/benefit outcome than the business-as-usual approach and allocates resources without disparate impact, then it may be useful.
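To make that standard concrete, the sketch below shows how a model's predictions might be audited along both dimensions. It is a minimal sketch in R, assuming a hypothetical data frame `preds` with a predicted probability (`predProbs`), an observed outcome (`outcome`, coded 0/1), and a protected group label (`race`); the threshold and dollar figures are placeholders, not recommendations.

```r
library(dplyr)

threshold <- 0.5  # placeholder classification threshold

# Label each prediction as a true/false positive/negative
confusion <- preds %>%
  mutate(prediction = ifelse(predProbs > threshold, 1, 0),
         result = case_when(
           prediction == 1 & outcome == 1 ~ "True_Positive",
           prediction == 1 & outcome == 0 ~ "False_Positive",
           prediction == 0 & outcome == 1 ~ "False_Negative",
           prediction == 0 & outcome == 0 ~ "True_Negative"))

# Disparate impact check: are error rates comparable across groups?
confusion %>%
  group_by(race) %>%
  summarize(
    false_positive_rate = sum(result == "False_Positive") / sum(outcome == 0),
    false_negative_rate = sum(result == "False_Negative") / sum(outcome == 1))

# Cost/benefit check: does the allocation return more value than it costs?
# (placeholder dollar values for illustration only)
confusion %>%
  count(result) %>%
  mutate(cost_benefit = case_when(
    result == "True_Positive"  ~ n * 5000,   # benefit of a correct allocation
    result == "False_Positive" ~ n * -1000,  # cost of a wasted allocation
    result == "False_Negative" ~ n * -3000,  # cost of a missed case
    result == "True_Negative"  ~ n * 0))
```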
This book has taught you how to develop and evaluate algorithms with these outcomes in mind. The theory that we have learned here is not statistical theory, but decision theory. Hence the first sentence of the book, which reads, “At its core, this book is about public-sector decision-making.” I hope you agree that machine learning has a role to play in evidence-based policy.
Why has government been slow to adopt data science relative to the private sector? Perhaps policymakers fear disparate impact; perhaps the technology is too intimidating and the bureaucratic hurdles too great. While these are valid concerns, consider that algorithms force agencies to look inward on their own business-as-usual decision-making. Government may be worried that shining this light will reveal programs that are currently ineffective.
An algorithm will not make a bad program effective, but it may improve a good one at the margins. Developing strong programs complemented by useful algorithms requires a comprehensive Planning approach. It requires what I call ‘Algorithmic Governance’.
In 2017, New York City established a first-of-its-kind Automated Decision Systems Task Force to examine how City agencies were using algorithms to allocate resources.71 By the Spring of 2019, the Task Force began publicly complaining that not a single NYC agency had, to date, been transparent about its algorithms.72
In the Fall of 2020, the Task Force issued a watered-down final report, with half of its 36 pages devoted to member bios, thank-yous, and other material rather than policy recommendations.73 The effort was largely a failure, and it took the George Floyd protests to force meaningful reform for at least one agency. The POST Act now requires the New York City Police Department (NYPD) to publish surveillance technology usage policies and give communities and City Council the opportunity to comment.74
What incentives might the NYPD have for not being transparent by default about its algorithms? Of all the reasons listed above, let’s again consider the possibility that the programs informed by those algorithms may not be effective crime deterrents to begin with.
Algorithmic Governance starts with the evaluation of an existing program to understand its efficacy as well as its costs and benefits. Next, communities are engaged on the program; its objectives; its value, as judged by the evaluation; and a proposal for using data and algorithms to make the program more effective. This includes information on whether the algorithm will be created in-house or procured from a private vendor; where the training data comes from; tests for fairness and more.
Community engagement is the foundation of Algorithmic Governance. It is how governments and their citizens agree to ‘community standards’, as discussed at the end of Chapter 5. It gives communities the opportunity to weigh the trade-off between automation and better government. Without engagement, communities will be fearful of third-party software, data privacy risks, and increased surveillance.
Following the evaluation and engagement phases, an algorithm is created and deployed, and a Randomized Controlled Trial compares outcomes from the business-as-usual decision-making approach to those from the algorithm. A second round of community engagement then asks stakeholders to judge whether the use of data and algorithms was worth the benefit to both participants and taxpayers.
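The trial comparison itself can be simple. Below is a minimal sketch in R, assuming a hypothetical data frame `trial` in which each case was randomly assigned either to algorithm-informed decision-making (`treatment == 1`) or to business-as-usual (`treatment == 0`), and `outcome` records the result of interest for each case.

```r
library(dplyr)

# Compare mean outcomes across the two arms of the trial
trial %>%
  group_by(treatment) %>%
  summarize(mean_outcome = mean(outcome),
            n = n())

# A simple difference-in-means estimate with standard errors;
# pre-treatment covariates could be added to improve precision
summary(lm(outcome ~ treatment, data = trial))
```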
Algorithmic Governance is based on the idea that algorithmic decision-making is just a new take on the traditional Planning approach governments use to create programs. I do not know of a government anywhere that has engaged in such a comprehensive process around algorithms. It is far easier for an agency to procure a soup-to-nuts private-sector algorithm, even if little is known about the black box or its utility.
The ‘procure first, ask questions later’ strategy is the Mark Zuckerberg approach to Algorithmic Governance. Zuckerberg’s famous mantra, “Move fast and break things”, landed Facebook in hot water, much the same way that police departments, early adopters of machine learning, are now on the defensive about their use of algorithmic surveillance.
A more deliberate Planning approach will likely lead to better programs, better outcomes, and more social cohesion among stakeholders. This is not to say that a more transparent approach will deliver political cover to government agencies. Ultimately, it is up to mayors and agency officials to lead - but without transparency, innovation will suffer.
To end on this note, I hope I have succeeded in convincing you that data science and Planning are intertwined. Data science cannot replace more participatory forms of Planning, but it does have an emerging role to play in both Planning education and Planning practice.
Assuming you made it through this book (or my class that uses this book), how can you keep learning? I direct the Master of Urban Spatial Analytics program at the University of Pennsylvania - a graduate program that teaches at the intersection of data science and public policy. A couple of years back, a prospective student at an open house asked me this: “My professor told me there is no way I could come here and become a data scientist in one year - what do you say to that?”
My response was, “Your professor sounds like a great mentor - he or she is absolutely correct.” The data scientist is a tradesperson - a problem is revealed, the right tool is found, and a solution is implemented. Becoming a data scientist takes humility, team spirit, legwork, and lots of trial and error. Above all, it takes years of experience. Here are some suggestions for furthering your data science skills:
- Start a portfolio where you can showcase your work. Assume no one will read a single use case for more than 6-8 minutes, so lead with data visualization and focus on the business process.
- Start learning and practicing more complicated machine learning algorithms and use cases.
- Your first job will likely be as a ‘Data Analyst’, where you will be cleaning data and performing other menial tasks. Remember that if the ball is dropped in the data wrangling phase, then all the fancy analytical work to follow is wasted. Perform these tasks with a smile on your face.
- Find yourself a strong mentor who will respectfully shred your work but offer two quick nuggets of wisdom before sending you on your way. You know someone is a good mentor if you leave their office thinking, “So-and-so doesn’t know what they’re talking about,” only to realize they were totally correct once you implement the change.
Finally, challenge yourself, stay curious, and rely on your creativity to solve problems that you find impactful. I hope you found this book useful.
Anderson, Chris (2008). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired Magazine. https://www.wired.com/2008/06/pb-theory/
Kirchner, Lauren (2017). New York City Moves to Create Accountability for Algorithms. ProPublica. https://www.propublica.org/article/new-york-city-moves-to-create-accountability-for-algorithms
Lecher, Colin (2019). New York City’s Algorithm Task Force Is Fracturing. The Verge. https://www.theverge.com/2019/4/15/18309437/new-york-city-accountability-task-force-law-algorithm-transparency-automation
Fox Cahn, Albert (2020). The First Effort to Regulate AI Was a Spectacular Failure. Fast Company. https://www.fastcompany.com/90436012/the-first-effort-to-regulate-ai-was-a-spectacular-failure
Sheard, Nathan (2020). Victory! New York City Council Passes the POST Act. Electronic Frontier Foundation. https://www.eff.org/deeplinks/2020/06/victory-new-yorks-city-council-passes-post-act