Urban Wire How Can Local Governments Use AI to Answer Community Members’ Questions About Zoning and Land-Use Policies?
Judah Axelrod, Ridhi Purohit, Will Curran-Groome

Local zoning and permitting codes are notoriously difficult to navigate. Their complexity can lead to decreases in housing production and increases in costs for developers, adding to housing challenges in the US.

Generative artificial intelligence (which we refer to as AI throughout) can help community members navigate these policies. But without proper guidance, local governments risk deploying tools that provide incomplete, outdated, or misleading information.

Urban's research on AI use in local governments found that many governments have limited capacity to vet these tools. To fill this gap, we built an AI benchmarking exercise to help local governments understand AI's ability to answer questions about zoning and permitting laws.

We found that, even with system prompts and customization, AI tools often provided unhelpful answers. So for local governments to help AI answer zoning questions effectively—and better vet AI use cases more broadly—we recommend they take three steps:

  1. Prepare zoning and land-use policy documents so AI tools can more easily navigate and interpret them.
  2. Use system prompts to create guardrails for AI tools while fine-tuning them to avoid overcorrection.
  3. Evaluate tools’ benefits and risks using local knowledge, zoning expertise, and careful reading.

How we tested AI tools’ ability to answer questions about Minneapolis’s zoning code

For our exercise, we generated realistic questions community members might ask an AI tool about their local zoning code, evaluated the effectiveness of AI responses across five criteria, and assessed AI tools’ overall strengths and weaknesses in responding to our prompts.

We focused on planning and zoning in Minneapolis because the city has a 467-page zoning document and our team was familiar with its processes. We then adopted two user personas to engage with the AI tools. The first was a professional developer building a multifamily apartment building, representing someone skilled at navigating the zoning process. The second was a single-family homeowner interested in adding an accessory dwelling unit to their property, representing a nonexpert. Both personas asked a range of questions designed to test different AI capabilities. For example, our developer asked whether proposed specifications met local zoning requirements, and our homeowner asked how their property was zoned. (See the full list of questions we tested and AI-generated responses on GitHub.)

We tested a few broad categories of AI tools that range in complexity for local governments and community members, from freely available, consumer-facing tools to cloud platforms that require some technical capability. We used a mix of free, “open weight” models like Mistral and Llama; lightweight, paid-access models like OpenAI’s GPT 5-mini; and ChatGPT 5.1, which had access to more powerful reasoning and web search capabilities.

To enhance the information available to these tools and help them more accurately answer user questions, we used a retrieval augmented generation (RAG) framework. The RAG framework retrieves relevant text from the zoning code based on a user’s question to augment the AI model’s understanding before the model generates a response to a user’s query. This means our test wasn’t just about evaluating the correctness of AI responses. We also wanted to understand the extent to which we can supply the AI model with the information it needs to generate an accurate and useful response.
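
To make the retrieve-then-generate pattern concrete, here is a minimal sketch of a RAG loop. A toy keyword-overlap scorer stands in for the embedding-based retrieval a production system would use, the zoning-code excerpts are illustrative, and the model call itself is omitted; the function and variable names are ours, not part of the exercise.

```python
# Minimal sketch of a retrieve-then-generate (RAG) loop.
# A toy keyword-overlap scorer stands in for embedding-based retrieval;
# the zoning excerpts below are illustrative, not quotes from the code.

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Augment the model's prompt with retrieved zoning text before generation."""
    joined = "\n---\n".join(context)
    return (
        "Answer using ONLY the zoning code excerpts below. "
        "If they do not contain the answer, reply 'I don't know.'\n\n"
        f"Excerpts:\n{joined}\n\nQuestion: {question}"
    )

zoning_chunks = [
    "Accessory dwelling units are permitted in residence districts subject to size limits.",
    "Off-street parking requirements vary by district and building type.",
    "Maximum building height in the R1 district is 2.5 stories or 28 feet.",
]

question = "Can I build an accessory dwelling unit on my lot?"
context = retrieve(question, zoning_chunks)
prompt = build_prompt(question, context)
```

The prompt built this way is what the model actually sees, which is why retrieval quality, not just model quality, determines whether the answer is useful.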

With assistance from an Urban zoning expert familiar with the Minneapolis area, we evaluated responses manually across five dimensions:

  1. Accuracy: Is the response to the user’s question factually correct?
  2. Relevance: Does the response directly address the user’s question?
  3. Reference to context: Does the RAG framework pull the right sections from the zoning code to answer the question?
  4. Consistency: Does the model give a substantively similar answer when asked the same question multiple times?
  5. Confidence: Does the model’s response reflect uncertainty when there’s no clear answer or it lacks adequate information?
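
A rubric like this can be captured as a simple data structure so that manual scores are recorded consistently across responses. The record below is a hypothetical sketch; the field names are ours and are not drawn from the exercise itself.

```python
# Hypothetical record for scoring one AI response against the five
# evaluation dimensions; field names are illustrative.
from dataclasses import dataclass

@dataclass
class ResponseScore:
    accuracy: bool              # factually correct?
    relevance: bool             # directly addresses the user's question?
    reference_to_context: bool  # right zoning-code sections retrieved?
    consistency: bool           # similar answer across repeated runs?
    confidence: bool            # uncertainty expressed when warranted?

    def passed(self) -> int:
        """Number of dimensions the response satisfied (0-5)."""
        return sum([self.accuracy, self.relevance, self.reference_to_context,
                    self.consistency, self.confidence])

# Example: a response that was accurate but cited the wrong code sections.
score = ResponseScore(accuracy=True, relevance=True,
                      reference_to_context=False,
                      consistency=True, confidence=True)
```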

Finally, to place guardrails around our answers, we used a system prompt, which is a set of instructions provided to each AI tool to guide its responses across user questions. The system prompt instructed the models to do two things: rely only on retrieved zoning code context to inform their answers and, importantly, reply with "I don't know" when unsure rather than guessing.
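
In the chat-message format that most AI APIs use, a system prompt with those two guardrails might look like the sketch below. The wording is a paraphrase of the instructions described above, not the exact prompt used in the exercise.

```python
# Illustrative system prompt in the common chat-message format.
# The wording paraphrases the two guardrails described in the text;
# it is not the exact prompt used in the exercise.
SYSTEM_PROMPT = (
    "You are answering questions about the Minneapolis zoning code. "
    "Rely ONLY on the zoning code excerpts provided in the user message; "
    "do not draw on outside knowledge. "
    "If the excerpts do not clearly answer the question, "
    "reply exactly: I don't know."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user",
     "content": "Excerpts: ...\n\nQuestion: How is my property zoned?"},
]
```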

Because of poor information retrieval, AI tools’ answers to zoning questions weren’t always useful

We found that the AI tools often provided unhelpful answers because the RAG framework retrieved information poorly. It frequently failed to supply relevant context to each tool, particularly for questions that required synthesizing different sections of the zoning code. Even when the framework retrieved the relevant text, it also pulled in irrelevant passages, making the final output hard to parse and of limited accuracy.

For the few questions where the RAG framework identified only the right parts of the zoning code, answers were far more specific, useful, and actionable. Although we iterated on the RAG framework with many custom settings, none seemed to systematically improve retrieval. Even ChatGPT 5.1 with web search—the most advanced AI system we tested—struggled. It could not easily navigate Minneapolis’s dynamic, interactive zoning websites, which are optimized for human readers.

The AI models did succeed on one of the criteria: confidence. Though the tools sometimes misinterpreted or misapplied context and gave incorrect answers, we did not find that they hallucinated, or provided wholly fabricated answers not grounded in the code. Instead, the tools consistently responded with “I don’t know” when uncertain, as we had hoped.

To use AI effectively, local governments should invest in data readiness, guardrails, and evaluation

As AI advances, the technical architecture will evolve with it. Even since this exercise, newer models and agentic approaches have emerged that we’re exploring, and they could yield different results. But regardless of the specific model or approach, local government staff currently involved in AI policy and deployment can take the following steps toward successfully adopting AI tools and approaches that help meet community needs.

Reorganize documents and information to be more "AI ready." The structure of zoning ordinances is complex. If a local government wants to apply AI to its zoning code, staff must make the text more readable and accessible to AI.

We experimented with various custom settings in our RAG framework but found they made little difference. We also uploaded the entire Minneapolis zoning code directly to ChatGPT 5.1, bypassing retrieval entirely, but this information overload did not yield better answers.

Even with more iteration, we’re skeptical that results would have dramatically improved without changes to zoning code documentation itself to make information retrieval easier for AI. Such changes include creating machine-readable text and documents, employing metadata tagging to better organize meaningful sections of documents for easier parsing, and reorganizing tables into structured, symbol-free formats. As we saw, documents need to be both human- and machine-readable to be useful.
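
As one illustration of what "AI ready" could mean, a zoning-code section might be stored as a tagged, machine-readable chunk rather than free-flowing PDF text. The field names, section number, and text below are hypothetical, not taken from the Minneapolis code.

```python
# Hypothetical "AI-ready" structure: one zoning-code section stored as a
# tagged, machine-readable chunk. All field values here are illustrative.
import json

chunk = {
    "section_id": "546.30",            # stable identifier for citation
    "title": "Accessory dwelling units",
    "district": "Residence districts",
    "topic_tags": ["adu", "residential", "size-limits"],
    "text": (
        "An accessory dwelling unit is permitted subject to the size "
        "limits of this section."
    ),
}

print(json.dumps(chunk, indent=2))
```

Metadata like `section_id` and `topic_tags` gives a retrieval system something to match on beyond raw text, while the same structure remains legible to human readers.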

Implement well-calibrated guardrails informed by local context. Our system prompt to avoid overconfidence prevented the AI tools from hallucinating. But it also led the tools to sometimes respond with “I don’t know,” instead of providing a response that could have still been useful with the limited information the tool had. We also didn’t incorporate local context about Minneapolis’s planning and zoning in our system prompts, which could have improved the tools’ effectiveness.

The right guardrails for an AI tool carefully weigh risks and benefits and depend on a local government's specific goals for using AI. Engaging directly with expected AI users in the community can help local governments calibrate tools to ensure they deliver meaningful results.

Leverage human expertise to evaluate AI’s benefits and limitations. Building and implementing trusted AI tools at scale requires evaluation frameworks that can scale with them, without sacrificing the care and nuance of human review. Many of the user prompts couldn’t have been evaluated without local knowledge, zoning expertise, and careful reading. Local governments should treat claims about AI capabilities with caution unless they’re backed by context-specific evaluations and benchmarking datasets that government staff can see and understand themselves.

These findings can help local governments make zoning information more accessible and, more broadly, empower them to take an informed, responsible approach when using AI to meet other community needs.
