SQL Agents · AI Safety · Data Security

SQL Agents Are Broken. Here's How to Fix Them

Limits Team · 15 min read

Every company building data analytics products is adding AI-powered SQL generation. Users ask questions in natural language, the agent generates SQL, queries run, insights appear.

It's magic when it works.

But SQL agents are fundamentally broken. Not because the models aren't good enough - they are. The problem is architectural.

Let me show you why every SQL agent in production is a ticking time bomb, and how to fix it.

The Pattern Every Team Discovers

You build a text-to-SQL agent. It works beautifully in demos.

User asks: "What are our top-selling products?"

Agent generates:

SELECT product_name, SUM(quantity) as total_sold
FROM orders
JOIN products ON orders.product_id = products.id
GROUP BY product_name
ORDER BY total_sold DESC
LIMIT 10

Perfect. Clean. Fast. Stakeholders are impressed.

Then you deploy to production.

What Actually Happens in Production

Week 1: User asks the same question. Agent generates the same query. Except this time it runs against your production database with 3 years of data. No date filter. The query takes 45 seconds. Your dashboard times out.

Week 2: Data analyst asks: "Show me customer details for account #12345."

Agent generates:

SELECT * FROM customers WHERE account_id = 12345

Looks fine. Except the analyst is from Company A, and account #12345 belongs to Company B. Your SQL agent just violated tenant isolation and exposed customer data across organizations.

Week 3: Sales VP asks: "What's our revenue this quarter?"

Agent generates:

SELECT SUM(amount) FROM transactions
WHERE type = 'sale'

No date filter. Returns total revenue since the beginning of time. VP makes a presentation to the board with completely wrong numbers. You find out during the board meeting.

Week 4: Someone asks: "How many users signed up today?"

Agent generates:

SELECT COUNT(*) FROM users
WHERE created_at::date = CURRENT_DATE

Runs fine in your Postgres staging environment. Deploys to production. Your production database is MySQL, which doesn't support Postgres's :: cast syntax. Query fails. Agent crashes. Support tickets flood in.

This is not hypothetical. This is every SQL agent deployment.

Why SQL Agents Break

The problem isn't the LLM's SQL generation capability. GPT-4 and Claude can write excellent SQL.

The problem is context.

Missing Context #1: Time Ranges

LLMs don't know what "reasonable" means for your data.

User asks: "Show me sales trends"

Human analyst thinks: "Obviously they mean recent trends, probably last quarter"

LLM thinks: "They said 'trends' which suggests time series data. I'll query all available data to show the complete trend."

Result: Query scans 5 years of data when it should scan 3 months.

Missing Context #2: Data Boundaries

LLMs don't understand your access control model.

User is logged in with tenant_id = 5.

User asks: "Show me all customers"

Human analyst knows: "Filter by my tenant_id"

LLM thinks: "They want all customers in the customers table"

Result: Data leak across tenant boundaries.

Missing Context #3: Schema Evolution

Your schema changes. The LLM's training data doesn't.

Last month: orders table had total column.

This month: Renamed to total_amount for clarity.

LLM still generates: SELECT total FROM orders

Result: Query fails. Agent breaks.

Missing Context #4: Performance Constraints

LLMs don't know your database performance characteristics.

User asks: "Find products with 'shoe' in the name"

LLM generates:

SELECT * FROM products
WHERE LOWER(name) LIKE '%shoe%'

Looks fine. Except:

  • Your products table has 10M rows
  • LOWER() prevents index usage
  • Leading wildcard prevents index usage
  • Query scans entire table
  • Database CPU spikes to 100%
  • Everything else slows down

The Prompt Engineering Trap

Every team tries to fix this with better prompts.

You add to your system prompt:

IMPORTANT: Always include date filters for time-series queries.
Use the last 90 days by default unless the user specifies otherwise.
Always filter by tenant_id from the user's session.
Always use indexed columns in WHERE clauses when possible.
Never use LOWER() or other functions that prevent index usage.

This works. For a while.

Then you discover edge cases:

Edge Case #1: Conflicting Instructions

User asks: "Show me ALL sales from when we started the company"

System prompt says: "Use last 90 days by default"

User explicitly said: "ALL" and "from when we started"

What does the agent do? Sometimes it follows the prompt. Sometimes it follows the user. You can't predict which.

Edge Case #2: Context Window Limits

Your prompt is now 2,000 words of SQL best practices.

User has a 10-message conversation about different queries.

By message 7, your original instructions are effectively invisible. The model prioritizes recent context. Your date filter rule? Forgotten.

Edge Case #3: The Helpful Agent Problem

User: "This query is taking forever. Can you make it faster?"

Agent, being helpful: "I'll remove the date filter to simplify the query"

Your prompt said to include date filters. But the user's immediate request contradicted it. The agent chose helpfulness over rules.

This is not a prompt engineering problem you can solve. It's a fundamental limitation of using prompts for enforcement.

What Teams Build In-House

Sophisticated teams build SQL validators. Post-generation, pre-execution.

function validateSQL(sql, userContext) {
  // Check 1: Must have date filter for large tables
  if (sql.includes('FROM orders') && !sql.includes('WHERE')) {
    return { valid: false, error: 'Orders queries must include date filter' };
  }
  
  // Check 2: Must filter by tenant
  if (!sql.includes(`tenant_id = ${userContext.tenantId}`)) {
    return { valid: false, error: 'Must filter by tenant_id' };
  }
  
  // Check 3: No full table scans
  if (sql.includes('SELECT *') && sql.includes('LIKE')) {
    return { valid: false, error: 'Cannot use LIKE with SELECT *' };
  }
  
  return { valid: true };
}

This works better than prompts. But it has problems:

Problem #1: Brittle

Your validator checks for FROM orders. Agent writes from orders (lowercase) or FROM "orders" (quoted). Your case-sensitive substring check misses it.

Agent uses subquery. Your pattern matching breaks.

Agent uses CTE. Your regex doesn't catch it.

You add more cases. Code becomes unmaintainable.

Problem #2: Doesn't Scale

You have 50 tables. Each needs different rules. Your validator is now 500 lines of if/else.

New table added? Update validator.

Schema changes? Update validator.

New query pattern discovered? Update validator.

This becomes tech debt fast.

Problem #3: Can't Auto-Fix

You can detect bad queries. But can you fix them?

Query is missing date filter. Where do you add it? Before GROUP BY? After WHERE? What if there's no WHERE clause?

Fixing SQL programmatically is harder than validating it.
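To see why, here is a hypothetical string-based fix, the kind most teams write first. Even the "easy" case needs three branches for clause ordering, and all of them break on subqueries, CTEs, and UNIONs. (Sketch only; Postgres interval syntax assumed, column and range supplied by the caller.)

```javascript
// Naive string-based date-filter injection. Illustrates the difficulty;
// not safe for production SQL rewriting.
function addDateFilter(sql, column, days) {
  const filter = `${column} >= CURRENT_DATE - INTERVAL '${days} days'`;
  if (/\bWHERE\b/i.test(sql)) {
    // Existing WHERE clause: prepend the filter with AND
    return sql.replace(/\bWHERE\b/i, `WHERE ${filter} AND`);
  }
  const groupByAt = sql.search(/\bGROUP BY\b/i);
  if (groupByAt !== -1) {
    // No WHERE, but a GROUP BY: the filter must land before it
    return `${sql.slice(0, groupByAt)}WHERE ${filter}\n${sql.slice(groupByAt)}`;
  }
  // Neither: append a fresh WHERE clause
  return `${sql}\nWHERE ${filter}`;
}
```

Three branches, and it still can't handle a query with two FROM clauses. An AST-based rewriter avoids this entire class of bugs.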

The Right Solution: Declarative Policies

Stop writing imperative validators. Write declarative policies.

Instead of: "If query contains 'FROM orders' and not 'WHERE', then block"

Write: "Sales queries require date filters. Default to last 90 days."

Let the policy engine figure out how to enforce it.

Example: Date Filter Policy

policy: sql_date_filters

rules:
  # Rule 1: Require date filters for time-series tables
  require_date_filter_for:
    tables: [orders, transactions, events, logs]
    condition: query_type == 'aggregation' OR row_count > 100000
    default_range: '90 days'
    auto_fix: true
    
  # Rule 2: Never add date filters to reference tables  
  forbid_date_filter_for:
    tables: [products, customers, categories]
    reason: "Reference data should not be time-filtered"
    
  # Rule 3: Respect explicit user intent
  when_user_specifies_range:
    action: use_user_range
    validate: user_range <= 2 years

What This Does

Before execution, the policy engine:

  1. Parses the SQL - Understands structure, not just pattern matching
  2. Identifies query type - Aggregation? Lookup? Join?
  3. Checks affected tables - Large tables? Time-series data?
  4. Detects missing filters - Should this have a date filter?
  5. Auto-fixes violations - Adds WHERE date >= CURRENT_DATE - 90 in the right place
  6. Validates user intent - Did user explicitly ask for all-time data?

Result: Bad queries never execute. Good queries run unchanged.
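The decision logic behind steps 2-4 can be sketched in a few lines. This is a minimal illustration with a hypothetical policy object mirroring the YAML above; regexes stand in for real AST parsing, which a production engine would use instead.

```javascript
// Hypothetical policy object, mirroring the YAML rules
const policy = {
  requireDateFilter: {
    tables: ['orders', 'transactions', 'events', 'logs'],
    defaultRangeDays: 90,
  },
};

function checkDateFilterPolicy(sql) {
  // Which tables does the query touch?
  const tables = [...sql.matchAll(/\b(?:FROM|JOIN)\s+(\w+)/gi)]
    .map(m => m[1].toLowerCase());
  // Is it an aggregation?
  const isAggregation = /\b(?:SUM|COUNT|AVG|MIN|MAX)\s*\(|GROUP BY/i.test(sql);
  // Does it already filter on a date column? (assumed column names)
  const hasDateFilter = /\b(?:created_at|updated_at|timestamp)\s*(?:>=|<=|=|BETWEEN)/i.test(sql);
  const needsFilter = tables.some(t => policy.requireDateFilter.tables.includes(t));
  if (needsFilter && isAggregation && !hasDateFilter) {
    // The auto-fix step would rewrite the query; here we report the decision
    return { action: 'auto_fix', rangeDays: policy.requireDateFilter.defaultRangeDays };
  }
  return { action: 'allow' };
}
```

The point is the shape: the policy is data, the enforcement is generic, and adding a table means editing a list, not writing new code.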

Real-World Example

User asks: "What are our best-selling products?"

Agent generates:

SELECT product_name, SUM(quantity) as total_sold
FROM orders
JOIN products ON orders.product_id = products.id
GROUP BY product_name
ORDER BY total_sold DESC
LIMIT 10

Policy engine detects:

  • Table: orders (time-series data)
  • Query type: Aggregation (SUM, GROUP BY)
  • Missing: Date filter
  • User intent: Recent data (said "are", present tense)

Policy engine fixes:

SELECT product_name, SUM(quantity) as total_sold
FROM orders
JOIN products ON orders.product_id = products.id
WHERE orders.created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY product_name
ORDER BY total_sold DESC
LIMIT 10

What changed:

  • Added date filter (last 90 days)
  • Placed correctly (before GROUP BY)
  • Qualified column name (orders.created_at not just created_at)
  • Maintains user's intent (top sellers)

Execution time:

  • Without fix: 45 seconds (scans 3 years)
  • With fix: 0.3 seconds (scans 90 days)

User gets their answer. Database stays fast. Agent is production-safe.

More Examples

Example 2: Tenant Isolation

User asks: "Show me customer email addresses"

Agent generates:

SELECT email FROM customers

Policy detects:

  • Multi-tenant table
  • No tenant_id filter
  • Potential data leak

Policy fixes:

SELECT email FROM customers
WHERE tenant_id = 'tenant_abc123'

Data leak prevented. Automatic. Guaranteed.
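The key property is that the tenant id comes from the session, never from the model. A hypothetical sketch (string-based for brevity, assumed table list) that injects the filter as a bind parameter rather than interpolating it into the SQL:

```javascript
// Tables that must always be scoped to a tenant (assumed list)
const MULTI_TENANT_TABLES = ['customers', 'orders', 'invoices'];

function enforceTenantIsolation(sql, tenantId) {
  const touchesTenantTable = MULTI_TENANT_TABLES.some(t =>
    new RegExp(`\\b(?:FROM|JOIN)\\s+${t}\\b`, 'i').test(sql));
  // Pass through queries that don't touch tenant data or already filter
  if (!touchesTenantTable || /\btenant_id\b/i.test(sql)) {
    return { sql, params: [] };
  }
  const fixed = /\bWHERE\b/i.test(sql)
    ? sql.replace(/\bWHERE\b/i, 'WHERE tenant_id = ? AND')
    : `${sql}\nWHERE tenant_id = ?`;
  // Tenant id travels as a bind parameter, never inside the SQL string
  return { sql: fixed, params: [tenantId] };
}
```

Because the filter is injected after generation, it holds no matter what the LLM writes.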

Example 3: Performance Optimization

User asks: "Find products with 'running' in the name"

Agent generates:

SELECT * FROM products
WHERE LOWER(name) LIKE '%running%'

Policy detects:

  • Function on indexed column (prevents index usage)
  • Leading wildcard (prevents index optimization)
  • SELECT * on large table

Policy fixes:

SELECT id, name, price, category FROM products
WHERE name ILIKE '%running%'
LIMIT 100

Changes:

  • Replaced LOWER() + LIKE with ILIKE (case-insensitive by definition; paired with a pg_trgm trigram index, it can use an index even with the leading wildcard)
  • Added LIMIT (prevent massive result sets)
  • Specified columns (avoid SELECT *)

Query runs 100x faster.

Example 4: Schema Compatibility

User asks: "Show today's revenue"

Agent generates:

SELECT SUM(amount) FROM transactions
WHERE created_at::date = CURRENT_DATE

Policy detects:

  • Database type: MySQL
  • Incompatible syntax: the ::date cast (Postgres-only)
  • MySQL equivalent: DATE(created_at)

Policy fixes:

SELECT SUM(amount) FROM transactions
WHERE DATE(created_at) = CURDATE()

Cross-database compatibility handled automatically.

How to Implement This

You have two options:

Option 1: Build It Yourself

If you're building a data product and SQL safety is core to your value proposition, build the policy engine in-house.

What you need:

  1. SQL parser - Not regex. Real AST parsing. Use sqlglot (Python) or node-sql-parser (JavaScript)
  2. Policy DSL - YAML or JSON format for declaring rules
  3. Query analyzer - Detect query patterns, table access, missing filters
  4. Auto-fix engine - Modify AST and regenerate SQL
  5. Validation layer - Sits between agent and database

Time to build: 2-3 months for basic version. 6-12 months for production-ready.

Worth it if SQL agents are your core product.
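For a sense of how component 5 fits together, here is a hypothetical shape for the validation layer: it wraps the database client so every agent-generated query passes through the policy engine first. validateAndFix() stands in for components 1-4.

```javascript
// Wrap a db client so no raw model output ever reaches the database.
// db and validateAndFix are caller-supplied; names are illustrative.
function guardedExecutor(db, validateAndFix) {
  return async function execute(sql, userContext) {
    const verdict = validateAndFix(sql, userContext);
    if (!verdict.valid) {
      // Blocked queries never reach the database
      throw new Error(`Query blocked: ${verdict.error}`);
    }
    // Runs the (possibly auto-fixed) SQL, not the raw model output
    return db.query(verdict.sql);
  };
}
```

Give the agent the wrapped executor as its only database tool, and enforcement becomes a property of the architecture rather than the prompt.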

Option 2: Use Limits

If SQL agents are part of your product but not the core differentiator, use infrastructure.

import { Limits } from '@limits/sdk';
import { createAgent } from 'your-agent-framework';

const limits = new Limits({ 
  apiKey: process.env.LIMITS_API_KEY 
});

const sqlAgent = createAgent({
  tools: limits.wrapTools([executeSQLTool], {
    policies: {
      'execute_sql': ['sql_date_filter_policy', 'tenant_isolation_policy']
    },
    autoFix: true
  })
});

// Agent now enforces SQL policies automatically

Setup time: 5 minutes

Ongoing maintenance: Zero (we handle schema evolution, database compatibility, new query patterns)

The Policy You Should Start With

Don't try to cover every case on day one. Start with the policy that prevents your most expensive failure.

For most teams, it's this:

policy: prevent_expensive_queries

rules:
  # Rule 1: Large tables must have date filters
  require_date_filter:
    tables: [orders, events, logs, transactions]
    default_range: '90 days'
    auto_fix: true
    
  # Rule 2: Limit result set sizes
  max_rows:
    default: 10000
    action: add_limit_clause
    
  # Rule 3: Prevent functions that break indexes
  forbid_index_breaking_functions:
    functions: [LOWER, UPPER, SUBSTRING]
    suggest_alternatives:
      LOWER: "Use case-insensitive collation or ILIKE"
      SUBSTRING: "Filter after retrieval"

This single policy prevents:

  • Slow queries that time out
  • Queries that spike database CPU
  • Result sets that crash your application
  • Indexes rendered useless by function calls
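Of the three rules, max_rows is the quickest to prototype. A hypothetical string-based sketch (a real engine would edit the parsed query instead):

```javascript
// Add a LIMIT when missing; clamp it when the query asks for more
// than the cap. Illustrative only - string rewriting is fragile.
function enforceMaxRows(sql, maxRows = 10000) {
  const m = sql.match(/\bLIMIT\s+(\d+)/i);
  if (!m) return `${sql}\nLIMIT ${maxRows}`;
  return parseInt(m[1], 10) > maxRows
    ? sql.replace(/\bLIMIT\s+\d+/i, `LIMIT ${maxRows}`)
    : sql;
}
```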

Deploy this policy. Measure impact for 1 week. You'll be shocked how many bad queries your agent was generating.

What Good Looks Like

A production-ready SQL agent has:

  • Automatic date filters - Never scans full table history by accident
  • Tenant isolation - Impossible to access other tenants' data
  • Performance guardrails - Queries optimized automatically
  • Schema compatibility - Works across database types
  • Result set limits - Never returns 10M rows by accident
  • Index awareness - Uses indexes effectively
  • Audit trail - Every query logged with user context
  • Auto-fix capability - Bad queries corrected before execution

You can build this yourself. Or you can treat it as infrastructure and focus on your actual product.

The Bottom Line

SQL agents are broken because they generate queries in isolation, without understanding:

  • Your data boundaries
  • Your performance constraints
  • Your access control model
  • Your schema specifics

Prompt engineering can't fix this. It's an architectural problem that needs an infrastructure solution.

The solution is a policy layer that:

  • Validates every generated query
  • Auto-fixes common issues
  • Enforces your rules deterministically
  • Works regardless of what the LLM generates

Build it yourself if SQL is your core product.

Use Limits if you want to ship fast and focus on your actual differentiator.

Either way, stop deploying broken SQL agents to production.


Ready to fix your SQL agent?

Try Limits free: limits.dev

Questions? Email: founders@limits.dev