Every company building data analytics products is adding AI-powered SQL generation. Users ask questions in natural language, the agent generates SQL, queries run, insights appear.
It's magic when it works.
But SQL agents are fundamentally broken. Not because the models aren't good enough - they are. The problem is architectural.
Let me show you why every SQL agent in production is a ticking time bomb, and how to fix it.
The Pattern Every Team Discovers
You build a text-to-SQL agent. It works beautifully in demos.
User asks: "What are our top-selling products?"
Agent generates:
SELECT product_name, SUM(quantity) as total_sold
FROM orders
JOIN products ON orders.product_id = products.id
GROUP BY product_name
ORDER BY total_sold DESC
LIMIT 10
Perfect. Clean. Fast. Stakeholders are impressed.
Then you deploy to production.
What Actually Happens in Production
Week 1: User asks the same question. Agent generates the same query. Except this time it runs against your production database with 3 years of data. No date filter. The query takes 45 seconds. Your dashboard times out.
Week 2: Data analyst asks: "Show me customer details for account #12345."
Agent generates:
SELECT * FROM customers WHERE account_id = 12345
Looks fine. Except the analyst is from Company A, and account #12345 belongs to Company B. Your SQL agent just violated tenant isolation and exposed customer data across organizations.
Week 3: Sales VP asks: "What's our revenue this quarter?"
Agent generates:
SELECT SUM(amount) FROM transactions
WHERE type = 'sale'
No date filter. Returns total revenue since the beginning of time. VP makes a presentation to the board with completely wrong numbers. You find out during the board meeting.
Week 4: Someone asks: "How many users signed up today?"
Agent generates:
SELECT COUNT(*) FROM users
WHERE created_at::date = CURRENT_DATE
Runs fine in your Postgres staging environment. Deploys to production. Your production database is MySQL. The ::date cast is Postgres-only syntax. Query fails. Agent crashes. Support tickets flood in.
This is not hypothetical. This is every SQL agent deployment.
Why SQL Agents Break
The problem isn't the LLM's SQL generation capability. GPT-4 and Claude can write excellent SQL.
The problem is context.
Missing Context #1: Time Ranges
LLMs don't know what "reasonable" means for your data.
User asks: "Show me sales trends"
Human analyst thinks: "Obviously they mean recent trends, probably last quarter"
LLM thinks: "They said 'trends' which suggests time series data. I'll query all available data to show the complete trend."
Result: Query scans 5 years of data when it should scan 3 months.
Missing Context #2: Data Boundaries
LLMs don't understand your access control model.
User is logged in with tenant_id = 5.
User asks: "Show me all customers"
Human analyst knows: "Filter by my tenant_id"
LLM thinks: "They want all customers in the customers table"
Result: Data leak across tenant boundaries.
Missing Context #3: Schema Evolution
Your schema changes. The LLM's training data doesn't.
Last month: orders table had total column.
This month: Renamed to total_amount for clarity.
LLM still generates: SELECT total FROM orders
Result: Query fails. Agent breaks.
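One way an enforcement layer catches this is by checking generated SQL against the live schema rather than the model's memory. A minimal sketch in TypeScript, assuming a Postgres database and the pg client (the table and column names are illustrative):
import { Pool } from 'pg';

const db = new Pool();

// Fetch the columns that exist right now, so generated SQL can be
// validated against the schema as deployed - not as the model remembers it.
async function liveColumns(table: string): Promise<Set<string>> {
  const { rows } = await db.query(
    'SELECT column_name FROM information_schema.columns WHERE table_name = $1',
    [table]
  );
  return new Set(rows.map((r) => r.column_name));
}

// After the rename: the set contains 'total_amount', not 'total', so the
// stale query can be rejected (or repaired) before it hits the database.
// (await liveColumns('orders')).has('total'); // false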
Missing Context #4: Performance Constraints
LLMs don't know your database performance characteristics.
User asks: "Find products with 'shoe' in the name"
LLM generates:
SELECT * FROM products
WHERE LOWER(name) LIKE '%shoe%'
Looks fine. Except:
- Your products table has 10M rows
- LOWER() prevents index usage
- Leading wildcard prevents index usage
- Query scans entire table
- Database CPU spikes to 100%
- Everything else slows down
The Prompt Engineering Trap
Every team tries to fix this with better prompts.
You add to your system prompt:
IMPORTANT: Always include date filters for time-series queries.
Use the last 90 days by default unless the user specifies otherwise.
Always filter by tenant_id from the user's session.
Always use indexed columns in WHERE clauses when possible.
Never use LOWER() or other functions that prevent index usage.
This works. For a while.
Then you discover edge cases:
Edge Case #1: Conflicting Instructions
User asks: "Show me ALL sales from when we started the company"
System prompt says: "Use last 90 days by default"
User explicitly said: "ALL" and "from when we started"
What does the agent do? Sometimes it follows the prompt. Sometimes it follows the user. You can't predict which.
Edge Case #2: Context Window Limits
Your prompt is now 2,000 words of SQL best practices.
User has a 10-message conversation about different queries.
By message 7, your original instructions are effectively invisible. The model prioritizes recent context. Your date filter rule? Forgotten.
Edge Case #3: The Helpful Agent Problem
User: "This query is taking forever. Can you make it faster?"
Agent, being helpful: "I'll remove the date filter to simplify the query"
Your prompt said to include date filters. But the user's immediate request contradicted it. The agent chose helpfulness over rules.
This isn't a problem better prompt engineering can solve. It's a fundamental limitation of the approach: prompts are suggestions, not guarantees, and enforcement needs guarantees.
What Teams Build In-House
Sophisticated teams build SQL validators. Post-generation, pre-execution.
function validateSQL(sql, userContext) {
  // Check 1: Must have date filter for large tables
  if (sql.includes('FROM orders') && !sql.includes('WHERE')) {
    return { valid: false, error: 'Orders queries must include date filter' };
  }

  // Check 2: Must filter by tenant
  if (!sql.includes(`tenant_id = ${userContext.tenantId}`)) {
    return { valid: false, error: 'Must filter by tenant_id' };
  }

  // Check 3: No full table scans
  if (sql.includes('SELECT *') && sql.includes('LIKE')) {
    return { valid: false, error: 'Cannot use LIKE with SELECT *' };
  }

  return { valid: true };
}
This works better than prompts. But it has problems:
Problem #1: Brittle
Your validator checks for the literal string FROM orders. The agent writes from orders in lowercase, or FROM "orders" with quoted identifiers. The check silently misses it.
Agent uses subquery. Your pattern matching breaks.
Agent uses CTE. Your regex doesn't catch it.
You add more cases. Code becomes unmaintainable.
Problem #2: Doesn't Scale
You have 50 tables. Each needs different rules. Your validator is now 500 lines of if/else.
New table added? Update validator.
Schema changes? Update validator.
New query pattern discovered? Update validator.
This becomes tech debt fast.
Problem #3: Can't Auto-Fix
You can detect bad queries. But can you fix them?
Query is missing date filter. Where do you add it? Before GROUP BY? After WHERE? What if there's no WHERE clause?
Fixing SQL programmatically is harder than validating it.
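It's doable, but it means real AST surgery. Here's a minimal sketch of what adding a date filter actually involves, using node-sql-parser (mentioned again below). The template trick avoids hand-building parser-specific node shapes, but treat this as an illustration under those assumptions, not a hardened implementation:
import { Parser } from 'node-sql-parser';

const parser = new Parser();
const DIALECT = { database: 'Postgresql' };

// Add "column >= CURRENT_DATE - INTERVAL '90 days'" to a SELECT,
// preserving whatever WHERE clause is already there.
// (column comes from your policy config, not from user input.)
function addDateFilter(sql: string, column: string): string {
  const ast: any = parser.astify(sql, DIALECT);
  const select = Array.isArray(ast) ? ast[0] : ast;
  if (select.type !== 'select') return sql;

  // Parse a throwaway template and reuse its WHERE node, so we never
  // hand-build parser-specific AST shapes.
  const template: any = parser.astify(
    `SELECT 1 FROM t WHERE ${column} >= CURRENT_DATE - INTERVAL '90 days'`,
    DIALECT
  );
  const predicate = (Array.isArray(template) ? template[0] : template).where;

  if (select.where) {
    // Parenthesize the old WHERE first: AND binds tighter than OR, so
    // "a OR b AND new_filter" would silently change the query's meaning.
    select.where.parentheses = true;
    select.where = {
      type: 'binary_expr',
      operator: 'AND',
      left: select.where,
      right: predicate,
    };
  } else {
    select.where = predicate;
  }

  return parser.sqlify(select, DIALECT);
}
And this still ignores UNIONs, subqueries, and queries that reach the table through a CTE - which is exactly why this ends up being an engine, not a helper function.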
The Right Solution: Declarative Policies
Stop writing imperative validators. Write declarative policies.
Instead of: "If query contains 'FROM orders' and not 'WHERE', then block"
Write: "Sales queries require date filters. Default to last 90 days."
Let the policy engine figure out how to enforce it.
Example: Date Filter Policy
policy: sql_date_filters
rules:
  # Rule 1: Require date filters for time-series tables
  require_date_filter_for:
    tables: [orders, transactions, events, logs]
    condition: query_type == 'aggregation' OR row_count > 100000
    default_range: '90 days'
    auto_fix: true

  # Rule 2: Never add date filters to reference tables
  forbid_date_filter_for:
    tables: [products, customers, categories]
    reason: "Reference data should not be time-filtered"

  # Rule 3: Respect explicit user intent
  when_user_specifies_range:
    action: use_user_range
    validate: user_range <= 2 years
What This Does
Before execution, the policy engine:
- Parses the SQL - Understands structure, not just pattern matching
- Identifies query type - Aggregation? Lookup? Join?
- Checks affected tables - Large tables? Time-series data?
- Detects missing filters - Should this have a date filter?
- Auto-fixes violations - Adds WHERE date >= CURRENT_DATE - 90 in the right place
- Validates user intent - Did user explicitly ask for all-time data?
Result: Bad queries never execute. Good queries run unchanged.
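The shape of that engine is a single choke point between the agent and the database. A rough sketch in TypeScript; the Policy interface and the names here are illustrative, not any particular library's API:
import { Parser } from 'node-sql-parser';

const parser = new Parser();

// Illustrative types - not a real library's API.
interface UserContext { tenantId: string; explicitRange?: string; }
interface Violation { rule: string; message: string; fixable: boolean; }

interface Policy {
  evaluate(ast: any, ctx: UserContext): Violation[];
  fix?(ast: any, ctx: UserContext): any; // returns a repaired AST
}

// The choke point: every agent-generated query passes through here.
function enforce(sql: string, policies: Policy[], ctx: UserContext) {
  let ast: any = parser.astify(sql); // real parsing, not pattern matching
  const blocked: Violation[] = [];

  for (const policy of policies) {
    for (const v of policy.evaluate(ast, ctx)) {
      if (v.fixable && policy.fix) {
        ast = policy.fix(ast, ctx); // e.g. splice in a date filter
      } else {
        blocked.push(v); // unfixable: never reaches the database
      }
    }
  }

  return { sql: parser.sqlify(ast), violations: blocked, blocked: blocked.length > 0 };
}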
Real-World Example
User asks: "What are our best-selling products?"
Agent generates:
SELECT product_name, SUM(quantity) as total_sold
FROM orders
JOIN products ON orders.product_id = products.id
GROUP BY product_name
ORDER BY total_sold DESC
LIMIT 10
Policy engine detects:
- Table: orders (time-series data)
- Query type: Aggregation (SUM, GROUP BY)
- Missing: Date filter
- User intent: Recent data (said "are", present tense)
Policy engine fixes:
SELECT product_name, SUM(quantity) as total_sold
FROM orders
JOIN products ON orders.product_id = products.id
WHERE orders.created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY product_name
ORDER BY total_sold DESC
LIMIT 10
What changed:
- Added date filter (last 90 days)
- Placed correctly (before GROUP BY)
- Qualified column name (orders.created_at, not just created_at)
- Maintains user's intent (top sellers)
Execution time:
- Without fix: 45 seconds (scans 3 years)
- With fix: 0.3 seconds (scans 90 days)
User gets their answer. Database stays fast. Agent is production-safe.
More Examples
Example 2: Tenant Isolation
User asks: "Show me customer email addresses"
Agent generates:
SELECT email FROM customers
Policy detects:
- Multi-tenant table
- No tenant_id filter
- Potential data leak
Policy fixes:
SELECT email FROM customers
WHERE tenant_id = 'tenant_abc123'
Data leak prevented. Automatic. Guaranteed.
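The one non-negotiable detail: the tenant value must come from the authenticated session on the server, never from the model's output. A sketch of that injection, reusing the same template trick as the date-filter sketch above (again illustrative, not hardened):
import { Parser } from 'node-sql-parser';

const parser = new Parser();
const DIALECT = { database: 'Postgresql' };

// tenantId comes from the server-side session - the LLM never supplies it,
// so it cannot omit, spoof, or talk its way around the filter.
function addTenantFilter(sql: string, tenantId: string): string {
  const escaped = tenantId.replace(/'/g, "''"); // defensive, though session values are trusted
  const template: any = parser.astify(
    `SELECT 1 FROM t WHERE tenant_id = '${escaped}'`, DIALECT);
  const predicate = (Array.isArray(template) ? template[0] : template).where;

  const ast: any = parser.astify(sql, DIALECT);
  const select = Array.isArray(ast) ? ast[0] : ast;
  if (select.type !== 'select') throw new Error('Only SELECT statements are allowed');

  if (select.where) select.where.parentheses = true; // preserve OR semantics
  select.where = select.where
    ? { type: 'binary_expr', operator: 'AND', left: select.where, right: predicate }
    : predicate;

  return parser.sqlify(select, DIALECT);
}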
Example 3: Performance Optimization
User asks: "Find products with 'running' in the name"
Agent generates:
SELECT * FROM products
WHERE LOWER(name) LIKE '%running%'
Policy detects:
- Function on indexed column (prevents index usage)
- Leading wildcard (prevents index optimization)
- SELECT * on large table
Policy fixes:
SELECT id, name, price, category FROM products
WHERE name ILIKE '%running%'
LIMIT 100
Changes:
- Removed LOWER() (use case-insensitive ILIKE)
- Added LIMIT (prevent massive result sets)
- Specified columns (avoid SELECT *)
Query runs 100x faster: the leading wildcard still forces a scan, but the LIMIT lets it stop after 100 matches and the explicit column list cuts I/O.
Example 4: Schema Compatibility
User asks: "Show today's revenue"
Agent generates:
SELECT SUM(amount) FROM transactions
WHERE created_at::date = CURRENT_DATE
Policy detects:
- Database type: MySQL
- Incompatible syntax: the ::date cast is Postgres-only
- MySQL equivalent: DATE(created_at) = CURDATE()
Policy fixes:
SELECT SUM(amount) FROM transactions
WHERE DATE(created_at) = CURDATE()
Cross-database compatibility handled automatically.
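If you're building this yourself, a parser that speaks multiple dialects gets you part of the way: parse with the source grammar, regenerate with the target's. A sketch with node-sql-parser; note this only covers grammar-level differences, and dialect-specific function names (like CURDATE()) still need their own rewrite pass:
import { Parser } from 'node-sql-parser';

const parser = new Parser();

// Parse with the source dialect's grammar, regenerate with the target's.
// Grammar-level differences (casts, quoting, LIMIT syntax) come along
// for free; dialect-specific function names still need explicit mapping.
function transpile(sql: string, from: string, to: string): string {
  const ast = parser.astify(sql, { database: from });
  return parser.sqlify(ast, { database: to });
}

// e.g. transpile(generatedSql, 'Postgresql', 'MySQL')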
How to Implement This
You have two options:
Option 1: Build It Yourself
If you're building a data product and SQL safety is core to your value proposition, build the policy engine in-house.
What you need:
- SQL parser - Not regex. Real AST parsing. Use sqlparse (Python) or node-sql-parser (JavaScript)
- Policy DSL - YAML or JSON format for declaring rules
- Query analyzer - Detect query patterns, table access, missing filters
- Auto-fix engine - Modify AST and regenerate SQL
- Validation layer - Sits between agent and database
Time to build: 2-3 months for basic version. 6-12 months for production-ready.
Worth it if SQL agents are your core product.
Option 2: Use Limits
If SQL agents are part of your product but not the core differentiator, use infrastructure.
import { Limits } from '@limits/sdk';
import { createAgent } from 'your-agent-framework';
const limits = new Limits({
  apiKey: process.env.LIMITS_API_KEY
});

const sqlAgent = createAgent({
  tools: limits.wrapTools([executeSQLTool], {
    policies: {
      'execute_sql': ['sql_date_filter_policy', 'tenant_isolation_policy']
    },
    autoFix: true
  })
});
// Agent now enforces SQL policies automatically
Setup time: 5 minutes
Ongoing maintenance: Zero (we handle schema evolution, database compatibility, new query patterns)
The Policy You Should Start With
Don't try to cover every case on day one. Start with the policy that prevents your most expensive failure.
For most teams, it's this:
policy: prevent_expensive_queries
rules:
  # Rule 1: Large tables must have date filters
  require_date_filter:
    tables: [orders, events, logs, transactions]
    default_range: '90 days'
    auto_fix: true

  # Rule 2: Limit result set sizes
  max_rows:
    default: 10000
    action: add_limit_clause

  # Rule 3: Prevent functions that break indexes
  forbid_index_breaking_functions:
    functions: [LOWER, UPPER, SUBSTRING]
    suggest_alternatives:
      LOWER: "Use case-insensitive collation or ILIKE"
      SUBSTRING: "Filter after retrieval"
This single policy prevents:
- Slow queries that time out
- Queries that spike database CPU
- Result sets that crash your application
- Indexes rendered useless by function calls
Deploy this policy. Measure impact for 1 week. You'll be shocked how many bad queries your agent was generating.
What Good Looks Like
A production-ready SQL agent has:
- ✅ Automatic date filters - Never scans full table history by accident
- ✅ Tenant isolation - Impossible to access other tenants' data
- ✅ Performance guardrails - Queries optimized automatically
- ✅ Schema compatibility - Works across database types
- ✅ Result set limits - Never returns 10M rows by accident
- ✅ Index awareness - Uses indexes effectively
- ✅ Audit trail - Every query logged with user context
- ✅ Auto-fix capability - Bad queries corrected before execution
You can build this yourself. Or you can treat it as infrastructure and focus on your actual product.
The Bottom Line
SQL agents are broken because they generate queries in isolation, without understanding:
- Your data boundaries
- Your performance constraints
- Your access control model
- Your schema specifics
Prompt engineering can't fix this. It's an architectural problem that needs an infrastructure solution.
The solution is a policy layer that:
- Validates every generated query
- Auto-fixes common issues
- Enforces your rules deterministically
- Works regardless of what the LLM generates
Build it yourself if SQL is your core product.
Use Limits if you want to ship fast and focus on your actual differentiator.
Either way, stop deploying broken SQL agents to production.
Ready to fix your SQL agent?
Try Limits free: limits.dev
Questions? Email: founders@limits.dev
