Building AI Agents with Claude Code
Introduction
Imagine you’re reviewing a pull request with dozens of SQL files, each containing complex queries for your data pipeline. You spot inconsistent formatting, or syntax which doesn’t work with your infrastructure. Sound familiar?
It’s common for data professionals to struggle with maintaining consistent SQL standards across their projects, especially when working with specialized platforms and it can be time consuming to review these elements within a peer review. It would be better use of time to focus on the hard thinking elements, like logic etc. However these small syntax or style issues, can be distracting. Well at least they are for me.
That’s where Claude Code comes in. Claude code enables custom agents - we can then create agents that automatically validate your SQL code, enforce style guidelines, and catch platform-specific issues before they reach production. There are tools out there like SQLRuff and others that will likely do a better job that a ai agent. However the purpose of this is to have a basic understanding, but these agents could be applied to “Testing Agent” or “Data Modelling Agent” and far more.
Here’s our objectives:
- Install and configure Claude Code for your development environment
- Create a custom SQL style guide agent that enforces dbt Labs standards
- Build a Starburst/Delta Lake syntax validator
- Set up automated validation workflows in VSCode
- Test your agents with real-world SQL examples
Prerequisites
Before starting this tutorial, ensure you have the following tools installed and configured:
Required Tools
Claude Code: Latest version (1.0.112+)
- This tutorial requires Claude Code with the new hooks format support
- Verify installation:
claude --version
SQLRuff: For SQL linting and formatting
# Install SQLRuff
pip install sqruff
# Verify installation
sqruff --version
Git: For version control examples
- Most systems have this pre-installed
- Verify:
git --version
Optional Tools
Node.js: Only needed if using npm installation method
# Verify if installed
node --version
npm --version
VSCode: Recommended editor for best integration
- Download from: https://code.visualstudio.com/
- Ensure
code
command is available in PATH
System Requirements
- Operating System: Windows 10+, macOS 10.15+, or Linux
- Memory: 4GB RAM minimum (8GB recommended)
- Storage: 500MB free space for tools and project files
- Network: Internet connection for initial setup and API calls
Installation Verification
Run these commands to verify your setup:
# Verify all required tools
claude --version # Should show 1.0.112 or higher
sqruff --version # Should show SQLRuff version
git --version # Should show Git version
# Optional verifications
node --version # If using npm method
code --version # If using VSCode
If any commands fail, install the missing tools before proceeding with the tutorial.
How to Install Claude Code: Setup Guide for Data Engineers
Before we dive into building SQL agents, let’s get Claude Code properly installed and configured. The setup process has evolved significantly in early 2025, with multiple installation options depending on your environment.
Claude Code Installation Methods
The recommended approach is using the native binary installation, which provides the most stable experience:
# macOS/Linux/WSL
curl -fsSL https://claude.ai/install.sh | bash
# Windows PowerShell
irm https://claude.ai/install.ps1 | iex
# Verify installation
claude --version
claude doctor
If you prefer npm-based installation or need it for CI/CD environments:
# Global installation
npm install -g @anthropic-ai/claude-code
# Navigate to your project
cd your-sql-project
claude
Claude Code Authentication Setup
You have two main options for authentication. For production environments, I recommend using the Anthropic API key approach:
# Set environment variable
export ANTHROPIC_API_KEY="your-api-key-here"
# Add to shell profile for persistence
echo 'export ANTHROPIC_API_KEY="your-api-key-here"' >> ~/.bashrc
source ~/.bashrc
# Test connection
claude -p "Test API connection"
Alternatively, if you have a Claude Pro ($20/month) or Max ($200/month) subscription, you can use OAuth authentication during the initial setup process.
SQL Agent Project Structure Setup
Once Claude Code is installed, you’ll want to organize your project with a structure that supports SQL validation. Here’s the recommended layout:
The CLAUDE.md
file serves as your project’s main configuration. Here’s a template specifically designed for SQL validation projects:
# SQL Validation Agent Project
## Project Overview
This project uses Claude Code to validate SQL code according to:
- dbt Labs SQL Style Guide
- Starburst/Trino syntax requirements
- Delta Lake connector best practices
## Key Commands
- Validate SQL: `/sql-validate filename.sql`
- Style check: Use sql-style-guide agent
- Syntax check: Use starburst-validator agent
## Coding Standards
- Use SQLRuff for automated linting
- Follow dbt naming conventions
- 4-space indentation with trailing commas
- Lowercase for field names, keywords, and functions
- Lines should not exceed 80 characters
## Database Context
- Primary engine: Starburst/Trino
- Data lake: Delta Lake format
- Metastore: AWS Glue / Hive Metastore
Essential Claude Code Configuration
Create a .claude/settings.json
file to configure permissions and automation hooks:
{
"permissions": {
"allow": ["Read", "Edit", "Bash"],
"deny": ["Read(./.env)", "Read(./.env.*)", "Read(./secrets/**)"]
},
"hooks": {
"PostToolUse": [{
"matcher": {
"tools": ["EditTool"]
},
"hooks": [{
"type": "command",
"command": "sqruff lint \"$CLAUDE_FILE_PATHS\""
}]
}]
}
}
This configuration automatically runs SQLRuff linting whenever you edit SQL files, providing immediate feedback on style and syntax issues.
Creating Your First SQL Agent: Style Guide Validator
Now comes the exciting part - creating your first custom SQL agent. We’ll start with a style guide validator that enforces dbt Labs standards and general SQL best practices.
SQL Agent Structure and Configuration
Create a new file at .claude/agents/sql-style-guide.md
with the following content:
---
name: sql-style-guide
description: Use this agent for SQL code style validation, formatting checks, and dbt best practice enforcement
tools: Read, Write, Bash
color: blue
---
You are a SQL style guide specialist focusing on dbt Labs standards and general SQL best practices. When analyzing SQL code, you:
## Core Responsibilities
- Validate dbt naming conventions (staging: `stg_[source]__[entity]s.sql`)
- Enforce 4-space indentation with trailing commas
- Check lowercase usage for field names, keywords, and functions
- Validate 80-character line limits
- Ensure proper CTE organization and naming
- Verify explicit JOIN types and column aliasing
## Validation Checklist
- [ ] Field names use snake_case
- [ ] Reserved keywords in UPPERCASE
- [ ] Trailing commas in SELECT statements
- [ ] Explicit `AS` keywords for aliases
- [ ] Proper CTE ordering (imports first, then transformations)
- [ ] No SELECT * usage (except in final CTE)
- [ ] GROUP BY uses numbers, not column names
- [ ] Prefer UNION ALL over UNION
## Output Format
Provide findings as:
1. **Style Issues Found**: Bulleted list with line numbers
2. **Suggested Fixes**: Code snippets showing corrections
3. **dbt Best Practices**: Additional recommendations
4. **SQLRuff Command**: Automated fix command if applicable
## Tool Usage
- Use Read to examine SQL files
- Use Bash to run `sqruff lint filename.sql`
- Use Write to create corrected versions when requested
Understanding Claude Code Agent Components
Let’s break down what makes this SQL agent effective:
Frontmatter Configuration: The YAML frontmatter defines the agent’s identity and capabilities. The tools
field specifies which Claude Code tools the agent can access - in this case, Read (for examining files), Write (for creating corrections), and Bash (for running SQLRuff).
Role Definition: The agent has a clear, specific role as a “SQL style guide specialist.” This focused responsibility ensures it doesn’t try to handle tasks outside its expertise.
Validation Checklist: The structured checklist provides consistent, repeatable validation criteria. This ensures every SQL file gets the same thorough review.
Output Format: By defining a specific output structure, we ensure consistent, actionable feedback that developers can easily follow.
Testing Your SQL Style Guide Agent
Let’s create some test files to validate our agent works correctly. First, create a test file with intentional style issues:
-- tests/bad_style.sql
select u.id,u.name,p.company from users u join profiles p on u.id=p.user_id where u.active=true
Now test your agent:
claude "Use the sql-style-guide agent to analyze tests/bad_style.sql"
The agent should identify multiple issues:
- Missing spaces around operators
- Implicit JOIN instead of explicit INNER JOIN
- No trailing commas
- Line length exceeding 80 characters
- Missing proper indentation
Creating Good SQL Examples
It’s equally important to test with properly formatted SQL:
-- tests/good_style.sql
WITH users_with_profiles AS (
SELECT
users.id,
users.name,
users.email,
profiles.company,
profiles.created_at
FROM {{ ref('users') }}
INNER JOIN {{ ref('profiles') }}
ON users.id = profiles.user_id
WHERE users.active = TRUE
AND profiles.verified = TRUE
),
final AS (
SELECT * FROM users_with_profiles
)
SELECT * FROM final
When you run the agent on this file, it should pass validation with minimal or no issues, confirming your agent correctly identifies good practices.
Advanced SQL Validation: Starburst and Delta Lake Agent
While style validation is important, syntax and platform-specific optimization are equally critical. Let’s build a specialized agent that understands Starburst/Trino syntax and Delta Lake connector patterns.
The Starburst Delta Lake Validator Agent
Create .claude/agents/starburst-validator.md
:
---
name: starburst-validator
description: Validate Starburst/Trino syntax and Delta Lake connector usage patterns
tools: Read, Write, Bash
color: green
---
You are a Starburst/Trino syntax specialist with expertise in Delta Lake connector usage. Focus on:
## Core Validation Areas
- Trino-specific SQL syntax and functions
- Delta Lake time travel queries syntax
- Proper catalog/schema/table references
- Starburst Warp Speed indexing opportunities
- Memory usage optimization patterns
- Cross-connector query optimization
## Delta Lake Specific Checks
- Time travel syntax: `FOR VERSION AS OF` vs `FOR TIMESTAMP AS OF`
- System table usage: `"table$history"`, `"table$partitions"`
- OPTIMIZE and VACUUM operations
- Change data feed (CDF) syntax when applicable
- Column mapping and deletion vectors compatibility
## Starburst Features Validation
- Connector-specific syntax (s3, glue, delta-lake)
- Authentication method compatibility
- Query federation patterns across data sources
- Performance optimization suggestions (pushdown predicates)
## Common Anti-patterns to Flag
- SELECT * on large Delta tables
- Missing partition filters on time-partitioned data
- Inappropriate DISTINCT usage
- Cross-connector joins without proper optimization
- Missing time travel version management
## Output Format
Return structured analysis:
1. **Syntax Validation**: Trino compatibility check
2. **Delta Lake Usage**: Connector-specific recommendations
3. **Performance Optimization**: Query improvement suggestions
4. **Security Review**: Access pattern analysis
5. **Next Steps**: Actionable improvements with examples
## Tool Commands
- Use Read for SQL file analysis
- Use Bash for `trino --execute "EXPLAIN (TYPE VALIDATE) query"` when available
- Check system tables with queries like `SELECT * FROM "table$properties"`
Delta Lake Time Travel Query Patterns
This agent specializes in recognizing and validating Delta Lake-specific syntax. Here are some examples it should handle:
-- Good: Proper time travel syntax
SELECT * FROM example.testdb.customer_orders
FOR VERSION AS OF 3;
-- Good: Timestamp-based time travel
SELECT * FROM example.testdb.customer_orders
FOR TIMESTAMP AS OF TIMESTAMP '2022-03-23 09:59:29.803 America/Los_Angeles';
-- Good: System table monitoring
SELECT * FROM "customer_orders$history" ORDER BY version DESC;
SELECT * FROM "customer_orders$partitions";
The agent should flag problematic patterns like:
-- Bad: No partition filtering on large table
SELECT customer_id, order_total
FROM customer_orders
WHERE order_date = '2023-01-01';
-- Better: Partition filtering
SELECT customer_id, order_total
FROM customer_orders
WHERE date_partition = '2023-01-01'
AND customer_id IN (1, 2, 3);
Starburst Performance Optimization Detection
One of the most valuable features of this agent is its ability to suggest performance improvements:
-- The agent should suggest predicate pushdown optimization
SELECT c.name, o.total
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE c.region = 'US' -- Filter pushed down
AND o.status = 'completed';
Building Automated SQL Validation Workflows
Individual agents are powerful, but the real magic happens when you orchestrate them into automated workflows. Let’s build a comprehensive validation system that runs automatically as you work.
VSCode Integration for SQL Validation
First, let’s set up VSCode tasks that make it easy to run validation with keyboard shortcuts. Create or update .vscode/tasks.json
:
{
"version": "2.0.0",
"tasks": [
{
"label": "Validate All SQL Files",
"type": "shell",
"command": "claude",
"args": ["-p", "Use sql-style-guide and starburst-validator agents to validate all .sql files in workspace. Generate summary report."],
"group": {
"kind": "build",
"isDefault": true
},
"presentation": {
"echo": true,
"reveal": "always",
"panel": "new"
}
},
{
"label": "SQL Style Check",
"type": "shell",
"command": "claude",
"args": ["-p", "Use sql-style-guide agent to check style of ${file}"],
"group": "test"
}
]
}
Now you can press Ctrl+Shift+P
(or Cmd+Shift+P
on Mac), type “Tasks: Run Task”, and select “Validate All SQL Files” to run comprehensive validation across your entire project.
Automated File Watching with Claude Code
For real-time validation, create .claude/hooks.mjs
to automatically check files as you save them:
export async function postEdit({ filePath, success }) {
if (!success) return;
// Auto-validate SQL files on save
if (filePath.match(/\.sql$/)) {
console.log(`[Auto-validation] Checking ${filePath}`);
// Run style validation
execSync(`claude -p "Use sql-style-guide agent to quickly validate ${filePath}"`);
// Run syntax validation for Starburst files
if (await fileContains(filePath, /starburst|trino|delta/i)) {
execSync(`claude -p "Use starburst-validator agent to check ${filePath}"`);
}
}
// Auto-format with SQLRuff
if (filePath.match(/\.sql$/)) {
try {
execSync(`sqruff fix "${filePath}"`);
} catch (error) {
console.log(`[Warning] SQLRuff formatting failed: ${error.message}`);
}
}
}
async function fileContains(filePath, pattern) {
const content = await fs.readFile(filePath, 'utf8');
return pattern.test(content);
}
This hook automatically runs validation whenever you save a SQL file, providing immediate feedback without interrupting your workflow.
Batch SQL Processing Command
For comprehensive project-wide validation, create .claude/commands/scan-sql-files.md
:
Scan and validate all SQL files in the project:
1. **Discovery Phase**:
- Find all .sql files in project directory
- Categorize by type (queries, migrations, models, tests)
- Identify potential Starburst/Delta Lake usage patterns
2. **Validation Phase**:
- Run sql-style-guide agent on each file
- Run starburst-validator agent on files with Trino syntax
- Collect all findings into structured report
3. **Reporting Phase**:
- Generate summary dashboard showing:
- Total files processed
- Style issues found and severity
- Syntax errors and warnings
- Performance optimization opportunities
- Files requiring immediate attention
4. **Action Items**:
- Prioritized list of fixes needed
- Automated fix commands where possible
- Manual review recommendations
Confirm before starting each phase. Process files in batches of 10 to avoid context overflow.
Use this command with: claude "/scan-sql-files"
Testing SQL Agents: Production-Ready Validation
Building agents is just the beginning. To make them truly valuable for your team, you need a systematic approach to testing and refinement.
Creating a Comprehensive SQL Test Suite
Start by building a comprehensive test suite that covers various scenarios your agents will encounter:
Unix/Linux/macOS:
# Create test directory structure
mkdir -p tests/{good,bad,edge-cases}
# Good examples (should pass validation)
echo "-- Well-formatted dbt model
WITH users_clean AS (
SELECT
id,
LOWER(email) AS email,
created_at
FROM {{ ref('raw_users') }}
WHERE email IS NOT NULL
)
SELECT * FROM users_clean" > tests/good/dbt_model.sql
# Bad examples (should fail validation)
echo "select * from users where email like '%@gmail.com'" > tests/bad/style_issues.sql
Windows PowerShell:
# Create test directory structure
mkdir tests; mkdir tests\good; mkdir tests\bad; mkdir tests\edge-cases
# Good examples (should pass validation)
@"
-- Well-formatted dbt model
WITH users_clean AS (
SELECT
id,
LOWER(email) AS email,
created_at
FROM {{ ref('raw_users') }}
WHERE email IS NOT NULL
)
SELECT * FROM users_clean
"@ | Out-File -FilePath "tests\good\dbt_model.sql" -Encoding UTF8
# Bad examples (should fail validation)
"select * from users where email like '%@gmail.com'" | Out-File -FilePath "tests\bad\style_issues.sql" -Encoding UTF8
Windows Command Prompt:
# Create test directory structure
mkdir tests && mkdir tests\good && mkdir tests\bad && mkdir tests\edge-cases
# Create files manually or use a text editor for complex SQL examples
Edge Case Example (All Platforms):
Create tests/edge-cases/complex_time_travel.sql
with this content:
-- Complex Delta Lake time travel query
SELECT
customer_id,
order_total,
order_date
FROM delta_lake.sales.orders
FOR VERSION AS OF 5
WHERE date_partition >= '2023-01-01'
AND customer_region = 'US'
UNION ALL
SELECT
customer_id,
order_total,
order_date
FROM delta_lake.sales.orders
FOR TIMESTAMP AS OF TIMESTAMP '2023-06-01 00:00:00'
WHERE date_partition >= '2023-06-01'
AND customer_region = 'EU'
Automated SQL Agent Testing Workflow
Create a testing command that systematically validates your agents:
# Test style guide agent
claude "Use the sql-style-guide agent to analyze all files in tests/good/ - these should pass with minimal issues"
claude "Use the sql-style-guide agent to analyze all files in tests/bad/ - these should identify specific problems"
# Test syntax validator
claude "Use the starburst-validator agent to check tests/edge-cases/complex_time_travel.sql for Delta Lake best practices"
Measuring SQL Agent Effectiveness
Track key metrics to understand how well your agents are performing:
- Accuracy: How often do agents correctly identify real issues?
- Coverage: What percentage of actual problems do they catch?
- False Positives: How often do they flag correct code as problematic?
- Actionability: How clear and helpful are their recommendations?
Keep a log of agent performance:
# Agent Performance Log
## sql-style-guide Agent
- **Date**: 2025-01-15
- **Files Tested**: 25
- **Issues Found**: 47
- **False Positives**: 3
- **Accuracy**: 94%
- **Notes**: Occasionally flags dbt macros as style violations
## starburst-validator Agent
- **Date**: 2025-01-15
- **Files Tested**: 15
- **Performance Suggestions**: 12
- **Syntax Errors Caught**: 3
- **Notes**: Excellent at catching missing partition filters
Iterative SQL Agent Improvement Process
Based on your testing results, continuously refine your agents:
- Analyze Feedback: Review false positives and missed issues
- Update Agent Instructions: Refine the validation criteria
- Add New Test Cases: Include edge cases you discover
- Validate Changes: Test updated agents against your full test suite
- Document Learnings: Keep track of what works and what doesn’t
For example, if your style guide agent keeps flagging legitimate dbt macros, you might update its instructions:
## Additional Considerations
- Recognize dbt macros ({{ macro_name() }}) as valid syntax
- Allow longer lines for complex macro calls
- Understand dbt ref() and source() functions
Real-World SQL Agent Implementation
Let’s walk through a complete example of how these agents work together in a real data engineering project.
Scenario: Data Pipeline Code Review with Claude Code
Imagine you’re working on a data pipeline that processes customer orders using Starburst and Delta Lake. You’ve just received a pull request with several SQL files that need review.
Step 1: Initial SQL Validation
# Run comprehensive validation
claude "Use both sql-style-guide and starburst-validator agents to review all .sql files in the current directory. Provide a summary of issues found and prioritize them by severity."
Step 2: Multi-Agent Coordination
The agents work together to provide comprehensive feedback:
- Style Guide Agent identifies formatting inconsistencies, naming convention violations, and dbt best practice issues
- Starburst Validator catches performance problems, syntax errors, and Delta Lake optimization opportunities
Step 3: Automated SQL Fixes
For issues that can be automatically resolved:
# Fix style issues with SQLRuff
sqruff fix *.sql
# Apply agent suggestions
claude "Use the sql-style-guide agent to create corrected versions of files with style issues"
Step 4: Manual SQL Review
For complex issues requiring human judgment:
# Get detailed analysis
claude "Use the starburst-validator agent to provide detailed performance optimization recommendations for customer_orders_analysis.sql, including specific code examples"
Integration with Existing Data Engineering Tools
Your Claude Code agents work seamlessly with your existing data engineering stack:
dbt Integration:
# Configure SQLRuff for dbt + Trino
# SQLRuff is specifically designed for SQL linting
echo "[tool.sqruff]
dialect = \"trino\"
templater = \"dbt\"" > pyproject.toml
# Validate dbt models
claude "Use sql-style-guide agent to validate all models in models/ directory for dbt best practices"
Git Hooks for SQL Validation:
# Pre-commit validation
echo "#!/bin/bash
files=\$(git diff --cached --name-only --diff-filter=ACM | grep '\\.sql\$')
if [ -n \"\$files\" ]; then
claude \"Use sql-style-guide and starburst-validator agents to validate staged SQL files: \$files\"
fi" > .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
CI/CD Pipeline Integration:
# GitHub Actions example
- name: Validate SQL Files
run: |
claude "Use sql-style-guide and starburst-validator agents to validate all SQL files. Exit with error code if critical issues found."
Troubleshooting Claude Code SQL Agents
Even with well-designed agents, you’ll encounter challenges. Here are the most common issues and how to resolve them.
Common Claude Code Installation Problems
VSCode Extension Not Loading:
- Ensure the
code
command is in your PATH - Run
claude
from VSCode’s integrated terminal - On Windows, check for conflicts between WSL and Windows Node.js installations
Permission Errors:
- Configure auto-approval in
.claude/settings.json
for trusted operations - Use
--dangerously-skip-permissions
flag for automated scripts (with caution) - In production, always use manual approval for security
Context Window Overflow:
- Use
/clear
command frequently to reset context - Process large projects in batches
- Monitor the context indicator in Claude Code’s status bar
SQL Agent Development Challenges
Agents Not Being Invoked:
- Make agent descriptions action-oriented and specific
- Use clear, unambiguous language in the description field
- Test agent invocation with simple commands first
Tool Access Issues:
- Explicitly configure tool permissions in agent frontmatter
- Verify the agent has access to required tools (Read, Write, Bash)
- Check
.claude/settings.json
for permission conflicts
Inconsistent Results:
- Provide detailed, specific instructions in agent definitions
- Use structured output formats to ensure consistency
- Include examples of good and bad patterns in agent instructions
SQL Validation Performance Optimization
Slow Validation:
- Break large queries into smaller validation chunks
- Use lightweight validation for real-time feedback
- Schedule comprehensive validation for batch processing
Memory Usage:
- Clear context regularly during large validation runs
- Process files in smaller batches
- Use the Task tool for clean delegation between agents
Team Collaboration with SQL Agents
Maintaining Consistency:
- Version control all agent definitions and settings
- Document agent capabilities and limitations
- Establish clear guidelines for when to use each agent
Training and Adoption:
- Create simple examples and tutorials for team members
- Start with basic validation and gradually add complexity
- Collect feedback and iterate based on real usage patterns
Conclusion: Mastering SQL Validation with Claude Code
You’ve now built a comprehensive SQL validation system using Claude Code that can transform how your team approaches code quality. Your custom agents provide automated style checking, syntax validation, and performance optimization suggestions that would typically require hours of manual review.
The key benefits you’ve implemented include:
- Consistent SQL Code Quality: Automated enforcement of dbt Labs standards and SQL best practices
- Platform-Specific Validation: Specialized checking for Starburst/Trino and Delta Lake patterns
- Automated Workflows: Real-time validation that integrates seamlessly with your development process
- Scalable Review Process: Batch validation capabilities for large codebases
Remember that building effective SQL agents is an iterative process. Start with the basic implementations provided in this tutorial, then refine them based on your team’s specific needs and the patterns you encounter in your SQL code.
Your next steps should be:
- Deploy these agents in a development environment and test them with your actual SQL files
- Gather feedback from your team and refine the validation criteria
- Integrate the agents into your CI/CD pipeline for automated code review
- Explore additional agents for other aspects of your data engineering workflow
The foundation you’ve built here can be extended to enforce enterprise patterns, or processes - so you might have DBT specific macros that are used for transformation standards, or specific metadata columns that you use on all tables. You might find that there are common performance issues with code, so showing techniques to break automatically make it more performant. Enforce Partition usage in code. Automate or suggest documentation for all the models you create. The possibilities are endless when you have AI-powered agents working alongside your development process, but as always use with caution. Tools like this are good to save time, but still need validation, review of what it does. Don’t blindly use what comes out.