Building AI Agents with Claude Code

Introduction

Imagine you’re reviewing a pull request with dozens of SQL files, each containing complex queries for your data pipeline. You spot inconsistent formatting, or syntax which doesn’t work with your infrastructure. Sound familiar?

It’s common for data professionals to struggle with maintaining consistent SQL standards across their projects, especially when working with specialized platforms and it can be time consuming to review these elements within a peer review. It would be better use of time to focus on the hard thinking elements, like logic etc. However these small syntax or style issues, can be distracting. Well at least they are for me.

That’s where Claude Code comes in. Claude code enables custom agents - we can then create agents that automatically validate your SQL code, enforce style guidelines, and catch platform-specific issues before they reach production. There are tools out there like SQLRuff and others that will likely do a better job that a ai agent. However the purpose of this is to have a basic understanding, but these agents could be applied to “Testing Agent” or “Data Modelling Agent” and far more.

Here’s our objectives:

Install and configure Claude Code for your development environment
Create a custom SQL style guide agent that enforces dbt Labs standards
Build a Starburst/Delta Lake syntax validator
Set up automated validation workflows in VSCode
Test your agents with real-world SQL examples

Prerequisites

Before starting this tutorial, ensure you have the following tools installed and configured:

Required Tools

Claude Code: Latest version (1.0.112+)

This tutorial requires Claude Code with the new hooks format support
Verify installation: claude --version

SQLRuff: For SQL linting and formatting

# Install SQLRuff
pip install sqruff

# Verify installation
sqruff --version

Git: For version control examples

Most systems have this pre-installed
Verify: git --version

Optional Tools

Node.js: Only needed if using npm installation method

# Verify if installed
node --version
npm --version

VSCode: Recommended editor for best integration

Download from: https://code.visualstudio.com/
Ensure code command is available in PATH

System Requirements

Operating System: Windows 10+, macOS 10.15+, or Linux
Memory: 4GB RAM minimum (8GB recommended)
Storage: 500MB free space for tools and project files
Network: Internet connection for initial setup and API calls

Installation Verification

Run these commands to verify your setup:

# Verify all required tools
claude --version        # Should show 1.0.112 or higher
sqruff --version       # Should show SQLRuff version
git --version          # Should show Git version

# Optional verifications
node --version         # If using npm method
code --version         # If using VSCode

If any commands fail, install the missing tools before proceeding with the tutorial.

How to Install Claude Code: Setup Guide for Data Engineers

Before we dive into building SQL agents, let’s get Claude Code properly installed and configured. The setup process has evolved significantly in early 2025, with multiple installation options depending on your environment.

Claude Code Installation Methods

The recommended approach is using the native binary installation, which provides the most stable experience:

# macOS/Linux/WSL
curl -fsSL https://claude.ai/install.sh | bash

# Windows PowerShell
irm https://claude.ai/install.ps1 | iex

# Verify installation
claude --version
claude doctor

If you prefer npm-based installation or need it for CI/CD environments:

# Global installation
npm install -g @anthropic-ai/claude-code

# Navigate to your project
cd your-sql-project
claude

Claude Code Authentication Setup

You have two main options for authentication. For production environments, I recommend using the Anthropic API key approach:

# Set environment variable
export ANTHROPIC_API_KEY="your-api-key-here"

# Add to shell profile for persistence
echo 'export ANTHROPIC_API_KEY="your-api-key-here"' >> ~/.bashrc
source ~/.bashrc

# Test connection
claude -p "Test API connection"

Alternatively, if you have a Claude Pro ($20/month) or Max ($200/month) subscription, you can use OAuth authentication during the initial setup process.

SQL Agent Project Structure Setup

Once Claude Code is installed, you’ll want to organize your project with a structure that supports SQL validation. Here’s the recommended layout:

The CLAUDE.md file serves as your project’s main configuration. Here’s a template specifically designed for SQL validation projects:

# SQL Validation Agent Project

## Project Overview
This project uses Claude Code to validate SQL code according to:
- dbt Labs SQL Style Guide
- Starburst/Trino syntax requirements  
- Delta Lake connector best practices

## Key Commands
- Validate SQL: `/sql-validate filename.sql`
- Style check: Use sql-style-guide agent
- Syntax check: Use starburst-validator agent

## Coding Standards
- Use SQLRuff for automated linting
- Follow dbt naming conventions
- 4-space indentation with trailing commas
- Lowercase for field names, keywords, and functions
- Lines should not exceed 80 characters

## Database Context
- Primary engine: Starburst/Trino
- Data lake: Delta Lake format
- Metastore: AWS Glue / Hive Metastore

Essential Claude Code Configuration

Create a .claude/settings.json file to configure permissions and automation hooks:

{
  "permissions": {
    "allow": ["Read", "Edit", "Bash"],
    "deny": ["Read(./.env)", "Read(./.env.*)", "Read(./secrets/**)"]
  },
  "hooks": {
    "PostToolUse": [{
      "matcher": {
        "tools": ["EditTool"]
      },
      "hooks": [{
        "type": "command",
        "command": "sqruff lint \"$CLAUDE_FILE_PATHS\""
      }]
    }]
  }
}

This configuration automatically runs SQLRuff linting whenever you edit SQL files, providing immediate feedback on style and syntax issues.

Creating Your First SQL Agent: Style Guide Validator

Now comes the exciting part - creating your first custom SQL agent. We’ll start with a style guide validator that enforces dbt Labs standards and general SQL best practices.

SQL Agent Structure and Configuration

Create a new file at .claude/agents/sql-style-guide.md with the following content:

---
name: sql-style-guide
description: Use this agent for SQL code style validation, formatting checks, and dbt best practice enforcement
tools: Read, Write, Bash
color: blue
---

You are a SQL style guide specialist focusing on dbt Labs standards and general SQL best practices. When analyzing SQL code, you:

## Core Responsibilities
- Validate dbt naming conventions (staging: `stg_[source]__[entity]s.sql`)
- Enforce 4-space indentation with trailing commas
- Check lowercase usage for field names, keywords, and functions
- Validate 80-character line limits
- Ensure proper CTE organization and naming
- Verify explicit JOIN types and column aliasing

## Validation Checklist
- [ ] Field names use snake_case
- [ ] Reserved keywords in UPPERCASE
- [ ] Trailing commas in SELECT statements
- [ ] Explicit `AS` keywords for aliases
- [ ] Proper CTE ordering (imports first, then transformations)
- [ ] No SELECT * usage (except in final CTE)
- [ ] GROUP BY uses numbers, not column names
- [ ] Prefer UNION ALL over UNION

## Output Format
Provide findings as:
1. **Style Issues Found**: Bulleted list with line numbers
2. **Suggested Fixes**: Code snippets showing corrections
3. **dbt Best Practices**: Additional recommendations
4. **SQLRuff Command**: Automated fix command if applicable

## Tool Usage
- Use Read to examine SQL files
- Use Bash to run `sqruff lint filename.sql`
- Use Write to create corrected versions when requested

Understanding Claude Code Agent Components

Let’s break down what makes this SQL agent effective:

Frontmatter Configuration: The YAML frontmatter defines the agent’s identity and capabilities. The tools field specifies which Claude Code tools the agent can access - in this case, Read (for examining files), Write (for creating corrections), and Bash (for running SQLRuff).

Role Definition: The agent has a clear, specific role as a “SQL style guide specialist.” This focused responsibility ensures it doesn’t try to handle tasks outside its expertise.

Validation Checklist: The structured checklist provides consistent, repeatable validation criteria. This ensures every SQL file gets the same thorough review.

Output Format: By defining a specific output structure, we ensure consistent, actionable feedback that developers can easily follow.

Testing Your SQL Style Guide Agent

Let’s create some test files to validate our agent works correctly. First, create a test file with intentional style issues:

-- tests/bad_style.sql
select u.id,u.name,p.company from users u join profiles p on u.id=p.user_id where u.active=true

Now test your agent:

claude "Use the sql-style-guide agent to analyze tests/bad_style.sql"

The agent should identify multiple issues:

Missing spaces around operators
Implicit JOIN instead of explicit INNER JOIN
No trailing commas
Line length exceeding 80 characters
Missing proper indentation

Creating Good SQL Examples

It’s equally important to test with properly formatted SQL:

-- tests/good_style.sql
WITH users_with_profiles AS (
    SELECT
        users.id,
        users.name,
        users.email,
        profiles.company,
        profiles.created_at
    FROM {{ ref('users') }}
    INNER JOIN {{ ref('profiles') }} 
        ON users.id = profiles.user_id
    WHERE users.active = TRUE
        AND profiles.verified = TRUE
),

final AS (
    SELECT * FROM users_with_profiles
)

SELECT * FROM final

When you run the agent on this file, it should pass validation with minimal or no issues, confirming your agent correctly identifies good practices.

Advanced SQL Validation: Starburst and Delta Lake Agent

While style validation is important, syntax and platform-specific optimization are equally critical. Let’s build a specialized agent that understands Starburst/Trino syntax and Delta Lake connector patterns.

The Starburst Delta Lake Validator Agent

Create .claude/agents/starburst-validator.md:

---
name: starburst-validator  
description: Validate Starburst/Trino syntax and Delta Lake connector usage patterns
tools: Read, Write, Bash
color: green
---

You are a Starburst/Trino syntax specialist with expertise in Delta Lake connector usage. Focus on:

## Core Validation Areas
- Trino-specific SQL syntax and functions
- Delta Lake time travel queries syntax
- Proper catalog/schema/table references
- Starburst Warp Speed indexing opportunities
- Memory usage optimization patterns
- Cross-connector query optimization

## Delta Lake Specific Checks
- Time travel syntax: `FOR VERSION AS OF` vs `FOR TIMESTAMP AS OF`
- System table usage: `"table$history"`, `"table$partitions"`
- OPTIMIZE and VACUUM operations
- Change data feed (CDF) syntax when applicable
- Column mapping and deletion vectors compatibility

## Starburst Features Validation
- Connector-specific syntax (s3, glue, delta-lake)
- Authentication method compatibility
- Query federation patterns across data sources
- Performance optimization suggestions (pushdown predicates)

## Common Anti-patterns to Flag
- SELECT * on large Delta tables
- Missing partition filters on time-partitioned data
- Inappropriate DISTINCT usage
- Cross-connector joins without proper optimization
- Missing time travel version management

## Output Format
Return structured analysis:
1. **Syntax Validation**: Trino compatibility check
2. **Delta Lake Usage**: Connector-specific recommendations  
3. **Performance Optimization**: Query improvement suggestions
4. **Security Review**: Access pattern analysis
5. **Next Steps**: Actionable improvements with examples

## Tool Commands
- Use Read for SQL file analysis
- Use Bash for `trino --execute "EXPLAIN (TYPE VALIDATE) query"` when available
- Check system tables with queries like `SELECT * FROM "table$properties"`

Delta Lake Time Travel Query Patterns

This agent specializes in recognizing and validating Delta Lake-specific syntax. Here are some examples it should handle:

-- Good: Proper time travel syntax
SELECT * FROM example.testdb.customer_orders 
FOR VERSION AS OF 3;

-- Good: Timestamp-based time travel
SELECT * FROM example.testdb.customer_orders 
FOR TIMESTAMP AS OF TIMESTAMP '2022-03-23 09:59:29.803 America/Los_Angeles';

-- Good: System table monitoring
SELECT * FROM "customer_orders$history" ORDER BY version DESC;
SELECT * FROM "customer_orders$partitions";

The agent should flag problematic patterns like:

-- Bad: No partition filtering on large table
SELECT customer_id, order_total
FROM customer_orders 
WHERE order_date = '2023-01-01';

-- Better: Partition filtering
SELECT customer_id, order_total
FROM customer_orders 
WHERE date_partition = '2023-01-01'
    AND customer_id IN (1, 2, 3);

Starburst Performance Optimization Detection

One of the most valuable features of this agent is its ability to suggest performance improvements:

-- The agent should suggest predicate pushdown optimization
SELECT c.name, o.total
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE c.region = 'US'  -- Filter pushed down
    AND o.status = 'completed';

Building Automated SQL Validation Workflows

Individual agents are powerful, but the real magic happens when you orchestrate them into automated workflows. Let’s build a comprehensive validation system that runs automatically as you work.

VSCode Integration for SQL Validation

First, let’s set up VSCode tasks that make it easy to run validation with keyboard shortcuts. Create or update .vscode/tasks.json:

{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Validate All SQL Files",
      "type": "shell", 
      "command": "claude",
      "args": ["-p", "Use sql-style-guide and starburst-validator agents to validate all .sql files in workspace. Generate summary report."],
      "group": {
        "kind": "build",
        "isDefault": true
      },
      "presentation": {
        "echo": true,
        "reveal": "always",
        "panel": "new"
      }
    },
    {
      "label": "SQL Style Check",
      "type": "shell",
      "command": "claude", 
      "args": ["-p", "Use sql-style-guide agent to check style of ${file}"],
      "group": "test"
    }
  ]
}

Now you can press Ctrl+Shift+P (or Cmd+Shift+P on Mac), type “Tasks: Run Task”, and select “Validate All SQL Files” to run comprehensive validation across your entire project.

Automated File Watching with Claude Code

For real-time validation, create .claude/hooks.mjs to automatically check files as you save them:

export async function postEdit({ filePath, success }) {
  if (!success) return;
  
  // Auto-validate SQL files on save
  if (filePath.match(/\.sql$/)) {
    console.log(`[Auto-validation] Checking ${filePath}`);
    
    // Run style validation
    execSync(`claude -p "Use sql-style-guide agent to quickly validate ${filePath}"`);
    
    // Run syntax validation for Starburst files
    if (await fileContains(filePath, /starburst|trino|delta/i)) {
      execSync(`claude -p "Use starburst-validator agent to check ${filePath}"`);
    }
  }
  
  // Auto-format with SQLRuff
  if (filePath.match(/\.sql$/)) {
    try {
      execSync(`sqruff fix "${filePath}"`);
    } catch (error) {
      console.log(`[Warning] SQLRuff formatting failed: ${error.message}`);
    }
  }
}

async function fileContains(filePath, pattern) {
  const content = await fs.readFile(filePath, 'utf8');
  return pattern.test(content);
}

This hook automatically runs validation whenever you save a SQL file, providing immediate feedback without interrupting your workflow.

Batch SQL Processing Command

For comprehensive project-wide validation, create .claude/commands/scan-sql-files.md:

Scan and validate all SQL files in the project:

1. **Discovery Phase**:
   - Find all .sql files in project directory
   - Categorize by type (queries, migrations, models, tests)
   - Identify potential Starburst/Delta Lake usage patterns

2. **Validation Phase**:
   - Run sql-style-guide agent on each file
   - Run starburst-validator agent on files with Trino syntax
   - Collect all findings into structured report

3. **Reporting Phase**:
   - Generate summary dashboard showing:
     - Total files processed
     - Style issues found and severity
     - Syntax errors and warnings  
     - Performance optimization opportunities
     - Files requiring immediate attention

4. **Action Items**:
   - Prioritized list of fixes needed
   - Automated fix commands where possible
   - Manual review recommendations

Confirm before starting each phase. Process files in batches of 10 to avoid context overflow.

Use this command with: claude "/scan-sql-files"

Testing SQL Agents: Production-Ready Validation

Building agents is just the beginning. To make them truly valuable for your team, you need a systematic approach to testing and refinement.

Creating a Comprehensive SQL Test Suite

Start by building a comprehensive test suite that covers various scenarios your agents will encounter:

Unix/Linux/macOS:

# Create test directory structure
mkdir -p tests/{good,bad,edge-cases}

# Good examples (should pass validation)
echo "-- Well-formatted dbt model
WITH users_clean AS (
    SELECT
        id,
        LOWER(email) AS email,
        created_at
    FROM {{ ref('raw_users') }}
    WHERE email IS NOT NULL
)

SELECT * FROM users_clean" > tests/good/dbt_model.sql

# Bad examples (should fail validation)
echo "select * from users where email like '%@gmail.com'" > tests/bad/style_issues.sql

Windows PowerShell:

# Create test directory structure
mkdir tests; mkdir tests\good; mkdir tests\bad; mkdir tests\edge-cases

# Good examples (should pass validation)
@"
-- Well-formatted dbt model
WITH users_clean AS (
    SELECT
        id,
        LOWER(email) AS email,
        created_at
    FROM {{ ref('raw_users') }}
    WHERE email IS NOT NULL
)

SELECT * FROM users_clean
"@ | Out-File -FilePath "tests\good\dbt_model.sql" -Encoding UTF8

# Bad examples (should fail validation)
"select * from users where email like '%@gmail.com'" | Out-File -FilePath "tests\bad\style_issues.sql" -Encoding UTF8

Windows Command Prompt:

# Create test directory structure
mkdir tests && mkdir tests\good && mkdir tests\bad && mkdir tests\edge-cases

# Create files manually or use a text editor for complex SQL examples

Edge Case Example (All Platforms): Create tests/edge-cases/complex_time_travel.sql with this content:

-- Complex Delta Lake time travel query
SELECT 
    customer_id,
    order_total,
    order_date
FROM delta_lake.sales.orders 
FOR VERSION AS OF 5
WHERE date_partition >= '2023-01-01'
    AND customer_region = 'US'
UNION ALL
SELECT 
    customer_id,
    order_total,
    order_date  
FROM delta_lake.sales.orders
FOR TIMESTAMP AS OF TIMESTAMP '2023-06-01 00:00:00'
WHERE date_partition >= '2023-06-01'
    AND customer_region = 'EU'

Automated SQL Agent Testing Workflow

Create a testing command that systematically validates your agents:

# Test style guide agent
claude "Use the sql-style-guide agent to analyze all files in tests/good/ - these should pass with minimal issues"

claude "Use the sql-style-guide agent to analyze all files in tests/bad/ - these should identify specific problems"

# Test syntax validator
claude "Use the starburst-validator agent to check tests/edge-cases/complex_time_travel.sql for Delta Lake best practices"

Measuring SQL Agent Effectiveness

Track key metrics to understand how well your agents are performing:

Accuracy: How often do agents correctly identify real issues?
Coverage: What percentage of actual problems do they catch?
False Positives: How often do they flag correct code as problematic?
Actionability: How clear and helpful are their recommendations?

Keep a log of agent performance:

# Agent Performance Log

## sql-style-guide Agent
- **Date**: 2025-01-15
- **Files Tested**: 25
- **Issues Found**: 47
- **False Positives**: 3
- **Accuracy**: 94%
- **Notes**: Occasionally flags dbt macros as style violations

## starburst-validator Agent  
- **Date**: 2025-01-15
- **Files Tested**: 15
- **Performance Suggestions**: 12
- **Syntax Errors Caught**: 3
- **Notes**: Excellent at catching missing partition filters

Iterative SQL Agent Improvement Process

Based on your testing results, continuously refine your agents:

Analyze Feedback: Review false positives and missed issues
Update Agent Instructions: Refine the validation criteria
Add New Test Cases: Include edge cases you discover
Validate Changes: Test updated agents against your full test suite
Document Learnings: Keep track of what works and what doesn’t

For example, if your style guide agent keeps flagging legitimate dbt macros, you might update its instructions:

## Additional Considerations
- Recognize dbt macros ({{ macro_name() }}) as valid syntax
- Allow longer lines for complex macro calls
- Understand dbt ref() and source() functions

Real-World SQL Agent Implementation

Let’s walk through a complete example of how these agents work together in a real data engineering project.

Scenario: Data Pipeline Code Review with Claude Code

Imagine you’re working on a data pipeline that processes customer orders using Starburst and Delta Lake. You’ve just received a pull request with several SQL files that need review.

Step 1: Initial SQL Validation

# Run comprehensive validation
claude "Use both sql-style-guide and starburst-validator agents to review all .sql files in the current directory. Provide a summary of issues found and prioritize them by severity."

Step 2: Multi-Agent Coordination

The agents work together to provide comprehensive feedback:

Style Guide Agent identifies formatting inconsistencies, naming convention violations, and dbt best practice issues
Starburst Validator catches performance problems, syntax errors, and Delta Lake optimization opportunities

Step 3: Automated SQL Fixes

For issues that can be automatically resolved:

# Fix style issues with SQLRuff
sqruff fix *.sql

# Apply agent suggestions
claude "Use the sql-style-guide agent to create corrected versions of files with style issues"

Step 4: Manual SQL Review

For complex issues requiring human judgment:

# Get detailed analysis
claude "Use the starburst-validator agent to provide detailed performance optimization recommendations for customer_orders_analysis.sql, including specific code examples"

Integration with Existing Data Engineering Tools

Your Claude Code agents work seamlessly with your existing data engineering stack:

dbt Integration:

# Configure SQLRuff for dbt + Trino
# SQLRuff is specifically designed for SQL linting
echo "[tool.sqruff]
dialect = \"trino\"
templater = \"dbt\"" > pyproject.toml

# Validate dbt models
claude "Use sql-style-guide agent to validate all models in models/ directory for dbt best practices"

Git Hooks for SQL Validation:

# Pre-commit validation
echo "#!/bin/bash
files=\$(git diff --cached --name-only --diff-filter=ACM | grep '\\.sql\$')
if [ -n \"\$files\" ]; then
    claude \"Use sql-style-guide and starburst-validator agents to validate staged SQL files: \$files\"
fi" > .git/hooks/pre-commit

chmod +x .git/hooks/pre-commit

CI/CD Pipeline Integration:

# GitHub Actions example
- name: Validate SQL Files
  run: |
    claude "Use sql-style-guide and starburst-validator agents to validate all SQL files. Exit with error code if critical issues found."

Troubleshooting Claude Code SQL Agents

Even with well-designed agents, you’ll encounter challenges. Here are the most common issues and how to resolve them.

Common Claude Code Installation Problems

VSCode Extension Not Loading:

Ensure the code command is in your PATH
Run claude from VSCode’s integrated terminal
On Windows, check for conflicts between WSL and Windows Node.js installations

Permission Errors:

Configure auto-approval in .claude/settings.json for trusted operations
Use --dangerously-skip-permissions flag for automated scripts (with caution)
In production, always use manual approval for security

Context Window Overflow:

Use /clear command frequently to reset context
Process large projects in batches
Monitor the context indicator in Claude Code’s status bar

SQL Agent Development Challenges

Agents Not Being Invoked:

Make agent descriptions action-oriented and specific
Use clear, unambiguous language in the description field
Test agent invocation with simple commands first

Tool Access Issues:

Explicitly configure tool permissions in agent frontmatter
Verify the agent has access to required tools (Read, Write, Bash)
Check .claude/settings.json for permission conflicts

Inconsistent Results:

Provide detailed, specific instructions in agent definitions
Use structured output formats to ensure consistency
Include examples of good and bad patterns in agent instructions

SQL Validation Performance Optimization

Slow Validation:

Break large queries into smaller validation chunks
Use lightweight validation for real-time feedback
Schedule comprehensive validation for batch processing

Memory Usage:

Clear context regularly during large validation runs
Process files in smaller batches
Use the Task tool for clean delegation between agents

Team Collaboration with SQL Agents

Maintaining Consistency:

Version control all agent definitions and settings
Document agent capabilities and limitations
Establish clear guidelines for when to use each agent

Training and Adoption:

Create simple examples and tutorials for team members
Start with basic validation and gradually add complexity
Collect feedback and iterate based on real usage patterns

Conclusion: Mastering SQL Validation with Claude Code

You’ve now built a comprehensive SQL validation system using Claude Code that can transform how your team approaches code quality. Your custom agents provide automated style checking, syntax validation, and performance optimization suggestions that would typically require hours of manual review.

The key benefits you’ve implemented include:

Consistent SQL Code Quality: Automated enforcement of dbt Labs standards and SQL best practices
Platform-Specific Validation: Specialized checking for Starburst/Trino and Delta Lake patterns
Automated Workflows: Real-time validation that integrates seamlessly with your development process
Scalable Review Process: Batch validation capabilities for large codebases

Remember that building effective SQL agents is an iterative process. Start with the basic implementations provided in this tutorial, then refine them based on your team’s specific needs and the patterns you encounter in your SQL code.

Your next steps should be:

Deploy these agents in a development environment and test them with your actual SQL files
Gather feedback from your team and refine the validation criteria
Integrate the agents into your CI/CD pipeline for automated code review
Explore additional agents for other aspects of your data engineering workflow

The foundation you’ve built here can be extended to enforce enterprise patterns, or processes - so you might have DBT specific macros that are used for transformation standards, or specific metadata columns that you use on all tables. You might find that there are common performance issues with code, so showing techniques to break automatically make it more performant. Enforce Partition usage in code. Automate or suggest documentation for all the models you create. The possibilities are endless when you have AI-powered agents working alongside your development process, but as always use with caution. Tools like this are good to save time, but still need validation, review of what it does. Don’t blindly use what comes out.

Chris Hillman