Ghost in the data
  • Home
  • About
  • Posts
  • Topics
  • Resources
  • Tags
  • 2026 Trends
  • AI
  • AI Agents
  • AI Bubble
  • AI Business Applications
  • AI Communication
  • AI Concepts
  • AI Ethics
  • AI Productivity
  • AI Prompting
  • AI Tools
  • AI Workflows
  • Airflow
  • Analytics
  • AnalyticsEngineering
  • Anonymization
  • Apache Airflow
  • Apache Iceberg
  • Athena
  • Automation
  • AVRO
  • AWS
  • AWS Glue
  • BankingData
  • Bedrock Edition
  • BigData
  • Blue-Green Deployment
  • Budgeting
  • Business Case
  • Business Value
  • Business-Communication
  • Career Advice
  • Career Development
  • Career Growth
  • Career Planning
  • Career Strategy
  • Change Management
  • Chapter Lead
  • ChatGPT
  • CI/CD
  • Claude
  • Claude-Code
  • Cloud Computing
  • Cloud Gaming
  • Code Review
  • Collaboration
  • Communication
  • ConceptualDataModeling
  • Continuous Learning
  • ContinuousIntegration
  • Cost Optimization
  • CSV
  • Culture
  • Data Architecture
  • Data Culture
  • Data Engineering
  • Data Ethics
  • Data Governance
  • Data Impact
  • Data Ingestion
  • Data Leadership
  • Data Modeling
  • Data Modelling
  • Data Pipeline
  • Data Pipelines
  • Data Quality
  • Data Reliability
  • Data Solutions
  • Data System Resilience
  • Data Teams
  • Data Testing
  • Data Transformation
  • Data Validation
  • Data Vault
  • Data Warehouse
  • Data Warehouse Architecture
  • Database Design
  • DataDemocratization
  • DataEngineering
  • Datafold
  • DataGovernance
  • DataMinimization
  • DataModeling
  • DataPipelines
  • DataPrivacy
  • DataQuality
  • DataTools
  • DataValidation
  • DataWarehouse
  • Dbt
  • Decision Making
  • Delta Lake
  • Development
  • Development Tools
  • DevOps
  • Dimensional Modeling
  • DimensionalModeling
  • DuckDB
  • Emergency Fund
  • Emotional Intelligence
  • EmpatheticDesign
  • Employee Engagement
  • Employee Productivity
  • Engineering Career
  • ETL
  • ETL Pipeline
  • Family Gaming
  • Feedback
  • File Formats
  • Financial Crisis
  • Financial Independence
  • Frameworks
  • Future of Work
  • GCP
  • GDPR
  • Git
  • GitBash
  • GitHub
  • GitHub Actions
  • Hiring Strategies
  • Incident Response
  • Industry Trends
  • Innovation
  • Inspirational Quote
  • Intergroup Conflict
  • Interviews
  • Job Security
  • Journal
  • Journaling Techniques
  • JSON
  • Kimball
  • Kimball Methodology
  • Lambda
  • Language Models
  • Leadership
  • LLM
  • LLM Interaction
  • MacOS
  • Management
  • Mental Health
  • Mentorship
  • Mindfulness Practices
  • Minecraft
  • Moral Development
  • Onboarding
  • One-on-One Meetings
  • OpenSource
  • ORC
  • Organizational Culture
  • Parquet
  • Performance Optimization
  • Personal Growth
  • Pipeline
  • PostegreSQL
  • Presentation-Skills
  • Problem Solving
  • Production Issues
  • Professional Development
  • Professional Growth
  • Professional Relationships
  • Professional-Skills
  • Promotion
  • Psychological Safety
  • Public-Speaking
  • Python
  • RAG
  • Recruitment
  • Redundancy
  • Remote Work
  • Reputation
  • RequirementGathering
  • RetentionPolicies
  • Risk Management
  • Robbers Cave Experiment
  • ROI
  • Roleplaying
  • S3
  • SCD Type 2
  • Schema Evolution
  • Self-Awareness
  • Self-Reflection
  • Server Setup
  • ServiceDesign
  • ShadowIT
  • Soft Skills
  • SQL
  • SQL Standards
  • Sql-Agents
  • Sql-Validation
  • SSH
  • SSH Keys
  • Staff Engineer
  • Stakeholder Engagement
  • Stakeholder Management
  • StakeholderManagement
  • Star Schema
  • Starburst
  • Step Functions
  • Strategy
  • Strengths
  • Success Habits
  • Talent Acquisition
  • Team Building
  • Team Collaboration
  • Team Culture
  • Team Enablement
  • Team-Management
  • Technical Assessment
  • Technical Leadership
  • Testing
  • Tools and Access
  • Trino
  • Trust
  • Trust Building
  • Trust Crisis
  • UserExperience
  • UV
  • UV Package Manager
  • Value Creation
  • Vector Databases
  • Virtual Environments
  • Visualization
  • Vocal-Techniques
  • Vscode
  • WAP Pattern
  • Windows
  • Workplace Communication
  • Workplace Relationships
  • Workplace Stress
  • Write-Audit-Publish
  • Zsh
Hero Image
Leveraging LLMs for Business Impact: Part 2 - Building an AI Data Engineer Agent

Introduction In Part 1 of this series, we explored the theoretical foundations of Large Language Models (LLMs), Retrieval Augmented Generation (RAG), and vector databases. Now, it’s time to put theory into practice. This is going to be a long read, so grab some coffee, and one (couple) of your favorite biscuits. One use case for leveraging LLM’s, is creating of a Agent - a Senior Data Engineer AI that automatically reviews Pull Requests in your data engineering projects. This agent will be that nit picky Data Engineer that enforces SQL formatting standards, ensure naming and data type consistency, validate data quality checks, and suggest improvements based on best practices. By integrating this into your GitHub workflow, you can maintain higher code quality, accelerate onboarding for new team members, and reduce the burden of manual code reviews.

  • GitHub Actions
  • CI/CD
  • AI Agents
  • Code Review
  • Data Quality
  • DBT
  • SQL Standards
Saturday, March 8, 2025 Read