Ghost in the data
Leveraging LLMs for Business Impact: Part 2 - Building an AI Data Engineer Agent

Introduction

In Part 1 of this series, we explored the theoretical foundations of Large Language Models (LLMs), Retrieval Augmented Generation (RAG), and vector databases. Now it’s time to put theory into practice. This is going to be a long read, so grab a coffee and one (or a couple) of your favorite biscuits. One use case for LLMs is building an agent: a Senior Data Engineer AI that automatically reviews pull requests in your data engineering projects. This agent will be that nitpicky data engineer who enforces SQL formatting standards, ensures naming and data type consistency, validates data quality checks, and suggests improvements based on best practices. By integrating it into your GitHub workflow, you can maintain higher code quality, accelerate onboarding for new team members, and reduce the burden of manual code reviews.

  • GitHub Actions
  • CI/CD
  • AI Agents
  • Code Review
  • Data Quality
  • DBT
  • SQL Standards
Saturday, March 8, 2025