Setting Up Your Data Engineering Environment on macOS
Introduction
Setting up a development environment for data engineering on macOS requires deliberate choices about package management, Python version control, and tool configuration. This guide walks you through the process, explaining not just how to set up each tool, but why it matters.
Clean Slate: Removing Existing Python Installations
Before we begin, make sure we’re starting from a clean slate. Multiple Python installations can cause confusion and conflicts. The commands below remove versions installed with the python.org installer. They delete files permanently, so double-check each path before running them:
# Remove Python framework
sudo rm -rf /Library/Frameworks/Python.framework/Versions/3.*
# Remove Python applications
sudo rm -rf /Applications/Python\ 3.*
# Check current Python location
which python
which python3
# These should now point to system Python or return "not found"
This cleanup helps prevent conflicts between different Python installations and ensures our new setup will work as expected.
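Before deleting anything, it can help to see which interpreters the shell can actually reach. The following is a minimal sketch rather than part of the original steps: `list_pythons` is a hypothetical helper name, and it only looks for executables named `python` or `python3` in each PATH directory.

```shell
# Hypothetical helper: print every python/python3 executable found in a
# colon-separated list of directories (defaults to the current PATH),
# in the same order the shell would search them.
list_pythons() {
  printf '%s\n' "${1:-$PATH}" | tr ':' '\n' | while IFS= read -r dir; do
    for name in python python3; do
      if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
        printf '%s\n' "$dir/$name"
      fi
    done
  done
}

# Show what is reachable right now
list_pythons
```

The first line printed is the interpreter that a bare `python` or `python3` command would run.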
Installing Homebrew: The Foundation
Homebrew is a package manager for macOS that simplifies the installation and management of software packages. Think of it as the App Store for command-line tools and development software. It’s essential because it:
- Manages software dependencies automatically
- Keeps track of installed packages and their versions
- Makes updating and removing packages straightforward
- Ensures consistent installation across different Macs
Install Homebrew with:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
After installation, Homebrew might suggest adding its path to your shell configuration. Follow any on-screen instructions to complete the setup.
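Those on-screen instructions usually amount to loading Homebrew’s environment at shell startup. As a rough sketch (the helper name `first_executable` is made up for illustration), the snippet below checks the two default install prefixes, /opt/homebrew on Apple Silicon and /usr/local on Intel, and runs `brew shellenv` for whichever one exists. If the installer prints a different line, prefer that one.

```shell
# Hypothetical helper: print the first argument that is an executable file.
first_executable() {
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Default Homebrew locations: Apple Silicon first, then Intel
brew_bin=$(first_executable /opt/homebrew/bin/brew /usr/local/bin/brew) || brew_bin=""

# Load Homebrew's PATH and related variables only if brew was found
if [ -n "$brew_bin" ]; then
  eval "$("$brew_bin" shellenv)"
fi
```

Placed in ~/.zprofile, this makes `brew` and the tools it installs available in every new terminal session.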
Setting Up ZSH
ZSH (Z Shell) is a powerful shell that offers features beyond those of the standard bash shell. macOS has used ZSH as its default shell since Catalina, but let’s ensure it’s properly configured:
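# Install ZSH if not present (or if you want a newer version than macOS ships)
brew install zsh
# Make ZSH your default shell if it isn't already
# Note: chsh only accepts shells listed in /etc/shells
chsh -s "$(which zsh)"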
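One common stumbling block is that chsh only accepts shells listed in /etc/shells, and a Homebrew-installed zsh is typically not listed there by default. The sketch below, built around a hypothetical helper named `shell_is_registered`, checks for that before you run chsh.

```shell
# Hypothetical helper: succeed if the shell path $1 appears as a whole line
# in the shells file $2 (defaults to /etc/shells).
shell_is_registered() {
  grep -qxF -- "$1" "${2:-/etc/shells}" 2>/dev/null
}

zsh_path=$(command -v zsh || true)
if [ -z "$zsh_path" ]; then
  echo "zsh not found on PATH"
elif shell_is_registered "$zsh_path"; then
  echo "$zsh_path is registered, so chsh should accept it"
else
  echo "$zsh_path is not in /etc/shells; add it there before running chsh"
fi
```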
Python Version Management with pyenv
pyenv is a powerful tool that allows you to:
- Install multiple Python versions side by side
- Switch between Python versions per project
- Set global and local Python versions
- Manage virtual environments (with the pyenv-virtualenv plugin)
Install pyenv using Homebrew:
brew install pyenv
Configuring pyenv
Add these lines to your ~/.zshrc file:
# Add pyenv to PATH and initialize
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init --path)"
eval "$(pyenv init -)"
These configurations are crucial because:
- PYENV_ROOT defines where pyenv will store all Python versions
- Adding pyenv to PATH ensures its commands are available
- pyenv init sets up pyenv’s shims directory, which allows it to intercept Python commands
- The initialization ensures pyenv can switch Python versions seamlessly
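To see what this initialization is meant to achieve, the check below tests whether pyenv’s shims directory is on PATH. It is a sketch that assumes the shims live in $PYENV_ROOT/shims (pyenv’s default layout); the function name `pyenv_shims_on_path` is illustrative.

```shell
# Hypothetical helper: succeed if the pyenv shims directory is one of the
# entries in PATH. Falls back to ~/.pyenv when PYENV_ROOT is unset.
pyenv_shims_on_path() {
  shims="${PYENV_ROOT:-$HOME/.pyenv}/shims"
  case ":$PATH:" in
    *":$shims:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if pyenv_shims_on_path; then
  echo "pyenv shims are on PATH"
else
  echo "pyenv shims are missing from PATH; check the pyenv lines in ~/.zshrc"
fi
```

When the shims directory comes before other Python locations in PATH, a plain `python` command reaches pyenv first, and pyenv forwards it to the selected version.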
After adding these lines:
# Reload your shell configuration
source ~/.zshrc
# Verify pyenv installation
pyenv --version
Installing and Managing Python Versions
Now let’s install Python and set it up:
# List available Python versions
pyenv install --list | grep " 3\."
# Install Python 3.10.0
pyenv install 3.10.0
# Install additional versions if needed
pyenv install 3.11.0
# Set global Python version
pyenv global 3.10.0
# Verify installation
python --version
which python # Should point to pyenv shims directory
# To use a different version in a specific directory:
pyenv local 3.11.0
# To switch back to global version:
pyenv local --unset
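pyenv local works by writing a .python-version file into the current directory, which pyenv then looks for in that directory and its parents. The sketch below imitates that lookup with a hypothetical helper, `find_python_version_file`; it illustrates the idea and is not pyenv’s actual implementation.

```shell
# Hypothetical helper: walk from a starting directory (default: current one)
# up to / and print the path of the first .python-version file found.
find_python_version_file() {
  dir=$(cd "${1:-.}" && pwd) || return 1
  while :; do
    if [ -f "$dir/.python-version" ]; then
      printf '%s\n' "$dir/.python-version"
      return 0
    fi
    if [ "$dir" = "/" ]; then
      return 1
    fi
    dir=$(dirname "$dir")
  done
}

if version_file=$(find_python_version_file); then
  echo "local version $(cat "$version_file") set by $version_file"
else
  echo "no .python-version found; pyenv would fall back to the global version"
fi
```

This is why a version set with pyenv local applies to every subdirectory of the project, and why pyenv local --unset simply removes the file.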
Setting Up UV Package Manager
UV is a modern, fast Python package manager written in Rust that significantly improves upon pip’s performance. Since we’re using Homebrew as our package manager, we can install UV through it:
brew install uv
This method integrates UV with Homebrew’s package management, making it easier to update and maintain alongside your other development tools.
UV offers two interfaces for installing packages:
- uv pip install: A pip-compatible interface for direct package installation
- uv add and uv sync: A project-oriented interface for managing dependencies declared in pyproject.toml
For our DBT setup, we’ll use the pip interface since we’re working with specific versions:
# Create a new project directory
mkdir dbt-project
cd dbt-project
# Create and activate virtual environment
uv venv
source .venv/bin/activate
# Install packages using UV's pip interface
uv pip install dbt-core==1.8.1
uv pip install dbt-trino==1.8.0
uv pip install pystarburst
# Verify installations
dbt --version
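If you would rather keep the pinned versions in one place, UV’s pip interface also accepts a requirements file. This is a minimal sketch that assumes the file name requirements.txt and the same package pins as above; the install step only runs when uv and a .venv directory are present.

```shell
# Record the pinned packages in a requirements file (file name is an assumption)
cat > requirements.txt <<'EOF'
dbt-core==1.8.1
dbt-trino==1.8.0
pystarburst
EOF

# Install everything in one step, but only if uv and a virtual environment exist
if command -v uv >/dev/null 2>&1 && [ -d .venv ]; then
  uv pip install -r requirements.txt
else
  echo "uv or .venv not found; skipping install"
fi
```

Checking the file into version control gives teammates the same versions with a single command.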
VSCode Configuration
VSCode is a powerful code editor that provides excellent support for Python development. Installing its “Shell Command” is particularly important, because it lets you launch the editor straight from the terminal.
Installing the ‘code’ Command
The ‘code’ command in PATH allows you to:
- Open files and directories in VSCode from the terminal
- Launch VSCode with specific files or projects quickly
- Integrate VSCode with other command-line tools
To install:
- Open VSCode
- Press Cmd+Shift+P to open the command palette
- Type “shell command”
- Select “Install ‘code’ command in PATH”
This is more efficient than using a basic text editor like nano because you get:
- Syntax highlighting
- Integrated terminal access
- Git integration
- Debugging capabilities
- Intelligent code completion
Test the installation:
# Should open VSCode
code .
# Open a specific file
code filename.py
Essential Extensions
Install these extensions for Python development:
- Python (Microsoft)
- Pylance
- DBT
- GitLens
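Once the code command is on PATH, extensions can also be installed from the terminal with code --install-extension. The sketch below uses the Marketplace identifiers for the Python, Pylance, and GitLens extensions; the dbt extension is left out because the list does not say which one is meant. The helper name `extension_ids` is illustrative.

```shell
# Hypothetical helper: print one Marketplace extension identifier per line.
extension_ids() {
  printf '%s\n' ms-python.python ms-python.vscode-pylance eamodio.gitlens
}

if command -v code >/dev/null 2>&1; then
  # Install each extension through the VSCode command-line interface
  extension_ids | while IFS= read -r ext; do
    code --install-extension "$ext"
  done
else
  echo "'code' command not found; install it from the command palette first"
fi
```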
Troubleshooting Common Issues
Here are solutions to common setup issues:
Python Path Issues
# Check Python location
which python
# Should show: ~/.pyenv/shims/python
# If not, verify pyenv initialization in ~/.zshrc
# and ensure you've reloaded your shell
Multiple Python Versions
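# List all Python versions on your system
ls /usr/local/bin/python*      # python.org installs, Homebrew on Intel Macs
ls /opt/homebrew/bin/python*   # Homebrew on Apple Silicon Macs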
ls ~/.pyenv/versions/
# Verify which Python is being used
python --version
pyenv which python
UV Installation Issues
# Check UV version
uv --version
# Reinstall UV if needed (it was installed with Homebrew above)
brew reinstall uv
VSCode Python Selection
- Open command palette (Cmd+Shift+P)
- Type “Python: Select Interpreter”
- Choose the pyenv-managed Python version
Conclusion
With this setup complete, you have a robust, version-controlled Python environment with modern package management through UV. The combination of pyenv for Python version management and UV for package management provides a flexible, maintainable development environment for data engineering work.
Remember to keep your tools updated and check for new versions of DBT and its dependencies periodically. Happy coding!