Setting Up Your Data Engineering Environment on Windows
Introduction
Setting up a development environment for data engineering on Windows differs from Unix-based systems in a few ways: PATH handling, PowerShell execution policies, and how Python itself is installed. This guide walks you through creating a robust Python development environment on Windows, explaining each component and why it matters.
Clean Slate: Removing Existing Python Installations
Before starting, it’s important to remove any existing Python installations to avoid conflicts:
- Open Windows Settings > Apps > Apps & Features
- Search for “Python”
- Uninstall any Python versions listed
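Then check these common install locations and delete any Python folders that remain:
# Using PowerShell to check common Python locations (no output means nothing was found)
Get-ChildItem "C:\Python*" -ErrorAction SilentlyContinue
Get-ChildItem "$env:LOCALAPPDATA\Programs\Python" -ErrorAction SilentlyContinue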
Installing Windows Terminal (Recommended)
Windows Terminal provides a modern, powerful command-line experience. To install it:
- Open Microsoft Store
- Search for “Windows Terminal”
- Click Install
This gives you:
- Multiple tabs and panes
- GPU-accelerated text rendering
- Customizable profiles
- Support for WSL (Windows Subsystem for Linux)
Python Installation
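On macOS, Python is usually installed through Homebrew. Recent Windows releases do ship the winget package manager, but this guide uses the official installer because it exposes the options listed below:
- Visit the Python Downloads page on python.org
- Download the Python 3.10.x Windows installer (64-bit); 3.10.11 is the last 3.10 release published with a Windows installer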
- Run the installer with these important options:
- ✅ Add Python to PATH
- ✅ Install for all users
- ✅ Create shortcuts for installed applications
- ✅ Add Python to environment variables
- ✅ Precompile standard library
Verify the installation in PowerShell:
# Check Python version
python --version
# Check Python location
Get-Command python | Select-Object Source
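The same check can be made from inside Python, which confirms that the interpreter you just installed is the one that actually runs. The following is a minimal sketch; the function name `describe_interpreter` is hypothetical and only the standard library is used.

```python
import platform
import sys


def describe_interpreter() -> dict:
    """Collect basic facts about the Python interpreter running this script."""
    return {
        "version": platform.python_version(),  # e.g. "3.10.11"
        "executable": sys.executable,          # full path to python.exe
        "bits": platform.architecture()[0],    # "64bit" for a 64-bit build
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Save it as a file of your choice (for example check_python.py) and run it with `python check_python.py`. If the reported executable is not the installation you expect, another Python is earlier on PATH.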
Managing Multiple Python Versions
For multiple Python versions on Windows:
- Download desired Python versions from python.org
- Install each version with unique directory names:
- C:\Python310
- C:\Python311
- etc.
Create a script to switch between versions:
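# save as switch-python.ps1
param(
    [Parameter(Mandatory)][string]$version   # e.g. 310 for C:\Python310
)
$pythonPath = "C:\Python$version"
if (-not (Test-Path "$pythonPath\python.exe")) { throw "No Python found at $pythonPath" }
# Current session: put the selected version first
$env:Path = "$pythonPath;$pythonPath\Scripts;" + $env:Path
# New sessions: rewrite only the user PATH, dropping other C:\PythonNNN entries
# (saving $env:Path here would copy the machine PATH into the user PATH on every run)
$userPath = [Environment]::GetEnvironmentVariable("Path", "User") -split ';' |
    Where-Object { $_ -and $_ -notmatch '^C:\\Python\d+(\\Scripts)?\\?$' }
[Environment]::SetEnvironmentVariable("Path", (@($pythonPath, "$pythonPath\Scripts") + $userPath) -join ';', "User")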
Usage:
# Administrator rights are not required: only the user-level PATH is changed
.\switch-python.ps1 310 # Switches to Python 3.10
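Switching works because Windows searches PATH from left to right, so the earliest directory containing python.exe wins. The sketch below lists the Python-related PATH entries in lookup order; the function name `python_dirs_on_path` and the sample PATH value are hypothetical and used only for illustration.

```python
def python_dirs_on_path(path_value: str, sep: str = ";") -> list:
    """Return PATH entries that mention Python, in lookup order.

    PATH is searched left to right, so among the returned entries the
    earliest one that contains python.exe is what a bare `python` runs.
    """
    entries = [entry.strip() for entry in path_value.split(sep)]
    return [entry for entry in entries if entry and "python" in entry.lower()]


if __name__ == "__main__":
    # Hypothetical PATH value for illustration only
    sample = r"C:\Python311;C:\Python311\Scripts;C:\Windows\system32;C:\Python310"
    for position, entry in enumerate(python_dirs_on_path(sample), start=1):
        print(position, entry)
```

To inspect your real PATH, pass `os.environ["PATH"]` (after `import os`) instead of the sample string.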
Setting Up UV Package Manager
UV is a modern, high-performance Python package manager. Install it using PowerShell:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Add UV to your system PATH if not automatically added:
- Open System Properties > Advanced > Environment Variables
- Edit User Variables > Path
- Add the UV installation path (typically %USERPROFILE%\.local\bin with current installers; older versions used %USERPROFILE%\.cargo\bin)
VSCode Setup
VSCode provides an excellent development environment for Python on Windows:
- Download VSCode from code.visualstudio.com
- During installation, ensure these options are checked:
- ✅ Add “Open with Code” action to Windows Explorer file context menu
- ✅ Add “Open with Code” action to Windows Explorer directory context menu
- ✅ Register Code as an editor for supported file types
- ✅ Add to PATH
Why These Options Matter
- “Open with Code” context menu options allow you to right-click files/folders to open them in VSCode
- PATH registration lets you use code commands from the terminal
- File type registration makes VSCode the default editor for development files
Essential Extensions
Install these extensions for Python development:
- Python (Microsoft)
- Pylance
- DBT
- GitLens
- PowerShell (for better PowerShell script editing)
Setting Up Your DBT Project
Now let’s set up a DBT development environment:
# Create project directory
New-Item -ItemType Directory -Path "dbt-project"
Set-Location dbt-project
# Create and activate virtual environment
uv venv
.\.venv\Scripts\Activate.ps1
# Install packages
uv pip install dbt-core==1.8.1
uv pip install dbt-trino==1.8.0
uv pip install pystarburst
# Verify installation
dbt --version
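To confirm that the pinned versions landed in the virtual environment, the installed package metadata can be read with the standard library. This is a minimal sketch: the function name `installed_versions` is hypothetical, and the package names are the ones from the install commands above.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    names = ["dbt-core", "dbt-trino", "pystarburst"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the virtual environment activated; otherwise it reports on whichever Python is first on PATH.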
DBT Profile Setup
Create your DBT profile:
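# Create the .dbt directory (-Force makes this a no-op if it already exists)
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.dbt"
# Create an empty profiles.yml (this fails rather than overwriting an existing file)
New-Item -ItemType File -Path "$env:USERPROFILE\.dbt\profiles.yml"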
Add your profile configuration:
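your_profile_name:
  target: dev
  outputs:
    dev:
      type: trino
      method: ldap  # username/password auth; confirm the method your cluster expects in the dbt-trino docs
      host: your_host
      port: 443
      user: your_username
      password: your_password
      database: your_database
      schema: your_schema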
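A common cause of profile errors is a key that is missing or nested at the wrong level. The sketch below checks a profile for the keys used in the example above. It is not part of dbt: `missing_profile_keys`, `REQUIRED_KEYS`, and `EXAMPLE_PROFILES` are hypothetical names, and treating those keys as required is an assumption of this example. The profile is written as a dict so the code runs without a YAML parser.

```python
# Keys taken from the example profile; treating them as required is an
# assumption of this sketch, not a rule enforced by dbt.
REQUIRED_KEYS = ("type", "host", "port", "user", "database", "schema")


def missing_profile_keys(profiles: dict, profile_name: str) -> list:
    """List REQUIRED_KEYS absent from the profile's default target output."""
    profile = profiles.get(profile_name) or {}
    target = profile.get("target")
    output = (profile.get("outputs") or {}).get(target) or {}
    return [key for key in REQUIRED_KEYS if key not in output]


# Same structure as profiles.yml, expressed as a Python dict.
EXAMPLE_PROFILES = {
    "your_profile_name": {
        "target": "dev",
        "outputs": {
            "dev": {
                "type": "trino",
                "host": "your_host",
                "port": 443,
                "user": "your_username",
                "password": "your_password",
                "database": "your_database",
                "schema": "your_schema",
            }
        },
    }
}

if __name__ == "__main__":
    print(missing_profile_keys(EXAMPLE_PROFILES, "your_profile_name"))  # prints []
```

For a real end-to-end check of the profile and the connection, run `dbt debug` from the project directory.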
Troubleshooting Common Issues
Python Path Issues
# Check Python in PATH
$env:Path -split ';' | Where-Object { $_ -like '*Python*' }
# Verify Python location
Get-Command python | Select-Object Source
Permission Issues
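When running scripts, you might hit PowerShell’s execution policy restrictions. To allow locally created scripts for every user on the machine:
# Run as Administrator (without -Scope, this changes the machine-wide policy)
Set-ExecutionPolicy RemoteSigned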
UV Installation Issues
# Check UV version
uv --version
# Reinstall UV if needed
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
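If `uv --version` reports that the command is not recognized, the usual cause is that the install directory is not on PATH in the current terminal. A small sketch using the standard library shows where each tool resolves from; the function name `find_tool` is hypothetical.

```python
import shutil


def find_tool(name: str):
    """Return the full path of a command found on PATH, or None if missing."""
    return shutil.which(name)


if __name__ == "__main__":
    for tool in ("uv", "python", "dbt"):
        print(f"{tool}: {find_tool(tool) or 'not found on PATH'}")
```

Open a new terminal before re-running it, since PATH changes made by an installer only apply to terminals started afterwards.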
VSCode Python Selection
- Open command palette (Ctrl+Shift+P)
- Type “Python: Select Interpreter”
- Choose the appropriate Python version
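To confirm that VSCode picked up the project’s virtual environment rather than the system-wide Python, run a short check in the integrated terminal. This is a minimal sketch; `in_virtualenv` is a hypothetical helper name, and the check relies on the fact that environments created by venv-style tools set `sys.prefix` to the environment directory.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtualenv())
```

For the DBT project in this guide, the interpreter path should point inside the project’s .venv folder.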
Virtual Environment Activation Fails
If you see “running scripts is disabled on this system”:
# Administrator rights are not required for the CurrentUser scope
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
Conclusion
While Windows setup differs from Unix-based systems, you can create an equally powerful development environment with these tools. The combination of Python, UV, and VSCode provides a robust platform for data engineering work.
Remember to keep your tools updated and regularly check for new versions of DBT and its dependencies. Happy coding!