Run Apache Airflow on Windows 10
Apache Airflow is a great tool to manage and schedule all steps of a data pipeline. However, running it on Windows 10 can be challenging. Airflow’s official Quick Start promises a smooth setup, but only for Linux users. What about us Windows 10 people if we want to avoid Docker? These steps worked for me and hopefully will work for you, too.
   
Photo by Geran de Klerk on Unsplash
After struggling with incorrect configuration, I eventually found a way to install and launch my first Airflow instance. In high spirits, I applied it to a data pipeline with Spark EMR clusters. I am happy to share my insights and list the steps that worked for me. If this also works for you, all the better!
TL;DR
How to install and run Airflow locally with the Windows Subsystem for Linux (WSL) in these steps (a condensed copy-paste sketch follows the list):
- Open Microsoft Store, search for Ubuntu, install it, then restart
- Open cmd and type wsl
- Update everything: sudo apt update && sudo apt upgrade
- Install pip3 by running these commands:
  sudo apt-get install software-properties-common
  sudo apt-add-repository universe
  sudo apt-get update
  sudo apt-get install python3-pip
- Install Airflow: pip3 install apache-airflow
- Run sudo nano /etc/wsl.conf, insert the block below, save with ctrl+s and exit with ctrl+x:
  [automount]
  root = /
  options = "metadata"
- Run nano ~/.bashrc, insert the line below, save with ctrl+s and exit with ctrl+x:
  export AIRFLOW_HOME=/c/users/YOURNAME/airflowhome
- Restart the terminal, activate wsl, and run airflow info. Everything is fine if you see something like Apache Airflow [1.10.12]
- If you get errors due to missing packages, install them with pip3 install [package-name]
- Try airflow info again
- If it still does not work, follow the instructions in the error message. You might want to revert to Docker.
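If you prefer a single copy-paste block, the steps above condense to the sketch below. It runs inside wsl; the AIRFLOW_HOME path is only an example, and the wsl.conf change still needs a terminal restart before the /c/... paths work:
# bring Ubuntu up to date and install pip3
sudo apt update && sudo apt upgrade
sudo apt-get install software-properties-common
sudo apt-add-repository universe
sudo apt-get update
sudo apt-get install python3-pip
# install Airflow
pip3 install apache-airflow
# mount Windows drives under / with metadata support
sudo tee /etc/wsl.conf > /dev/null <<'EOF'
[automount]
root = /
options = "metadata"
EOF
# point AIRFLOW_HOME at a Windows folder (replace YOURNAME)
echo 'export AIRFLOW_HOME=/c/users/YOURNAME/airflowhome' >> ~/.bashrc
source ~/.bashrc
# after restarting the terminal, check the installation
airflow info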
 
  
Airflow on Windows WSL
I managed to make it work with the Windows Subsystem for Linux (WSL), which was recommended on blogs and Stack Overflow. However, even these resources led to dead ends.
After a lot of trial and error, I want to help you with an approach that worked for me. Try to follow these steps. If you get stuck, try to resolve the error by installing missing dependencies, restarting the terminal, or carefully re-reading the instructions.
- Open Microsoft Store, search for Ubuntu, install it, then restart
Then run the following commands in the terminal:
- Bring everything up to date with sudo apt update && sudo apt upgrade
- Install pip3 by running these commands:
  sudo apt-get install software-properties-common
  sudo apt-add-repository universe
  sudo apt-get update
  sudo apt-get install python3-pip
- Install Airflow: pip3 install apache-airflow
- Type sudo nano /etc/wsl.conf
- To access directories like /c/users/philipp instead of /mnt/c/users/philipp, insert the code block below, save with ctrl+s and exit with ctrl+x:
  [automount]
  root = /
  options = "metadata"
- Type nano ~/.bashrc
- Define the environment variable AIRFLOW_HOME by adding the line below, then save with ctrl+s and exit with ctrl+x:
  export AIRFLOW_HOME=/c/Users/philipp/AirflowHome
- Close the terminal, open cmd again, type wsl
- Install missing packages with pip3 install [package-name]
- Restart the terminal, activate wsl, and run airflow info. Everything is fine if you see something like Apache Airflow [1.10.12]
- If you get errors due to missing packages, install them with pip3 install [package-name]
- Try airflow info again (a short verification sketch follows this list)
- If it still does not work, follow the instructions in the error message. You might want to revert to Docker.
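In practice, the last few checks look roughly like the sketch below; [package-name] stands for whatever module the error message reports as missing:
# reload the environment and confirm the variable is set
source ~/.bashrc
echo $AIRFLOW_HOME
# ask Airflow to print its version and configuration
airflow info
# if a module is reported as missing, install it and retry
pip3 install [package-name]
airflow info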
 
   
Photo by Zhipeng Ya on Unsplash
Other ways to install Airflow
Docker offers a controlled environment (a container) to run applications. Since Airflow only runs on Linux, it is a great candidate for a Docker container. However, Docker is sometimes hard to debug, clunky, and can add another layer of confusion. If you want to run Airflow with Docker, see this tutorial.
How to run an Airflow instance
Now it is time to have a look at Airflow! Is AIRFLOW_HOME where you expect it to be? Open two cmd windows, activate wsl in each, and run:
# check whether AIRFLOW_HOME was set correctly
env | grep AIRFLOW_HOME
# initialize database in AIRFLOW_HOME
airflow initdb 
# start the scheduler
airflow scheduler
# use the second cmd window to run
airflow webserver
# access the UI on localhost:8080 in your browser
Unfortunately, WSL does not support background tasks (daemons). This is why we have to open one terminal for airflow webserver and one for airflow scheduler.
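Once both processes are running, you can poke at the instance from a third terminal. A small sketch, assuming the Airflow 1.10.x CLI and that your DAG files live in $AIRFLOW_HOME/dags; the dag id example_dag is just a placeholder:
# create the folder the scheduler scans for DAG files
mkdir -p $AIRFLOW_HOME/dags
# list the DAGs Airflow has picked up (1.10.x syntax)
airflow list_dags
# trigger a run manually (replace example_dag with your dag id)
airflow trigger_dag example_dag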
Set up Airflow in a project setting
Copying your DAGs back and forth from a project folder to the Airflow home directory is cumbersome. Fortunately, we can automate this with a bash script. For example, my project root directory is /c/users/philipp/projects/project_name/ and contains one folder with all scripts related to data collection and processing, named ./src/data/. I also have one folder for all Airflow-related files in ./src/airflow/. This folder holds the deployment script that copies the DAGs over.
Have a look at my project Run Spark EMR clusters with Airflow on Github to see the project structure. You find the script deploy.sh in ./src/airflow.
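In essence, such a deploy script only needs to copy the DAG definitions into the folder Airflow scans. A minimal sketch, assuming the DAG files sit next to the script in ./src/airflow/ and that AIRFLOW_HOME is set; the real deploy.sh on Github may look different:
#!/bin/bash
# copy the project's DAG files into the Airflow home directory
set -e
mkdir -p "$AIRFLOW_HOME/dags"
cp ./src/airflow/*.py "$AIRFLOW_HOME/dags/"
echo "DAGs copied to $AIRFLOW_HOME/dags"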
I am thankful to Cookiecutter Data Science for the inspiration on the project structure.