Apache Airflow installation on Ubuntu

Installation on Ubuntu



Installation of pip on Ubuntu
To set up a virtual environment, we need a Python package named virtualenv. It is installed with pip, so we install pip first:

sudo apt install python3-pip


Installing & Setting Up a Virtual Environment
After successfully installing pip, we will now install the virtualenv package using the following command:
sudo pip3 install virtualenv
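The steps that follow run from a working directory called airflow_workspace, which this post does not show being created. A minimal sketch of that preparation, assuming the workspace sits directly under the home directory (as the ~/airflow_workspace prompts used later suggest):

```shell
# Assumption: the workspace lives at ~/airflow_workspace, matching the prompts used later.
# -p makes mkdir a no-op if the directory already exists.
mkdir -p "$HOME/airflow_workspace"

# Move into the workspace so that virtualenv creates airflow_env inside it.
cd "$HOME/airflow_workspace"

# Print the current directory to confirm where we are.
pwd
```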


To create a virtual environment named "airflow_env" inside the "airflow_workspace" directory, run the following command from within that directory:
virtualenv airflow_env
OUTPUT:
created virtual environment CPython3.8.10.final.0-64 in 841ms
.
.
activators BashActivator, CShellActivator, FishActivator, PowerShellActivator, PythonActivator


To activate the environment use the following command:
source airflow_env/bin/activate

You will observe that our virtual environment name precedes the username on the terminal, as shown below:
(airflow_env) username@desktop_name:~/airflow_workspace$

It indicates that we have successfully activated the virtual environment.
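Besides the prompt, the activation can be checked from the shell itself: the activate script exports a VIRTUAL_ENV variable holding the path of the environment. A small sketch (the helper name venv_status is made up for this illustration):

```shell
# venv_status is a hypothetical helper, not part of virtualenv.
# It reports whether VIRTUAL_ENV (exported by the activate script) is set.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active environment: $VIRTUAL_ENV"
    else
        echo "no virtual environment is active"
    fi
}

venv_status
```

Running deactivate leaves the environment, after which the helper should report that none is active.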

Next, we will install Airflow and some of its optional extras using the following command (the quotes keep the shell from interpreting the square brackets):
(airflow_env) username@desktop_name:~/airflow_workspace$ pip3 install "apache-airflow[gcp,sentry,statsd]"
OUTPUT:
Collecting apache-airflow[gcp,sentry,statsd]
.
Installing collected packages: (All the downloaded packages)
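An unpinned install like the one above pulls whatever dependency versions are current at the time, which can occasionally break an Airflow installation. The Airflow project publishes per-release constraint files for this reason. The sketch below only builds and prints the URL of such a file; the version numbers are assumptions chosen to match the Python 3.8 shown earlier, and constraint_url is a helper name made up here:

```shell
# Hypothetical helper: build the URL of the constraint file published by the
# Airflow project for a given Airflow release and Python minor version.
constraint_url() {
    airflow_version="$1"
    python_version="$2"
    echo "https://raw.githubusercontent.com/apache/airflow/constraints-${airflow_version}/constraints-${python_version}.txt"
}

# Assumed versions: adjust both to the release you want and the Python you run.
AIRFLOW_VERSION="2.6.3"
PYTHON_VERSION="3.8"

constraint_url "$AIRFLOW_VERSION" "$PYTHON_VERSION"
```

The printed URL can be passed to pip3 install through its --constraint option, together with a requirement pinned to the same Airflow version.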

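After successful installation, we will also install some additional libraries, like pyspark and scikit-learn, that you might need in the future. Note that the package is published on PyPI as scikit-learn; the old sklearn name is deprecated.
pip3 install pyspark
pip3 install scikit-learn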

Initialization of Airflow Database
Now we will go to the airflow directory, which serves as the Airflow home directory, and initialize the Airflow database using the following command:

(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow db init
OUTPUT:
Modules imported successfully
Initialization done
You will observe some new files and directories inside the airflow directory, as shown below in the image.

Airflow Directory after the 'airflow db init' command
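Airflow writes these files to the directory named by the AIRFLOW_HOME environment variable, which defaults to ~/airflow. For them to land in ~/airflow_workspace/airflow, as the prompts in this post assume, the variable would need to be set before airflow db init is run. A minimal sketch under that assumption:

```shell
# Assumption: the Airflow home is ~/airflow_workspace/airflow, as in the prompts of this post.
export AIRFLOW_HOME="$HOME/airflow_workspace/airflow"

# Create the directory if needed and work from inside it.
mkdir -p "$AIRFLOW_HOME"
cd "$AIRFLOW_HOME"

echo "AIRFLOW_HOME is $AIRFLOW_HOME"
```

The export only lasts for the current shell session, so it has to be repeated in every new terminal (or added to a shell startup file such as ~/.bashrc) before running other airflow commands.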
It is time to create a dags folder. All future DAGs will be stored here and picked up by the Airflow components.

(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ mkdir dags
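By default, Airflow scans the dags subdirectory of its home directory for DAG files (the dags_folder setting in airflow.cfg). The sketch below creates that folder independently of the current directory, falling back to the default ~/airflow when AIRFLOW_HOME is not set, and asks the airflow CLI for the configured folder if the CLI is available:

```shell
# Resolve the DAG folder from AIRFLOW_HOME, falling back to Airflow's default home.
dags_dir="${AIRFLOW_HOME:-$HOME/airflow}/dags"

# -p avoids an error if the folder already exists.
mkdir -p "$dags_dir"
echo "DAG folder: $dags_dir"

# If the airflow CLI is on PATH, print the folder Airflow is actually configured to scan.
if command -v airflow >/dev/null 2>&1; then
    airflow config get-value core dags_folder
fi
```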

Creating a New Airflow User

To create a new user with the username admin and the Admin role, run the following command, replacing the placeholder values with your own:
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow users create --username admin --password your_password --firstname your_first_name --lastname your_last_name --role Admin --email your_email@domain.com

For example, with sample values filled in:
airflow users create --username admin --password admin --firstname admin --lastname singh --role Admin --email admin@gmail.com
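The sample command above sets the password to admin, which is only reasonable for a throwaway local setup. One alternative is to generate a random password in the shell and pass it to --password. A sketch, where gen_password is a helper name made up here and base64 is assumed to be available (it ships with coreutils on Ubuntu):

```shell
# Hypothetical helper: print a random 24-character password.
# 18 random bytes encode to exactly 24 base64 characters, with no padding.
gen_password() {
    head -c 18 /dev/urandom | base64
}

admin_password="$(gen_password)"
echo "generated password: $admin_password"
```

The value can then be supplied as --password "$admin_password" in the airflow users create command. Keep a note of it, since it is needed to log in to the web interface.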

Run the following command to check if the user was created successfully:
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow users list
OUTPUT:
id | username | email                 | first_name      | last_name      | roles
===+==========+=======================+=================+================+======
1  | admin    | your_email@domain.com | your_first_name | your_last_name | Admin

Running of the Airflow Scheduler and Webserver
With the virtual environment activated, we will now start the Airflow scheduler:

airflow scheduler
Open a new terminal, activate the virtual environment, go to the airflow directory, and start the web server.
username@desktop_name:~/airflow_workspace$ source airflow_env/bin/activate
(airflow_env) username@desktop_name:~/airflow_workspace$ cd airflow
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow webserver
Once the scheduler and webserver have initialized, open any browser and go to http://localhost:8080/.
Port 8080 is the default port for the Airflow webserver, and you should see the Airflow web interface.
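Whether the webserver is actually up can also be checked from a terminal. The Airflow webserver exposes a /health endpoint that reports the status of the metadata database and the scheduler as JSON. A sketch using curl (assumed to be installed; check_health is a helper name made up here):

```shell
# Hypothetical helper: query the webserver's /health endpoint, or report that it is unreachable.
check_health() {
    url="$1"
    # -f: fail on HTTP errors, -sS: quiet but still show errors, --max-time: do not hang.
    curl -fsS --max-time 5 "$url" || echo "webserver not reachable at $url"
}

# Assumption: the webserver runs locally on the default port 8080.
check_health "http://localhost:8080/health"
```

If port 8080 is already taken, the webserver can be started on another port with airflow webserver --port 8081, in which case the URL above changes accordingly.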
