Airflow is quite a complex project, and so is setting up a working environment for it, but we have made it rather simple if you follow this guide.
There are three ways you can run the Airflow development environment:
Before deciding which method to choose, there are a couple of factors to consider:
If you do not work with a remote development environment, you need the following prerequisites.
The setup below describes installation on Ubuntu. It might be slightly different on other machines.
$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
$ sudo groupadd docker
$ sudo usermod -aG docker $USER
Note: After adding your user to the docker group, log out and log in again so that group membership is re-evaluated.
$ docker run hello-world
$ COMPOSE_VERSION="$(curl -s https://api.github.com/repos/docker/compose/releases/latest | grep '"tag_name":'\
| cut -d '"' -f 4)"
$ COMPOSE_URL="https://github.com/docker/compose/releases/download/${COMPOSE_VERSION}/\
docker-compose-$(uname -s)-$(uname -m)"
$ sudo curl -L "${COMPOSE_URL}" -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
$ docker-compose --version
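The download URL assembled in the shell snippet above can be sketched in Python as well, which may help when checking what the variables expand to. This is only an illustration: the function name compose_download_url and the example version string 1.29.2 are assumptions, and platform.system() and platform.machine() stand in for uname -s and uname -m.

```python
import platform


def compose_download_url(version, system=None, machine=None):
    """Build the docker-compose release URL the same way the shell snippet does."""
    system = system or platform.system()      # counterpart of `uname -s`
    machine = machine or platform.machine()   # counterpart of `uname -m`
    return (
        "https://github.com/docker/compose/releases/download/"
        f"{version}/docker-compose-{system}-{machine}"
    )


# Example version string; the shell snippet looks up the latest tag instead.
print(compose_download_url("1.29.2", "Linux", "x86_64"))
```

Whether a binary actually exists at the resulting URL depends on the release, so treat the output as something to inspect rather than a guaranteed download link.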
Note: You might have issues with pyenv if you have a Mac with an M1 chip. Consider using virtualenv as an alternative.
$ sudo apt-get install openssl sqlite default-libmysqlclient-dev libmysqlclient-dev postgresql
$ exec $SHELL
$ pyenv --version
$ pyenv install --list
$ pyenv install 3.8.5
$ pyenv versions
Create a new virtual environment named airflow-env for the installed Python version. In the next chapter, the virtual environment airflow-env will be used for installing Airflow.
$ pyenv virtualenv 3.8.5 airflow-env
Enter the virtual environment airflow-env:
$ pyenv activate airflow-env
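To double-check that the activated environment is the one Python actually runs in, a small script like the following can help. It is a minimal sketch, assuming the environment was created with venv or a recent virtualenv (both set sys.base_prefix); the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv/virtualenv, sys.prefix points at the environment,
    # while sys.base_prefix points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtualenv active:", in_virtualenv())
print("python version:", ".".join(map(str, sys.version_info[:3])))
```

Run it with the python from the activated environment; if it reports that no virtualenv is active, the shell is probably still using the system interpreter.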
Go to https://github.com/apache/airflow/ and fork the project.
In your GitHub account's fork of Airflow, click on Code to find the link to your repo.
Follow "Cloning a repository" to clone the repo locally (you can also do it in your IDE; see the "Using your IDE" chapter below).
For many of the development tasks you will need Breeze to be configured. Breeze is a development environment which uses Docker and Docker Compose, and its main purpose is to provide a consistent and repeatable environment for all the contributors and CI. When using Breeze you avoid the "works for me" syndrome: not only can others easily reproduce what you do, but the CI of Airflow also uses the same environment to run all tests, so you should be able to easily reproduce in your local environment the same failures you see in CI.
Install pipx: follow the instructions in "Install pipx".
Run pipx install -e ./dev/breeze in your checked-out repository. Make sure to follow any instructions printed by pipx during the installation; this is needed to make sure that the breeze command is available in your PATH.
$ breeze setup autocomplete
$ breeze --python 3.8 --backend postgres
Note
If you encounter an error like "docker.credentials.errors.InitializationError: docker-credential-secretservice not installed or not available in PATH", you may execute the following command to fix it:
$ sudo apt install golang-docker-credential-helper
Once the package is installed, execute the breeze command again to resume image building.
When you enter the Breeze environment you should see a prompt similar to root@e4756f6ac886:/opt/airflow#. This means that you are inside the Breeze container and ready to run most of the development tasks. You can leave the environment with exit and re-enter it with just the breeze command.
Once you enter the Breeze environment, create the Airflow tables and users from the Breeze CLI. airflow db reset must be executed at least once for Breeze to get the database and tables created. If you run tests, however, the test database will be initialized automatically for you.
root@b76fcb399bb6:/opt/airflow# airflow db reset
root@b76fcb399bb6:/opt/airflow# airflow users create --role Admin --username admin --password admin \
--email admin@example.com --firstname foo --lastname bar
Type exit to exit the container. The database created before will remain and the servers will keep running, though, until you stop the Breeze environment completely.
root@b76fcb399bb6:/opt/airflow#
root@b76fcb399bb6:/opt/airflow# exit
To stop the Breeze environment completely, use the breeze stop command.
$ breeze stop
breeze start-airflow starts the Breeze environment with the configuration of the last run (in this case, the Python version and backend are picked up from the last execution, breeze --python 3.8 --backend postgres). It also automatically starts the webserver, backend and scheduler. It drops you into tmux with the scheduler in the bottom left and the webserver in the bottom right. Use [Ctrl + B] and the arrow keys to navigate.
$ breeze start-airflow
Use CI image.
Branch name: main
Docker image: ghcr.io/apache/airflow/main/ci/python3.8:latest
Airflow source version: 2.4.0.dev0
Python version: 3.8
Backend: mysql 5.7
Port forwarding:
Ports are forwarded to the running docker containers for webserver and database
* 12322 -> forwarded to Airflow ssh server -> airflow:22
* 28080 -> forwarded to Airflow webserver -> airflow:8080
* 25555 -> forwarded to Flower dashboard -> airflow:5555
* 25433 -> forwarded to Postgres database -> postgres:5432
* 23306 -> forwarded to MySQL database -> mysql:3306
* 21433 -> forwarded to MSSQL database -> mssql:1443
* 26379 -> forwarded to Redis broker -> redis:6379
Here are links to those services that you can use on host:
* ssh connection for remote debugging: ssh -p 12322 airflow@127.0.0.1 pw: airflow
* Webserver: http://127.0.0.1:28080
* Flower: http://127.0.0.1:25555
* Postgres: jdbc:postgresql://127.0.0.1:25433/airflow?user=postgres&password=airflow
* Mysql: jdbc:mysql://127.0.0.1:23306/airflow?user=root
* MSSQL: jdbc:sqlserver://127.0.0.1:21433;databaseName=airflow;user=sa;password=Airflow123
* Redis: redis://127.0.0.1:26379/0
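To see which of the forwarded ports listed above are actually reachable from the host, a quick TCP probe can be used. This is a minimal sketch: the host ports are copied from the list above, while the names FORWARDED_PORTS, is_port_open and check_forwarded_ports are hypothetical helpers and not part of Breeze.

```python
import socket

# Host ports taken from the port-forwarding list above.
FORWARDED_PORTS = {
    "ssh": 12322,
    "webserver": 28080,
    "flower": 25555,
    "postgres": 25433,
    "mysql": 23306,
    "mssql": 21433,
    "redis": 26379,
}


def is_port_open(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_forwarded_ports(ports=FORWARDED_PORTS):
    """Map each service name to whether its forwarded port accepts connections."""
    return {name: is_port_open(port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, reachable in check_forwarded_ports().items():
        print(f"{name:10s} {'open' if reachable else 'closed'}")
```

Only the services that belong to the selected backend and integrations are expected to answer, so a closed port is not necessarily a problem.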
Alternatively, you can start the same setup using the following commands:
$ breeze --python 3.8 --backend postgres
root@0c6e4ff0ab3d:/opt/airflow# tmux
root@0c6e4ff0ab3d:/opt/airflow# airflow scheduler
root@0c6e4ff0ab3d:/opt/airflow# airflow webserver
Now you can access the Airflow web interface on your local machine at http://127.0.0.1:28080 with user name admin and password admin.
Set up the MySQL database connection in MySQL Workbench with host 127.0.0.1, port 23306, user root, a blank password (leave it empty) and default schema airflow.
Stopping breeze
root@f3619b74c59a:/opt/airflow# stop_airflow
root@f3619b74c59a:/opt/airflow# exit
$ breeze stop
$ breeze --help
For more information, visit the Breeze documentation.
The following are some of the important topics of the Breeze documentation:
Before committing changes to GitHub or raising a pull request, the code needs to be checked against certain quality standards such as spelling, code syntax, code formatting and compatibility with Apache License requirements. This set of checks is applied when you commit your code.
To avoid burdening the CI infrastructure and to save time, pre-commit hooks can be run locally before committing changes.
$ sudo apt install libxml2-utils
$ pyenv activate airflow-env
$ pip install pre-commit
$ cd ~/Projects/airflow
$ pre-commit run --all-files
No-tabs checker......................................................Passed
Add license for all SQL files........................................Passed
Add license for all other files......................................Passed
Add license for all rst files........................................Passed
Add license for all JS/CSS/PUML files................................Passed
Add license for all JINJA template files.............................Passed
Add license for all shell files......................................Passed
Add license for all python files.....................................Passed
Add license for all XML files........................................Passed
Add license for all yaml files.......................................Passed
Add license for all md files.........................................Passed
Add license for all mermaid files....................................Passed
Add TOC for md files.................................................Passed
Add TOC for upgrade documentation....................................Passed
Check hooks apply to the repository..................................Passed
black................................................................Passed
Check for merge conflicts............................................Passed
Debug Statements (Python)............................................Passed
Check builtin type constructor use...................................Passed
Detect Private Key...................................................Passed
Fix End of Files.....................................................Passed
...........................................................................
$ pre-commit run --files airflow/decorators.py tests/utils/test_task_group.py
$ pre-commit run black --files airflow/decorators.py tests/utils/test_task_group.py
black...............................................................Passed
$ pre-commit run run-flake8 --files airflow/decorators.py tests/utils/test_task_group.py
Run flake8..........................................................Passed
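The invocations above all follow the same pattern: an optional hook id followed by either --files and a list of paths, or --all-files. As a minimal sketch, a script that assembles such a command line could look like this; the function name pre_commit_command is hypothetical, and only the pre-commit options already shown above are used.

```python
import shlex


def pre_commit_command(files, hook=None):
    """Return the argument list for running pre-commit on specific files."""
    cmd = ["pre-commit", "run"]
    if hook:
        cmd.append(hook)  # e.g. "black" or "run-flake8"
    if files:
        cmd += ["--files", *files]
    else:
        cmd.append("--all-files")  # no files given: check the whole repository
    return cmd


files = ["airflow/decorators.py", "tests/utils/test_task_group.py"]
print(shlex.join(pre_commit_command(files, hook="black")))
```

The resulting list can be passed to subprocess.run from the repository root, for example to check only the files changed on a branch.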
$ cd ~/Projects/airflow
$ pre-commit install
$ git commit -m "Added xyz"
$ cd ~/Projects/airflow
$ pre-commit uninstall
The following are some of the important links in STATIC_CODE_CHECKS.rst:
$ sudo apt-get install sqlite libsqlite3-dev default-libmysqlclient-dev postgresql
$ ./scripts/tools/initialize_virtualenv.py
export PATH=${PATH}:"/home/${USER}/Projects/airflow"
source ~/.bashrc
You can usually run tests conveniently in your IDE (see IDE below) using a virtualenv, but with Breeze you can be sure that all the tests are run in the same environment as the tests in CI.
All tests are inside the ./tests directory.
Running unit tests inside the Breeze environment.
Just run pytest followed by the path to the test file to run the tests.
root@63528318c8b1:/opt/airflow# pytest tests/utils/test_decorators.py
======================================= test session starts =======================================
platform linux -- Python 3.8.6, pytest-6.0.1, py-1.9.0, pluggy-0.13.1 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow, configfile: pytest.ini
plugins: celery-4.4.7, requests-mock-1.8.0, xdist-1.34.0, flaky-3.7.0, rerunfailures-9.0, instafail-0.4.2, forked-1.3.0, timeouts-1.2.1, cov-2.10.0
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 3 items
tests/utils/test_decorators.py::TestApplyDefault::test_apply PASSED [ 33%]
tests/utils/test_decorators.py::TestApplyDefault::test_default_args PASSED [ 66%]
tests/utils/test_decorators.py::TestApplyDefault::test_incorrect_default_args PASSED [100%]
======================================== 3 passed in 1.49s ========================================
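The test IDs in the output above follow the pattern file path::class::test method. As a minimal illustration of a file that pytest would collect in this way, here is a sketch; the file name tests/utils/test_example.py, the helper add_default and the class TestAddDefault are hypothetical and not part of Airflow.

```python
# Hypothetical file: tests/utils/test_example.py (not part of Airflow)


def add_default(args, key, value):
    """Return a copy of args with key set to value when the key is missing."""
    merged = dict(args)
    merged.setdefault(key, value)
    return merged


class TestAddDefault:
    def test_missing_key_gets_default(self):
        assert add_default({}, "retries", 1) == {"retries": 1}

    def test_existing_key_is_kept(self):
        assert add_default({"retries": 3}, "retries", 1) == {"retries": 3}
```

Under these assumptions, running pytest on that file would report IDs such as tests/utils/test_example.py::TestAddDefault::test_missing_key_gets_default, and a single test can be selected by passing such an ID to pytest.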
$ breeze --backend postgres --postgres-version 10 --python 3.8 --db-reset --test-type All tests
Running specific type of test
$ breeze --backend postgres --postgres-version 10 --python 3.8 --db-reset --test-type Core
Running Integration test for specific test type
$ breeze --backend postgres --postgres-version 10 --python 3.8 --db-reset --test-type All --integration mongo
For more information on testing, visit TESTING.rst.
The following are some of the important topics of TESTING.rst:
The following are some of the important links in CONTRIBUTING.rst:
Go to your GitHub account, open your fork of the project and click on Branches.
Click on the New pull request button on the branch from which you want to raise a pull request.
Add a title and description as per the contributing guidelines and click on Create pull request.
It often takes several days or weeks of discussion and iteration before a PR is ready to merge. In the meantime new commits are merged, and you might run into conflicts. You should therefore periodically synchronize main in your fork with the apache/airflow main and rebase your PR on top of it. The following describes how to do it.
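One possible way to do the rebase from the command line is sketched here as a small Python helper that prints (and optionally runs) the Git commands. This is an assumption-laden illustration, not necessarily the procedure this guide has in mind: it assumes a Git remote named upstream that points to apache/airflow, that your PR branch is currently checked out, and that the branch already tracks a branch in your fork. The function names rebase_commands and run are hypothetical.

```python
import subprocess


def rebase_commands(upstream="upstream", branch="main"):
    """Git commands that rebase the current PR branch on the latest upstream branch."""
    return [
        ["git", "fetch", upstream, branch],         # get the latest upstream commits
        ["git", "rebase", f"{upstream}/{branch}"],  # replay the PR commits on top
        ["git", "push", "--force-with-lease"],      # update the PR branch in the fork
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run(rebase_commands())  # dry run: only prints the commands
```

Note that this sketch rebases the PR branch directly on the upstream main; updating the main branch of your fork itself is a separate step. If the rebase stops on conflicts, resolve them and continue with git rebase --continue before pushing.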
If you are familiar with Python development and use your favourite editor, Airflow can be set up similarly to your other projects. However, if you need specific instructions for your IDE, you will find more detailed instructions here:
To use a remote development environment you usually need a paid account, but you do not have to set up your local machine for development.