[Airflow] Using the KubernetesPodOperator on Cloud Composer
Cloud Composer is a fully managed version of the open source workflow tool Apache Airflow on Google Cloud Platform (GCP). To run a Docker container from Cloud Composer, one way is to use the KubernetesPodOperator, which launches pods into a Kubernetes cluster.
This post will cover these topics:
- Build container images with Google Cloud Build
- Create Kubernetes Secrets
- Using the KubernetesPodOperator
Build Container Images with Google Cloud Build
Google Cloud Build is a fully managed solution for building containers and other artifacts. It can import source code from Google Cloud Storage, Cloud Source Repositories, GitHub, and Bitbucket.
Build config file
A build pipeline can be set up with a YAML or JSON config file placed in the same directory as the application source code and the Dockerfile. Build steps are defined in this config file.
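For example, a minimal cloudbuild.yaml might look like the following sketch (the image name gcr.io/$PROJECT_ID/my-image is a placeholder):

```yaml
steps:
# Step 1: build the container image with the pre-built Docker builder
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-image:latest', '.']
# Step 2: push the image to Container Registry
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/$PROJECT_ID/my-image:latest']
```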
There are two steps defined in the config file. Cloud Build provides pre-built images (base images) called Cloud Builders. In the example above, the name field specifies which pre-built Docker image Cloud Build runs, and the args field lists the arguments passed to that image for execution.
Build command
Start the build with:
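A minimal sketch, assuming the config file is named cloudbuild.yaml and the command is run from the source directory:

```bash
gcloud builds submit --config cloudbuild.yaml .
```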
When the build completes, Cloud Build uploads the built image to Container Registry. You can also pass parameters (called substitutions) to the build and customize build options such as image tags, as sketched below. For more details, please read the documentation.
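As a sketch, a user-defined substitution (the name must start with an underscore) can be passed on the command line and then referenced as $_IMAGE_TAG inside the config file:

```bash
gcloud builds submit --config cloudbuild.yaml --substitutions=_IMAGE_TAG=v1.0 .
```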
Create Kubernetes Secrets
A Secret object contains a small amount of sensitive data such as a password, a token, or a key. To use a secret, a Pod needs to reference the secret.
Creating a Secret Using kubectl
To create a Kubernetes Secret that sets the value of my_secret to test_value, run the following command:
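A sketch, assuming the Secret object is named airflow-secret (the name referenced in the DAG example later in this post):

```bash
kubectl create secret generic airflow-secret --from-literal=my_secret=test_value
```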
If you need to mount a secret file, you can create your Secret with:
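A sketch, assuming a local key file file_path.json and a hypothetical Secret name airflow-secret-file (the key name becomes the file name at the mount path shown later):

```bash
kubectl create secret generic airflow-secret-file --from-file=file_path.json=/local/path/to/file_path.json
```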
Note that Secrets cannot be accessed from a different namespace.
Define Secret in DAG
To reference the created Secret in a DAG as an environment variable, you can use the following code:
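A minimal sketch using Airflow's Secret helper (the import path below is the contrib one used by older Cloud Composer images; newer Airflow versions expose it under the cncf.kubernetes provider):

```python
from airflow.contrib.kubernetes.secret import Secret

secret_env = Secret(
    deploy_type='env',          # expose the Secret as an environment variable
    deploy_target='MY_SECRET',  # name of the environment variable in the Pod
    secret='airflow-secret',    # name of the Kubernetes Secret object
    key='my_secret',            # key of the value stored in the Secret
)
```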
- deploy_type: How the Secret is exposed to the Pod (env or volume).
- deploy_target: Name of the environment variable, since deploy_type is env rather than volume.
- secret: Name of the created Kubernetes Secret.
- key: Key name of a value stored in this Secret object.
In this example, there is a Secret object named airflow-secret, and one of the keys stored in it is called my_secret. Its value will be deployed to the Kubernetes Pod as an environment variable named MY_SECRET.
To reference the created secret file:
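A sketch matching the kubectl --from-file command above (the Secret name airflow-secret-file is a placeholder):

```python
from airflow.contrib.kubernetes.secret import Secret

secret_file = Secret(
    deploy_type='volume',                  # mount the Secret as a volume
    deploy_target='/path/to/secret/file',  # directory the Secret is mounted into
    secret='airflow-secret-file',          # name of the Kubernetes Secret object
    key='file_path.json',                  # key (file name) stored in the Secret
)
```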
Note that the complete mount path will be /path/to/secret/file/file_path.json.
You also need to pass the Secret to the Pod; how to do that is covered in the next section.
Using the KubernetesPodOperator
Let’s take a look at an example first:
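A minimal sketch of a task definition (the image name and variable names are placeholders; the contrib import path applies to older Airflow versions):

```python
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

task = KubernetesPodOperator(
    task_id='my-task',
    name='my-task',
    namespace='default',
    image='gcr.io/my-project/my-image:latest',       # image built with Cloud Build
    secrets=[secret_env, secret_file],               # Secret objects defined above
    arguments=['--run-date', '{{ ds }}'],            # Jinja templating is allowed
    env_vars={'ENV_VAR': '{{ var.value.my_var }}'},  # Airflow Variable from the UI
)
```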
- task_id: ID specified for the task.
- name: Name of the task you want to run, used to generate the Pod ID.
- namespace: Namespace to run within Kubernetes; the default namespace is default.
- image: The Docker image to use. In our case, it's the image built in the section Build Container Images with Google Cloud Build.
- secrets: The Kubernetes Secrets to expose, defined in the section Define Secret in DAG. The Pod will fail to be created if a Secret you specify does not exist in Kubernetes.
- arguments: Arguments to the entrypoint; Jinja templating is allowed.
- env_vars: Environment variables for the Pod. In this case we get the value of my_var from the Airflow UI and pass it to the environment variable ENV_VAR.
For more information about each configuration variable, see the Airflow reference.