Batch Job

A batch job is a way to run a workload that is designed to complete a specific task and then terminate. It is often used for running tasks such as data processing, batch computations, or one-time administrative tasks. Batch jobs in Kubernetes ensure that a specified number of parallel tasks are completed successfully.

Use cases

Data Processing

Batch jobs are often used for data processing tasks such as ETL (Extract, Transform, Load), data migration, or data analysis. For example, you can use a batch job to process a large dataset, transform it into a desired format, and load it into a database.

Parallel Processing

Batch jobs can be used for parallel processing of tasks that can be divided into smaller independent units. Kubernetes can distribute these tasks across multiple worker nodes, enabling efficient and scalable parallel execution.

Image or Video Processing

When you have a large number of images or videos that require processing, batch jobs can be used to distribute the workload across multiple containers. For example, you can use a batch job to resize, compress, or transcode a batch of images or videos.

Machine Learning (ML) Training and Inference

Batch jobs are commonly used in ML workflows for training models or running inference on large datasets. Kubernetes can help distribute the workload across multiple nodes, enabling faster training or inference by leveraging the cluster’s resources.

System Maintenance or Updates

Batch jobs can be employed for system maintenance tasks, such as deploying software updates, applying patches, or running system health checks across the cluster. These jobs can be scheduled to run during maintenance windows to minimize disruptions.

Example

Let’s walk through an example to illustrate how batch jobs work in Kubernetes.

1. Create a Job Definition

First, you need to define a batch job in Kubernetes by creating a job definition file. Here’s an example of a job definition file named batch-job.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: example-batch-job
spec:
  backOffLimit: 25
  completions: 3
  parallelism: 3
  template:
    spec:
      containers:
      - name: batch-job-container
        image: your-image:tag
        command: ["your-command"]
      restartPolicy: OnFailure

In this example, the job definition specifies the name of the job as example-batch-job. It contains a single container named batch-job-container, which runs an image specified by your-image:tag and executes the command your-command. The restartPolicy is set to OnFailure, which means that if the job fails, it will not be automatically restarted.

2. Create the Batch Job

Once you have the job definition file, you can create the batch job in Kubernetes using the kubectl command:

kubectl create -f batch-job.yaml

This command will create a job object in Kubernetes based on the provided definition.

3. Monitor the Job

You can monitor the status of the batch job using the kubectl command:

kubectl get jobs
kubectl describe job example-batch-job
kubectl logs <pod-name>

The first command (kubectl get jobs) lists all the jobs in the cluster, including the example-batch-job you created. The second command kubectl describe job example-batch-job provides detailed information about the job, including the status and any events related to it. The third command kubectl logs <pod-name> allows you to view the logs of the pod associated with the job, where <pod-name> can be obtained from the previous command.

4. Cleanup:

Once the job has completed, you can delete it using the kubectl delete command:

kubectl delete job example-batch-job

This command will remove the job object from Kubernetes.