
Deploy and use a Ray Cluster

The key steps are summarized below. For more information, refer to the official Ray documentation.

1. Deploy a KubeRay Operator

helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm upgrade --install kuberay-operator kuberay/kuberay-operator

2. Deploy a RayCluster custom resource

helm install raycluster kuberay/ray-cluster --set worker.replicas=10 --set worker.maxReplicas=100

Note: The number of worker replicas is set to 10 and the maximum number of worker replicas to 100.
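Helm's --set flags override entries of the chart's values file using dotted paths, so worker.replicas=10 sets the replicas key nested under worker. The simplified Python sketch below (parse_set_flags is a hypothetical helper, not part of Helm) shows the nested structure these two flags produce. It is an illustration only: real Helm parsing also handles lists, comma-separated pairs, escaping and booleans, none of which are covered here.

```python
def parse_set_flags(flags: list[str]) -> dict:
    """Turn Helm-style "a.b=c" overrides into a nested dict (simplified sketch)."""
    values: dict = {}
    for flag in flags:
        key, _, raw = flag.partition("=")
        # Helm's --set infers integer types; this sketch only handles plain digits.
        value = int(raw) if raw.isdigit() else raw
        *parents, leaf = key.split(".")
        node = values
        # Walk (and create) one nested dict per dotted path segment.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return values
```

With the two flags from step 2, parse_set_flags(["worker.replicas=10", "worker.maxReplicas=100"]) returns {"worker": {"replicas": 10, "maxReplicas": 100}}.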

3. Run code from a Pod

First, spawn a simple pod with Ray already installed and open a bash shell inside it. Creating the pod is not instantaneous, so wait until it is in the Running state before attaching the shell. See the supplementary materials below to replicate this example.

kubectl apply -f simple-ray-pod.yaml
kubectl exec --stdin --tty python-ray -- /bin/bash
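The usual one-liner for waiting is kubectl wait --for=condition=Ready pod/python-ray. If you would rather script the wait in Python, here is a minimal polling sketch, assuming kubectl is on the PATH and configured for the cluster. The helper names kubectl_pod_phase and wait_for_phase are hypothetical; the pod phase is read with kubectl's jsonpath output.

```python
import subprocess
import time
from typing import Callable


def kubectl_pod_phase(pod: str) -> str:
    """Return the pod's status.phase (e.g. "Pending", "Running") via kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "pod", pod, "-o", "jsonpath={.status.phase}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_phase(get_phase: Callable[[], str], target: str = "Running",
                   timeout: float = 120.0, interval: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_phase() until it returns target; return False once timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if get_phase() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For example, wait_for_phase(lambda: kubectl_pod_phase("python-ray")) returns True once the pod reports Running, and False if it is still not running after two minutes. The sleep parameter is only there so the polling loop can be exercised without real delays.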

Then, from the pod's shell, start the job. The script is already included in the image used for the pod.

python demo.py

Supplementary materials

This example uses the following image (available on Docker Hub):

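FROM python:3.9.19

ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

# Install build dependencies and tools, then drop the apt cache to keep the image small
RUN apt-get update \
    && apt-get install -y \
        build-essential \
        software-properties-common \
        ca-certificates \
        vim \
    && rm -rf /var/lib/apt/lists/*

# Quote the requirement so the shell does not treat the brackets as a glob pattern
RUN pip install "ray[default]==2.34.0"

COPY demo.py ./demo.py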

Note: The Python and Ray versions must match those of the RayCluster deployment (see the Ray image configured in the ray-cluster Helm chart).
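demo.py connects through Ray Client (the ray:// address), which raises an error on incompatible versions. As we read the Ray Client documentation, the Ray versions must be identical and the Python major.minor versions must match. The hypothetical helpers below express that rule; local_versions requires Ray to be installed, and the cluster-side versions are assumed to be taken from the Ray image configured in the chart.

```python
import platform
from importlib import metadata


def versions_compatible(client_python: str, client_ray: str,
                        cluster_python: str, cluster_ray: str) -> bool:
    """Assumed Ray Client rule: same Ray version, same Python major.minor."""
    def major_minor(version: str) -> tuple[str, ...]:
        return tuple(version.split(".")[:2])

    return (client_ray == cluster_ray
            and major_minor(client_python) == major_minor(cluster_python))


def local_versions() -> tuple[str, str]:
    """Python and Ray versions of the current environment (Ray must be installed)."""
    return platform.python_version(), metadata.version("ray")
```

For instance, a client on Python 3.9.19 with Ray 2.34.0 is compatible with a cluster on Python 3.9.18 with Ray 2.34.0, but not with one running Python 3.10 or Ray 2.33.0.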

The content of the demo.py script is shown below:

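# demo.py

import time

import ray

# Connect to the RayCluster through the Ray Client server exposed by the head service.
ray.init(address="ray://raycluster-kuberay-head-svc.default.svc.cluster.local:10001")

print(f"Available resources: {ray.cluster_resources()}")


@ray.remote
def dummy(i):
    # Simulate one second of work, then return the input.
    time.sleep(1)
    return i


start_time = time.perf_counter()

# Submit 110 (11 * 10) tasks; each .remote() call returns an ObjectRef immediately.
futures = [dummy.remote(i) for i in range(11 * 10)]
# ray.get blocks until every task has finished and returns the results in order.
print(f"Results: {ray.get(futures)}")

end_time = time.perf_counter()
run_time = end_time - start_time
print(f"Execution time = {run_time:.2f} seconds.")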
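Since every dummy task sleeps for one second and, by default, reserves one CPU, the execution time printed by demo.py can be estimated by counting how many "waves" of tasks the cluster runs. The sketch below assumes that each of the 10 workers and the head exposes one CPU, giving 11 CPUs; this is our reading of why the script submits 11 * 10 tasks, so compare it with the CPU count printed by ray.cluster_resources(). The expected_runtime helper is hypothetical.

```python
import math


def expected_runtime(num_tasks: int, num_cpus: int, task_seconds: float = 1.0) -> float:
    """Lower bound on wall-clock time when each task needs one CPU.

    Tasks run in waves of num_cpus; scheduling and network overhead are ignored.
    """
    waves = math.ceil(num_tasks / num_cpus)
    return waves * task_seconds


# Assumption: 10 worker pods plus the head pod, one CPU each -> 11 CPUs.
print(expected_runtime(11 * 10, 11))  # prints 10.0
```

Under that assumption the estimate is about 10 seconds. With more CPUs available, for example 101, expected_runtime(110, 101) drops to 2.0, because 110 tasks no longer fit in a single wave.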

Here is the manifest used to spawn the simple Ray pod:

# simple-ray-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: python-ray
spec:
  containers:
    - name: python-ray
      image: aelskens/python-ray:3.9.19-2.34.0
      command: ['sleep', '3600']
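The same manifest can also be generated programmatically, which makes its nesting explicit; kubectl apply -f accepts JSON as well as YAML. This is a minimal sketch: the python_ray_pod helper and its parameters are hypothetical, and its defaults simply reproduce simple-ray-pod.yaml.

```python
import json


def python_ray_pod(name: str = "python-ray",
                   image: str = "aelskens/python-ray:3.9.19-2.34.0",
                   sleep_seconds: int = 3600) -> dict:
    """Build the same Pod manifest as simple-ray-pod.yaml as a Python dict."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Keep the container alive so that kubectl exec can attach to it.
                    "command": ["sleep", str(sleep_seconds)],
                }
            ]
        },
    }


if __name__ == "__main__":
    # Redirect this output to a .json file and pass it to kubectl apply -f.
    print(json.dumps(python_ray_pod(), indent=2))
```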

4. Cleanup

kubectl delete -f simple-ray-pod.yaml
helm uninstall raycluster
helm uninstall kuberay-operator