Install Aizen remote component - Static deployment
Prerequisites
The following software components must be installed on the GPU node(s) before installing the Aizen remote components:
Kubernetes cluster 1.26+
Kubernetes is a portable, extensible, open source platform for managing containerized workloads and services that facilitates both declarative configuration and automation.
Persistent storage class
Aizen uses persistent volumes (PVs) for storing ephemeral cache and transient data. It needs a storage class that can dynamically provision volumes. This storage should have high performance, typical of SSD-class storage. Storage class options are available for public cloud, private cloud, and on-prem deployments.
Object storage
Aizen needs access to scalable cloud object storage.
Nvidia drivers and libraries
Helm 3.2.0+
Helm is a tool that automates the creation, packaging, configuration, and deployment of Kubernetes applications by bundling them into a single reusable package.
kubectl
kubectl is a command-line tool that communicates with the Kubernetes control plane through the Kubernetes API. It allows application deployment, cluster resource management, and resource monitoring.
Docker Hub credentials to access microservice images
Note
These instructions assume that the GPU node(s) already have Kubernetes installed, along with a Container Storage Interface (CSI) driver and the GPU operator.
Create a namespace for the remote Aizen components
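The prerequisites above can be spot-checked before going further. The following is a minimal sketch and not part of the Aizen tooling: the function names version_ge and preflight are hypothetical, GNU sort with the -V option is assumed to be available, and kubectl is assumed to be configured for the target cluster. The nvidia.com/gpu resource name is the one the NVIDIA GPU operator normally advertises on nodes; adjust it if your setup differs.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Relies on GNU sort -V (version sort); a leading "v" is stripped first.
version_ge() {
  local have="${1#v}" want="${2#v}"
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Hypothetical preflight: prints what the cluster reports so it can be
# compared against the prerequisites list. Nothing is modified.
preflight() {
  local helm_version
  helm_version="$(helm version --template '{{.Version}}')"
  if version_ge "$helm_version" "3.2.0"; then
    echo "helm $helm_version: ok"
  else
    echo "helm $helm_version: older than 3.2.0"
  fi
  kubectl version            # the server version should be 1.26 or later
  kubectl get storageclass   # a dynamically provisioning class is expected
  # Allocatable GPUs per node; "<none>" suggests the GPU operator is not active
  kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}
```

Running preflight after sourcing the snippet prints the Helm check followed by the raw kubectl output, which still has to be read against the list above.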
kubectl create ns aizen
Label the GPU nodes
kubectl label node <nodename or ip> aizen.com/gpu.deploy=true
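To confirm that the label was applied, the nodes can be queried with a label selector. This is a small sketch; the function name check_gpu_labels is hypothetical.

```shell
# Hypothetical helper: list the nodes carrying the Aizen GPU label,
# and fail if none are found.
check_gpu_labels() {
  local nodes
  nodes="$(kubectl get nodes -l aizen.com/gpu.deploy=true -o name)"
  if [ -z "$nodes" ]; then
    echo "no nodes carry aizen.com/gpu.deploy=true" >&2
    return 1
  fi
  echo "$nodes"
}
```

Every GPU node intended for Aizen should appear in the output.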
If you have Docker credentials, first create a Kubernetes secret for accessing the Aizen images
kubectl create secret docker-registry aizenrepo-creds \
--docker-username=aizencorp \
--docker-password=<YOUR DOCKER CREDENTIALS> \
-n aizen
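A quick way to confirm that the secret exists and has the expected type is sketched below. The function name check_image_secret is hypothetical. The secret name aizenrepo-creds is presumably the value to use for IMAGE_REPO_SECRET in the deployment script, although the source does not state this explicitly.

```shell
# Hypothetical helper: succeeds if the named secret exists in the namespace
# and is a docker-registry secret (type kubernetes.io/dockerconfigjson).
check_image_secret() {
  local ns="${1:-aizen}" name="${2:-aizenrepo-creds}" secret_type
  secret_type="$(kubectl -n "$ns" get secret "$name" -o jsonpath='{.type}')" || return 1
  [ "$secret_type" = "kubernetes.io/dockerconfigjson" ]
}
```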
Deploy remote components
#Script for Aizen remote deployment
#----------------------------------
NAMESPACE=aizen
HELMCHART_LOCATION=aizenremote-helmcharts-1.0.0
STORAGE_CLASS=
BUCKET_NAME=
CLUSTER_NAME=
CLOUD_ENDPOINT_URL=
CLOUD_ACCESSKEY_ID=
CLOUD_SECRET_KEY=
CLOUD_PROVIDER_REGION=
CLOUD_PROVIDER_TYPE=
#Needed only for Cloudian
CLOUD_ENDPOINT_IP=
#IMAGE
IMAGE_REPO=aizencorp
IMAGE_REPO_SECRET=
IMAGE_TAG=1.0.0
#PVC
METASTORAGE_PERSISTENCE_SIZE=200Gi
#You don't need to change anything below this line
helm -n $NAMESPACE install aizenremote $HELMCHART_LOCATION/aizenremote \
--set global.clustername=$CLUSTER_NAME,\
global.s3.endpoint_url=$CLOUD_ENDPOINT_URL,\
global.s3.secrets.values.s3_access_key=$CLOUD_ACCESSKEY_ID,\
global.s3.secrets.values.s3_secret_key=$CLOUD_SECRET_KEY,\
global.customer_bucket_name=$BUCKET_NAME,\
global.storage_class=$STORAGE_CLASS,\
global.cloud_provider_type=$CLOUD_PROVIDER_TYPE,\
global.cloud_provider_region=$CLOUD_PROVIDER_REGION,\
global.image_registry=$IMAGE_REPO,\
global.image_secret=$IMAGE_REPO_SECRET,\
global.image_tag=$IMAGE_TAG,\
metastorage.volume_size=$METASTORAGE_PERSISTENCE_SIZE
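Several variables in the script are left blank for you to fill in, and an empty one produces an empty Helm value rather than an error. The guard below is a sketch of one way to catch this early. The function name require_vars is hypothetical, and the list of required variables in the commented call is an assumption based on the script above (CLOUD_ENDPOINT_IP and IMAGE_REPO_SECRET are left out because the text describes them as conditional).

```shell
# Hypothetical guard: fail if any named variable is unset or empty.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing value for $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call, placed just before the helm command:
# require_vars STORAGE_CLASS BUCKET_NAME CLUSTER_NAME CLOUD_ENDPOINT_URL \
#   CLOUD_ACCESSKEY_ID CLOUD_SECRET_KEY CLOUD_PROVIDER_REGION \
#   CLOUD_PROVIDER_TYPE || exit 1
```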
Check deployment status of remote components
Check the status of all remote components
kubectl -n aizen get pods
If any of the remote components are not in the Running state, see the troubleshooting section.
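Before turning to troubleshooting, it can help to narrow down which pods are affected. The helpers below are a sketch using standard kubectl options; the function names and the 300s default timeout are hypothetical choices. Note that the phase filter also lists pods in the Succeeded phase, and that waiting on all pods will time out if the namespace contains completed one-shot pods.

```shell
# Hypothetical helper: print pods in the namespace that are not in the Running phase.
pods_not_running() {
  kubectl -n "${1:-aizen}" get pods --field-selector=status.phase!=Running
}

# Hypothetical helper: block until every pod reports Ready, or time out.
wait_for_pods() {
  kubectl -n "${1:-aizen}" wait --for=condition=Ready pods --all --timeout="${2:-300s}"
}
```

For a pod that stays in another state, kubectl -n aizen describe pod <pod-name> and kubectl -n aizen logs <pod-name> usually show the reason, such as an image pull failure or an unbound volume claim.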