Install infrastructure components

  • Aizen uses the following open-source components as dependencies

    • Apache Spark

      • The Spark operator makes specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing the status of Spark applications

    • KubeRay

      • The KubeRay operator simplifies the deployment and management of Ray applications on Kubernetes

    • Apache Kafka

      • A distributed event store and stream processing platform

    • Filecopy server

      • Provides the ability to distribute custom preprocessor files to all nodes in the cluster

    • MLFlow

      • Manages the machine learning (ML) lifecycle, including experiments, deployment, and a central model registry. MLFlow Tracking lets you log and query experiments using Python, Java, or REST APIs. MLFlow runs are recorded in a MySQL database

    • Multi-Cluster App Dispatcher (MCAD)

      • This controller provides an abstraction for wrapping all of a job's resources and treating them holistically: it queues jobs, applies queueing policies, and dispatches jobs when cluster resources are available

    • Monitoring components

      • Prometheus operator

        • Configures and manages the Prometheus monitoring stack that runs in the Kubernetes cluster

      • Elasticsearch

        • A distributed search and analytics engine designed to handle large volumes of data. It stores, searches, and analyzes structured and unstructured data in real time

      • Fluentd

        • A data collector that unifies data collection from various data sources and log files

Important

  • Remember to update the STORAGE_CLASS, REPO_CREDS, IMAGE_TAG_TO_USE, INGRESS_HOST, BUCKET_NAME, ENDPOINT_URL, ACCESS_KEY, and SECRET_KEY values

  • The default storage sizes are hardcoded in the deployments for the various components; change the sizes to suit your needs

  • For the MLFlow endpoint URL, you can use an S3 bucket or a local MinIO bucket, but remember to create the bucket first

  • For an Azure AKS cluster, add these additional properties to both the infra and core deployments

    global.s3.azure.enabled=true,\
    global.s3.azure.values.storage_account_name=$STORAGE_ACCOUNT_NAME,\
    global.s3.azure.values.storage_access_key=$CLOUD_ACCESSKEY_ID,\
    global.s3.azure.values.storage_connection_string=$CLOUD_SECRET_KEY,\
    infra.hashicorp-vault.vault.server.standalone.config='ui = true listener "tcp" {  address = "[::]:8200"  cluster_address = "[::]:8201"  tls_disable = 1} storage "'"$CLOUD_PROVIDER_TYPE"'" { accountName = "'"$STORAGE_ACCOUNT_NAME"'"  accountKey = "'"$CLOUD_ACCESSKEY_ID"'" container = "'"$BUCKET_NAME"'" }',\
    global.mlflow.artifact.secrets.values.mlflow_endpoint_url=https://<your storage account name>.blob.core.windows.net,\
    global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=wasbs://<storage containername>@<storage account name>.blob.core.windows.net/<destination folder>
    
  • For a GCP cluster, if istio-injection is enabled, first disable istio-injection, install the aizen-infra components, and then re-enable istio-injection

    #Disable istio-injection
    kubectl label ns aizen-infra istio-injection-
    
    #Enable istio-injection
    kubectl label ns aizen-infra istio-injection=enabled
    
    • Check whether the gateway namespace has istio-injection enabled

    kubectl get ns -L istio-injection
    
  • Create a namespace for the Aizen infrastructure components

kubectl create ns aizen-infra
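Before filling in the deployment script below, a quick sanity check can catch empty values early. This is a minimal sketch (the `require` helper is not part of Aizen); the variable names match those used in the deployment script, and the example values are placeholders:

```shell
#!/usr/bin/env bash
# Fail fast if any required setting is empty. require() is illustrative,
# not part of the Aizen charts.
require() {
  local var
  for var in "$@"; do
    if [[ -z "${!var}" ]]; then
      echo "ERROR: $var is not set" >&2
      return 1
    fi
  done
  return 0
}

# Example values for illustration only
STORAGE_CLASS=standard
BUCKET_NAME=aizen-demo
CLUSTER_NAME=demo

require STORAGE_CLASS BUCKET_NAME CLUSTER_NAME && echo "all required values set"
```

Run the check before the helm install so a missing value fails with a clear message instead of producing a half-configured release.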

If you have Docker credential information, first create a Kubernetes secret for accessing the Aizen images

kubectl create secret docker-registry aizenrepo-creds \
--docker-username=aizencorp \
--docker-password=<YOUR DOCKER CREDENTIALS> \
-n aizen-infra
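For reference, `kubectl create secret docker-registry` stores the credentials as a base64-encoded `.dockerconfigjson`. The sketch below reconstructs that payload locally with placeholder credentials, which can help when debugging image-pull failures:

```shell
#!/usr/bin/env bash
# Rebuild the .dockerconfigjson payload that a docker-registry secret holds.
# The password here is a placeholder, not a real credential.
USERNAME=aizencorp
PASSWORD='example-token'
AUTH=$(printf '%s:%s' "$USERNAME" "$PASSWORD" | base64)
DOCKERCONFIG=$(printf '{"auths":{"https://index.docker.io/v1/":{"username":"%s","password":"%s","auth":"%s"}}}' \
  "$USERNAME" "$PASSWORD" "$AUTH")
echo "$DOCKERCONFIG"
```

Against a live cluster, the stored payload can be inspected with `kubectl get secret aizenrepo-creds -n aizen-infra -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d`.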

Deploy infra components

NAMESPACE=aizen-infra
HELMCHART_LOCATION=aizen-helmcharts-1.0.0
SECURE_HTTPS=false
INGRESS_ENABLED=false
GATEWAY_ENABLED=true

STORAGE_CLASS=
INGRESS_HOST=
BUCKET_NAME=
CLUSTER_NAME=

CLOUD_ENDPOINT_URL=
CLOUD_ACCESSKEY_ID=
CLOUD_SECRET_KEY=
CLOUD_PROVIDER_REGION=
CLOUD_PROVIDER_TYPE=
AUTH_TYPE=ldap

#Needed for Azure
STORAGE_ACCOUNT_NAME=

#Needed only for cloudian
CLOUD_ENDPOINT_IP=

#IMAGE
IMAGE_REPO=aizencorp
IMAGE_REPO_SECRET=
IMAGE_TAG=1.0.0

#MLFLOW
MLFLOW_ACCESSKEY_ID=
MLFLOW_SECRET_KEY=
MLFLOW_ENDPOINT_URL=
MLFLOW_ARTIFACT_DESTINATION=s3://
MLFLOW_ARTIFACT_REGION=

if [[ "$AUTH_TYPE" = "ldap" ]]; then
   LDAP_SERVER_HOST="ldap://aizen-openldap-service.aizen-infra.svc.cluster.local:1389"
   LDAP_BIND_DN="uid={username}\,ou=users\,dc=aizencorp\,dc=local\,dc=com|uid={username}\,ou=people\,dc=aizencorp\,dc=local\,dc=com"
   LDAP_USER_DN="ou=users\,dc=aizencorp\,dc=local\,dc=com"
   LDAP_ADMIN_DN="cn=admin\,dc=aizencorp\,dc=local\,dc=com"
   LDAP_ADMIN_DNPWD="admin"
   LDAP_GROUP_DN="ou=groups\,dc=aizencorp\,dc=local\,dc=com"
   LDAP_ALLOWED_GROUPS="cn=dbgrp\,ou=groups\,dc=aizencorp\,dc=local\,dc=com"
   LDAP_SEARCH_FILTER="(uid={username})"
   AIZEN_ADMIN_USER="aizenadmin"
elif [[ "$AUTH_TYPE" = "oauth" ]]; then
   AUTH0_DOMAIN=
   AUTH0_AUDIENCE=
   AUTH0_CLIENT_ID="test"
   AUTH0_CLIENT_SECRET="test"
   JWT_SECRET=$AUTH0_CLIENT_SECRET
   AIZEN_ADMIN_USER="aizenadmin"
fi

#PVC
MLFLOW_MYSQL_PERSISTENCE_SIZE=200Gi
PROMETHEUS_PERSISTENCE_SIZE=55Gi
GRAFANA_PERSISTENCE_SIZE=20Gi
ELASTIC_SEARCH_LOG_SIZE=55Gi
VECTORDB_PERSISTENCE_SIZE=25Gi
VAULT_PERSISTENCE_SIZE=25Gi

#You don't need to change anything below this line

kubectl get ns ${NAMESPACE} >/dev/null 2>&1 || kubectl create ns ${NAMESPACE}

VAULT_PATH=$CLUSTER_NAME"-vault"
helm -n $NAMESPACE install aizen-infra $HELMCHART_LOCATION/aizen --create-namespace \
--set infra.enabled=true,\
infra.kafka.kafka.global.storageClass=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=$PROMETHEUS_PERSISTENCE_SIZE,\
infra.prometheus-operator.kube-prometheus-stack.grafana.persistence.size=$GRAFANA_PERSISTENCE_SIZE,\
infra.mlflow.mysql.primary.persistence.storageClass=$STORAGE_CLASS,\
infra.mlflow.mysql.primary.persistence.size=$MLFLOW_MYSQL_PERSISTENCE_SIZE,\
global.mlflow.artifact.region=$MLFLOW_ARTIFACT_REGION,\
global.mlflow.artifact.secrets.values.mlflow_access_key_id=$MLFLOW_ACCESSKEY_ID,\
global.mlflow.artifact.secrets.values.mlflow_access_secret_key=$MLFLOW_SECRET_KEY,\
global.mlflow.artifact.secrets.values.mlflow_endpoint_url=$MLFLOW_ENDPOINT_URL,\
global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=$MLFLOW_ARTIFACT_DESTINATION,\
global.ingress.enabled=$INGRESS_ENABLED,\
global.gateway.enabled=$GATEWAY_ENABLED,\
global.image_registry=$IMAGE_REPO,\
global.storage_class=$STORAGE_CLASS,\
global.image_secret=$IMAGE_REPO_SECRET,\
global.image_tag=$IMAGE_TAG,\
global.ingress.host=$INGRESS_HOST,\
global.log.volume_size=$ELASTIC_SEARCH_LOG_SIZE,\
global.clustername=$CLUSTER_NAME,\
global.s3.endpoint_url=$CLOUD_ENDPOINT_URL,\
global.s3.endpoint_ip=$CLOUD_ENDPOINT_IP,\
global.s3.secrets.values.s3_access_key=$CLOUD_ACCESSKEY_ID,\
global.s3.secrets.values.s3_secret_key=$CLOUD_SECRET_KEY,\
global.customer_bucket_name=$BUCKET_NAME,\
global.secure_https=$SECURE_HTTPS,\
global.vault.ldap.server_host=$LDAP_SERVER_HOST,\
global.vault.ldap.userdn=$LDAP_USER_DN,\
global.vault.ldap.binddn=$LDAP_BIND_DN,\
global.vault.ldap.groupdn=$LDAP_GROUP_DN,\
global.vault.ldap.admin_user=$AIZEN_ADMIN_USER,\
global.vault.auth0.domain=$AUTH0_DOMAIN,\
global.vault.auth0.audience=$AUTH0_AUDIENCE,\
global.vault.auth0.secrets.auth0_client_id=$AUTH0_CLIENT_ID,\
global.vault.auth0.secrets.auth0_client_secret=$AUTH0_CLIENT_SECRET,\
infra.vectordb.vectordb.primary.persistence.size=$VECTORDB_PERSISTENCE_SIZE,\
infra.hashicorp-vault.vault.injector.enabled=false,\
infra.hashicorp-vault.vault.server.enabled=true,\
infra.hashicorp-vault.vault.server.standalone.enabled=true,\
infra.hashicorp-vault.vault.server.dataStorage.enabled=true,\
infra.hashicorp-vault.vault.server.dataStorage.storageClass=$STORAGE_CLASS,\
infra.hashicorp-vault.vault.server.dataStorage.size=$VAULT_PERSISTENCE_SIZE,\
infra.hashicorp-vault.vault.server.standalone.config='ui = true listener "tcp" {  address = "[::]:8200"  cluster_address = "[::]:8201"  tls_disable = 1} storage "s3" { bucket = "'"$BUCKET_NAME"'"  access_key = "'"$CLOUD_ACCESSKEY_ID"'"  secret_key = "'"$CLOUD_SECRET_KEY"'"  endpoint = "'"$CLOUD_ENDPOINT_URL"'"  region = "'"$CLOUD_PROVIDER_REGION"'" s3_force_path_style = true  path = "'"$VAULT_PATH"'"}'
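A note on the `\,` sequences in the LDAP values above: helm's `--set` parser treats unescaped commas as separators between key=value pairs, so every literal comma inside a DN must be escaped with a backslash. A small illustration:

```shell
#!/usr/bin/env bash
# The DN as written for helm --set, with escaped commas:
LDAP_ADMIN_DN="cn=admin\,dc=aizencorp\,dc=local\,dc=com"
# After helm strips the escapes, the chart receives the plain DN.
# Simulate that unescaping with a bash substitution:
echo "${LDAP_ADMIN_DN//\\,/,}"   # → cn=admin,dc=aizencorp,dc=local,dc=com
```

Without the escapes, helm would split the DN at each comma and treat the fragments as separate (invalid) `--set` entries.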

For Cloudian infra deployments, include these additional properties as shown here

infra.hashicorp-vault.vault.server.hostAliases[0].ip="$CLOUD_ENDPOINT_IP",\
infra.hashicorp-vault.vault.server.hostAliases[0].hostnames[0]="< specify cloud endpoint url without http >"
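The hostnames entry must be the endpoint without its scheme. One way to derive it is bash parameter expansion; this is a sketch, and the URL below is a made-up example value:

```shell
#!/usr/bin/env bash
# Strip the scheme (http:// or https://) from the endpoint URL to get the
# bare hostname for hostAliases.
CLOUD_ENDPOINT_URL="https://s3.cloudian.example.com"   # example value only
ENDPOINT_HOST="${CLOUD_ENDPOINT_URL#*://}"
echo "$ENDPOINT_HOST"   # → s3.cloudian.example.com
```

The `#*://` expansion removes the shortest prefix ending in `://`, so it handles both http and https endpoints.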

To install Kubecost, refer to Install Kubecost (Optional)

For OpenLDAP, follow the instructions in Install OpenLDAP to create OpenLDAP users

To install Istio, KServe, and Knative, refer to the Install Istio section (Optional)

Infrastructure component deployment status

  • Check the status of all infrastructure components

kubectl -n aizen-infra get pods
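Beyond eyeballing the pod list, a small filter can surface anything not yet healthy. The snippet below simulates the `kubectl get pods` output with made-up sample text so the filter itself can be shown; against a live cluster you would pipe `kubectl -n aizen-infra get pods` into the same `awk`:

```shell
#!/usr/bin/env bash
# Print any pod whose STATUS is neither Running nor Completed.
# The sample output below is fabricated for illustration.
sample_output='NAME                      READY   STATUS    RESTARTS   AGE
aizen-kafka-0             1/1     Running   0          5m
aizen-mlflow-7c9f8-xyz    0/1     Pending   0          5m'

echo "$sample_output" | awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
```

An empty result means every infrastructure pod has reached Running (or Completed) state.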