Install infrastructure components

  • Aizen depends on the following open-source components:

    • Apache Spark

      • The Spark operator makes specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources to specify, run, and surface the status of Spark applications.

    • KubeRay

      • The KubeRay operator simplifies the deployment and management of Ray applications on Kubernetes.

    • Apache Kafka

      • A distributed event store and stream processing platform

    • Filecopy server

      • Provides the ability to distribute custom preprocessor files to all nodes in the cluster

    • MLFlow

      • Manages the machine learning (ML) lifecycle, including experiments, deployment, and a central model registry. MLFlow tracking lets you log and query experiments using Python, Java, or REST APIs. MLFlow runs are recorded in a MySQL database.

    • Multi-Cluster App Dispatcher (MCAD)

      • This controller provides an abstraction that wraps all the resources of a job and treats them as a unit: it queues jobs, applies queueing policies, and dispatches jobs when cluster resources are available.

    • Monitoring components

      • Prometheus operator

        • Configures and manages the Prometheus monitoring stack that runs in the Kubernetes cluster.

      • Elasticsearch

        • A distributed search and analytics engine designed for handling large volumes of data. It is used for storing, searching and analyzing structured and unstructured data in real time

      • Fluentd

        • A data collector that unifies data collection from various data sources and log files.

Important

  • Remember to update the STORAGE_CLASS, REPO_CREDS, IMAGE_TAG_TO_USE, INGRESS_HOST, BUCKET_NAME, ENDPOINT_URL, ACCESS_KEY, and SECRET_KEY values.

  • The default storage size for each component is hardcoded in the deployment; change the sizes to suit your needs.

  • For the MLFlow endpoint URL, you can use an S3 bucket or a local MinIO bucket, but remember to create the bucket first.

  • For an Azure AKS cluster, add these additional properties to both the infra and core deployments:

    global.s3.azure.enabled=true,\
    global.s3.azure.values.storage_account_name=$STORAGE_ACCOUNT_NAME,\
    global.s3.azure.values.storage_access_key=$CLOUD_ACCESSKEY_ID,\
    global.s3.azure.values.storage_connection_string=$CLOUD_SECRET_KEY,\
    infra.hashicorp-vault.vault.server.standalone.config='ui = true listener "tcp" {  address = "[::]:8200"  cluster_address = "[::]:8201"  tls_disable = 1} storage "'"$CLOUD_PROVIDER_TYPE"'" { accountName = "'"$STORAGE_ACCOUNT_NAME"'"  accountKey = "'"$CLOUD_ACCESSKEY_ID"'" container = "'"$BUCKET_NAME"'" }',\
    global.mlflow.artifact.secrets.values.mlflow_endpoint_url=https://<your storage account name>.blob.core.windows.net,\
    global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=wasbs://<storage containername>@<storage account name>.blob.core.windows.net/<destination folder>
    
  • For a GCP cluster, if istio-injection is enabled, first disable istio-injection, install the aizen-infra components, and then re-enable istio-injection:

    #Disable istio-injection
    kubectl label ns aizen-infra istio-injection-
    
    #Enable istio-injection
    kubectl label ns aizen-infra istio-injection=enabled
    
    • Check whether the gateway namespace has istio-injection enabled:

    kubectl get ns -L istio-injection
    
  • Create a namespace for the Aizen infrastructure components:

kubectl create ns aizen-infra
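
For the GCP note above, it can help to confirm the label state of a namespace before and after installing. The following is a minimal sketch, assuming the column layout that `kubectl get ns -L istio-injection` normally prints (NAME, STATUS, AGE, ISTIO-INJECTION). The helper name `injection_state` is hypothetical and not part of Aizen.

```shell
# Hypothetical helper: read `kubectl get ns -L istio-injection` output on stdin
# and print the istio-injection label value for one namespace.
# Prints "unset" when the namespace exists but carries no such label,
# and prints nothing when the namespace is not listed at all.
injection_state() {
  awk -v ns="$1" '$1 == ns { print ($4 == "" ? "unset" : $4) }'
}

# Usage against a live cluster:
#   kubectl get ns -L istio-injection | injection_state aizen-infra
```

Parsing the table keeps the check close to the command already shown in this guide; a label selector or JSON output would work equally well.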

If you have Docker credentials, first create a Kubernetes secret for accessing the Aizen images:

kubectl create secret docker-registry aizenrepo-creds \
  --docker-username=aizencorp \
  --docker-password=<YOUR DOCKER CREDENTIALS> \
  -n aizen-infra
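
Re-running the secret creation command fails if the secret already exists. One common way to make the step repeatable, shown here as a sketch rather than an Aizen requirement, is to render the secret locally with `--dry-run=client` and pipe the manifest to `kubectl apply`. The function name `ensure_registry_secret` is hypothetical.

```shell
# Hypothetical helper: create the image pull secret, or update it in place
# if it already exists.
# --dry-run=client renders the manifest without contacting the cluster;
# `kubectl apply -f -` then creates or updates the secret from that manifest.
ensure_registry_secret() {
  ns="$1"; name="$2"; user="$3"; password="$4"
  kubectl create secret docker-registry "$name" \
    --docker-username="$user" \
    --docker-password="$password" \
    -n "$ns" --dry-run=client -o yaml |
    kubectl apply -f -
}

# Usage:
#   ensure_registry_secret aizen-infra aizenrepo-creds aizencorp '<YOUR DOCKER CREDENTIALS>'
```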

Deploy infra components

NAMESPACE=aizen-infra
HELMCHART_LOCATION=aizen-helmcharts-1.0.0
SECURE_HTTPS=false
INGRESS_ENABLED=false
GATEWAY_ENABLED=true
GATEWAY_CLASSNAME=istio

#Used for secure dns communication
AIZEN_EXTERNAL_BASE_URL=

STORAGE_CLASS=
INGRESS_HOST=
BUCKET_NAME=
CLUSTER_NAME=

CLOUD_ENDPOINT_URL=
CLOUD_ACCESSKEY_ID=
CLOUD_SECRET_KEY=
CLOUD_PROVIDER_REGION=
CLOUD_PROVIDER_TYPE=
AUTH_TYPE=ldap

#Needed for Azure
STORAGE_ACCOUNT_NAME=

#Needed only for cloudian
CLOUD_ENDPOINT_IP=

#IMAGE
IMAGE_REPO=aizencorp
IMAGE_REPO_SECRET=
IMAGE_TAG=1.0.0

#MLFLOW
MLFLOW_ACCESSKEY_ID=$CLOUD_ACCESSKEY_ID
MLFLOW_SECRET_KEY=$CLOUD_SECRET_KEY
MLFLOW_ENDPOINT_URL=$CLOUD_ENDPOINT_URL
MLFLOW_ARTIFACT_DESTINATION=s3://$BUCKET_NAME/mlflow-artifacts
MLFLOW_ARTIFACT_REGION=$CLOUD_PROVIDER_REGION

if [[ "$GATEWAY_CLASSNAME" = "istio" ]]; then
  GATEWAY_CLASSNAME=istio
  GATEWAY_NAMESPACE=istio-system
  GATEWAY_NAME=istio-gateway
else
  GATEWAY_CLASSNAME=nginx
  GATEWAY_NAMESPACE=nginx-gateway
  GATEWAY_NAME=aizen-nginx-gateway
fi

if [[ -n "$AIZEN_EXTERNAL_BASE_URL" ]]; then
   GATEWAY_HOST="${AIZEN_EXTERNAL_BASE_URL#*//}"
fi

if [[ "$AUTH_TYPE" = "ldap" ]]; then
   LDAP_SERVER_HOST="ldap://aizen-openldap-service.aizen-infra.svc.cluster.local:1389"
   LDAP_BIND_DN="uid={username}\,ou=users\,dc=aizencorp\,dc=local\,dc=com|uid={username}\,ou=people\,dc=aizencorp\,dc=local\,dc=com"
   LDAP_USER_DN="ou=users\,dc=aizencorp\,dc=local\,dc=com"
   LDAP_ADMIN_DN="cn=admin\,dc=aizencorp\,dc=local\,dc=com"
   LDAP_ADMIN_DNPWD="admin"
   LDAP_GROUP_DN="ou=groups\,dc=aizencorp\,dc=local\,dc=com"
   LDAP_ALLOWED_GROUPS="cn=dbgrp\,ou=groups\,dc=aizencorp\,dc=local\,dc=com"
   LDAP_SEARCH_FILTER="(uid={username})"
   AIZEN_ADMIN_USER="aizenadmin"
elif [[ "$AUTH_TYPE" = "oauth" ]]; then
   AUTH0_DOMAIN=
   AUTH0_AUDIENCE=
   AUTH0_CLIENT_ID="test"
   AUTH0_CLIENT_SECRET="test"
   JWT_SECRET=$AUTH0_CLIENT_SECRET
   AIZEN_ADMIN_USER="aizenadmin"
fi

#PVC
MLFLOW_MYSQL_PERSISTENCE_SIZE=200Gi
PROMETHEUS_PERSISTENCE_SIZE=55Gi
GRAFANA_PERSISTENCE_SIZE=20Gi
ELASTIC_SEARCH_LOG_SIZE=55Gi
VECTORDB_PERSISTENCE_SIZE=25Gi
VAULT_PERSISTENCE_SIZE=25Gi

#You don't need to change anything below this line

kubectl get ns ${NAMESPACE} >/dev/null 2>&1 || kubectl create ns ${NAMESPACE}

VAULT_PATH=$CLUSTER_NAME"-vault"
helm -n $NAMESPACE install aizen-infra $HELMCHART_LOCATION/aizen --create-namespace \
--set infra.enabled=true,\
infra.kafka.kafka.global.storageClass=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=$PROMETHEUS_PERSISTENCE_SIZE,\
infra.prometheus-operator.kube-prometheus-stack.grafana.persistence.size=$GRAFANA_PERSISTENCE_SIZE,\
infra.mlflow.mysql.primary.persistence.storageClass=$STORAGE_CLASS,\
infra.mlflow.mysql.primary.persistence.size=$MLFLOW_MYSQL_PERSISTENCE_SIZE,\
global.mlflow.artifact.region=$MLFLOW_ARTIFACT_REGION,\
global.mlflow.artifact.secrets.values.mlflow_access_key_id=$MLFLOW_ACCESSKEY_ID,\
global.mlflow.artifact.secrets.values.mlflow_access_secret_key=$MLFLOW_SECRET_KEY,\
global.mlflow.artifact.secrets.values.mlflow_endpoint_url=$MLFLOW_ENDPOINT_URL,\
global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=$MLFLOW_ARTIFACT_DESTINATION,\
global.ingress.enabled=$INGRESS_ENABLED,\
global.gateway.enabled=$GATEWAY_ENABLED,\
global.gateway.classname=$GATEWAY_CLASSNAME,\
global.gateway.name=$GATEWAY_NAME,\
global.gateway.namespace=$GATEWAY_NAMESPACE,\
global.gateway.host=$GATEWAY_HOST,\
global.image_registry=$IMAGE_REPO,\
global.storage_class=$STORAGE_CLASS,\
global.image_secret=$IMAGE_REPO_SECRET,\
global.image_tag=$IMAGE_TAG,\
global.ingress.host=$INGRESS_HOST,\
global.log.volume_size=$ELASTIC_SEARCH_LOG_SIZE,\
global.clustername=$CLUSTER_NAME,\
global.s3.endpoint_url=$CLOUD_ENDPOINT_URL,\
global.s3.endpoint_ip=$CLOUD_ENDPOINT_IP,\
global.s3.secrets.values.s3_access_key=$CLOUD_ACCESSKEY_ID,\
global.s3.secrets.values.s3_secret_key=$CLOUD_SECRET_KEY,\
global.customer_bucket_name=$BUCKET_NAME,\
global.secure_https=$SECURE_HTTPS,\
global.vault.ldap.server_host=$LDAP_SERVER_HOST,\
global.vault.ldap.userdn=$LDAP_USER_DN,\
global.vault.ldap.binddn=$LDAP_USER_DN,\
global.vault.ldap.groupdn=$LDAP_GROUP_DN,\
global.vault.ldap.admin_user=$AIZEN_ADMIN_USER,\
global.vault.auth0.domain=$AUTH0_DOMAIN,\
global.vault.auth0.audience=$AUTH0_AUDIENCE,\
global.vault.auth0.secrets.auth0_client_id=$AUTH0_CLIENT_ID,\
global.vault.auth0.secrets.auth0_client_secret=$AUTH0_CLIENT_SECRET,\
infra.vectordb.vectordb.primary.persistence.size=$VECTORDB_PERSISTENCE_SIZE,\
infra.hashicorp-vault.vault.injector.enabled=false,\
infra.hashicorp-vault.vault.server.enabled=true,\
infra.hashicorp-vault.vault.server.standalone.enabled=true,\
infra.hashicorp-vault.vault.server.dataStorage.enabled=true,\
infra.hashicorp-vault.vault.server.dataStorage.storageClass=$STORAGE_CLASS,\
infra.hashicorp-vault.vault.server.dataStorage.size=$VAULT_PERSISTENCE_SIZE,\
infra.hashicorp-vault.vault.server.standalone.config='ui = true listener "tcp" {  address = "[::]:8200"  cluster_address = "[::]:8201"  tls_disable = 1} storage "s3" { bucket = "'"$BUCKET_NAME"'"  access_key = "'"$CLOUD_ACCESSKEY_ID"'"  secret_key = "'"$CLOUD_SECRET_KEY"'"  endpoint = "'"$CLOUD_ENDPOINT_URL"'"  region = "'"$CLOUD_PROVIDER_REGION"'" s3_force_path_style = true  path = "'"$VAULT_PATH"'"}',\

For Cloudian infra deployments, include the additional properties shown here:

infra.hashicorp-vault.vault.server.hostAliases[0].ip="$CLOUD_ENDPOINT_IP",\
infra.hashicorp-vault.vault.server.hostAliases[0].hostnames[0]="< specify cloud endpoint url without http >"
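
Two small pre-flight checks can catch common mistakes before the deployment script runs: empty variables, and an endpoint hostname that still carries its scheme. The sketch below is an assumption about how you might do this, not part of the Aizen tooling; `require_vars`, `strip_scheme`, and `CLOUD_ENDPOINT_HOST` are hypothetical names. `strip_scheme` uses the same `${VAR#*//}` expansion the script applies to AIZEN_EXTERNAL_BASE_URL.

```shell
# Hypothetical helper: print the name of every listed variable that is unset
# or empty, and return non-zero if at least one is missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup: expands to e.g. value=${STORAGE_CLASS:-}
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: remove everything up to and including the first "//".
# A URL without a scheme is returned unchanged; any port or path is kept.
strip_scheme() {
  printf '%s\n' "${1#*//}"
}

# Usage, after filling in the variables at the top of the script:
#   require_vars STORAGE_CLASS BUCKET_NAME CLUSTER_NAME CLOUD_ENDPOINT_URL \
#     CLOUD_ACCESSKEY_ID CLOUD_SECRET_KEY || exit 1
#   CLOUD_ENDPOINT_HOST=$(strip_scheme "$CLOUD_ENDPOINT_URL")
```

The list of variables passed to `require_vars` is only an example; which values are mandatory depends on your cloud provider and authentication type.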

To install Kubecost, refer to Install Kubecost (Optional).

To install Istio, KServe, and Knative, refer to the Install Istio section (Optional).

Infrastructure component deployment status

  • Check the status of all infrastructure components:

kubectl -n aizen-infra get pods
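
With many components the pod list gets long, so filtering it down to the pods that still need attention can be useful. This is a minimal sketch that assumes the default column order of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...); `not_ready_pods` is a hypothetical helper.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin and
# print the name and status of each pod whose READY count (e.g. 0/1) shows
# fewer ready containers than expected. Completed pods are skipped because
# finished jobs legitimately report 0/1.
not_ready_pods() {
  awk '$3 != "Completed" {
    split($2, ready, "/")
    if (ready[1] != ready[2]) print $1, $3
  }'
}

# Usage:
#   kubectl -n aizen-infra get pods --no-headers | not_ready_pods
```

An empty result means every listed pod is either fully ready or completed.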