Install infrastructure components
Aizen uses the following open source components as dependencies:
Apache Spark
The Spark operator makes specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing the status of Spark applications.
KubeRay
The KubeRay operator simplifies the deployment and management of Ray applications on Kubernetes.
Apache Kafka
A distributed event store and stream-processing platform.
Filecopy server
Provides the ability to distribute custom preprocessor files to all nodes in the cluster
MLFlow
Manages the machine learning (ML) lifecycle, including experiments, deployment, and a central model registry. MLFlow tracking lets you log and query experiments using Python, Java, or REST APIs. MLFlow runs are recorded in a MySQL database.
Multi-Cluster App Dispatcher (MCAD)
This controller provides an abstraction for wrapping all resources of a job and treating them holistically: it queues jobs, applies queueing policies, and dispatches them when cluster resources are available.
Monitoring components
Prometheus operator
For configuring and managing the Prometheus monitoring stack that runs in the Kubernetes cluster.
Elasticsearch
A distributed search and analytics engine designed for handling large volumes of data. It is used for storing, searching, and analyzing structured and unstructured data in real time.
Fluentd
A data collector that unifies data collection from various data sources and log files.
Important
Please remember to update the STORAGE_CLASS, REPO_CREDS, IMAGE_TAG_TO_USE, INGRESS_HOST, BUCKET_NAME, ENDPOINT_URL, ACCESS_KEY, and SECRET_KEY values.
The default storage sizes are hardcoded in the deployment for the various components; change them to suit your needs.
For the MLFlow endpoint URL, you may use an S3 bucket or a local MinIO bucket, but remember to create the bucket.
If using local MinIO as the object store, set up MinIO first (see the Optional components section) and create the buckets.
For an Azure AKS cluster, add these additional properties to both the infra and core deployments:
global.s3.azure.enabled=true,\
global.s3.azure.values.storage_account_name=$STORAGE_ACCOUNT_NAME,\
global.s3.azure.values.storage_access_key=$CLOUD_ACCESSKEY_ID,\
global.s3.azure.values.storage_connection_string=$CLOUD_SECRET_KEY,\
infra.hashicorp-vault.vault.server.standalone.config='ui = true listener "tcp" { address = "[::]:8200" cluster_address = "[::]:8201" tls_disable = 1} storage "'"$CLOUD_PROVIDER_TYPE"'" { accountName = "'"$STORAGE_ACCOUNT_NAME"'" accountKey = "'"$CLOUD_ACCESSKEY_ID"'" container = "'"$BUCKET_NAME"'" }',\
global.mlflow.artifact.secrets.values.mlflow_endpoint_url=https://<your storage account name>.blob.core.windows.net,\
global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=wasbs://<storage container name>@<storage account name>.blob.core.windows.net/<destination folder>
For a GCP cluster, if istio-injection is enabled, first disable istio-injection, install the aizen-infra components, and then re-enable istio-injection:
#Disable istio-injection
kubectl label ns aizen-infra istio-injection-
#Enable istio-injection
kubectl label ns aizen-infra istio-injection=enabled
Check if the gateway namespace has istio-injection enabled:
kubectl get ns -L istio-injection
After the core components are installed, create the gateway and virtual service for the GUI, MLFlow, and prediction services (see Aizen gateway and virtual service).
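If local MinIO is the object store, the buckets must exist before deployment. Below is a minimal sketch using the MinIO client (`mc`); the alias name `aizen-minio` is an assumption, and the variables correspond to those set in the deploy script further down.

```shell
# Register the MinIO endpoint under an alias (alias name is arbitrary)
mc alias set aizen-minio "$CLOUD_ENDPOINT_URL" "$CLOUD_ACCESSKEY_ID" "$CLOUD_SECRET_KEY"

# Create the customer bucket referenced by BUCKET_NAME
mc mb "aizen-minio/$BUCKET_NAME"

# Confirm the bucket exists
mc ls aizen-minio
```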
Create a namespace for the Aizen infrastructure components:
kubectl create ns aizen-infra
If you have Docker credential information, first create a Kubernetes secret for accessing the Aizen images:
kubectl create secret docker-registry aizenrepo-creds \
    --docker-username=aizencorp \
    --docker-password=<YOUR DOCKER CREDENTIALS> \
    -n aizen-infra
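As a quick sanity check, you can confirm the secret was created; the secret name matches the command above.

```shell
# The secret should be listed with type kubernetes.io/dockerconfigjson
kubectl -n aizen-infra get secret aizenrepo-creds
```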
Deploy the infra components:
NAMESPACE=aizen-infra
HELMCHART_LOCATION=aizen-helmcharts-1.0.0
SECURE_HTTPS=false
INGRESS_ENABLED=false
GATEWAY_ENABLED=true
STORAGE_CLASS=
INGRESS_HOST=
BUCKET_NAME=
CLUSTER_NAME=
CLOUD_ENDPOINT_URL=
CLOUD_ACCESSKEY_ID=
CLOUD_SECRET_KEY=
CLOUD_PROVIDER_REGION=
CLOUD_PROVIDER_TYPE=
AUTH_TYPE=ldap
#Needed for Azure
STORAGE_ACCOUNT_NAME=
#Needed only for cloudian
CLOUD_ENDPOINT_IP=
#IMAGE
IMAGE_REPO=aizencorp
IMAGE_REPO_SECRET=
IMAGE_TAG=1.0.0
#MLFLOW
MLFLOW_ACCESSKEY_ID=
MLFLOW_SECRET_KEY=
MLFLOW_ENDPOINT_URL=
MLFLOW_ARTIFACT_DESTINATION=s3://
MLFLOW_ARTIFACT_REGION=
if [[ "$AUTH_TYPE" = "ldap" ]]; then
LDAP_SERVER_HOST="ldap://aizen-openldap-service.aizen-infra.svc.cluster.local:1389"
LDAP_BIND_DN="uid={username}\,ou=users\,dc=aizencorp\,dc=local\,dc=com|uid={username}\,ou=people\,dc=aizencorp\,dc=local\,dc=com"
LDAP_USER_DN="ou=users\,dc=aizencorp\,dc=local\,dc=com"
LDAP_ADMIN_DN="cn=admin\,dc=aizencorp\,dc=local\,dc=com"
LDAP_ADMIN_DNPWD="admin"
LDAP_GROUP_DN="ou=groups\,dc=aizencorp\,dc=local\,dc=com"
LDAP_ALLOWED_GROUPS="cn=dbgrp\,ou=groups\,dc=aizencorp\,dc=local\,dc=com"
LDAP_SEARCH_FILTER="(uid={username})"
AIZEN_ADMIN_USER="aizenadmin"
elif [[ "$AUTH_TYPE" = "oauth" ]]; then
AUTH0_DOMAIN=
AUTH0_AUDIENCE=
AUTH0_CLIENT_ID="test"
AUTH0_CLIENT_SECRET="test"
JWT_SECRET=$AUTH0_CLIENT_SECRET
AIZEN_ADMIN_USER="aizenadmin"
fi
#PVC
MLFLOW_MYSQL_PERSISTENCE_SIZE=200Gi
PROMETHEUS_PERSISTENCE_SIZE=55Gi
GRAFANA_PERSISTENCE_SIZE=20Gi
ELASTIC_SEARCH_LOG_SIZE=55Gi
VECTORDB_PERSISTENCE_SIZE=25Gi
VAULT_PERSISTENCE_SIZE=25Gi
#You don't need to change anything below this line
kubectl get ns ${NAMESPACE} >/dev/null 2>&1 || kubectl create ns ${NAMESPACE}
VAULT_PATH=$CLUSTER_NAME"-vault"
helm -n $NAMESPACE install aizen-infra $HELMCHART_LOCATION/aizen --create-namespace \
--set infra.enabled=true,\
infra.kafka.kafka.global.storageClass=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=$STORAGE_CLASS,\
infra.prometheus-operator.kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=$PROMETHEUS_PERSISTENCE_SIZE,\
infra.prometheus-operator.kube-prometheus-stack.grafana.persistence.size=$GRAFANA_PERSISTENCE_SIZE,\
infra.mlflow.mysql.primary.persistence.storageClass=$STORAGE_CLASS,\
infra.mlflow.mysql.primary.persistence.size=$MLFLOW_MYSQL_PERSISTENCE_SIZE,\
global.mlflow.artifact.region=$MLFLOW_ARTIFACT_REGION,\
global.mlflow.artifact.secrets.values.mlflow_access_key_id=$MLFLOW_ACCESSKEY_ID,\
global.mlflow.artifact.secrets.values.mlflow_access_secret_key=$MLFLOW_SECRET_KEY,\
global.mlflow.artifact.secrets.values.mlflow_endpoint_url=$MLFLOW_ENDPOINT_URL,\
global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=$MLFLOW_ARTIFACT_DESTINATION,\
global.ingress.enabled=$INGRESS_ENABLED,\
global.gateway.enabled=$GATEWAY_ENABLED,\
global.image_registry=$IMAGE_REPO,\
global.storage_class=$STORAGE_CLASS,\
global.image_secret=$IMAGE_REPO_SECRET,\
global.image_tag=$IMAGE_TAG,\
global.ingress.host=$INGRESS_HOST,\
global.log.volume_size=$ELASTIC_SEARCH_LOG_SIZE,\
global.clustername=$CLUSTER_NAME,\
global.s3.endpoint_url=$CLOUD_ENDPOINT_URL,\
global.s3.endpoint_ip=$CLOUD_ENDPOINT_IP,\
global.s3.secrets.values.s3_access_key=$CLOUD_ACCESSKEY_ID,\
global.s3.secrets.values.s3_secret_key=$CLOUD_SECRET_KEY,\
global.customer_bucket_name=$BUCKET_NAME,\
global.secure_https=$SECURE_HTTPS,\
global.vault.ldap.server_host=$LDAP_SERVER_HOST,\
global.vault.ldap.userdn=$LDAP_USER_DN,\
global.vault.ldap.binddn=$LDAP_USER_DN,\
global.vault.ldap.groupdn=$LDAP_GROUP_DN,\
global.vault.ldap.admin_user=$AIZEN_ADMIN_USER,\
global.vault.auth0.domain=$AUTH0_DOMAIN,\
global.vault.auth0.audience=$AUTH0_AUDIENCE,\
global.vault.auth0.secrets.auth0_client_id=$AUTH0_CLIENT_ID,\
global.vault.auth0.secrets.auth0_client_secret=$AUTH0_CLIENT_SECRET,\
infra.vectordb.vectordb.primary.persistence.size=$VECTORDB_PERSISTENCE_SIZE,\
infra.hashicorp-vault.vault.injector.enabled=false,\
infra.hashicorp-vault.vault.server.enabled=true,\
infra.hashicorp-vault.vault.server.standalone.enabled=true,\
infra.hashicorp-vault.vault.server.dataStorage.enabled=true,\
infra.hashicorp-vault.vault.server.dataStorage.storageClass=$STORAGE_CLASS,\
infra.hashicorp-vault.vault.server.dataStorage.size=$VAULT_PERSISTENCE_SIZE,\
infra.hashicorp-vault.vault.server.standalone.config='ui = true listener "tcp" { address = "[::]:8200" cluster_address = "[::]:8201" tls_disable = 1} storage "s3" { bucket = "'"$BUCKET_NAME"'" access_key = "'"$CLOUD_ACCESSKEY_ID"'" secret_key = "'"$CLOUD_SECRET_KEY"'" endpoint = "'"$CLOUD_ENDPOINT_URL"'" region = "'"$CLOUD_PROVIDER_REGION"'" s3_force_path_style = true path = "'"$VAULT_PATH"'"}'
For Cloudian infra deployments, please include these additional properties:
infra.hashicorp-vault.vault.server.hostAliases[0].ip="$CLOUD_ENDPOINT_IP",\
infra.hashicorp-vault.vault.server.hostAliases[0].hostnames[0]="< specify cloud endpoint url without http >"
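Once the helm install returns, the release can be checked before moving on. A short sketch, assuming the release name aizen-infra used in the command above:

```shell
# Show the release status and revision
helm -n aizen-infra status aizen-infra

# List all releases in the namespace
helm -n aizen-infra list
```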
To install Kubecost, refer to Install Kubecost (Optional).
For OpenLDAP, follow the instructions in Install OpenLDAP to create the OpenLDAP users.
To install Istio, KServe, and Knative, refer to the Install Istio section (Optional).
Infrastructure component deployment status
Check the status of all infrastructure components:
kubectl -n aizen-infra get pods
If any of the components are not in the Running state, please check the troubleshooting section.
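As a starting point for troubleshooting, the usual first steps are to inspect a failing pod's events and logs; the pod name below is a placeholder.

```shell
# Show scheduling, image-pull, and volume events for a failing pod
kubectl -n aizen-infra describe pod <pod-name>

# Tail the pod's recent logs (add -c <container> for multi-container pods)
kubectl -n aizen-infra logs <pod-name> --tail=100

# If pods are Pending, check that PVCs are bound to the storage class
kubectl -n aizen-infra get pvc
```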