:orphan:

.. _aizeninfrastructure:

Install infrastructure components
=================================

Aizen uses the following open-source components as dependencies:

* Apache Spark

  * The Spark operator makes specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing the status of Spark applications.

* KubeRay

  * The KubeRay operator simplifies the deployment and management of Ray applications on Kubernetes.

* Apache Kafka

  * A distributed event store and stream-processing platform.

* Filecopy server

  * Distributes custom preprocessor files to all nodes in the cluster.

* MLflow

  * Manages the machine learning (ML) lifecycle, including experiments, deployment, and a central model registry. MLflow tracking lets you log and query experiments using Python, Java, or REST APIs. MLflow runs are recorded in a MySQL database.

* Multi-Cluster App Dispatcher (MCAD)

  * This controller provides an abstraction for wrapping all resources of a job and treating them holistically: it queues jobs, applies queuing policies, and dispatches jobs when cluster resources are available.

* Monitoring components

  * Prometheus operator

    * Configures and manages the Prometheus monitoring stack running in the Kubernetes cluster.

  * Elasticsearch

    * A distributed search and analytics engine designed for handling large volumes of data. It stores, searches, and analyzes structured and unstructured data in real time.

  * Fluentd

    * A data collector that unifies data collection from various data sources and log files.

.. important::

   * Remember to update the STORAGE_CLASS, REPO_CREDS, IMAGE_TAG_TO_USE, INGRESS_HOST, BUCKET_NAME, ENDPOINT_URL, ACCESS_KEY, and SECRET_KEY values.
   * The default storage size is hardcoded in the deployment for the various components; change the sizes to fit your needs.
   * For the MLflow endpoint URL, you may use an S3 bucket or a local MinIO bucket, but remember to create the bucket first.
   * If using local MinIO as the object store, set up MinIO first (:ref:`Optional components section `) and create the buckets.
   * For an **Azure AKS cluster**, add these additional properties to both the infra and core deployments:

     .. code-block:: bash

        global.s3.azure.enabled=true,\
        global.s3.azure.values.storage_account_name=$STORAGE_ACCOUNT_NAME,\
        global.s3.azure.values.storage_access_key=$CLOUD_ACCESSKEY_ID,\
        global.s3.azure.values.storage_connection_string=$CLOUD_SECRET_KEY,\
        infra.hashicorp-vault.vault.server.standalone.config='ui = true listener "tcp" { address = "[::]:8200" cluster_address = "[::]:8201" tls_disable = 1} storage "'"$CLOUD_PROVIDER_TYPE"'" { accountName = "'"$STORAGE_ACCOUNT_NAME"'" accountKey = "'"$CLOUD_ACCESSKEY_ID"'" container = "'"$BUCKET_NAME"'" }',\
        global.mlflow.artifact.secrets.values.mlflow_endpoint_url=https://.blob.core.windows.net,\
        global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=wasbs://@.blob.core.windows.net/

   * For a **GCP cluster**, if istio-injection is enabled: first disable istio-injection, install the aizen-infra components, and then re-enable istio-injection.

     .. code-block:: bash

        # Disable istio-injection
        kubectl label ns aizen-infra istio-injection-

        # Enable istio-injection
        kubectl label ns aizen-infra istio-injection=enabled

   * Check whether the gateway namespace has istio-injection enabled:

     .. code-block:: bash

        kubectl get ns -L istio-injection

   * After the core components are installed, create the gateway and virtual service for the GUI, MLflow, and prediction: :ref:`Aizen gateway and virtual service `

Create a namespace for the Aizen infrastructure components:
.. code-block:: bash

   kubectl create ns aizen-infra

If you have Docker credential information, first create a Kubernetes secret for accessing the Aizen images:

.. code-block:: bash

   kubectl create secret docker-registry aizenrepo-creds --docker-username=aizencorp --docker-password= -n aizen-infra

Deploy the infra components:

.. code-block:: bash

   NAMESPACE=aizen-infra
   HELMCHART_LOCATION=aizen-helmcharts-1.0.0
   SECURE_HTTPS=false
   INGRESS_ENABLED=false
   GATEWAY_ENABLED=true
   STORAGE_CLASS=
   INGRESS_HOST=
   BUCKET_NAME=
   CLUSTER_NAME=
   CLOUD_ENDPOINT_URL=
   CLOUD_ACCESSKEY_ID=
   CLOUD_SECRET_KEY=
   CLOUD_PROVIDER_REGION=
   CLOUD_PROVIDER_TYPE=
   AUTH_TYPE=ldap

   # Needed for Azure
   STORAGE_ACCOUNT_NAME=

   # Needed only for Cloudian
   CLOUD_ENDPOINT_IP=

   # IMAGE
   IMAGE_REPO=aizencorp
   IMAGE_REPO_SECRET=
   IMAGE_TAG=1.0.0

   # MLFLOW
   MLFLOW_ACCESSKEY_ID=
   MLFLOW_SECRET_KEY=
   MLFLOW_ENDPOINT_URL=
   MLFLOW_ARTIFACT_DESTINATION=s3://
   MLFLOW_ARTIFACT_REGION=

   if [[ "$AUTH_TYPE" = "ldap" ]]; then
       LDAP_SERVER_HOST="ldap://aizen-openldap-service.aizen-infra.svc.cluster.local:1389"
       LDAP_BIND_DN="uid={username}\,ou=users\,dc=aizencorp\,dc=local\,dc=com|uid={username}\,ou=people\,dc=aizencorp\,dc=local\,dc=com"
       LDAP_USER_DN="ou=users\,dc=aizencorp\,dc=local\,dc=com"
       LDAP_ADMIN_DN="cn=admin\,dc=aizencorp\,dc=local\,dc=com"
       LDAP_ADMIN_DNPWD="admin"
       LDAP_GROUP_DN="ou=groups\,dc=aizencorp\,dc=local\,dc=com"
       LDAP_ALLOWED_GROUPS="cn=dbgrp\,ou=groups\,dc=aizencorp\,dc=local\,dc=com"
       LDAP_SEARCH_FILTER="(uid={username})"
       AIZEN_ADMIN_USER="aizenadmin"
   elif [[ "$AUTH_TYPE" = "oauth" ]]; then
       AUTH0_DOMAIN=
       AUTH0_AUDIENCE=
       AUTH0_CLIENT_ID="test"
       AUTH0_CLIENT_SECRET="test"
       JWT_SECRET=$AUTH0_CLIENT_SECRET
       AIZEN_ADMIN_USER="aizenadmin"
   fi

   # PVC
   MLFLOW_MYSQL_PERSISTENCE_SIZE=200Gi
   PROMETHEUS_PERSISTENCE_SIZE=55Gi
   GRAFANA_PERSISTENCE_SIZE=20Gi
   ELASTIC_SEARCH_LOG_SIZE=55Gi
   VECTORDB_PERSISTENCE_SIZE=25Gi
   VAULT_PERSISTENCE_SIZE=25Gi

   # You don't need to change anything below this line
   kubectl get ns ${NAMESPACE} >/dev/null 2>&1 || kubectl create ns ${NAMESPACE}
   VAULT_PATH=$CLUSTER_NAME"-vault"

   helm -n $NAMESPACE install aizen-infra $HELMCHART_LOCATION/aizen --create-namespace \
   --set infra.enabled=true,\
   infra.kafka.kafka.global.storageClass=$STORAGE_CLASS,\
   infra.prometheus-operator.kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=$STORAGE_CLASS,\
   infra.prometheus-operator.kube-prometheus-stack.alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=$STORAGE_CLASS,\
   infra.prometheus-operator.kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=$PROMETHEUS_PERSISTENCE_SIZE,\
   infra.prometheus-operator.kube-prometheus-stack.grafana.persistence.size=$GRAFANA_PERSISTENCE_SIZE,\
   infra.mlflow.mysql.primary.persistence.storageClass=$STORAGE_CLASS,\
   infra.mlflow.mysql.primary.persistence.size=$MLFLOW_MYSQL_PERSISTENCE_SIZE,\
   global.mlflow.artifact.region=$MLFLOW_ARTIFACT_REGION,\
   global.mlflow.artifact.secrets.values.mlflow_access_key_id=$MLFLOW_ACCESSKEY_ID,\
   global.mlflow.artifact.secrets.values.mlflow_access_secret_key=$MLFLOW_SECRET_KEY,\
   global.mlflow.artifact.secrets.values.mlflow_endpoint_url=$MLFLOW_ENDPOINT_URL,\
   global.mlflow.artifact.secrets.values.mlflow_artifacts_destination=$MLFLOW_ARTIFACT_DESTINATION,\
   global.ingress.enabled=$INGRESS_ENABLED,\
   global.gateway.enabled=$GATEWAY_ENABLED,\
   global.image_registry=$IMAGE_REPO,\
   global.storage_class=$STORAGE_CLASS,\
   global.image_secret=$IMAGE_REPO_SECRET,\
   global.image_tag=$IMAGE_TAG,\
   global.ingress.host=$INGRESS_HOST,\
   global.log.volume_size=$ELASTIC_SEARCH_LOG_SIZE,\
   global.clustername=$CLUSTER_NAME,\
   global.s3.endpoint_url=$CLOUD_ENDPOINT_URL,\
   global.s3.endpoint_ip=$CLOUD_ENDPOINT_IP,\
   global.s3.secrets.values.s3_access_key=$CLOUD_ACCESSKEY_ID,\
   global.s3.secrets.values.s3_secret_key=$CLOUD_SECRET_KEY,\
   global.customer_bucket_name=$BUCKET_NAME,\
   global.secure_https=$SECURE_HTTPS,\
   global.vault.ldap.server_host=$LDAP_SERVER_HOST,\
   global.vault.ldap.userdn=$LDAP_USER_DN,\
   global.vault.ldap.binddn=$LDAP_BIND_DN,\
   global.vault.ldap.groupdn=$LDAP_GROUP_DN,\
   global.vault.ldap.admin_user=$AIZEN_ADMIN_USER,\
   global.vault.auth0.domain=$AUTH0_DOMAIN,\
   global.vault.auth0.audience=$AUTH0_AUDIENCE,\
   global.vault.auth0.secrets.auth0_client_id=$AUTH0_CLIENT_ID,\
   global.vault.auth0.secrets.auth0_client_secret=$AUTH0_CLIENT_SECRET,\
   infra.vectordb.vectordb.primary.persistence.size=$VECTORDB_PERSISTENCE_SIZE,\
   infra.hashicorp-vault.vault.injector.enabled=false,\
   infra.hashicorp-vault.vault.server.enabled=true,\
   infra.hashicorp-vault.vault.server.standalone.enabled=true,\
   infra.hashicorp-vault.vault.server.dataStorage.enabled=true,\
   infra.hashicorp-vault.vault.server.dataStorage.storageClass=$STORAGE_CLASS,\
   infra.hashicorp-vault.vault.server.dataStorage.size=$VAULT_PERSISTENCE_SIZE,\
   infra.hashicorp-vault.vault.server.standalone.config='ui = true listener "tcp" { address = "[::]:8200" cluster_address = "[::]:8201" tls_disable = 1} storage "s3" { bucket = "'"$BUCKET_NAME"'" access_key = "'"$CLOUD_ACCESSKEY_ID"'" secret_key = "'"$CLOUD_SECRET_KEY"'" endpoint = "'"$CLOUD_ENDPOINT_URL"'" region = "'"$CLOUD_PROVIDER_REGION"'" s3_force_path_style = true path = "'"$VAULT_PATH"'"}'

For Cloudian infra deployments, please include these additional properties:

.. code-block:: bash

   infra.hashicorp-vault.vault.server.hostAliases[0].ip="$CLOUD_ENDPOINT_IP",\
   infra.hashicorp-vault.vault.server.hostAliases[0].hostnames[0]="< specify cloud endpoint url without http >"

* To install **Kubecost**, refer to :ref:`Install Kubecost ` (Optional)
* For **OpenLDAP**, please follow the instructions to create OpenLDAP users: :ref:`Install OpenLDAP `
* To install **Istio, KServe, and Knative**, refer to the :ref:`Install Istio section ` (Optional)

Infrastructure component deployment status
------------------------------------------

* Check the status of all infrastructure components:
.. code-block:: bash

   kubectl -n aizen-infra get pods

* If for any reason any of the components are not in the **Running** state, please check the :ref:`troubleshooting section `.
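The status check above can also be scripted, which is convenient when many components are still starting. The helper below is a sketch, not part of the Aizen tooling: it assumes the default ``kubectl get pods`` table layout (STATUS in the third column) and simply counts pods that are not yet ``Running`` or ``Completed``.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count pods that are not in Running/Completed state.
# Assumes the default "kubectl get pods" table output (STATUS is column 3).
not_ready_count() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}

# Usage against the cluster (prints 0 when all pods are ready):
#   kubectl -n aizen-infra get pods | not_ready_count
```

A result of ``0`` means every pod has settled; a non-zero count identifies how many pods still need attention before moving on to the core deployment.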