Infinity embeddings on Kubernetes with KubeAI

Sam Stoelinga - Sep 25 - Dev Community

I just merged and released the Infinity support PR in KubeAI, adding Infinity as an embedding engine. That means you can now serve embeddings on your Kubernetes cluster behind an OpenAI-compatible API.

Infinity is a high-performance, low-latency embedding engine: https://github.com/michaelfeil/infinity
KubeAI is a Kubernetes Operator for running OSS ML serving engines: https://github.com/substratusai/kubeai

How to use this?

Run on any K8s cluster:

helm repo add kubeai https://www.kubeai.org
helm install kubeai kubeai/kubeai --wait --timeout 10m
cat > model-values.yaml << EOF
catalog:
  bge-embed-text-cpu:
    enabled: true
    features: ["TextEmbedding"]
    owner: baai
    url: "hf://BAAI/bge-small-en-v1.5"
    engine: Infinity
    resourceProfile: cpu:1
    minReplicas: 1
EOF
helm install kubeai-models kubeai/models -f ./model-values.yaml
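
Optionally, you can sanity-check that the model came up before moving on. The exact resource and object names below are assumptions about what the KubeAI charts create, so adjust to what you see in your cluster:

# Check the Model custom resource created from model-values.yaml
kubectl get models.kubeai.org
# Check that a serving pod for the model is running
kubectl get pods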

Forward the kubeai service to localhost:

kubectl port-forward svc/kubeai 8000:80
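If you want to check the endpoint without installing the Python client, a plain curl call against the OpenAI-compatible embeddings route should work; the path below just appends the standard /embeddings route to the base URL used in the Python example:

curl http://localhost:8000/openai/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "bge-embed-text-cpu", "input": "Your text goes here."}'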

Afterwards, you can use the OpenAI Python client to get embeddings:

from openai import OpenAI
# Assumes port-forward of kubeai service to localhost:8000.
client = OpenAI(api_key="ignored", base_url="http://localhost:8000/openai/v1")
response = client.embeddings.create(
    input="Your text goes here.",
    model="bge-embed-text-cpu"
)
print(response)
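The response follows the OpenAI embeddings schema, so the vectors live under response.data. A minimal sketch of pulling a vector out and sending a batch of inputs, continuing from the client above:

# Each item in response.data carries one embedding vector.
vector = response.data[0].embedding
print(len(vector))  # 384 dimensions for bge-small-en-v1.5

# The endpoint also accepts a list of inputs in a single call.
batch = client.embeddings.create(
    input=["first sentence", "second sentence"],
    model="bge-embed-text-cpu",
)
print(len(batch.data))  # one embedding per input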

What’s next?

  • Support for autoscaling based on Infinity-reported metrics.
. . .