Deployment Targets
spec.deployment tells the generator where the agent ships. You pick one target with type and fill in the matching sub-block. Both supported targets are shown below as complete, copy-pasteable manifests for the same agent.
Kubernetes
The Kubernetes block is intentionally minimal in v1: it carries the container image and leaves the Deployment, Service, and ConfigMap to the generator's templating.
yaml
apiVersion: adl.inference-gateway.com/v1
kind: Agent
metadata:
name: analytics-agent
description: Analytics agent deployed to Kubernetes from GHCR
version: "2.1.0"
spec:
capabilities:
streaming: true
pushNotifications: false
stateTransitionHistory: false
agent:
provider: anthropic
model: claude-sonnet-4-7
server:
port: 8080
scheme: https
language:
go:
module: github.com/example/analytics-agent
version: "1.26"
deployment:
type: kubernetes
kubernetes:
image:
registry: ghcr.io
repository: example/analytics-agent
tag: v2.1.0Cloud Run
The same agent, retargeted at Cloud Run. The cloudrun block adds typed slots for resources, autoscaling, and service settings, and can opt into Google Cloud Build with useCloudBuild: true.
yaml
apiVersion: adl.inference-gateway.com/v1
kind: Agent
metadata:
name: analytics-agent
description: Analytics agent deployed to Cloud Run with Cloud Build
version: "2.1.0"
spec:
capabilities:
streaming: true
pushNotifications: false
stateTransitionHistory: false
agent:
provider: anthropic
model: claude-sonnet-4-7
server:
port: 8080
scheme: https
language:
go:
module: github.com/example/analytics-agent
version: "1.26"
deployment:
type: cloudrun
cloudrun:
image:
registry: us-central1-docker.pkg.dev
repository: my-project/agents/analytics-agent
tag: v2.1.0
useCloudBuild: true
resources:
cpu: "2"
memory: 1Gi
scaling:
minInstances: 0
maxInstances: 20
concurrency: 40
service:
timeout: 600
allowUnauthenticated: false
serviceAccount: [email protected]
executionEnvironment: gen2
environment:
LOG_LEVEL: info
CACHE_TTL: "300"Highlights
typeselects one target. Set it tokubernetesorcloudrun; the generator reads the matching sub-block and ignores the other.- Shared
imageshape. Both targets reuse the sameimageobject (registry,repository,tag,useCloudBuild). - Cloud Run carries more knobs.
resources,scaling, andservicegive Cloud Run typed control over CPU/memory, instance counts, and request handling.scaling.minInstances: 0lets it scale to zero when idle. - Kubernetes stays lean on purpose. v1 keeps it to the image so the generator's templates own replica counts and limits. Future minor versions may add typed slots as patterns harden.