API Reference

Packages

inference.networking.x-k8s.io/v1alpha2

Package v1alpha2 contains API Schema definitions for the inference.networking.x-k8s.io API group.

Resource Types

Group

Underlying type: string

Group refers to a Kubernetes Group. It must be either an empty string or an RFC 1123 subdomain.

This validation is based on the corresponding Kubernetes validation: https://github.com/kubernetes/apimachinery/blob/02cfb53916346d085a6c6c7c66f882e3c6b0eca6/pkg/util/validation/validation.go#L208

Valid values include:

  • "" - empty string implies core Kubernetes API group
  • "gateway.networking.k8s.io"
  • "foo.example.com"

Invalid values include:

  • "example.com/bar" - "/" is an invalid character

Validation: - MaxLength: 253 - Pattern: ^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$

Appears in: - PoolObjectReference
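
The Group constraints above can be checked with a short sketch. The pattern and length limit are copied from the Validation line; the function name `is_valid_group` is hypothetical and is not part of the API.

```python
import re

# Pattern and length limit copied from the Validation line above.
GROUP_PATTERN = re.compile(
    r"^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)
GROUP_MAX_LENGTH = 253


def is_valid_group(value: str) -> bool:
    """Return True if value satisfies the documented Group constraints.

    Hypothetical helper for illustration only.
    """
    # Check the length first so oversized inputs never reach the regex.
    if len(value) > GROUP_MAX_LENGTH:
        return False
    return GROUP_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["", "gateway.networking.k8s.io", "foo.example.com", "example.com/bar"]:
        print(repr(candidate), is_valid_group(candidate))
```

The candidates in the `__main__` block are the valid and invalid values listed above.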

InferenceModelRewrite

InferenceModelRewrite is the Schema for the InferenceModelRewrite API.

Field Description Default Validation
apiVersion string inference.networking.x-k8s.io/v1alpha2
kind string InferenceModelRewrite
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec InferenceModelRewriteSpec
status InferenceModelRewriteStatus

InferenceModelRewriteRule

InferenceModelRewriteRule defines the match criteria and corresponding action. For details on how precedence is determined across multiple rules and InferenceModelRewrite resources, see the "Precedence and Conflict Resolution" section in InferenceModelRewriteSpec.

Appears in: - InferenceModelRewriteSpec

Field Description Default Validation
matches Match array
targets TargetModel array MinItems: 1

InferenceModelRewriteSpec

InferenceModelRewriteSpec defines the desired state of InferenceModelRewrite.

Appears in: - InferenceModelRewrite

Field Description Default Validation
poolRef PoolObjectReference PoolRef is a reference to the inference pool. Required: {}
rules InferenceModelRewriteRule array

InferenceModelRewriteStatus

InferenceModelRewriteStatus defines the observed state of InferenceModelRewrite.

Appears in: - InferenceModelRewrite

Field Description Default Validation
conditions Condition array Conditions track the state of the InferenceModelRewrite.
Known condition types are:
* "Accepted"
[{lastTransitionTime: "1970-01-01T00:00:00Z", message: "Waiting for controller", reason: "Pending", status: "Unknown", type: "Accepted"}] MaxItems: 8

InferenceObjective

InferenceObjective is the Schema for the InferenceObjectives API.

Field Description Default Validation
apiVersion string inference.networking.x-k8s.io/v1alpha2
kind string InferenceObjective
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec InferenceObjectiveSpec
status InferenceObjectiveStatus

InferenceObjectiveSpec

InferenceObjectiveSpec represents the desired state of a specific model use case. This resource is managed by the "Inference Workload Owner" persona.

The Inference Workload Owner persona is someone who trains, verifies, and leverages a large language model from a model frontend, drives the lifecycle and rollout of new versions of those models, and defines the specific performance and latency goals for the model. These workloads are expected to operate within an InferencePool, defined by the Inference Platform Admin, sharing compute capacity with other InferenceObjectives.

Appears in: - InferenceObjective

Field Description Default Validation
priority integer Priority defines how important it is to serve the request compared to other requests in the same pool.
The higher the value, the more critical the request is; negative values are allowed.
No default value is set for this field, so that fields added in the future can be made mutually exclusive ('one of') with it.
However, implementations that consume this field (such as the Endpoint Picker) will treat an unset value as '0'.
Priority is used in flow control, primarily in the event of resource scarcity (when requests need to be queued).
All requests will be queued, and flow control will always allow requests of higher priority to be served first.
Fairness is only enforced and tracked between requests of the same priority.
Example: requests with a Priority of 10 will always be served before requests with a Priority of 0 (the value used if Priority is unset or no InferenceObjective is specified).
Similarly, requests with a Priority of -10 will always be served after requests with a Priority of 0.
poolRef PoolObjectReference PoolRef is a reference to the inference pool. The pool must exist in the same namespace. Required: {}
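
The ordering described for priority can be illustrated with a minimal sketch. It is not the Endpoint Picker's flow control: the class name `PriorityQueueSketch` is hypothetical, and first-in-first-out ordering within one priority level is an assumption standing in for the fairness mechanism, which this page does not describe.

```python
import heapq
from itertools import count
from typing import Optional


class PriorityQueueSketch:
    """Hypothetical queue: higher priority is served first, unset counts as 0."""

    def __init__(self) -> None:
        self._heap = []
        self._arrival = count()  # tie-breaker: arrival order (an assumption)

    def enqueue(self, request_id: str, priority: Optional[int] = None) -> None:
        # An unset priority is treated as 0, as the field description states.
        effective = 0 if priority is None else priority
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-effective, next(self._arrival), request_id))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    queue = PriorityQueueSketch()
    queue.enqueue("low", -10)
    queue.enqueue("unset")
    queue.enqueue("high", 10)
    queue.enqueue("zero", 0)
    print([queue.dequeue() for _ in range(4)])
```

The request with priority 10 is dequeued first and the one with priority -10 last; the unset request is ordered together with the explicit 0.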

InferenceObjectiveStatus

InferenceObjectiveStatus defines the observed state of InferenceObjective.

Appears in: - InferenceObjective

Field Description Default Validation
conditions Condition array Conditions track the state of the InferenceObjective.
Known condition types are:
* "Accepted"
[{lastTransitionTime: "1970-01-01T00:00:00Z", message: "Waiting for controller", reason: "Pending", status: "Unknown", type: "Ready"}] MaxItems: 8

Kind

Underlying type: string

Kind refers to a Kubernetes Kind.

Valid values include:

  • "Service"
  • "HTTPRoute"

Invalid values include:

  • "invalid/kind" - "/" is an invalid character

Validation: - MaxLength: 63 - MinLength: 1 - Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$

Appears in: - PoolObjectReference
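
The Kind constraints above can be checked the same way. This is a sketch under the documented limits only; the function name `is_valid_kind` is hypothetical.

```python
import re

# Pattern and length limits copied from the Validation line above.
KIND_PATTERN = re.compile(r"^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$")
KIND_MIN_LENGTH = 1
KIND_MAX_LENGTH = 63


def is_valid_kind(value: str) -> bool:
    """Return True if value satisfies the documented Kind constraints.

    Hypothetical helper for illustration only.
    """
    if not KIND_MIN_LENGTH <= len(value) <= KIND_MAX_LENGTH:
        return False
    return KIND_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["Service", "HTTPRoute", "invalid/kind"]:
        print(repr(candidate), is_valid_kind(candidate))
```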

Match

Match defines the criteria for matching the LLM requests.

Appears in: - InferenceModelRewriteRule

Field Description Default Validation
model ModelMatch Model specifies the criteria for matching the 'model' field
within the JSON request body.

MatchValidationType

Underlying type: string

MatchValidationType specifies the type of string matching to use.

Validation: - Enum: [Exact]

Appears in: - ModelMatch

Field Description
Exact MatchExact indicates that the model name must match exactly.

ModelMatch

ModelMatch defines how to match against the model name in the request body.

Appears in: - Match

Field Description Default Validation
type MatchValidationType Type specifies the kind of string matching to use.
Supported value is "Exact". Defaults to "Exact".
Exact Enum: [Exact]
value string Value is the model name string to match against. MinLength: 1
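
As an illustration of "Exact" matching against the 'model' field of a JSON request body, consider the sketch below. The function name `model_matches` and the sample request bodies are hypothetical; how an implementation treats malformed bodies is not stated on this page, so the sketch simply reports no match.

```python
import json


def model_matches(request_body: str, value: str, match_type: str = "Exact") -> bool:
    """Return True if the body's 'model' field equals value exactly.

    Hypothetical helper; "Exact" is the only type listed in the enum above.
    """
    if match_type != "Exact":
        raise ValueError(f"unsupported match type: {match_type!r}")
    try:
        body = json.loads(request_body)
    except json.JSONDecodeError:
        return False  # assumption: an unparseable body matches nothing
    if not isinstance(body, dict):
        return False
    # Exact means a case-sensitive, whole-string comparison.
    return body.get("model") == value


if __name__ == "__main__":
    body = '{"model": "example-model", "prompt": "hello"}'
    print(model_matches(body, "example-model"))
    print(model_matches(body, "Example-Model"))
```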

ObjectName

Underlying type: string

ObjectName refers to the name of a Kubernetes object. Object names can have a variety of forms, including RFC 1123 subdomains, RFC 1123 labels, or RFC 1035 labels.

Validation: - MaxLength: 253 - MinLength: 1

Appears in: - PoolObjectReference

PoolObjectReference

PoolObjectReference identifies an API object within the namespace of the referrer.

Appears in: - InferenceModelRewriteSpec - InferenceObjectiveSpec

Field Description Default Validation
group Group Group is the group of the referent. inference.networking.k8s.io MaxLength: 253
Pattern: ^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
kind Kind Kind is the kind of the referent. For example, "InferencePool". InferencePool MaxLength: 63
MinLength: 1
Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$
name ObjectName Name is the name of the referent. MaxLength: 253
MinLength: 1
Required: {}
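
The defaults in the table above can be sketched as follows. In a cluster, defaulting is applied by the API server from the resource schema; the function `apply_pool_ref_defaults` is hypothetical and only mirrors the listed defaults. Treating an explicitly supplied group or kind as taking precedence over the default is an assumption of this sketch.

```python
# Defaults copied from the Default column of the table above.
DEFAULT_GROUP = "inference.networking.k8s.io"
DEFAULT_KIND = "InferencePool"


def apply_pool_ref_defaults(pool_ref: dict) -> dict:
    """Return a copy of pool_ref with the documented defaults filled in.

    Hypothetical helper for illustration only.
    """
    name = pool_ref.get("name", "")
    # name is required, with MinLength 1 and MaxLength 253.
    if not 1 <= len(name) <= 253:
        raise ValueError("poolRef.name is required and must be 1-253 characters")
    resolved = {"group": DEFAULT_GROUP, "kind": DEFAULT_KIND}
    # Assumption: keys present in the input override the defaults.
    resolved.update(pool_ref)
    return resolved


if __name__ == "__main__":
    print(apply_pool_ref_defaults({"name": "my-pool"}))
```

The pool name `my-pool` is a placeholder.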

TargetModel

TargetModel defines a weighted model destination for traffic distribution.

Appears in: - InferenceModelRewriteRule

Field Description Default Validation
weight integer Weight determines the proportion of requests forwarded to this
model when multiple target models are specified.
The proportion is computed as weight/(sum of all weights in this
TargetModels list). For non-zero values, the actual proportion may differ
slightly from the exact value defined here, depending on the precision an
implementation supports. Weight is not a percentage, and the sum of
weights does not need to equal 100.
If a weight is set for any targetModel, it must be set for all targetModels.
Conversely, weights may be omitted as long as no targetModel specifies one.
Maximum: 1e+06
Minimum: 1
modelRewrite string
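
The weight formula can be worked through with a short sketch that computes weight/(sum of all weights) as exact fractions. The function name `traffic_shares` and the model names are hypothetical. The sketch covers only targets that all set a weight, because this page does not say how traffic is split when weights are omitted.

```python
from fractions import Fraction

# Bounds copied from the Validation column above.
WEIGHT_MIN = 1
WEIGHT_MAX = 1_000_000


def traffic_shares(targets: list) -> list:
    """Return (modelRewrite, share) pairs, where share = weight / sum of weights.

    Hypothetical helper; handles only the case where every target sets a weight.
    """
    if not targets:
        raise ValueError("targets must contain at least one item")
    weights = [target.get("weight") for target in targets]
    if any(weight is None for weight in weights):
        # Mixed set/unset weights are invalid; the all-unset split is unspecified here.
        raise ValueError("this sketch requires a weight on every target")
    for weight in weights:
        if not WEIGHT_MIN <= weight <= WEIGHT_MAX:
            raise ValueError(f"weight {weight} is outside [{WEIGHT_MIN}, {WEIGHT_MAX}]")
    total = sum(weights)
    return [(t["modelRewrite"], Fraction(t["weight"], total)) for t in targets]


if __name__ == "__main__":
    shares = traffic_shares([
        {"modelRewrite": "model-a", "weight": 3},
        {"modelRewrite": "model-b", "weight": 1},
    ])
    for name, share in shares:
        print(name, share)
```

With weights 3 and 1, the shares are 3/4 and 1/4; the weights do not need to sum to 100.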