Run an LLM inference

Note: This endpoint is in Preview and is subject to change. If you have any feedback, contact Datadog support.

POST https://api.ap1.datadoghq.com/api/v2/llm-obs/v1/integrations/{integration}/{account_id}/inference
POST https://api.ap2.datadoghq.com/api/v2/llm-obs/v1/integrations/{integration}/{account_id}/inference
POST https://api.datadoghq.eu/api/v2/llm-obs/v1/integrations/{integration}/{account_id}/inference
POST https://api.ddog-gov.com/api/v2/llm-obs/v1/integrations/{integration}/{account_id}/inference
POST https://api.us2.ddog-gov.com/api/v2/llm-obs/v1/integrations/{integration}/{account_id}/inference
POST https://api.datadoghq.com/api/v2/llm-obs/v1/integrations/{integration}/{account_id}/inference
POST https://api.us3.datadoghq.com/api/v2/llm-obs/v1/integrations/{integration}/{account_id}/inference
POST https://api.us5.datadoghq.com/api/v2/llm-obs/v1/integrations/{integration}/{account_id}/inference

Overview

Run an LLM inference request through the specified integration and account, returning the model response and token usage.

Arguments

Path Parameters

Name

Type

Description

integration [required]

string

The name of the LLM integration.

account_id [required]

string

The ID of the integration account.
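
As a rough illustration of how the two path parameters fit into the endpoint, the sketch below assembles the URL for one site. The helper name `inference_url` and the choice of `api.datadoghq.com` as the default host are assumptions made for this example; the path template and hostnames come from the endpoint list above.

```python
from urllib.parse import quote

# One of the site hostnames listed for this endpoint; swap in the host for your site.
DEFAULT_SITE = "api.datadoghq.com"


def inference_url(integration: str, account_id: str, site: str = DEFAULT_SITE) -> str:
    """Build the inference endpoint URL (hypothetical helper, not part of any SDK)."""
    # Percent-encode each path segment so characters such as "/" cannot change the route.
    return (
        f"https://{site}/api/v2/llm-obs/v1/integrations/"
        f"{quote(integration, safe='')}/{quote(account_id, safe='')}/inference"
    )


print(inference_url("openai", "account-abc123"))
```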

Request

Body Data (required)

Inference request parameters.

Field

Type

Description

anthropic_metadata

object

Anthropic-specific metadata for an inference request.

effort

enum

The effort level for Anthropic inference. Allowed enum values: low, medium, high, max

thinking

object

Configuration for Anthropic extended thinking feature.

budget_tokens

int64

Maximum token budget for extended thinking. Required when type is enabled.

type [required]

enum

The thinking mode for Anthropic extended thinking. Allowed enum values: enabled, disabled, adaptive

azure_openai_metadata

object

Azure OpenAI-specific metadata for an integration account or inference request.

deployment_id

string

The Azure OpenAI deployment ID.

model_version

string

The model version deployed in Azure.

resource_name

string

The Azure OpenAI resource name.

bedrock_metadata

object

Amazon Bedrock-specific metadata for an inference request.

region

string

The AWS region for the Bedrock request.

frequency_penalty

double

Penalty for token frequency to reduce repetition.

json_schema

string

JSON schema for structured output, if supported by the model.

max_completion_tokens

int64

Maximum number of completion tokens to generate (alternative to max_tokens for some providers).

max_tokens

int64

Maximum number of tokens to generate.

messages [required]

[object]

List of messages in an inference conversation.

content

string

Plain text content of the message.

contents

[object]

List of structured content blocks in a message.

type [required]

string

The content block type.

value [required]

object

The typed value of a message content block.

text

string

Plain text content.

tool_call

object

A tool call made during LLM inference.

arguments

object

The arguments passed to the tool.

name

string

The name of the tool being called.

tool_id

string

Unique identifier for the tool call.

type

string

The type of tool call.

tool_call_result

object

The result returned by a tool call during LLM inference.

name

string

The name of the tool that produced this result.

result

string

The result content returned by the tool.

tool_id

string

Identifier matching the corresponding tool call.

type

string

The type of tool result.

id

string

Unique identifier for the message.

role

string

The role of the message author.

tool_calls

[object]

List of tool calls in a message.

arguments

object

The arguments passed to the tool.

name

string

The name of the tool being called.

tool_id

string

Unique identifier for the tool call.

type

string

The type of tool call.

tool_results

[object]

List of tool results in a message.

name

string

The name of the tool that produced this result.

result

string

The result content returned by the tool.

tool_id

string

Identifier matching the corresponding tool call.

type

string

The type of tool result.

model_id [required]

string

The model identifier to use for inference.

openai_metadata

object

OpenAI-specific metadata for an inference request.

reasoning_effort

enum

The reasoning effort level for OpenAI models that support it. Allowed enum values: none, low, medium, high, xhigh

reasoning_summary

enum

The verbosity of the reasoning summary. Allowed enum values: auto, concise, detailed

presence_penalty

double

Penalty for token presence to encourage topic diversity.

temperature

double

Sampling temperature between 0 and 2. Higher values produce more random output.

tools

[object]

List of tools available to the model.

function [required]

object

A function definition for a tool available to the model.

description

string

A description of what the function does.

name [required]

string

The name of the function.

parameters [required]

object

JSON schema describing the function parameters.

type [required]

string

The type of tool.

top_k

int64

Top-K sampling parameter.

top_p

double

Nucleus sampling probability mass.

vertex_ai_metadata

object

Vertex AI-specific metadata for an integration account or inference request.

location

string

The Vertex AI region.

project

string

The Google Cloud project ID.

project_ids

[string]

List of Google Cloud project IDs available to the service account.

{
  "anthropic_metadata": {
    "effort": "medium",
    "thinking": {
      "budget_tokens": 1024,
      "type": "enabled"
    }
  },
  "azure_openai_metadata": {
    "deployment_id": "my-gpt4-deployment",
    "model_version": "0613",
    "resource_name": "my-azure-resource"
  },
  "bedrock_metadata": {
    "region": "us-east-1"
  },
  "frequency_penalty": 0,
  "json_schema": "{\"type\":\"object\",\"properties\":{\"answer\":{\"type\":\"string\"}}}",
  "max_completion_tokens": 1024,
  "max_tokens": 1024,
  "messages": [
    {
      "content": "What is the capital of France?",
      "contents": [
        {
          "type": "text",
          "value": {
            "text": "Hello, how can I help you?",
            "tool_call": {
              "arguments": {
                "location": "San Francisco"
              },
              "name": "get_weather",
              "tool_id": "call_abc123",
              "type": "function"
            },
            "tool_call_result": {
              "name": "get_weather",
              "result": "The weather in San Francisco is 68°F and sunny.",
              "tool_id": "call_abc123",
              "type": "function"
            }
          }
        }
      ],
      "id": "msg_001",
      "role": "user",
      "tool_calls": [
        {
          "arguments": {
            "location": "San Francisco"
          },
          "name": "get_weather",
          "tool_id": "call_abc123",
          "type": "function"
        }
      ],
      "tool_results": [
        {
          "name": "get_weather",
          "result": "The weather in San Francisco is 68°F and sunny.",
          "tool_id": "call_abc123",
          "type": "function"
        }
      ]
    }
  ],
  "model_id": "gpt-4o",
  "openai_metadata": {
    "reasoning_effort": "medium",
    "reasoning_summary": "auto"
  },
  "presence_penalty": 0,
  "temperature": 0.7,
  "tools": [
    {
      "function": {
        "description": "Get the current weather for a location.",
        "name": "get_weather",
        "parameters": {
          "properties": {
            "location": {
              "type": "string"
            }
          },
          "type": "object"
        }
      },
      "type": "function"
    }
  ],
  "top_k": 50,
  "top_p": 1,
  "vertex_ai_metadata": {
    "location": "us-central1",
    "project": "my-gcp-project",
    "project_ids": [
      "my-gcp-project"
    ]
  }
}
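
The example above fills in every field for illustration. A request only needs `messages` and `model_id`, so the following sketch builds a smaller body and applies two constraints stated in the field table: `temperature` lies between 0 and 2, and `budget_tokens` is required when the thinking `type` is `enabled`. The function name `build_inference_body` and its parameters are hypothetical, and the client-side checks are an assumption about what is worth validating before sending, not a description of server behavior.

```python
import json
from typing import Any, Dict, Optional

THINKING_TYPES = {"enabled", "disabled", "adaptive"}


def build_inference_body(
    model_id: str,
    prompt: str,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    thinking_type: Optional[str] = None,
    budget_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a request body with the required fields plus a few optional ones."""
    # messages and model_id are the only required top-level fields.
    body: Dict[str, Any] = {
        "model_id": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if thinking_type is not None:
        if thinking_type not in THINKING_TYPES:
            raise ValueError(f"unknown thinking type: {thinking_type}")
        if thinking_type == "enabled" and budget_tokens is None:
            raise ValueError("budget_tokens is required when type is enabled")
        thinking: Dict[str, Any] = {"type": thinking_type}
        if budget_tokens is not None:
            thinking["budget_tokens"] = budget_tokens
        body["anthropic_metadata"] = {"thinking": thinking}
    return body


body = build_inference_body(
    "gpt-4o", "What is the capital of France?", max_tokens=256, temperature=0.7
)
print(json.dumps(body, indent=2))
```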

Response

OK

The result of an LLM inference request, including input parameters and the model response.

Field

Type

Description

anthropic_metadata

object

Anthropic-specific metadata for an inference request.

effort

enum

The effort level for Anthropic inference. Allowed enum values: low, medium, high, max

thinking

object

Configuration for Anthropic extended thinking feature.

budget_tokens

int64

Maximum token budget for extended thinking. Required when type is enabled.

type [required]

enum

The thinking mode for Anthropic extended thinking. Allowed enum values: enabled, disabled, adaptive

azure_openai_metadata

object

Azure OpenAI-specific metadata for an integration account or inference request.

deployment_id

string

The Azure OpenAI deployment ID.

model_version

string

The model version deployed in Azure.

resource_name

string

The Azure OpenAI resource name.

bedrock_metadata

object

Amazon Bedrock-specific metadata for an inference request.

region

string

The AWS region for the Bedrock request.

error_response

object

Error details returned when an inference provider returns an error.

message [required]

string

A human-readable description of the error.

type [required]

string

The provider-specific error type.

frequency_penalty

double

Frequency penalty that was applied.

json_schema

string

JSON schema that was applied for structured output.

max_completion_tokens

int64

Maximum number of completion tokens that were configured.

max_tokens

int64

Maximum number of tokens that were configured.

messages [required]

[object]

List of messages in an inference conversation.

content

string

Plain text content of the message.

contents

[object]

List of structured content blocks in a message.

type [required]

string

The content block type.

value [required]

object

The typed value of a message content block.

text

string

Plain text content.

tool_call

object

A tool call made during LLM inference.

arguments

object

The arguments passed to the tool.

name

string

The name of the tool being called.

tool_id

string

Unique identifier for the tool call.

type

string

The type of tool call.

tool_call_result

object

The result returned by a tool call during LLM inference.

name

string

The name of the tool that produced this result.

result

string

The result content returned by the tool.

tool_id

string

Identifier matching the corresponding tool call.

type

string

The type of tool result.

id

string

Unique identifier for the message.

role

string

The role of the message author.

tool_calls

[object]

List of tool calls in a message.

arguments

object

The arguments passed to the tool.

name

string

The name of the tool being called.

tool_id

string

Unique identifier for the tool call.

type

string

The type of tool call.

tool_results

[object]

List of tool results in a message.

name

string

The name of the tool that produced this result.

result

string

The result content returned by the tool.

tool_id

string

Identifier matching the corresponding tool call.

type

string

The type of tool result.

model_id [required]

string

The model identifier used for inference.

openai_metadata

object

OpenAI-specific metadata for an inference request.

reasoning_effort

enum

The reasoning effort level for OpenAI models that support it. Allowed enum values: none, low, medium, high, xhigh

reasoning_summary

enum

The verbosity of the reasoning summary. Allowed enum values: auto, concise, detailed

presence_penalty

double

Presence penalty that was applied.

response [required]

object

The output of a completed LLM inference call.

assessment [required]

string

An optional assessment of the inference output quality.

content [required]

string

The text content of the model response.

finish_reason [required]

string

The reason the model stopped generating tokens.

inference_codes [required]

[object]

List of generated code snippets for the inference configuration.

code [required]

string

The generated code content.

id [required]

string

Unique identifier for the code snippet.

type [required]

string

The programming language or SDK type of the code snippet.

input_tokens [required]

int64

Number of input tokens consumed.

internal_reasoning

object

The model's internal reasoning or thinking output, if available.

reasoning_tokens

int64

Number of tokens used for internal reasoning.

text [required]

string

The reasoning text produced by the model.

latency [required]

int64

Request latency in milliseconds.

output_tokens [required]

int64

Number of output tokens generated.

tools [required]

[object]

List of tools available to the model.

function [required]

object

A function definition for a tool available to the model.

description

string

A description of what the function does.

name [required]

string

The name of the function.

parameters [required]

object

JSON schema describing the function parameters.

type [required]

string

The type of tool.

total_tokens [required]

int64

Total tokens used (input plus output).

temperature

double

Sampling temperature that was used.

tools

[object]

List of tools available to the model.

function [required]

object

A function definition for a tool available to the model.

description

string

A description of what the function does.

name [required]

string

The name of the function.

parameters [required]

object

JSON schema describing the function parameters.

type [required]

string

The type of tool.

top_k

int64

Top-K sampling parameter that was used.

top_p

double

Nucleus sampling parameter that was used.

vertex_ai_metadata

object

Vertex AI-specific metadata for an integration account or inference request.

location

string

The Vertex AI region.

project

string

The Google Cloud project ID.

project_ids

[string]

List of Google Cloud project IDs available to the service account.

{
  "anthropic_metadata": {
    "effort": "medium",
    "thinking": {
      "budget_tokens": 1024,
      "type": "enabled"
    }
  },
  "azure_openai_metadata": {
    "deployment_id": "my-gpt4-deployment",
    "model_version": "0613",
    "resource_name": "my-azure-resource"
  },
  "bedrock_metadata": {
    "region": "us-east-1"
  },
  "error_response": {
    "message": "The model does not exist.",
    "type": "invalid_request_error"
  },
  "frequency_penalty": 0,
  "json_schema": "{\"type\":\"object\",\"properties\":{\"answer\":{\"type\":\"string\"}}}",
  "max_completion_tokens": 1024,
  "max_tokens": 1024,
  "messages": [
    {
      "content": "What is the capital of France?",
      "contents": [
        {
          "type": "text",
          "value": {
            "text": "Hello, how can I help you?",
            "tool_call": {
              "arguments": {
                "location": "San Francisco"
              },
              "name": "get_weather",
              "tool_id": "call_abc123",
              "type": "function"
            },
            "tool_call_result": {
              "name": "get_weather",
              "result": "The weather in San Francisco is 68°F and sunny.",
              "tool_id": "call_abc123",
              "type": "function"
            }
          }
        }
      ],
      "id": "msg_001",
      "role": "user",
      "tool_calls": [
        {
          "arguments": {
            "location": "San Francisco"
          },
          "name": "get_weather",
          "tool_id": "call_abc123",
          "type": "function"
        }
      ],
      "tool_results": [
        {
          "name": "get_weather",
          "result": "The weather in San Francisco is 68°F and sunny.",
          "tool_id": "call_abc123",
          "type": "function"
        }
      ]
    }
  ],
  "model_id": "gpt-4o",
  "openai_metadata": {
    "reasoning_effort": "medium",
    "reasoning_summary": "auto"
  },
  "presence_penalty": 0,
  "response": {
    "assessment": "pass",
    "content": "The capital of France is Paris.",
    "finish_reason": "stop",
    "inference_codes": [
      {
        "code": "import openai\nclient = openai.OpenAI()\n...",
        "id": "code-python-001",
        "type": "python"
      }
    ],
    "input_tokens": 15,
    "internal_reasoning": {
      "reasoning_tokens": 256,
      "text": "Let me think about this step by step..."
    },
    "latency": 843,
    "output_tokens": 10,
    "tools": [
      {
        "function": {
          "description": "Get the current weather for a location.",
          "name": "get_weather",
          "parameters": {
            "properties": {
              "location": {
                "type": "string"
              }
            },
            "type": "object"
          }
        },
        "type": "function"
      }
    ],
    "total_tokens": 25
  },
  "temperature": 0.7,
  "tools": [
    {
      "function": {
        "description": "Get the current weather for a location.",
        "name": "get_weather",
        "parameters": {
          "properties": {
            "location": {
              "type": "string"
            }
          },
          "type": "object"
        }
      },
      "type": "function"
    }
  ],
  "top_k": 50,
  "top_p": 1,
  "vertex_ai_metadata": {
    "location": "us-central1",
    "project": "my-gcp-project",
    "project_ids": [
      "my-gcp-project"
    ]
  }
}
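
To show where the model output and token usage sit in a successful result, here is a minimal sketch that reads the `response` object from a decoded JSON payload. It assumes that a populated `error_response` means the provider call failed and that the payload has already been parsed into a `dict`; the names `summarize_inference` and `ProviderError` are hypothetical.

```python
from typing import Any, Dict


class ProviderError(Exception):
    """Raised when the payload carries an error_response (hypothetical helper type)."""


def summarize_inference(result: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the model text and token usage out of a decoded inference result."""
    # Assumption: error_response is only populated when the provider returned an error.
    error = result.get("error_response")
    if error:
        raise ProviderError(f"{error['type']}: {error['message']}")
    response = result["response"]
    return {
        "content": response["content"],
        "finish_reason": response["finish_reason"],
        "input_tokens": response["input_tokens"],
        "output_tokens": response["output_tokens"],
        "total_tokens": response["total_tokens"],
        "latency_ms": response["latency"],  # documented as milliseconds
    }


# Trimmed copy of the example payload above, without error_response.
sample = {
    "model_id": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "response": {
        "content": "The capital of France is Paris.",
        "finish_reason": "stop",
        "input_tokens": 15,
        "output_tokens": 10,
        "total_tokens": 25,
        "latency": 843,
    },
}
print(summarize_inference(sample))
```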

Bad Request

API error response.

Field

Type

Description

errors [required]

[object]

A list of errors.

detail

string

A human-readable explanation specific to this occurrence of the error.

meta

object

Non-standard meta-information about the error.

source

object

References to the source of the error.

header

string

A string indicating the name of a single request header which caused the error.

parameter

string

A string indicating which URI query parameter caused the error.

pointer

string

A JSON pointer to the value in the request document that caused the error.

status

string

Status code of the response.

title

string

Short human-readable summary of the error.

{
  "errors": [
    {
      "detail": "Missing required attribute in body",
      "meta": {},
      "source": {
        "header": "Authorization",
        "parameter": "limit",
        "pointer": "/data/attributes/title"
      },
      "status": "400",
      "title": "Bad Request"
    }
  ]
}

Unauthorized

API error response.

Field

Type

Description

errors [required]

[object]

A list of errors.

detail

string

A human-readable explanation specific to this occurrence of the error.

meta

object

Non-standard meta-information about the error.

source

object

References to the source of the error.

header

string

A string indicating the name of a single request header which caused the error.

parameter

string

A string indicating which URI query parameter caused the error.

pointer

string

A JSON pointer to the value in the request document that caused the error.

status

string

Status code of the response.

title

string

Short human-readable summary of the error.

{
  "errors": [
    {
      "detail": "Missing required attribute in body",
      "meta": {},
      "source": {
        "header": "Authorization",
        "parameter": "limit",
        "pointer": "/data/attributes/title"
      },
      "status": "400",
      "title": "Bad Request"
    }
  ]
}

Forbidden

API error response.

Field

Type

Description

errors [required]

[object]

A list of errors.

detail

string

A human-readable explanation specific to this occurrence of the error.

meta

object

Non-standard meta-information about the error.

source

object

References to the source of the error.

header

string

A string indicating the name of a single request header which caused the error.

parameter

string

A string indicating which URI query parameter caused the error.

pointer

string

A JSON pointer to the value in the request document that caused the error.

status

string

Status code of the response.

title

string

Short human-readable summary of the error.

{
  "errors": [
    {
      "detail": "Missing required attribute in body",
      "meta": {},
      "source": {
        "header": "Authorization",
        "parameter": "limit",
        "pointer": "/data/attributes/title"
      },
      "status": "400",
      "title": "Bad Request"
    }
  ]
}

Too many requests

API error response.

Field

Type

Description

errors [required]

[string]

A list of errors.

{
  "errors": [
    "Bad Request"
  ]
}

Internal Server Error

API error response.

Field

Type

Description

errors [required]

[object]

A list of errors.

detail

string

A human-readable explanation specific to this occurrence of the error.

meta

object

Non-standard meta-information about the error.

source

object

References to the source of the error.

header

string

A string indicating the name of a single request header which caused the error.

parameter

string

A string indicating which URI query parameter caused the error.

pointer

string

A JSON pointer to the value in the request document that caused the error.

status

string

Status code of the response.

title

string

Short human-readable summary of the error.

{
  "errors": [
    {
      "detail": "Missing required attribute in body",
      "meta": {},
      "source": {
        "header": "Authorization",
        "parameter": "limit",
        "pointer": "/data/attributes/title"
      },
      "status": "400",
      "title": "Bad Request"
    }
  ]
}
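
The error bodies documented above come in two shapes: most statuses return `errors` as a list of objects, while Too many requests returns a list of strings. A client may want to flatten both into plain messages. The sketch below is one way to do that; the function name `error_messages` and the "title: detail" output format are choices made for this example.

```python
from typing import Any, Dict, List


def error_messages(payload: Dict[str, Any]) -> List[str]:
    """Flatten either documented error shape into a list of readable strings."""
    messages: List[str] = []
    for err in payload.get("errors", []):
        if isinstance(err, str):
            # Shape used by the Too many requests response: a bare string.
            messages.append(err)
        else:
            # Object shape: title and detail are both optional strings.
            parts = [err.get("title", ""), err.get("detail", "")]
            messages.append(": ".join(p for p in parts if p))
    return messages


print(error_messages({"errors": ["Bad Request"]}))
print(error_messages({"errors": [{"title": "Bad Request", "detail": "Missing required attribute in body"}]}))
```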

Code Example

# Path parameters
export integration="openai"
export account_id="account-abc123"
# Curl command
# Replace the host with the API host for your Datadog site (see the endpoint list above).
curl -X POST "https://api.datadoghq.com/api/v2/llm-obs/v1/integrations/${integration}/${account_id}/inference" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-H "DD-API-KEY: ${DD_API_KEY}" \
-H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
-d @- << EOF
{
  "max_tokens": 256,
  "messages": [
    {
      "content": "What is the capital of France?",
      "role": "user"
    }
  ],
  "model_id": "gpt-4o",
  "temperature": 0.7
}
EOF
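
For readers who prefer Python, the same call can be sketched with the standard library. This is an approximation of the curl command, not an official client: `make_inference_request` is a hypothetical helper, the host defaults to `api.datadoghq.com`, and the keys are read from the `DD_API_KEY` and `DD_APP_KEY` environment variables as in the shell example. The request is only constructed here; the commented lines show how it could be sent.

```python
import json
import os
import urllib.request
from typing import Any, Dict


def make_inference_request(
    integration: str,
    account_id: str,
    body: Dict[str, Any],
    site: str = "api.datadoghq.com",
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the inference endpoint."""
    url = f"https://{site}/api/v2/llm-obs/v1/integrations/{integration}/{account_id}/inference"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            # Same headers as the curl example; empty strings if the variables are unset.
            "DD-API-KEY": os.environ.get("DD_API_KEY", ""),
            "DD-APPLICATION-KEY": os.environ.get("DD_APP_KEY", ""),
        },
    )


request = make_inference_request(
    "openai",
    "account-abc123",
    {
        "max_tokens": 256,
        "messages": [{"content": "What is the capital of France?", "role": "user"}],
        "model_id": "gpt-4o",
        "temperature": 0.7,
    },
)
print(request.get_method(), request.full_url)

# To send the request and decode the JSON result:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```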