LLM Observability

This dataset represents LLM Observability events collected by Datadog. It provides per-span visibility into LLM applications, including request/response payloads, token usage, costs, evaluation outcomes, and experiment metadata. This enables analysis of LLM performance, quality, and cost across projects, experiments, models, and applications.

Table name: `dd.llm_observability`

Documentation:

- LLM Observability Public Documentation
- Monitor LLM Public Documentation
- LLM Observability Experiments Public Documentation

Query Parameters

This dataset is exposed as a polymorphic table function: parameters are supplied at query time, and the `columns` parameter is required.

| Parameter | Type | Required | Description |
|---|---|---|---|
| columns | array<string> | Yes | Fields to return for each LLM span (e.g., 'timestamp', '@ml_app', '@metrics.total_tokens'). |
| scope | string | No | Constrains which LLM telemetry scope is queried (e.g., scope => 'experiments' or scope => 'production'). |
| event_type | string | No | Selects the underlying telemetry event type (e.g., event_type => 'span' or event_type => 'evaluation'). |
| filter | string | No | EVP search string, for example filter => '@ml_app:some_app_name AND @status:error'. |
| from_timestamp | string | No | Lower time bound for the query; defaults to the query context if omitted. |
| to_timestamp | string | No | Upper time bound for the query; defaults to the query context if omitted. |

Example Queries

-- Analyze token usage and duration for Anthropic spans
SELECT * FROM dd.llm_observability(
  columns => ARRAY[
    'discovery_timestamp',
    '@ml_app',
    '@name',
    '@status',
    '@meta.model_name',
    '@meta.model_provider',
    '@metrics.input_tokens',
    '@metrics.output_tokens',
    '@metrics.total_tokens',
    '@duration'
  ],
  event_type => 'span',
  filter => '@meta.model_provider:anthropic'
) AS (
  ts TIMESTAMP,
  ml_app VARCHAR,
  span_name VARCHAR,
  span_status VARCHAR,
  model_name VARCHAR,
  model_provider VARCHAR,
  input_tokens BIGINT,
  output_tokens BIGINT,
  total_tokens BIGINT,
  duration BIGINT
);

-- Analyze evaluation metrics for an experiment
SELECT * FROM dd.llm_observability(
  columns => ARRAY[
    'discovery_timestamp',
    '@span_id',
    '@meta.input.job_title',
    '@meta.output.persona',
    'dataset_record_id',
    '@evaluation.external.exact_match.value',
    'experiment_id'
  ],
  scope => 'experiments',
  event_type => 'span',
  filter => 'experiment_id:some_experiment_id',
  from_timestamp => TIMESTAMP '2026-01-01 00:00:00.000+00:00',
  to_timestamp   => TIMESTAMP '2026-01-05 00:00:00.000+00:00'
) AS (
  ts TIMESTAMP,
  span_id VARCHAR,
  job_title VARCHAR,
  persona VARCHAR,
  dataset_record_id VARCHAR,
  exact_match BOOLEAN,
  experiment_id VARCHAR
);
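
Because the function returns an ordinary relation, standard SQL aggregation can be layered on top of it. The following is a sketch, not a verified query: the column selection mirrors the field list below, the aliases are illustrative, and the unit of the estimated-cost fields is whatever the dataset returns (the field table does not specify it).

```sql
-- Sketch: aggregate token usage and estimated cost per model and provider.
-- Aliases and ordering are illustrative choices, not part of the dataset.
SELECT
  model_provider,
  model_name,
  COUNT(*)                  AS span_count,
  SUM(total_tokens)         AS total_tokens,
  SUM(estimated_total_cost) AS estimated_total_cost
FROM dd.llm_observability(
  columns => ARRAY[
    '@meta.model_provider',
    '@meta.model_name',
    '@metrics.total_tokens',
    '@metrics.estimated_total_cost'
  ],
  event_type => 'span'
) AS (
  model_provider VARCHAR,
  model_name VARCHAR,
  total_tokens BIGINT,
  estimated_total_cost BIGINT
)
GROUP BY model_provider, model_name
ORDER BY estimated_total_cost DESC;
```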

Fields

| Title | ID | Type | Data Type | Description |
|---|---|---|---|---|
| Timestamp | timestamp | core | timestamp | The time when the event occurred, as reported by the source (milliseconds since Unix epoch). |
| Source | source | core | string | Source of the event (e.g., integration, k9-saist). Applies to: span, evaluation. |
| Event Status | status | core | string | Top-level event status (e.g., info, ok). Applies to: span, evaluation. |
| Environment | env | core | string | Environment associated with the event (e.g., staging, prod). Applies to: span. |
| Service | service | core | string | Service associated with the event (e.g., nlq_translation, hallucination_demo). Applies to: span. |
| Org ID | org_id | core | int64 | Organization identifier associated with this event when tagged (e.g., 2). Applies to: span, evaluation. |
| Trace ID | @trace_id | event_attribute | string | Unique identifier of the trace this span belongs to (e.g., '6045057188986015289'). Applies to: span, evaluation. |
| Span ID | @span_id | event_attribute | string | Unique identifier for the LLM span (for span events: the span itself; for evaluation events: the evaluated span). Applies to: span, evaluation. |
| Parent Span ID | parent_id | core | string | Identifier of the parent span when present (e.g., 'undefined'). Applies to: span. |
| Event Type | @event_type | event_attribute | string | Type of event within LLM Observability ('span' or 'evaluation'). |
| ML App | @ml_app | event_attribute | string | Name of the LLM/ML application emitting the span (e.g., assistant_evaluation). Applies to: span. |
| Span Name | @name | event_attribute | string | Logical span name, often mapped to an experiment or evaluation ID (e.g., 'fetch_one'). Applies to: span. |
| Span Start Time (ns) | @start_ns | event_attribute | int64 | Start time of the LLM span in nanoseconds since epoch (e.g., 1762595357539309600). Applies to: span. |
| Span Duration | @duration | event_attribute | int64 | Duration of the LLM span (e.g., 106438139). Applies to: span. |
| Metric Source | @metric_source | event_attribute | string | Origin of the metric or signal within LLM Observability (e.g., custom, summary). Applies to: evaluation. |
| Label | @label | event_attribute | string | Label or name of the evaluation metric (e.g., 'recall_at_k', 'theoretical_row_recall_average_pairs'). Applies to: evaluation. |
| Language | language | core | string | Language associated with the span (e.g., 'python', 'go'). Applies to: span. |
| Input Tokens | @metrics.input_tokens | event_attribute | int64 | Number of input tokens for the LLM request (e.g., 24600). Applies to: span. |
| Non-Cached Input Tokens | @metrics.non_cached_input_tokens | event_attribute | int64 | Number of input tokens that were not served from cache (e.g., 19000). Applies to: span. |
| Output Tokens | @metrics.output_tokens | event_attribute | int64 | Number of tokens in the LLM response (e.g., 150). Applies to: span. |
| Total Tokens | @metrics.total_tokens | event_attribute | int64 | Total tokens accounted for the span. Applies to: span. |
| Estimated Input Cost | @metrics.estimated_input_cost | event_attribute | int64 | Estimated cost attributed to input tokens (e.g., 12637000). Applies to: span. |
| Estimated Output Cost | @metrics.estimated_output_cost | event_attribute | int64 | Estimated cost attributed to output tokens. Applies to: span. |
| Estimated Non-Cached Input Cost | @metrics.estimated_non_cached_input_cost | event_attribute | int64 | Estimated cost attributed to non-cached input tokens. Applies to: span. |
| Estimated Total Cost | @metrics.estimated_total_cost | event_attribute | int64 | Estimated total LLM cost for this span. Applies to: span. |
| Cache Read Input Tokens | @metrics.cache_read_input_tokens | event_attribute | int64 | Number of input tokens read from cache. Applies to: span. |
| Cache Write Input Tokens | @metrics.cache_write_input_tokens | event_attribute | int64 | Number of input tokens written to cache. Applies to: span. |
| Estimated Cache Read Input Cost | @metrics.estimated_cache_read_input_cost | event_attribute | int64 | Estimated cost attributed to cached input token reads. Applies to: span. |
| Estimated Cache Write Input Cost | @metrics.estimated_cache_write_input_cost | event_attribute | int64 | Estimated cost attributed to cached input token writes. Applies to: span. |
| Num Evaluations Failed | @metrics.num_evaluations_failed | event_attribute | int64 | Number of managed/custom evaluations that failed on this span. Applies to: span. |
| Num Evaluations Passed | @metrics.num_evaluations_passed | event_attribute | int64 | Number of managed/custom evaluations that passed on this span. Applies to: span. |
| Num Evaluations Without Assessment | @metrics.num_evaluations_without_assessment | event_attribute | int64 | Number of evaluations executed without an assessment result. Applies to: span. |
| Model Name | @meta.model_name | event_attribute | string | Name of the model used for this span (e.g., gpt-5, gpt-4.1). Applies to: span. |
| Model Provider | @meta.model_provider | event_attribute | string | Provider of the model (e.g., openai, anthropic). Applies to: span. |
| Span Kind | @meta.span.kind | event_attribute | string | Kind of LLM span (e.g., llm, embedding, agent, workflow). Applies to: span. |
| UI Title | @meta.metadata.ui_title | event_attribute | string | Human-readable title used for UI rendering (e.g., 'Generated widget'). Applies to: span. |
| UI Content | @meta.metadata.ui_content | event_attribute | string | UI-renderable content associated with this span (e.g., Data source: `metrics` Query: `sum:dd.services.pods{*}`). Applies to: span. |
| Input Value | @meta.input.value | event_attribute | string | Primary input value to the LLM or evaluation (e.g., 'Hello. I need help'). Applies to: span. |
| Output Value | @meta.output.value | event_attribute | string | Primary output value from the LLM or evaluation (e.g., 'Hello! How can I help...'). Applies to: span. |
| Prompt ID | @meta.input.prompt.id | event_attribute | string | Identifier of the prompt used (e.g., generate_answer_prompt). Applies to: span. |
| Prompt Template | @meta.input.prompt.template | event_attribute | string | Prompt template string used for rendering the final prompt. Applies to: span. |
| Prompt Template ID | @meta.input.prompt.template_id | event_attribute | string | Stable identifier/hash for the template. Applies to: span. |
| Prompt Version ID | @meta.input.prompt.version_id | event_attribute | string | Version identifier/hash for the prompt instance. Applies to: span. |
| Evaluation Source Type | @eval_source_type | event_attribute | string | Source of the evaluation (e.g., external). Applies to: evaluation. |
| Evaluation Metric Type | @eval_metric_type | event_attribute | string | Type of evaluation metric used (e.g., 'score', 'categorical'). Applies to: evaluation. |
| Score Value | @score_value | event_attribute | float64 | Numeric score value for 'score' metrics (e.g., 0.79). Applies to: evaluation. |
| Failure to Answer Error Message | @evaluation.managed.failure_to_answer.error.message | event_attribute | string | Error message emitted by the managed 'failure_to_answer' evaluation (e.g., 'Not a root span - This happens in case the eval is configured...'). Applies to: span. |
| Failure to Answer Error Type | @evaluation.managed.failure_to_answer.error.type | event_attribute | string | Error type emitted by the managed 'failure_to_answer' evaluation (e.g., Azure OpenAI - Received HTTP 429). Applies to: span. |
| Failure to Answer Status | @evaluation.managed.failure_to_answer.status | event_attribute | string | Status emitted by the managed 'failure_to_answer' evaluation (e.g., WARN). Applies to: span. |
| Goal Completeness Error Message | @evaluation.managed.goal_completeness.error.message | event_attribute | string | Error message emitted by the managed 'goal_completeness' evaluation. Applies to: span. |
| Goal Completeness Error Type | @evaluation.managed.goal_completeness.error.type | event_attribute | string | Error type emitted by the managed 'goal_completeness' evaluation. Applies to: span. |
| Goal Completeness Status | @evaluation.managed.goal_completeness.status | event_attribute | string | Status emitted by the managed 'goal_completeness' evaluation. Applies to: span. |
| Hallucination Status | @evaluation.managed.hallucination.status | event_attribute | string | Status emitted by the managed hallucination/faithfulness evaluation. Applies to: span. |
| Hallucination Value | @evaluation.managed.hallucination.value | event_attribute | string | Value emitted by the managed hallucination/faithfulness evaluation (e.g., 'hallucination found'). Applies to: span. |
| Hallucination Score Value | @evaluation.managed.hallucination.score_value | event_attribute | float64 | Score emitted by the managed hallucination/faithfulness evaluation. Applies to: span. |
| Hallucination Eval Metric Type | @evaluation.managed.hallucination.eval_metric_type | event_attribute | string | Metric type emitted by the managed hallucination/faithfulness evaluation (e.g., categorical). Applies to: span. |
| Hallucination Categorical Value | @evaluation.managed.hallucination.categorical_value | event_attribute | string | Categorical value emitted by the managed hallucination/faithfulness evaluation. Applies to: span. |
| External Harmfulness Eval Metric Type | @evaluation.external.harmfulness.eval_metric_type | event_attribute | string | Metric type for an external evaluation (e.g., score). Applies to: span. |
| External Harmfulness Value | @evaluation.external.harmfulness.value | event_attribute | float64 | Value for an external evaluation (often equals score_value for score types). Applies to: span. |
| Evaluation Metric ID | @id | event_attribute | string | ID for the evaluation record (UUID). Applies to: evaluation. |
| Session ID | @session_id | event_attribute | string | Identifier used to correlate spans belonging to the same user or agent session. Applies to: span. |
| Billing Usage Attribution Tags | billing_header.usage_attribution_tags | core | array<string> | Tags used for cost attribution and billing purposes (e.g., ml_app:curated_dataset). Applies to: span. |
| Billable | billing_header.billable | core | bool | Whether the event is billable. Applies to: span. |
| Quality Evaluation Results | @meta.evaluations.quality | event_attribute | array<string> | List of quality evaluations that matched content in this span (e.g., ['Hallucination']). Applies to: span. |
| Security Evaluation Results | @meta.evaluations.security | event_attribute | array<string> | List of security scanners that matched content in this span (e.g., ['Standard Email Address','Standard Email Address']). Applies to: span. |
| Evaluation Timestamp | @eval_timestamp | event_attribute | int64 | Timestamp (milliseconds since epoch) at which the evaluation was executed (e.g., 1766403898176). Applies to: evaluation. |
| Metric Type | @metric_type | event_attribute | string | Metric value type for the evaluation (e.g., 'score', 'boolean'). Applies to: evaluation. |
| Error Message | @error.message | event_attribute | string | Error message for failed evaluations (e.g., "'NoneType' object has no attribute 'response'"). Applies to: evaluation. |
| Error Type | @error.type | event_attribute | string | Error type for failed evaluations (e.g., 'AttributeError'). Applies to: evaluation. |
| Error Stack Trace | @error.stack | event_attribute | string | Stack trace for failed evaluations. Applies to: evaluation. |
| Experiment ID | experiment_id | core | string | Identifier of the experiment associated with this evaluation metric. Applies to: span, evaluation. |
| Dataset ID | dataset_id | core | string | Identifier of the dataset associated with this evaluation metric. Applies to: evaluation. |
| Project ID | project_id | core | string | Identifier of the project associated with this evaluation metric (typically from tags). Applies to: evaluation. |
| Event ID | id | core | string | A unique identifier for the event. |
| Discovery Timestamp | discovery_timestamp | core | int64 | The time when Datadog first received the event (milliseconds since Unix epoch). May differ from timestamp if there was an ingestion delay. |
| Tiebreaker | tiebreaker | core | int64 | A value used to establish deterministic ordering among events that share the same timestamp. |
| Ingest Size | ingest_size_in_bytes | core | int64 | The size of the event payload in bytes at the time of ingestion, before any processing. |
| Random Draw | random_draw | core | float64 | A random value between 0.0 and 1.0 assigned at ingestion, useful for consistent sampling across queries. |
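
Several of the fields above apply only to evaluation events (@label, @metric_type, @score_value, @error.*). As a usage sketch, the query below selects them with event_type => 'evaluation'; the field IDs come from the table above, while the filter value is an illustrative assumption (the `:*` wildcard matches events where the attribute is present, per standard Datadog search syntax) and the aliases are arbitrary.

```sql
-- Sketch: inspect evaluations that recorded an error.
-- '@error.type:*' is an illustrative existence filter, not a required form.
SELECT * FROM dd.llm_observability(
  columns => ARRAY[
    'timestamp',
    '@span_id',
    '@label',
    '@metric_type',
    '@score_value',
    '@error.type',
    '@error.message'
  ],
  event_type => 'evaluation',
  filter => '@error.type:*'
) AS (
  ts TIMESTAMP,
  span_id VARCHAR,
  label VARCHAR,
  metric_type VARCHAR,
  score_value DOUBLE,
  error_type VARCHAR,
  error_message VARCHAR
);
```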