LLM Observability

This dataset represents LLM Observability events collected by Datadog. It provides per-span visibility into LLM applications, including request/response payloads, token usage, costs, evaluation outcomes, and experiment metadata. This enables analysis of LLM performance, quality, and cost across projects, experiments, models, and applications.

Table name: `dd.llm_observability`

Documentation:

- LLM Observability Public Documentation
- Monitor LLM Public Documentation
- LLM Observability Experiments Public Documentation

Query Parameters

This dataset is exposed as a polymorphic table function: parameters are supplied at query time, and the `columns` parameter is required.

| Parameter | Type | Required | Description |
|---|---|---|---|
| columns | array<string> | Yes | Fields to return for each LLM span (e.g., 'timestamp', '@ml_app', '@metrics.total_tokens'). |
| scope | string | No | Constrains which LLM telemetry scope is queried (e.g., scope => 'experiments' or scope => 'production'). |
| event_type | string | No | Selects the underlying telemetry event type (e.g., event_type => 'span' or event_type => 'evaluation'). |
| filter | string | No | EVP search string, for example filter => '@ml_app:some_app_name AND @status:error'. |
| from_timestamp | string | No | Lower time bound for the query; defaults to the query context if omitted. |
| to_timestamp | string | No | Upper time bound for the query; defaults to the query context if omitted. |

Example Queries

-- Analyze token usage and duration for Anthropic spans
SELECT * FROM dd.llm_observability(
  columns => ARRAY[
    'discovery_timestamp',
    '@ml_app',
    '@name',
    '@status',
    '@meta.model_name',
    '@meta.model_provider',
    '@metrics.input_tokens',
    '@metrics.output_tokens',
    '@metrics.total_tokens',
    '@duration'
  ],
  event_type => 'span',
  filter => '@meta.model_provider:anthropic'
) AS (
  ts TIMESTAMP,
  ml_app VARCHAR,
  span_name VARCHAR,
  span_status VARCHAR,
  model_name VARCHAR,
  model_provider VARCHAR,
  input_tokens BIGINT,
  output_tokens BIGINT,
  total_tokens BIGINT,
  duration BIGINT
);

-- Analyze evaluation metrics for an experiment
SELECT * FROM dd.llm_observability(
  columns => ARRAY[
    'discovery_timestamp',
    '@span_id',
    '@meta.input.job_title',
    '@meta.output.persona',
    'dataset_record_id',
    '@evaluation.external.exact_match.value',
    'experiment_id'
  ],
  scope => 'experiments',
  event_type => 'span',
  filter => 'experiment_id:some_experiment_id',
  from_timestamp => TIMESTAMP '2026-01-01 00:00:00.000+00:00',
  to_timestamp   => TIMESTAMP '2026-01-05 00:00:00.000+00:00'
) AS (
  ts TIMESTAMP,
  span_id VARCHAR,
  job_title VARCHAR,
  persona VARCHAR,
  dataset_record_id VARCHAR,
  exact_match BOOLEAN,
  experiment_id VARCHAR
);
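
Because the function returns an ordinary relation, standard SQL aggregation can be layered on top of it. The following is a sketch, not a verified query: the column selection mirrors the field list below, the aliases are illustrative, and the unit of the estimated-cost fields is whatever the dataset returns (the field table does not specify it).

```sql
-- Sketch: aggregate token usage and estimated cost per model and provider.
-- Aliases and ordering are illustrative choices, not part of the dataset.
SELECT
  model_provider,
  model_name,
  COUNT(*)                  AS span_count,
  SUM(total_tokens)         AS total_tokens,
  SUM(estimated_total_cost) AS estimated_total_cost
FROM dd.llm_observability(
  columns => ARRAY[
    '@meta.model_provider',
    '@meta.model_name',
    '@metrics.total_tokens',
    '@metrics.estimated_total_cost'
  ],
  event_type => 'span'
) AS (
  model_provider VARCHAR,
  model_name VARCHAR,
  total_tokens BIGINT,
  estimated_total_cost BIGINT
)
GROUP BY model_provider, model_name
ORDER BY estimated_total_cost DESC;
```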

Fields

| Title | ID | Type | Data Type | Description |
|---|---|---|---|---|
| Timestamp | timestamp | core | timestamp | The time when the event occurred, as reported by the source (milliseconds since Unix epoch). |
| Source | source | core | string | Source of the event (e.g., integration, k9-saist). Applies to: span, evaluation. |
| Event Status | status | core | string | Top-level event status (e.g., info, ok). Applies to: span, evaluation. |
| Environment | env | core | string | Environment associated with the event (e.g., staging, prod). Applies to: span. |
| Service | service | core | string | Service associated with the event (e.g., nlq_translation, hallucination_demo). Applies to: span. |
| Org ID | org_id | core | int64 | Organization identifier associated with this event when tagged (e.g., 2). Applies to: span, evaluation. |
| Trace ID | @trace_id | event_attribute | string | Unique identifier of the trace this span belongs to (e.g., '6045057188986015289'). Applies to: span, evaluation. |
| Span ID | @span_id | event_attribute | string | Unique identifier for the LLM span (for span events: the span itself; for evaluation events: the evaluated span). Applies to: span, evaluation. |
| Parent Span ID | parent_id | core | string | Identifier of the parent span when present (e.g., 'undefined'). Applies to: span. |
| Event Type | @event_type | event_attribute | string | Type of event within LLM Observability ('span' or 'evaluation'). |
| ML App | @ml_app | event_attribute | string | Name of the LLM/ML application emitting the span (e.g., assistant_evaluation). Applies to: span. |
| Span Name | @name | event_attribute | string | Logical span name, often mapped to an experiment or evaluation ID (e.g., 'fetch_one'). Applies to: span. |
| Span Start Time (ns) | @start_ns | event_attribute | int64 | Start time of the LLM span in nanoseconds since epoch (e.g., 1762595357539309600). Applies to: span. |
| Span Duration | @duration | event_attribute | int64 | Duration of the LLM span (e.g., 106438139). Applies to: span. |
| Metric Source | @metric_source | event_attribute | string | Origin of the metric or signal within LLM Observability (e.g., custom, summary). Applies to: evaluation. |
| Label | @label | event_attribute | string | Label or name of the evaluation metric (e.g., 'recall_at_k', 'theoretical_row_recall_average_pairs'). Applies to: evaluation. |
| Language | language | core | string | Language associated with the span (e.g., 'python', 'go'). Applies to: span. |
| Input Tokens | @metrics.input_tokens | event_attribute | int64 | Number of input tokens for the LLM request (e.g., 24600). Applies to: span. |
| Non-Cached Input Tokens | @metrics.non_cached_input_tokens | event_attribute | int64 | Number of input tokens that were not served from cache (e.g., 19000). Applies to: span. |
| Output Tokens | @metrics.output_tokens | event_attribute | int64 | Number of tokens in the LLM response (e.g., 150). Applies to: span. |
| Total Tokens | @metrics.total_tokens | event_attribute | int64 | Total tokens accounted for the span. Applies to: span. |
| Estimated Input Cost | @metrics.estimated_input_cost | event_attribute | int64 | Estimated cost attributed to input tokens (e.g., 12637000). Applies to: span. |
| Estimated Output Cost | @metrics.estimated_output_cost | event_attribute | int64 | Estimated cost attributed to output tokens. Applies to: span. |
| Estimated Non-Cached Input Cost | @metrics.estimated_non_cached_input_cost | event_attribute | int64 | Estimated cost attributed to non-cached input tokens. Applies to: span. |
| Estimated Total Cost | @metrics.estimated_total_cost | event_attribute | int64 | Estimated total LLM cost for this span. Applies to: span. |
| Cache Read Input Tokens | @metrics.cache_read_input_tokens | event_attribute | int64 | Number of input tokens read from cache. Applies to: span. |
| Cache Write Input Tokens | @metrics.cache_write_input_tokens | event_attribute | int64 | Number of input tokens written to cache. Applies to: span. |
| Estimated Cache Read Input Cost | @metrics.estimated_cache_read_input_cost | event_attribute | int64 | Estimated cost attributed to cached input token reads. Applies to: span. |
| Estimated Cache Write Input Cost | @metrics.estimated_cache_write_input_cost | event_attribute | int64 | Estimated cost attributed to cached input token writes. Applies to: span. |
| Num Evaluations Failed | @metrics.num_evaluations_failed | event_attribute | int64 | Number of managed/custom evaluations that failed on this span. Applies to: span. |
| Num Evaluations Passed | @metrics.num_evaluations_passed | event_attribute | int64 | Number of managed/custom evaluations that passed on this span. Applies to: span. |
| Num Evaluations Without Assessment | @metrics.num_evaluations_without_assessment | event_attribute | int64 | Number of evaluations executed without an assessment result. Applies to: span. |
| Model Name | @meta.model_name | event_attribute | string | Name of the model used for this span (e.g., gpt-5, gpt-4.1). Applies to: span. |
| Model Provider | @meta.model_provider | event_attribute | string | Provider of the model (e.g., openai, anthropic). Applies to: span. |
| Span Kind | @meta.span.kind | event_attribute | string | Kind of LLM span (e.g., llm, embedding, agent, workflow). Applies to: span. |
| UI Title | @meta.metadata.ui_title | event_attribute | string | Human-readable title used for UI rendering (e.g., 'Generated widget'). Applies to: span. |
| UI Content | @meta.metadata.ui_content | event_attribute | string | UI-renderable content associated with this span (e.g., Data source: `metrics` Query: `sum:dd.services.pods{*}`). Applies to: span. |
| Input Value | @meta.input.value | event_attribute | string | Primary input value to the LLM or evaluation (e.g., 'Hello. I need help'). Applies to: span. |
| Output Value | @meta.output.value | event_attribute | string | Primary output value from the LLM or evaluation (e.g., 'Hello! How can I help...'). Applies to: span. |
| Prompt ID | @meta.input.prompt.id | event_attribute | string | Identifier of the prompt used (e.g., generate_answer_prompt). Applies to: span. |
| Prompt Template | @meta.input.prompt.template | event_attribute | string | Prompt template string used for rendering the final prompt. Applies to: span. |
| Prompt Template ID | @meta.input.prompt.template_id | event_attribute | string | Stable identifier/hash for the template. Applies to: span. |
| Prompt Version ID | @meta.input.prompt.version_id | event_attribute | string | Version identifier/hash for the prompt instance. Applies to: span. |
| Evaluation Source Type | @eval_source_type | event_attribute | string | Source of the evaluation (e.g., external). Applies to: evaluation. |
| Evaluation Metric Type | @eval_metric_type | event_attribute | string | Type of evaluation metric used (e.g., 'score', 'categorical'). Applies to: evaluation. |
| Score Value | @score_value | event_attribute | float64 | Numeric score value for 'score' metrics (e.g., 0.79). Applies to: evaluation. |
| Failure to Answer Error Message | @evaluation.managed.failure_to_answer.error.message | event_attribute | string | Error message emitted by the managed 'failure_to_answer' evaluation (e.g., 'Not a root span - This happens in case the eval is configured...'). Applies to: span. |
| Failure to Answer Error Type | @evaluation.managed.failure_to_answer.error.type | event_attribute | string | Error type emitted by the managed 'failure_to_answer' evaluation (e.g., Azure OpenAI - Received HTTP 429). Applies to: span. |
| Failure to Answer Status | @evaluation.managed.failure_to_answer.status | event_attribute | string | Status emitted by the managed 'failure_to_answer' evaluation (e.g., WARN). Applies to: span. |
| Goal Completeness Error Message | @evaluation.managed.goal_completeness.error.message | event_attribute | string | Error message emitted by the managed 'goal_completeness' evaluation. Applies to: span. |
| Goal Completeness Error Type | @evaluation.managed.goal_completeness.error.type | event_attribute | string | Error type emitted by the managed 'goal_completeness' evaluation. Applies to: span. |
| Goal Completeness Status | @evaluation.managed.goal_completeness.status | event_attribute | string | Status emitted by the managed 'goal_completeness' evaluation. Applies to: span. |
| Hallucination Status | @evaluation.managed.hallucination.status | event_attribute | string | Status emitted by the managed hallucination/faithfulness evaluation. Applies to: span. |
| Hallucination Value | @evaluation.managed.hallucination.value | event_attribute | string | Value emitted by the managed hallucination/faithfulness evaluation (e.g., 'hallucination found'). Applies to: span. |
| Hallucination Score Value | @evaluation.managed.hallucination.score_value | event_attribute | float64 | Score emitted by the managed hallucination/faithfulness evaluation. Applies to: span. |
| Hallucination Eval Metric Type | @evaluation.managed.hallucination.eval_metric_type | event_attribute | string | Metric type emitted by the managed hallucination/faithfulness evaluation (e.g., categorical). Applies to: span. |
| Hallucination Categorical Value | @evaluation.managed.hallucination.categorical_value | event_attribute | string | Categorical value emitted by the managed hallucination/faithfulness evaluation. Applies to: span. |
| External Harmfulness Eval Metric Type | @evaluation.external.harmfulness.eval_metric_type | event_attribute | string | Metric type for an external evaluation (e.g., score). Applies to: span. |
| External Harmfulness Value | @evaluation.external.harmfulness.value | event_attribute | float64 | Value for an external evaluation (often equals score_value for score types). Applies to: span. |
| Evaluation Metric ID | @id | event_attribute | string | ID for the evaluation record (UUID). Applies to: evaluation. |
| Session ID | @session_id | event_attribute | string | Identifier used to correlate spans belonging to the same user or agent session. Applies to: span. |
| Billing Usage Attribution Tags | billing_header.usage_attribution_tags | core | array<string> | Tags used for cost attribution and billing purposes (e.g., ml_app:curated_dataset). Applies to: span. |
| Billable | billing_header.billable | core | bool | Whether the event is billable. Applies to: span. |
| Quality Evaluation Results | @meta.evaluations.quality | event_attribute | array<string> | List of quality evaluations that matched content in this span (e.g., ['Hallucination']). Applies to: span. |
| Security Evaluation Results | @meta.evaluations.security | event_attribute | array<string> | List of security scanners that matched content in this span (e.g., ['Standard Email Address','Standard Email Address']). Applies to: span. |
| Evaluation Timestamp | @eval_timestamp | event_attribute | int64 | Timestamp (milliseconds since epoch) at which the evaluation was executed (e.g., 1766403898176). Applies to: evaluation. |
| Metric Type | @metric_type | event_attribute | string | Metric value type for the evaluation (e.g., 'score', 'boolean'). Applies to: evaluation. |
| Error Message | @error.message | event_attribute | string | Error message for failed evaluations (e.g., "'NoneType' object has no attribute 'response'"). Applies to: evaluation. |
| Error Type | @error.type | event_attribute | string | Error type for failed evaluations (e.g., 'AttributeError'). Applies to: evaluation. |
| Error Stack Trace | @error.stack | event_attribute | string | Stack trace for failed evaluations. Applies to: evaluation. |
| Experiment ID | experiment_id | core | string | Identifier of the experiment associated with this evaluation metric. Applies to: span, evaluation. |
| Dataset ID | dataset_id | core | string | Identifier of the dataset associated with this evaluation metric. Applies to: evaluation. |
| Project ID | project_id | core | string | Identifier of the project associated with this evaluation metric (typically from tags). Applies to: evaluation. |
| Event ID | id | core | string | A unique identifier for the event. |
| Discovery Timestamp | discovery_timestamp | core | int64 | The time when Datadog first received the event (milliseconds since Unix epoch). May differ from timestamp if there was an ingestion delay. |
| Tiebreaker | tiebreaker | core | int64 | A value used to establish deterministic ordering among events that share the same timestamp. |
| Ingest Size | ingest_size_in_bytes | core | int64 | The size of the event payload in bytes at the time of ingestion, before any processing. |
| Random Draw | random_draw | core | float64 | A random value between 0.0 and 1.0 assigned at ingestion, useful for consistent sampling across queries. |
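
Several of the fields above apply only to evaluation events (@label, @metric_type, @score_value, @error.*). As a usage sketch, the query below selects them with event_type => 'evaluation'; the field IDs come from the table above, while the filter value is an illustrative assumption (the `:*` wildcard matches events where the attribute is present, per standard Datadog search syntax) and the aliases are arbitrary.

```sql
-- Sketch: inspect evaluations that recorded an error.
-- '@error.type:*' is an illustrative existence filter, not a required form.
SELECT * FROM dd.llm_observability(
  columns => ARRAY[
    'timestamp',
    '@span_id',
    '@label',
    '@metric_type',
    '@score_value',
    '@error.type',
    '@error.message'
  ],
  event_type => 'evaluation',
  filter => '@error.type:*'
) AS (
  ts TIMESTAMP,
  span_id VARCHAR,
  label VARCHAR,
  metric_type VARCHAR,
  score_value DOUBLE,
  error_type VARCHAR,
  error_message VARCHAR
);
```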