Ragas is an evaluation framework for retrieval augmented generation (RAG) applications. Datadog’s Ragas integration enables you to evaluate your production application with scores for faithfulness, answer relevancy, and context precision. You can use these scores to find traces that have a high likelihood of inaccurate answers and review them to improve your RAG pipeline.
For a simplified setup guide, see Ragas Quickstart.
The Faithfulness score evaluates how consistent an LLM’s generation is with the provided ground truth context data.
This score is generated through three steps:
1. Break the generated answer into a set of standalone statements.
2. Verify each statement against the provided ground truth context.
3. Compute the score as the ratio of supported statements to the total number of statements.
For more information, see Ragas’s Faithfulness documentation.
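To make the final step concrete, here is a minimal illustrative sketch (not Ragas’s or Datadog’s implementation) in which the statement extraction and verification results are mocked:

```python
# Illustrative only: in Ragas, statement extraction and verification
# are performed by the evaluator LLM rather than hard-coded.
verified_statements = [
    ("The Eiffel Tower is in Paris", True),            # supported by the context
    ("It was completed in 1890", False),               # not supported by the context
    ("It was built for the 1889 World's Fair", True),  # supported by the context
]

supported = sum(1 for _, is_supported in verified_statements if is_supported)
faithfulness = supported / len(verified_statements)
print(faithfulness)  # 2 supported out of 3 statements -> ~0.67
```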
The Answer Relevancy (or Response Relevancy) score assesses how pertinent the generated answer is to the given prompt. A lower score is assigned to answers that are incomplete or contain redundant information, and higher scores indicate better relevancy. This metric is computed using the question, the retrieved contexts, and the answer.
The Answer Relevancy score is defined as the mean cosine similarity of the original question to a number of artificial questions, which are generated (reverse engineered) based on the response.
For more information, see Ragas’s Answer Relevancy documentation.
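For intuition, the following minimal sketch (illustrative only; it uses mock embedding vectors instead of a real embedding model, and assumes the artificial questions have already been generated) shows the mean cosine similarity computation:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Mock embedding vectors stand in for a real embedding model; the artificial
# questions are normally reverse engineered from the generated answer by an LLM.
original_question = np.array([0.9, 0.1, 0.3])
artificial_questions = [
    np.array([0.8, 0.2, 0.3]),
    np.array([0.7, 0.1, 0.4]),
    np.array([0.9, 0.0, 0.2]),
]

# Answer Relevancy: mean cosine similarity between the original question
# and each artificial question.
score = float(np.mean([cosine_similarity(original_question, q) for q in artificial_questions]))
print(round(score, 3))  # close to 1.0 when the artificial questions resemble the original
```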
The Context Precision score assesses whether the retrieved context was useful in arriving at the given answer.
This score is modified from Ragas’s original Context Precision metric, which computes the mean of the Precision@k for each chunk in the context. Precision@k is the ratio of the number of relevant chunks at rank k to the total number of chunks at rank k.
Datadog’s Context Precision score is computed by dividing the number of relevant contexts by the total number of contexts.
For more information, see Ragas’s Context Precision documentation.
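The following minimal sketch (illustrative only, following the descriptions above; the per-chunk relevance judgments, which are normally made by the evaluator LLM, are mocked) contrasts the two computations:

```python
# Relevance of each retrieved chunk, in retrieval order (normally judged by the evaluator LLM).
relevance = [True, False, True, True]

# Ragas's original Context Precision: mean of Precision@k over the ranks,
# where Precision@k = relevant chunks in the top k / k.
precision_at_k = [sum(relevance[:k + 1]) / (k + 1) for k in range(len(relevance))]
ragas_context_precision = sum(precision_at_k) / len(precision_at_k)

# Datadog's Context Precision: relevant chunks / total chunks.
datadog_context_precision = sum(relevance) / len(relevance)

print(round(ragas_context_precision, 3))    # 0.729
print(round(datadog_context_precision, 3))  # 0.75
```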
Datadog’s Ragas evaluations require ragas v0.1+ and ddtrace v3.0.0+.
Install dependencies. Run the following command:
pip install ragas==0.1.21 openai "ddtrace>=3.0.0"
The Ragas integration automatically runs evaluations in the background of your application. By default, Ragas uses OpenAI’s GPT-4 model for evaluations, which requires you to set an OPENAI_API_KEY in your environment. You can also customize Ragas to use a different LLM.
Instrument your LLM calls with RAG context information. Datadog’s Ragas integration attempts to extract context information from the prompt variables attached to a span.
Examples:
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.utils import Prompt
with LLMObs.annotation_context(
    prompt=Prompt(
        variables={"context": "rag context here"},
        rag_context_variable_keys=["context"],  # defaults to ['context']
        rag_query_variable_keys=["question"],   # defaults to ['question']
    ),
    name="generate_answer",
):
    oai_client.chat.completions.create(...)
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm
from ddtrace.llmobs.utils import Prompt

@llm(model_name="llama")
def generate_answer():
    ...
    LLMObs.annotate(
        prompt=Prompt(variables={"context": "rag context..."})
    )
(Optional, but recommended) Enable sampling. Datadog traces Ragas score generation. These traces contain LLM spans, which may affect your LLM Observability billing. See Sampling.
Run your script and specify enabled Ragas evaluators. Use the environment variable DD_LLMOBS_EVALUATORS to provide a comma-separated list of Ragas evaluators you wish to enable. These evaluators are ragas_faithfulness, ragas_context_precision, and ragas_answer_relevancy.
For example, to run your script with all Ragas evaluators enabled:
DD_LLMOBS_EVALUATORS="ragas_faithfulness,ragas_context_precision,ragas_answer_relevancy" \
DD_ENV=dev \
DD_API_KEY=<YOUR-DATADOG-API-KEY> \
DD_SITE=<YOUR-DATADOG-SITE> \
python driver.py
To enable Ragas scoring for a sampled subset of LLM calls, use the DD_LLMOBS_EVALUATORS_SAMPLING_RULES environment variable. Pass in a JSON list of objects, each containing the following fields:
| Field | Description | Required | Type |
|---|---|---|---|
| sample_rate | Sampling rate, from 0 to 1 | Yes | Float |
| evaluator_label | Ragas evaluator to apply the rule to | No | String |
| span_name | Name of spans to apply the rule to | No | String |
In the following example, Ragas Faithfulness scoring is enabled for 50% of all answer_question spans. Ragas evaluations are disabled for all other spans ("sample_rate": 0).
export DD_LLMOBS_EVALUATORS_SAMPLING_RULES='[
{
"sample_rate": 0.5,
"evaluator_label": "ragas_faithfulness",
"span_name": "answer_question"
},
{
"sample_rate": 0
}
]'
Ragas supports customizations. For example, the following snippet configures the Faithfulness evaluator to use gpt-4 and adds custom instructions to the prompt sent to the evaluating LLM:
from langchain_openai import ChatOpenAI
from ragas.metrics import faithfulness
from ragas.llms.base import LangchainLLMWrapper
faithfulness.llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))
faithfulness.statement_prompt.instruction += "\nMake sure text containing code instructions are grouped with contextual information on how to run that code."
Any customizations you make to the global Ragas instance are automatically applied to Datadog’s Ragas evaluators. No action is required.
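If you want the other evaluators to use the same judge model, a sketch under the same assumptions (the ragas 0.1.x metric singletons and the langchain-openai wrapper used in the snippet above) could look like this:

```python
from langchain_openai import ChatOpenAI
from ragas.llms.base import LangchainLLMWrapper
from ragas.metrics import answer_relevancy, context_precision

# Reuse a single judge model for the remaining evaluators.
judge = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))
answer_relevancy.llm = judge
context_precision.llm = judge
```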
Ragas scores are sent to Datadog as evaluation metrics. When you view a scored trace in LLM Observability, Ragas scores appear under Custom Evaluations.
You can also configure your LLM Traces page to display Ragas scores.
1. Use the filter @meta.span.kind:llm in All Spans to view only LLM spans.
2. Add -runner.integration:ragas to the search field. Datadog automatically traces the generation of Ragas scores. Use this exclusion term to filter out those traces.

Your LLM inference could be missing an evaluation. To troubleshoot:

- Search runner.integration:ragas to see traces for the Ragas evaluation itself.
- Use the LLMObs.flush() command to guarantee all traces and evaluations are flushed to Datadog. Note: This is a blocking function (see the sketch below).
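For example, in a short-lived script you might flush in a finally block before the process exits (a minimal sketch; the instrumented LLM calls from the earlier steps are assumed):

```python
from ddtrace.llmobs import LLMObs

def main():
    # ... instrumented LLM calls from the steps above ...
    pass

if __name__ == "__main__":
    try:
        main()
    finally:
        # Blocking call: waits until buffered traces and evaluations are submitted.
        LLMObs.flush()
```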