Service Page

Detailed service page

Overview

Selecting a service in the Service Catalog opens the detailed service page. A service is a set of processes that do the same job, for example a web framework or a database (read more about how services are defined in Getting Started with APM).

Consult on this page:

Service health

Opt in to the private beta!

Service health is in private beta. To request access, complete the form.

The Service Health panel provides a real-time summary of service signals to help you understand if a service needs your attention.

Service health considers many types of signals (including monitors, incidents, Watchdog insights, and error tracking issues) and surfaces the most critical alerts. Additionally, the Service Health panel provides links to associated incidents, which helps you to take necessary actions.

Service Health panel on service page showing an active incident.

To access service health:

  1. Go to APM > Service Catalog.
  2. Hover over a service and click Full Page.
  3. Select Service Health.

The Service Health panel displays the status of your service as Ok, Warning, or Alert if at least one of the following conditions is met:

Status: Alert

Monitors:
- A non-muted alerting P1 monitor is triggered.
- A non-muted monitor with a paging integration setup (PagerDuty or Opsgenie) is triggered.

Incidents:
- An incident of any severity is active.

Watchdog Insights:
- A faulty deployment is active.
- An ongoing APM latency/error rate alert is active.

Status: Warning

Monitors:
- A non-muted alerting P2 monitor is triggered.
- A non-muted warning P1 monitor is triggered.
- A non-muted warning monitor with a paging integration setup (PagerDuty or Opsgenie) is triggered.

Incidents:
- An incident of any severity is in a stable state.

Watchdog Insights:
- An ongoing log anomaly alert is active.

Error Tracking Issues:
- A new issue (within 48 hours) requires review.

Status: Ok

No signal from a critical or alert state is active.
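The status rules above can be read as a simple precedence check: any Alert condition wins, otherwise any Warning condition, otherwise Ok. The following is a minimal sketch of that reading, not Datadog's implementation. The `Monitor` and `Incident` records, the insight names (`faulty_deployment`, `apm_alert`, `log_anomaly`), and the `service_health` function are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


# Hypothetical, simplified records; these are not Datadog API types.
@dataclass
class Monitor:
    priority: int             # 1 for P1, 2 for P2, and so on
    state: str                # "alert", "warn", or "ok"
    muted: bool = False
    has_paging: bool = False  # PagerDuty or Opsgenie integration is set up


@dataclass
class Incident:
    state: str                # "active" or "stable"


def service_health(monitors, incidents, watchdog_insights, new_issue_ages_hours):
    """Return "Alert", "Warning", or "Ok" following the conditions listed above.

    watchdog_insights: set of illustrative names, for example {"log_anomaly"}.
    new_issue_ages_hours: age in hours of each Error Tracking issue awaiting review.
    """
    live = [m for m in monitors if not m.muted]  # muted monitors never count

    is_alert = (
        any(m.state == "alert" and (m.priority == 1 or m.has_paging) for m in live)
        or any(i.state == "active" for i in incidents)
        or bool({"faulty_deployment", "apm_alert"} & set(watchdog_insights))
    )
    if is_alert:
        return "Alert"

    is_warning = (
        any(m.state == "alert" and m.priority == 2 for m in live)
        or any(m.state == "warn" and (m.priority == 1 or m.has_paging) for m in live)
        or any(i.state == "stable" for i in incidents)
        or "log_anomaly" in watchdog_insights
        or any(age <= 48 for age in new_issue_ages_hours)
    )
    return "Warning" if is_warning else "Ok"
```

For example, a service whose only signal is a stable incident evaluates to Warning, while the same service with an active faulty deployment evaluates to Alert.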

Service monitor

The Service monitor panel surfaces active Monitors and Synthetics tests linked to your service. Datadog also suggests a list of monitors based on your service type:

Service Monitors

Enable them directly or create your own APM monitors.

Note: Tag any monitor or Synthetic Test with service:<SERVICE_NAME> to attach it to an APM service.
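As an illustration of the tagging convention in the note above, the sketch below builds a monitor definition that carries a `service:<SERVICE_NAME>` tag. The service name `web-store`, the metric query, and the helper functions are hypothetical. The field names (`name`, `type`, `query`, `message`, `tags`) follow the general shape of a monitor definition, but treat them as assumptions and check the Monitors API reference before sending anything.

```python
import json


def service_tag(service_name: str) -> str:
    """Build the tag that attaches a monitor or Synthetic test to an APM service."""
    return f"service:{service_name}"


def monitor_payload(service_name: str, name: str, query: str, message: str) -> dict:
    """Assemble an illustrative monitor definition tagged with the service.

    Nothing is sent to Datadog here; this only builds the dictionary.
    """
    return {
        "name": name,
        "type": "metric alert",  # assumed monitor type for this sketch
        "query": query,
        "message": message,
        "tags": [service_tag(service_name)],
    }


# Hypothetical service and query, for illustration only.
payload = monitor_payload(
    service_name="web-store",
    name="High latency on web-store",
    query="avg(last_5m):avg:trace.http.request.duration{service:web-store} > 0.5",
    message="Latency is above 500 ms on web-store.",
)
print(json.dumps(payload, indent=2))
```

The only part that matters for the service page is the `tags` entry: without the `service:web-store` tag, the monitor would not be linked to the service.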

Watchdog Insights

The Watchdog Insights carousel surfaces anomalies and outliers detected on specific tags, enabling you to investigate the root cause of an issue. Insights are discovered from APM, Continuous Profiler, Log Management, and Infrastructure data that include the service tag. These insights are the same insights that appear in each of the product pages. For example, the same Log outliers on the service page can be found in the Logs Explorer.

Watchdog Insights

Click on an insight to see more details, such as the time frame of the insight, related logs or traces, and suggested next steps.

Watchdog Insights details

Summary cards

The service page features summary cards with highlights on your service health. Easily spot potential faulty deployments, click into the card to view details or traces of the latest deployment, or view all deployments on this service. See new issues flagged on your service through our integration with Error Tracking, where errors are automatically aggregated into issues.

Summary cards

Our Service Level Objectives (SLOs) and Incidents summaries allow you to monitor the status of SLOs and ongoing incidents, so that you can keep performance goals top of mind. Click the cards to create a new SLO on the service or declare an incident. The security signals summary highlights how your services react to application threats.

Out-of-the-box graphs

Datadog provides out-of-the-box graphs for any given Service:

  • Requests - Choose to display:

    • The Total number of requests and errors
    • The number of Requests and errors per second
  • Latency - Choose to display:

    • The Latency by Version
    • The Latency by Percentile (Avg/p75/p90/p95/p99/p99.9/Max latency of your traced requests) as a timeseries
    • The Historical Latency to compare the Latency distribution with the day and week before
    • The Latency Distribution over the selected timeframe
    • The Latency by Error to evaluate the latency impact of an error on traced requests
    • The Apdex score for web services; learn more about Apdex
  • Error - Choose to display:

    • The Total number of errors
    • The number of Errors per second
    • The % Error Rate
  • Dependency Map:

    • The Dependency Map showing upstream and downstream services.
  • Sub-services: When multiple services are involved, a fourth graph (in the same toggle option as the Dependency Map) breaks down the percentage of time your service spends by service or by type.

    This represents the relative time that traces spend in downstream services, from the current service to the other services or types.

    Note: For services like Postgres or Redis, which are “final” operations that do not call other services, there is no sub-services graph.

Watchdog performs automatic anomaly detection on the Requests, Latency, and Error graphs. When an anomaly is detected, an overlay appears on the graph along with a Watchdog icon that you can click to see more details in a side panel.
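To make the quantities behind the Requests, Latency, and Error graphs concrete, here is a minimal sketch that derives them from a list of traced requests. It is not how Datadog computes these graphs. The `Request` record and all function names are hypothetical, the percentile uses the nearest-rank method, and the Apdex calculation uses the standard formula (satisfied plus half of tolerating, divided by total) with errors counted as frustrated, which is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical traced request: duration in seconds and an error flag."""
    duration_s: float
    is_error: bool = False


def requests_per_second(requests, window_s: float) -> float:
    """Average request rate over a window of window_s seconds."""
    return len(requests) / window_s


def error_rate_pct(requests) -> float:
    """Percentage of requests that ended in an error."""
    if not requests:
        return 0.0
    return 100.0 * sum(r.is_error for r in requests) / len(requests)


def percentile(durations, p: float) -> float:
    """Nearest-rank percentile (p in (0, 100]) of a non-empty list."""
    if not durations:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(durations)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def apdex(requests, threshold_s: float) -> float:
    """Standard Apdex with threshold T: satisfied <= T, tolerating <= 4T.

    Assumption in this sketch: requests that errored count as frustrated.
    """
    if not requests:
        raise ValueError("Apdex of an empty list is undefined")
    satisfied = tolerating = 0
    for r in requests:
        if r.is_error:
            continue  # frustrated, contributes nothing to the score
        if r.duration_s <= threshold_s:
            satisfied += 1
        elif r.duration_s <= 4 * threshold_s:
            tolerating += 1
    return (satisfied + tolerating / 2) / len(requests)


sample = [
    Request(0.1), Request(0.2), Request(0.6), Request(1.0),
    Request(3.0, is_error=True),
]
print(requests_per_second(sample, window_s=10))
print(error_rate_pct(sample))
print(percentile([r.duration_s for r in sample], 99))
print(apdex(sample, threshold_s=0.5))
```

With a threshold of 0.5 seconds, two requests in the sample are satisfied, two are tolerating, and one is frustrated.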

Out of the box service graphs

Export

In the upper-right corner of each graph, click the arrow to export the graph to an existing dashboard:

Save to dashboard

Resources

See Requests, Latency, and Error graphs broken down by resource to identify problematic resources. Resources are particular actions for your services (typically individual endpoints or queries). Read more in Getting Started with APM.

Below, there’s a list of resources associated with your service. Sort the resources for this service by requests, latency, errors, and time, to identify areas of high traffic or potential trouble. Note that these metric columns are configurable (see image below).

Resources

Click on a resource to open a side panel that displays the resource’s out-of-the-box graphs (about requests, errors, and latency), a resource dependency map, and a span summary table. Use keyboard navigation keys to toggle between resources on the Resources list and compare resources in a service. To view the full resource page, click Open Full Page.

Refer to the dedicated resource documentation to learn more.

Columns

Choose what to display in your resources list:

  • Requests: Absolute number of requests traced
  • Requests per second: Number of requests traced per second
  • Total time: Sum of all time spent in this resource
  • Avg/p75/p90/p95/p99/Max Latency: The Avg/p75/p90/p95/p99/Max latency of your traced requests
  • Errors: Absolute number of errors for a given resource
  • Error Rate: Percentage of errors for a given resource

Resource columns
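The columns above are per-resource aggregates, and sorting by one of them is what surfaces the busiest or most error-prone resources. The sketch below shows one way to derive such a table from raw spans. The span tuple layout `(resource, duration_s, is_error)`, the resource names, and both functions are hypothetical and only illustrate the arithmetic.

```python
from collections import defaultdict


def resource_table(spans, window_s: float):
    """Aggregate (resource, duration_s, is_error) tuples into one row per resource."""
    totals = defaultdict(lambda: {"requests": 0, "errors": 0, "total_time": 0.0})
    for resource, duration_s, is_error in spans:
        row = totals[resource]
        row["requests"] += 1
        row["errors"] += 1 if is_error else 0
        row["total_time"] += duration_s

    table = []
    for resource, row in totals.items():
        table.append({
            "resource": resource,
            "requests": row["requests"],
            "requests_per_second": row["requests"] / window_s,
            "total_time": row["total_time"],
            "avg_latency": row["total_time"] / row["requests"],
            "errors": row["errors"],
            "error_rate": 100.0 * row["errors"] / row["requests"],
        })
    return table


def sort_by(table, column: str):
    """Sort rows by a column, highest value first."""
    return sorted(table, key=lambda row: row[column], reverse=True)


# Hypothetical spans collected over a 60-second window.
spans = [
    ("GET /cart", 0.2, False),
    ("GET /cart", 0.4, True),
    ("POST /checkout", 1.5, False),
]
table = resource_table(spans, window_s=60)
for row in sort_by(table, "total_time"):
    print(row["resource"], row["requests"], row["errors"], row["error_rate"])
```

Sorting the same table by `errors` instead of `total_time` puts `GET /cart` first, which mirrors how changing the sort column in the resources list shifts attention from slow resources to failing ones.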

Additional sections

Deployments

A service configured with version tags shows its versions in the Deployments tab. The version section lists all versions of the service that were active during the selected time interval, with active versions at the top.

By default, you can see:

  • The version names deployed for this service over the timeframe.

  • The times at which traces that correspond to this version were first and last seen.

  • An Error Types indicator, which shows how many types of errors appear in each version that did not appear in the immediately previous version.

    Note: This indicator shows errors that were not seen in traces from the previous version. It doesn’t mean that this version necessarily introduced these errors. Looking into new error types can be a great way to begin investigating errors.

  • Requests per second.

  • Error rate as a percentage of total requests.
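The Error Types indicator described above amounts to a set difference between each version and the one immediately before it. The following sketch illustrates that idea under stated assumptions: the input format, the version names, and the `new_error_types` function are hypothetical, and the oldest version is given an empty result because it has no previous version to compare against.

```python
def new_error_types(versions):
    """Map each version to the error types not seen in the previous version.

    versions: list of (version_name, error_types) ordered oldest to newest.
    Assumption: the oldest version has no baseline, so its result is empty.
    """
    result = {}
    previous = None
    for name, error_types in versions:
        current = set(error_types)
        result[name] = set() if previous is None else current - previous
        previous = current
    return result


# Hypothetical deployment history.
history = [
    ("v1.0", {"TimeoutError"}),
    ("v1.1", {"TimeoutError", "KeyError"}),
    ("v1.2", {"ValueError"}),
]
for version, errors in new_error_types(history).items():
    print(version, len(errors), sorted(errors))
```

Because the comparison only looks one version back, an error type that disappears and later returns is counted as new again, which is consistent with the note that a new error type does not necessarily mean the version introduced it.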

You can add columns to or remove columns from this overview table and your selections will be saved. The additional available columns are:

  • Endpoints that are active in a version that were not in the previous version.
  • Time active, showing the length of time from the first trace to the last trace sent to Datadog for that version.
  • Total number of Requests.
  • Total number of Errors.
  • Latency measured by p50, p75, p90, p95, p99, or max.
Deployments

Read more about Deployments on the service page.

Error Tracking

View issues on your service, which are similar errors aggregated together to turn a noisy stream of errors into manageable issues and help you assess the impact of your service’s errors. Read more about issues in Error Tracking.

This tab has overview graphs that show which resources have the most issues and a list of the most common issues occurring in your service. Click on an issue in the list to see details in a side panel, including its stack trace, related code versions, and total error occurrences since inception.

Error Tracking tab

Security

Understand the security posture of the service, including known vulnerabilities exposed in the service’s libraries and security signals on your service, which are automatically created when Datadog detects application attacks impacting your services. The signals identify meaningful threats for you to review instead of assessing each individual attack attempt. Read more about Application Security.

The top section of the security tab has overview graphs that show the number and severity of vulnerabilities, a timeline of attacks, the types of attacks, and attacker information (client IP or authenticated user).

The next section of the panel lists all the vulnerabilities and signals concerning the service. Click on a security vulnerability to open a side panel with relevant details to investigate further and remediate the vulnerability. Click on a security signal to get information about what the detected threat is and what actions you can take to remediate it.

Security

Databases

View the list of downstream database dependencies identified by Database Monitoring and identify latency or load outliers. Learn more about connecting DBM and APM.

Databases

Infrastructure

If your service runs on Kubernetes, the service page includes an Infrastructure tab. The live Kubernetes Pods table displays detailed information about your pods, such as whether memory usage is close to its limit, and helps you improve resource allocation by showing whether provisioned compute resources exceed what is required for optimal application performance.

Kubernetes Pods

The Kubernetes Metrics section contains a high level summary of your infrastructure health for the selected time period, and includes CPU, Memory, Network, and Disk metrics.

Kubernetes Metrics

For non-Kubernetes environments (such as host-based installation), see the Unified Service Tagging documentation.

Runtime Metrics

If runtime metrics are enabled in the tracing client, you’ll see a Runtime metrics tab corresponding to the runtime language of your service. Read more in Runtime Metrics.

Runtime Metrics

Profiling

You’ll see a Profiling tab if the Continuous Profiler is set up for your service.

Use the information in the Profiling tab to correlate a latency and throughput change to a code performance change.

In this example, you can see how latency is linked to a lock contention increase on /GET train that is caused by the following line of code:

Thread.sleep(DELAY_BY.minus(elapsed).toMillis());

Traces

View the list of traces associated with the service in the traces tab, which is already filtered on your service, environment, and operation name. Drill down to problematic spans using core facets such as status, resource, and error type. For more information, click a span to view a flame graph of its trace and more details.

Traces

Log patterns

View common patterns in your service’s logs, and use facets like status in the search bar to filter the list of patterns. Click on a pattern to open the side panel to view more details, such as what events triggered the cascade. Read more in Log patterns.

Log patterns

Costs

Visualize the cost associated with the infrastructure your service uses in the Costs tab. Learn more about Cloud Cost Management.

Costs
