Getting Started in Datadog
This page provides a high-level overview of capabilities available on the Datadog site.
The Datadog site navigation varies based on the width of your browser. You can have up to three types of navigation.
To change the navigation type, adjust your browser width.
Integrations
- Datadog has over 500 officially listed integrations.
- Custom integrations are available through the Datadog API.
- The Agent is open source.
- Once integrations are configured, Datadog treats all data the same way, whether it comes from a data center or from an online service.
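Custom integrations send their data through the Datadog API. The following is a minimal sketch that builds a gauge submission but does not send it. It assumes the v1 `POST /api/v1/series` endpoint on the US1 site (`api.datadoghq.com`) and an API key passed in the `DD-API-KEY` header. The metric name, host, and tags are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: US1 site; other Datadog sites use a different API host.
SERIES_URL = "https://api.datadoghq.com/api/v1/series"


def build_series_request(api_key, metric, value, host, tags):
    """Build (without sending) a v1 series request carrying one gauge point."""
    payload = {
        "series": [
            {
                "metric": metric,
                "type": "gauge",
                # Each point is a [unix_timestamp_seconds, value] pair.
                "points": [[int(time.time()), value]],
                "host": host,
                "tags": tags,
            }
        ]
    }
    return urllib.request.Request(
        SERIES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )


# Hypothetical metric and tags, for illustration only.
req = build_series_request("<API_KEY>", "custom.queue.depth", 42.0,
                           "web-01", ["env:staging", "team:payments"])
# urllib.request.urlopen(req) would submit it; omitted so the sketch runs offline.
```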
Log Management
Datadog Log Management lets you send and process every log produced by your applications and infrastructure. You can observe your logs in real time using the Live Tail, without indexing them. You can ingest all of the logs from your applications and infrastructure, decide what to index dynamically with filters, and then store them in an archive.
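One way to send logs is over HTTP. The sketch below batches log lines into a single request without sending it. It assumes the v2 logs intake on the US1 site (`http-intake.logs.datadoghq.com/api/v2/logs`), which accepts a JSON array of log objects with reserved attributes such as `ddsource`, `ddtags`, `hostname`, `service`, and `message`. The service name and tags are hypothetical.

```python
import json
import urllib.request

# Assumption: US1 site; other Datadog sites use a different intake host.
LOGS_URL = "https://http-intake.logs.datadoghq.com/api/v2/logs"


def build_logs_request(api_key, messages, service, hostname, tags):
    """Batch several log lines into one intake request (not sent here)."""
    entries = [
        {
            "ddsource": "python",
            "ddtags": ",".join(tags),  # comma-separated key:value tags
            "hostname": hostname,
            "service": service,
            "message": message,
        }
        for message in messages
    ]
    return urllib.request.Request(
        LOGS_URL,
        data=json.dumps(entries).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )


# Hypothetical service and tags, for illustration only.
req = build_logs_request("<API_KEY>", ["user login ok", "cache miss"],
                         "checkout", "web-01", ["env:staging", "version:1.2"])
```

Every log sent this way is ingested. Which logs are then indexed is decided by the index filters you configure in Datadog, not by the sender.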
APM & Continuous Profiler
Datadog Application Performance Monitoring (APM or tracing) provides you with deep insight into your application’s performance—from automatically generated dashboards for monitoring key metrics, like request volume and latency, to detailed traces of individual requests—side by side with your logs and infrastructure monitoring. When a request is made to an application, Datadog can see the traces across a distributed system, and show you systematic data about precisely what is happening to this request.
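To make the idea of a distributed trace concrete, the sketch below models a trace as a set of spans that share a trace ID, where each span points to its parent. From those spans it rebuilds the request tree and computes each span's self time, meaning its duration minus time spent in its direct children. The `Span` fields are a simplified, hypothetical model and not the exact Datadog span schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: int
    span_id: int
    parent_id: Optional[int]  # None for the root span
    service: str
    resource: str
    duration_ms: float


def self_times(spans):
    """Duration of each span minus the time spent in its direct children."""
    child_total = defaultdict(float)
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] += s.duration_ms
    return {s.span_id: s.duration_ms - child_total[s.span_id] for s in spans}


def render(spans):
    """Indented 'service resource duration' lines, walking from the root span down."""
    children = defaultdict(list)
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            children[s.parent_id].append(s)
    lines = []

    def walk(span, depth):
        lines.append("  " * depth + f"{span.service} {span.resource} {span.duration_ms}ms")
        for child in children[span.span_id]:
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Hypothetical trace of one checkout request across four services.
trace = [
    Span(1, 10, None, "web", "GET /checkout", 120.0),
    Span(1, 11, 10, "cart-api", "get_cart", 30.0),
    Span(1, 12, 10, "payments", "charge", 70.0),
    Span(1, 13, 12, "postgres", "INSERT payments", 55.0),
]
```

With this sample, the database insert has the largest self time, which is the kind of answer a trace view surfaces for a slow request.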
Infrastructure
- All machines show up in the infrastructure list.
- You can see the tags applied to each machine. Tagging allows you to indicate which machines have a particular purpose.
- Datadog attempts to automatically categorize your servers. If a new machine is tagged, you can immediately see the stats for that machine based on what was previously set up for that tag. Read more on tagging.
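Datadog tags are usually `key:value` strings such as `env:prod` or `role:db`. The hypothetical sketch below shows how a tag scope can select machines. It keeps only the hosts that carry every tag in a scope like `{env:prod,role:db}`, where the comma acts as AND, and treats `{*}` as "all hosts". The host names and tags are made up.

```python
def parse_scope(scope):
    """Turn '{env:prod,role:db}' into {'env:prod', 'role:db'}; '{*}' means no filter."""
    inner = scope.strip().strip("{}")
    if inner in ("", "*"):
        return set()
    return {tag.strip() for tag in inner.split(",")}


def hosts_matching(hosts, scope):
    """Sorted names of hosts whose tags include every tag in the scope."""
    required = parse_scope(scope)
    return sorted(name for name, tags in hosts.items() if required <= tags)


# Hypothetical hosts and their tags.
hosts = {
    "web-01": {"env:prod", "role:web"},
    "db-01": {"env:prod", "role:db"},
    "db-02": {"env:staging", "role:db"},
}
```

If a new host is added with `role:db`, it matches `{role:db}` right away. This mirrors how a newly tagged machine picks up whatever was already set up for that tag.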
Host Map
The host map can be found under the Infrastructure menu. It offers the ability to:
- Quickly visualize your environment
- Identify outliers
- Detect usage patterns
- Optimize resources
See Host Map for more details.
Event Explorer
The Event Explorer displays the most recent events generated by your infrastructure and services.
Events can include the following:
- Code deployments
- Service health changes
- Configuration changes
- Monitoring alerts
The Event Explorer automatically gathers events collected by the Agent and installed integrations.
You can also submit your own custom events using the Datadog API, custom Agent checks, DogStatsD, or the Events email API.
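The sketch below illustrates the DogStatsD route. It formats an event datagram (`_e{title_length,text_length}:title|text|...`) and sends it over UDP to a local Agent. It assumes DogStatsD is listening on its default `127.0.0.1:8125`. Lengths are counted in UTF-8 bytes after newlines in the text have been escaped. The title, text, and tags are hypothetical.

```python
import socket


def event_datagram(title, text, alert_type="info", tags=()):
    """Format a DogStatsD event; newlines in the text are escaped as a literal '\\n'."""
    text = text.replace("\n", "\\n")
    title_len = len(title.encode("utf-8"))
    text_len = len(text.encode("utf-8"))
    datagram = f"_e{{{title_len},{text_len}}}:{title}|{text}|t:{alert_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_event(datagram, host="127.0.0.1", port=8125):
    """UDP is fire-and-forget: the send usually succeeds even if no Agent is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Hypothetical deployment event.
msg = event_datagram("Deploy finished", "checkout v1.2 is live",
                     alert_type="success", tags=["env:prod", "service:checkout"])
```

Calling `send_event(msg)` on a host running the Agent would make the event available in the Event Explorer, where its tags can be used as facets.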
In the Event Explorer, filter your events by facets or search queries. Group or filter events by attribute and graphically represent them with event analytics.
Dashboards
Dashboards contain graphs with real-time performance metrics.
- Hovering is synchronized across all graphs in a screenboard.
- Vertical bars are events. They put a metric into context.
- Click and drag on a graph to zoom in on a particular timeframe.
- As you hover over the graph, the event stream moves with you.
- Display by zone, host, or total usage.
- Datadog exposes a JSON editor for the graph, allowing for arithmetic and functions to be applied to metrics.
- Share a graph snapshot that appears in the stream.
- Graphs can be embedded in an iframe. This lets you give a third party access to a live graph without also giving access to your data or any other information.
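The graph JSON editor works on a definition that holds one or more metric queries. The sketch below assembles a timeseries definition whose query applies arithmetic across two metrics to get the percentage of memory used per host. It uses the `aggregation:metric{scope} by {tag}` query syntax. The `viz`/`requests`/`q` field layout is an assumption based on the classic graph JSON, and the editor may expect additional fields.

```python
import json


def metric_query(metric, scope="*", agg="avg", group_by=None):
    """Build 'agg:metric{scope}' with an optional ' by {tag}' grouping."""
    query = f"{agg}:{metric}{{{scope}}}"
    if group_by:
        query += f" by {{{group_by}}}"
    return query


used = metric_query("system.mem.used", group_by="host")
total = metric_query("system.mem.total", group_by="host")

# Arithmetic across two grouped queries: percentage of memory used, per host.
graph = {
    "viz": "timeseries",
    "requests": [{"q": f"100 * {used} / {total}", "type": "line"}],
}
graph_json = json.dumps(graph, indent=2)
```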
Monitors
Monitors provide alerts and notifications based on metric thresholds, integration availability, network endpoints, and more.
- Use any metric reporting to Datadog
- Set up multi-alerts by device, host, and more
- Use @-notifications in alert messages to direct notifications to the right people
- Schedule downtimes to suppress notifications for system shutdowns, off-line maintenance, and more
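A multi-alert monitor combines a metric query grouped by a tag with a set of thresholds. The sketch below builds a definition that alerts separately for each host when average `system.cpu.user` over the last five minutes exceeds a threshold, and routes the notification with an `@` handle. It assumes the `POST /api/v1/monitor` payload shape (`name`, `type`, `query`, `message`, `options.thresholds`). The handle and threshold values are hypothetical.

```python
import json


def cpu_monitor(critical, warning, handle):
    """Multi-alert metric monitor: 'by {host}' evaluates each host separately."""
    return {
        "name": "High CPU on {{host.name}}",
        "type": "metric alert",
        "query": f"avg(last_5m):avg:system.cpu.user{{*}} by {{host}} > {critical}",
        # {{host.name}} is a template variable filled in per alerting host.
        "message": f"CPU above {critical}% on {{{{host.name}}}}. {handle}",
        "options": {"thresholds": {"critical": critical, "warning": warning}},
    }


# Hypothetical thresholds and notification handle.
monitor = cpu_monitor(90, 80, "@oncall@example.com")
payload = json.dumps(monitor)
```

The critical value in `options.thresholds` is kept equal to the number in the query, because the API expects the two to agree.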
Network Performance Monitoring
Datadog Network Performance Monitoring (NPM) gives you visibility into your network traffic across any tagged object in Datadog: from containers to hosts, services, and availability zones. Group by anything—from datacenters to teams to individual containers. Use tags to filter traffic by source and destination. The filters then aggregate into flows, each showing traffic between one source and one destination, through a customizable network page and network map. Each flow contains network metrics such as throughput, bandwidth, retransmit count, and source/destination information down to the IP, port, and PID levels. It then reports key metrics such as traffic volume and TCP retransmits.
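The hypothetical sketch below shows how tag-based grouping turns raw traffic into flows. Each connection record is keyed by the source and destination values of one tag, and bytes and TCP retransmits are summed per (source, destination) pair. The record fields, the `service` tag, and the `N/A` label for untagged endpoints are assumptions for illustration, not the NPM data model.

```python
from collections import defaultdict


def aggregate_flows(connections, tag_key):
    """Sum bytes and retransmits for each (source, destination) pair of tag values."""
    flows = defaultdict(lambda: {"bytes": 0, "retransmits": 0})
    for conn in connections:
        key = (conn["src_tags"].get(tag_key, "N/A"), conn["dst_tags"].get(tag_key, "N/A"))
        flows[key]["bytes"] += conn["bytes"]
        flows[key]["retransmits"] += conn["retransmits"]
    return dict(flows)


# Hypothetical connection records.
connections = [
    {"src_tags": {"service": "web"}, "dst_tags": {"service": "cart"}, "bytes": 1200, "retransmits": 0},
    {"src_tags": {"service": "web"}, "dst_tags": {"service": "cart"}, "bytes": 800, "retransmits": 2},
    {"src_tags": {"service": "cart"}, "dst_tags": {"service": "redis"}, "bytes": 300, "retransmits": 1},
]
flows = aggregate_flows(connections, "service")
```

Grouping the same records by a different tag, such as an availability zone, would produce a coarser or finer set of flows without changing the raw data.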
RUM & Session Replay
Datadog Real User Monitoring (RUM) allows you to visualize and analyze real-time user activities and experiences. With Session Replay, you can capture and view the web browsing sessions of your users to better understand their behavior. In the RUM Explorer, you can visualize load times, frontend errors, and page dependencies, and correlate business and application metrics so that you can troubleshoot issues with application, infrastructure, and business metrics in one dashboard.
Serverless
Serverless lets you write event-driven code and upload it to a cloud provider, which manages all of the underlying compute resources. Datadog Serverless brings together metrics, traces, and logs from your AWS Lambda functions running serverless applications into one view, so that you can optimize performance by filtering to functions that are generating errors, high latency, or cold starts.
Cloud SIEM
Datadog Cloud SIEM (Security Information and Event Management) automatically detects threats to your application or infrastructure, such as a targeted attack, an IP address on a threat intel list communicating with your systems, or an insecure configuration. These threats are surfaced in Datadog as Security Signals, which you can correlate and triage in the Security Explorer.
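The simplified, hypothetical sketch below illustrates one detection mentioned above. It flags log events whose client IP falls inside any network on a threat intel list and emits one signal per matching log. The list uses RFC 5737 documentation address ranges, and the signal fields are made up. This is not how Cloud SIEM detection rules are defined.

```python
import ipaddress

# Hypothetical threat intel list (RFC 5737 documentation ranges).
THREAT_INTEL = [ipaddress.ip_network(n) for n in ("203.0.113.0/24", "198.51.100.7/32")]


def signals(logs):
    """Return a signal dict for every log whose client IP matches the intel list."""
    found = []
    for log in logs:
        ip = ipaddress.ip_address(log["client_ip"])
        for network in THREAT_INTEL:
            if ip in network:
                found.append({"rule": "ip-on-threat-intel-list",
                              "ip": str(ip), "network": str(network),
                              "service": log["service"]})
                break  # one signal per log line is enough
    return found


# Hypothetical log events.
logs = [
    {"client_ip": "203.0.113.45", "service": "auth"},
    {"client_ip": "192.0.2.10", "service": "auth"},
    {"client_ip": "198.51.100.7", "service": "api"},
]
```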
Synthetic Monitoring
Datadog Synthetic Monitoring allows you to create and run API and browser tests that proactively simulate user transactions on your applications and monitor all internal and external network endpoints across your system's layers. You can detect errors, identify regressions, and automate rollbacks to prevent issues from surfacing in production.
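At its core, an API test is a request plus a set of assertions on the response. The sketch below evaluates assertions written in a `{type, operator, target}` shape, similar to the one used in Synthetic API test definitions. It runs against a response that has already been captured, so it needs no network access. The evaluator is hypothetical and covers only a few assertion types and operators. Header names are assumed to be lowercase.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    elapsed_ms: float
    headers: dict = field(default_factory=dict)  # lowercase header names assumed
    body: str = ""


OPERATORS = {
    "is": lambda actual, target: actual == target,
    "isNot": lambda actual, target: actual != target,
    "lessThan": lambda actual, target: actual < target,
    "contains": lambda actual, target: target in actual,
}


def actual_value(resp, assertion):
    """Pick the part of the response an assertion looks at."""
    kind = assertion["type"]
    if kind == "statusCode":
        return resp.status
    if kind == "responseTime":
        return resp.elapsed_ms
    if kind == "header":
        return resp.headers.get(assertion["property"].lower(), "")
    if kind == "body":
        return resp.body
    raise ValueError(f"unsupported assertion type: {kind}")


def failed_assertions(resp, assertions):
    """Return the assertions that do not hold for this response."""
    return [a for a in assertions
            if not OPERATORS[a["operator"]](actual_value(resp, a), a["target"])]


# Hypothetical assertions and a captured slow response.
assertions = [
    {"type": "statusCode", "operator": "is", "target": 200},
    {"type": "responseTime", "operator": "lessThan", "target": 500},
    {"type": "header", "property": "content-type", "operator": "contains", "target": "json"},
]
slow = Response(status=200, elapsed_ms=820.0, headers={"content-type": "application/json"})
```

For `slow`, only the response-time assertion fails. A scheduled test would report this as a regression even though the endpoint still returns 200.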
Datadog on Mobile
The Datadog Mobile App, available on the Apple App Store and Google Play Store, gives key data for on-call engineers and business users to follow their service health and triage issues quickly without opening their laptop. Access your organization’s Dashboards, Monitors, Incidents, SLOs and more directly from your mobile device.