---
title: Iceberg Tables (AWS Glue)
description: >-
  Connect AWS Glue to Datadog Data Observability to monitor Iceberg table
  metadata, freshness, and quality.
---

# Iceberg Tables (AWS Glue)

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site).
{% /alert %}

{% /callout %}

## Overview{% #overview %}

If you're [using the Iceberg framework in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html), you can see metadata from your Iceberg tables in Datadog through the [AWS Glue integration](https://docs.datadoghq.com/integrations/amazon-glue/). Use this data to monitor table schemas, data freshness, row counts, and table sizes.

## Prerequisites{% #prerequisites %}

Before you begin, make sure you have:

- An AWS account with Glue Iceberg tables you want to monitor.
- The [Datadog AWS integration](https://docs.datadoghq.com/integrations/amazon-web-services/) configured for the account.
- IAM permissions to modify the Datadog role's policies.
- (Optional) AWS Lake Formation access if you use it to manage table permissions.

## Configure the AWS account{% #configure-the-aws-account %}

1. Navigate to [**Datadog Data Observability** > **Settings**](https://app.datadoghq.com/datasets/settings/integrations).

1. Click **Configure** next to AWS Glue.

   {% image
      source="https://datadog-docs.imgix.net/images/data_observability/aws_glue/settings-configure-button.89c6340c4a9809eaf5de6e335b1be3f7.png?auto=format"
      alt="AWS Glue configuration option in the Data Observability Settings page" /%}

1. Select an existing AWS account that is already connected to Datadog, or add a new one. For help adding a new account, see the [AWS integration documentation](https://docs.datadoghq.com/integrations/amazon-web-services/).

   {% image
      source="https://datadog-docs.imgix.net/images/data_observability/aws_glue/account-selection.075b327ec679d7d21d2793c447293894.png?auto=format"
      alt="AWS account selection dropdown in the configuration flow" /%}

## Add required IAM permissions{% #add-required-iam-permissions %}

The Data Observability crawler requires additional permissions to monitor Glue Iceberg tables. Attach the following policy to the Datadog IAM role configured for your AWS integration:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetCatalog",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetJobRun",
        "glue:GetJobRuns",
        "glue:GetJob",
        "glue:GetJobs",
        "glue:GetTable",
        "glue:GetTables",
        "glue:ListJobs",
        "s3:ListBucket",
        "kms:Decrypt",
        "lakeformation:GetDataAccess"
      ],
      "Resource": ["*"]
    },
    {
      "Sid": "AllowIcebergMetadataOnly",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": [
        "arn:aws:s3:::*/metadata/*"
      ]
    }
  ]
}
```
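
To illustrate which objects the `AllowIcebergMetadataOnly` statement covers, the following Python sketch approximates IAM wildcard matching with `fnmatchcase` from the standard library. The helper name, bucket, and object keys are hypothetical, and `fnmatchcase` also interprets `[...]` character sets, which IAM does not, so treat this as an illustration rather than a policy evaluator.

```python
from fnmatch import fnmatchcase

# Resource pattern copied from the AllowIcebergMetadataOnly statement.
METADATA_PATTERN = "arn:aws:s3:::*/metadata/*"


def is_readable_object(bucket: str, key: str) -> bool:
    """Return True if the object's ARN matches the metadata-only pattern.

    Approximation: in IAM, `*` matches any run of characters, including `/`,
    which is also how fnmatchcase treats `*`.
    """
    object_arn = f"arn:aws:s3:::{bucket}/{key}"
    return fnmatchcase(object_arn, METADATA_PATTERN)


# Hypothetical Iceberg table stored under s3://my-lake/warehouse/analytics/events/
print(is_readable_object("my-lake", "warehouse/analytics/events/metadata/v3.metadata.json"))  # True
print(is_readable_object("my-lake", "warehouse/analytics/events/data/part-0000.parquet"))  # False
```

Under these assumptions, objects below a `metadata/` prefix match the pattern, while data files stored elsewhere in the table location do not.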

### (Optional) Restrict access to specific databases and tables{% #optional-restrict-access-to-specific-databases-and-tables %}

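The policy above grants access to all Glue resources. To monitor only specific databases or tables, move the Glue catalog actions into their own statement and replace `"Resource": ["*"]` with the explicit ARNs of the databases or tables to monitor. Keep the remaining actions (Glue jobs, S3, KMS, and Lake Formation) in a separate statement, because catalog, database, and table ARNs do not match those resources.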

AWS Glue IAM permissions are hierarchical. To access a table, the policy must include the catalog, the database, and the table. Omitting any level results in an access denied error.

| Resource | ARN format                                                        | Example                                                      |
| -------- | ----------------------------------------------------------------- | ------------------------------------------------------------ |
| Catalog  | `arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog`                      | `arn:aws:glue:us-east-1:123456789012:catalog`                |
| Database | `arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/<DB_NAME>`           | `arn:aws:glue:us-east-1:123456789012:database/analytics`     |
| Table    | `arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/<DB_NAME>/<TABLE_NAME>` | `arn:aws:glue:us-east-1:123456789012:table/analytics/events` |
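
Because every level of the hierarchy has to appear in the policy, it can help to generate the `Resource` list instead of writing it by hand. The following Python sketch builds the catalog, database, and table ARNs from the formats in the table above. The function name and the sample region, account ID, and table names are illustrative assumptions.

```python
import json
from typing import Dict, List


def glue_table_resources(region: str, account_id: str, tables: Dict[str, List[str]]) -> List[str]:
    """Build the catalog, database, and table ARNs needed to read the given tables.

    `tables` maps a database name to table names or patterns such as "events_*".
    """
    prefix = f"arn:aws:glue:{region}:{account_id}"
    arns = [f"{prefix}:catalog"]  # catalog level
    arns += [f"{prefix}:database/{db}" for db in tables]  # database level
    for db, names in tables.items():
        arns += [f"{prefix}:table/{db}/{name}" for name in names]  # table level
    return arns


# Hypothetical input: two named tables and one wildcard pattern in one database.
resources = glue_table_resources(
    "us-east-1", "123456789012", {"production_db": ["orders", "customers", "events_*"]}
)
print(json.dumps(resources, indent=2))
```

The printed list can be pasted into the `Resource` array of a statement that contains only the Glue catalog actions.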

#### Example policies{% #example-policies %}

{% tab title="Specific databases" %}
To monitor all tables in specific databases, include the catalog, each database, and a wildcard for tables in those databases:

```json
{
  "Effect": "Allow",
  "Action": [
    "glue:GetCatalog",
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables"
  ],
  "Resource": [
    "arn:aws:glue:us-east-1:123456789012:catalog",
    "arn:aws:glue:us-east-1:123456789012:database/production_db",
    "arn:aws:glue:us-east-1:123456789012:database/analytics_db",
    "arn:aws:glue:us-east-1:123456789012:table/production_db/*",
    "arn:aws:glue:us-east-1:123456789012:table/analytics_db/*"
  ]
}
```

{% /tab %}

{% tab title="Specific tables" %}
To monitor only specific tables, list each table explicitly. You can also use wildcards to match table name patterns:

```json
{
  "Effect": "Allow",
  "Action": [
    "glue:GetCatalog",
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables"
  ],
  "Resource": [
    "arn:aws:glue:us-east-1:123456789012:catalog",
    "arn:aws:glue:us-east-1:123456789012:database/production_db",
    "arn:aws:glue:us-east-1:123456789012:table/production_db/orders",
    "arn:aws:glue:us-east-1:123456789012:table/production_db/customers",
    "arn:aws:glue:us-east-1:123456789012:table/production_db/events_*"
  ]
}
```

The wildcard `events_*` matches tables like `events_clicks`, `events_purchases`, and any other table starting with `events_`.
{% /tab %}

For more information, see the [AWS Glue identity-based policy examples](https://docs.aws.amazon.com/glue/latest/dg/security_iam_id-based-policy-examples.html).
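
Before attaching a scoped policy, you can check that no level of the hierarchy is missing. The following Python sketch lists the catalog and database ARNs that the table ARNs in a `Resource` list depend on but that are absent. The function name is hypothetical, and the check compares exact strings only, so a wildcard parent such as `database/*` is reported as missing even though IAM would accept it.

```python
from typing import List


def missing_parent_arns(resources: List[str]) -> List[str]:
    """Return catalog and database ARNs required by table ARNs but not listed."""
    present = set(resources)
    missing: List[str] = []
    for arn in resources:
        # Glue ARN layout: arn:aws:glue:<region>:<account>:<type>/<path>
        prefix, _, resource = arn.rpartition(":")
        if not resource.startswith("table/"):
            continue
        database = resource.split("/")[1]
        for parent in (f"{prefix}:catalog", f"{prefix}:database/{database}"):
            if parent not in present and parent not in missing:
                missing.append(parent)
    return missing


# Hypothetical Resource list that omits the database level.
incomplete = [
    "arn:aws:glue:us-east-1:123456789012:catalog",
    "arn:aws:glue:us-east-1:123456789012:table/production_db/orders",
]
print(missing_parent_arns(incomplete))
```

An empty result means every table ARN has its catalog and database listed explicitly.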

## (Optional) Configure Lake Formation access{% #optional-configure-lake-formation-access %}

If you use AWS Lake Formation to manage access to your Glue Catalog tables, grant the Datadog role access to the databases and tables you want to monitor.

{% tab title="AWS CLI" %}
Use the following commands, replacing the placeholder values with your actual account ID, role name, database name, and S3 bucket:

```bash
# Quote the ARN so the shell treats it as a single literal string.
PRINCIPAL="arn:aws:iam::<YOUR_AWS_ACCOUNT_ID>:role/<YOUR_DATADOG_ROLE_NAME>"

# Database-level grant. SELECT is a table permission, so only DESCRIBE applies here.
aws lakeformation grant-permissions \
  --principal DataLakePrincipalIdentifier="$PRINCIPAL" \
  --resource '{"Database":{"Name":"<YOUR_DATABASE_NAME>"}}' \
  --permissions DESCRIBE

# Grant on every table in the database. TableWildcard is nested inside Table.
aws lakeformation grant-permissions \
  --principal DataLakePrincipalIdentifier="$PRINCIPAL" \
  --resource '{"Table":{"DatabaseName":"<YOUR_DATABASE_NAME>","TableWildcard":{}}}' \
  --permissions DESCRIBE SELECT

# Access to the S3 location that stores the tables.
aws lakeformation grant-permissions \
  --principal DataLakePrincipalIdentifier="$PRINCIPAL" \
  --resource '{"DataLocation":{"ResourceArn":"arn:aws:s3:::<YOUR_S3_BUCKET_NAME>"}}' \
  --permissions DATA_LOCATION_ACCESS
```

{% /tab %}

{% tab title="AWS Console" %}

1. In the AWS Console, navigate to **Lake Formation** > **Data lake permissions**.
1. Click **Grant**.
1. Under **Principals**, select **IAM users and roles** and choose your Datadog role.
1. Under **LF-Tags or catalog resources**, select the database and tables you want to monitor.
1. Under **Permissions**, select **DESCRIBE** and **SELECT**.
1. Click **Grant**.

{% image
   source="https://datadog-docs.imgix.net/images/data_observability/aws_glue/lakeformation-permissions.83f4fe5ace11a4d159435fe4c7f7c018.png?auto=format"
   alt="Lake Formation permissions grant dialog in AWS Console" /%}

{% /tab %}
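
If you monitor several databases, the same grants can be generated programmatically. The following Python sketch builds one request body per grant in the shape of the Lake Formation `GrantPermissions` API. The function name and sample values are hypothetical. The database grant uses only `DESCRIBE` because `SELECT` is a table-level permission. The sketch only prints the requests; passing each one to `grant_permissions` on a boto3 Lake Formation client is an assumption about how you might apply them.

```python
import json
from typing import Any, Dict, List


def lake_formation_grants(principal_arn: str, databases: List[str], bucket: str) -> List[Dict[str, Any]]:
    """Return one GrantPermissions request body per database, table set, and data location."""
    principal = {"DataLakePrincipalIdentifier": principal_arn}
    grants: List[Dict[str, Any]] = []
    for db in databases:
        # Database-level grant.
        grants.append({
            "Principal": principal,
            "Resource": {"Database": {"Name": db}},
            "Permissions": ["DESCRIBE"],
        })
        # Grant on all tables in the database.
        grants.append({
            "Principal": principal,
            "Resource": {"Table": {"DatabaseName": db, "TableWildcard": {}}},
            "Permissions": ["DESCRIBE", "SELECT"],
        })
    # Access to the S3 location that stores the tables.
    grants.append({
        "Principal": principal,
        "Resource": {"DataLocation": {"ResourceArn": f"arn:aws:s3:::{bucket}"}},
        "Permissions": ["DATA_LOCATION_ACCESS"],
    })
    return grants


# Hypothetical role, databases, and bucket.
requests = lake_formation_grants(
    "arn:aws:iam::123456789012:role/DatadogIntegrationRole",
    ["production_db", "analytics_db"],
    "my-lake",
)
print(json.dumps(requests, indent=2))
```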

## Configure the crawler{% #configure-the-crawler %}

1. In the configuration flow, select the AWS regions where your Glue Iceberg tables are located.

1. Enable the **Quality Monitoring for Apache Iceberg** toggle.

1. (Optional) Enable the **Job Monitoring** toggle if you also want to monitor Glue job health and performance.

1. Choose a sync frequency.

1. (Optional) Enter a catalog name if you use nested Glue catalog features. Leave this field empty for the default catalog.

   {% image
      source="https://datadog-docs.imgix.net/images/data_observability/aws_glue/crawler-configuration.a6bc7496b9cab08d39f11048c6bb4e8a.png?auto=format"
      alt="Crawler configuration showing region selection and sync frequency options" /%}

1. Click **Save**.

## Next steps{% #next-steps %}

After you complete the setup, Datadog begins syncing your Glue Iceberg table metadata in the background. Initial syncs can take up to an hour, depending on the number of tables in your catalog.

After the sync completes, your tables appear in the [Data Catalog](https://app.datadoghq.com/datasets/catalog?integration=awsglue%2Fdatabase_account). You can also create a [Data Observability monitor](https://docs.datadoghq.com/monitors/types/data_observability/) to start alerting on freshness and row count.

## Further reading{% #further-reading %}

- [Data Observability Overview](https://docs.datadoghq.com/data_observability/)
- [AWS Integration](https://docs.datadoghq.com/integrations/amazon-web-services/)
- [Data Observability Monitors](https://docs.datadoghq.com/monitors/types/data_observability/)
