aws_bedrock_data_source

`account_id`

Type: STRING

`data_source`

Type: STRUCT
Provider name: dataSource
Description: Contains details about the data source.

created_at
Type: TIMESTAMP
Provider name: createdAt
Description: The time at which the data source was created.
data_deletion_policy
Type: STRING
Provider name: dataDeletionPolicy
Description: The data deletion policy for the data source.
data_source_configuration
Type: STRUCT
Provider name: dataSourceConfiguration
Description: The connection configuration for the data source.
- confluence_configuration
  Type: STRUCT
  Provider name: confluenceConfiguration
  Description: The configuration information to connect to Confluence as your data source. The Confluence data source connector is in preview and is subject to change.
  - crawler_configuration
    Type: STRUCT
    Provider name: crawlerConfiguration
    Description: The configuration of the Confluence content. For example, configuring specific types of Confluence content.
    - filter_configuration
      Type: STRUCT
      Provider name: filterConfiguration
      Description: The configuration of filtering the Confluence content. For example, configuring regular expression patterns to include or exclude certain content.
      - pattern_object_filter
        Type: STRUCT
        Provider name: patternObjectFilter
        Description: The configuration of filtering certain objects or content types of the data source.
        - filters
          Type: UNORDERED_LIST_STRUCT
          Provider name: filters
          Description: The configuration of specific filters applied to your data source content. You can filter out or include certain content.
          - exclusion_filters
            Type: UNORDERED_LIST_STRING
            Provider name: exclusionFilters
            Description: A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
          - inclusion_filters
            Type: UNORDERED_LIST_STRING
            Provider name: inclusionFilters
            Description: A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
          - object_type
            Type: STRING
            Provider name: objectType
            Description: The supported object type or content type of the data source.
      - type
        Type: STRING
        Provider name: type
        Description: The type of filtering to apply to objects or content of the data source. For example, the PATTERN type applies regular expression patterns to filter your content.
  - source_configuration
    Type: STRUCT
    Provider name: sourceConfiguration
    Description: The endpoint information to connect to your Confluence data source.
    - auth_type
      Type: STRING
      Provider name: authType
      Description: The supported authentication type to authenticate and connect to your Confluence instance.
    - credentials_secret_arn
      Type: STRING
      Provider name: credentialsSecretArn
      Description: The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your Confluence instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Confluence connection configuration.
    - host_type
      Type: STRING
      Provider name: hostType
      Description: The supported host type, whether online/cloud or server/on-premises.
    - host_url
      Type: STRING
      Provider name: hostUrl
      Description: The Confluence host URL or instance URL.
- s3_configuration
  Type: STRUCT
  Provider name: s3Configuration
  Description: The configuration information to connect to Amazon S3 as your data source.
  - bucket_arn
    Type: STRING
    Provider name: bucketArn
    Description: The Amazon Resource Name (ARN) of the S3 bucket that contains your data.
  - bucket_owner_account_id
    Type: STRING
    Provider name: bucketOwnerAccountId
    Description: The account ID for the owner of the S3 bucket.
  - inclusion_prefixes
    Type: UNORDERED_LIST_STRING
    Provider name: inclusionPrefixes
    Description: A list of S3 prefixes to include certain files or content. For more information, see Organizing objects using prefixes.
- salesforce_configuration
  Type: STRUCT
  Provider name: salesforceConfiguration
  Description: The configuration information to connect to Salesforce as your data source. The Salesforce data source connector is in preview and is subject to change.
  - crawler_configuration
    Type: STRUCT
    Provider name: crawlerConfiguration
    Description: The configuration of the Salesforce content. For example, configuring specific types of Salesforce content.
    - filter_configuration
      Type: STRUCT
      Provider name: filterConfiguration
      Description: The configuration of filtering the Salesforce content. For example, configuring regular expression patterns to include or exclude certain content.
      - pattern_object_filter
        Type: STRUCT
        Provider name: patternObjectFilter
        Description: The configuration of filtering certain objects or content types of the data source.
        - filters
          Type: UNORDERED_LIST_STRUCT
          Provider name: filters
          Description: The configuration of specific filters applied to your data source content. You can filter out or include certain content.
          - exclusion_filters
            Type: UNORDERED_LIST_STRING
            Provider name: exclusionFilters
            Description: A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
          - inclusion_filters
            Type: UNORDERED_LIST_STRING
            Provider name: inclusionFilters
            Description: A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
          - object_type
            Type: STRING
            Provider name: objectType
            Description: The supported object type or content type of the data source.
      - type
        Type: STRING
        Provider name: type
        Description: The type of filtering to apply to objects or content of the data source. For example, the PATTERN type applies regular expression patterns to filter your content.
  - source_configuration
    Type: STRUCT
    Provider name: sourceConfiguration
    Description: The endpoint information to connect to your Salesforce data source.
    - auth_type
      Type: STRING
      Provider name: authType
      Description: The supported authentication type to authenticate and connect to your Salesforce instance.
    - credentials_secret_arn
      Type: STRING
      Provider name: credentialsSecretArn
      Description: The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your Salesforce instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Salesforce connection configuration.
    - host_url
      Type: STRING
      Provider name: hostUrl
      Description: The Salesforce host URL or instance URL.
- share_point_configuration
  Type: STRUCT
  Provider name: sharePointConfiguration
  Description: The configuration information to connect to SharePoint as your data source. The SharePoint data source connector is in preview and is subject to change.
  - crawler_configuration
    Type: STRUCT
    Provider name: crawlerConfiguration
    Description: The configuration of the SharePoint content. For example, configuring specific types of SharePoint content.
    - filter_configuration
      Type: STRUCT
      Provider name: filterConfiguration
      Description: The configuration of filtering the SharePoint content. For example, configuring regular expression patterns to include or exclude certain content.
      - pattern_object_filter
        Type: STRUCT
        Provider name: patternObjectFilter
        Description: The configuration of filtering certain objects or content types of the data source.
        - filters
          Type: UNORDERED_LIST_STRUCT
          Provider name: filters
          Description: The configuration of specific filters applied to your data source content. You can filter out or include certain content.
          - exclusion_filters
            Type: UNORDERED_LIST_STRING
            Provider name: exclusionFilters
            Description: A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
          - inclusion_filters
            Type: UNORDERED_LIST_STRING
            Provider name: inclusionFilters
            Description: A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
          - object_type
            Type: STRING
            Provider name: objectType
            Description: The supported object type or content type of the data source.
      - type
        Type: STRING
        Provider name: type
        Description: The type of filtering to apply to objects or content of the data source. For example, the PATTERN type applies regular expression patterns to filter your content.
  - source_configuration
    Type: STRUCT
    Provider name: sourceConfiguration
    Description: The endpoint information to connect to your SharePoint data source.
    - auth_type
      Type: STRING
      Provider name: authType
      Description: The supported authentication type to authenticate and connect to your SharePoint site or sites.
    - credentials_secret_arn
      Type: STRING
      Provider name: credentialsSecretArn
      Description: The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your SharePoint site or sites. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see SharePoint connection configuration.
    - domain
      Type: STRING
      Provider name: domain
      Description: The domain of your SharePoint instance or site URLs.
    - host_type
      Type: STRING
      Provider name: hostType
      Description: The supported host type, whether online/cloud or server/on-premises.
    - site_urls
      Type: UNORDERED_LIST_STRING
      Provider name: siteUrls
      Description: A list of one or more SharePoint site URLs.
    - tenant_id
      Type: STRING
      Provider name: tenantId
      Description: The identifier of your Microsoft 365 tenant.
- type
  Type: STRING
  Provider name: type
  Description: The type of data source.
- web_configuration
  Type: STRUCT
  Provider name: webConfiguration
  Description: The configuration of web URLs to crawl for your data source. You should be authorized to crawl the URLs. Crawling web URLs as your data source is in preview release and is subject to change.
  - crawler_configuration
    Type: STRUCT
    Provider name: crawlerConfiguration
    Description: The Web Crawler configuration details for the web data source.
    - crawler_limits
      Type: STRUCT
      Provider name: crawlerLimits
      Description: The configuration of crawl limits for the web URLs.
      - max_pages
        Type: INT32
        Provider name: maxPages
        Description: The maximum number of web pages crawled from your source URLs, up to 25,000 pages. If the web pages exceed this limit, the data source sync fails and no web pages are ingested.
      - rate_limit
        Type: INT32
        Provider name: rateLimit
        Description: The maximum rate at which pages are crawled, up to 300 pages per minute per host.
    - exclusion_filters
      Type: UNORDERED_LIST_STRING
      Provider name: exclusionFilters
      Description: A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
    - inclusion_filters
      Type: UNORDERED_LIST_STRING
      Provider name: inclusionFilters
      Description: A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
    - scope
      Type: STRING
      Provider name: scope
      Description: The scope of what is crawled for your URLs. You can choose to crawl only web pages that belong to the same host or primary domain. For example, only web pages that contain the seed URL “https://docs.aws.amazon.com/bedrock/latest/userguide/” and no other domains. You can choose to include subdomains in addition to the host or primary domain. For example, web pages that contain “aws.amazon.com” can also include the subdomain “docs.aws.amazon.com”.
    - user_agent
      Type: STRING
      Provider name: userAgent
      Description: The user agent suffix for your web crawler.
    - user_agent_header
      Type: STRING
      Provider name: userAgentHeader
      Description: A string used to identify the crawler or bot when it accesses a web server. The user agent header value consists of bedrockbot, a UUID, and a user agent suffix for your crawler (if one is provided). By default, it is set to bedrockbot_UUID. You can optionally append a custom suffix to bedrockbot_UUID to allowlist a specific user agent permitted to access your source URLs.
  - source_configuration
    Type: STRUCT
    Provider name: sourceConfiguration
    Description: The source configuration details for the web data source.
    - url_configuration
      Type: STRUCT
      Provider name: urlConfiguration
      Description: The configuration of the URL/URLs.
      - seed_urls
        Type: UNORDERED_LIST_STRUCT
        Provider name: seedUrls
        Description: One or more seed or starting point URLs.
        - url
          Type: STRING
          Provider name: url
          Description: A seed or starting point URL.
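The Confluence, Salesforce, SharePoint, and web crawler configurations above share one filter rule: when an inclusion pattern and an exclusion pattern both match, the exclusion wins. The following is a minimal Python sketch of that precedence rule, not the service's implementation. It assumes Python `re.search` semantics (a match anywhere in the string) and assumes that an empty inclusion list keeps every non-excluded item; the service's regex dialect and empty-list behavior may differ. The function name `is_crawled` is hypothetical.

```python
import re
from typing import Iterable


def is_crawled(url: str, inclusion_filters: Iterable[str], exclusion_filters: Iterable[str]) -> bool:
    """Return True if `url` would be crawled under the given regex filters.

    Sketch only: uses Python `re.search` semantics and treats an empty
    inclusion list as "include everything" (an assumption).
    """
    # Exclusion takes precedence: any matching exclusion pattern rejects the URL,
    # even if an inclusion pattern also matches.
    if any(re.search(pattern, url) for pattern in exclusion_filters):
        return False
    includes = list(inclusion_filters)
    if not includes:
        return True  # assumed: with no inclusion patterns, every non-excluded URL is kept
    return any(re.search(pattern, url) for pattern in includes)


if __name__ == "__main__":
    include = [r"^https://docs\.aws\.amazon\.com/bedrock/"]
    exclude = [r"\.pdf$"]
    # Matches the inclusion pattern only, so it is crawled.
    print(is_crawled("https://docs.aws.amazon.com/bedrock/latest/userguide/kb.html", include, exclude))
    # Matches both patterns; the exclusion wins, so it is skipped.
    print(is_crawled("https://docs.aws.amazon.com/bedrock/latest/guide.pdf", include, exclude))
```

The second call shows the precedence rule: the PDF URL matches the inclusion pattern, but the matching exclusion pattern removes it.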
data_source_id
Type: STRING
Provider name: dataSourceId
Description: The unique identifier of the data source.
description
Type: STRING
Provider name: description
Description: The description of the data source.
failure_reasons
Type: UNORDERED_LIST_STRING
Provider name: failureReasons
Description: The detailed reasons for the failure to delete the data source.
knowledge_base_id
Type: STRING
Provider name: knowledgeBaseId
Description: The unique identifier of the knowledge base to which the data source belongs.
name
Type: STRING
Provider name: name
Description: The name of the data source.
server_side_encryption_configuration
Type: STRUCT
Provider name: serverSideEncryptionConfiguration
Description: Contains details about the configuration of the server-side encryption.
- kms_key_arn
  Type: STRING
  Provider name: kmsKeyArn
  Description: The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
status
Type: STRING
Provider name: status
Description: The status of the data source. The following statuses are possible:
- Available – The data source has been created and is ready for ingestion into the knowledge base.
- Deleting – The data source is being deleted.
updated_at
Type: TIMESTAMP
Provider name: updatedAt
Description: The time at which the data source was last updated.
vector_ingestion_configuration
Type: STRUCT
Provider name: vectorIngestionConfiguration
Description: Contains details about how to ingest the documents in the data source.
- chunking_configuration
  Type: STRUCT
  Provider name: chunkingConfiguration
  Description: Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
  - chunking_strategy
    Type: STRING
    Provider name: chunkingStrategy
    Description: A knowledge base can split your source data into chunks. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. You have the following options for chunking your data:
    - FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
    - HIERARCHICAL – Amazon Bedrock splits documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
    - SEMANTIC – Amazon Bedrock splits documents into chunks based on groups of similar content derived with natural language processing.
    - NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
  - fixed_size_chunking_configuration
    Type: STRUCT
    Provider name: fixedSizeChunkingConfiguration
    Description: Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
    - max_tokens
      Type: INT32
      Provider name: maxTokens
      Description: The maximum number of tokens to include in a chunk.
    - overlap_percentage
      Type: INT32
      Provider name: overlapPercentage
      Description: The percentage of overlap between adjacent chunks of a data source.
  - hierarchical_chunking_configuration
    Type: STRUCT
    Provider name: hierarchicalChunkingConfiguration
    Description: Settings for hierarchical document chunking for a data source. Hierarchical chunking splits documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
    - level_configurations
      Type: UNORDERED_LIST_STRUCT
      Provider name: levelConfigurations
      Description: Token settings for each layer.
      - max_tokens
        Type: INT32
        Provider name: maxTokens
        Description: The maximum number of tokens that a chunk can contain in this layer.
    - overlap_tokens
      Type: INT32
      Provider name: overlapTokens
      Description: The number of tokens to repeat across chunks in the same layer.
  - semantic_chunking_configuration
    Type: STRUCT
    Provider name: semanticChunkingConfiguration
    Description: Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
    - breakpoint_percentile_threshold
      Type: INT32
      Provider name: breakpointPercentileThreshold
      Description: The dissimilarity threshold for splitting chunks.
    - buffer_size
      Type: INT32
      Provider name: bufferSize
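      Description: The buffer size: the number of surrounding sentences added to each sentence when creating embeddings. For example, a buffer size of 1 combines the previous, current, and next sentences.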
    - max_tokens
      Type: INT32
      Provider name: maxTokens
      Description: The maximum number of tokens that a chunk can contain.
- context_enrichment_configuration
  Type: STRUCT
  Provider name: contextEnrichmentConfiguration
  Description: The context enrichment configuration used for ingestion of the data into the vector store.
  - bedrock_foundation_model_configuration
    Type: STRUCT
    Provider name: bedrockFoundationModelConfiguration
    Description: The configuration of the Amazon Bedrock foundation model used for context enrichment.
    - enrichment_strategy_configuration
      Type: STRUCT
      Provider name: enrichmentStrategyConfiguration
      Description: The enrichment strategy used to provide additional context. For example, Neptune GraphRAG uses Amazon Bedrock foundation models to perform chunk entity extraction.
      - method
        Type: STRING
        Provider name: method
        Description: The method used for the context enrichment strategy.
    - model_arn
      Type: STRING
      Provider name: modelArn
      Description: The Amazon Resource Name (ARN) of the model used to create vector embeddings for the knowledge base.
  - type
    Type: STRING
    Provider name: type
    Description: The method used for context enrichment. It must be an Amazon Bedrock foundation model.
- custom_transformation_configuration
  Type: STRUCT
  Provider name: customTransformationConfiguration
  Description: A custom document transformer for parsed data source documents.
  - intermediate_storage
    Type: STRUCT
    Provider name: intermediateStorage
    Description: An S3 bucket path for input and output objects.
    - s3_location
      Type: STRUCT
      Provider name: s3Location
      Description: An S3 bucket path.
      - uri
        Type: STRING
        Provider name: uri
        Description: The location’s URI. For example, s3://my-bucket/chunk-processor/.
  - transformations
    Type: UNORDERED_LIST_STRUCT
    Provider name: transformations
    Description: A list of transformations, each specifying a Lambda function that processes documents and the step at which it is applied.
    - step_to_apply
      Type: STRING
      Provider name: stepToApply
      Description: When the service applies the transformation.
    - transformation_function
      Type: STRUCT
      Provider name: transformationFunction
      Description: A Lambda function that processes documents.
      - transformation_lambda_configuration
        Type: STRUCT
        Provider name: transformationLambdaConfiguration
        Description: The Lambda function.
        - lambda_arn
          Type: STRING
          Provider name: lambdaArn
          Description: The function’s ARN identifier.
- parsing_configuration
  Type: STRUCT
  Provider name: parsingConfiguration
  Description: Configurations for a parser to use for parsing documents in your data source. If you exclude this field, the default parser will be used.
  - bedrock_data_automation_configuration
    Type: STRUCT
    Provider name: bedrockDataAutomationConfiguration
    Description: If you specify BEDROCK_DATA_AUTOMATION as the parsing strategy for ingesting your data source, use this object to modify configurations for using the Amazon Bedrock Data Automation parser.
    - parsing_modality
      Type: STRING
      Provider name: parsingModality
      Description: Specifies whether to enable parsing of multimodal data, including text, images, or both.
  - bedrock_foundation_model_configuration
    Type: STRUCT
    Provider name: bedrockFoundationModelConfiguration
    Description: If you specify BEDROCK_FOUNDATION_MODEL as the parsing strategy for ingesting your data source, use this object to modify configurations for using a foundation model to parse documents.
    - model_arn
      Type: STRING
      Provider name: modelArn
      Description: The ARN of the foundation model to use for parsing.
    - parsing_modality
      Type: STRING
      Provider name: parsingModality
      Description: Specifies whether to enable parsing of multimodal data, including text, images, or both.
    - parsing_prompt
      Type: STRUCT
      Provider name: parsingPrompt
      Description: Instructions for interpreting the contents of a document.
      - parsing_prompt_text
        Type: STRING
        Provider name: parsingPromptText
        Description: Instructions for interpreting the contents of a document.
  - parsing_strategy
    Type: STRING
    Provider name: parsingStrategy
    Description: The parsing strategy for the data source.
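Fixed-size chunking with `max_tokens` and `overlap_percentage` can be illustrated with a sliding window whose step is the chunk size minus the overlap. This is a hypothetical helper, not Bedrock's chunker: it assumes whitespace-separated words stand in for tokens (Bedrock uses its own tokenizer and produces chunks of approximate size), rounds the overlap down to whole tokens, and restricts the percentage to [0, 100) so the window always advances.

```python
from typing import List


def fixed_size_chunks(text: str, max_tokens: int, overlap_percentage: int) -> List[List[str]]:
    """Split text into windows of at most max_tokens words with a percentage overlap.

    Assumption: words (whitespace-separated) approximate tokens.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_percentage < 100:
        raise ValueError("overlap_percentage must be in [0, 100)")
    tokens = text.split()
    # Tokens shared by adjacent chunks, rounded down.
    overlap = max_tokens * overlap_percentage // 100
    step = max_tokens - overlap  # at least 1 because overlap_percentage < 100
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in fixed_size_chunks("a b c d e f g h i j", max_tokens=4, overlap_percentage=50):
        print(" ".join(chunk))
```

With `max_tokens=4` and `overlap_percentage=50`, each chunk repeats the last two words of the previous one, giving `a b c d`, `c d e f`, `e f g h`, and `g h i j`.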

`document_details`

Type: UNORDERED_LIST_STRUCT
Provider name: documentDetails
Description: A list of objects, each of which contains information about the documents that were retrieved.

data_source_id
Type: STRING
Provider name: dataSourceId
Description: The identifier of the data source connected to the knowledge base that the document was ingested into or deleted from.
identifier
Type: STRUCT
Provider name: identifier
Description: Contains information that identifies the document.
- custom
  Type: STRUCT
  Provider name: custom
  Description: Contains information that identifies the document in a custom data source.
  - id
    Type: STRING
    Provider name: id
    Description: The identifier of the document to ingest into a custom data source.
- data_source_type
  Type: STRING
  Provider name: dataSourceType
  Description: The type of data source connected to the knowledge base that contains the document.
- s3
  Type: STRUCT
  Provider name: s3
  Description: Contains information that identifies the document in an S3 data source.
  - uri
    Type: STRING
    Provider name: uri
    Description: The location’s URI. For example, s3://my-bucket/chunk-processor/.
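The `identifier` struct behaves like a tagged union: `data_source_type` indicates whether `s3` or `custom` is populated. A minimal sketch of reading it, assuming the dictionary keys follow the provider names above (`dataSourceType`, `s3.uri`, `custom.id`) and that the type values are `S3` and `CUSTOM`; the helper `document_key` is hypothetical.

```python
from typing import Any, Dict


def document_key(identifier: Dict[str, Any]) -> str:
    """Return the S3 URI or custom ID held in a document identifier.

    Assumes keys follow the provider names (dataSourceType, s3.uri, custom.id)
    and that dataSourceType is "S3" or "CUSTOM".
    """
    source_type = identifier.get("dataSourceType")
    if source_type == "S3":
        return identifier["s3"]["uri"]
    if source_type == "CUSTOM":
        return identifier["custom"]["id"]
    # Any other value is outside the two variants this sketch assumes.
    raise ValueError(f"unsupported dataSourceType: {source_type!r}")
```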
knowledge_base_id
Type: STRING
Provider name: knowledgeBaseId
Description: The identifier of the knowledge base that the document was ingested into or deleted from.
status
Type: STRING
Provider name: status
Description: The ingestion status of the document. The following statuses are possible:
- STARTED – You submitted the ingestion job containing the document.
- PENDING – The document is waiting to be ingested.
- IN_PROGRESS – The document is being ingested.
- INDEXED – The document was successfully indexed.
- PARTIALLY_INDEXED – The document was partially indexed.
- METADATA_PARTIALLY_INDEXED – You submitted metadata for an existing document and it was partially indexed.
- METADATA_UPDATE_FAILED – You submitted a metadata update for an existing document but it failed.
- FAILED – The document failed to be ingested.
- NOT_FOUND – The document wasn’t found.
- IGNORED – The document was ignored during ingestion.
- DELETING – You submitted the delete job containing the document.
- DELETE_IN_PROGRESS – The document is being deleted.
status_reason
Type: STRING
Provider name: statusReason
Description: The reason for the status. Appears alongside the status IGNORED.
updated_at
Type: TIMESTAMP
Provider name: updatedAt
Description: The date and time at which the document was last updated.
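When reviewing a `document_details` list, a common task is to count documents per `status` and pull out the entries that need attention together with their `status_reason`. The sketch below assumes each entry is a dict keyed by the provider names (`status`, `statusReason`). Which statuses count as problems is a judgment call made for this example, not a classification from the service; `summarize_statuses` and `PROBLEM_STATUSES` are hypothetical names.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

# Assumption for this example: statuses from the list above that mean the
# document (or its metadata) did not end up fully indexed.
PROBLEM_STATUSES = {
    "PARTIALLY_INDEXED",
    "METADATA_PARTIALLY_INDEXED",
    "METADATA_UPDATE_FAILED",
    "FAILED",
    "NOT_FOUND",
    "IGNORED",
}


def summarize_statuses(
    document_details: List[Dict[str, Any]],
) -> Tuple[Counter, List[Dict[str, Any]]]:
    """Count documents per status and collect the entries that need attention."""
    counts = Counter(doc["status"] for doc in document_details)
    problems = [doc for doc in document_details if doc["status"] in PROBLEM_STATUSES]
    return counts, problems


if __name__ == "__main__":
    # Hypothetical sample entries shaped like document_details items.
    sample = [
        {"status": "INDEXED"},
        {"status": "IGNORED", "statusReason": "example reason"},
        {"status": "IN_PROGRESS"},
    ]
    counts, problems = summarize_statuses(sample)
    print(dict(counts))
    for doc in problems:
        print(doc["status"], doc.get("statusReason", "(no reason given)"))
```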

`next_token`

Type: STRING
Provider name: nextToken
Description: If the total number of results is greater than the maxResults value provided in the request, pass this token as the nextToken value in a subsequent request to return the next batch of results.
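The `next_token` field follows the usual pagination pattern: keep requesting pages, passing the previous token, until a response arrives without one. A minimal sketch, where `fetch_page` is a hypothetical callable you supply to perform one request and return the response as a dict; the default `result_key` of `documentDetails` is an assumption matching the `document_details` field above.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iterate_pages(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    result_key: str = "documentDetails",
) -> Iterator[Dict[str, Any]]:
    """Yield every item across pages, following nextToken until it is absent.

    fetch_page receives the previous nextToken (None for the first request)
    and returns one response page as a dict.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get(result_key, [])
        token = page.get("nextToken")
        if not token:
            break  # no token means this was the last page
```

With boto3, `fetch_page` could wrap a `bedrock-agent` client call such as `list_knowledge_base_documents`, adding `nextToken` to the request only when it is not `None`; check your SDK version for the exact operation and parameters.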

`tags`

Type: UNORDERED_LIST_STRING