azure.cognitiveservices_accounts.model_availability_rate (gauge) | Availability percentage, calculated as (total calls - server errors) / total calls. Server errors include any HTTP 5xx response. Shown as percent |
azure.cognitiveservices_accounts.model_requests (count) | Number of calls made to the model API over a period of time. Applies to PTU, PTU-managed, and pay-as-you-go deployments. |
azure.cognitiveservices_accounts.input_tokens (count) | Number of prompt tokens processed (input) on a model. Applies to PTU, PTU-managed, and pay-as-you-go deployments. |
azure.cognitiveservices_accounts.output_tokens (count) | Number of tokens generated (output) from an OpenAI model. Applies to PTU, PTU-managed, and pay-as-you-go deployments. |
azure.cognitiveservices_accounts.total_tokens (count) | Number of inference tokens processed on a model. Calculated as prompt tokens (input) plus generated tokens (output). Applies to PTU, PTU-managed, and pay-as-you-go deployments. |
azure.cognitiveservices_accounts.count (gauge) | Count of CognitiveServices accounts. |
azure.cognitiveservices_accounts.annotated_pages (count) | Total number of pages processed with annotations. Applies to PTU, PTU-managed, and pay-as-you-go deployments. |
azure.cognitiveservices_accounts.audio_input_tokens (count) | Number of audio prompt tokens processed (input) on an OpenAI model. Applies to PTU-managed model deployments. |
azure.cognitiveservices_accounts.audio_output_tokens (count) | Number of audio tokens generated (output) on an OpenAI model. Applies to PTU-managed model deployments. |
azure.cognitiveservices_accounts.generated_images (count) | Total number of images generated. Applies to PTU, PTU-managed, and pay-as-you-go deployments. |
azure.cognitiveservices_accounts.total_pages (count) | Total number of pages processed. Applies to PTU, PTU-managed, and pay-as-you-go deployments. |
azure.cognitiveservices_accounts.realtime_api_seconds_used (count) | Number of seconds of Realtime API usage. |
azure.cognitiveservices_accounts.prompt_tokens_read_from_cache (count) | Total number of prompt tokens read from the cache. Applies to Anthropic model deployments. Surfaced in the usage section of the response as cache_read_input_tokens. |
azure.cognitiveservices_accounts.prompt_tokens_written_to_cache_1_hour_ttl (count) | Number of prompt tokens used to create the 1-hour cache entry. Applies to Anthropic model deployments. Surfaced in the usage section of the response as cache_creation.ephemeral_1h_input_tokens. |
azure.cognitiveservices_accounts.prompt_tokens_written_to_cache_5_minute_ttl (count) | Number of prompt tokens used to create the 5-minute cache entry. Applies to Anthropic model deployments. Surfaced in the usage section of the response as cache_creation.ephemeral_5m_input_tokens. |
azure.cognitiveservices_accounts.voice_live_audio_input_tokens (count) | Number of audio input tokens, excluding cached tokens. |
azure.cognitiveservices_accounts.voice_live_audio_output_tokens (count) | Number of audio output tokens. |
azure.cognitiveservices_accounts.voice_live_cached_audio_input_tokens (count) | Number of cached audio input tokens. |
azure.cognitiveservices_accounts.voice_live_cached_text_input_tokens (count) | Number of cached text input tokens. |
azure.cognitiveservices_accounts.voice_live_text_input_tokens (count) | Number of text input tokens, excluding cached tokens. |
azure.cognitiveservices_accounts.voice_live_text_output_tokens (count) | Number of text output tokens. |
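
The two calculated metrics above (model_availability_rate and total_tokens) can be sketched in a few lines of Python. This is only an illustration of the formulas stated in the descriptions, not how Azure computes them internally. The function names are hypothetical, and scaling the ratio by 100 to obtain a percentage is an assumption based on the metric being shown as percent.

```python
def availability_rate(total_calls: int, server_errors: int) -> float:
    """Availability as a percentage: (total calls - server errors) / total calls.

    The multiplication by 100 is an assumption, made because the metric
    is reported as a percent.
    """
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    if not 0 <= server_errors <= total_calls:
        raise ValueError("server_errors must be between 0 and total_calls")
    return 100.0 * (total_calls - server_errors) / total_calls


def total_tokens(input_tokens: int, output_tokens: int) -> int:
    """Inference tokens: prompt tokens (input) plus generated tokens (output)."""
    return input_tokens + output_tokens


if __name__ == "__main__":
    # Hypothetical numbers: 200 calls, 5 of which returned an HTTP 5xx response.
    print(availability_rate(total_calls=200, server_errors=5))
    print(total_tokens(input_tokens=1200, output_tokens=300))
```

With these hypothetical inputs, 195 of 200 calls succeeded, so the availability is 97.5 percent, and the token total is 1500.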