Azkaban Monitoring

Overview

Azkaban is an open-source workflow engine for the Hadoop ecosystem. It is a batch job scheduler that lets developers control job execution in Java projects and, in particular, Hadoop projects.

Key Components

  • Relational Database (MySQL): Azkaban uses MySQL to store much of its state. Both the AzkabanWebServer and the AzkabanExecutorServer access the DB.
  • AzkabanWebServer: The AzkabanWebServer is the main manager of Azkaban. It handles project management, authentication, scheduling, and monitoring of executions. It also serves the web user interface.
  • AzkabanExecutorServer: The AzkabanExecutorServer handles the actual execution of workflows and jobs. Previous versions of Azkaban combined the AzkabanWebServer and AzkabanExecutorServer features in a single server. The executor has since been separated into its own server.

Features

  • Compatible with any version of Hadoop
  • Easy to use web UI
  • Simple web and HTTP workflow uploads
  • Project workspaces
  • Scheduling of workflows
  • Modular and pluggable
  • Authentication and Authorization
  • Tracking of user actions
  • Email alerts on failures and successes
  • SLA alerting and auto killing
  • Retrying of failed jobs

Monitoring Capabilities

Azkaban Executor Job Stats

Metric | Description
Azkaban Running Jobs | Number of jobs currently running.
Azkaban Executed Jobs/Sec | Number of jobs executed per second.
Azkaban Failed Jobs/Sec | Number of jobs that failed per second.
Azkaban Succeeded Jobs/Sec | Number of jobs that succeeded per second.
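
The per-second metrics above are rates. As a rough illustration of how such a rate can be derived, the sketch below computes it from two samples of a cumulative counter. It assumes the underlying values are cumulative counts, which this page does not state; the function name and the sample numbers are hypothetical.

```python
def per_second_rate(prev_count, prev_time, curr_count, curr_time):
    """Average events per second between two samples of a cumulative counter."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    delta = curr_count - prev_count
    if delta < 0:
        # Assumption: a smaller value means the counter was reset
        # (for example after a restart), so count from zero again.
        delta = curr_count
    return delta / elapsed


# Hypothetical samples: 120 failed jobs at t=0 s, 150 failed jobs at t=60 s.
print(per_second_rate(120, 0.0, 150, 60.0))  # 0.5
```

A short sampling interval makes the rate more responsive but noisier; a longer one smooths out bursts.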

Azkaban Container Stats

Metric | Description
Azkaban Average Connection’s Duration (Sec) | Average duration of connections, in seconds.
Azkaban Maximum Connection’s Duration (Sec) | Maximum duration of a connection, in seconds.
Azkaban Minimum Connection’s Duration (Sec) | Minimum duration of a connection, in seconds.
Azkaban Total Connection’s Duration (Sec) | Total duration of all connections, in seconds.
Azkaban Average Requests/Connection | Average number of requests per connection.
Azkaban Maximum Requests/Connection | Maximum number of requests per connection.
Azkaban Minimum Requests/Connection | Minimum number of requests per connection.
Azkaban Accepted Connections/Sec | Number of connections accepted by the server per second.
Azkaban Open Connections | Number of connections currently open.
Azkaban Maximum Open Connections | Maximum number of open connections.
Azkaban Minimum Open Connections | Minimum number of open connections.
Azkaban Threads | Number of threads.
Azkaban Idle Threads | Number of idle threads.
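
The two thread metrics can be combined into a single utilization figure. The sketch below is one possible derivation, assuming that "Azkaban Threads" is the total pool size and that idle threads are counted within it; the function name is hypothetical.

```python
def busy_thread_ratio(threads, idle_threads):
    """Fraction of container threads that are busy, between 0.0 and 1.0."""
    if threads <= 0:
        return 0.0  # no threads reported, nothing can be busy
    if not 0 <= idle_threads <= threads:
        raise ValueError("idle_threads must be between 0 and threads")
    return (threads - idle_threads) / threads


# Hypothetical reading: 50 threads, 10 of them idle.
print(busy_thread_ratio(50, 10))  # 0.8
```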

Azkaban Flow Stats

Metric | Description
Azkaban Flow Elapsed Time (Sec) | Total time taken by this flow to execute, in seconds.
Azkaban Flow Status | Status of the flow: 1 = KILLED, 2 = FAILED, 3 = RUNNING, 4 = SUCCEEDED.
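
The numeric flow status is easier to read in dashboards or alert messages once it is mapped back to its name. The sketch below shows one way to do that using the four codes listed above; the class and function names are hypothetical, and any other value is reported as UNKNOWN.

```python
from enum import IntEnum


class FlowStatus(IntEnum):
    """Status codes as listed for the Azkaban Flow Status metric."""
    KILLED = 1
    FAILED = 2
    RUNNING = 3
    SUCCEEDED = 4


def describe_flow_status(value):
    """Return the status name for a metric value, or 'UNKNOWN'."""
    try:
        return FlowStatus(int(value)).name
    except (TypeError, ValueError):
        # Not a number, or a number outside the documented codes.
        return "UNKNOWN"


print(describe_flow_status(2))  # FAILED
print(describe_flow_status(7))  # UNKNOWN
```

The same mapping applies to the sub flow status metric, which uses the same codes.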

Azkaban Sub Flow Stats

Metric | Description
Azkaban Sub Flow Elapsed Time (Sec) | Total time taken by this sub flow to execute, in seconds.
Azkaban Sub Flow Status | Status of the sub flow: 1 = KILLED, 2 = FAILED, 3 = RUNNING, 4 = SUCCEEDED.
Azkaban Sub Flow Map Output Records | Number of map output records in this sub flow.

Azkaban Flow Runner Manager Stats

Metric | Description
Azkaban Queued Flows | Number of queued flows.
Azkaban Maximum Queued Flows | Maximum number of queued flows.
Azkaban Running Flows | Number of running flows.
Azkaban Maximum Running Flows | Maximum number of running flows.
Azkaban Total Executed Flows/Sec | Total number of flows executed per second.

Azkaban Executor Job Callback Stats

Metric | Description
Azkaban Job Callbacks/Sec | Number of job callbacks per second.
Azkaban Successful Job Callbacks/Sec | Number of successful job callbacks per second.
Azkaban Failed Job Callbacks/Sec | Number of failed job callbacks per second.
Azkaban Active Job Callbacks | Number of active job callbacks.

Azkaban Web Server Executor Manager Stats

Metric | Description
Azkaban Last Successful Executor Info Refresh (Sec) | Timestamp of the last successful executor info refresh, in seconds.
Azkaban Thread Active | Status of the executor thread: 1 = True, 0 = False.
Azkaban Running Flows | Number of running flows.
Azkaban Last Thread Check Time (Sec) | Time of the last thread check, in seconds.
Azkaban Queue Processor Active | Status of the queue processor: 1 = True, 0 = False.
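
The status flags and the refresh timestamp above can be combined into a simple health check. The sketch below is only an illustration: it assumes the refresh value is a Unix epoch timestamp in seconds, and both the function name and the 300-second threshold are hypothetical choices, not values defined by Azkaban.

```python
import time


def executor_manager_healthy(thread_active, queue_processor_active,
                             last_refresh_sec, now_sec=None, max_age_sec=300):
    """True if both flags are 1 and the last refresh is recent enough."""
    if now_sec is None:
        now_sec = time.time()  # assumption: the metric is epoch seconds
    refresh_is_fresh = (now_sec - last_refresh_sec) <= max_age_sec
    return (thread_active == 1
            and queue_processor_active == 1
            and refresh_is_fresh)


# Hypothetical readings, with "now" fixed so the result is reproducible.
print(executor_manager_healthy(1, 1, last_refresh_sec=1000, now_sec=1100))  # True
print(executor_manager_healthy(1, 0, last_refresh_sec=1000, now_sec=1100))  # False
```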

Azkaban Web Trigger Manager Stats

Metric | Description
Azkaban Last Runner Thread Check Time (Sec) | Time of the last runner thread check, in seconds.
Azkaban Runner Thread Active | Status of the runner thread: 1 = True, 0 = False.
Azkaban Scanner Idle Time (Sec) | Idle time of the scanner, in seconds.
Azkaban Triggers | Number of triggers.

Azkaban Coordinator Stats

Metric | Description
75thPercentile Service Response Time (ms) | 75th percentile of service response time, in milliseconds.
95thPercentile Service Response Time (ms) | 95th percentile of service response time, in milliseconds.
98thPercentile Service Response Time (ms) | 98th percentile of service response time, in milliseconds.
99thPercentile Service Response Time (ms) | 99th percentile of service response time, in milliseconds.
999thPercentile Service Response Time (ms) | 99.9th percentile of service response time, in milliseconds.
Mean Service Response Time (ms) | Mean service response time, in milliseconds.
50thPercentile Service Response Time (ms) | 50th percentile (median) of service response time, in milliseconds.
Minimum Response Time (ms) | Minimum server response time, in milliseconds.
Maximum Response Time (ms) | Maximum server response time, in milliseconds.
Request/Sec | Number of requests per second.
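
To make the percentile metrics concrete, the sketch below computes percentiles from a list of response times using the nearest-rank method. This is an illustration of what a percentile means, not necessarily the estimator the monitored service uses; the function name and the sample data are hypothetical.

```python
import math


def nearest_rank_percentile(samples, percentile):
    """Smallest sample such that at least `percentile` % of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(percentile / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in milliseconds.
times_ms = [12, 15, 20, 22, 30, 45, 60, 80, 120, 400]
print(nearest_rank_percentile(times_ms, 50))    # 30
print(nearest_rank_percentile(times_ms, 75))    # 80
print(nearest_rank_percentile(times_ms, 99.9))  # 400
```

With only ten samples the high percentiles all collapse onto the slowest request, which is why tail percentiles are only meaningful over a large number of requests.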