Azkaban Monitoring

Overview

Azkaban is an open-source workflow engine for the Hadoop ecosystem. It is a batch job scheduler that lets developers control job execution in Java projects and, in particular, Hadoop projects.

Key Components

  • Relational Database (MySQL): Azkaban uses MySQL to store much of its state. Both the AzkabanWebServer and the AzkabanExecutorServer access the DB.
  • AzkabanWebServer: The AzkabanWebServer is the main manager of all of Azkaban. It handles project management, authentication, scheduling, and monitoring of executions. It also serves as the web user interface.
  • AzkabanExecutorServer: The AzkabanExecutorServer handles the actual execution of workflows and jobs. Previous versions of Azkaban combined the AzkabanWebServer and AzkabanExecutorServer features in a single server. The executor has since been separated into its own server.

Features

  • Compatible with any version of Hadoop
  • Easy-to-use web UI
  • Simple web and HTTP workflow uploads
  • Project workspaces
  • Scheduling of workflows
  • Modular and pluggable
  • Authentication and Authorization
  • Tracking of user actions
  • Email alerts on failure and success
  • SLA alerting and auto killing
  • Retrying of failed jobs

Monitoring Capabilities

Azkaban Executor Job Stats

| Metric | Metric Description |
| --- | --- |
| Azkaban Running Jobs | Number of running jobs. |
| Azkaban Executed Jobs/Sec | Number of executed jobs per second. |
| Azkaban Failed Jobs/Sec | Number of failed jobs per second. |
| Azkaban Succeeded Jobs/Sec | Number of succeeded jobs per second. |
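
The source does not say how the per-second values above are derived. One common approach is to difference two samples of a cumulative counter and divide by the sampling interval. The sketch below assumes that approach; the function name, the sample values, and the reset handling are illustrative and are not taken from Azkaban or from any monitoring agent.

```python
def per_second_rate(prev_count, curr_count, interval_sec):
    """Return events per second between two cumulative counter samples."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    delta = curr_count - prev_count
    if delta < 0:
        # Assumption: a counter that goes backwards was reset (for example by
        # a restart), so the current value is the count since the reset.
        delta = curr_count
    return delta / interval_sec


# Hypothetical counter samples taken 60 seconds apart.
executed_rate = per_second_rate(1200, 1260, 60)
failed_rate = per_second_rate(30, 33, 60)
print(executed_rate, failed_rate)
```

Comparing the failed rate with the executed rate over the same interval gives a failure ratio, which is often easier to alert on than either rate alone.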

Azkaban Container Stats

| Metric | Metric Description |
| --- | --- |
| Azkaban Average Connection’s Duration (Sec) | Average duration of open connections in seconds. |
| Azkaban Maximum Connection’s Duration (Sec) | Maximum duration of open connections in seconds. |
| Azkaban Minimum Connection’s Duration (Sec) | Minimum duration of connections in seconds. |
| Azkaban Total Connection’s Duration (Sec) | Total duration of connections in seconds. |
| Azkaban Average Requests/Connection | Average number of requests per connection. |
| Azkaban Maximum Requests/Connection | Maximum number of requests per connection. |
| Azkaban Minimum Requests/Connection | Minimum number of requests per connection. |
| Azkaban Accepted Connections/Sec | Number of connections accepted per second by the server. |
| Azkaban Open Connections | Number of connections currently open. |
| Azkaban Maximum Open Connections | Maximum number of open connections. |
| Azkaban Minimum Open Connections | Minimum number of open connections. |
| Azkaban Threads | Number of threads. |
| Azkaban Idle Threads | Number of idle threads. |

Azkaban Flow Stats

| Metric | Metric Description |
| --- | --- |
| Azkaban Flow Elapsed Time (Sec) | Total time taken by this flow to execute, in seconds. |
| Azkaban Flow Status | Status of the flow: 1 = KILLED, 2 = FAILED, 3 = RUNNING, 4 = SUCCEEDED. |
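
Because the flow status is reported as a number, dashboards and alert rules usually need to map it back to a name. The following minimal sketch uses only the four codes listed above; the `FlowStatus` enum, the helper names, and the choice to treat unlisted values as `UNKNOWN` are assumptions made for illustration.

```python
from enum import IntEnum


class FlowStatus(IntEnum):
    """Status codes as listed in the Flow Stats table."""
    KILLED = 1
    FAILED = 2
    RUNNING = 3
    SUCCEEDED = 4


def describe_flow_status(value):
    """Return the status name for a metric value, or "UNKNOWN" if unlisted."""
    try:
        return FlowStatus(int(value)).name
    except (ValueError, TypeError):
        return "UNKNOWN"


def needs_attention(value):
    """Assumed policy: flag killed, failed, and unrecognised statuses."""
    return describe_flow_status(value) in ("KILLED", "FAILED", "UNKNOWN")


for code in (1, 2, 3, 4, 9):
    print(code, describe_flow_status(code), needs_attention(code))
```

The same mapping applies to the sub flow status metric, which uses the same four codes.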

Azkaban Sub Flow Stats

| Metric | Metric Description |
| --- | --- |
| Azkaban Sub Flow Elapsed Time (Sec) | Total time taken by this sub flow to execute, in seconds. |
| Azkaban Sub Flow Status | Status of the sub flow: 1 = KILLED, 2 = FAILED, 3 = RUNNING, 4 = SUCCEEDED. |
| Azkaban Sub Flow Map Output Records | Number of map output records in this sub flow. |

Azkaban Flow Runner Manager Stats

| Metric | Metric Description |
| --- | --- |
| Azkaban Queued Flows | Number of queued flows. |
| Azkaban Maximum Queued Flows | Maximum number of queued flows. |
| Azkaban Running Flows | Number of running flows. |
| Azkaban Maximum Running Flows | Maximum number of running flows. |
| Azkaban Total Executed Flows/Sec | Total number of executed flows per second. |

Azkaban Executor Job Callback Stats

| Metric | Metric Description |
| --- | --- |
| Azkaban Job Callbacks/Sec | Number of job callbacks per second. |
| Azkaban Successful Job Callbacks/Sec | Number of successful job callbacks per second. |
| Azkaban Failed Job Callbacks/Sec | Number of failed job callbacks per second. |
| Azkaban Active Job Callbacks | Number of active job callbacks. |

Azkaban Web Server Executor Manager Stats

| Metric | Metric Description |
| --- | --- |
| Azkaban Last Successful Executor Info Refresh (Sec) | Timestamp of the last successful executor info refresh, in seconds. |
| Azkaban Thread Active | Status of the executor thread: 1 = True, 0 = False. |
| Azkaban Running Flows | Number of running flows. |
| Azkaban Last Thread Check Time (Sec) | Time of the last thread check, in seconds. |
| Azkaban Queue Processor Active | Status of the queue processor: 1 = True, 0 = False. |
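
The two status flags and the refresh timestamp above lend themselves to a simple health check. The sketch below is one possible way to combine them. It assumes the refresh timestamp is a Unix epoch value in seconds, which the source does not confirm. The function name, parameter names, and the 300-second threshold are hypothetical.

```python
import time


def executor_manager_problems(last_refresh_epoch_sec, thread_active,
                              queue_processor_active,
                              max_refresh_age_sec=300, now=None):
    """Return a list of human-readable problems; an empty list means healthy."""
    # Assumption: the timestamp is Unix epoch seconds, comparable to time.time().
    now = time.time() if now is None else now
    problems = []
    if thread_active != 1:
        problems.append("executor manager thread is not active")
    if queue_processor_active != 1:
        problems.append("queue processor is not active")
    age_sec = now - last_refresh_epoch_sec
    if age_sec > max_refresh_age_sec:
        problems.append(f"executor info last refreshed {age_sec:.0f}s ago")
    return problems


# Hypothetical readings: both flags set, last refresh 100 seconds before "now".
print(executor_manager_problems(1000, 1, 1, now=1100))
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it can be omitted so the current time is used.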

Azkaban Web Trigger Manager Stats

| Metric | Metric Description |
| --- | --- |
| Azkaban Last Runner Thread Check Time (Sec) | Time of the last runner thread check, in seconds. |
| Azkaban Runner Thread Active | Status of the runner thread: 1 = True, 0 = False. |
| Azkaban Scanner Idle Time (Sec) | Idle time of the scanner, in seconds. |
| Azkaban Triggers | Number of triggers. |

Azkaban Coordinator Stats

| Metric | Metric Description |
| --- | --- |
| 75thPercentile Service Response Time (ms) | 75th percentile of the service response time, in milliseconds. |
| 95thPercentile Service Response Time (ms) | 95th percentile of the service response time, in milliseconds. |
| 98thPercentile Service Response Time (ms) | 98th percentile of the service response time, in milliseconds. |
| 99thPercentile Service Response Time (ms) | 99th percentile of the service response time, in milliseconds. |
| 999thPercentile Service Response Time (ms) | 99.9th percentile of the service response time, in milliseconds. |
| Mean Service Response Time (ms) | Mean service response time, in milliseconds. |
| 50thPercentile Service Response Time (ms) | 50th percentile (median) of the service response time, in milliseconds. |
| Minimum Response Time (ms) | Minimum server response time, in milliseconds. |
| Maximum Response Time (ms) | Maximum server response time, in milliseconds. |
| Request/Sec | Number of requests per second. |
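
To make the percentile metrics concrete, the sketch below computes them from a list of response times with the nearest-rank method. This is only an illustration of what a percentile means. The source does not state which estimator produces the reported values, and the function name and sample data are hypothetical.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Return the smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank definition: 1-based rank = ceil(pct / 100 * n).
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


# Hypothetical service response times in milliseconds.
response_ms = [12, 15, 20, 22, 25, 30, 41, 58, 90, 250]
print(nearest_rank_percentile(response_ms, 50))
print(nearest_rank_percentile(response_ms, 75))
print(sum(response_ms) / len(response_ms))
```

With skewed data like this, a single slow request pulls the mean well above the median, which is why the high percentiles are usually the more useful alerting signal.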