Introduction

BlueMind's Tick package is used to monitor large amounts of data (metrics). Some monitored data is raw; other data is pre-processed to make it more relevant and easier to interpret and analyze.

Every metric has its own tree structure which can contain:

  • datalocation: server name
  • host: host name or IP 
  • meterType: data type
    • gauge: instant measurement
    • counter: incremental counter
    • distsum: counter-amount data pair 
      e.g.:
      • bm-lmtpd.emailSize = (number of emails, total size of emails)
      • bm-lmtpd.emailRecipients = (number of emails, number of recipients)
    • timer: same as distsum but with the amount always expressed in nanoseconds
  • status: depending on the type of data, this status may be ok/failed (e.g. request successful/failed), success/failure (e.g. authentication successful/failed), etc.
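
Because a distsum (and therefore a timer) pairs a count with a cumulative amount, the average per event is the amount divided by the count. The Python sketch below illustrates this reading. The `DistSum` class, its field names, and the sample values are hypothetical, chosen for illustration only; they are not part of BlueMind's API.

```python
from dataclasses import dataclass

NANOS_PER_MILLI = 1_000_000


@dataclass(frozen=True)
class DistSum:
    """Hypothetical view of a distsum sample: (count, cumulative amount)."""
    count: int           # e.g. number of emails
    total_amount: float  # e.g. total size of those emails

    def mean(self) -> float:
        """Average amount per counted event (0.0 when nothing was counted)."""
        if self.count == 0:
            return 0.0
        return self.total_amount / self.count


def timer_mean_ms(timer: DistSum) -> float:
    """A timer is a distsum whose amount is in nanoseconds; convert the mean to ms."""
    return timer.mean() / NANOS_PER_MILLI


# Made-up sample: 4 emails totalling 2,000,000 bytes.
email_size = DistSum(count=4, total_amount=2_000_000)
print(email_size.mean())
```

The same division applies to bm-lmtpd.emailRecipients, where it would give the average number of recipients per email.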

Common data

As a rule, metrics are grouped by component.

JVM

There are JVM metrics for every JVM component:

  • bm-<component>.hprof: the number of hprof files on the server, which can be used as an indication of a crash
  • bm-<component>.jvm.*: all the JVM information for this component (current or maximum memory usage, etc.)

Heartbeat

Each component that interacts with the core exposes the following metrics, which are used to make sure that the component is receiving the core's health data:

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| heartbeat.receiver.age | Gauge | age of the last heartbeat received | The time between 2 heartbeats. The core is supposed to send its health information every 4 seconds. An age above 4 seconds, and especially above 8 seconds, may indicate an issue. |
| heartbeat.receiver.failures | Counter | number of failed heartbeats | |
| heartbeat.receiver.latency | Gauge | heartbeat delivery time | Time between the heartbeat being sent by the core and it being received by the component. |
| heartbeat.receiver.latencyMax | Gauge | maximum heartbeat delivery time | |
| heartbeat.receiver.received | Counter | number of successful heartbeats | |
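
The 4-second and 8-second figures given for heartbeat.receiver.age can be turned into a simple check. The sketch below is one possible reading, not BlueMind's own alerting logic: it assumes the gauge value has already been converted to seconds (the unit is not stated here), and the function name and the "ok"/"warning"/"critical" labels are hypothetical.

```python
EXPECTED_PERIOD_S = 4.0  # the core is supposed to send a heartbeat every 4 seconds
ALERT_THRESHOLD_S = 8.0  # ages beyond 8 seconds are treated here as more serious


def classify_heartbeat_age(age_s: float) -> str:
    """Classify a heartbeat.receiver.age value, assumed to be in seconds."""
    if age_s > ALERT_THRESHOLD_S:
        return "critical"
    if age_s > EXPECTED_PERIOD_S:
        return "warning"
    return "ok"


# Made-up sample values for illustration.
for age in (2.5, 6.0, 12.0):
    print(age, classify_heartbeat_age(age))
```

Keeping two levels separates a single late heartbeat from a component that has missed at least one full period.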

Hazelcast

Servers that are members of the Hazelcast cluster expose the following metric:

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| cluster.members | Gauge | | The value of this metric must be '3' |

Metrics

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| agent.metricsGathered | Counter | number of metrics collected by the agent | This metric is mostly useful for checking whether the agent is working properly: no data means that the agent isn't collecting anything and is therefore no longer working. |
| agent.vmware* | | agent host server data | The agent is enabled only if VMware Tools are detected on the BlueMind host servers. In this case, the "vSphere Guest SDK" metrics are extracted and recorded over time. These metrics are used to diagnose issues with BlueMind's virtualization on VMware. |
| bluemind.cluster | | | |
| bluemind.cluster.partitions | | | |

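bm-core: Main BlueMind Engine

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| callsCount | Counter | number of calls received by the core | |
| dirVersion | Gauge | | |
| directory.cluster.events | Counter | | |
| handlingDuration | Timer | request handling time | |
| heartbeat.broadcast | Counter | | |
| heartbeat.maxPeriod | Gauge | | |
| heartbeat.period | Gauge | | |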

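bm-eas: Mobile Connection Service

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| executionTime | Timer | | |
| responseSize | DistSum | | |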

bm-hps: Authentication Service

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| authCount | Counter | number of connection requests to BlueMind | success: successful connection; failed: failed connection (wrong username and/or password) |
| ftlTemplates.requests | Counter | number of page requests | |
| requestsCount | Counter | number of hps requests | kind: maintenance (maintenance page displays); kind: protected (protected page displays). Used, among other things, to check the number of times the maintenance page has been displayed. Too many "maintenance" requests may indicate an issue. |
| staticFile.requests | Counter | number of static page requests | e.g.: login page |
| upstreamRequestSize | DistSum | | |
| upstreamRequestTime | Timer | request handling duration | |
| upstreamRequestsCount | Counter | number of requests | |

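bm-ips: IMAP Operations Tracking

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| activeConnections | Gauge | number of active ips connections | |

bm-lmtpd: Email Delivery Service

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| activeConnections | Gauge | number of active connections | |
| connectionCount | Counter | | |
| deliveries | Counter | | |
| emailRecipients | DistSum | number of recipients per email | |
| emailSize | DistSum | size of messages | |
| sessionDuration | Timer | | |
| traffic.transportLatency | Timer | | |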

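bm-locator: Service Localization

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| executionTime | Timer | request execution time | |
| requestsCount | Counter | number of requests received by the service | origin: component that makes the request; statusCode: HTTP return code |

bm-milter: Analysis and Modification of Emails at SMTP Level

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| connectionsCount | Counter | | |
| sessionDuration | Timer | | |
| traffic.class | Counter | | |
| traffic.size | Counter | | |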

bm-webserver: Web Application Server

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| appCache.requestTime | Timer | | |
| appCache.requests | Counter | | |
| ftlTemplates.requests | Counter | number of display requests generated by the webserver | |
| staticFile.requests | Counter | number of static page display requests | |

bm-xmpp: Instant Messaging Service

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| packetsCount | Counter | number of packets sent by the service | used to assess messaging service usage and whether it is working properly or has stopped |

bm-ysnp: Data Validation Service

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| authCount | Counter | number of requests handled | ok statuses: confirmed requests (e.g. authentications accepted for a username/password entered by a user); failed statuses: rejected validations (e.g. failed authentications due to a wrong password) |

Other

| Metric Name | Type | Content | Additional Information |
|---|---|---|---|
| cpu | | processor usage data | used to monitor usage and processor distribution |
| disk | | disk space usage | used to monitor disk space (used/free/total, etc.) by disk, partition, path, etc. |
| diskio | | number of bytes written/read in real time | used to see whether the disk is working properly or excessively |
| elasticsearch* | | ElasticSearch data | for more information and details about ES metrics, please refer to the dedicated documentation: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/elasticsearch |
| imapd.process | | | |
| influxdb* | | metrics storage database data | |
| kapacitor* | | tool-specific data | |
| kernel | | | |
| kernel_vmstat | | | |
| mem | | | |
| memcached | | | |
| net | | | |
| netstat | | | |
| nginx | | | |
| phpfpm | | | |
| postfix_queue | | | |
| postgresql | | BlueMind database information | |
| processes | | | |
| swap | | | |
| syslog | | | |
| system | | | |
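
The bm-hps requestsCount counter listed above is split by kind, and a high number of "maintenance" requests may indicate an issue. Since counters are incremental, one way to read them is to compare two successive samples and look at the share of the increase that went to the maintenance page. The Python sketch below illustrates this. The function names, the sample values, and the handling of a counter that decreases (treated as a restart from zero) are assumptions made for illustration, not documented BlueMind behavior.

```python
from typing import Mapping


def counter_increase(previous: int, current: int) -> int:
    """Increase of a cumulative counter between two samples.

    Assumption: a value lower than the previous one means the counter
    restarted from zero, so the current value is taken as the increase.
    """
    if current < previous:
        return current
    return current - previous


def maintenance_share(previous: Mapping[str, int], current: Mapping[str, int]) -> float:
    """Fraction of new hps requests whose kind is "maintenance"."""
    increases = {
        kind: counter_increase(previous.get(kind, 0), value)
        for kind, value in current.items()
    }
    total = sum(increases.values())
    if total == 0:
        return 0.0
    return increases.get("maintenance", 0) / total


# Made-up samples of requestsCount per kind, taken at two points in time.
before = {"maintenance": 10, "protected": 100}
after = {"maintenance": 15, "protected": 195}
print(maintenance_share(before, after))
```

What counts as "too many" is left to the operator; this document does not give a threshold.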

