cdent-rhat Minimizing Nova Polling from Ceilometer

20150323162141 cdent  

Draft

Current State Of Affairs

Ceilometer gathers a large collection of different measurements from Nova. Some of these are provided by notifications, but the rest are fetched by the ceilometer-compute-agent, which polls the local hypervisor at an interval defined in the pipeline.yaml file. The chart linked above indicates "Notification" or "Pollster" for each meter name.

As of just past the kilo-3 milestone, the distinguishing characteristic between what is a notification and what is not is effectively this: notifications are made up of facts that could be described as the configuration of an instance (e.g. disk and memory size, number of configured virtual CPUs) and that are stored in the Nova data store. These facts are updated by Nova at various times. When they are updated, a notification is produced in the form described in info_from_instance in nova/notifications.py.

The compute-agent runs as a service on a compute node. It queries the Nova API to get a list of instances on the current host and then uses a VirtInspector to get state from the instance (e.g. memory.usage). Other than providing the instance ids, Nova is not involved. These measures are then put on the messaging bus to be "collected".
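
The polling cycle described above can be sketched roughly as follows. This is an illustration only, not Ceilometer's actual code: poll_once, Sample, and the three callables (list_instances, inspect, publish) are hypothetical names standing in for the Nova API query, the VirtInspector, and the messaging bus respectively.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Sample:
    """One measurement taken from one instance (hypothetical shape)."""
    meter: str
    instance_id: str
    value: float


def poll_once(
    list_instances: Callable[[], Iterable[str]],
    inspect: Callable[[str], Dict[str, float]],
    publish: Callable[[Sample], None],
) -> List[Sample]:
    """Run a single polling pass over the instances on this host.

    list_instances stands in for the Nova API call that returns instance ids,
    inspect stands in for the hypervisor inspector, and publish stands in for
    putting a sample on the messaging bus.
    """
    samples = []
    for instance_id in list_instances():
        # Nova only supplied the id; the values come from the hypervisor.
        for meter, value in inspect(instance_id).items():
            sample = Sample(meter=meter, instance_id=instance_id, value=value)
            publish(sample)
            samples.append(sample)
    return samples
```

In the real agent a pass like this would be repeated at the interval configured in pipeline.yaml, which is where the timing uncertainty discussed later comes from.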

Note that while memory.usage goes up and down over time, there is no change in the facts that Nova holds about the form the instance is supposed to take and uses to generate notifications.

What's Wrong With This?

  • Ceilometer is operating in two domains: the production and consumption of metrics. Ideally it would operate only in the latter. By operating in the former, it is creating an awkward coupling and duplication of knowledge between Ceilometer and Nova.
  • Polling is weird.
  • Testing is made more complex, as there is significant ambiguity about when (or even if) a measure will be taken and transmitted to its eventual store. While it should be possible to adjust the polling system to respond to an immediate request for a poll, depending on the number of resources involved it is still hard to predict when the data will be available. A signal that tells something to send a notification, on the other hand, is easier to confirm, which narrows the ambiguity in testing situations.

A Way To Fix It

One way to improve this situation is to create a service within Nova which:

  • Retrieves some subset of the measures that Ceilometer polls and produces notifications from them (the mechanics are left to others more familiar with Nova's structure).
  • Produces those notifications both on a cycle (defined by configuration) and on demand (via an authenticated request to an API endpoint, a signal to a process, or something else we dream up).
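
As a minimal sketch of the "on a cycle and on demand" behaviour, assuming nothing about how Nova would actually structure such a service: MeasureNotifier, collect, notify, and interval below are all hypothetical names. collect stands in for whatever gathers the measures and notify for whatever emits a notification.

```python
import threading


class MeasureNotifier:
    """Emit measures as notifications periodically and on request (sketch)."""

    def __init__(self, collect, notify, interval):
        self._collect = collect      # hypothetical: returns an iterable of payloads
        self._notify = notify        # hypothetical: sends one payload as a notification
        self._interval = interval    # seconds between cycles, from configuration
        self._stop = threading.Event()
        self._lock = threading.Lock()

    def emit_now(self):
        """On-demand path: collect and notify immediately, return what was sent."""
        with self._lock:  # keep a cycle and a request from interleaving
            payloads = list(self._collect())
            for payload in payloads:
                self._notify(payload)
            return payloads

    def run(self):
        """Cyclic path: emit, then sleep for the interval until stopped."""
        while not self._stop.is_set():
            self.emit_now()
            # wait() returns early if stop() is called during the sleep
            self._stop.wait(self._interval)

    def stop(self):
        self._stop.set()
```

A test can call emit_now() and assert on the captured notifications straight away, rather than waiting an unknown time for the next cycle. The API endpoint or process signal mentioned above would simply be another caller of the same method.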

What Do We Get

In the short term this can lead to more effective (more reliable, less ambiguous) testing.

In the longer term this can lead to a more diverse ecosystem where multiple systems can more easily consume useful metrics from Nova.