Netprobe to Gateway Communication and Efficiency

This article describes how a Netprobe sends sample data to a Gateway, and how it does so efficiently. The inter-process protocol used by traditional Geneos components is called EMF2.

Please note: the description below is purely informational and does not in any way attempt to define the data format.

The primary container of sampled data sent from the Netprobe to the Gateway is the Dataview. Each Dataview is made up of rows of column data plus individual headlines. Each row of data is identified by a unique name known as, unsurprisingly, the row name. The row name is always shown as the first column in the Active Console.
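
To make the structure concrete, here is a minimal sketch of a Dataview modelled in Python. The field names and values are invented for illustration only; this does not reflect the actual EMF2 wire format.

```python
# Illustrative model of a Dataview: headlines plus rows keyed by row name.
# All names and values here are assumptions for illustration, not part of
# the real EMF2 data format.

dataview = {
    "headlines": {"samplingStatus": "OK", "lastSampleTime": "12:00:00"},
    "rows": {
        # row name -> {column name -> cell value}
        "process1": {"cpuPercent": "1.2", "memoryKB": "1024"},
        "process2": {"cpuPercent": "0.4", "memoryKB": "2048"},
    },
}

# The row name doubles as the first column shown in the Active Console.
for row_name, cells in dataview["rows"].items():
    print(row_name, cells)
```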

As an administrator, when you refer to data in the Gateway you use XPaths to identify sets of data items. These XPaths are not used in the Netprobe, which only knows about sampler data in the form of Dataviews, as described above.
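
For illustration, a Gateway XPath identifying a single cell might look like the following (the probe, entity, sampler and column names here are hypothetical):

```
/geneos/gateway/directory/probe[(@name="probe1")]/managedEntity[(@name="myEntity")]/sampler[(@name="cpu")]/dataview[(@name="cpu")]/rows/row[(@name="process1")]/cell[(@column="cpuPercent")]
```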

When a sampler updates a Dataview, the Netprobe compares the values before and after the sample and sends only the changed items to the Gateway. This means that if only a single cell is updated in a large Dataview then only that one cell is sent. To identify the specific item, the protocol also includes the row name as described above. Headlines are treated in much the same way.
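
The idea can be sketched as a cell-level diff between two samples keyed by row name. This is an illustration of the concept only, not the Netprobe's actual implementation:

```python
# Hypothetical sketch of diff-based updates: compare the previous and current
# sample and emit only changed or added cells, plus removed rows, all keyed
# by row name. Illustrative only; not the Netprobe's real code.

def diff_rows(before, after):
    changes = []
    for row_name, cells in after.items():
        old = before.get(row_name, {})
        for column, value in cells.items():
            if old.get(column) != value:
                changes.append(("update", row_name, column, value))
    for row_name in before:
        if row_name not in after:
            changes.append(("remove-row", row_name))
    return changes

before = {"process1": {"cpuPercent": "1.2", "memoryKB": "1024"}}
after = {"process1": {"cpuPercent": "1.5", "memoryKB": "1024"}}
print(diff_rows(before, after))  # only the changed cell is reported
```

Note that because the diff is keyed on the row name, changing the row name defeats this mechanism entirely, which is why several of the practices discussed later matter.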

This update message may be compressed (using the Snappy algorithm) if it is over a certain size. The size threshold is fixed and was chosen to balance the overhead of compression against the cost of sending the data as-is.
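
The threshold logic can be sketched as follows. The threshold value here is invented, and zlib is used as a stand-in for Snappy only because it ships with Python's standard library:

```python
import zlib

# Hypothetical threshold for illustration; the real EMF2 value is fixed
# but not documented here.
COMPRESSION_THRESHOLD = 512  # bytes

def maybe_compress(payload: bytes) -> tuple[bool, bytes]:
    """Compress the update message only when it exceeds the threshold,
    so small updates avoid the CPU overhead of compression."""
    if len(payload) > COMPRESSION_THRESHOLD:
        return True, zlib.compress(payload)
    return False, payload

print(maybe_compress(b"x" * 100)[0])     # False: sent as-is
print(maybe_compress(b"x" * 10_000)[0])  # True: compressed
```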

Some plugins may not update data in the most efficient way possible, and this can cause performance issues in the Gateway. Very few native plugins have this problem, but from testing we believe that the TCP-LINKS plugin is one example.

Outside of native plugins, Geneos administrators may implement their own data sources using plugins such as TOOLKIT and SQL-TOOLKIT, or via the XML-RPC or REST APIs. While the first two are native plugins, control over the data is in the hands of the administrator, and at times they may not be used in the most efficient manner possible.

There is an existing Performance Tuning document that describes some of the issues mentioned in this article and also offers more guidance on measuring performance in the Gateway.

Some types of behaviour - but by no means all - that could be considered poor practice include:

  • Changing the row name often and/or unnecessarily

    If the row name is changed for what may be the same underlying data then this has the same effect as deleting the old row and adding a new one. This can happen when the row name is not carefully chosen to represent the data being sampled.

    Row names should be representative of the sampler data and should be constructed to be long lived for the same set of data where possible.

  • Using fixed row names with unsorted data

    Conversely, if you use numerically increasing row names as a row ID but produce the data without regard to the ordering of the rest of the row data, a large number of updates will be sent to the Gateway.

    With the exception of testing - for example, setting “Show Row Line” in the SQL-TOOLKIT - you should not use an arbitrary row name unless there is no better choice.

  • Removing the old data and adding new

    Some samplers may first delete old data and then repaint the Dataview in full. This places additional load on the Gateway, which has to re-evaluate all potential XPaths that may apply to the Dataview concerned. In many cases the number of rows is so small as to have no visible impact.

    This is how, for example, TCP-LINKS appears to behave.

  • Changing column names and positions

    When sending data in via custom samplers such as those mentioned above, it is also important to use a fixed set of columns and to keep the order of the columns fixed.

    An example of what not to do would be a shell script that uses conditional logic to run one of two external data-source programs which return similar data but in a different column order. This will more than likely result in “undefined behaviour” between samples; even if it works for the viewer of the data, the underlying Dataview will be replaced and updated in full as described above.
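
As a rough sketch of the "stable row names, fixed columns" advice for a Toolkit-style script, consider the following. The column names and data are invented for illustration, and the CSV-on-stdout shape is a simplification of how such scripts typically feed a sampler:

```python
# Sketch of a Toolkit-style sampler script emitting CSV with a fixed column
# order and stable row names. Names and values are invented for illustration;
# the first column serves as the row name and should identify the same
# underlying data from sample to sample.

processes = [
    {"name": "web-server", "cpu": 1.2, "mem_kb": 1024},
    {"name": "database", "cpu": 4.7, "mem_kb": 8192},
]

# Fixed header row: the same columns, in the same order, on every sample.
lines = ["processName,cpuPercent,memoryKB"]
# Stable row names: keyed on the process name, not an arbitrary counter,
# so unchanged rows diff cleanly between samples.
for p in sorted(processes, key=lambda p: p["name"]):
    lines.append(f'{p["name"]},{p["cpu"]},{p["mem_kb"]}')

print("\n".join(lines))
```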

When using the older XML-RPC API, the various updating functions do the right thing when used correctly. Here “correctly” means avoiding the poor practices mentioned above. I am not aware of the exact REST API behaviour, but as it is loosely modelled on the older XML-RPC API the same principles should apply.

In addition to the performance impact on the Gateway, there may also be effects on other Geneos components, such as the visualisation tools - Active Console, Web Dashboard - and the Gateway Hub, where the published data may be rejected until the schema is updated and the data is then published to a new metric series.