#monitoringsucks gave us more tools, but didn’t make using them any easier.
Exponential Complexity (almost)
Every monitoring system has its own import and export hooks, which means reconfiguring every monitoring system in your infrastructure every time you add a new one. There is a word that describes this arrangement where I come from – a word that rhymes with “full-spit”.
I don’t think the systems themselves are to blame. The best each can do is provide the most elegant and simple solution for data I/O that makes sense in its own context, and most of these interfaces are, in fact, elegant and simple (mostly). No one objects to this or that interface specifically, but the burden of gluing three or four of them together is the sort of thing that sends all of us raging into the blogosphere.
Maybe another monitoring system isn’t what we need
What if every monitoring system imported and exported a common data format, so that instead of that ugly picture above, they looked like this:
Impossible you say?
The Riemann Event Struct does a pretty great job of describing a system-agnostic blob of monitoring data. Whatever your monitoring system, I’d wager the data it collects can be imported into this struct. For most systems this struct is overkill.
In fact, when I think about any other monitoring system in the context of this struct, not only does it fit, but a procedure for performing the translation springs to mind. With Nagios, for example, I’d create a new notification command that used Nagios macros to write this struct out in JSON.
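For instance, a minimal sketch of what such an export helper could look like, assuming the standard Nagios macros ($HOSTNAME$, $SERVICEDESC$, $SERVICESTATE$, $SERVICEOUTPUT$) get passed in as arguments by the notification command; the helper name and the field mapping are my guesses, not libhearsay’s code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// eventJSON builds a Riemann-style event from the values Nagios hands
// a notification command via its macros. A command definition might
// invoke it as (hypothetical path):
//   command_line  /usr/local/bin/nagios2json "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATE$" "$SERVICEOUTPUT$"
func eventJSON(host, service, state, output string, ts int64) string {
	b, _ := json.Marshal(map[string]interface{}{
		"host":        host,
		"service":     service,
		"state":       state, // Nagios states: OK, WARNING, CRITICAL, UNKNOWN
		"description": output,
		"time":        ts,
	})
	return string(b)
}

func main() {
	fmt.Println(eventJSON("web01", "load", "OK", "OK - load average: 0.42", time.Now().Unix()))
}
```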
Import is a little more difficult, but still easy enough to imagine. For Nagios, we’d take a JSON-encoded blob off the wire, parse it into a passive check result, and inject it into the command file. There are smarter ways, but that gets us there.
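Something like this sketch, say. The PROCESS_SERVICE_CHECK_RESULT external command format is real Nagios; the state-to-return-code mapping and the function shape are my assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stateCode maps Riemann-style state strings to Nagios return codes.
func stateCode(state string) int {
	switch strings.ToLower(state) {
	case "ok":
		return 0
	case "warning":
		return 1
	case "critical":
		return 2
	default:
		return 3 // UNKNOWN
	}
}

// passiveCheck renders a JSON event as a Nagios external command line,
// ready to append to the command file (e.g. /usr/local/nagios/var/rw/nagios.cmd).
func passiveCheck(blob []byte) (string, error) {
	var e struct {
		Host        string `json:"host"`
		Service     string `json:"service"`
		State       string `json:"state"`
		Description string `json:"description"`
		Time        int64  `json:"time"`
	}
	if err := json.Unmarshal(blob, &e); err != nil {
		return "", err
	}
	return fmt.Sprintf("[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s",
		e.Time, e.Host, e.Service, stateCode(e.State), e.Description), nil
}

func main() {
	line, _ := passiveCheck([]byte(`{"host":"web01","service":"load","state":"ok","description":"load ok","time":1360000000}`))
	fmt.Println(line)
	// [1360000000] PROCESS_SERVICE_CHECK_RESULT;web01;load;0;load ok
}
```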
The important point is that if every monitoring system provided native support for this struct, we wouldn’t need to think about import/export at all. If we were careful about naming our services and so on, data exchange would “just work”, and all we’d need to worry about is getting blobs on the wire, queuing them, and routing them around – which really is the problem we WANT to worry about, because network architecture, scale, and environmental specifics are the things that actually differ for us users.
I think that, before we grab our torches and pitchforks and mob the vendor floor, we should prove out the model. I want to see it working in practice, and build a few broken things to make sure we get it right. So to that end I’ve written libhearsay.
Hearsay implements a common data model for monitoring systems, and includes two tools that take care of most of the messaging details. A “scrap” of hearsay is a Riemann Event Struct, plus an optional UID field (to assist with de-duplication and commuting).
The Spewer utility takes a JSON-encoded scrap of hearsay on STDIN or a TCP socket. It then validates the scrap (adding default values if necessary), and puts it on the wire using either a ZeroMQ PUSH or PULL socket.
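The validation step might look something like this; the particular defaults (current time, local hostname, a 60-second TTL) are illustrative guesses on my part, not necessarily what the spewer actually does:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Scrap is a Riemann-style event plus an optional UID field.
type Scrap struct {
	Host    string
	Service string
	State   string
	Time    int64
	TTL     float64
	UID     string
}

// validate rejects scraps missing a service name and fills in
// defaults for other missing fields (defaults are illustrative).
func validate(s *Scrap) error {
	if s.Service == "" {
		return fmt.Errorf("scrap missing required service field")
	}
	if s.Time == 0 {
		s.Time = time.Now().Unix()
	}
	if s.Host == "" {
		if h, err := os.Hostname(); err == nil {
			s.Host = h
		} else {
			s.Host = "localhost"
		}
	}
	if s.TTL == 0 {
		s.TTL = 60
	}
	return nil
}

func main() {
	s := Scrap{Service: "load", State: "ok"}
	if err := validate(&s); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Printf("validated scrap: host=%s time=%d ttl=%.0f\n", s.Host, s.Time, s.TTL)
}
```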
The generic Listener utility takes a JSON-encoded scrap of hearsay off the wire, validates it, and writes it to STDOUT. It takes a “filter” string which you can use to filter out messages you don’t want, and has a ‘Nagios’ mode where it outputs passive check results instead of JSON.
I’m thinking about, and writing, special-purpose listeners for specific monitoring tools, which inject the scraps directly into various monitoring systems in the way those systems expect to receive input. The critically important part of making this actually work, I think, is an admin’s ability to have the listeners “just work”. We should be able to point a listener at the spewer cloud and magically start seeing updates in the monitoring system’s UI.
Here’s a short list of systems I want to make special-purpose listeners for (your help would be appreciated; if you contribute one of these I will buy you a beer at the next conference we both go to):
Hearsay is written in Go and depends on the gozmq package. It’s super buzzword compliant.
Using just the spewer, the generic listener, and some shell scripts, we should be able to get some systems talking to each other, and even experiment with some messaging patterns to see where stuff breaks and what I haven’t thought about.
Assuming this isn’t a long road to a dead-end, here’s the plan:
- Step 1. Get a lib implemented and a few simple tools (mostly almost done kind of)
- Step 2. Hack up special purpose listeners (and maybe spewers) to lower the barrier to entry
- Step 3. Push for Native adoption EVERYWHERE.
- Step 4. Narnia
take it easy