Tracking the flow in the mesh of services

I’m currently worried on how our embrace of SOA and REST can take a big impact on our support team. In a world of closed entities where the boundary of the underlaying operation is well defined, a simple logging system is enough to track down the source of the problem. The failed operation is well confined inside an application scope, but on a service oriented architecture the message could have been propagated to other services. Tracking down the source of the problem definitively doesn’t seem to be trivial.

A central logging service is a good start. All message outcomes can be tracked into this service, which is a central shared point to look for answers. But what if your architecture is a large SOA implementation with a mesh of services? If a call to a service generates smaller calls to secondary services, the logging system is still insufficient. One could start reading log by log  in the hope to understand what was the outcome of each service call in order to get the whole picture, but the always-busy support team is not going to be happy with this.

I could think in a couple of simple solutions: one could be to propagate the id of the root message, the one that triggered everything, so that a query with this id to the logging system would return a list of all related message. Our we can scale this to not only track the root message but to propagate in each message (as embedded info into each message) the id of the previous message in the chain. A list of chained message can easily be looked up in our logging system as a list of log messages (of a list of services messages chain). This information should also be embedded into the event drive part of the system, thus all events should also track this chain of ancestor messages. With this approach we can easily visualize which message triggered a resource operation in another service, consequently allowing the technical support team to look and better comprehend into the flow of that certain execution path.

Our we can move to the next level: some sort of exception handling (in the SOA way)…


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s