Yes, Ambari Metrics is a horrid, terrible information presentation platform that needs a major overhaul. I am not sure who it was designed for, but it was not for data scientists interested in cluster performance. I really appreciate the work Hortonworks has invested in making Hadoop more approachable as a platform, but I was really disappointed by the gutting of the Ganglia/Nagios capabilities when the replacement, Ambari Metrics, simply cannot provide the diagnostics and layered access to data that Ganglia did.
Here’s the monitoring console I’m unhappy with:
At first glance, beautiful, right? It’s elegant. It looks pretty. It looks fancy. But ask deeper questions.
- Is this a good use of pixels?
- What is the information density of this design?
- Are the graphs optimally displaying the data?
- Can I dig down to get more details?
In my opinion, no, lousy, no, not really. I’ll dig in on these examples, and explain how a Tufte believer would approach altering the design.
Use of pixels / information density
Information density in Ambari Metrics is not just bad, it’s epically bad. Terrible. Horrific. A complete waste of space. Here’s why:
We’re displaying one number: uptime. The dashboard graphic uses approximately 145 × 160 pixels to accomplish this. That’s 23,200 pixels for those of you following along. Within the HBase management tab you can get a different presentation of the same statistic. This time it consumes 175 × 20 pixels, or 3,500 pixels, a footprint that is 85% smaller.
Here’s another example for a different Hadoop component, HDFS. In this case we’re using another 23,000+ pixels to show one number: a percentage. This is yet another immense waste of space. Moving over to the component-specific HDFS dashboard, you get a much more efficient view. The representation of disk usage uses 335 × 20 pixels, or 6,700 pixels. At roughly 71% smaller, it’s another massive improvement. It even shows more detail without requiring any action; the donut plot forces you to roll over the image to obtain the actual metrics, so I’d give the textual representation the nod for a more informative design.
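If you want to check the pixel arithmetic yourself, here is a quick sketch in Python. The widget dimensions are the approximate ones quoted above; the function names are mine, and I am assuming the HDFS donut occupies the same 145 × 160 footprint as the uptime widget.

```python
def footprint(width, height):
    """Pixel area of a rectangular widget."""
    return width * height


def percent_smaller(compact, original):
    """How much smaller `compact` is than `original`, as a whole percent."""
    return round(100 * (1 - compact / original))


# Approximate dimensions quoted in the post (assumed equal for both dashboard widgets).
dashboard_widget = footprint(145, 160)  # big dashboard tile
hbase_uptime_row = footprint(175, 20)   # text row on the HBase tab
hdfs_disk_row = footprint(335, 20)      # text row on the HDFS tab

uptime_saving = percent_smaller(hbase_uptime_row, dashboard_widget)
disk_saving = percent_smaller(hdfs_disk_row, dashboard_widget)

print(f"dashboard tile: {dashboard_widget} px")
print(f"HBase uptime row: {hbase_uptime_row} px ({uptime_saving}% smaller)")
print(f"HDFS disk row: {hdfs_disk_row} px ({disk_saving}% smaller)")
```

The exact percentages shift a little with the measured widget size, but the order of magnitude does not: a text row does the same job in a fraction of the space.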
When you compound this with other metrics, you can really collapse the wasted space on this dashboard to quickly relay relevant data, going so far as to push the summary data for multiple Hadoop components onto the main landing page rather than burying it one more click away. I much prefer the layout of the HBase console to that of the Hadoop monitoring console:
This is succinct, and relays the same numbers in a much more efficient footprint. You could use color to convey issues in the table as well, and provide access to time series data using sparklines or small multiples of scatter plots and line / time plots.
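To make the sparkline idea concrete, here is a minimal sketch that renders a time series as one character per sample, so it fits inside a table cell. This is not anything Ambari provides; the function name and the sample CPU values are made up for illustration.

```python
# Eight block characters of increasing height, lowest to highest.
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a sequence of numbers as a one-line text sparkline."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series: draw every point at the lowest bar.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[int((v - lo) / span * top)] for v in values)


# Hypothetical CPU utilisation samples (percent), one per minute.
cpu_percent = [12, 15, 40, 85, 60, 30, 18]
print("cpu", sparkline(cpu_percent), f"{cpu_percent[-1]}%")
```

One row like this per metric gives you the trend and the current value in roughly the footprint of the text rows above, which is the whole point of small multiples.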
Poor data accessibility
This dashboard is the epitome of designers getting in the way of data. If I want more information on something like CPU usage, I can take a quick look at the thumbnail plot:
I have no quarrel with this approach. Micro plots used in small multiples are quite effective. They are attempting to relay a higher density of data in the same 23,000 pixel footprint, and it works. What I have a major problem with is data accessibility. In this example, if I click on the thumbnail I get an expanded view:
If I want the actual numbers, I have to be really precise in how I roll over each element of the plot to reveal a value:
Most of the time this will be OK. But what if I want to start looking at what is happening across multiple subsystems from multiple hadoop components at the same point in time? Well, I can create a custom view with a bunch of thumbnail plots, but that is about it.
So why am I ranting? Well, Ambari decommissioned Ganglia and decided to use Ambari Metrics as the replacement. When you make this decision, you had better have your data accessibility at least on par with the tool you are replacing. Take a look at what Ganglia does for data accessibility:
Are you a data nut like me? If so, click on CSV, and we’ll dump the raw data for you. Can you do it better than us? Have at it, here’s the data with one click of a button. Now, we can drop down into the RRDs, pull the raw data, and manipulate it on our own, but why are you forcing me to take that action when the prior UI placed access to this data right there, up front?
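Here is the kind of thing one-click CSV makes trivial: lining up metrics from different subsystems at the same point in time. This is only a sketch using the Python standard library. The two CSV snippets, the `Timestamp` column name, and the metric names are made up for illustration; a real export will have its own headers and timestamp format.

```python
import csv
import io

# Made-up sample exports; real column names and timestamp formats will differ.
CPU_CSV = """Timestamp,cpu_user
2016-03-01T10:00:00,12.5
2016-03-01T10:01:00,48.0
2016-03-01T10:02:00,91.2
"""

DISK_CSV = """Timestamp,bytes_written
2016-03-01T10:00:00,1024
2016-03-01T10:01:00,20480
2016-03-01T10:02:00,409600
"""


def load_series(csv_text):
    """Return (metric_name, {timestamp: value}) for a two-column CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    value_column = [name for name in reader.fieldnames if name != "Timestamp"][0]
    return value_column, {row["Timestamp"]: float(row[value_column]) for row in reader}


def align(*csv_texts):
    """Join several exports on the timestamps they all share."""
    series = [load_series(text) for text in csv_texts]
    shared = set.intersection(*(set(values) for _, values in series))
    return [
        {"Timestamp": ts, **{name: values[ts] for name, values in series}}
        for ts in sorted(shared)
    ]


rows = align(CPU_CSV, DISK_CSV)
for row in rows:
    print(row)
```

With the raw numbers in hand, this takes a dozen lines. Without them, you are left hovering over plots and copying values by hand.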
The short of it
So there are a few comments on why Ambari Metrics is so bad. I’ve sounded out a lot of technical folks on calls about information design and data access within Ambari, and I’ve yet to find one technical person who says “hey, it’s better than X”. Ambari Metrics needs to do a better job of developing for user personas interested in workload analytics and cluster monitoring.
Edit :: HDP 2.4
HDP 2.4 was recently released. I’ll check out any improvements to Ambari Metrics and update this post.