Earlier, I posted some information on GKrellM and top/htop/screen, which are useful graphical or text-based system monitoring tools.

For a Linux cluster, one may not want a ton of GKrellM widgets cluttering the desktop (I’ve seen it before, and it’s not a pretty sight). A more elegant tool is Ganglia. Basically, a daemon runs on each node of the cluster, and a web-based front end lets you monitor CPU and memory usage and network load across all of them. Although clusters usually have load-balancing software in place, Ganglia makes it easy to see which nodes are busy, which ones may be down, and so on.
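If you would rather poke at the data directly than go through the web page, here is a rough Python sketch of the idea. It assumes the per-node daemon (gmond) is set up to hand out an XML dump of the cluster state to anyone who connects over TCP, on port 8649 unless your gmond.conf says otherwise. It also assumes the dump contains HOST elements with NAME and TN (seconds since the host was last heard from) attributes, and METRIC children with NAME and VAL. The function names and the `stale_after` threshold are my own inventions for illustration, not part of Ganglia, so check them against your own setup.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump that gmond sends when a client connects.

    The host and port defaults are assumptions; adjust them to match gmond.conf.
    Returns raw bytes so the XML parser can honor the declared encoding.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # gmond closes the connection after sending the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize_hosts(xml_data, metric="load_one", stale_after=60):
    """Map each host name to its value for `metric` and a 'stale' flag.

    `stale_after` is a hypothetical threshold in seconds: a host whose TN
    exceeds it is flagged as possibly down.
    """
    root = ET.fromstring(xml_data)
    summary = {}
    for host in root.iter("HOST"):
        seconds_since_heard = int(host.get("TN", "0"))
        value = None
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                value = float(m.get("VAL"))
                break
        summary[host.get("NAME")] = {
            "value": value,
            "stale": seconds_since_heard > stale_after,
        }
    return summary


if __name__ == "__main__":
    # Point this at any node running gmond; "localhost" is just a placeholder.
    for name, info in sorted(summarize_hosts(fetch_gmond_xml()).items()):
        status = "possibly down" if info["stale"] else "ok"
        print(f"{name}: load_one={info['value']} ({status})")
```

Parsing is kept separate from the network read so that the summary logic can be tried on a saved XML dump without a live cluster.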

The Ganglia site has some demos, copied here and here, if you want to check them out.