A Complete Guide to System Monitoring Software

Traducciones al Español
Estamos traduciendo nuestros guías y tutoriales al Español. Es posible que usted esté viendo una traducción generada automáticamente. Estamos trabajando con traductores profesionales para verificar las traducciones de nuestro sitio web. Este proyecto es un trabajo en curso.
Create a Linode account to try this guide with a $ credit.
This credit will be applied to any valid services used during your first  days.

“You can’t manage what you can’t measure” is a business school axiom that applies equally well to the IT world. After all, a lack of visibility into key systems can lead to trouble.

That’s where system monitoring software comes in. It keeps tabs on your IT infrastructure and alerts you when trouble occurs. A system monitoring tool can help you maximize uptime, better understand your systems, and even plan for the future.

This guide explains what system monitoring software is, why you should use it, and what features to look for. It also showcases both the open source and commercial options available.

What is Monitoring Software?

System monitoring software tests the status and measures the performance of your IT infrastructure. At a high level, all monitoring falls into one of two buckets:

  • Systems: A system can be a server, virtual machine (VM), container instance, network device, or a collection of these in a cloud deployment.

  • Service: A service is, broadly speaking, anything that runs on a system. This might be an application on a single system, such as an Apache or Nginx web server. It could also be distributed software, such as a NoSQL database running across thousands of VMs or containers. Services also might refer to underlying tasks such as processes, APIs, or network connections.

A cloud platform can host both systems or services that you monitor. For example, you might keep tabs on the number of VMs or container instances currently running in a cloud deployment. While some cloud providers offer proprietary monitoring tools, you might prefer third-party monitoring alternatives, particularly if you’re looking to avoid vendor lock-in.

Why Should You Use System Monitoring Software?

Although system monitoring tools take many forms, most deliver the same core benefits, including:

  • Expanded Visibility: Monitoring tools show you the status of every system and service within your organization in one place. The ability to spot issues quickly is critical. Many tools have slick interfaces that help make sense of even very large organizations by organizing monitored components into logical groups and hierarchies.

  • Reachability: Knowing whether all your systems and services are running is a bedrock IT responsibility, and a core monitoring function. The top-level dashboards of many products include a simple up/down display that shows status at a glance.

  • Alerting: Many system monitoring tools notify you via email, text, or log entry when trouble occurs. Some tools also allow you to define groups of recipients for different types of events, as every alert does not necessarily need to go to every team member.

  • Autodiscovery: As the number of systems and services in your organization grows, so does the configuration and management burden, but system monitoring tools can help. Many have autodiscovery functions that poll an entire network and automatically populate their databases. Autodiscovery is much faster for initial setup than manual configuration.

    There are two caveats with autodiscovery: It can lead to information overload, and might even degrade performance on some devices. For example, if you configure monitoring software to autodiscover a network switch, every port on that switch returns data from dozens of counters, most of which aren’t useful in day-to-day IT management. Moreover, the modest CPUs and memory found in many low-end switches might not be able to keep pace with frequent polling from a monitoring system.

    To be sure, autodiscovery is a great time-saver for initial configuration. Once you’re up and running, it’s best practice to reduce noise levels by deleting non-critical monitored elements. You can also use different intervals for different metrics. For example, you might need to run application performance monitoring all the time, but only check for switch port errors once daily or even weekly. As with all system monitoring, the key point is to ensure you’re seeing only what you need to see.

  • Improved Reliability: The ability to spot a small problem today might very well prevent a catastrophic failure tomorrow. This is especially true for complex, multi-tier applications.

    For instance, your website might appear to the outside world as a single entity. However, in reality, it might comprise multiple tiers of load balancers, web servers, and back-end database servers. A full or failing drive on any one of these systems could cause cascading failures, taking your entire site offline.

    Monitoring software automatically alerts you to small problems. For example, sending an alert whenever disk usage or disk errors exceed some predefined threshold. Monitoring software can identify small problems before they become big ones.

  • More Accurate Capacity Planning: How many VMs or containers will you need to spin up next year? How much storage will you need? How about RAM or network bandwidth?

    Budgeting is an essential task for any IT team. System monitoring tools help, not only by showing current resource usage, but also how resource usages change over time. This trending data is invaluable in deciding what resources you need for the future.

System Monitoring Tools: Key Features

Here are some common criteria to use in evaluating system monitoring tools:

  • Comprehensive or Point Solution? Many system monitoring applications have extensive, highly customizable feature sets that let you monitor anything, anywhere. Often, that’s exactly what you want, but in some situations it may be overkill.

    Other monitoring products follow the Linux/Unix philosophy of “small tools doing one thing well”. What if you just need to know if a service is reachable, or only want to track network latency over time?

    That’s where more focused point solutions come in. WhatsUp Gold began as a simple tool for monitoring host availability. In its main screen, only four states exist: up, down, maintenance, and unknown. This can be useful when monitoring large numbers of servers, VMs, or container instances where all you really need is a quick display of system availability.

    Similarly, PingPlotter’s main task is to continually send ping packets and graph latency over time. It’s a very simple form of network monitoring, but also highly effective in identifying bandwidth bottlenecks or intermittent outages.

    In fairness, both WhatsUp Gold and PingPlotter have added other functions over time. For example, PingPlotter has a version specifically for tracking cloud performance. However, if all you need is one monitoring function, a point solution may be the right tool.

    Many other monitoring tools sit at the opposite end of the spectrum. These comprehensive solutions can keep tabs on virtually any service, application, system, network, or cloud. They can also present the data with sophisticated modeling tools and seamlessly integrate with third-party analysis tools. Overall, this comprehensive approach is more common with system monitoring software.

  • Agent or Agentless? Most monitoring tools don’t require the installation of agent software on the monitored system or service. Instead, they use standards-based health checks such as ping packets or Simple Network Management Protocol (SNMP) queries for monitoring.

    The agentless approach is easier to install and manage, and may be sufficient for many tasks, but it’s not without limits. For example, an agentless approach puts most of the processing burden on the central monitoring system. Moreover, certain services (e.g. some internal Windows functions) might not be externally visible. Even worse, a network outage might cause monitoring software to miss key events on the monitored system or service.

    You may want to opt for on-board agents, or at least a monitoring tool that offers the option of using them. Agents help offload the central monitoring system and continue data collection even if the monitoring system isn’t available.

  • Data Security: System monitoring involves highly sensitive data (e.g. configurations, log entries, and network topologies) that you definitely don’t want an intruder to see. Look for features in monitoring tools to encrypt data both in-flight and at-rest. Most system monitoring tools support SNMP version 3, which provides authentication and encryption of data in-transit. Many application response time monitoring systems use HTTPS to send data through an encrypted tunnel. Some enterprise-scale system tools discussed here, such as SolarWinds SAM, can encrypt log entries. It’s also best practice to limit access to/from the IP subnet and VLAN on which your monitoring system resides.

  • Deployment Model: Most system monitoring tools began as standalone on-premise applications, though some vendors now have cloud-based services in addition to, or instead of, self-hosting. The standalone offerings include open source tools like Cacti, Icinga, Nagios, and Zabbix. All of these can also be installed on VMs in cloud deployments. There are also vendors that offer monitoring as a cloud-based service, including Site24x7, Atera, and NinjaOne.

  • Language Support: Although most system monitoring tools use English by default, you may want displays or alerts in other languages, especially if your organization spans multiple sites around the world. Monitoring tools with multi-language support include Atera, Site24x7, SolarWinds SAM, and Zabbix. You may want to include language support as a criterion when assessing monitoring software.

The 11 Best System Monitoring Applications

Here are six open source system monitoring tools, two enterprise-grade commercial platforms, and three cloud-based services. Which works best for you depends on your monitoring own needs. In fact, you may ultimately decide on a combination of two or more tools for complete coverage.

Nagios/Nagios XI

Nagios is among the most venerable and widely used system monitoring applications. It’s a highly extensible monitoring and alerting system that can email or text you as soon as a system or service goes offline.

Nagios has two versions. Nagios Core is the open source version, while Nagios XI offers a proprietary interface and commercial support (which is also free for up to seven nodes). Akamai Connected Cloud offers instructions for monitoring Debian and Ubuntu systems using Nagios.

Both Nagios versions have flexible options for monitoring and alerting. Both versions also support a vast collection of “plugins” that extend monitoring capabilities and the types of monitored components.

Nagios works with or without agents. Installed on monitored systems, the Nagios Cross-Platform Agent (NCPA) monitors system load, process state, disk usage, and many other tasks locally. Nagios has NCPA agents for monitoring many aspects of Linux, Windows, and macOS system behavior.

  • Description: Comprehensive monitoring/alerting software
  • Licensing: Open Source (with paid support options)
  • Platforms: Linux/Unix (Nagios Core), Windows, Linux, VMware (Nagios XI)

Icinga

Icinga is an open source fork of Nagios with a Web 2.0 interface and extensive back-end database support, including MySQL/MariaDB, Oracle, and PostgreSQL. The project has two components: Icinga Core performs monitoring and alerting tasks, while Icinga Web aggregates and presents data via a web browser. Icinga also has a RESTful API to integrate extensions without the need to modify Icinga Core.

Icinga supports most Nagios plugins and offers the same logical grouping of systems and services. Like Nagios, Icinga can alert you when any monitored component goes offline. Icinga Web’s top-level dashboard displays the number of hosts and services that are up or down. Additional displays show system, service, or network outages. Nagios also has reporting tools to show trends, alert histograms, and alert history, all of which can help identify problem areas.

Icinga is an ideal choice for cost-sensitive small- and medium-sized organizations. If you’re willing to do the initial configuration and customization, you get a graphical overview of system and service availability. Although it’s free and open source software, the Icinga team also has a paid subscription and support model.

  • Description: Comprehensive monitoring/alerting software
  • Licensing: Open Source (with paid support options)
  • Platforms: Linux, Raspberry Pi (Raspbian), Windows

Prometheus

Prometheus is an open source event monitoring and altering tool. It makes heavy use of time-series databases, which can store large volumes of time-stamped data with fast insertion and retrieval. Prometheus provides a powerful query language called PromQL and easy visualization through integration with the Grafana open source analysis and graphing tool.

Created by devops engineers at SoundCloud, the key design goals for Prometheus were large-scale data collection, efficient storage, and easy instrumentation of services.

Its high scalability makes Prometheus well-suited for aggregating large data sets. For example, one simple query can determine the rate at which HTTP requests came in across all web servers in the past hour, even for a very large website. Many other monitoring tools require you to query each individual webserver and then aggregate the data. Even then, you still need to time-slice the data.

You can extend Prometheus with built-in client libraries for Go, Java, Python, Ruby, and Rust. The project also includes unofficial third-party libraries for many other languages.

  • Description: High-speed monitoring/alerting, graphing, and query tool for large datasets
  • Licensing: Open Source (with third-party paid support options)
  • Platforms: Linux, macOS (Darwin), Windows

Cacti

Cacti is another open source monitoring tool best known for graphing time-based data series. Although its most common use cases are for graphing network bandwidth and server resource usage, Cacti can plot performance over time for any system or service that supports SNMP. That makes it a simple, extensible tool for understanding system, service, or network trends.

Cacti is popular among ISPs, MSPs, and cloud providers. It can provide each customer with a dedicated login, showing performance over time for their VMs and/or containers.

Cacti works as a front-end for RRDtool, the open source data-logging tool. It can run agentless or with a small agent called a remote data collector that continues to run even if the core Cacti database is unavailable.

One thing Cacti does not do is alert you when a system or service goes offline. Its strength is in its ability to graph performance data over time.

  • Description: Monitoring/graphing tool for time-based data series
  • Licensing: Open Source
  • Platforms: Linux, Unix, Windows

Observium

Although it began as a network monitoring tool, the open source Observium project can monitor any system or service that uses SNMP.

Observium setup and management is straightforward. It can autodiscover devices and services, track state changes, and build time-based performance graphs. The software includes monitors for Apache, BIND, Memcached, MySQL, and other popular open source applications.

Observium has three pricing tiers, all able to track an unlimited number of systems and services. A free Community version auto-discovers devices using SNMP queries and graphs systems or services using RRDtool. A Professional version targets small- or medium-sized businesses with commercial support, and adds alerting and a RESTful API. An Enterprise version offers a higher level of support and the ability to scale across multiple servers.

  • Description: Comprehensive monitoring platform with alerting available in commercial versions
  • Licensing: Open Source (with commercial supported versions available)
  • Platforms: Linux

Zabbix

Zabbix is another well-established open source systems monitoring tool. Zabbix began with server and network checks, but is now capable of monitoring applications, cloud deployments, and virtually any service.

Zabbix comes with predefined templates that make quick work of basic tasks like server health checks and network interface monitoring. It’s also extensible, allowing you to write custom templates and modules. Zabbix supports agent and agentless operation, and users have reported excellent scalability with both approaches.

Some users report that the Zabbix design doesn’t lend itself to automated tasks, such as an out-of-the-box ability to generate and email reports. While you can do this, it requires scripting and integration with other tools.

Although Zabbix is free and open source software with a GPLv2 license, it’s always been a closed development product, with all coding done by Zabbix LLC. The company also offers support and monitoring-as-a-service options.

  • Description: Comprehensive monitoring/alerting software
  • Licensing: Open Source (with paid support options)
  • Platforms: Linux

SolarWinds Server and Application Monitor (SAM)

SolarWinds Server and Application Monitor (SAM) is a well-known commercial monitoring platform. The capabilities of this enterprise-scale application extend well beyond simple monitoring and alerting, and as a commercial product, it’s fully supported.

SAM includes more than 1,200 ready-made templates for system and service monitoring, plus another 1,000 community-developed templates. These templates cover virtually all aspects of IT infrastructure, including operating systems, application performance monitoring, proxy servers, databases, DNS and DHCP servers, mail servers, and web services. Each of those categories includes many different systems or applications. For example, SAM offers templates to monitor 14 popular databases. You can also integrate SAM with many third-party analysis tools.

SAM goes beyond system monitoring with configuration management and log aggregation tools picked up through its Kiwi acquisition. For example, you can run nightly downloads of web server, router, or switch configurations, and, if necessary, roll back unwanted changes. For log management, SAM can sort, re-direct, compress, and email incoming syslog data.

SAM is server-centric, and monitors any type of Windows, Linux, or Unix server, VMware virtual machine, or Docker container. If you want to monitor network devices and bandwidth usage, SolarWinds offers a separate product called Network Performance Monitor.

  • Description: Comprehensive platform for monitoring and alerting with configuration and log management
  • Licensing: Commercial
  • Platforms: Windows Server

Paessler PRTG Network Monitor

PRTG Network Monitor is another enterprise-grade commercial monitoring and alerting application noted for its ease of setup and flexible pricing options.

The tool includes hundreds of ready-to-use “sensors”. Similar to templates in SolarWinds SAM or Zabbix, these sensors automatically monitor popular systems, services, network devices, and cloud platforms. For data visualization, PTRG Network Monitor includes predefined topology maps, but also lets you configure your own dashboard.

PRTG Network Monitor works exclusively in agentless mode, and puts a heavy emphasis on being a turnkey solution. However, the software integrates with third-party management tools like Splunk for log analysis, NetBrain for issue remediation, and PagerTree or xMatters for alert management.

Unusual among monitoring vendors, Paessler lists pricing on its website. The company calculates pricing tiers based on the number of services monitored. It ranges from about 500 for small- and medium-sized businesses, up to about 10,000 for very large environments.

  • Description: Comprehensive platform for monitoring and alerting
  • Licensing: Commercial
  • Platforms: Windows Server (with access via web browser or Android or iOS apps)

Site24x7

Site24x7 takes a software-as-a-service (SaaS) approach to system and service monitoring, with data collection and reporting hosted on the company’s cloud servers.

You can use “tools” in Site24x7, similar to “modules” in SolarWinds SAM, to monitor systems and services. Site24x7 tools monitor commercial applications such as Microsoft Exchange, SQL Server, and Active Directory, but also open source offerings such as Memcached, MySQL, Nginx, and many others.

Other tools focus on application performance monitoring (APM) for software running on Java, .NET, Node.js, PHP, or Ruby. For complex applications, such as payment systems built using multiple tiers, the APM tools use a distributed tracing mechanism to identify the exact point where faults and response-time bottlenecks occur.

Site24x7 also has tools for Linux VM, container, and cloud monitoring. You can use purpose-built tools to track performance of individual servers, Hadoop clusters, and Docker instances. Site24x7 tools also monitor Linux service metrics such as process and thread counts. For security, you can use Site24x7 to track changes to files, directories, and logs. All these tools alert you when a system or service goes offline or crosses a defined threshold.

  • Description: Comprehensive cloud-based monitoring and alerting platform
  • Licensing: Commercial
  • Platforms: Cloud-based SaaS model (viewed from a web browser or from iOS or Android apps)

Atera & NinjaOne

The final two offerings, Atera and NinjaOne (formerly NinjaRMM), are commercial products in the remote monitoring and management (RMM) space. The “remote” part of RMM means these are SaaS offerings, like Site24x7. The “management” part covers proactive tasks (e.g. patch management) and automation of common tasks (e.g. disk checks and antivirus signature updating). In other words, these products not only monitor systems and services, they also reconfigure them as needed.

You can use either to track performance of servers, network infrastructure, and applications. Either platform can monitor physical machines, VMs, containers, cloud deployments, and any type of network infrastructure. Both also send alerts when something fails.

Because they’re cloud-based, RMM offerings don’t require you to set up and manage one or more servers just for monitoring. Whether they’re superior to self- and cloud-hosted monitoring software that you install and manage really depends on your needs.

RMM products are strongest in situations when it comes to endpoint monitoring and management. RMMs may be an attractive option if you manage a large number of desktop systems and/or VMs, and need to ensure they’re all fully patched and have current antivirus signatures. On the other hand, if system and service monitoring is your key requirement, you may find more features and/or reporting capabilities in some of the other purpose-built monitoring tools described here.

  • Description: Comprehensive, cloud-based remote monitoring and management (RMM) platform
  • Licensing: Commercial
  • Platforms: Cloud-based SaaS model

Conclusion

There are many options when it comes to system monitoring software. In fact, your best choice may be a mix of two or more tools. A useful starting point is to envision how you’d like to consume monitored data, then work backwards from there. This approach is far more efficient than configuring every single system and service in your organization, only to discover a given monitoring interface really doesn’t serve your needs after all.

Fortunately, with VM and container hosting on Akamai Connected Cloud, you can spin up a prototype monitoring setup to test out several options. Contact Akamai today if you’d like to learn more about options for hosting system monitoring software.

This page was originally published on


Your Feedback Is Important

Let us know if this guide was helpful to you.


Join the conversation.
Read other comments or post your own below. Comments must be respectful, constructive, and relevant to the topic of the guide. Do not post external links or advertisements. Before posting, consider if your comment would be better addressed by contacting our Support team or asking on our Community Site.
The Disqus commenting system for Linode Docs requires the acceptance of Functional Cookies, which allow us to analyze site usage so we can measure and improve performance. To view and create comments for this article, please update your Cookie Preferences on this website and refresh this web page. Please note: You must have JavaScript enabled in your browser.