Investigation of Internet Object Cache Performance Bottlenecks
by Bertold Kolics
Internet object caching is one of the biggest driving forces in Internet-related development, because it saves bandwidth and reduces access latency for popular objects. Since 1995, about 50 cache servers have been installed in Hungarnet, the Hungarian academic and higher-education network. These servers form a loose hierarchy of at most three levels, with a single server at the top. By the nature of the current hierarchy, the topmost server bears the largest load, so the investigation focuses primarily on the performance of that top-level server. The primary goal of the project conducted at SZTAKI is to develop tools for identifying performance bottlenecks through extensive monitoring and log analysis on the investigated host. As the number of requests grows day by day, predicting future performance bottlenecks is also of interest.
Different levels of Internet object caching are illustrated in Fig. 1. In the case of Hungarnet, a loose cache hierarchy has been deployed, with only one server at the top. The circled TLC marks the investigated host, the top-level cache. This server acts as a parent cache, i.e., its child caches request objects from it whenever a requested URL cannot be found in their local caches. (In this context, an object is an entity identified by a unique URL.) Furthermore, the top-level server also exchanges information with other, neighbouring caches, i.e., it retrieves an object from such a neighbour only if the requested URL is already present there. In this kind of hierarchy, therefore, the top-level server bears the largest load.
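The lookup rules above can be sketched as follows. This is a simplified, hypothetical model of the parent/neighbour (in Squid terminology, sibling) behaviour described in the text, not Squid's actual implementation; all names are illustrative.

```python
# Simplified model of the hierarchy rules: a neighbour (sibling) cache
# may only serve objects it already holds, while the parent fetches
# missing objects from the origin on behalf of its children.

def resolve(url, local_cache, neighbours, parent):
    """Return (source, object) for a requested URL (illustrative only)."""
    if url in local_cache:
        return ("local hit", local_cache[url])
    # A neighbour can only answer if the object is already cached there.
    for name, cache in neighbours.items():
        if url in cache:
            return (f"neighbour hit ({name})", cache[url])
    # Otherwise the request goes to the parent, which fetches the object
    # if needed -- this is why the top-level parent bears the largest load.
    obj = parent.setdefault(url, f"fetched:{url}")
    return ("parent", obj)
```

For example, a URL held by a neighbour is served from there, while an unknown URL ends up as a fetch at the parent.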
In most cases, Squid is used as the proxy cache software, and consequently the host operating systems are UNIX clones. Performance problems can therefore have network-, Squid-, hardware- or UNIX-specific sources.
The monitoring of the caching host is performed in several ways. First, an SNMP client regularly retrieves status data from the caching software, and the results are displayed graphically with MRTG. Variables such as the number of requests, the service times of hits (objects served from the cache) and misses (objects served from external sources), and traffic volumes are tracked. Trends and unusual events can easily be identified with this type of monitoring. Second, a daily summary of cache activity is created with Calamaris from Squid's request log file. This log file includes, among other things, the timestamp of the request, the service time, the client's IP address, the requested URL and the size of the object. From this summary the cache manager can identify what the cache is primarily used for (e.g., which sites and domains are popular), how the cache behaves towards different clients, what the gains of the cache hierarchy are, and how much the cache saved on a given day.
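As a minimal sketch of the kind of per-day summary such a log analysis yields, the snippet below computes a hit ratio and the bytes saved from a few log lines. It assumes Squid's native access.log layout (timestamp, elapsed time, client address, result code, size, method, URL, ...); the exact fields and sample lines here are illustrative, and a real analyser like Calamaris reports far more detail.

```python
# Sketch of a Calamaris-style daily summary from Squid access-log lines.
# Assumed field order: timestamp, elapsed-ms, client, code/status, bytes, ...

def summarise(lines):
    total = hits = hit_bytes = total_bytes = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip malformed lines
        code, size = fields[3], int(fields[4])
        total += 1
        total_bytes += size
        # Result codes like TCP_HIT or TCP_MEM_HIT mark objects served
        # from the cache; their bytes count as bandwidth savings.
        if code.split("/")[0].endswith("HIT"):
            hits += 1
            hit_bytes += size
    return {
        "requests": total,
        "hit_ratio": hits / total if total else 0.0,
        "byte_savings": hit_bytes,
        "total_bytes": total_bytes,
    }
```

Run over a full day's log, the resulting ratio and byte counts answer the "how much did the cache save today" question directly.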
Third, the underlying operating system is monitored. The top-level server runs a recent version of Solaris. OS-specific tools such as iostat and vmstat can be used to monitor system performance, but on this server we use the SE Performance Toolkit. The toolkit uses the standard libraries shipped with Solaris to read kernel values. It evaluates these values every 30 seconds and logs unusual or unwanted performance events (e.g., too much load on a network interface card or on a specific disk). A 30-second sampling interval is long enough to smooth out bursts in the system load, yet short enough that valuable information is not lost by averaging over long periods.
By evaluating the information gained from this three-level analysis, the cache manager can identify possible bottlenecks in the cache infrastructure more accurately and with less effort.
The project itself is Hungarnet-specific, but the results and experiences of the investigations can be useful for other cache operators as well. A cache management system with a performance measurement and evaluation module is to be built in the near future.
Currently, two institutions from Hungary are involved in the project: SZTAKI, Network Department and Systems & Control Laboratory, and the Technical University of Budapest, Department of Control Engineering and Information Technology. The only international partner of the project is the TF-CACHE project of TERENA. Co-operation with other ERCIM members in this field would be welcome.

Please contact:
Bertold Kolics - SZTAKI
Tel: +36 1 349 7532
E-mail: Bertold.Kolics@sztaki.hu