Webalizer PHP

Current version: 0.28 - released 2009-11-09

Page last updated: 2010-01-07

Author: Peter Mescalchin

About

Webalizer PHP is a PHP5 application designed to serve password-protected web stats generated by the popular Webalizer.

In addition, the application reformats the default HTML of Webalizer stats through a rather complex series of regular expressions to produce valid, lighter XHTML output, which is then styled via CSS. The resulting pages are typically 30-50% smaller than the originals, largely because deprecated and presentational tags such as <font> and <strong> are replaced with class/id attributes.
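As a rough illustration of this kind of rewrite, here is a minimal Python sketch. It is not the application's actual PHP code or its real expressions. The class names, the size-to-class mapping and the function name font_to_span are all assumptions made for the example.

```python
import re

# Assumed mapping from legacy <font size> values to CSS class names.
# These names are illustrative only and do not come from Webalizer PHP.
FONT_SIZE_CLASSES = {"-1": "small", "-2": "smaller"}

# Matches a simple, non-nested <font size="..."> ... </font> wrapper.
FONT_RE = re.compile(
    r'<font\s+size="(?P<size>[^"]+)"\s*>(?P<body>.*?)</font>',
    re.IGNORECASE | re.DOTALL,
)


def font_to_span(html: str) -> str:
    """Replace <font size="..."> wrappers with class-based <span> elements."""

    def repl(match: re.Match) -> str:
        # Fall back to a generic class when the size is not in the mapping.
        css_class = FONT_SIZE_CLASSES.get(match.group("size"), "text")
        return f'<span class="{css_class}">{match.group("body")}</span>'

    return FONT_RE.sub(repl, html)


legacy = '<TD><FONT SIZE="-1">Total Hits</FONT></TD>'
print(font_to_span(legacy))
```

A single pattern like this does not handle nested <font> tags or other attribute orders, which is presumably part of why the real conversion needs a longer series of expressions.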

It also works nicely with the excellent Geolizer HTTP stats patch, which adds better quality location/country based stats using the GeoIP API library, adds flag icons for countries, and displays file sizes in scaled units (bytes/KB/MB/GB/TB) instead of the default kilobytes.
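The scaled file size display can be pictured with a short Python sketch. This is only an assumption about how such formatting might work, not Geolizer's actual code: the 1024 step, the two decimal places and the function name format_size are all chosen for the example.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def format_size(num_bytes: float) -> str:
    """Scale a byte count to the largest unit that keeps the value under 1024."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at TB as the largest unit.
        if size < 1024 or unit == UNITS[-1]:
            if unit == "bytes":
                return f"{int(size)} bytes"
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} {UNITS[-1]}"  # not reached; keeps the return type explicit


for sample in (512, 2048, 5 * 1024 ** 3):
    print(format_size(sample))
```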

Webalizer PHP was originally written to get around Plesk's default method of password protecting Webalizer stats: HTTP authentication with a username/password identical to the site's FTP login, which is not always ideal. The ability to repurpose the default Webalizer HTML was a nice bonus.

If you have any questions, comments, or possible improvements please contact me.

Updates

Screenshots

Requirements

Installation

Downloads

Links

webalizer.conf - search engines & search engine spiders

For those who are interested, here is my current list of search engines and search engine spider exclusions used within webalizer.conf. The entries are taken from my live web server logs. Any suggestions or improvements are welcome.

Last updated: 2010-01-07

SearchEngine	aolsearch.	q=
SearchEngine	ask.com	q=
SearchEngine	bing.	q=
SearchEngine	bingj.	q=
SearchEngine	facebook.	q=
SearchEngine	fastbrowsersearch.com	q=
SearchEngine	google.	q=
SearchEngine	search.alot.	q=
SearchEngine	search.aol.	q=
SearchEngine	search.yahoo.	p=
IgnoreAgent	AdsBot-Google (+http://www.google.com/adsbot.html)
IgnoreAgent	Alltop/1.1
IgnoreAgent	Apache/*
IgnoreAgent	Apple-PubSub/*
IgnoreAgent	AppleSyndication/*
IgnoreAgent	Baiduspider+*
IgnoreAgent	blogged_crawl/0.3
IgnoreAgent	Bloglines/3.1*
IgnoreAgent	facebookexternalhit/*
IgnoreAgent	Feedfetcher-Google;*
IgnoreAgent	GingerCrawler/1.0*
IgnoreAgent	Googlebot-Image/1.0
IgnoreAgent	ia_archiver*
IgnoreAgent	Jakarta Commons-HttpClient/3.1
IgnoreAgent	Java/*
IgnoreAgent	libwww-perl/*
IgnoreAgent	Linguee Bot*
IgnoreAgent	Mozilla/5.0 (compatible; aiHitBot-DS/1.0; +http://www.aihit.com/)
IgnoreAgent	Mozilla/5.0 (compatible; Ask Jeeves/Teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)
IgnoreAgent	Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)
IgnoreAgent	Mozilla/5.0 (compatible; Dow Jones Searchbot)
IgnoreAgent	Mozilla/5.0 (compatible; Exabot/3.0*
IgnoreAgent	Mozilla/5.0 (compatible; Google Desktop/*
IgnoreAgent	Mozilla/5.0 (compatible; Googlebot/2.1;*
IgnoreAgent	Mozilla/5.0 (compatible; LinksManager.com_bot +http://linksmanager.com/linkchecker.html)
IgnoreAgent	Mozilla/5.0 (compatible; MJ12bot/v1.3.1; http://www.majestic12.co.uk/bot.php?+)
IgnoreAgent	Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)
IgnoreAgent	Mozilla/5.0 (compatible; Yahoo! Slurp*
IgnoreAgent	Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.youdao.com/help/webmaster/spider/; )
IgnoreAgent	Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)
IgnoreAgent	msnbot/*
IgnoreAgent	NetworkedBlogs
IgnoreAgent	OctoBot/1.2.0 (OctoBot/1.2.0; +http://www.octofinder.com/octobot.html)
IgnoreAgent	OutlookConnector/*
IgnoreAgent	PHP/*
IgnoreAgent	PycURL/*
IgnoreAgent	Python-urllib/*
IgnoreAgent	Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
IgnoreAgent	Sosospider+(+http://help.soso.com/webspider.htm)
IgnoreAgent	Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)
IgnoreAgent	UniversalFeedParser/4.1 +http://feedparser.org/
IgnoreAgent	Windows-RSS-Platform/*
IgnoreAgent	www.webwombat.com.au
IgnoreAgent	Yahoo! MyBlogLog API Client (curl) 5.2.5
IgnoreAgent	Yandex/*
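
To show how entries like those above are interpreted, here is a small Python sketch of the matching rules as I understand them from the Webalizer documentation: a pattern may carry a leading or trailing * wildcard, and without one it matches anywhere in the string. The function names and the shortened ENGINES list are hypothetical, and matching the SearchEngine name against the referrer's host only is a simplification.

```python
from urllib.parse import parse_qs, urlsplit

# A few (name fragment, query variable) pairs copied from the list above.
ENGINES = [("google.", "q="), ("bing.", "q="), ("search.yahoo.", "p=")]


def agent_matches(pattern: str, agent: str) -> bool:
    """Webalizer-style match: leading or trailing '*', otherwise substring."""
    if pattern.endswith("*"):
        return agent.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return agent.endswith(pattern[1:])
    return pattern in agent


def search_terms(referrer: str):
    """Return the search phrase from a referrer URL, or None if no engine matches."""
    parts = urlsplit(referrer)
    query = parse_qs(parts.query)
    for fragment, key in ENGINES:
        if fragment in parts.netloc:
            values = query.get(key.rstrip("="))
            if values:
                return values[0]
    return None


print(agent_matches("msnbot/*", "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"))
print(search_terms("http://www.google.com/search?q=webalizer+php&hl=en"))
```

Note that a trailing wildcard anchors the pattern at the start of the user agent string, so an entry such as Java/* only matches agents that begin with Java/.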