The most powerful open source web analytics tools? [closed]

I need some advice on choosing open source web analytics tools, of both the page tagging and the log file analysis type. I know some of them: Piwik, Open Web Analytics, AWStats and more. Which are the best ones of each type (page tagging and log file analysis)?


Solution 1:

Among log file analyzers, these are the most widely used:

  • AWStats
  • Webalizer
  • Analog
  • W3Perl

Analog and Webalizer are written in C and are the fastest (10000-20000 lines per second).

As mentioned earlier by @MadHatter, Analog was developed by an ex-Cambridge statistician, which makes it a really precise and technical tool, but it has not been developed since 2005.

Webalizer is no longer developed either, but it is easier to use than Analog.

AWStats and W3Perl are written in Perl and are the most active projects, but they are far slower than Analog and Webalizer (3000-4500 lines per second). They differ from each other in the data they produce and the way that data is rendered.

AWStats displays statistics in a really attractive manner, but produces fewer statistics than W3Perl.
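
To make the lines-per-second figures above concrete, here is a minimal sketch of the kind of work a log file analyzer does for each line. It is not how any of the four tools is implemented. It assumes the Apache Common Log Format, and the names `LOG_LINE`, `analyze` and `lines_per_second` are made up for illustration.

```python
import re
import time
from collections import Counter

# Assumed input: Common Log Format, for example
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def analyze(lines):
    """Count hits per path and per status code; skip lines that do not parse."""
    hits_by_path = Counter()
    hits_by_status = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        hits_by_path[match["path"]] += 1
        hits_by_status[match["status"]] += 1
    return hits_by_path, hits_by_status


def lines_per_second(lines):
    """Time analyze() over the given lines and return the observed throughput."""
    start = time.perf_counter()
    analyze(lines)
    elapsed = time.perf_counter() - start
    return len(lines) / elapsed if elapsed > 0 else float("inf")


# Hypothetical sample data, including one line that is not a log entry.
sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /about.html HTTP/1.0" 200 512',
    '10.0.0.7 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 404 -',
    'not a log line',
]
paths, statuses = analyze(sample)
print(paths.most_common())
print(statuses.most_common())
```

Running `lines_per_second` on a slice of your own access log gives a rough baseline to compare against the throughput quoted for each tool, although the real analyzers do considerably more per line (DNS lookups, sessions, referrers and so on).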

I recommend these links for further info:

http://www.aardal.com/stats/docs/uk/speed.html
http://www.w3perl.com/
http://awstats.sourceforge.net/

My advice is:

  • W3Perl if you want granular statistics and you don't have too many GB of log files
  • AWStats if you don't have too many GB of log files and you want a nice graphical representation.
  • Webalizer if you have tons of log files to analyze.
  • Analog if you need really accurate statistics, have huge log files, and have C development experience (or know somebody who can help).

Regarding page tagging, the winner is surely Google Analytics, as the data it collects and produces is better than that of the other solutions, though it may happen one day that Google starts asking money for it...

W3Counter and Xiti are providers that require you, for the free version, to install an image on each web page you want to monitor. Both are for small sites.
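
For readers unfamiliar with image-based page tagging, the idea can be sketched roughly as follows. This is a generic illustration, not the actual code or URL scheme of W3Counter or Xiti: the collector URL, the `site` and `page` parameter names, and the functions `pixel_tag` and `parse_hit` are all hypothetical.

```python
from html import escape
from urllib.parse import parse_qs, urlencode, urlsplit


def pixel_tag(collector_url, site_id, page_path):
    """Build the <img> tag a page would embed to report one page view."""
    query = urlencode({"site": site_id, "page": page_path})
    # Escape the URL so the "&" between parameters is valid inside an HTML attribute.
    src = escape(f"{collector_url}?{query}", quote=True)
    return f'<img src="{src}" width="1" height="1" alt="">'


def parse_hit(request_url):
    """Recover the site id and page path from the image request the browser sends."""
    params = parse_qs(urlsplit(request_url).query)
    return {"site": params["site"][0], "page": params["page"][0]}


# Hypothetical collector endpoint and site id.
tag = pixel_tag("https://stats.example.com/pixel.gif", "42", "/blog/hello world")
print(tag)
print(parse_hit("https://stats.example.com/pixel.gif?site=42&page=%2Fblog%2Fhello+world"))
```

Every page view then shows up on the provider's side as one request for the image, which is why the tag has to be present on each page you want to monitor.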

Open Web Analytics and Piwik are great open source solutions. Both are quite mature and stable, but they require a MySQL database and PHP support.

If you don't need a "home made" solution I would definitely go for Google Analytics, and among the open source projects I would choose Open Web Analytics, as it's more mature than Piwik.

Solution 2:

I am very satisfied with Piwik. I just miss the ability to adjust the widgets per website. I use it to monitor my Drupal sites, and there is a Piwik module for Drupal too.

Solution 3:

For my money, the best log analysis tool is analog. It's screamingly fast (modulo DNS lookups, which are equally slow for everyone), is written by an ex-Cambridge statistician, and has a very useful page telling you - and your management - what can and cannot properly be known from your web logs, bizarre commercial claims notwithstanding.

I can't comment on page tagging, though; sorry.

Solution 4:

This is experimental software, but nevertheless, quite impressive:

  • http://projects.nuttnet.net/hummingbird/

Hummingbird lets you see how visitors are interacting with your website in real time. And by “real time” we don’t mean it refreshes every 5 minutes—WebSockets enable Hummingbird to update 20 times per second. Hummingbird is built on top of Node.js, a new javascript web toolkit that can handle large amounts of traffic and many concurrent users.

Solution 5:

I would say (although I'm biased as co-founder ;-) that SnowPlow is the most powerful open source tagging-based web analytics tool out there.

SnowPlow has a loosely coupled, distributed architecture which uses Hadoop and Hive, so it scales to millions or even billions of events - this is something that no MySQL or other RDBMS-based solution can do.

The other big innovation in SnowPlow is that your event data is stored in a clean, immutable, denormalised, atomic "flat file" structure - in other words, an analytics data warehouse. This enables a lot of very sophisticated analyses using Hive, as well as straightforward joins with your third-party data (e.g. CRM or sales data). Again, this is more powerful than other solutions, which tend to collapse atomic data into aggregates, truncate old data or use head-scratching normalised structures which are really hard to query directly or join to other sources.
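
As a rough illustration of why atomic event data is easier to join than pre-aggregated data, here is a sketch in plain Python. It does not use Hive and does not reflect SnowPlow's actual event schema: the field names, the sample records and the function `views_by_plan` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical atomic events: one immutable record per user action.
events = [
    {"user_id": "u1", "event": "page_view", "page": "/pricing"},
    {"user_id": "u1", "event": "page_view", "page": "/signup"},
    {"user_id": "u2", "event": "page_view", "page": "/pricing"},
    {"user_id": "u3", "event": "page_view", "page": "/blog"},
]

# Hypothetical third-party CRM data keyed by the same user id.
crm = {
    "u1": {"plan": "enterprise"},
    "u2": {"plan": "free"},
}


def views_by_plan(events, crm, page):
    """Join events to CRM records and count views of `page` per customer plan."""
    counts = defaultdict(int)
    for event in events:
        if event["event"] != "page_view" or event["page"] != page:
            continue
        # Users missing from the CRM data are grouped under "unknown".
        plan = crm.get(event["user_id"], {}).get("plan", "unknown")
        counts[plan] += 1
    return dict(counts)


print(views_by_plan(events, crm, "/pricing"))
```

If the events had already been collapsed into a daily total of views per page, the per-user link to the CRM record would be gone and this question could no longer be answered.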

You can read more about SnowPlow's technical architecture here.