LearnPragmatic Reporting for your Web Apps

Daniel Hilton
writes on October 23, 2007

Reporting is perhaps one of the biggest business deliverables for enterprise IT. Being able to say how many widgets were sold in month seven in year XX, sub-divided by customer ABC grouping, is one of the easiest ways to make management happy. But when building a web app, reporting is the last thing on many feature lists. Clear reporting is vitally important for the long term success of your web app and managing your growth. However, if you're still in the beta stage, we'll explore some ways of building simple reporting fast.

Why reporting? Because it helps you know the numbers. You need to be able to take data from web analytics, your own system stats (how many sign-ups per month, what's this month's revenue?, etc.) and be able to render that visually in some sensible way for you to be able to understand it and make the correct business decisions based upon it.

Big business style

If you're a large multi-national with large IT budgets, you're going to looking for a reporting system that can query multiple data warehouses with hundreds of users and multiple output formats. You'll also want it as near real time as possible. Packages like Crystal Reports and Jasper reports provide large scale interfaces to data sources and provide nice tools to create easy to read reports. But being as we're bootstraping upstarts, we need a solution that can pull together the data we need into one place and export it how we want.

Super simple (nearly) real time reporting

I'm not going to go into depth about database replication, load balancing your queries, redundancy or the technological challenges you face in getting your web app's data out. I am going to assume that you have a database of some sort, a web analytics package and clear objectives for what you want to measure. The most important of those is the objectives. What is it that illustrates growth and successes of your web app? This question in itself could fill thousands of articles, but for the sake of getting to some useful examples we'll look at the following:

  1. Number of unique visits to your site: last 24 hours, last month, ever
  2. Number of new subscribers to your rss feed: last 24 hours, last month, ever
  3. Number of sign ups: last 24 hours, last month, ever
  4. Number of adds of your f8/widget: last 24 hours, last month, ever
  5. How much downtime: last 24 hours, last month, ever
  6. How much revenue have you generated: last 24 hours, last month, year to date

If we look at the list above as our starting six key performance indicators, lets look at some ways of getting the information into a format we can use. For the web analytics based targets (1,2) we can within most analytical packages, create custom reports and get them emailed to us in a variety of formats. We're going to use XML and reformat it for our needs quickly in PHP5. Having setup our goals and metrics in Google analytics, for example, we can then set them to be emailed every morning. We then have the attachment written to a log folder. Whenever we refresh our reporting dashboard we check for the latest file in the folder and process it, then store the processed data as a small HTML fragment which we include into our dashboard. This limits the amount of processing we have to do to a minimum and keeps our dashboard quick to refresh for the more real-time metrics.

For number 3 (number of sign ups) we'll make a query of our users table, requesting the number of users created in the last 24 hours, last month, ever. As you start to get bigger, these queries will start to become very long, taking valuable time away from your production database. You can attack this problem either by using aggregates or by moving all your data to a non-production environment purely for reporting. There is a danger in this in terms of replication lag and out of date data, but facing tables locks and the problems they cause traditional LAMP based web applications, it perhaps is a risk worth managing.

For number 4 (widget adds) we can do a variety of things. If we're looking at new Facebook subscribers then we can use the CSV download from Adonomics (formerly Appaholic). For example, the Gaping Void app, is accessibile via the url: http://adonomics.com/csv.php?mode=daily&display=4703559265 We can thus make a CURL or wget call to get this data every morning:

   wget http://adonomics.com/csv.php?mode=daily&display=4703559265
 

If we're looking at a widget, say embedded into MySpace or a blog, then we could set-up a download goal in an analytics package and measure it just like we did the unique visitors or you could get a feed from a log analyzer crunching the server logs for requests for a key component of your widget.

For number 5 (downtime), we need to define what downtime is in regards to your web app. Is it not only when the server is down but also when the database is locked or when traffic causes it to go slow? Or when disk space runs out or the load is massive? However, we don't need that granularity. For our purposes we just care if it's been usable and if not how long it wasn't usable for.

The best way of doing the more in-depth testing is to monitor the system externally via a package such as Mon, Ganglia and Nagios, all of which are beyond the scope of this article. We shall assume that you care when your service or application can not be reached. Thus we need to use Synthetic monitoring, a form of monitoring that checks a particular web app action every minute or so.

For example, you could set-up a check-out every minute to check that your basket works. Or perform a site search for a particular phrase and check the response contains a link to a certain page. If we a large corporation, we'd go and buy something like Sitescope from HP However yet again we're bootstrapping a we need a super simple solution. We can again use CURL to simulate an action, check the response for what we expect and then log the data for further analysis. If you were looking to run this in production you'd have to look at using a server in a different location to do the polling, perhaps just a cheap VPS with a cron job that runs every minute. It's left as an exercise to the reader to think of what actions are triggered by a downtime event. (SMS Alert? email? Jabber?)

$test1 =  curl_init("http://www.ourapp.com/action1?search=bob“);
	try {
	    $buffer = curl_exec($test1);
		curl_close($test1);
		if (strpos($buffer, “Bob Jones”) === false) {
			throwAlert(”App Down, Server Up”);
		} else {
			// Let's log a good search - so we can track the timeline of when things go bad.
			// We could also log how long the action took to complete - an interesting
			// metric to compare against system resources over time.
			logGoodSearch(now());
		}
	catch (Exception $e) {
		   // Having caught the error do something useful with it, including both logging and alerting
		}

For number six we need to do a query in your database. If you work with a Payment Services Provider, such as Metacharge or SECPay you'll probably interface via an API or payment page where you post in the amount and other details about a transaction. You should log the transactions in a database somewhere, taking note of the time, amount and other related metadata. You can't however store any of the card details unless you are PCI-DSS compliant as enforced by your merchant account provider. If we assume you use a table called transactionsSuccessful we can query it thus:

	SELECT SUM(value) FROM transactionsSuccessful;
	# Total revenue ever from online transactions

	SELECT SUM(value) FROM transactionsSuccessful WHERE time > NOW() - 86400;
	# Total revenue in the last 24 hours

	SELECT SUM(value) FROM transactionsSuccessful WHERE time > NOW() - 2419200;
	# Total revenue in the last month

We can do that query in real-time (dependent on database size and whether you're sharding, etc.) which means we can add that to your web app reporting screen.

Displaying your data

When displaying statistical data, there is a belief that more “data porn” is better. While that may give you a false sense of knowing the real picture, what we really need to be able to understand is what actions need to taken based upon the data. Is our disk space creeping up slowly or suddenly filling up? Do we need more disks? Are people who come from blogs to our site more likely to sign-up versus general traffic? If they are, what are you going to do about it? The brilliant O'reilly book, Information Dashboard Design goes through the many pitfalls of designing dashboards for management information systems. Visual metaphors such as speed dials don't help explain or enrich your data. They just decrease the signal to noise ratio. Edward Tuft's work on design is worth studying in this field, if only for his examples of how to display information with the lowest amount of visual noise.

Conclusions

In this brief article we've looked at some examples of how to produce reporting, how to provide it in a useful form and how to look at the data over time. This is a huge field – one in its infancy. However, I hope that I have at least got you thinking about how you produce internal metrics and how you can use that business data to make better business decisions to help build and grow your web app.

2 Responses to “Pragmatic Reporting for your Web Apps”

  1. thanks admin good post

  2. Nicely written and always relevant! Good job.

Leave a Reply

You must be logged in to post a comment.

Learn to code with Treehouse

Start your 7 day free trial today and get access to hundreds of video courses in web development, design and business!

Learn more