Understanding how our fees web-services are being used


Alex’s earlier post described how LRT developers worked with colleagues in Financial and Legal Services (FLS) to give students access to a personalised traffic light summary of their financial standing across three categories: tuition, accommodation and other fees. Information displayed in the myMMU SharePoint portal was provided by a getFeeStatus web-service and used in a WebPart that enabled students to use a second web-service getFeeEmail if they wished to receive a detailed financial statement by email.

After intensive testing, fee status information was released via the myMMU portal to all first year students on November 17, 2010. Naturally, there has been considerable interest in understanding the impact of this new system in terms of take-up and in the broader context of feedback from students and front-line FLS staff dealing with financial queries.

So, here are some initial statistics gathered from the server running the REST web-services:

Total accesses, 17 November 2010 to 15 January 2011:

All logs from November: 1,754,573 hits for all objects and pages on the web-services server

Students who viewed the Traffic Lights summary page: 9,084

Students who followed through from the Traffic Lights summary page to request a detailed financial statement: 1,132

How did we go about getting these stats?

The getFeeStatus and getFeeEmail REST web-services run on a Microsoft IIS server which, like all other MS IIS servers, keeps hit logs of the people and machines accessing pages hosted on it: URL u accessed at time t using browser b from internet address i. The really good news is that, because these are REST services, not only is the name of the web-service called logged, but all the querystring calling parameters also appear in the URL… that’s going to come in handy later.

So the process of determining usage of the traffic light web-services starts and even ends (for this part of the analysis) on the IIS server, and by necessity it involves getting grubby with raw web server log files, rooting out nuggets of information pertinent to the things we’re interested in.

Sure – we could run the logs (all x Gigabytes of them) through a good log analyser like AWStats, or the freeware Funnel Web Analyzer that Kieron used for the graphs in a previous post.

These are great at providing generic page and object hits, but they tend to deal with ‘top 10 links served in November’ type scenarios. Useful, but not really what we need to discover how many students made use of a particular service with particular parameters within a particular timeframe.

To do this we decided to use a suite of geek-level powertools on a Linux box. Of course, we could have used a suite of geek-level tools on a Windows box, but we’d have to find them first. Yes, yes… there is PowerShell, but just read on – and let me know if PowerShell could cope!

Prelims – pour yourself a good strong coffee!

Firstly, we had to copy the raw logs over from the IIS box to an ancillary box. Why? First of all, so we could use the aforementioned Linux power tools (bash commands). Secondly, log processing is inherently processor- and memory-intensive – why ruin the service you are trying to report on? Admittedly we perhaps shouldn’t have used the web server hosting this blog, as it slowed it down a bit! But we did.
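
Just for completeness, here is a minimal sketch of that copying step – the mounted share and all the paths are illustrative assumptions, not our actual setup:

[codesyntax lang="bash"]

# Pull the raw IIS logs onto the ancillary Linux box for processing.
# Assumes the IIS log directory has been exposed as a network share and
# mounted at /mnt/iislogs; all paths here are illustrative.
mkdir -p /data/iislogs
cp /mnt/iislogs/W3SVC1797328370/ex*.log /data/iislogs/

[/codesyntax]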

Now, let’s take a look at a few typical lines from an IIS log file. Much of what follows applies equally to an Apache-based log file: both servers write structured, space-delimited logs (Apache’s Common Log Format, IIS’s W3C Extended format), but each allows many variations in exactly which pieces of information are stored – so you need to examine the fields in your file really carefully to make sure you are storing the right stuff. Apache normally records the useful data out of the box; IIS has certain interesting data turned off by default. No idea why – just take some time and make sure you configure the logging correctly on your server, or stats could be dribbling away into the ether!

[codesyntax lang="apache"]

2011-01-14 11:56:07 W3SVC1797328370 149.170.241.162 GET /convertid/Service1.svc/getIdByMmuId8 id=5503xxxx 80 - 149.170.241.162 - 200 0 0
2011-01-14 11:56:07 W3SVC1797328370 149.170.241.162 GET /finance/Service1.svc/getFeeStatus person=550xxxx0&dtm=1295006167&developer=mymmu&format=rss&token=ee5b3e4bc033f093bd2eecec7331812f 80 - 149.170.247.8 - 200 0 0
2011-01-14 11:56:09 W3SVC1797328370 149.170.241.162 GET /srs/Service1.svc/getCurrentEnrolments person=0838xxxx 80 - 149.170.247.8 - 200 0 0
2011-01-14 11:56:13 W3SVC1797328370 149.170.241.162 GET /vle/Service1.svc/getWebCtAreas format=rss&dtm=12950xxxx2&developer=mymmu&token=057ee448560dedadbddfc842eaf838f1&person=08186066 80 - 149.170.247.8 - 200 0 0
2011-01-14 11:56:13 W3SVC1797328370 149.170.241.162 GET /convertid/Service1.svc/getIdByMmuId8 id=0818xxxx 80 - 149.170.241.162 - 200 0 0

[/codesyntax]

Gibberish, isn’t it? Well, no. It just looks complicated because it has really been designed to be read by machines and witty processing routines. The fields are delineated (made distinct from other fields) by spaces. So the first field on each line is the date field containing ‘2011-01-14’, the 6th field starts ‘/convertid’, and so on. Each line is a record: an instance of a single request to the server for a single object or action. This is an important point to remember – each record is not a record of a single person hitting your page, as practically all pages on a server contain more objects than just the page itself.
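
If you want to see that structure for yourself, awk will happily pick fields out by position. A quick sketch – the filename is an assumption based on IIS’s default exYYMMDD.log naming:

[codesyntax lang="bash"]

# Print the date ($1) and the URL stem of the service called ($6)
# for the first few requests, skipping IIS's '#' header lines.
awk '!/^#/ {print $1, $6}' /data/iislogs/ex110114.log | head -5

[/codesyntax]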

(NB: student IDs in the example data have been anonymised by replacing half of the ID number with xxxx.)
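
Incidentally, this is also where you can check the configuration point made earlier: a W3C-format IIS log begins with a ‘#Fields:’ header naming exactly which columns were enabled, and it is worth confirming that before parsing anything. Filename again illustrative:

[codesyntax lang="bash"]

# Show which fields this log actually contains. For the sample lines
# above, the header would read something like:
#   #Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query
#            s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus
#            sc-win32-status
grep -m1 '^#Fields:' /data/iislogs/ex110114.log

[/codesyntax]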

Another important thing to remember for this exercise is that the records are distinct, regular and structured. In effect, this is the same as having a database of material that we can process, looking for regular pieces of information that match a pattern we are interested in. Because the data is regular, we know we shouldn’t get any weird anomalies that would distort the processing and give errant results. Because the data is distinct, we know we can count up instances of the records we find, and know that the count actually represents something.
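
To make that concrete, here is a sketch of the kind of pipeline this regularity allows – not our exact command, and the filenames and field positions are assumptions based on the sample lines above – counting the distinct students who called getFeeStatus in the reporting window:

[codesyntax lang="bash"]

# Distinct students calling getFeeStatus, 17 Nov 2010 to 15 Jan 2011.
# ISO dates in field $1 sort lexically, so plain string comparison works;
# field $7 is the querystring, which is where the person= parameter lives.
cat /data/iislogs/ex*.log \
  | grep 'getFeeStatus' \
  | awk '$1 >= "2010-11-17" && $1 <= "2011-01-15" {print $7}' \
  | grep -o 'person=[^&]*' \
  | sort -u \
  | wc -l

[/codesyntax]

Swap the service name and parameter, and the same pipeline answers the follow-through question for getFeeEmail.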


Web Services Update

Back in December 2010 Mark blogged about the Web Services we use in MMU to feed our Portal and mobile devices (see his post). At the end of November, web service usage was:

[Graph: Web Services hits – usage at November 2010]

This post is by way of an update on usage, and to explain how usage has grown.

[Graph: Web Services usage at January 2011]

The current position can be seen below November’s graph. While the order has not changed – getWebctAnnouncements is still the most popular – the number of hits has grown from just over 800,000 to over 2.1 million! Of particular interest are the 391,000 hits on the PC availability web service: all of these are from mobile devices using the CampusM myMMU-mobile App. More surprising to me are the 700,000 hits on getFeeStatus.

I should clarify that the figures are for the current academic year, 2010/11.