Alex’s earlier post described how LRT developers worked with colleagues in Financial and Legal Services (FLS) to give students access to a personalised traffic light summary of their financial standing across three categories: tuition, accommodation and other fees. Information displayed in the myMMU SharePoint portal was provided by a getFeeStatus web-service and used in a WebPart that enabled students to use a second web-service getFeeEmail if they wished to receive a detailed financial statement by email.
After intensive testing, fee status information was released via the myMMU portal to all first year students on November 17, 2010. Naturally, there has been considerable interest in understanding the impact of this new system in terms of take-up and in the broader context of feedback from students and front-line FLS staff dealing with financial queries.
So, here are some initial statistics gathered from the server running the REST web-services:
Total accesses Nov 17th 2010 to January 15th 2011:
All logs from November: 1,754,573 for all objects and pages on the web-services server
Students who have viewed the Traffic Lights summary page: 9084
Students who follow through from the Traffic Lights summary page to request a detailed financial statement: 1132
How did we go about getting these stats?
The getFeeStatus and getFeeEmail REST web-services run on a Microsoft IIS server which, like all other MS IIS servers, keeps hit logs of people and machines accessing pages hosted on it: URL u accessed at time t using browser b from internet address i – and the really good news is that being REST services, not only will the name of the web-service called be logged, but all the querystring calling parameters will also appear in the URL … that’s going to come in handy later.
So the process of determining usage of the traffic light web-services starts and even ends (for this part of the analysis) on the IIS server, and by necessity it involves getting grubby with raw web server log files, rooting out nuggets of information pertinent to the things we’re interested in.
These are great at providing generic page and object hits, but they tend to deal with ‘top 10 links served in November’ type scenarios. Useful, but not really what we need to discover how many students made use of a particular service with particular parameters within a particular timeframe.
To do this we decided to use a suite of geek-level powertools on a Linux box. Of course, we could use a suite of geek-level tools on a Windows box, but we’d have to find them first. Yes, yes… there is Powershell, but just read on – and let me know if Powershell could cope!
Prelims – pour yourself a good strong coffee!
Firstly, we had to copy the raw logs over from the IIS box to an ancillary box – why? First of all so we can use the aforementioned Linux power tools (bash commands.) Secondly, log processing is inherently processor and memory intensive – why ruin the service you are trying to report on. Admittedly we perhaps shouldn’t have used the web server hosting this blog as it slowed it down a bit! But we did.
Now, let’s take a look at a few typical lines from an IIS log file; if you try this on an Apache based log file (all of these servers adhere to the Common Log Format, which, alas, allows for many different variations of the actual information stored in certain agreed keys – so you need to examine the fields in your file really carefully to make sure you are storing the right stuff. Apache normally does this out of the box, IIS has certain interesting data turned-off by default. No idea why – just take some time and make sure you configure the logging correctly on your server – stats could be dribbling away into the ether!)
2011-01-14 11:56:07 W3SVC1797328370 188.8.131.52 GET /convertid/Service1.svc/getIdByMmuId8 id=5503xxxx 80 - 184.108.40.206 - 200 0 0 2011-01-14 11:56:07 W3SVC1797328370 220.127.116.11 GET /finance/Service1.svc/getFeeStatus person=550xxxx0&dtm=1295006167&developer=mymmu&format=rss&token=ee5b3e4bc033f093bd2eecec7331812f 80 - 18.104.22.168 - 200 0 0 2011-01-14 11:56:09 W3SVC1797328370 22.214.171.124 GET /srs/Service1.svc/getCurrentEnrolments person=0838xxxx 80 - 126.96.36.199 - 200 0 0 2011-01-14 11:56:13 W3SVC1797328370 188.8.131.52 GET /vle/Service1.svc/getWebCtAreas format=rss&dtm=12950xxxx2&developer=mymmu&token=057ee448560dedadbddfc842eaf838f1&person=08186066 80 - 184.108.40.206 - 200 0 0 2011-01-14 11:56:13 W3SVC1797328370 220.127.116.11 GET /convertid/Service1.svc/getIdByMmuId8 id=0818xxxx 80 - 18.104.22.168 - 200 0 0
Gibberish isn’t it? Well, no. It just looks complicated because it has really been designed to be read by machines and witty processing routines. The fields are delineated (made distinct from other fields) by spaces. So – the first field on each line is the date field containing ‘2011-01-14’ and the 6th field starts ‘/convertid’ and so on. Each line is a record, an instance of single request to the server for a single object or action. This is an important point to remember – each record is not a record of a single person hitting your page as practically all pages on a server contain more objects than just the page itself.
(NB: student IDs in the example data have been anonymised by replacing half of the ID number with xxxx.)
Another important thing to remember for this exercise, is that the records are distinct, regular and structured. In effect – this is the same as having a database of materials that we can process looking for regular pieces of information that match a pattern that we are interested in. As the data is regular we can know that we shouldn’t be getting any weird anomalies that would distort any processing and give errant data. That the data is distinct means we know that we can count up instances of the records we find and know that the count actually represents something.