<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://whijo.net" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>mysql</title>
 <link>http://whijo.net/taxonomy/term/71/feed</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>Statistics logging for Django - part 2</title>
 <link>http://whijo.net/blog/brad/2007/07/29/statistics-logging-django-part-2.html</link>
 <description>In &lt;a href=&quot;http://whijo.net/blog/brad/2007/07/19/statistics-logging-django.html&quot;&gt;part 1&lt;/a&gt; I explained how to build middleware and an associated model to capture page accesses and tie them to a user session. Now that we have all this useful info logged, we need to do something with it, like display it. Unfortunately Django&#039;s ORM doesn&#039;t have a facility for GROUP BY queries, so you have two major choices (there are more, but we can ignore them): implement a custom query in a &lt;a href=&quot;http://www.djangoproject.com/documentation/model-api/#managers&quot;&gt;custom Manager&lt;/a&gt; (see &lt;a href=&quot;http://www.djangosnippets.org/snippets/236/&quot;&gt;snippet&lt;/a&gt; and &lt;a href=&quot;http://www.djangosnippets.org/snippets/1/&quot;&gt;snippet&lt;/a&gt;, or &lt;a href=&quot;http://www.djangosnippets.org/tags/group-by/&quot;&gt;tagged snippets&lt;/a&gt;), or exploit a &lt;a href=&quot;&quot;&gt;MySQL view&lt;/a&gt; and model it in Django. I prefer the latter because my custom SQL becomes a MySQL customisation, and as far as Django is concerned it is dealing with a normal table (just don&#039;t tell Django that it is read-only). The model code works as usual, so subsequent queries and manipulations can exploit the &lt;acronym title=&quot;Object Relational Mapper&quot;&gt;ORM&lt;/acronym&gt; easily. My subjective and non-scientific experience is that using views is a lot quicker than using custom queries in the manager (it probably has to do with whatever optimisations exist for views, and the fact that rows are only fetched when Django decides it needs them). So, how the hell do we do it?
&lt;!--break--&gt;
First I created a model that describes what information I want to deal with (something which maps neatly on to our other model):
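&lt;pre&gt;&lt;code&gt;from django.contrib.auth.models import User
from django.contrib.sessions.models import Session
from django.db import models
# UserActivityManager is the custom manager mentioned later in this post;
# it must be defined (or imported) before this class.
class UserActivity(models.Model):
    # One row per session: the view groups the activity log by session_id.
    session = models.OneToOneField(Session,
                                   db_index=True,
                                   null=True, blank=True,
                                   primary_key=True)
    user = models.ForeignKey(User, null=True, blank=True)
    # Filled by MAX(date) in the view, i.e. the most recent request.
    date = models.DateTimeField(
        help_text=&quot;Date Request started processing&quot;,
        auto_now_add=True,
        db_index=True)
    # Filled by SUM(request_time) in the view.
    processing_time = models.IntegerField(
        help_text=&quot;Total time spent on this user&quot;)
    # Filled by COUNT(*) in the view.
    requests = models.IntegerField(
        help_text=&quot;Total Requests in this session&quot;)
    stats = UserActivityManager()
    def __str__(self):
        return &#039;%s: %s %s - %s - %s&#039; % (
            self.user, self.session, self.date,
            self.processing_time, self.requests)
    class Admin:
        list_display = (&#039;user&#039;, &#039;session&#039;, &#039;date&#039;,
                        &#039;processing_time&#039;, &#039;requests&#039;)&lt;/code&gt;&lt;/pre&gt;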

The nice thing about this setup is that, once the activity logs are aggregated, we can pull out things like the total processing time per user or session and the number of requests per user or session (and thus the average request time).

But that is just our model; we still need the magic. To implement the magic nicely I put some custom initial SQL into the sql directory of my application (in my case the housing application is called accounts, so I made a file called accounts/sql/useractivity.sql). You can read more about initial data &lt;a href=&quot;http://www.djangoproject.com/documentation/model-api/#providing-initial-sql-data&quot;&gt;here&lt;/a&gt; and under &lt;a href=&quot;http://www.djangoproject.com/documentation/models/fixtures/&quot;&gt;Django fixtures&lt;/a&gt;. My SQL looks like this:
&lt;pre&gt;&lt;code&gt;DROP TABLE accounts_useractivity;
CREATE OR REPLACE VIEW accounts_useractivity AS
SELECT i.session_id,
       i.user_id,
       MAX(i.date) AS date,
       SUM(i.request_time) AS processing_time,
       COUNT(*) AS requests
FROM accounts_activitylog i
GROUP BY 1
ORDER BY NULL;
&lt;/code&gt;&lt;/pre&gt;
So first I tell MySQL to drop the table that Django just created (accounts_useractivity) and create a view in its place. The view is very simple: it just groups the log rows by session_id. The real hair-puller for me was figuring out that I needed MAX(i.date) (see more about &lt;a href=&quot;http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html&quot;&gt;aggregate functions&lt;/a&gt;) so that each session row carries the date of its most recent access; without it, the date MySQL picks for the grouped row is not guaranteed to be the latest one. GROUP BY also normally sorts the result by session_id, which helps no one, so the ORDER BY NULL is &lt;a href=&quot;http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html&quot;&gt;an optimisation&lt;/a&gt; that tells GROUP BY not to sort. I am hoping that because date is indexed (from our logging model) the MAX shouldn&#039;t cost too much. (I would like someone with much MySQL-fu to point out any further optimisations, or even alternative approaches to the whole thing.)
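
To see what the view returns without a MySQL server, here is a minimal sketch that rebuilds it in an in-memory SQLite database from Python. It is an illustration under assumptions, not the setup described above: the column types and the three sample rows are made up, SQLite stands in for MySQL, and the MySQL-specific ORDER BY NULL is left out.

```python
import sqlite3

# In-memory stand-in for the accounts_activitylog table from part 1.
# Column types and sample rows are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts_activitylog (
    session_id TEXT,
    user_id INTEGER,
    date TEXT,
    request_time INTEGER
);
INSERT INTO accounts_activitylog VALUES
    ('abc', 1, '2007-07-29 10:00:00', 120),
    ('abc', 1, '2007-07-29 10:05:00', 80),
    ('xyz', 2, '2007-07-29 09:30:00', 200);
CREATE VIEW accounts_useractivity AS
SELECT session_id,
       user_id,
       MAX(date) AS date,
       SUM(request_time) AS processing_time,
       COUNT(*) AS requests
FROM accounts_activitylog
GROUP BY session_id;
""")

# One row per session, most recently active session first.
rows = conn.execute(
    "SELECT session_id, user_id, date, processing_time, requests "
    "FROM accounts_useractivity ORDER BY date DESC"
).fetchall()
for row in rows:
    print(row)
```

Each session collapses to a single row holding its latest date, its summed request time and its request count, which is the shape the UserActivity model expects.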

So now we have an aggregating view which Django maps using its ORM. To find the sessions that have been active in the last x minutes (where x is a datetime.timedelta object) we simply do:
&lt;pre&gt;&lt;code&gt;UserActivity.stats.filter(date__gte=datetime.now() - x)&lt;/code&gt;&lt;/pre&gt;

I wrote a custom manager for getting recent sessions etc., but that is an exercise for the reader. What I did include in my model is a method which returns a stepped &quot;request_weight&quot;, i.e. (requests in this session / requests in the busiest session) times the number of steps, which in my case defaults to 6. This means I can style my users the way one would a &quot;&lt;a href=&quot;http://en.wikipedia.org/wiki/Tag_cloud&quot;&gt;tag cloud&lt;/a&gt;&quot;, so very active sessions appear bigger than less active ones. To support this I needed a helper function in the custom manager that returns the session with the most requests.
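
A minimal sketch of that stepped weight as a plain function, outside any model. The function name, the rounding to the nearest step, the clamping to a minimum of 1 and the sample request counts are all assumptions for illustration; the only parts taken from the text are the ratio to the busiest session and the default of 6 steps.

```python
def request_weight(requests, max_requests, steps=6):
    """Map a request count onto a whole step between 1 and `steps`."""
    if max_requests == 0:
        # No traffic at all: fall back to the smallest step.
        return 1
    # Scale against the busiest session, then round to the nearest step.
    step = int(round(requests * steps / float(max_requests)))
    # Clamp so that even a nearly idle session gets the smallest style.
    return max(1, min(steps, step))


# Hypothetical request counts keyed by session id.
sessions = {"abc": 60, "xyz": 20, "qrs": 1}
busiest = max(sessions.values())
weights = dict(
    (key, request_weight(count, busiest)) for key, count in sessions.items()
)
print(weights)
```

The resulting step can be used directly in a CSS class name (for example a hypothetical weight-1 to weight-6), which keeps the sizing decisions in the stylesheet rather than in the template.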

The final tip is to use a &lt;a href=&quot;http://www.djangoproject.com/documentation/templates_python/#subclassing-context-requestcontext&quot;&gt;context processor&lt;/a&gt; to make the information available to all your templates, although you could do it with middleware (maybe middleware is the proper way to do it?).</description>
 <comments>http://whijo.net/blog/brad/2007/07/29/statistics-logging-django-part-2.html#comments</comments>
 <category domain="http://whijo.net/geek-tags/django">django</category>
 <category domain="http://whijo.net/tags/geek">geek</category>
 <category domain="http://whijo.net/geek-tags/middleware">middleware</category>
 <category domain="http://whijo.net/geek-tags/mysql">mysql</category>
 <category domain="http://whijo.net/geek-tags/mysql-views">mysql views</category>
 <category domain="http://whijo.net/geek-tags/python">python</category>
 <category domain="http://whijo.net/geek-tags/statistics">statistics</category>
 <pubDate>Sun, 29 Jul 2007 21:52:25 +0200</pubDate>
 <dc:creator>brad</dc:creator>
 <guid isPermaLink="false">110 at http://whijo.net</guid>
</item>
</channel>
</rss>
