<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://whijo.net" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>development</title>
 <link>http://whijo.net/taxonomy/term/66/feed</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>Statistics logging for Django</title>
 <link>http://whijo.net/blog/brad/2007/07/19/statistics-logging-django.html</link>
 <description>&lt;p&gt;Last night I built some middleware and models for a Django application to log visitor and user activity on the site. The intention is to do better user tracking and to build more comprehensive statistics stored in the MySQL db (obviously I am also logging everything with Apache). The current setup still needs some periodic scripts to conflate the data into statistics. I was thinking of a daily-weekly-monthly routine (i.e. once a day the previous day&#039;s requests are conflated into daily stats, once a week those are turned into weekly stats, and once a month they are minimised into a monthly overview). It was actually really simple to implement, but I butted my head against some Django issues (more at the end).&lt;/p&gt;
&lt;p&gt;So, first we build a model to represent a request:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
class UserActivity(models.Model):
        user = models.ForeignKey(
                      User,
                      null=True, blank=True,
                      db_index=True
               )
        session = models.ForeignKey(
                      Session,
                      db_index=True,
                      null=True, blank=True
                  )
        date = models.DateTimeField(
                      help_text=&quot;Date Request started processing&quot;,
                      auto_now_add=True,
                      db_index=True)
        request_time = models.IntegerField(
                              help_text=&quot;Processing time (in ms)&quot;,
                              null=True, blank=True)
        request_url = models.CharField(maxlength=800,db_index=True)
        referer_url = models.URLField(
                              verify_exists=False,
                              db_index=True,
                              blank=True, null=True)
        client_address = models.IPAddressField(
                              blank=True,null=True)
        client_host = models.CharField(
                              maxlength=256,
                              blank=True,null=True)
        browser_info = models.TextField(null=True,blank=True)
        error = models.TextField(null=True,blank=True)
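        def set_request_time(self):
                # Store the elapsed time since self.date in whole milliseconds.
                from datetime import datetime
                delta = datetime.now() - self.date
                # .microseconds alone is only the sub-second part of the
                # delta, so combine days, seconds and microseconds.
                self.request_time = (
                        (delta.days * 86400 + delta.seconds) * 1000
                        + delta.microseconds // 1000
                )
                self.save()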
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(download &lt;a href=&quot;http://whijo.net/files/models.py_.txt&quot; title=&quot;Download: models.py_.txt (1.17 KB)&quot;&gt;models.py_.txt&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;I think the model captures all the relevant info: we tie a request to a session and a user, and we record the time the request was made, the referer, and some info about the client. Using the middleware we can also calculate how long the request took.&lt;/p&gt;
&lt;p&gt;Most of the fields can be blank/null because we are not always going to have a session (see below), etc.&lt;/p&gt;
&lt;p&gt;The function set_request_time is called by the outgoing middleware function (process_response) and just notes how long the request took, and saves the object.&lt;/p&gt;
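&lt;p&gt;As a rough illustration of the timing arithmetic (a standalone sketch, not part of the model; elapsed_ms is a made-up helper name), a timedelta keeps days, seconds and microseconds as separate parts, so all three have to be combined to get milliseconds:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from datetime import datetime, timedelta


def elapsed_ms(start, end):
    """Return the whole number of milliseconds between two datetimes."""
    delta = end - start
    # Combine the three stored parts of the timedelta into milliseconds.
    return (delta.days * 86400 + delta.seconds) * 1000 + delta.microseconds // 1000


start = datetime(2007, 7, 19, 12, 0, 0)
end = start + timedelta(seconds=2, microseconds=345678)
print(elapsed_ms(start, end))        # 2345
print((end - start).microseconds)    # 345678
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second print shows that .microseconds on its own is only the sub-second remainder of the delta, not the total elapsed time.&lt;/p&gt;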
&lt;p&gt;Next we need some middleware to handle the object creation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
from datetime import datetime
from django.conf import settings
from my_app.models import UserActivity

class Activity(object):
        def process_request(self,request):
                # Middleware instances are shared between requests, so the
                # record is kept on the request rather than on self.
                request.activity = UserActivity(
                        user = request.user,
                        session = request.session,
                        date = datetime.now(),
                        request_url = request.META[&#039;PATH_INFO&#039;],
                        # These headers are optional, so fall back to &#039;&#039;.
                        referer_url = request.META.get(&#039;HTTP_REFERER&#039;, &#039;&#039;),
                        client_address = request.META[&#039;REMOTE_ADDR&#039;],
                        client_host = request.META.get(&#039;REMOTE_HOST&#039;, &#039;&#039;),
                        browser_info = request.META.get(&#039;HTTP_USER_AGENT&#039;, &#039;&#039;)
                )

        def process_exception(self,request,exception):
                # Store the exception message as text.
                request.activity.error = str(exception)
                request.activity.save()

        def process_response(self,request,response):
                # process_request may have been skipped by earlier middleware.
                activity = getattr(request, &#039;activity&#039;, None)
                if activity is not None:
                        activity.set_request_time()
                return response
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(download &lt;a href=&quot;http://whijo.net/files/middleware.py_.txt&quot; title=&quot;Download: middleware.py_.txt (825 bytes)&quot;&gt;middleware.py_.txt&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;You may (or may not) have noticed that we only actually save our model on the outgoing response, so we only have one db write per request. The middleware system is very easy to build for, and is &lt;a href=&quot;http://www.djangoproject.com/documentation/middleware/&quot;&gt;documented here&lt;/a&gt;. The nice thing is that process_exception keeps a record of the exception, though I am not sure whether it could store more information than just exception.__str__().&lt;/p&gt;
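&lt;p&gt;One possible way to keep more than the message, sketched here in plain current Python rather than inside the middleware: the standard traceback module can format the exception that is being handled. describe_exception is a hypothetical helper, and the sketch assumes it is called while the exception is still being handled (which I have not checked for process_exception).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import traceback


def describe_exception(exception):
    """Return the exception type, message and current traceback as text."""
    return "%s: %s\n%s" % (
        type(exception).__name__,
        exception,
        # format_exc() formats the exception currently being handled.
        traceback.format_exc(),
    )


try:
    {}["missing"]
except KeyError as exc:
    report = describe_exception(exc)

print(report)
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The resulting text could then be stored in the error field in place of the bare message.&lt;/p&gt;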
&lt;p&gt;To install this, your model needs to live in an app that is listed in INSTALLED_APPS and has been through syncdb. The middleware needs to be placed after the session and authentication middleware (it reads request.session and request.user), e.g. in settings.py (in MIDDLEWARE_CLASSES):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    
    &quot;django.middleware.common.CommonMiddleware&quot;,
    &quot;django.contrib.sessions.middleware.SessionMiddleware&quot;,
    &quot;django.contrib.auth.middleware.AuthenticationMiddleware&quot;,
    &quot;my_app.middleware.Activity&quot;,
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;
The next step is to build a context_processor that will include some useful stats, like who is logged in, but that will need more models, a MySQL view, or a UserActivityManager that does a custom SQL request with some &quot;group by&quot; magic. I have not built those parts yet, so I won&#039;t speak about them here.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My gripes about this implementation:&lt;/strong&gt; computing regular user activity stats is a relatively costly request (you need to do a SELECT COUNT(*) WHERE date &amp;gt; now() - (20 minutes) GROUP BY user). This could be made cheaper with a OneToOne join table against the user table that holds just an indexed recent_activity field, touched on every request from that user. For anonymous user activity we can only really rely on IP addresses, since sessions are not set until a user logs in or out, so we would need a similar table keyed on IP address, and use MySQL&#039;s REPLACE syntax (not sure if this is possible using Django).&lt;/p&gt;
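&lt;p&gt;A minimal sketch of that &quot;touch a row per visitor&quot; idea, using Python&#039;s built-in sqlite3 as a stand-in for MySQL (SQLite also accepts REPLACE INTO). The table, column and function names are assumptions made up for illustration, and this bypasses Django entirely:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
# One row per visitor; the key is a user name or an IP address.
conn.execute(
    "CREATE TABLE recent_activity ("
    " visitor TEXT PRIMARY KEY,"
    " last_seen TEXT NOT NULL)"
)
conn.execute("CREATE INDEX recent_activity_last_seen ON recent_activity (last_seen)")


def touch(visitor, when):
    """Record the latest request time for a visitor, replacing any old row."""
    conn.execute(
        "REPLACE INTO recent_activity (visitor, last_seen) VALUES (?, ?)",
        (visitor, when.isoformat()),
    )


def active_count(now, minutes=20):
    """Count visitors seen within the last `minutes` minutes."""
    cutoff = (now - timedelta(minutes=minutes)).isoformat()
    row = conn.execute(
        "SELECT COUNT(*) FROM recent_activity WHERE last_seen > ?", (cutoff,)
    ).fetchone()
    return row[0]


now = datetime(2007, 7, 19, 12, 0, 0)
touch("brad", now - timedelta(minutes=45))
touch("brad", now - timedelta(minutes=5))        # replaces the earlier row
touch("10.0.0.7", now - timedelta(minutes=30))   # anonymous visitor, too old
touch("10.0.0.8", now - timedelta(minutes=1))
print(active_count(now))  # 2
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because each visitor has exactly one row, the count needs no GROUP BY and only scans the indexed last_seen column.&lt;/p&gt;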
&lt;p&gt;&lt;strong&gt;My gripe about the session middleware&lt;/strong&gt; is that users do not get sessions until they log in or out. This is good because one-off visitors do not get sent a cookie and you don&#039;t allocate them a session in the DB, but it makes unique sessions more difficult to track, because anonymous first-time visitors are unique only by their IP address. I can obviously change this by setting any session variable for visitors without a session in the process_request of the activity middleware. This is neat because it makes the db hit opt-in, but after wrestling for ages with the session middleware, appreciating opt-in is something to be done in the sober light of day.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My gripe about Django&#039;s ORM&lt;/strong&gt; is that there is no neat way to do custom SQL requests (the nicest group by SQL snippet I have seen is &lt;a href=&quot;http://www.djangosnippets.org/snippets/1/&quot;&gt;this one&lt;/a&gt; because it uses Django&#039;s _meta to get the table names). Newer changes in Django introduced the &lt;a href=&quot;http://www.djangoproject.com/documentation/db-api/#extra-select-none-where-none-params-none-tables-none&quot;&gt;extra&lt;/a&gt; parameter, which means less completely custom SQL (i.e. you can just append your customisations to the existing SQL statement), but it still doesn&#039;t allow you to use very specific stuff like GROUP BY (which not all DBs support). One remedy is to find a way to still send sanitised SQL to the db server in an extra statement, while allowing developers more appended customisations. The alternative is to build group_by functions that either translate to DB-specific requests or do the grouping virtually (much like the transactions infrastructure). I prefer the latter because I think GROUP BY is very relevant and very useful, but it does mean that if your DB doesn&#039;t support it, the grouping could be a very costly operation in Python-space.&lt;/p&gt;
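&lt;p&gt;To make the &quot;virtual&quot; option concrete, here is a small sketch of what grouping in Python-space could look like. group_count and the sample rows are hypothetical stand-ins for UserActivity records, not anything Django provides:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from collections import defaultdict


def group_count(rows, key):
    """Count rows per value of `key`, like a COUNT(*) with GROUP BY key."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return dict(counts)


# Hypothetical rows standing in for UserActivity records.
rows = [
    {"user": "brad", "request_url": "/"},
    {"user": "brad", "request_url": "/blog/"},
    {"user": "anna", "request_url": "/"},
]
print(group_count(rows, "user"))  # {'brad': 2, 'anna': 1}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The cost mentioned above shows up here: every row has to be fetched into Python before it can be counted, whereas the database would return only one row per group.&lt;/p&gt;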
</description>
 <comments>http://whijo.net/blog/brad/2007/07/19/statistics-logging-django.html#comments</comments>
 <category domain="http://whijo.net/tags/development">development</category>
 <category domain="http://whijo.net/geek-tags/django">django</category>
 <category domain="http://whijo.net/tags/geek">geek</category>
 <category domain="http://whijo.net/geek-tags/logging">logging</category>
 <category domain="http://whijo.net/geek-tags/middleware">middleware</category>
 <category domain="http://whijo.net/geek-tags/python">python</category>
 <category domain="http://whijo.net/geek-tags/statistics">statistics</category>
 <enclosure url="http://whijo.net/files/middleware.py_.txt" length="825" type="text/plain" />
 <pubDate>Thu, 19 Jul 2007 12:05:13 +0200</pubDate>
 <dc:creator>brad</dc:creator>
 <guid isPermaLink="false">108 at http://whijo.net</guid>
</item>
</channel>
</rss>
