Statistics logging for Django

Last night I built some middleware and models for a Django application to log visitor and user activity on the site. The intention is to do better user tracking and build more comprehensive statistics, stored in the MySQL DB (obviously I am also logging everything with Apache). The current setup still needs some periodic scripts to aggregate the raw data into statistics. I was thinking of a daily-weekly-monthly routine (i.e. once a day yesterday's requests are rolled up into daily stats, once a week those are turned into weekly stats, and once a month they are reduced to a monthly overview). It was actually really simple to implement, but I butted my head against some Django issues (more at the end).
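
To make the daily-weekly-monthly idea concrete, here is a rough sketch in plain Python. It is not the periodic script itself (that is not written yet); the function names and sample timestamps are made up for illustration, and a real version would read the dates from the activity table rather than from a list.

```python
from collections import Counter
from datetime import date, datetime


def daily_counts(request_times):
    """Count requests per calendar day from an iterable of datetimes."""
    return Counter(ts.date() for ts in request_times)


def weekly_counts(daily):
    """Fold daily counts into (ISO year, ISO week) buckets."""
    weekly = Counter()
    for day, hits in daily.items():
        iso_year, iso_week, _weekday = day.isocalendar()
        weekly[(iso_year, iso_week)] += hits
    return weekly


def monthly_counts(daily):
    """Fold daily counts into (year, month) buckets."""
    monthly = Counter()
    for day, hits in daily.items():
        monthly[(day.year, day.month)] += hits
    return monthly


# Hypothetical sample data standing in for logged requests.
requests = [
    datetime(2007, 3, 5, 9, 15),
    datetime(2007, 3, 5, 17, 40),
    datetime(2007, 3, 6, 8, 0),
    datetime(2007, 3, 12, 12, 30),
]

daily = daily_counts(requests)
weekly = weekly_counts(daily)
monthly = monthly_counts(daily)
```

In this sketch both the weekly and the monthly figures are built from the daily ones, because ISO weeks can straddle a month boundary and so weekly totals cannot be summed cleanly into months.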

So, first we build a model to represent a request:


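from django.db import models
from django.contrib.auth.models import User
from django.contrib.sessions.models import Session

class UserActivity(models.Model):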
        user = models.ForeignKey(
                      User,
                      null=True, blank=True,
                      db_index=True
               )
        session = models.ForeignKey(
                      Session,
                      db_index=True,
                      null=True, blank=True
                  )
        date = models.DateTimeField(
                      help_text="Date Request started processing",
                      auto_now_add=True,
                      db_index=True)
        request_time = models.IntegerField(
                              help_text="Processing time (in ms)",
                              null=True, blank=True)
        request_url = models.CharField(maxlength=800,db_index=True)
        referer_url = models.URLField(
                              verify_exists=False,
                              db_index=True,
                              blank=True, null=True)
        client_address = models.IPAddressField(
                              blank=True,null=True)
        client_host = models.CharField(
                              maxlength=256,
                              blank=True,null=True)
        browser_info = models.TextField(null=True,blank=True)
        error = models.TextField(null=True,blank=True)
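        def set_request_time(self):
                from datetime import datetime
                # timedelta.microseconds holds only the sub-second part,
                # so build the total from days, seconds and microseconds.
                elapsed = datetime.now() - self.date
                self.request_time = (
                        (elapsed.days * 86400 + elapsed.seconds) * 1000
                        + elapsed.microseconds // 1000
                )
                self.save()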

I think the model captures all the relevant info: we tie a request to a session and user, we record when the request was made (and, using middleware, how long it took), and we keep the referer and some info about the client.

Most of the fields can be blank/null because we are not always going to have a session (see below), etc.

The function set_request_time is called by the outgoing middleware function (process_response) and just notes how long the request took, and saves the object.
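
One thing to watch when timing a request with datetime: the microseconds attribute of a timedelta is only the sub-second remainder, in microseconds, so it is not the same as the elapsed time in milliseconds. A small standalone sketch of the difference (elapsed_ms is a name made up for this example):

```python
from datetime import datetime, timedelta


def elapsed_ms(start, end):
    """Whole milliseconds between two datetimes."""
    delta = end - start
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 1000 + delta.microseconds // 1000


start = datetime(2007, 3, 5, 9, 15, 0)
end = start + timedelta(seconds=2, milliseconds=350)

# Only the fractional part of the last second, and in microseconds.
naive = (end - start).microseconds

# The full elapsed time in milliseconds.
correct = elapsed_ms(start, end)
```

For a request that takes 2.35 seconds, naive comes out as 350000 while correct is 2350.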

Next we need some middleware to handle the object creation:


from datetime import datetime
from my_app.models import UserActivity

class Activity(object):
        def process_request(self,request):
                # Keep the record on the request, not on self: one middleware
                # instance is shared by every request the process serves.
                request.activity = UserActivity(
                        user = request.user,
                        session = request.session,
                        date = datetime.now(),
                        request_url = request.META.get('PATH_INFO', ''),
                        # These headers are optional, so fall back to ''.
                        referer_url = request.META.get('HTTP_REFERER', ''),
                        client_address = request.META.get('REMOTE_ADDR'),
                        client_host = request.META.get('REMOTE_HOST', ''),
                        browser_info = request.META.get('HTTP_USER_AGENT', '')
                )

        def process_exception(self,request,exception):
                activity = getattr(request, 'activity', None)
                if activity is not None:
                        # Store the message text, not the exception object.
                        activity.error = str(exception)
                        activity.save()

        def process_response(self,request,response):
                # process_request may not have run if an earlier middleware
                # returned a response itself.
                activity = getattr(request, 'activity', None)
                if activity is not None:
                        activity.set_request_time()
                return response

You may (or may not) have noticed that we only actually save our model on the outgoing response, so we only have one DB write per request. The middleware system is very easy to build for, and is documented here. The nice thing is that process_exception keeps a record of the exception (though I am not sure whether it could store more information than just exception.__str__()).
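
On storing more than the exception's string: one option is Python's standard traceback module, which can render the full stack trace of the exception currently being handled. A sketch outside of Django, assuming the hook is called while the exception is still being handled (describe_exception is a hypothetical helper, not part of the middleware above):

```python
import traceback


def describe_exception(exception):
    """Return a one-line summary plus the full traceback text.

    Must be called while the exception is still being handled,
    because traceback.format_exc() reads the current exception.
    """
    summary = "%s: %s" % (type(exception).__name__, exception)
    details = traceback.format_exc()
    return summary, details


try:
    {}["missing"]
except KeyError as exc:
    summary, details = describe_exception(exc)
```

The details string could then go into the error TextField in place of (or alongside) the short summary.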

To install this, your model needs to live in an app that is listed in INSTALLED_APPS and has been through syncdb. The middleware needs to be placed after the session middleware (and after the authentication middleware, since it reads request.user), e.g. in settings.py (in MIDDLEWARE_CLASSES):

    
    "django.middleware.common.CommonMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "league.middleware.Activity",

The next step is to build a context processor that exposes some useful stats, like who is logged in, but that will need more models, a MySQL view, or a UserActivityManager that runs a custom SQL query with some "GROUP BY" magic. I have not built those parts yet, so I won't say more about them here.

My gripe about this implementation is that regular user-activity stats need a relatively costly query (a SELECT COUNT(*) ... WHERE date > now() - (20 minutes) GROUP BY user). This could be made cheaper with a OneToOne table joined to the user table that holds just an indexed recent_activity field, touched on every request from that user. For anonymous user activity we can only really rely on IP addresses, since sessions are not set until a user logs in or out, so we would need a similar table keyed on IP address, updated with MySQL's REPLACE syntax (not sure if this is possible using Django).
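
To compare the two approaches, here is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also accepts REPLACE INTO). The table and column names are invented for the example and are not the ones Django would generate; the anonymous-visitor variant would look the same with the IP address as the key.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

conn = sqlite3.connect(":memory:")
# Full request log: one row per request.
conn.execute("CREATE TABLE user_activity (user_id INTEGER, date TEXT)")
# Cheap lookup table: one row per user, overwritten on every request.
conn.execute(
    "CREATE TABLE last_seen ("
    "user_id INTEGER PRIMARY KEY, recent_activity TEXT)"
)


def log_request(user_id, when):
    """Append to the log and touch the per-user 'last seen' row."""
    stamp = when.strftime(FMT)
    conn.execute(
        "INSERT INTO user_activity (user_id, date) VALUES (?, ?)",
        (user_id, stamp),
    )
    conn.execute(
        "REPLACE INTO last_seen (user_id, recent_activity) VALUES (?, ?)",
        (user_id, stamp),
    )


now = datetime(2007, 3, 5, 12, 0, 0)
log_request(1, now - timedelta(minutes=45))
log_request(1, now - timedelta(minutes=5))
log_request(1, now - timedelta(minutes=2))
log_request(2, now - timedelta(minutes=30))

cutoff = (now - timedelta(minutes=20)).strftime(FMT)

# Costly version: scan and group the whole request log.
costly = conn.execute(
    "SELECT user_id, COUNT(*) FROM user_activity "
    "WHERE date > ? GROUP BY user_id ORDER BY user_id",
    (cutoff,),
).fetchall()

# Cheap version: one small row per user, no grouping needed.
cheap = conn.execute(
    "SELECT user_id FROM last_seen "
    "WHERE recent_activity > ? ORDER BY user_id",
    (cutoff,),
).fetchall()

rows_in_last_seen = conn.execute("SELECT COUNT(*) FROM last_seen").fetchone()[0]
```

Both queries agree that only user 1 was active in the last 20 minutes, but the second one never has to look at more than one row per user.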

My gripe about the session middleware is that users do not get sessions until they log in or out. This is good because once-off visitors do not get sent a cookie and are not allocated a session in the DB, but it makes unique sessions harder to track, because anonymous first-time visitors are distinguishable only by their IP address. I can obviously change this by setting any session variable for visitors without a session in the process_request of the activity middleware. This is neat because it makes the DB hit opt-in, but after wrestling with the session middleware for ages, appreciating opt-in is something to be done in the sober light of day.
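
The "set any session variable" trick could look roughly like this. It is only a sketch: a plain dict stands in for request.session (which is dict-like), and the function and key names are made up. The assumption is that writing any key is what causes the session middleware to store the session and send the cookie.

```python
from datetime import datetime


def ensure_session(session, now=None):
    """Write a marker into a dict-like session if it has none yet.

    Returns True when the marker was written, False when the
    visitor already had one.
    """
    if "first_seen" in session:
        return False
    session["first_seen"] = (now or datetime.now()).isoformat()
    return True


# A plain dict standing in for request.session.
session = {}
created = ensure_session(session, datetime(2007, 3, 5, 12, 0, 0))
created_again = ensure_session(session)
```

Calling something like this from process_request would give every visitor a session on their first request, at the cost of one extra DB write per new visitor.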

My gripe about Django's ORM is that there is no neat way to do custom SQL queries (the nicest GROUP BY snippet I have seen is this one, because it uses Django's _meta to get the table names). Newer changes in Django introduced the extra parameter, which means less completely custom SQL (i.e. you can just append your customisations to the existing SQL statement), but it still doesn't let you use very specific things like GROUP BY (which not all DBs support). One remedy is to figure out some way to still send sanitised SQL to the DB server in an extra statement, while allowing developers more appended customisations. The alternative is to build group_by functions which either translate to DB-specific queries or do the grouping virtually (much like the transactions infrastructure). I prefer the latter solution because I think GROUP BY is very relevant and very useful, but it does mean that if your DB doesn't support it, it could be a very costly operation in Python-space.
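
For a sense of what grouping "virtually" in Python-space means, here is a minimal sketch. The group_by helper and the sample rows are invented for illustration and are not an existing Django API.

```python
from collections import defaultdict


def group_by(rows, key):
    """Group a list of dict rows by the value stored under `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Hypothetical rows, as if fetched with a plain SELECT.
rows = [
    {"user": "brad", "request_url": "/"},
    {"user": "amanda", "request_url": "/stats/"},
    {"user": "brad", "request_url": "/stats/"},
]

hits_per_user = dict(
    (user, len(hits)) for user, hits in group_by(rows, "user").items()
)
```

The cost is plain to see: every row has to be pulled out of the database and held in memory before it can be grouped, which is exactly the work a native GROUP BY avoids.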

Brad,

I think you'll be able to do what you want using a custom manager within your Django model.

Al.

Thanks for the tip Al. Your tip worked perfectly for me.

session = models.ForeignKey(
Session,
db_index=True,
null=True, blank=True
)
Is it better to use SessionID here...

The custom manager idea is a good one. I tried this once before and it worked nicely.

Hey Brad... nice model. Just what I'm looking for. Before I start playing with this - have you added anything?

This is some good work!

but where do you import the request.META['HTTP_REFERER'] ..

i stumble around :( and feel that i need to read the doc ...

About this website

Whijo.net is the online internets of Bradley Whittington, Amanda Joseph, and our son Finley James Whittington. "Whijo" is 29% Whittington, 33% Joseph, and 37% Internet. Quite Web 2.0 of us.