Google Analytics for Mobile Sites
I implemented tracking using Google Analytics for my company's mobile sites using a technique described by Peter van der Graaf on his site. The technique involves performing a GET to an image on Google's server and passing it a bunch of options. Incidentally, this is because JavaScript can perform GETs of images from another domain but not GETs for any other kind of content (as an aside, this kind of protection seems useless, since the server could return any kind of content it wants to the JavaScript even though the GET has an image in the URL. Maybe someone could enlighten me).
Peter originally came up with the idea because he wanted to track hits to an RSS XML URL (which also seemed strange to me, since an RSS aggregator can read the feed as many times as it wants, so the hit count doesn't give much insight into the number of readers, but I digress) or to another type of file download (image, PDF, etc.) that wouldn't trigger the JavaScript that Google uses for Analytics.
One important difference between his motives and mine is that I'm tracking hits to a mobile site. Doing analytics on the server side is important since most phones (in Japan at least) don't support JavaScript. Because of the differences in what I was doing, I also needed to make some changes to how his script worked. Since I'm not tracking downloads or RSS hits, I care about things like sessions, language, and user agent (why Peter didn't also care about these I'm not sure).
So I modified his code as follows. I forward the language and user agent of the client to Google Analytics so that I can track these things properly. I also pass my own cookie number so that Google Analytics can aggregate page hits from the same user into a session, and I use the user var to track hits to different customers' web pages. The example is in PHP, but it could easily be translated into another language.
Note that, because of the use of stream contexts, this code will require a version of PHP >= 4.3.0.
$var_utmac = MOBILE_GOOGLE_ANALYTICS_CODE; // enter the new urchin code
$var_utmhn = WEB_DOMAIN; // enter your domain
$var_utmn = rand(1000000000, 2147483647); // random request number (capped so it fits in a 32-bit integer)
$var_cookie = $session; // cookie number
$var_random = rand(1000000000, 2147483647); // number under 2147483647
$var_today = time(); // today
$var_referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '-'; // referer url, '-' when the client sent none
$var_uservar = $storeinfo['storeid']; // enter your own user defined variable
$var_utmp = $_SERVER['REQUEST_URI']; // request uri
// The referer and request uri are passed through urlencode() because both may contain '?' and '&'.
$urchinUrl = 'http://www.google-analytics.com/__utm.gif?utmwv=1&utmn='.$var_utmn.'&utmsr=-&utmsc=-&utmul=-&utmje=0&utmfl=-&utmdt=-&utmhn='.$var_utmhn.'&utmr='.urlencode($var_referer).'&utmp='.urlencode($var_utmp).'&utmac='.$var_utmac.'&utmcc=__utma%3D'.$var_cookie.'.'.$var_random.'.'.$var_today.'.'.$var_today.'.'.$var_today.'.2%3B%2B__utmb%3D'.$var_cookie.'%3B%2B__utmc%3D'.$var_cookie.'%3B%2B__utmz%3D'.$var_cookie.'.'.$var_today.'.2.2.utmccn%3D(direct)%7Cutmcsr%3D(direct)%7Cutmcmd%3D(none)%3B%2B__utmv%3D'.$var_cookie.'.'.$var_uservar.'%3B';
$header = '';
// Set the language to that of the client so analytics can track it.
// "\r\n" must be double-quoted; in single quotes PHP sends the backslashes literally.
if (!empty($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
    $header .= 'Accept-language: '.$_SERVER['HTTP_ACCEPT_LANGUAGE']."\r\n";
}
// Set the user agent to that of the client so analytics can track it.
// Append with .= so the language header set above is not overwritten.
if (!empty($_SERVER['HTTP_USER_AGENT'])) {
    $header .= 'User-Agent: '.$_SERVER['HTTP_USER_AGENT']."\r\n";
}
$opts = array(
    'http' => array(
        'method' => 'GET',
        'header' => $header
    )
);
$handle = fopen($urchinUrl, 'r', false, stream_context_create($opts));
if ($handle) { // fopen() returns false when the request fails
    $test = fgets($handle);
    fclose($handle);
}
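The hand-built URL above is easy to get wrong, so here is a minimal sketch of the same request assembled from an array with http_build_query(), which takes care of the encoding. The function name ga_hit_url(), its parameter names, and the sample values are my own placeholders, not part of Peter's script or of any Google library. Note that http_build_query() also encodes the parentheses in the cookie string, which decodes to the same value on the server.

```php
<?php
// Sketch: build the __utm.gif request URL from an array instead of by hand.
// ga_hit_url() and its parameter names are illustrative, not part of any library.
function ga_hit_url($account, $domain, $page, $referer, $cookie, $random, $now, $userVar)
{
    // Cookie string in the same layout as the hand-built URL; it is encoded
    // once, as a whole, by http_build_query() below.
    $utmcc = '__utma=' . implode('.', array($cookie, $random, $now, $now, $now, 2)) . ';+'
           . '__utmb=' . $cookie . ';+'
           . '__utmc=' . $cookie . ';+'
           . '__utmz=' . $cookie . '.' . $now . '.2.2.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none);+'
           . '__utmv=' . $cookie . '.' . $userVar . ';';

    $params = array(
        'utmwv' => 1,
        'utmn'  => mt_rand(1000000000, 2147483647), // random request number
        'utmsr' => '-', 'utmsc' => '-', 'utmul' => '-',
        'utmje' => 0,   'utmfl' => '-', 'utmdt' => '-',
        'utmhn' => $domain,
        'utmr'  => $referer,
        'utmp'  => $page,
        'utmac' => $account,
        'utmcc' => $utmcc,
    );

    // Pass '&' explicitly so the arg_separator.output ini setting cannot change it.
    return 'http://www.google-analytics.com/__utm.gif?' . http_build_query($params, '', '&');
}

// Placeholder values for illustration only.
$url = ga_hit_url('UA-XXXXXXX-1', 'example.com', '/list.php?page=2&sort=asc', '-',
                  73305051, 1572483978, 1215112280, 'store42');
```

The resulting $url can be passed to fopen() with the same stream context as before.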
13 comments
I’ve set up a Unit Test based on your example that’s not generating any Google Analytics data, although the HTTP request/response appears to work:
Get Google Analytics…
GA get request:
http://www.google-analytics.com/__utm.gif?utmwv=1
&utmn=1259442976
&utmsr=-
&utmsc=-
&utmul=-
&utmje=0
&utmfl=-
&utmdt=-
&utmhn=tha.artlogic.com
&utmr=
&utmp=/scripts/TestAGoogleAnalyticsOriginalCode_html.php
&utmac=UA-4635840-1
&utmcc=__utma%3D35105132.1106525547.1213378550.1213378550.1213378550.2%3B%2B__utmb%3D35105132%3B%2B__utmc%3D35105132%3B%2B__utmz%3D35105132.1213378550.2.2.utmccn%3D(direct)%7Cutmcsr%3D(direct)%7Cutmcmd%3D(none)%3B%2B__utmv%3D35105132.-%3B
GA get response: GIF89a
TestCase AGoogleAnalyticsOriginalCodeTestCase->testGoogleAnalytics() passed
The only remaining issue we're having is that the User-Agent info is not getting into the appropriate GA segments like "Browser", "Operating System", etc.
I am passing the User-Agent in the http header as per your example, but i'm not sure if there's some other way that your sample differs from Peter van der Graaf's with respect to the user agent value.
I have also tried to do so using the following curl() technique:
curl_setopt($handle, CURLOPT_HTTPHEADER, $httpOptions);
Here's a var_dump of httpOptions:
httpOptions: Array
array(1) { ["http"]
Thanks again for your assistance. KB.
http://tha.artlogic.com/test/scripts/TestAGoogleAnalyticsFetcher_html.php
GET /test/scripts/TestAGoogleAnalyticsFetcher_html.php HTTP/1.1
Host: tha.artlogic.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5,text/vnd.wap.wml;q=0.6
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: __utmz=94521817.1213383401.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utma=94521817.96527180171977230.1213383401.1214262273.1214265030.3; userCookie=3b445a5e-ded7-102a-86b6-fe550003005f
HTTP/1.x 200 OK
Date: Wed, 25 Jun 2008 17:34:26 GMT
Server: Apache/2.0.46 (CentOS)
X-Powered-By: PHP/5.2.4
Content-Length: 1088
Connection: close
Content-Type: text/html; charset=UTF-8
However, when i sniff a page with the GA Javascript on it, i get what i believe is the header that is missing from my Unit Test:
http://www.google-analytics.com/__utm.gif?utmwv=4.2&utmn=1632023345&utmhn=tha.artlogic.com&utmcs=UTF-8&utmsr=1440x852&utmsc=32-bit&utmul=en-us&utmje=1&utmfl=8.0%20r27&utmdt=43kix%20Mobile%20Movie%20Showtimes%20%26%20Gossip&utmhid=512814441&utmr=-&utmp=/test/wap/home_with_GA_Javascript.php&utmac=UA-4635840-2&utmcc=__utma%3D94521817.96527180171977230.1213383401.1214262273.1214265030.3%3B%2B__utmz%3D94521817.1213383401.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B
GET /__utm.gif?utmwv=4.2&utmn=1632023345&utmhn=tha.artlogic.com&utmcs=UTF-8&utmsr=1440x852&utmsc=32-bit&utmul=en-us&utmje=1&utmfl=8.0%20r27&utmdt=43kix%20Mobile%20Movie%20Showtimes%20%26%20Gossip&utmhid=512814441&utmr=-&utmp=/test/wap/home_with_GA_Javascript.php&utmac=UA-4635840-2&utmcc=__utma%3D94521817.96527180171977230.1213383401.1214262273.1214265030.3%3B%2B__utmz%3D94521817.1213383401.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B HTTP/1.1
Host: www.google-analytics.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://tha.artlogic.com/test/wap/home_with_GA_Javascript.php
Do i need to flip the Javascript on and off for each PHP page i want my code to work with in order for the headers to go out correctly, or is there something else i've missed?
Thanks again, KB
PHP 5.2.4 (cli) (built: May 7 2007 16:01:37) with CURL extension, fopen enabled
1) fopen()
$context = stream_context_create($httpOptions);
$handle = fopen($this->fUrchinUrl, 'r', false, $context);
2) curl():
curl_setopt($handle, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
GA is still not recognizing user sessions: i.e, each hit is seen as an Absolute Unique Visitor, so none of the Visitor Trending or site crawling stats make sense
here are the cookie values i'm sending, with a uniquely generated, 8 digit session id of: "73305051" in this case (my session management code also handles cookieless browsers):
&utmcc=__utma%3D73305051.1572483978.1215112280.1215112280.1215112280.2%3B%2B
__utmb%3D73305051%3B%2B__utmc%3D73305051%3B%2B__utmz%3D73305051.1215112280.2.2.
utmccn%3D(direct)%7Cutmcsr%3D(direct)%7Cutmcmd%3D(none)%3B%2B__utmv%3D73305051.unit+test%3B
in case it helps others, the tricks that got the USER_AGENT working for me were:
1) using CURL as follows:
// using curl()
$handle = curl_init($this->fUrchinUrl);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_HTTPHEADER, $httpOptions);
curl_setopt($handle, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
$this->fContent = curl_exec($handle);
curl_close($handle);
2) being careful to set the Host in the HTTP headers like so:
$header = "Host: www.google-analytics.com\r\n";
if (!empty($_SERVER['HTTP_USER_AGENT']))
{
$header .= 'User-Agent: '.$_SERVER['HTTP_USER_AGENT']."\r\n";
}
//Set the language to that of the client so analytics can track it.
if (!empty($_SERVER['HTTP_ACCEPT_LANGUAGE']))
{
$header .= 'Accept-language: '.$_SERVER['HTTP_ACCEPT_LANGUAGE']."\r\n";
}
$httpOptions = array(
'http'
'method'
'header'
'request_fulluri'
)
);
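For anyone following the cURL route, here is a small sketch of how the two headers might be forwarded. One thing worth knowing is that CURLOPT_HTTPHEADER takes a flat list of "Name: value" strings rather than the nested array used for stream contexts. The helper names forwarded_headers() and fetch_hit() are made up for this example, and stripping CR/LF from the forwarded value is my own precaution, not something from the code above.

```php
<?php
// Sketch: forward the visitor's User-Agent and Accept-Language with cURL.
// forwarded_headers() and fetch_hit() are hypothetical helper names.
function forwarded_headers(array $server)
{
    $headers = array();
    if (!empty($server['HTTP_ACCEPT_LANGUAGE'])) {
        // Strip CR/LF so a crafted header value cannot inject extra headers.
        $headers[] = 'Accept-Language: '
            . str_replace(array("\r", "\n"), '', $server['HTTP_ACCEPT_LANGUAGE']);
    }
    return $headers;
}

function fetch_hit($url, array $server)
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    // CURLOPT_HTTPHEADER expects a flat list of "Name: value" strings.
    curl_setopt($handle, CURLOPT_HTTPHEADER, forwarded_headers($server));
    if (!empty($server['HTTP_USER_AGENT'])) {
        curl_setopt($handle, CURLOPT_USERAGENT, $server['HTTP_USER_AGENT']);
    }
    $content = curl_exec($handle); // response body on success, false on failure
    curl_close($handle);
    return $content;
}

// Building the header list needs no network access:
$headers = forwarded_headers(array('HTTP_ACCEPT_LANGUAGE' => "en-us,en;q=0.5\r\nX-Evil: 1"));
```

In a page you would call fetch_hit($urchinUrl, $_SERVER); cURL sets the Host header from the URL on its own.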
If you look at other sites, the entire utmcc string doesn't change on subsequent hits to the site, whereas this code generates random numbers and the current number of seconds since the epoch and puts them into utmcc on every request, which is probably causing the session to change for every hit to the site. utmcc seems to be built using the utm? variables. The Analytics JavaScript saves the utma, utmb, utmc, and utmd strings as cookies, but it seems that, depending on user settings, not all of them are passed to the server, although utma is always passed. I also haven't seen a site where utmz is not passed.
You may need to generate a "utma" number like the following and save it to your session so it can be passed to the analytics.
_utma
aaaaaaaaa.bbbbbbbbb.cccccccccc.dddddddddd.eeeeeeeeee.fff
_utmz uses the first part of the utma
aaaaaaaaa.ddddddddd.ggg.hhh
utmv (user var, optional) also uses the first part of the utma
_utmv
aaaaaaaaa.$user_var
f, g, and h seem to be any number of up to 3 digits, so 1, 23, or 248 would be OK. If they have any meaning, I don't know what it is.
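As a rough sketch of that suggestion, the utma string could be generated once and then reused for every later hit from the same visitor. The function names stable_utma() and utmv_for() are hypothetical, and $store stands in for whatever persists between requests (for example $_SESSION after session_start()). The digit ranges are guesses based on the layout above.

```php
<?php
// Sketch: keep one __utma-style string per visitor instead of regenerating it.
// stable_utma() and utmv_for() are hypothetical helpers; $store is anything
// that persists between requests, for example $_SESSION.
function stable_utma(array &$store, $now)
{
    if (!isset($store['utma'])) {
        $store['utma'] = implode('.', array(
            mt_rand(10000000, 99999999),     // a: also the first field of __utmz and __utmv
            mt_rand(1000000000, 2147483647), // b
            $now, $now, $now,                // c, d, e: timestamps fixed at the first hit
            1,                               // f
        ));
    }
    return $store['utma'];
}

function utmv_for($utma, $userVar)
{
    $parts = explode('.', $utma);
    return $parts[0] . '.' . $userVar; // aaaaaaaaa.$user_var
}

$store  = array(); // stands in for $_SESSION
$first  = stable_utma($store, 1215112280);
$second = stable_utma($store, 1215112999); // a later hit reuses the stored value
```

Because the value is only created when it is missing, the string sent to Analytics stays identical across hits in the same session.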
Could you please tell me what these variables mean and how I can get them in my script?
$var_cookie=$session;
$var_uservar=$storeinfo['storeid'];
Thanks!
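The post describes $session as a cookie number that lets Analytics group hits from one user, and $storeinfo['storeid'] as a user-defined value identifying which customer's page was hit, but it does not show how either is produced. One possible way to get a numeric $session, purely as an assumption on my part, is to derive it from the PHP session ID. The function name numeric_visitor_id() is made up for this sketch.

```php
<?php
// Sketch: derive a stable 8-digit number from a session identifier.
// numeric_visitor_id() is a hypothetical helper, not from the original post.
function numeric_visitor_id($sessionId)
{
    // Mask to 31 bits so the result is non-negative on 32-bit and 64-bit PHP,
    // then map it into the range 10000000..99999999.
    return (crc32($sessionId) & 0x7FFFFFFF) % 90000000 + 10000000;
}

// In a page this would typically be numeric_visitor_id(session_id())
// after session_start(); a fixed string is used here for illustration.
$session = numeric_visitor_id('3b445a5e-ded7-102a-86b6-fe550003005f');
```

The same input always yields the same number, so every hit in one PHP session carries the same cookie number. $var_uservar can be any short label you want to segment by.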
Why don't you add that URL to an image in the page, instead of fetching it via curl?
I thought about adding it as an image tag to the page as that might give better results in terms of getting the location and I planned to update this post.
Part of the reason I did it this way myself was to avoid problems with different mobile browsers especially in Japan. I'm not totally sure what they would do if I included the url in the page. Given that the image returned is supposed to be an empty or one pixel image it would probably work but might cause unexpected problems with some phones.
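For reference, the image-tag variant could look something like the sketch below. I have not tested it on actual handsets, and the helper name tracking_pixel_tag() is made up. With this approach the phone itself requests the image, so Google sees the phone's own user agent rather than one forwarded by the server.

```php
<?php
// Sketch: emit the tracking URL as a 1x1 image instead of fetching it server-side.
// tracking_pixel_tag() is a hypothetical helper name.
function tracking_pixel_tag($url)
{
    // Escape for an HTML attribute; '&' in the query string becomes '&amp;'.
    return '<img src="' . htmlspecialchars($url, ENT_QUOTES) . '" width="1" height="1" alt="" />';
}

$tag = tracking_pixel_tag('http://www.google-analytics.com/__utm.gif?utmwv=1&utmn=1259442976');
echo $tag, "\n";
```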