PanaramioStatsProject
From Rory.wiki
Panoramio Stats Project
Goal
Automate the tracking of clicks on Panoramio photographs, so we can tell how many of a given set of photographs have been clicked on over a given time period.
Thoughts
Panoramio shows the photos for a user at a URL such as http://www.panoramio.com/user/210736 . From that base URL, the subsequent pages of photos can be displayed by adding a photos_page=<x> parameter to the URL. Since the base URL has no query string, the parameter should be introduced with ? (that is, ?photos_page=<x>).
So: parse the first page to get the number of pages (shown on the first page), then iterate over each page, getting the ID of each photo, the name of the photo and the number of times it has been viewed.
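The page-by-page walk described above starts with building one URL per page. The following is a minimal sketch using only Ruby's standard library; the method names photos_page_url and photos_page_urls are made up for illustration, and the page count is passed in by hand because these notes do not say where on the first page it appears.

```ruby
require 'uri'

# Build the URL for one page of a user's photos.
# Assumption: photos_page is an ordinary query parameter, so it is attached with "?".
def photos_page_url(user_id, page_num, base: 'http://www.panoramio.com/user/')
  uri = URI.parse("#{base}#{user_id}")
  uri.query = URI.encode_www_form(photos_page: page_num)
  uri
end

# Build the URLs for pages 1..page_count (page_count is supplied by the caller).
def photos_page_urls(user_id, page_count)
  (1..page_count).map { |page_num| photos_page_url(user_id, page_num) }
end

photos_page_urls(210736, 3).each { |url| puts url }
```

Each returned value is a URI object, so it can be handed straight to open-uri for fetching.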
Implementation Notes
An obvious requirement is an HTML parser, so we can programmatically move through the page and extract the relevant information. Nokogiri works pretty well for this.
Using Nokogiri, we can construct XPath queries to get the relevant elements from the page.
Firebug is an easy way to identify the path to an element.
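The table where the photos are stored is //body/div/div/div/table[@id="photos_cont"]
Within that table (held in table below, with photo as the zero-based index of a photo on the page), the bits we're looking for are:
ID of the photo = table.search('.//div[@class="thumb_box"]/a/img[@class="photo"]')[photo].attribute('id')
Title of the photo = table.search('.//div[@class="thumb_box"]/a/img[@class="photo"]')[photo].attribute('title')
Number of times it's been clicked = table.search('.//div[@class="subcaption"]')[photo].content (the count is the first run of digits in this text)
Relative URL of the picture = table.search('.//div[@class="thumb_box"]/a')[photo].attribute('href')
The leading . keeps each query relative to the table; a bare // would search the whole document again.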
Code
This code handles the basic setup and pulls the stats from a single page on Panoramio for a supplied user_id. Still required: code that detects the number of pages for a given user and then runs the basic parse_page method once per page.
require 'rubygems'
require 'nokogiri'
require 'logger'
require 'open-uri'

class PanoramioStats
  def initialize(user_id)
    @pan_stats_log = Logger.new('pan_stats.log')
    @pan_stats_log.level = Logger::DEBUG
    @user_id = user_id
    # Compute the timestamp once so both output files share the same name stem.
    stamp = Time.now.gmtime.to_s.gsub(/\W/, '')
    @output_report = File.new("panoramio-report#{stamp}.txt", 'a+')
    @output_csv = File.new("panoramio-report#{stamp}.csv", 'a+')
  end

  # Parse one page of the user's photos, write a line per photo to both
  # output files, and return a Hash of photo id => [url, title, views].
  def parse_page(page_num)
    photos = {}
    uri = URI.parse("http://www.panoramio.com/user/#{@user_id}?photos_page=#{page_num}")
    page = Nokogiri::HTML(uri.read)
    table = page.search('//body/div/div/div/table[@id="photos_cont"]')
    # Run each query once; the leading . keeps it relative to the table.
    images = table.search('.//div[@class="thumb_box"]/a/img[@class="photo"]')
    captions = table.search('.//div[@class="subcaption"]')
    links = table.search('.//div[@class="thumb_box"]/a')
    images.each_with_index do |image, photo|
      views = captions[photo].content[/\d+/]
      url = "http://www.panoramio.com#{links[photo]['href']}"
      # Keep the fields in an Array so a comma in a title cannot corrupt them.
      photos[image['id']] = [url, image['title'], views]
    end
    photos.each_value do |url, title, number|
      @output_report.puts "Photo #{title} at URL #{url} was viewed #{number} times."
      @output_csv.puts "#{url}, #{title}, #{number}"
    end
    # Flush so the output reaches disk even though the files stay open.
    @output_report.flush
    @output_csv.flush
    photos
  end
end
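One possible way to cover the missing multi-page step, without knowing where the page count lives in the markup, is to keep requesting pages until one comes back with no photos. This is a different technique from reading the page count off the first page, and it rests on an untested assumption: that a page number past the end yields an empty photo table. The helper name collect_all_pages and the max_pages cap are invented for this sketch; the block stands in for parse_page so the example runs offline.

```ruby
# Call the block with page numbers 1, 2, 3, ... and merge the Hashes it
# returns, stopping at the first empty page or after max_pages pages.
# The block stands in for PanoramioStats#parse_page and must return a Hash.
def collect_all_pages(max_pages: 50)
  all_photos = {}
  (1..max_pages).each do |page_num|
    photos = yield(page_num)
    break if photos.empty?
    all_photos.merge!(photos)
  end
  all_photos
end

# Offline stand-in for the site: two pages of made-up data, then nothing.
fake_pages = {
  1 => { 'p111' => 12, 'p222' => 340 },
  2 => { 'p333' => 7 }
}

totals = collect_all_pages { |page_num| fake_pages.fetch(page_num, {}) }
puts "Collected #{totals.length} photos"
```

Against the real class this would be called as collect_all_pages { |n| stats.parse_page(n) }, provided parse_page returns the Hash of photos it built. The max_pages cap guards against looping for a long time if the site repeats the last page instead of returning an empty one.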
