"Everything that boots is beautiful."


Aug 22
2007

Sitemaps In Ruby on Rails

Tags: Programming   Ruby on Rails


If you’re using Ruby on Rails to build your website, then you’ve probably got a lot of dynamically generated pages. In this blog, I have one page for every article I write. I also have a page for each month, a page for each year, and a page for each tag, which lists all the articles with that tag. It isn’t too much, so Google's web crawler can keep up with it and get everything indexed.

But I have another site with hundreds of individual pages, and Google’s crawler stops indexing the site after only a few dozen pages, missing the important ones. If one of my pages isn't in Google's index, then it isn't working for me! It's just wasting hard drive space. It was obvious that I needed to tell Google what to crawl by giving it a sitemap. Read Google's docs about sitemaps, and get familiar with Google's Webmaster Tools, which you'll need in order to submit sitemaps. Basically, a sitemap acts as a table of contents for Google, so it can understand which pages it should be looking at when it visits your site.

Because Rails is so awesome, there’s an easy way to generate sitemaps using Ruby. It should only take you a few minutes to write your sitemap code once you have a look at how it’s done. I’ll give a brief overview here.


In the same way that you can generate HTML files using .rhtml files, you can create XML files using .rxml. First, let’s look at Google’s sitemap format to see what kind of XML document we need to build. Here’s an example sitemap:

<?xml version="1.0" encoding="UTF-8"?>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.geekskillz.com/tags/1</loc>
<priority>0.9</priority>
<changefreq>weekly</changefreq>
</url>
<url>
<loc>http://www.geekskillz.com/tags/2</loc>
<priority>0.8</priority>
<changefreq>weekly</changefreq>
</url>
...
</urlset>

Each page you want Google to crawl is contained in a url element, which contains a mandatory loc element and a bunch of optional elements. I use the priority element so that the more important pages get crawled if Google is too lazy to crawl my entire site, which is usually what happens. It’s like saying, “Please index this one first, and if you have time, here are the other ones you should index.”

Priority is a value between 0 and 1, and the values are only compared with each other within your own site. So setting all your URLs to a priority of 1.0 is pointless. 1.0 doesn't mean that your web pages are super important. It would mean that all your pages are equally important in relation to each other, which doesn’t help the web crawler at all. It wants to know where it should start crawling, not how important you think your web page is compared to everyone else on the internet. We know your mom says you're special, but Google doesn't consider her to be credible on these matters.

The changefreq element is another hint to Google, suggesting how often it should return to look for new content on your website. Google’s sitemap documentation says that these hints may be ignored, so it’s probably worth setting, but don’t spend too much time thinking about it.

Now to the Ruby code. To create a sitemap that keeps up to date on your site’s content, you can use a .rxml file. I named mine sitemap.rxml. The file is written in Ruby, and uses the xml object to create the XML elements in the output. Here’s a small example that shows how it basically works:

xml.instruct! :xml, :version=>"1.0"

xml.outside{
  xml.inside("Hello World!")
}

This produces the following XML output:

<?xml version="1.0" encoding="UTF-8"?>

<outside>
<inside>Hello World!</inside>
</outside>

So, you create an element by calling the xml object with the name of the element you want to create ("outside" and "inside" in the above example). To put a value inside the element, you pass it in as an argument (like "Hello World!" in the example). Easy pickings!
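If you're wondering how calling xml.outside can work when nobody ever defined a method named "outside", the trick is Ruby's method_missing hook. In a Rails .rxml template the real xml object is a Builder::XmlMarkup; the TinyXml class below is not that library, just a hypothetical toy that shows the idea in a few lines (no attributes, no escaping, no indentation):

```ruby
# Toy stand-in for the xml object. This is NOT the real Builder library,
# only a sketch of the method_missing idea that makes xml.anything work.
class TinyXml
  attr_reader :output

  def initialize
    @output = String.new
  end

  # Any unknown method name is treated as an element name.
  def method_missing(name, text = nil)
    @output << "<#{name}>"
    if block_given?
      yield                  # nested calls append their own elements
    else
      @output << text.to_s   # plain text content
    end
    @output << "</#{name}>"
    self
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

xml = TinyXml.new
xml.outside {
  xml.inside("Hello World!")
}
puts xml.output
# prints <outside><inside>Hello World!</inside></outside>
```

The real library does a lot more (the instruct! declaration, attributes, escaping special characters, pretty-printing), which is why you should use it rather than rolling your own.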

And really, that’s all you need to learn to start building your sitemap. So, here’s an example .rxml file that creates a sitemap:

xml.instruct! :xml, :version=>"1.0"

xml.urlset(:xmlns => "http://www.sitemaps.org/schemas/sitemap/0.9"){
  # High priority pages:
  @important_tags.each do |tag|
    xml.url{
      xml.loc("http://www.website.com" + tag_path(tag))
      xml.priority(0.9)
      xml.changefreq("weekly")
    }
  end

  # Low priority pages:
  # We want the priority value to range from 0.2 to 0.8
  @priority_multiplier = 0.6 / @low_pri_pages.size.to_f

  @low_pri_pages.each_with_index do |tag, i|
    xml.url{
      xml.loc("http://www.website.com" + tag_path(tag))
      # Step down from 0.8 as we move through the sorted list
      priority = 0.8 - (i * @priority_multiplier)
      xml.priority( "%.2f" % priority )
    }
  end
}

This is doing something a bit fancy with the priority value. The @low_pri_pages array has been sorted (in the controller) in order of most important to least important (based on some criteria). Then the priority value is calculated based on the order of the array. So when this .rxml file is run, we’ll get something like this:
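To see the arithmetic on its own, here is the same calculation pulled out of the template into plain Ruby. The method name low_priorities and the sample page names are made up for illustration; the 0.8 and 0.2 bounds are the ones from the template.

```ruby
# Spread priorities evenly from 0.8 down toward 0.2, as in the template.
# pages is assumed to be sorted most important first.
def low_priorities(pages, top = 0.8, bottom = 0.2)
  return [] if pages.empty?
  step = (top - bottom) / pages.size.to_f
  pages.each_with_index.map do |_page, i|
    "%.2f" % (top - i * step)
  end
end

p low_priorities(%w[tags/10 tags/11 tags/12 tags/13])
# => ["0.80", "0.65", "0.50", "0.35"]
```

Notice that the last page ends up one step above 0.2 rather than exactly on it, because the index starts at 0. With a few dozen pages the steps shrink to about 0.01, which is the kind of spacing you see in the output.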

...
<url>
<loc>http://www.website.com/tags/10</loc>
<priority>0.72</priority>
</url>
<url>
<loc>http://www.website.com/tags/11</loc>
<priority>0.71</priority>
</url>
<url>
<loc>http://www.website.com/tags/12</loc>
<priority>0.70</priority>
</url>
...

If that part of the example is unclear, forget about it. My point is this: You can use any Ruby code here to generate the sitemap with as much dynamicness (is that a word?) as you like. I chose to dynamically generate the priority value, just because that's how I get my kicks.

Note that I am using Rails 1.2, so I'm using resource helpers like tag_path to get the URLs. If you're not using these helpers, then you can generate your <loc> values using the url_for function like this:

xml.loc("http://www.website.com" +
        url_for(:controller => 'tags', :action => 'show', :id => tag))

Before you run off and create your own sitemap, you should think about where to put it. Which controller owns it? What does the routes.rb file look like?

Google recommends that sitemap.xml be placed at the root of your website, for example at http://www.website.com/sitemap.xml. So, your routes.rb file should set up a route for that location to tell your Rails app which controller and action render the sitemap.rxml file. Here’s what I added to the top of my routes.rb file:

map.connect 'sitemap.xml', :controller => 'products', :action => 'sitemap'

So, I created an action in my products_controller that is responsible for creating the sitemap. Done.
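The action's main job is to fill in the two instance variables the template reads, @important_tags and @low_pri_pages, with the second one sorted most important first. The sorting criteria are up to you. As a rough sketch only, here is one way that preparation could look in plain Ruby. The Tag struct, the article_count field, and the threshold of 10 are all hypothetical stand-ins, not anything from a real app:

```ruby
# Hypothetical stand-in for a model; in a real app these records
# would come from the database.
Tag = Struct.new(:id, :article_count)

# Split tags into the two lists the sitemap template expects.
# The article_count criterion and the threshold are illustrative only.
def sitemap_lists(tags, threshold = 10)
  important, rest = tags.partition { |t| t.article_count >= threshold }
  # Most important first, so the template can assign descending priorities.
  [important, rest.sort_by { |t| -t.article_count }]
end

tags = [Tag.new(1, 25), Tag.new(2, 3), Tag.new(3, 12), Tag.new(4, 7)]
important_tags, low_pri_pages = sitemap_lists(tags)
p important_tags.map(&:id)   # => [1, 3]
p low_pri_pages.map(&:id)    # => [4, 2]
```

In the controller you would assign the two results to @important_tags and @low_pri_pages inside the sitemap action and let Rails render sitemap.rxml.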

Let me know if you have any tips on using sitemaps with Rails apps. I didn’t spend more than a day on it, so I’m sure there are plenty of useful tips.


