XML SiteMap Print - A Simple Drupal Module for Beginners

One of the problems with Drupal is that there are three different modules that generate sitemaps for your website. So why is that a problem? Well, none of them appears to me to be perfect. Maybe I am using them incorrectly - if so, *PLEASE* let me know so I can correct my comments! The three modules are Site Map, SiteMenu and XML SiteMap.

Let's have a look at them in turn.

Site Map

On the face of it this module does a good job: it creates a default /sitemap page on your site which nicely lists all your books and your menus. But hold on, where are the links to nodes? As far as I can see there is no method of listing every page on your website - if it isn't hanging off a menu, it can't be reached from your sitemap. That's one pretty darned stupendous shortcoming. I stand to be corrected here if I am using the module incorrectly - but even if this feature is there somewhere, hidden out of view, it escaped me, which itself suggests the module is not very intuitive. So, until proven otherwise, using this module is a non-starter.

SiteMenu

This module currently doesn't have an official Drupal 6 release. Therefore I can't consider it for a live site.

XML SiteMap

This module looks the most powerful - it creates a standard-format XML sitemap file which can be submitted periodically to numerous search engines. In addition, there is a facility to add all nodes to the XML file. The downside is that there appears to be no integrated way of outputting this file as a page on a website. An XSLT file can be created, but this would not be output as a standard Drupal page. This presents a lovely opportunity for me to create a very simple example module, geared towards novice Drupal module developers, which will print the contents of the XML file to the screen.

So, we now have our requirements. First, you will need to download the XML SiteMap module and install it on your rig. As per normal, I am assuming that you are comfortable installing contributed Drupal modules from the Drupal website. If you are not, you need to do some mugging up before we go any further. Buy yourself a good Drupal handbook by checking out my reviews here.

Getting Started

The first thing we need to do is create a new directory for our module, which we will call xmlsitemapprint. Not the most inspirational name, I have to admit! Create the directory, then create empty .info, .module and .tpl.php files as placeholders for later.

$ mkdir sites/all/modules/xmlsitemapprint
$ touch sites/all/modules/xmlsitemapprint/xmlsitemapprint.info
$ touch sites/all/modules/xmlsitemapprint/xmlsitemapprint.module
$ touch sites/all/modules/xmlsitemapprint/xmlsitemapprint.tpl.php

The .info file contains basic information about the module. Open up our xmlsitemapprint.info file and copy in the following content.
name = "XML SiteMap Print"
description = "XML SiteMap Print Utility"
core = 6.x
php = 5.2

Here I am giving a name and a description to our module (rather obviously), and I am saying this module has been written to work with Drupal 6.x and requires PHP 5.2 or above. If you are using an earlier version of PHP then this module will not work. You should upgrade your system if possible.
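If you are curious what that php = 5.2 line boils down to, it is essentially a version comparison between the declared minimum and the PHP you are running. Here is a minimal standalone sketch using PHP's own version_compare(); meets_php_requirement() is a hypothetical helper name of my own, not a Drupal function.

```php
<?php
// Hypothetical helper: does the running PHP satisfy a "php = X" minimum
// such as the one declared in xmlsitemapprint.info?
function meets_php_requirement($required, $running = PHP_VERSION) {
    // version_compare() understands dotted strings like "5.2" and "5.2.17".
    return version_compare($running, $required, '>=');
}

var_dump(meets_php_requirement('5.2'));          // bool(true) on any PHP 5.2 or later
var_dump(meets_php_requirement('5.2', '5.1.6')); // bool(false): older than 5.2
```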

Ok, that was easy. Now we'll move on to the .module file, which contains the business logic and the Drupal hooks. So open xmlsitemapprint.module and first of all add a pretty header to the file. A nice gentle start.

<?php
/**
*    MODULE: XML SiteMap Print - A print utility for the XML SiteMap module
*    Copyright (C) 2009 www.badzilla.co.uk
*
*    FILE:    xmlsitemapprint.module
*    VERSION: 0.1.0
*    AUTHOR:  Badzilla
*    DATE:    October 2009
*
*    This program is released under the GPL license
*
*
*    This program is distributed in the hope that it will be useful,
*    but WITHOUT ANY WARRANTY; without even the implied warranty of
*    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*
*
*/
?>

The first piece of 'actual, doing something' code is our implementation of hook_menu(), and a menu callback which will simply print the salutation "Hello".
<?php
/**
* Implements hook_menu().
*/
function xmlsitemapprint_menu() {
    $items = array();

    $items['sitemap'] = array(
        "title"            => "Sitemap",
        "page callback"    => "xmlsitemapprint_view",
        "type"             => MENU_CALLBACK,
        "access arguments" => array('view sitemap'),
    );

    return $items;
}


/**
* Page callback for /sitemap: just a placeholder for now.
*/
function xmlsitemapprint_view() {
    return t('Hello');
}
?>

We are now ready to test our simple module. First, the module needs to be enabled, so go to admin/build/modules and check the box next to XML SiteMap Print in the Other group of modules. Next, test it out by pointing your browser at http://localhost/sitemap while logged in as the site administrator (we haven't granted the permission to anyone else yet) - you should see the Hello message. If you don't, you have made a mistake somewhere, so retrace your steps.

A few words about the code. The function xmlsitemapprint_menu() is an implementation of hook_menu() and is used to declare the URL paths a module handles. These are declared in the returned array, and two of the array values are of particular interest: 'page callback' and 'access arguments'. The page callback gives the name of the function to call when a user points their web browser at /sitemap, whilst the access arguments say this URL will be available only to users who have the 'view sitemap' permission - more on this later.
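To picture what that hook_menu() entry expresses, here is a toy, standalone model of a path-to-callback table with a permission guard. It is emphatically not Drupal's menu system: route_request(), demo_view() and the $granted list are hypothetical names for this sketch, and the real access check is done by an access callback (user_access() by default).

```php
<?php
// A toy model of a hook_menu() entry: path -> page callback, guarded by a
// permission. route_request() and $granted are hypothetical.

function demo_view() {
    return 'Hello';
}

function route_request($path, array $items, array $granted) {
    if (!isset($items[$path])) {
        return '404 Not Found';
    }
    $item = $items[$path];
    // Drupal passes 'access arguments' to an access callback (user_access()
    // by default); here that is modelled as a simple membership test.
    $perm = $item['access arguments'][0];
    if (!in_array($perm, $granted, TRUE)) {
        return '403 Access Denied';
    }
    // 'page callback' names the function that builds the page.
    return call_user_func($item['page callback']);
}

$items = array(
    'sitemap' => array(
        'page callback'    => 'demo_view',
        'access arguments' => array('view sitemap'),
    ),
);

echo route_request('sitemap', $items, array('view sitemap')), "\n"; // Hello
echo route_request('sitemap', $items, array()), "\n";               // 403 Access Denied
```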

Ok - we have now proved the concept and are confident that what we are doing is correct. Time now to apply the business logic, which will live in the xmlsitemapprint_view() function. We know that the XML SiteMap module writes the site information into a sitemap.xml file (exactly where, we'll see in a moment). So we need to read this file, and thankfully there is an easy way of doing this using the PHP simplexml_load_file() function (NOTE: this is only available from PHP 5.0 onwards, which our PHP 5.2 dependency in the .info file comfortably covers). So we need to rewrite the xmlsitemapprint_view() function and apply the business logic.

Before we go any further, it would be wise to check that the XML SiteMap module is also working - and this next step has the bonus of generating the sitemap.xml file should it not already exist, which it won't if you haven't run cron since the module was installed. So let's head off potential problems by pointing our browser at http://localhost/sitemap.xml. This will generate the sitemap in XML format and show it on the screen, but the file itself is actually written to sites/default/files/xmlsitemap/xsm-en.xml. If you are not using English, your filename may be different, so you will have to modify the module we are currently writing.
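To get a feel for simplexml_load_file() before wiring it into Drupal, here is a standalone sketch that writes a tiny sample sitemap to a temporary file (standing in for xsm-en.xml) and reads the <loc> of every <url> back out. The sample URLs are made up for illustration.

```php
<?php
// Standalone sketch: read a sitemap file with simplexml_load_file().
// The sample XML and temporary file stand in for the real xsm-en.xml.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://localhost/</loc></url>
  <url><loc>http://localhost/node/1</loc></url>
</urlset>
XML;

$path = tempnam(sys_get_temp_dir(), 'xsm');
file_put_contents($path, $sample);

$sitexml = simplexml_load_file($path);
$locs = array();
// $sitexml->url iterates over every <url> child of <urlset>.
foreach ($sitexml->url as $url) {
    // Cast the <loc> element to a plain string.
    $locs[] = (string) $url->loc;
}
unlink($path);

echo implode("\n", $locs), "\n";
```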

Next step is to get rid of that "Hello" placeholder. The entire function is rewritten below - so replace it in your code.

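<?php
/**
* Page callback for /sitemap: loads the XML SiteMap file and themes it.
*/
function xmlsitemapprint_view() {
    // XML SiteMap writes its file under the site's files directory
    // (sites/default/files by default); the "en" suffix assumes English.
    $xmlpath = file_directory_path() . "/xmlsitemap/xsm-en.xml";

    if (!file_exists($xmlpath))
        $res = NULL;
    elseif (($sitexml = simplexml_load_file($xmlpath)) === FALSE)
        $res = NULL;
    else
        $res = $sitexml->url;

    return theme('xmlsitemapprint', $res);
}
?>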

This is the business logic. In fact there isn't much logic, because this module doesn't do much. Central to the function is simplexml_load_file(), which loads the entire XML file and returns an object of class SimpleXMLElement (or FALSE if the file cannot be parsed). How about that? We are using OO code without you even realising it! $sitexml->url gives us the sitemap's <url> elements, which can be looped over with foreach, and if the file is missing or unreadable we fall back to NULL instead. However, you will note that rather than printing anything itself, the function hands this result to theme(). This is fundamental to good coding practice - we are separating the business logic from the presentation of the information. The presentation logic will be held in a separate template file - more of that in a few minutes.
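The NULL fallbacks can be exercised outside Drupal too. Below is a sketch with the same three-way logic moved into a hypothetical load_sitemap_urls() function. One addition of my own: libxml_use_internal_errors() is used so that a corrupt file quietly yields FALSE rather than a screenful of warnings, which the module code itself does not do.

```php
<?php
// Sketch of the three-way fallback in xmlsitemapprint_view():
// missing file -> NULL, unparsable file -> NULL, otherwise the <url> list.
// load_sitemap_urls() is a hypothetical name used only for this sketch.
function load_sitemap_urls($xmlpath) {
    if (!file_exists($xmlpath)) {
        return NULL;
    }
    // Collect libxml parse errors quietly instead of printing warnings.
    $previous = libxml_use_internal_errors(TRUE);
    $sitexml = simplexml_load_file($xmlpath);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return ($sitexml === FALSE) ? NULL : $sitexml->url;
}

$dir = sys_get_temp_dir();

$good = tempnam($dir, 'xsm');
file_put_contents($good, '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>http://localhost/</loc></url></urlset>');
$bad = tempnam($dir, 'xsm');
file_put_contents($bad, '<urlset><url>'); // truncated, so not well-formed

$urls    = load_sitemap_urls($good);
$broken  = load_sitemap_urls($bad);
$missing = load_sitemap_urls($dir . '/no-such-xsm-en.xml');
unlink($good);
unlink($bad);

echo count($urls), "\n";     // 1
var_dump($broken, $missing); // NULL, NULL
```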

Before we develop that notion further, let's go back to permissions. We want to be able to select who has access to this page. Maybe, under certain circumstances, we would only want registered users to be able to see our sitemap output. Or maybe we have some roles defined and only those privileged users should be able to see our sitemap output. Having the option to specify who can see what is important in any Drupal module. Thankfully, it is easy to implement. Add the code below to the .module file:

<?php
/**
* Implements hook_perm()
*/
function xmlsitemapprint_perm() {
    return array(
        'view sitemap',
    );
}


/**
* Implements hook_access()
*/
function xmlsitemapprint_access($op, $node, $account) {
    switch ($op) {
        case 'view':
            return user_access('view sitemap', $account);
    }
}
?>

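The hook_perm() callback defines all the possible permissions for a module, and these will appear under admin/user/permissions. For this module all we need is the facility to view the output, so that is defined with our "view sitemap" string. The check itself is made by the menu system: when someone requests /sitemap, Drupal calls user_access() with the 'access arguments' we gave in hook_menu(), so only users holding "view sitemap" get through. Strictly speaking, Drupal 6 only invokes hook_access() for node types that a module defines, so the xmlsitemapprint_access() function above is never called for our /sitemap page and could safely be left out.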
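As a rough mental model of that user_access() check, here is a standalone sketch: user 1 is allowed everything, and everyone else needs at least one role that has been ticked for the permission on admin/user/permissions. demo_user_access() and the role/permission arrays are hypothetical sample data, not Drupal's actual function or storage.

```php
<?php
// Simplified model of the permission check: user 1 may do anything,
// everyone else needs a role granted the permission. Hypothetical data.
function demo_user_access($perm, array $account, array $role_perms) {
    if ($account['uid'] == 1) {
        return TRUE;
    }
    foreach ($account['roles'] as $role) {
        if (isset($role_perms[$role]) && in_array($perm, $role_perms[$role], TRUE)) {
            return TRUE;
        }
    }
    return FALSE;
}

$role_perms = array(
    'anonymous user'     => array(),
    'authenticated user' => array('view sitemap'),
);
$visitor = array('uid' => 0, 'roles' => array('anonymous user'));
$member  = array('uid' => 5, 'roles' => array('authenticated user'));
$admin   = array('uid' => 1, 'roles' => array('authenticated user'));

var_dump(demo_user_access('view sitemap', $visitor, $role_perms)); // bool(false)
var_dump(demo_user_access('view sitemap', $member, $role_perms));  // bool(true)
```

Ticking "view sitemap" for anonymous user in this model would open the page to everyone, which is the unrestricted setup I use later on.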

Now back to themes. To reiterate, it is important that the business and presentation logic are kept apart. So we need to tell Drupal where our theme template is. To do that, add the following function to our .module file.

<?php
/**
* Implements hook_theme().
*/
function xmlsitemapprint_theme() {
    return array(
        'xmlsitemapprint' => array(
            'template'  => 'xmlsitemapprint',
            'arguments' => array(
                'xml' => NULL,
            ),
        ),
    );
}
?>

This tells Drupal that our template is named xmlsitemapprint; the .tpl.php extension is added automatically. In addition, an argument named "xml" will be passed to the template - this is the list of sitemap <url> elements we will need to display in the template.
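How does the "xml" argument end up as a $xml variable inside the template? Here is a simplified, standalone model: positional arguments are matched to the names declared under 'arguments', extracted into local variables, and the template's output is captured with output buffering. demo_theme() and demo_render_template() are hypothetical; Drupal 6's real theme() also runs preprocess functions and locates the .tpl.php file in the module and theme directories.

```php
<?php
// Simplified model of how a template registered in hook_theme() receives
// its arguments. demo_theme() and demo_render_template() are hypothetical.

function demo_render_template($template_file, array $variables) {
    // Each named argument becomes a local variable, e.g. $xml.
    extract($variables, EXTR_SKIP);
    ob_start();
    include $template_file;
    return ob_get_clean();
}

function demo_theme(array $registry, $hook) {
    $args = array_slice(func_get_args(), 2);
    $info = $registry[$hook];
    $variables = array();
    $i = 0;
    // Map positional arguments onto the names declared under 'arguments',
    // falling back to the declared default.
    foreach ($info['arguments'] as $name => $default) {
        $variables[$name] = isset($args[$i]) ? $args[$i] : $default;
        $i++;
    }
    return demo_render_template($info['template'], $variables);
}

// A throwaway template standing in for xmlsitemapprint.tpl.php.
$tpl = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($tpl, '<?php if (empty($xml)) { echo "No sitemap"; } '
    . 'else { foreach ($xml as $loc) { echo $loc, "<br />"; } }');

$registry = array(
    'xmlsitemapprint' => array(
        'template'  => $tpl,
        'arguments' => array('xml' => NULL),
    ),
);

$with    = demo_theme($registry, 'xmlsitemapprint', array('http://localhost/', 'http://localhost/node/1'));
$without = demo_theme($registry, 'xmlsitemapprint');
unlink($tpl);

echo $with, "\n", $without, "\n";
```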

We are now ready to code our template. Open the .tpl.php file we created earlier and populate it as below. It is a simple loop through the <url> elements, outputting each URL as a link.

<?php
/**
*    MODULE: XML SiteMap Print - A print utility for the XML SiteMap module
*    Copyright (C) 2009 www.badzilla.co.uk
*
*    FILE:    xmlsitemapprint.tpl.php
*    VERSION: 0.1.0
*    AUTHOR:  Badzilla
*    DATE:    October 2009
*
*    This program is released under the GPL license
*
*
*    This program is distributed in the hope that it will be useful,
*    but WITHOUT ANY WARRANTY; without even the implied warranty of
*    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*
*
*/

// $xml holds the <url> elements passed in via theme(), or NULL if there is
// no usable sitemap file. Each <loc> is cast to a plain string for l().
if (empty($xml))
    echo t('There is no sitemap.xml available. Either you have no content yet, or you need to run cron to generate the file.');
else
    foreach ($xml as $value)
        echo l((string) $value->loc, (string) $value->loc) . "<br />";
?>

You could of course add your own formatting in here if you want to make it look a little more sophisticated.
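As one possible richer layout, here is a standalone sketch that renders the URLs as an HTML list, escapes each one, and shows the optional <lastmod> date when the sitemap provides it. render_sitemap_list() is a hypothetical helper, and the sample data is made up; inside Drupal you would more likely build the links with l() and the list with theme('item_list', ...).

```php
<?php
// One possible richer layout: an escaped <ul> of links, plus <lastmod>
// when present. render_sitemap_list() is a hypothetical helper.
function render_sitemap_list(SimpleXMLElement $urls) {
    $out = "<ul class=\"sitemap\">\n";
    foreach ($urls as $url) {
        $loc = htmlspecialchars((string) $url->loc, ENT_QUOTES, 'UTF-8');
        $item = '<a href="' . $loc . '">' . $loc . '</a>';
        // isset() tells us whether this <url> has a <lastmod> child.
        if (isset($url->lastmod)) {
            $item .= ' <small>(' . htmlspecialchars((string) $url->lastmod, ENT_QUOTES, 'UTF-8') . ')</small>';
        }
        $out .= "  <li>$item</li>\n";
    }
    return $out . "</ul>\n";
}

$xml = simplexml_load_string(
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>http://localhost/?a=1&amp;b=2</loc><lastmod>2009-10-01</lastmod></url>'
    . '<url><loc>http://localhost/node/1</loc></url>'
    . '</urlset>');

$html = render_sitemap_list($xml->url);
echo $html;
```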

Because we edited the .module file as we went along, you may be feeling a little confused. So, here is the .module file again in its entirety so you can easily cut and paste it into your own website.

<?php
/**
*    MODULE: XML SiteMap Print - A print utility for the XML SiteMap module
*    Copyright (C) 2009 www.badzilla.co.uk
*
*    FILE:    xmlsitemapprint.module
*    VERSION: 0.1.0
*    AUTHOR:  Badzilla
*    DATE:    October 2009
*
*    This program is released under the GPL license
*
*
*    This program is distributed in the hope that it will be useful,
*    but WITHOUT ANY WARRANTY; without even the implied warranty of
*    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*
*
*/

/**
* Implements hook_menu().
*/
function xmlsitemapprint_menu() {
    $items = array();

    $items['sitemap'] = array(
        "title"            => "Sitemap",
        "page callback"    => "xmlsitemapprint_view",
        "type"             => MENU_CALLBACK,
        "access arguments" => array('view sitemap'),
    );

    return $items;
}


/**
* Page callback for /sitemap: loads the XML SiteMap file and themes it.
*/
function xmlsitemapprint_view() {
    // XML SiteMap writes its file under the site's files directory
    // (sites/default/files by default); the "en" suffix assumes English.
    $xmlpath = file_directory_path() . "/xmlsitemap/xsm-en.xml";

    if (!file_exists($xmlpath))
        $res = NULL;
    elseif (($sitexml = simplexml_load_file($xmlpath)) === FALSE)
        $res = NULL;
    else
        $res = $sitexml->url;

    return theme('xmlsitemapprint', $res);
}


/**
* Implements hook_theme().
*/
function xmlsitemapprint_theme() {
    return array(
        'xmlsitemapprint' => array(
            'template'  => 'xmlsitemapprint',
            'arguments' => array(
                'xml' => NULL,
            ),
        ),
    );
}


/**
* Implements hook_perm()
*/
function xmlsitemapprint_perm() {
    return array(
        'view sitemap',
    );
}


/**
* Implements hook_access()
*/
function xmlsitemapprint_access($op, $node, $account) {
    switch ($op) {
        case 'view':
            return user_access('view sitemap', $account);
    }
}

Before we can try everything out we need to rebuild the menus. This is achieved by going to admin/build/modules - you don't need to do anything when you are there, just loading that page automatically rebuilds the menus. Then we need to set the permissions for our sitemap output page. I am going to have mine unrestricted, but you could decide on another approach. Go to admin/user/permissions and scroll down until you see this module. Then tick the boxes to set the permissions you need. Save these and you are ready to test the module out. Point your web browser to http://localhost/sitemap and hey presto, your sitemap will magically appear.

UPDATE: XMLSiteMap May Get its Knickers in a Twist

During the development of this module, I noticed that XMLSiteMap was refusing to rebuild the sitemap.xml file when I added new content and ran cron. So I deleted sitemap.xml to see if it would be correctly recreated after a cron run. It wasn't. I spent (or rather wasted) many hours trying to trace this bug until I stumbled upon this thread here. I think I may have upgraded the version of XMLSiteMap, which caused the module to get muddled up, and this was the cause of my woe - so be careful, it could happen to you too. The solution is actually quite easy - just view the XML by pointing your browser at http://localhost/sitemap.xml, which regenerates the file.