sitemap.xml and robots.txt

Pagegen can generate a sitemap.xml file (http://www.sitemaps.org/) and handle a robots.txt file (http://www.robotstxt.org/). Both files are read by web crawlers/spiders and are useful for SEO.

sitemap.xml

Generation of a sitemap.xml file can be set up through configuration settings. The generated sitemap.xml is placed in the web root folder.

Site settings

The following settings must be configured to turn on sitemap generation. Set them either globally in pagegen.conf or per site in site.conf.

 # This turns on sitemap generation
 create_sitemap_xml=1

 # This is required for the sitemap URLs to make sense
 site_base_url=http://mysite.com
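The sitemap protocol expects each listed URL to be absolute, which is why site_base_url is needed. As a rough illustration only (this is not Pagegen's own code), the Python sketch below shows how a base URL and a page path relative to the web root could be combined. The function name sitemap_url and the example path are assumptions made for this example.

```python
from urllib.parse import urljoin


def sitemap_url(site_base_url: str, page_path: str) -> str:
    """Join a site base URL with a page path relative to the web root.

    Hypothetical helper for illustration; not part of Pagegen.
    """
    # Normalise so there is exactly one slash between base and path.
    base = site_base_url.rstrip("/") + "/"
    return urljoin(base, page_path.lstrip("/"))


if __name__ == "__main__":
    # "about/index.html" is a made-up page path.
    print(sitemap_url("http://mysite.com", "about/index.html"))
```

Without a base URL, only relative paths such as about/index.html would be available, and a crawler could not resolve them from the sitemap alone.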

Page settings

In addition to the site settings, each content page may specify certain variables that affect its listing in sitemap.xml. These are set in the page header.

| Variable name | Values | Description |
| --- | --- | --- |
| omit_page_from_sitemap | 1 | If set, the page will not be listed in the sitemap. |
| page_modified_date | YYYY-MM-DD | The date the content page was modified. Defaults to the actual modification date of the file. Useful as an override: for instance, if the content file is a script, it could set the date to when the script ran instead of when the script file was last modified. |
| sitemap_page_change_freq | always, hourly, daily, weekly, monthly, yearly or never | How often the page is likely to change. |
| sitemap_page_priority | 0.0 – 1.0 | The priority of this page relative to other pages on the site. Default is 0.5. |
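The allowed values above can be checked before generating a site. The following Python sketch is one possible way to do that; it is not part of Pagegen, and the function name check_page_settings and the wording of its messages are assumptions for illustration.

```python
from datetime import datetime

# Allowed values for sitemap_page_change_freq, as listed above.
CHANGE_FREQS = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def check_page_settings(settings: dict) -> list:
    """Return a list of problems found in sitemap-related page variables.

    Hypothetical helper; `settings` maps variable names to string values.
    """
    problems = []

    date = settings.get("page_modified_date")
    if date is not None:
        try:
            # Must parse as a calendar date in year-month-day order.
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            problems.append("page_modified_date is not a valid YYYY-MM-DD date")

    freq = settings.get("sitemap_page_change_freq")
    if freq is not None and freq not in CHANGE_FREQS:
        problems.append("sitemap_page_change_freq has an unknown value")

    priority = settings.get("sitemap_page_priority")
    if priority is not None:
        try:
            in_range = 0.0 <= float(priority) <= 1.0
        except ValueError:
            in_range = False
        if not in_range:
            problems.append("sitemap_page_priority must be between 0.0 and 1.0")

    return problems


if __name__ == "__main__":
    print(check_page_settings({"sitemap_page_priority": "1.5"}))
```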

Example page header

 page_modified_date=2009-05-11
 sitemap_page_priority=0.2
 =====
 h2 This page is just for show
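To show how a header like this could translate into a sitemap entry, here is a minimal Python sketch. It assumes the page variables map onto the standard sitemap protocol elements loc, lastmod, changefreq and priority; Pagegen's actual output may differ. The function names parse_header and url_entry and the page URL are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_header(text: str) -> dict:
    """Collect name=value lines that appear before the ===== separator."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("====="):
            break  # end of header, page content follows
        if "=" in line:
            name, value = line.split("=", 1)
            settings[name.strip()] = value.strip()
    return settings


def url_entry(loc: str, settings: dict):
    """Build a sitemap <url> element, or return None if the page is omitted."""
    if "omit_page_from_sitemap" in settings:
        return None
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    # This sketch only handles an explicit date; falling back to the
    # file's own modification date is left out.
    if "page_modified_date" in settings:
        ET.SubElement(url, "lastmod").text = settings["page_modified_date"]
    if "sitemap_page_change_freq" in settings:
        ET.SubElement(url, "changefreq").text = settings["sitemap_page_change_freq"]
    # 0.5 is the documented default priority.
    ET.SubElement(url, "priority").text = settings.get("sitemap_page_priority", "0.5")
    return url


if __name__ == "__main__":
    header = """page_modified_date=2009-05-11
sitemap_page_priority=0.2
=====
h2 This page is just for show
"""
    # The page URL below is made up for the example.
    entry = url_entry("http://mysite.com/show.html", parse_header(header))
    print(ET.tostring(entry, encoding="unicode"))
```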

robots.txt

Simply create a robots.txt file (see robotstxt.org, http://www.robotstxt.org/) and place it in the directory/templates directory. On generation the file is detected and copied to the web root folder, which is where crawlers expect to find it.
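Before placing a robots.txt in the templates directory, it can help to check that its rules behave as intended. The Python sketch below uses the standard library's urllib.robotparser for this. The robots.txt content, the /private/ path and the sitemap URL are made-up examples, not something Pagegen produces, and site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: http://mysite.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    # Ask whether a generic crawler may fetch each URL.
    print(rules.can_fetch("*", "http://mysite.com/index.html"))
    print(rules.can_fetch("*", "http://mysite.com/private/notes.html"))
    # Sitemap lines found in the file (Python 3.8+).
    print(rules.site_maps())
```

A Sitemap line like the one in this example is a common way to point crawlers at the generated sitemap.xml, although using it is optional.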

Last changed 2010-11-08 19:39