Google Sitemaps

Background

A Google Sitemap does not replace Google's normal spidering of your site; Google will still crawl it using its current methods. A Sitemap can only help the crawler do a better job, because you can list page URLs manually and tag them with other important information.

Benefits of a Google Sitemap

  • Can contain URLs that would otherwise be unreachable by the Google spider, e.g. pages that have no links pointing to them can be listed manually.
  • Can list the last modified date of a page.
  • Can list how often each individual page gets modified (e.g. always, hourly, daily, weekly, etc.)
  • Can assign a priority (from 0.0 to 1.0) to each page. The priority you assign to a page has no influence on the position of your URLs in a search engine’s result pages. Search engines use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your more important pages are present in a search index.
  • Can be split into multiple Sitemap files that are referenced from a Sitemap index file.

Limitations

  • Each Sitemap file must have no more than 50,000 URLs and must be no larger than 10MB (10,485,760 bytes). You can compress that file using gzip (http://www.gzip.org/) but the original file must be no larger than 10MB. To list more URLs, you must use a Sitemap index file.
  • Each Sitemap index file may list no more than 1,000 Sitemaps. Therefore, currently, the maximum number of URLs per Sitemap index file is 1,000 x 50,000 = 50,000,000 URLs.
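
The limits above can be enforced in code before any files are written. The following is a minimal Python sketch, not an official tool: the function names split_into_sitemaps and compress_sitemap are hypothetical, and the constants simply restate the limits listed above.

```python
import gzip

# Limits restated from the list above.
MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAP_BYTES = 10 * 1024 * 1024  # 10,485,760 bytes, measured before gzip
MAX_SITEMAPS_PER_INDEX = 1_000


def split_into_sitemaps(urls):
    """Split a list of URLs into chunks of at most 50,000 URLs each."""
    chunks = [urls[i:i + MAX_URLS_PER_SITEMAP]
              for i in range(0, len(urls), MAX_URLS_PER_SITEMAP)]
    if len(chunks) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("too many URLs for a single Sitemap index file")
    return chunks


def compress_sitemap(xml_bytes):
    """Gzip a Sitemap, checking the size of the uncompressed data first."""
    if len(xml_bytes) > MAX_SITEMAP_BYTES:
        raise ValueError("uncompressed Sitemap exceeds 10MB")
    return gzip.compress(xml_bytes)
```

For example, a list of 120,000 URLs would be split into three chunks (50,000, 50,000 and 20,000 URLs), each of which becomes its own Sitemap file.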

More Information about Google Sitemaps

From Google:

A Sitemap provides an additional view into your site (just as your home page and HTML site map do). This program does not replace our normal methods of crawling the web. Google still searches and indexes your sites the same way it has done in the past whether or not you use this program. A Sitemap simply gives Google additional information that we may not otherwise discover. Sites are never penalized for using this service. This is a beta program, so we cannot make any predictions or guarantees about when or if your URLs will be crawled or added to our index. Over time, we expect both coverage and time-to-index to improve as we refine our processes and better understand webmasters’ needs.

https://www.google.com/webmasters/sitemaps/docs/en/about.html

Example of a Sitemap index file (note that it points to gzip-compressed Sitemap files):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
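
An index file like the one above could also be generated rather than written by hand. The sketch below uses Python's standard xml.etree.ElementTree module; the function name build_sitemap_index and the (loc, lastmod) input format are assumptions made for illustration, not part of Google's documentation.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example above.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap_index(entries):
    """Build a Sitemap index from (loc, lastmod) pairs; lastmod may be None."""
    # Serialize the namespace as the default one (no prefix on the tags).
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element("{%s}sitemapindex" % SITEMAP_NS)
    for loc, lastmod in entries:
        sitemap = ET.SubElement(root, "{%s}sitemap" % SITEMAP_NS)
        ET.SubElement(sitemap, "{%s}loc" % SITEMAP_NS).text = loc
        if lastmod is not None:
            ET.SubElement(sitemap, "{%s}lastmod" % SITEMAP_NS).text = lastmod
    body = ET.tostring(root, encoding="unicode")
    # Add the XML declaration by hand and return UTF-8 encoded bytes.
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + body).encode("utf-8")


index_bytes = build_sitemap_index([
    ("http://www.example.com/sitemap1.xml.gz", "2004-10-01T18:23:17+00:00"),
    ("http://www.example.com/sitemap2.xml.gz", "2005-01-01"),
])
```

The result is returned as bytes so that it can be written directly to a file opened in binary mode.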

Sitemap Protocol – released by Google in June, 2005 (still Beta)

Quote from Google:

“Please note that the Sitemap Protocol supplements, but does not replace, the crawl-based mechanisms that search engines already use to discover URLs. By submitting a Sitemap (or Sitemaps) to a search engine, you will help that engine’s crawlers to do a better job of crawling your site.

Using this protocol does not guarantee that your webpages will be included in search indexes. (Note that using this protocol will not influence the way your pages are ranked by Google.)”

http://google.com/webmasters/sitemaps/docs/en/protocol.html

Sitemap Protocol format

The Sitemap Protocol format consists of XML tags. All data values in a Sitemap must be entity-escaped. The file itself must be UTF-8 encoded.
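
As an illustration of the escaping and encoding requirements, here is a small Python sketch using the standard library's xml.sax.saxutils.escape. The helper name escape_for_sitemap is hypothetical. It escapes the ampersand, angle brackets and both quote characters.

```python
from xml.sax.saxutils import escape


def escape_for_sitemap(value):
    """Entity-escape a data value for use inside a Sitemap tag."""
    # escape() handles &, < and > by default; the mapping adds the quotes.
    return escape(value, {"'": "&apos;", '"': "&quot;"})


raw_url = "http://www.example.com/catalog?item=74&desc=vacation_newfoundland"
escaped_url = escape_for_sitemap(raw_url)
# The escaped text is then written to the file as UTF-8 bytes.
utf8_bytes = escaped_url.encode("utf-8")
```

Here escaped_url contains &amp; in place of the bare ampersand, which is the form a URL with query parameters must take inside a Sitemap.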

A sample Sitemap that contains just one URL and uses all optional tags is shown below. The optional tags are <lastmod>, <changefreq> and <priority>.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>  
</urlset>
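
A Sitemap in this format can be produced programmatically. The following Python sketch is one possible approach, assuming pages are described by dictionaries with a required "loc" key and optional "lastmod", "changefreq" and "priority" keys; the function name build_urlset and this input format are hypothetical. ElementTree entity-escapes the text values when it serializes them.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"
# Values the protocol accepts for <changefreq>.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}


def build_urlset(pages):
    """Build a Sitemap (as UTF-8 bytes) from a list of page dictionaries."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = page["loc"]
        if "lastmod" in page:
            ET.SubElement(url, "{%s}lastmod" % SITEMAP_NS).text = page["lastmod"]
        if "changefreq" in page:
            if page["changefreq"] not in VALID_CHANGEFREQ:
                raise ValueError("invalid changefreq: %r" % page["changefreq"])
            ET.SubElement(url, "{%s}changefreq" % SITEMAP_NS).text = page["changefreq"]
        if "priority" in page:
            priority = float(page["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError("priority must be between 0.0 and 1.0")
            ET.SubElement(url, "{%s}priority" % SITEMAP_NS).text = "%.1f" % priority
    body = ET.tostring(urlset, encoding="unicode")
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + body).encode("utf-8")


sitemap_bytes = build_urlset([{
    "loc": "http://www.example.com/",
    "lastmod": "2005-01-01",
    "changefreq": "monthly",
    "priority": 0.8,
}])
```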

It is also possible to skip the XML format altogether and simply list the URLs in a text file, one per line. You can name the text file anything you wish. Google recommends giving the file a .txt extension to identify it as a text file (for instance, sitemap.txt). The same limit of 50,000 URLs per file applies.
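
A plain-text Sitemap of this kind is simple to generate. The Python sketch below is only an illustration; the function name build_text_sitemap is hypothetical, and UTF-8 output is an assumption carried over from the XML format.

```python
MAX_URLS_PER_SITEMAP = 50_000


def build_text_sitemap(urls):
    """Return the contents of a text Sitemap: one URL per line, as UTF-8 bytes."""
    if len(urls) > MAX_URLS_PER_SITEMAP:
        raise ValueError("a text Sitemap may list at most 50,000 URLs")
    return ("\n".join(urls) + "\n").encode("utf-8")


text_sitemap = build_text_sitemap([
    "http://www.example.com/",
    "http://www.example.com/catalog?item=74",
])
```

The returned bytes could then be saved as sitemap.txt.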

Location of Sitemap Files

The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.gz can include any URLs starting with http://example.com/catalog/ but cannot include URLs starting with http://example.com/images/.

If you have the permission to change http://example.org/path/sitemap.gz, it is safe to assume that you also have permission to provide information for URLs with the prefix http://example.org/path/. Examples of URLs considered valid in http://example.com/catalog/sitemap.gz include:

http://example.com/catalog/show?item=23

http://example.com/catalog/show?item=233&user=3453

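URLs not considered valid in http://example.com/catalog/sitemap.gz include the following (the first two lie outside the /catalog/ path, and the last uses https rather than http):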

http://example.com/image/show?item=23

http://example.com/image/show?item=233&user=3453

https://example.com/catalog/page1.html

URLs that are not considered valid are dropped from further consideration. It is strongly recommended that you place your Sitemap at the root directory of your web server. For example, if your web server is at example.com, then your Sitemap index file would be at http://example.com/sitemap.gz. In certain cases, you may need to produce different Sitemaps for different paths, e.g. if security permissions in your organization compartmentalize write access to different directories.
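
The location rule can be checked before a URL is added to a Sitemap. The Python sketch below is a simplified reading of the rule (same protocol, same host, and a path under the Sitemap's directory); the function name allowed_in_sitemap is hypothetical, and the check is not Google's actual validation logic.

```python
from urllib.parse import urlsplit


def allowed_in_sitemap(sitemap_url, page_url):
    """Return True if page_url may be listed in the Sitemap at sitemap_url."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # Directory that contains the Sitemap file, with a trailing slash.
    base_path = sitemap.path.rsplit("/", 1)[0] + "/"
    return (page.scheme == sitemap.scheme
            and page.netloc.lower() == sitemap.netloc.lower()
            and page.path.startswith(base_path))


SITEMAP = "http://example.com/catalog/sitemap.gz"
print(allowed_in_sitemap(SITEMAP, "http://example.com/catalog/show?item=23"))  # True
print(allowed_in_sitemap(SITEMAP, "http://example.com/image/show?item=23"))    # False
print(allowed_in_sitemap(SITEMAP, "https://example.com/catalog/page1.html"))   # False
```

With a Sitemap at the root, such as http://example.com/sitemap.gz, the base path is "/", so every http URL on example.com passes the check.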

Step-by-Step Guide: How to Create a Google Sitemap

http://www.sitemaps.your-tips.com/ – This site offers a step-by-step video guide to creating a Google Sitemap.

Google Sitemaps (BETA) Help: Creating a Sitemap

http://google.com/webmasters/sitemaps/docs/en/overview.html

Another, more thorough example of a Sitemap file:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
	<url>
		<loc>http://www.google.com/BUILD</loc>
		<lastmod>2005-04-30T03:45:08+00:00</lastmod>
	</url>
	<url>
		<loc>http://www.example.com/catalog?item=74&amp;desc=vacation_newfoundland</loc>
		<lastmod>2004-12-23T18:00:15+00:00</lastmod>
		<priority>0.3</priority>
	</url>
	<url>
		<loc>http://www.example.com/catalog?item=83&amp;desc=vacation_usa</loc>
		<lastmod>2004-11-23</lastmod>
	</url>
</urlset>

Sitemaps Third Party Programs & Websites

http://code.google.com/sm_thirdparty.html

Additional Resources

Google Sitemaps (BETA) page
https://www.google.com/webmasters/sitemaps/docs/en/about.html

Google Sitemap Video Tutorial
http://www.sitemaps.your-tips.com/

How to Create a Google Sitemap
http://www.ryangrant.net/archives/how-to-create-a-google-sitemap

Create a Google Sitemap for your Web Site
http://www.developertutorials.com/tutorials/xml/google-sitemaps-050811/page1.html

SiteMapXML – How To Create and Submit Your Google SiteMap XML
http://www.sitemapxml.com/free-info.php

Google Sitemap
http://www.ecompal.com/resources/GoogleSiteMap.htm

Create Google sitemaps
http://www.likno.com/google-sitemap.html

Google Sitemap Tips
http://feeds.feedburner.com/HowToCreateAGoogleSitemapVideoTutorial

Inside Google Sitemaps
http://sitemaps.blogspot.com/
