Search Engine Optimisation

Just dug this up, thought someone else on the planet may find it of interest.


 

Prepared by: Mark Cohen 

Overview

Basic Idea

The basic principle of search engine optimisation is to look at a website and pretend you have to assess its value by reading text only and using only your mouse to navigate. There should be a way to reach all of your content from the homepage without using the keyboard. More specifically, there should be a way to reach any content by following a link that is contained within an anchor tag and that does not use JavaScript.
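
A rough way to test this rule is to list the links a text-only crawler could actually follow. The sketch below is only an illustration using Python's standard html.parser module; the class name CrawlableLinkParser, the helper crawlable_links and the sample markup are my own assumptions, not something any search engine publishes.

```python
from html.parser import HTMLParser


class CrawlableLinkParser(HTMLParser):
    """Collects href values from anchor tags that do not rely on JavaScript."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors with no href and anchors that only run script.
        if not href or href.strip().lower().startswith("javascript:"):
            return
        self.links.append(href)


def crawlable_links(html):
    """Return the hrefs a simple, script-blind crawler could follow."""
    parser = CrawlableLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical page fragment: only the first link is a plain anchor.
sample = """
<a href="/about.html">About</a>
<a href="javascript:openPage('contact')">Contact</a>
<a onclick="go()">Products</a>
"""
print(crawlable_links(sample))
```

Any page that never shows up in a list like this, starting from the homepage, is a page a simple bot may never find.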

Bots, Crawlers, Spiders

All search engines use web-crawlers a.k.a. bots a.k.a. spiders to crawl the web and harvest content. The single most significant bot in the world must be the GoogleBot. Learn a lot more about the GoogleBot here. 

Page Rank

PageRank is a principle based on the premise that the web is an anarchistic democracy. If a site goes up that is important, it will be linked to by other webmasters. A page that is more important than others is linked to more often. Thus the homepage generally has the highest PageRank on a site. When a page links to another page it is considered a vote towards the target page's rank. More specifically, it is a weighted vote based on the source page's PageRank. This is a nonlinear weighting, with a link from a page with a PageRank of 8 counting far more than a link from a page with a PageRank of 1. PageRank is significant in that it is a factor in the ordering of search results. All other things being equal, higher ranked pages come up before lower ranked pages in search results.
PageRank is explained really well in detail here
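
To make the "weighted vote" idea concrete, here is a minimal sketch of the commonly published simplified form of the calculation, run as a fixed number of iterations over a tiny link graph. The damping value of 0.85 is the figure usually quoted, the three-page site is made up, and this is certainly not what Google runs in production.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small base share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly between the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: both inner pages link back to the homepage.
site = {
    "home": ["products", "contact"],
    "products": ["home"],
    "contact": ["home"],
}
ranks = pagerank(site)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

The homepage comes out on top because it collects a vote from every other page, which matches the observation above.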

Page Structure

Page Title

Page titles are very significant. They should always be set to be as relevant to the page content as possible. It seems they are more important than the keyword or description tags. Avoid putting other sites' or companies' names in them and avoid wasting them on your own URL or site name (beyond the homepage). If page titles are short it will help to tack on a short sentence containing relevant keywords.

  • Keep the length of the page titles under 80 characters.
  • Page titles are shown in search results so keep them user friendly.

URLs

  • Do not have a dynamic URL as your homepage (ie: no querystring parameters etc.); preferably use an html page as your homepage. Also avoid having a homepage that does a response.redirect to another page as you will dilute / lose your page rank gained from people who link to you.
  • Do a server.transfer rather than a response.redirect.
  • Keywords found in URLs will make pages rank very highly. Seed URLs with keywords using folder structures that are relevant keywords. Folders do not have to exist: an HTTP module can be written to handle BeginRequest events and strip them out of the URL before a 404 happens.
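
The folder-stripping idea in the last bullet can be sketched independently of ASP.NET. The Python below only shows the mapping logic such a module would need. It assumes (my assumption, for illustration) that friendly URLs start with /view/, end in a numeric record ID, and that the real handler is /view.asp?id=..., as in the example URLs further down.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern: /view/<any number of keyword folders>/<numeric id>
FRIENDLY_URL = re.compile(r"^/view/(?:[^/]+/)*(\d+)/?$")


def rewrite_path(url):
    """Return the real handler path for a keyword-seeded URL, or None."""
    match = FRIENDLY_URL.match(urlsplit(url).path)
    if match is None:
        return None  # not a friendly URL; let the server handle it as usual
    return "/view.asp?id=" + match.group(1)


print(rewrite_path("/view/houses/Sydney/eastern_suburbs/Maroubra/999"))
print(rewrite_path("/about.html"))
```

The keyword folders are thrown away on the server side, so they can be changed freely without breaking the page.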

Meta Tags

Keyword Tag

  • Keep it under 1024 characters.
  • Ensure that keywords appear in the body of the page too. The more often a phrase appears, the higher it will rank.
  • Seed pages with keywords: use them in alt text, component titles and names, wherever you can.
  • Remember to include both plural and singular forms, as bots are generally not intelligent enough to recognise they are the same word.
  • Keywords should be specific phrases, not just single words (single words are not specific enough and will be common to all your competitors).
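
These rules are easy to check mechanically. The function below is a small sketch, not a tool anyone ships: check_keywords and its messages are made-up names, and it uses a plain case-insensitive substring test, which is roughly the "not very intelligent" matching described above.

```python
def check_keywords(keywords, body_text, max_length=1024):
    """Return a list of problems found with a keyword meta tag."""
    problems = []
    tag_content = ", ".join(keywords)
    if len(tag_content) >= max_length:
        problems.append("keyword tag is %d characters long" % len(tag_content))

    body = body_text.lower()
    for phrase in keywords:
        # Exact substring match: singular and plural count as different.
        if phrase.lower() not in body:
            problems.append("not found in page body: " + phrase)
        if " " not in phrase.strip():
            problems.append("single word, not a phrase: " + phrase)
    return problems


body = "Browse houses for sale in the eastern suburbs of Sydney."
print(check_keywords(["houses for sale", "house for sale", "Sydney"], body))
```

Note how the singular "house for sale" is reported as missing even though the plural is on the page, which is the reason for listing both forms.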

Description Tag

  • Keep it under 250 characters.
  • Start off with friendly text, then pad out with keywords if you don't have enough characters.
  • Make sure you accurately describe the content of your page while trying to entice visitors to click on your listing.
  • Include 3-4 of your most important keyword phrases, especially those used in your title tag and page copy.
  • Try to have your most important keywords appear at the beginning of your description. This often brings better results, and will help avoid having any search engine cut off your keywords if they limit the length of your description.
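
As a sketch of the "friendly text first, then pad with keywords" approach, the helper below appends keyword phrases only while the result stays under the limit. The function name build_description and the sample text are hypothetical, and it does not shorten friendly text that is already too long.

```python
def build_description(friendly_text, keyword_phrases, max_length=250):
    """Start with readable text, then append keyword phrases while they fit."""
    description = friendly_text.strip()
    for phrase in keyword_phrases:
        candidate = description + " " + phrase.strip() + "."
        if len(candidate) >= max_length:
            break  # the next phrase would push the tag over the limit
        description = candidate
    return description


friendly = "Find your next home with our Sydney property listings."
phrases = ["Houses for sale in the eastern suburbs", "3 bed semi with garden"]
print(build_description(friendly, phrases))
```

Putting the most important phrases first in the list means they are the ones that survive if the limit is reached.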

Robots Tag

The Robots Meta Tag is referred to by WebCrawlers and is useful in controlling the way the bots handle the content of the page. The following three descriptions are provided by Google and are not obeyed by all bots.

<META NAME="robots" CONTENT="noindex">
- Googlebot will retrieve the document, but it will not index the document.
<META NAME="robots" CONTENT="nofollow">
- Googlebot will not follow any links that are present on the page to other documents.
<META NAME="robots" CONTENT="noarchive">
- Google maintains a cache of all the documents that we fetch, to permit our users to access the content that was indexed (in the event that the original host of the content is inaccessible, or the content has changed). If you do not wish us to archive a document from your site, you can place this tag in the head of the document, and Google will not provide an archive copy for the document.
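
To see how a crawler might read these tags, here is a small sketch that pulls the robots directives out of a page with Python's standard html.parser. The class and function names are my own, the sample page is made up, and a real bot will have its own rules for conflicting or unknown directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any robots meta tags on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><META NAME="robots" CONTENT="noindex, nofollow"></head></html>'
directives = robots_directives(page)
print("index page:", "noindex" not in directives)
print("follow links:", "nofollow" not in directives)
```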

Google will group all pages with the same meta tags, and especially with similar content, as "similar pages", so you will lose presence in the search results if your tags are static and the same throughout your site.

Page Content

Page content should have plenty of valid text. Remember most people who link to you will link to your homepage. Your PageRank will therefore usually be highest on your homepage, especially for dynamic sites. This is a good reason to have at least summary text of your main content on your homepage.

Images and non-text content

WebCrawlers are not currently sophisticated enough to OCR images. Even if they were, the value added by this process would not be worth the effort considering the processor power and time that would be required. What does influence indexing is the name given to an image as well as the alt text used in the IMG tag. Alt text is also used by accessibility browsers, so it is a good idea to put in valid and descriptive alt text as a rule. Flash and other multimedia content is also not indexed, and a requirement for accessibility compliance is that a text-based script of all multimedia content be available for disabled site users. A text script will also be indexed so, as a rule, a transcript or script should always be available. Take every opportunity to put in alt text or titles on all elements possible.
Note: many document formats are crawled (eg: word, rtf, txt, pdf) but making html or text alternatives available is (as far as I can see) not currently considered spamming.
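
Finding images with no alt text is easy to automate. The sketch below lists the src of every IMG tag whose alt attribute is missing or blank; the names are hypothetical, and keep in mind that an empty alt can be a deliberate choice for purely decorative images, so treat the output as a list to review rather than a list of errors.

```python
from html.parser import HTMLParser


class MissingAltParser(HTMLParser):
    """Records the src of each image that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src"))


def images_missing_alt(html):
    parser = MissingAltParser()
    parser.feed(html)
    return parser.missing


# Made-up markup: one described image, one with no alt, one with empty alt.
page = (
    '<img src="maroubra-house.jpg" alt="3 bed semi with garden">'
    '<img src="img001.jpg">'
    '<img src="logo.gif" alt="">'
)
print(images_missing_alt(page))
```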

Page Body

All flash and image content should have alt text for disabled users as well as for bots and crawlers. Accessibility guidelines actually require a (text) script of flash or multimedia content to be available. This is worth providing both for the users who require accessibility support (becoming a legal requirement) and because bots will spider them.

If you use frames, make sure you also have a "noframes" section with as much text content as possible. Make the text legitimate and user friendly as anyone with a browser that does not support frames (eg: accessibility browsers) will see your noframes section.

<frameset>

    <noframes>
        blah blah blah
    </noframes>
</frameset>

There was a practice in the past, which was to put all description and keywords into a few hidden html elements. If you see this in your legacy pages you are probably best off removing it as it is not effective and puts you at risk of being banned as a spammer.
Examples of this would be:

  • comments padded out with keywords:
    <!-- blah blah blah -->
  • hidden fields stuffed with keywords:
    <input type="hidden" value="blah blah blah">
  • useless javascripts full of comments (client-side scripting)

Make use of an "archive" page and a "site map" to ensure that every page is always accessible from the homepage.

"Do not"s:

  1. Do not directly copy the text from your meta and keyword tags etc into the body as this is watched for and may get you penalized. 
  2. Do not do anything that "feels dodgy" to try to trick the search engines into listing your site higher. If what you are doing is regarded as underhanded by the search engines, they will likely view it as spam and penalize or ban you. This can be VERY difficult to undo.
  3. Do not list keywords anywhere except in your keywords meta tag. By "list" I mean something like: keyword 1, keyword 2, keyword 3, keyword 4, etc. There are very few legitimate reasons that a list of keywords would actually appear on a web page or within the page's HTML code and the search engines know this. While you may have a legitimate reason for doing this we would recommend avoiding it so that you do not risk being penalized by the search engines.
  4. Do not use the same colour text on your page as the page's background colour. This has often been used to keyword-stuff a web page. Search engines can detect this and view it as spam. People do this using CSS and different styles with the same colours. Stay away, it's not worth it.
  5. Do not use multiple instances of the same tag. For example, using more than one title tag. Search engines can detect this and view it as spam. 
  6. Do not submit identical pages. For example, do not duplicate a page of your site, give the copies different file names, and submit each one. Search engines can detect this and view it as spam (and if you don't get filtered they will group the pages as similar anyway).
  7. Do not submit the same page to any engine more than once within 24 hours.
  8. Do not use any keywords in your keywords meta tag that do not directly relate to the content of your page or pages reached through it. 

The significance of links

The keywords contained within a link have a huge influence on indexing and search results. The meaning or context of a link is largely based on the text within the hyperlink. A good example of this is the terror tactic called Google-bombing. Search Google for "Miserable Failure" and see "Biography of President George W. Bush" come up in first place. This has relevance within your site as well as externally, but you have great control of this internally and can boost certain pages by how they are referenced.
Your database IDs are meaningless in querystrings, but links containing "houses for sale in eastern suburbs Sydney" can do a lot for your pages. Pages referenced as
/view.asp?id=999
are meaningless to Google but
/view/houses/Sydney/eastern_suburbs/Maroubra/999?title=3+bed+semi+with+garden+and+one+parking
will be indexed and influence rank in the results.
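
Generating URLs of this shape from your own data is straightforward. The sketch below builds one from a record's fields; the function names and field list are assumptions for illustration, and your own site will have its own categories and folder order.

```python
import re
from urllib.parse import quote_plus


def slug(text):
    """Replace runs of characters that are not letters or digits with '_'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")


def listing_url(record_id, category, city, region, suburb, title):
    """Build a keyword-rich URL that still ends in the database ID."""
    folders = "/".join(slug(part) for part in (category, city, region, suburb))
    return "/view/%s/%d?title=%s" % (folders, record_id, quote_plus(title))


print(listing_url(999, "houses", "Sydney", "eastern suburbs", "Maroubra",
                  "3 bed semi with garden and one parking"))
```

Because the ID is still in the URL, the server only needs the number to find the record; everything else is there for the bots and the reader.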

Hyperlinks are the gold coins of search engine optimisation. 

  • The more links other people give you the richer you are. 
  • The fewer links you give away the richer you are. 
  • Links you give to yourself keep you as rich, they just put the wealth in different places. 

Links from others to you

As above, links to you count as votes for your significance and raise your rank. 

Links from you to others.

If your page has a lot of links to external sites, the PageRank calculation deducts rank, as it assumes the pages you are pointing to are the reason people have linked to you (you are a stop along the way rather than a destination). So more than a certain number of links out of your site will eventually start leaking your PageRank. To circumvent this, wrap your external links in JavaScript. Bots aren't smart enough to read script languages yet, so they will (currently) ignore the links. Note: There is already discussion in some forums indicating that Google is starting to follow any links in your page content.
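
Before worrying about leakage it helps to know how many plain links actually leave your site. This is a rough sketch that counts internal and external anchors on a page; the host name www.example.com and the function names are placeholders, and it ignores anything that is not a plain http, https or relative link.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCounter(HTMLParser):
    """Counts plain anchor links that stay on or leave a given host."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host.lower()
        self.internal = 0
        self.external = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlsplit(href)
        if parts.scheme.lower() not in ("", "http", "https"):
            return  # skip mailto:, javascript: and similar
        host = parts.netloc.lower()
        # Relative links have no host, so they point back into this site.
        if host == "" or host == self.own_host:
            self.internal += 1
        else:
            self.external += 1


def count_links(html, own_host):
    counter = LinkCounter(own_host)
    counter.feed(html)
    return counter.internal, counter.external


page = (
    '<a href="/view/1">One</a>'
    '<a href="http://www.example.com/about">About</a>'
    '<a href="http://partner.example.org/">Partner</a>'
    '<a href="mailto:me@example.com">Mail</a>'
)
print(count_links(page, "www.example.com"))
```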

Links between your pages

To make links effective, the points above need to be considered. You should also make sure that links contain descriptive text to boost the target pages when indexed. A subtle but illustrative example is:

Weak: [Click here] to browse bikes for sale in Sydney
Better: Click here to browse [bikes for sale in Sydney]
(Square brackets mark the linked text.)
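
Weak link text can be spotted automatically. The sketch below assumes the difference between the two lines above is which words carry the link, and flags anchors whose text is a generic phrase. The phrase list and all names are my own guesses, so extend them to suit your site.

```python
from html.parser import HTMLParser

# Assumed list of phrases that say nothing about the target page.
GENERIC_PHRASES = {"click here", "here", "read more", "more", "link"}


class AnchorTextParser(HTMLParser):
    """Collects (href, link text) pairs for every anchor with an href."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so line breaks inside links do not matter.
            text = " ".join("".join(self._text).split())
            self.anchors.append((self._href, text))
            self._href = None


def weak_anchors(html):
    parser = AnchorTextParser()
    parser.feed(html)
    return [(href, text) for href, text in parser.anchors
            if text.lower() in GENERIC_PHRASES]


weak = '<a href="/bikes">Click here</a> to browse bikes for sale in Sydney'
better = 'Click here to browse <a href="/bikes">bikes for sale in Sydney</a>'
print(weak_anchors(weak))
print(weak_anchors(better))
```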

Conclusion

A lot of work can be done on most production systems to boost their SEO. The process itself is iterative and experimental, and the gain is going to be a long-term (also incremental) process. Webmasters / programmers should commit to periodic review of their sites, and when deemed necessary repeat the process, and tweak what is done each time. Webcrawler technology is experimental, and the bots change every day. A well optimised site that is left untouched will not stay optimal as the metrics will change over time.

