Mark Cohen is a CIO at Australia's largest online retailer and is a hands-on, sleeves-rolled-up, code-cutting geek. He lives in Sydney, Australia with his wife and boys and can sometimes be spotted puffing and panting as he runs at Maroubra Beach.

Archive for the 'SEO' Category

Regrets

cartoon showing regrets

I nicked this awesome little cartoon from xkcd.com – one of my favourite “entertainment” blogs.  I think I love it because it’s just so true.  More often than not we regret the things we didn’t do.  This holds true for ski trips (even if they hospitalise my team members ;) ) as much as work or social interactions or one-off opportunities.  Especially one-off opportunities.

Tony Robbins, one of those “Life Coaches” who used to be all over TV a few years ago, used to say that it’s all about taking a step. Then it’s all about taking the next step, and so on. If you have a goal and commit to taking one step a day to get closer to it, you’d be surprised how far you can get in a month.

Here’s his video from TED in 2006. Check it out. The guy he engages with in the front row is Al Gore :)

CMS and SEO

It’s amazing how many RFQs I am seeing that say “SEO” or “must have SEO” or “Must be Search Engine Optimized”. These are the same RFQs in which the prospective client is asking for a “fully flexible” solution. We probably need a form-letter response to send people explaining that the two concepts are mutually exclusive.

We cannot guarantee that the content you put into a free-form WYSIWYG editor is going to be search-engine friendly. We cannot even guarantee that it will always be compatible with all the browsers on the market, especially if you “roll your own” CSS. We cannot guarantee that the site you link to will exist tomorrow, nor that the site that links to you will still link to you tomorrow. We can’t even guarantee that Google will crawl your site. We especially cannot guarantee that the search terms people are typing in are the ones that your content is optimised for.

What we can do, however, is provide you with guidelines as to how you should be structuring your content so that it makes the most sense to a bot that can’t see colour or the fantastic work that our creative team comes up with. We can also put you in touch with some fantastic search engine marketing consultants who will help you with your in-page optimisations and, just as importantly, with your paid search marketing strategies. If you are prepared to commit to spending a hundred grand on a web project, why not budget for a hundred and ten grand and run some optimised AdWords campaigns?

Think of your new CMS as something like you standing on the side of the Cahill Expressway at 5:30pm with a sign that says “Apples for sale”. That’s your new shop on the web. How are you going to get anyone to stop? Spend a little and get a professional to throw some nails on the freeway in the right place. How d’you like them apples? :P

SEO at a glance


A friend of mine has put together a website selling car parts online. The site is not doing well at all and so I offered to take a quick look at it. The following is what I fed back to him after a five minute review. If you see anything glaringly obvious and catastrophically bad that I have missed, please drop me an email or comment and I’ll forward it on to him :) He’s a web novice and so I’m starting off with the very basics.

  1. The pages are structured well although the pages are a bit heavy. They are good enough though.
  2. On the sidebar there are a whole lot of links to Ford, NRMA, Norton, etc. They point to those sites and these links are on every page. If you can remove the links or point them to pages inside the site, that would be better. The images are fine, it’s just the links out to other people’s sites that you should get rid of.
  3. Get rid of the link to eHub.com.au in the footer of the site.
  4. Try to have as few links that go outside of the site as possible.
  5. The site is listed on Google but they haven’t crawled beyond the homepage yet. It takes a while to get crawled; ultimately it’s a waiting game.
  6. Get as many people as you can with relevant sites to put up backlinks to the site with the words "car parts" in the link.
  7. The HTML for the link you need to get from people is like this:
    <a href="http://www.parts4cars.com.au" title="car parts online">car parts</a>

  8. The guys who set the site up have left the packaged product’s metadata in the meta tags. Meta tags are tags in the head section of web pages that describe the page. There are two important tags: keywords and description. Keywords is blank and description is “The powerful shopping cart software for web stores and e-commerce enabled stores is based on PHP / PHP4 with SQL database with highly configurable implementation based on templates.” This should be fixed to reflect the site’s content. They are setting the page headings on each page but not the keywords or descriptions. I would regard this as a defective product and ask them to fix it. The site hasn’t been fully crawled yet so fixing the meta tags should be a priority.
  9. The links and images don’t use titles. When you hold the mouse over a link it does not show a tooltip. You can set the tooltip to show something like buy [product] online which will target the extra keywords. Similarly for images. Search engines don’t know what the images are so they look for the “alt” text which describes the image. These are not imperative fixes but would be something to look at soon.
  10. If possible, I would swap the page title elements around. Right now it says “Parts 4 Cars – Air filter”. I would change it to "Air Filter – Parts 4 Cars" . Even though the site is called "parts 4 cars" I would consider using "parts for cars" as it is a much better match compared to what people would type into a search engine.
  11. When you go to a product detail page, the meta description is set properly but there are still no keywords. I would ask for this to be fixed as a defect too.
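
Points 8 and 11 can be checked mechanically rather than by viewing source on every page. The following is only a rough sketch using Python's standard html.parser; the class name, the function name and the sample page are made up for illustration, and it only looks at the title, keywords and description.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Collects the <title> text and any named <meta> tags from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            # Store the content keyed by the lower-cased meta name.
            self.meta[attrs["name"].lower()] = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_head(html):
    """Return a list of human-readable problems found in the page head."""
    parser = HeadAudit()
    parser.feed(html)
    problems = []
    if not parser.title.strip():
        problems.append("missing page title")
    for name in ("keywords", "description"):
        if not parser.meta.get(name):
            problems.append("missing or empty meta " + name)
    return problems


# Hypothetical product page: the title is set but keywords were left blank.
sample = """<html><head>
<title>Air Filter - Parts 4 Cars</title>
<meta name="keywords" content="">
<meta name="description" content="Buy air filters online">
</head><body></body></html>"""

print(audit_head(sample))
```

Run against each page of the site, a report like this makes it easy to hand the developers a list of pages where the tags are still missing.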
[Listening to: Riding with the King - B.B. King & Eric Clapton - Riding with the King (04:23)]

Where flies land.

Just doing a bit of an analysis looking at My Pet Project and a competitive site. The point of analysis is this: Two nearly identical websites are set up around the same time. Both run on good enough servers for speed not to be a factor. Both offer roughly the same functionality. Neither offers a particularly impressive UI and both are graphics free and visually bland. Neither is given any form of marketing that could be considered significant. Yet ours succeeds beyond all expectations and the other is a dismal flop.

This has given birth to Cohen’s Laws of the Information Highway.
1. Nothing stops on the highway without reason.
2. Farm stalls do not do business on the highway.
3. Roadkill can catch some attention but it is fleeting and usually not repeated.
4. Road rage on the information highway outweighs road rage on tarmac across the planet by two orders of magnitude.
5. Trolls live under every bridge.
6. Getting noticed on the information highway is not always a good thing.
7. Nobody waits at red lights.
8. Speed limits piss people off.
9. There are people who want to hijack you at every intersection.
10. If you’re looking for Brian then everybody is Brian and so is their wife (apologies to Monty Python).

Harrah! PR at last!

Hehehe! The legendary PR update happened at last! My little blog went from 0 to 3. I’m so happy you’d think I won the lottery. Well, a small lottery but still… I set this blog up about three months ago, and only just got my first dose of PR today.

msn search

Did a bit of aggregating of comments on msn, together with my own observations. Read them in their rawest form here.

Results seem weighted heavily on keywords found in the URL, and there are many, many listings from the same domains for pages and pages. There will have to be a way to suppress that. Not sure if anyone else noticed this, but keywords in domains seem to be weighted reasonably well…

Back-link text is as relevant as on Google: a search on “miserable failure” shows George W’s biography. Anchor text is influencing the other results, but something else is influencing the first result (which is currently a 404).

Observation: New MSN search seems to favor sites using <h> tags, even if the content is not relevant to the search term. In the SERPs I reviewed they are giving a lot of weight to search terms in the URL. Might want to tone that down; it is creating a lot of irrelevant spam in the SERPs I look at.

Tightly themed sites seem to be doing very well with relatively few inbounds.

Right now the algorithm seems to be really sensitive to page title and hyphenated urls (I assume this is due to link text), and seems to have its PageRank-type thresholds set similarly to Yahoo, which is to say that on the moderately competitive stuff it’s easy to spam. I’m seeing top 10 rankings for stuff that doesn’t show in the top 100 in Google, but also show in the top 10 for Yahoo. Some of my pages do much better than they deserve.

Algorithm seems to deprecate keywords in <meta> in favour of weight in the page. Seen the top 15 listings being pages of the same site. Base URL seems to top out on results, though the page(s) about the term is deeper in the site (something ATW used to do years ago).

One change that would dramatically improve the results would be to demote the importance of all subdomains. Way too many spammy keyword1.keyword2.com sites coming up in the first few spots.

think current algo is:
have keyphrase once in
title
meta description
meta keywords
url
<h> tag at the beginning of html body

I actually like some of these results. However, one of the major problems I see (besides the annoying redundant listings) is with plurals. If I search for “green widget”, a site about “green widgets” would be very appropriate, and reciprocally. So far, only Google seems to have figured that out.

they changed their algo,
now seems to give more weight to title than before… my homepage fell 3 pages, but I have another page on the first page…
The homepage is more related; it’s on the first page in Yahoo and Google.
As well, it seems to give less importance to home pages, or at least doesn’t give them more importance than inner pages.

search.msn – some notes

Did a bit of research into the new msn search product. Here are some comments, some my own and some aggregated that may be of use / interest to those of you who want to be on the front foot.

  • Results seem weighted heavily on keywords found in the URL, and there are many, many listings from the same domains for pages and pages. There will have to be a way to suppress that.
  • Not sure if anyone else noticed this, but keywords in domains seem to be weighted reasonably well…
  • Back-link text is as relevant as on Google: a search on “miserable failure” shows George W’s biography. Anchor text is influencing the other results, but something else is influencing the first result (which is currently a 404).
  • Observation: New MSN search seems to favor sites using <h> tags, even if the content is not relevant to the search term.
  • In the SERPs I reviewed they are giving a lot of weight to search terms in the URL. Might want to tone that down; it is creating a lot of irrelevant spam in the SERPs I look at.
  • On one of our new site’s most targeted terms every site returned by this MSN is very relevant except the first one which is total spam with nothing more than the key phrase in h1 and font 6’s with bold. The key phrase isn’t in anything else except the title. No meta desc or keywords. 
  • Tightly themed sites seem to be doing very well with relatively few inbounds. 
  • Right now the algorithm seems to be really sensitive to page title and hyphenated urls (I assume this is due to link text), and seems to have its PageRank-type thresholds set similarly to Yahoo, which is to say that on the moderately competitive stuff it’s easy to spam.  
  • Algorithm seems to deprecate keywords in <meta> in favour of weight in the page.
  • Seen the top 15 listings being pages of the same site. 
  • Seen sites that were prevalent in other engines not even getting a shout!
  • Base URL seems to top out on results, though the page(s) about the term is deeper in the site.
  • Keyword-stuffed domain names rank very highly  – less value for brands on this
    engine
  • One change that would dramatically improve the results would be to demote the importance of all
    subdomains. 
  • subdomains do not get less weight than domains, so
    keyword.domain-name.com.au will help target traffic
  • think current algo is: 
    have keyphrase once in 
        title 
        meta description 
        meta keywords 
        url 
        <h> tag at the beginning of html body 
  • Current engine does not recognise singular and plural as same word (like Google does)

More recently:

  • they changed their algorithm, now seems to give more weight to title than before. 
  • Engine doesn’t seem to give homepages more value than sub-pages within the site.

A new trick I’ve learned is to use Mozilla Firefox to preview your site. This is especially great if you use CSS: you can turn off the stylesheet and view the site as it would appear to a bot.

Back-links

Back-links are hugely important and underestimated in SEO. Your average jumped-on-the-bandwagon SEO specialists will come in your door wearing Armani suits and trendy ties, and will run a few different tools they downloaded to check your keyword density. They’ll then tell you that you need more copy on your pages, and that you should put some better text in page titles and links. Then they’ll hit you with a hundred thousand dollar invoice and waltz off to fleece some other poor sucker.

Most will not mention backlinks at all, because they are too lazy to read the forums, or more sage and significant sources of knowledge like my blog ;)

Backlinks are votes for your page’s significance, aggregated to give you your page rank. Backlinks from your own site still count towards page rank (this is what people call local page rank). When you do your SEO on your site, make sure all your pages link back to the pages that call them. And make sure that your pages use the exact same URL to link to them as the user / bot would have come in to them on. And don’t be a schmuck and use the referrer from the server variables, unless you want to make sure that the bots (which don’t actually click on links, they tabulate and open) don’t see the back-links.

That’s my $0.02 on SEO for today.

Search Engine Optimisation

Just dug this up, thought someone else on the planet may find it of interest.


 

Prepared by: Mark Cohen 

Overview

Basic Idea

The basic principle of search engine optimisation is to look at a website and pretend you have to assess its value by reading text only and using only your mouse to navigate. There should be a way to reach all of your content from the homepage without using the keyboard. More specifically, there should be a way to reach any content by following a link that is contained within an anchor tag and that does not use JavaScript.

Bots, Crawlers, Spiders

All search engines use web-crawlers a.k.a. bots a.k.a. spiders to crawl the web and harvest content. The single most significant bot in the world must be the GoogleBot. Learn a lot more about the GoogleBot here. 

Page Rank

PageRank is a principle based on the premise that the web is an anarchistic democracy. If a site goes up that is important, it will be linked to by other webmasters. A page that is more important than others is linked to more often. Thus the homepage generally has the highest PageRank on a site. When a page links to another page it is considered a vote towards the target page’s rank. More specifically, it is a weighted vote based on the source page’s PageRank. This is a nonlinear weighting, with a link from a page with a PageRank of 8 counting far more than a link from a page with a PageRank of 1. PageRank is significant in that it is a factor in the ordering of search results. All other things being equal, higher ranked pages come up before lower ranked pages in search results.
PageRank is explained really well in detail here
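
To make the "weighted vote" idea concrete, here is a small sketch of the iterative calculation from the originally published PageRank formula, in its normalised form. It is a toy under stated assumptions: the damping factor of 0.85 is the commonly quoted value, the four-page site is invented, and every link target must itself appear as a key in the dictionary. What Google actually runs is not public and is certainly more involved.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank; `links` maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if not targets:
                # A page with no outgoing links spreads its vote over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
            else:
                # Otherwise its vote is split evenly between its link targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: every inner page links back to the homepage.
site = {
    "home": ["products", "about", "contact"],
    "products": ["home"],
    "about": ["home"],
    "contact": ["home"],
}
ranks = pagerank(site)
print(max(ranks, key=ranks.get))
```

In this toy site the homepage collects a vote from every inner page while each inner page only gets a third of the homepage's vote, which is why the homepage comes out on top.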

Page Structure

Page Title

Page titles are very significant. They should always be set to be as relevant to the page content as possible. It seems they are more important than the keyword or description tags. Avoid putting other sites’ or companies’ names in them and avoid wasting them on your own URL or site name (beyond the homepage). If page titles are short it will help to tack on a short sentence containing relevant keywords.

  • Keep the length of the page titles under 80 characters.
  • Page titles are shown in search results so keep them user friendly.

URLs

  • Do not have a dynamic URL as your homepage (ie: no querystring parameters etc.); preferably use an html page as your homepage. Also avoid having a homepage that does a Response.Redirect to another page, as you will dilute / lose the page rank gained from people who link to you.
  • Do a Server.Transfer rather than a Response.Redirect.
  • Keywords found in URLs will make pages rank very highly. Seed URLs with keywords by using folder structures made of relevant keywords. The folders do not have to exist: an HTTP module can be written to handle BeginRequest events and strip them out of the URL before a 404 happens.

Meta Tags

Keyword Tag

  • Keep it under 1024 characters.
  • Ensure that keywords appear in the body of the page too. More often is better as the phrase will rank higher.
  • Seed pages with keywords: use them in alt text, component titles and names, wherever you can.
  • Remember to include both plurals and singulars, as bots are generally not intelligent enough to recognise they are the same word.
  • Keywords should be specific phrases, not just single words (single words are not specific enough and will be common to all your competitors)

Description Tag

  • Keep it under 250 characters.
  • Start off with friendly text, then pad out with keywords if you don’t have enough chars.
  • Make sure you accurately describe the content of your page while trying to entice visitors to click on your listing.
  • Include 3-4 of your most important keyword phrases, especially those used in your title tag and page copy.
  • Try to have your most important keywords appear at the beginning of your description. This often brings better results, and will help avoid having any search engine cut off your keywords if they limit the length of your description.
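
The length limits and the "keywords must also appear in the body" rule above are easy to check in code. This is a rough sketch only: the limits are the rules of thumb from this document (80, 1024 and 250 characters), not published search engine requirements, and the function names and sample page are invented for illustration.

```python
# Limits taken from the guidelines above; rules of thumb, not official figures.
LIMITS = {"title": 80, "keywords": 1024, "description": 250}


def check_lengths(fields):
    """Return the names of fields that are empty or not under their limit."""
    failures = []
    for name, limit in LIMITS.items():
        text = fields.get(name, "").strip()
        if not text or len(text) >= limit:
            failures.append(name)
    return failures


def keyword_phrases_missing_from_body(keywords, body):
    """List comma-separated keyword phrases that never appear in the body text."""
    body_lower = body.lower()
    phrases = [p.strip().lower() for p in keywords.split(",") if p.strip()]
    return [p for p in phrases if p not in body_lower]


# Hypothetical page data.
page = {
    "title": "Air Filter - Parts 4 Cars",
    "keywords": "air filter, air filters, car parts",
    "description": "Buy air filters and other car parts online.",
}
body = "We stock every air filter you need, plus other car parts."

print(check_lengths(page))
print(keyword_phrases_missing_from_body(page["keywords"], body))
```

The second check flags "air filters" here because only the singular appears in the body copy, which is exactly the plural / singular trap mentioned in the keyword tag notes.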

Robots Tag

The Robots Meta Tag is referred to by WebCrawlers and is useful in controlling the way the bots handle the content of the page. The following three descriptions are provided by Google and are not obeyed by all bots.

<META NAME="robots" CONTENT="noindex">
- Googlebot will retrieve the document, but it will not index the document.
<META NAME="robots" CONTENT="nofollow">
- Googlebot will not follow any links that are present on the page to other documents.
<META NAME="robots" CONTENT="noarchive">
- Google maintains a cache of all the documents that we fetch, to permit our users to access the content that was indexed (in the event that the original host of the content is inaccessible, or the content has changed). If you do not wish us to archive a document from your site, you can place this tag in the head of the document, and Google will not provide an archive copy for the document.

Google will group all pages with the same meta tags, and especially with similar content, as “similar pages”, so you will lose space in the search results if your tags are static and the same throughout your site.

Page Content

Page content should have plenty of valid text. Remember most people who link to you will link to your homepage. Your PageRank will therefore usually be highest on your homepage, especially for dynamic sites. This is a good reason to have at least summary text of your main content on your homepage.

Images and non-text content

Web crawlers are not currently sophisticated enough to OCR images. Even if they were, the value added by this process would not be worth the effort considering the processor power and time that would be required. What does influence indexing is the name given to an image as well as the alt text used in the IMG tag. Alt text is also used by accessibility browsers, so it is a good idea to put in valid and descriptive alt text as a rule. Flash and other multimedia content is also not indexed, and a requirement for accessibility compliance is that a text-based script of all multimedia content be available for disabled site users. A text script will also be indexed so, as a rule, a transcript or script should always be available. Take every opportunity to put in alt text or titles on all elements possible.
Note: many document formats are crawled (eg: word, rtf, txt, pdf) but making html or text alternatives available is (as far as I can see) not currently considered spamming.

Page Body

All flash and image content should have alt text for disabled users as well as for bots and crawlers. Accessibility guidelines actually require a (text) script of flash or multimedia content to be available. This is worth providing both for the users who require accessibility support (becoming a legal requirement) and because bots will spider them.

If you use frames, make sure you also have a “noframes” section with as much text content as possible. Make the text legitimate and user friendly as anyone with a browser that does not support frames (eg: accessibility browsers) will see your noframes section.

<frameset>
        <!-- frame elements go here -->
        <noframes>
                blah blah blah
        </noframes>
</frameset>

There was a practice in the past, which was to put all description and keywords into a few hidden html elements. If you see this in your legacy pages you are probably best off removing it as it is not effective and puts you at risk of being banned as a spammer.
Examples of this would be:

  • comments padded out with keywords:
    <!-- blah blah blah -->
  • hidden fields stuffed with keywords:
    <input type="hidden" value="blah blah blah">
  • useless javascripts full of comments (client-side scripting)

Make use of an “archive” page and a “site map” to ensure that every page is always accessible from the homepage.

“Do not”s:

  1. Do not directly copy the text from your meta and keyword tags etc into the body as this is watched for and may get you penalized.
  2. Do not do anything that “feels dodgy” to try to trick the search engines into listing your site higher. If what you are doing is regarded as underhanded by the search engines, they will likely view it as spam and penalize or ban you. This can be VERY difficult to undo.
  3. Do not list keywords anywhere except in your keywords meta tag. By “list” I mean something like: keyword 1, keyword 2, keyword 3, keyword 4, etc. There are very few legitimate reasons that a list of keywords would actually appear on a web page or within the page’s HTML code and the search engines know this. While you may have a legitimate reason for doing this we would recommend avoiding it so that you do not risk being penalized by the search engines.
  4. Do not use the same colour text on your page as the page’s background colour. This has often been used to keyword-stuff a web page. Search engines can detect this and view it as spam. People do this using CSS and different styles with the same colours. Stay away, it’s not worth it.
  5. Do not use multiple instances of the same tag. For example, using more than one title tag. Search engines can detect this and view it as spam.
  6. Do not submit identical pages. For example, do not duplicate a page of your site, give the copies different file names, and submit each one. Search engines can detect this and view it as spam (and if you don’t get filtered they will group the pages as similar anyway).
  7. Do not submit the same page to any engine more than once within 24hrs. 
  8. Do not use any keywords in your keywords meta tag that do not directly relate to the content of your page or pages reached through it. 

The significance of links

The keywords contained within a link have a huge influence on indexing and search results. The meaning or context of a link is largely based on the text within the hyperlink. A good example of this is the terror tactic called Google-bombing. Search Google for Miserable Failure and see “Biography of President George W. Bush” come up in first place. This has relevance within your site as well as externally, but you have great control of this internally and can boost certain pages by how they are referenced.
Your database IDs are meaningless in querystrings, but links containing “houses for sale in eastern suburbs Sydney” can do a lot for your pages. Pages referenced as
/view.asp?id=999
are meaningless to Google but
/view/houses/Sydney/eastern_suburbs/Maroubra/999?title=3+bed+semi+with+garden+and+one+parking
will be indexed and influence rank in the results.
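
Building URLs like this by hand gets tedious, so it is worth generating them. Below is a minimal sketch using Python's standard urllib.parse; the function names and the folder list are invented for illustration, and the second function shows one possible way the database ID could still be recovered from the keyword-laden path.

```python
from urllib.parse import quote, urlencode, urlsplit


def listing_url(folders, listing_id, title):
    """Build a keyword-seeded URL: /view/<folders...>/<id>?title=<title>."""
    # Spaces in folder names become underscores; anything unsafe is escaped.
    path = "/".join(quote(folder.replace(" ", "_")) for folder in folders)
    return "/view/" + path + "/" + str(listing_id) + "?" + urlencode({"title": title})


def listing_id_from_url(url):
    """Recover the database ID, which is the last segment of the path."""
    return int(urlsplit(url).path.rsplit("/", 1)[-1])


url = listing_url(
    ["houses", "Sydney", "eastern suburbs", "Maroubra"],
    999,
    "3 bed semi with garden and one parking",
)
print(url)
print(listing_id_from_url(url))
```

The keyword folders are there purely for the search engines; the application only needs the trailing ID to look the record up.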

Hyperlinks are the gold coins of search engine optimisation. 

  • The more links other people give you the richer you are. 
  • The fewer links you give away the richer you are.
  • Links you give to yourself keep you as rich, they just put the wealth in different places. 

Links from others to you

As above, links to you count as votes for your significance and raise your rank. 

Links from you to others.

If your page has a lot of links to external sites, the PageRank calculation deducts rank, as it assumes the things you are pointing to are the reason people have linked to you (you are a stop along the way rather than a destination). So more than a certain number of links out of your site will eventually start leaking your PageRank. To circumvent this, wrap your external links in JavaScript. Bots aren’t smart enough to read script languages yet, so they will (currently) ignore the links. Note: There is already discussion in some forums indicating that Google is starting to follow any links in your page content.

Links between your pages

To make links effective, a few things need to be considered, as above. Also, you should make sure that links contain descriptive text to boost the target pages when indexed. A subtle but illustrative example (the linked text is shown in square brackets):

Weak: [Click here] to browse bikes for sale in Sydney
Better: Click here to browse [bikes for sale in Sydney]
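
Weak link text is easy to hunt down across a site with a short script. The following is a sketch using Python's standard html.parser; the list of "generic" phrases, the class and function names, and the sample hrefs are all assumptions made up for this example.

```python
from html.parser import HTMLParser

# Assumed list of link texts that say nothing about the target page.
GENERIC = {"click here", "here", "read more", "more", "link"}


class AnchorCollector(HTMLParser):
    """Collects (href, link text) pairs for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so "Click   here" still matches.
            text = " ".join("".join(self._text).split())
            self.anchors.append((self._href, text))
            self._href = None


def weak_links(html):
    """Return the hrefs whose link text is one of the generic phrases."""
    parser = AnchorCollector()
    parser.feed(html)
    return [href for href, text in parser.anchors if text.lower() in GENERIC]


# Hypothetical markup for the weak and better versions of the same sentence.
html = (
    '<p><a href="/bikes">Click here</a> to browse bikes for sale in Sydney</p>'
    '<p>Click here to browse <a href="/bikes/sydney">bikes for sale in Sydney</a></p>'
)
print(weak_links(html))
```

Only the first link is reported, because its link text tells a bot nothing about the page it points to.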

Conclusion

A lot of work can be done on most production systems to boost their SEO. The process itself is iterative and experimental, and the gain is going to be a long-term (also incremental) process. Webmasters / programmers should commit to periodic review of their sites, and when deemed necessary repeat the process, and tweak what is done each time. Webcrawler technology is experimental, and the bots change every day. A well optimised site that is left untouched will not stay optimal as the metrics will change over time.