drupal search engine friendliness (sef aka seo)
It's time for me to take another serious look at se friendliness. My newest site has been grey barred on the main page. An older site has gone from greenbar 2 to whitebar. I may have been lax in paying attention to the basics.
PR is unclear
I'm not using the term PR because PR is not what it used to be. You'll also notice I use the term seo (search engine optimisation) hesitantly, because that is also not the intention. I'm just fine if my site is end-user friendly and at the same time search engine friendly. That's really just good enough for me.
Basic measurements
Just for practicality, my own greybar, whitebar, and greenbar terminology is based on a small Firefox add-on called SearchStatus, which I have been using for quite some years. It's simple and direct. Sometimes I also use one of the other, heavier ones, but that's only for follow-ups. Anyway, so-called PR values have lost a lot of their meaning: a grey bar no longer signifies a ban, and a whitebar might actually still be what it was many years ago. But no normal citizen really knows these things for sure.
Muddling through
I think working with se's is a bit like economics. Most of the theories don't count because their preconditions are unrealistic. The premise of "all other things being equal" is never true. And the notion of all actors basing decisions on the same information base is never true either. So, it's all muddling through.
Some things to watch out for:
without/with www
If your site works with both, there is a risk that search engines will see it as two sites with duplicated content. That is not good. The Drupal .htaccess file contains the instructions on how to set this right. You must take a stand for one of the two versions, with or without. When you first put your site online, this is really the first thing to check, decide and do. Being liberal in many ways and traditional in some, I choose with www. My choice is also related to dealing with multiple languages in subdomains, using de, es, fr, nl.
Better to do this right from the very start, so that se's will not begin crawling your site in confusion. But if both are already active and you want to reduce them to one, you might also want to check which of the two is doing better with se's. This might influence your decision. If you are thinking longer term, it probably should not.
Note also that if an se finds duplicate content - all other factors being the same (which they never are) - it may prefer the shorter address. This should probably not happen for unique content.
Teasers
The teasers/intros/summaries on the main page are nice, but have the potential to look like duplicate content to se's. This is especially the case when the original article is short while the teaser is long, and when at the same time the page that contains the teaser does not have a lot of other content. I believe that se's make a calculation on the quantity of duplicate content, which includes some allowances (menus, headers, footers, etc.), but in the end they draw a line somewhere.
For example, during an online development phase, there may be a few low content pages, with teasers on the main page. That may be risky. Exactly this point may have been the reason for se's analysis to go from initial whitebar to greybar for one of my online development sites. In that case it is better to write a longer piece of content with a short teaser on the main page, and simply let that be at first.
The issue is the teaser/summary/intro itself, whenever the above conditions apply. It is not really the Read more link or the summary/intro title link, although these contribute by leading the way to the potential duplicate content.
Separate pages with content and comments
Pages that contain duplicate content plus a comment form should be a sef no-no. Here the Add new comment link leads directly to the duplicate page. But an external link will lead an se there as well, and it will start analysing.
Utilities
Generally, all admin pages, node creation and editing pages, and utilities like, for example, Subscribe to: This post are not useful as search results. So they should not be indexed by search engines. WIP
On the other hand, in some cases only authenticated users will see some of those links, like Add a new comment, and the se's are not part of that group, anyway. In that case there's no need to be overly concerned about it.
Some solutions to those things:
.htaccess
The Drupal .htaccess file contains the instructions and settings to set your site to either with or without the www. The file itself does not make this choice for you, rightly so. You need to act on it.
I think that during one of the updates my settings on this were overwritten with a generic htaccess file. A while later I lost the 2 greenbar points, now that site is whitebar. That was just... unwise.
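For reference, here is a minimal sketch of the kind of redirect I mean, for the "with www" choice. It assumes mod_rewrite is enabled, and example.com is only a placeholder for your own domain. The stock Drupal .htaccess ships a very similar pair of lines commented out, but the exact wording may differ between versions, so compare with your own copy rather than pasting this in blindly.

```apache
# Assumption: mod_rewrite is available; example.com stands in for your domain.
<IfModule mod_rewrite.c>
  RewriteEngine on
  # Permanently (301) redirect http://example.com/... to http://www.example.com/...
  RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
  RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
</IfModule>
```

After every core update it is worth checking that these lines are still there and still uncommented.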
robots.txt
The robots.txt file gives a set of instructions for se's not to go to specific files and directories. It cannot control behavior on individual links. Impolite robots or se's may ignore the robots.txt file.
Checking the robots.txt file, it seems that it already takes care of the Add new comment link with the line: Disallow: /comment/reply/
To take care of the Subscribe to: This post link, I think I'll add: Disallow: /subscribe/
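Put together, the relevant part of the file would look roughly like the sketch below. The /subscribe/ path is my guess at what sits behind the Subscribe to: This post link on my site; check the actual link address on yours before copying it.

```
# Assumption: these rules belong to the catch-all group that applies to every robot.
User-agent: *
# Keeps polite robots away from the separate comment form pages.
Disallow: /comment/reply/
# My addition; the path is a guess and depends on the subscription module in use.
Disallow: /subscribe/
```

A Disallow rule matches by path prefix, so /comment/reply/ also covers addresses such as /comment/reply/123.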
Note that all changes here are likely to affect search engine behaviour. This means taking utmost care is essential.
rel="nofollow"
This is a link attribute introduced by Google that web-savvy people can use to indicate that a certain link should not be followed. In other words, the link is ignored and does not count for the search engine. So it is ok to place a link to duplicate content if it has that attribute. But the duplicate content still exists, and if it is linked to from an external web site, an SE may very well still analyse it as a duplicate of the original content.
You want to use "nofollow" when you don't trust a link, or when you are, for example, linking to a related site that is on the same IP. SE's may punish you for such links.
The Input formats > [format name] > Configure section has a "Spam deterrent" which adds the nofollow attribute. That appears to be only for content - and then appends "nofollow" to *all* links from content. This may be effective when there are a lot of spam links in content (but you would probably want to get rid of those links entirely in the first place).
Then, it appears that the "Spam deterrent" doesn't always work! I've had instances where it is simply not being added when the Input filter is configured with Url filter for "clickable links". In this specific case the <a> tag is allowed in the Input filter. If <a> is used, the "nofollow" is added, but when the link is written without the html code, the "nofollow" is not added! - I think this may have cost me a PR point or two.
This all needs to be yesfollowed up WIP.
Well, found one Nofollow module, version is 5.x-1.x-dev (only), no go.
Some hands on measures:
Sef for "Login or register to post comments" link
These two links are purely administrative; they serve no user content purpose. They do not need to, and should not, appear in se results. The robots.txt file identifies them as "nogo". Now we need a "nofollow".
Sef for "User's blog" link
Hacked core in blog.module, with code as suggested here (same place): http://drupal.org/node/475378. Replaced line 126 with the following code; the only difference is the ,'rel'=>"nofollow" insertion toward the end of the line: 'attributes' => array('title' => t("Read !username's latest blog entries.", array('!username' => $node->name)),'rel'=>"nofollow") Note that I fully agree with the Drupal principle that code hacks should not be necessary. That's one reason why I chose Drupal. I guess there are exceptions to...
Sef for "Add new comment" link
The "Add new comment" sef situation does not exist when the comments and the comment form are on the same page as the original content. It is also beside the point, I believe, if the "Add new comment" link appears only for registered users.
Sef for standard Menu links
Looked at it. Menu links don't seem to allow html. No idea yet.
Sef for non-standard Menu links
Made my own horizontal menu links in the footer, some of them with "nofollow".
WIP: http://drupal.org/project/elf check the external links filter module