3.3 Structure of the website

When you care about SEO, you should also consider the website as a whole. Every website has a particular structure which is reflected in all its parts, and it is important to choose a design that is search engine friendly. The choices you make are going to stay with you for a long time, and errors will be very time-consuming to repair at a later stage.
For example, you have already seen that it is better to keep CSS and JavaScript in external files. It is also better to restrict the use of PDF and DOC files. The crawlers of the major search engines can handle these formats, but a long document can be turned into several pages of good content instead of a single file. Since every page has its own PageRank, from an SEO point of view this solution is much better than keeping a single file.
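For instance, the <head> of every page can simply reference the external files (the file names here are only placeholders):
<head>
<title>Page title</title>
<link rel="stylesheet" type="text/css" href="style.css">
<script type="text/javascript" src="scripts.js"></script>
</head>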
Another important question is whether it is advisable to use frames.

3.3.1 Frames, tables and CSS

Generally, frames are not as search engine friendly as tables. That is not to say that it is impossible to build a site that uses frames and does well in the rankings; it is just harder than with tables.
The problem is that several frames share the same URL, which is equivalent to saying that there are several pages for a single URL. A spider copes much better when there is exactly one page per URL, and that is why it does not appreciate frames.
If you are determined to use frames, use the <noframes> tag, which is useful both for users whose browser does not support frames and for crawlers. Inside the tag you write all the content of the page your frameset points to, plus links to all of your other content pages.
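For example, a frameset page with a <noframes> fallback could be structured like this (the file and frame names are only placeholders):
<html>
<head><title>My website</title></head>
<frameset cols="20%,80%">
<frame src="menu.htm" name="menu">
<frame src="content.htm" name="content">
<noframes>
<body>
<!-- the same content and navigation links, visible to crawlers and to browsers without frames -->
<p>Main content of the page, as it appears in content.htm.</p>
<p><a href="page1.htm">Page 1</a> <a href="page2.htm">Page 2</a></p>
</body>
</noframes>
</frameset>
</html>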
However, there is still a problem inherent in the design of a frame-based website. Usually the navigation menu and the content are in different frames, which means that if you use the <noframes> tag to describe the content of the page, it will not contain the navigation menu. When a surfer eventually arrives at that page, he will not see the navigation frame, since only the frame corresponding to the content he searched for is loaded. With no means to navigate inside the website, he will soon look for another one.
A partial solution is to use the following JavaScript code:
<script type="text/javascript" language="javascript">
<!--
// if this page is loaded outside its frameset, load the frameset instead
if (top == self) location.replace("FILENAME OF YOUR FRAMESET PAGE");
//-->
</script>
You have to put this script in an external file and include it in all the HTML files. Every time a page is loaded it checks whether the frameset is loaded and, if not, loads it. This is quite good; the main problem is that it points to your entry frameset page, which can be, for instance, the homepage of the website. It is not great that a surfer coming to your website finds himself at the homepage instead of at the page he searched for.
This problem can also be fixed. You have to put this script in the <head> of all your HTML pages:
<script type="text/javascript" language="JavaScript">
<!--
// if the page is loaded outside the frameset, or inside the wrong one,
// reload the frameset and pass the address of the current page to it
if (top == self || (parent.frames[1].name != 'myframeset'))
top.location.href = 'frameset.html?' + location.href;
//-->
</script>
It works like the previous example, but it passes the location of the current page (location.href) to the parent frame. The page with the parent frame is "frameset.html"; in its code you add:
<script type="text/javascript" language="JavaScript">
<!--
// the column split below is only an example value
document.write('<frameset cols="80%,20%">');
// load the page passed in the query string or, if none is set, the default page
document.write('<frame src="' + (location.search ? unescape(location.search.substring(1)) : 'default.htm') + '">');
document.write('<frame src="rightframe.htm" name="myframeset">');
document.write('<\/frameset>');
//-->
</script>
The important line is the one that writes the first frame: it checks whether the query string already contains the location of the sub-frame and, if it does not, loads the default page.
Unfortunately, this solution has some compatibility problems with Opera and, in browsers where JavaScript is disabled, no redirection can take place.

Search engines generally do not have any trouble reading a table-based page. However, it is still true that a spider gives more weight to what is at the top of the page than to what is in the middle. If your website has the very common layout where the navigation menu is on the left side of the page and the content is on the right, the HTML code will contain all the instructions needed to build the menu at the top of the page. It would be much better to have all those instructions at the bottom.
This is still possible, without changing the layout of the page, by using the rowspan attribute of <td>. You have to divide the page into a table of four main areas, with two rows and two columns:
  • Create a very small cell in the first area, at the top-left, and leave it empty;
  • Using rowspan="2", merge the two areas at the top and at the bottom of the right side of the page, then put the main content text in this big cell;
  • In the area at the bottom-left, put the navigation menu.
Using a background colour, the cell in the top-left area will be invisible. The code of this table is the following:
<table border="0" cellspacing="0" cellpadding="0">
<tr>
<td height="1"> </td>
<td rowspan="2" valign="top">
Main Content
</td>
</tr>
<tr>
<td>
Navigation Menu
</td>
</tr>
</table>
However, you still have to give up having a header at the top-right of the page, as most websites have.
Finally, the best solution for creating your layout is to use CSS. Stylesheets are flexible, efficient, recommended by the W3C, easy to modify and search engine friendly. A large part of the code needed to display the information nicely is separated from the content, so that the spiders always read the most important and well-optimized part of the page first.
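As a minimal sketch (the id names and sizes below are arbitrary), the same content-first layout can be obtained with CSS:
<style type="text/css">
/* the content comes first in the HTML source; the menu is moved to the left by CSS */
#content { margin-left: 180px; }
#menu { position: absolute; left: 0; top: 0; width: 170px; }
</style>
<div id="content">Main Content</div>
<div id="menu">Navigation Menu</div>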

3.3.2 Sitemap

A sitemap is just a map of your site: on a single page you show the structure of your site with all its sections and the links to get there.
Sitemaps are important both for users, who will use them to navigate directly to the section they are looking for, and for spiders, which will know where to go and whether new sections have been added to the website. It will take the spider much less time to find all the new pages of the website, particularly if it is big. Sitemaps can also work around temporary problems such as broken internal links, avoiding orphaned pages that cannot be reached in any other way.
Unfortunately, although standard HTML sitemaps are fine for Yahoo and MSN, they are not enough for Google. To have your sitemap analysed by Google as well, you need to write a first HTML file with the sitemap for human readers, and a second, XML one for Googlebot. Obviously, since this is Google's own indication, you do not risk being penalised for duplicate content.
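A minimal XML sitemap, in the format defined by the sitemap protocol, looks roughly like this (the URL and dates are placeholders; only <loc> is mandatory):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2007-01-01</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
</urlset>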
There are two ways to create an XML sitemap: you can download a tool from the Google website, then install, configure and run it; the tool is a Python script which is the actual XML sitemap generator. The alternative is to use a free online sitemap generator: you can find a list of products that support Google sitemaps at http://code.google.com/sm_thirdparty.html. The first choice is more difficult, but you have more control over the output.
After you have produced your sitemap, you can submit it to Google. Currently Yahoo! and MSN do not support XML sitemaps: Yahoo allows you to submit a text file with a list of URLs, while MSN indexes any sitemap that is available online. Anyway, it is the general belief that they will catch up with Google in supporting XML sitemaps, since these are such a powerful SEO tool.