The effects of filenames in SEO
When search engines crawl your website, they normally start with your main page (index.html, default.html, etc.) and follow your site navigation to find the other pages. They can also arrive via links from other websites, or from a published sitemap. However they get to your website, the filenames matter for a few different reasons that are easy to overlook or never consider at all.
A site's URL structure should be as simple as possible. Consider organizing your content so that URLs are constructed logically and in a manner that is most intelligible to humans (when possible, readable words rather than long ID numbers). For example, if you're searching for information about domain names, a URL like http://guardianhost.com/hosting-design/index.php?page=domains will help you decide whether to click that link. A brief look at our example shows that the page likely deals with domains, hosting, and design, which is true. A URL like http://www.example.com/index.php?id_sezione=360&sid=3a5ebc944f41daa6f849f730f1 is much less appealing to users.
You might also consider using punctuation in your URLs. The URL http://www.example.com/red-apple.html is much more useful to search engines than http://www.example.com/redapple.html. Why? Because a search engine doesn't know that "red" and "apple" are an adjective and a noun; it treats "redapple" as a single word. To overcome this, we recommend that you use hyphens (-) instead of underscores (_) in your URLs to indicate separate words that have relevant meaning to your page's content.
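As a small sketch of this convention, here is a hypothetical helper (not part of any hosting platform or CMS) that turns a page title into a hyphen-separated, search-friendly filename:

```python
import re

def slugify(title):
    """Convert a page title into a lowercase, hyphen-separated filename.

    Hyphens let search engines index each word separately, whereas
    "redapple" or "red_apple" may be treated as a single token.
    """
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # runs of non-alphanumerics become one hyphen
    return slug.strip("-") + ".html"

print(slugify("Red Apple"))  # red-apple.html
```

A content management system would typically apply something like this automatically when generating page URLs from titles.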
Why you should limit the length and complexity of your URLs and filenames
Overly complex URLs - those containing multiple parameters, session IDs, and irrelevant words - can cause problems for spiders and crawlers by creating unnecessarily high numbers of URLs that point to identical or similar content on your site. Duplicate pages are a "no no" and will hurt your rankings. Another problem with these overly complex URLs is that spiders and crawlers may consume much more bandwidth than necessary, or may be unable to completely index all the content on your site. If they encounter a problem with bandwidth usage or complexity, they tend to give up crawling the remainder of your site.
Common causes of this problem
Overly complex and unnecessarily high numbers of URLs can be caused by a number of issues. These include:
- Additive filtering of a set of items - Many sites provide different views of the same set of items or search results, often allowing the user to filter this set using defined criteria (for example: show me hotels on the beach). When filters can be combined in an additive manner (for example: hotels on the beach and with a fitness center), the number of URLs (views of data) on the site explodes. Creating a large number of slightly different lists of hotels is redundant, because most spiders need to see only a small number of lists from which they can reach the page for each hotel. For example:
- Hotel properties at "value rates": http://www.example.com/hotel-search-results.jsp?Ne=292&N=461
- Hotel properties at "value rates" on the beach: http://www.example.com/hotel-search-results.jsp?Ne=292&N=461+4294967240
- Hotel properties at "value rates" on the beach and with a fitness center:
- Dynamic generation of documents. This can result in small changes because of counters, timestamps, or advertisements.
- Problematic parameters in the URL. Session IDs, for example, can create massive amounts of duplication and a greater number of URLs.
- Sorting parameters. Some large shopping sites provide multiple ways to sort the same items, resulting in a much greater number of URLs. For example: http://www.example.com/results?search_type=search_videos&search_query=tpb&search_sort=relevance&search_category=25
- Irrelevant parameters in the URL, such as referral parameters. For example:
- Calendar issues. A dynamically generated calendar might generate links to future and previous dates with no restrictions on start or end dates. For example:
- Broken relative links. Broken relative links can often cause infinite spaces. Frequently, this problem arises because of repeated path elements. For example:
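One way to see how these causes multiply URLs for the same content is to canonicalize: drop parameters that don't change the page and sort the rest, so the variants collapse to a single form. A minimal sketch (the list of ignored parameters is an assumption for illustration):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters assumed not to affect page content (illustrative list only).
IGNORED_PARAMS = {"sid", "sessionid", "ref", "utm_source"}

def canonicalize(url):
    """Drop irrelevant query parameters and sort the remainder, so URL
    variants that serve identical content map to one canonical URL."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(query) if k not in IGNORED_PARAMS]
    params.sort()
    return urlunsplit((scheme, netloc, path, urlencode(params), ""))

a = canonicalize("http://www.example.com/results?sort=price&category=25&sid=3a5e")
b = canonicalize("http://www.example.com/results?category=25&sort=price")
print(a == b)  # True: both variants collapse to the same canonical URL
```

A crawler (or a site's own duplicate-detection logic) can use a mapping like this to recognize that two superficially different URLs are really one page.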
Steps to resolve these problems
To avoid potential problems with URL structure, we recommend the following:
- Consider using a robots.txt file to block spiders' access to problematic URLs. Typically, you should consider blocking dynamic URLs, such as URLs that generate search results, or URLs that can create infinite spaces, such as calendars. Wildcard patterns in your robots.txt file (supported by the major search engines, though not part of the original robots.txt standard) allow you to easily block large numbers of URLs.
- Wherever possible, avoid the use of session IDs in URLs. Consider using cookies instead.
- Whenever possible, shorten URLs by trimming unnecessary parameters.
- If your site has an infinite calendar, add a rel="nofollow" attribute to links to dynamically created future calendar pages.
- Check your site for broken relative links.
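Putting the first recommendation into practice, a sample robots.txt might look like the following. The paths are hypothetical, and the * wildcard works with the major search engines even though it is not part of the original robots.txt standard:

```
User-agent: *
# Block dynamically generated search-result pages
Disallow: /hotel-search-results.jsp
# Block any URL carrying a session ID parameter
Disallow: /*?*sid=
# Block the infinite calendar space
Disallow: /calendar/
```

Remember that robots.txt only discourages crawling; it does not remove already-indexed pages or hide them from users.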
These recommendations come from all of the big three search engine providers, and it makes sense to us to listen and adapt our filename structures to best help their spiders crawl our websites.
We hope you agree!