Monday, June 25, 2012

URLs and SEO: Various Strategies for URL File Names



Quite a long time ago we discussed best practices for URL structure; that old post needs both an update and more detail. So I decided to start a new post summarizing and discussing various strategies for URL file naming.

1. Why do we care?

The URL is undoubtedly one of the most important factors affecting both SEO and usability.
It affects:
  • Rankings (placing keywords in the file path is one of the most effective ways to make the keywords more prominent);
  • Click-through: a “clear”, “readable” URL can be another reinforcement signal for the user to click it;
  • Usability: a good “obvious” URL helps the user understand what the page is about even before entering the page.

2. Keywords in the file name

There is no doubt that keywords in the URL matter (so far they even matter a lot). However this doesn’t mean that you need to stuff your URLs with only keywords. The best practices would be:
  • Keywords in the file path occur naturally;
  • Keywords in the file path help make the URL easier to comprehend and remember;
  • URLs do not consist of only keywords: here’s a good point expressed by Onreact in his post on top 10 fatal URL design mistakes:

    Recently bloggers tend to shorten their URLs in as much as their posting becomes totally boring. I won’t click /2008/06/27/google if I see only the URLs (like, say, in an email) but I will click google-files-for-bankrupcy

3. Word separators

While Google has become much smarter when it comes to identifying separate words in the file path, a dash is still considered the best choice:
| Word separator | Disadvantages | Example |
|---|---|---|
| Space | URL-encoded as %20 (makes the URL hard to read). This may also prevent sharing the URL on some social bookmarking services. | /word1%20word2 |
| & | URL-encoded as %26 (makes the URL hard to read). This may also prevent sharing the URL on some social bookmarking services. | /word1%26word2 |
| Comma (,) or period (.) | Abused by spammers | /word1.word2 OR /word1,word2 |
| Underscore | Traditionally not seen by search engines as a word separator (this is slowly changing now) | /word1_word2 |
| Hyphen | None | /word1-word2 |
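
To make the separator advice concrete, here is a minimal sketch of a helper that turns a page title into a hyphen-separated file name. The function name `slugify` and the exact cleanup rules are illustrative assumptions, not something prescribed by any search engine; most CMSs ship their own equivalent.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated file name.

    Hypothetical helper for illustration only.
    """
    # Decompose accented characters and drop anything that is not plain ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters (spaces, &, commas,
    # periods, underscores) with a single hyphen.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    # Trim hyphens left over at either end.
    return hyphenated.strip("-")


print(slugify("Google Files for Bankruptcy"))
print(slugify("word1_word2 & word3, word4"))
```

Every separator from the table above (space, &, comma, period, underscore) ends up as a single hyphen, so the resulting file name needs no URL encoding.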

4. URL length

While it is still considered the best practice to stick to shorter URLs, the factor is becoming less and less important:
  • Usability: Very few people manually type a URL in the address bar. They either use bookmarks or search history (e.g. the Firefox / Chrome smart address bar that suggests URLs as you start typing the title of the page) or just use Google to find the page again;
  • SEO: Google can handle very long URLs (though it is still rumored that it prefers short URLs, I personally don’t see any big difference);
  • Click-through: Google now shortens long URLs in SERPs smartly: it only shows the parts which contain the search term or even replaces the URL with breadcrumbs.

5. Case sensitivity

We have discussed this before: URLs are case sensitive. If two versions of a URL are live and linked to (which is only possible if your site is on a Windows server), both the lower-case and upper-case versions return a 200 OK status when requested. This will cause some duplicate content issues, but Google will most likely be able to figure that out (by choosing one of them). What’s more important is that you are wasting plenty of link juice by spreading it between the two versions.
It is recommended to always choose a lowercase pattern (because there will always be people who link to the more traditional, plain-text version) and to use a 301 status code to redirect all other versions (capitalized, upper-case, etc.) to the lowercase one.
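
The redirect rule can be sketched in a few lines. This is only an illustration of the logic, not a drop-in server configuration: the function name `lowercase_redirect` is an assumption, and only the path is lowercased because query strings may carry case-sensitive values.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) if the URL path contains upper-case letters.

    Returns None when the URL is already in its lowercase form.
    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    lowered_path = parts.path.lower()
    if lowered_path == parts.path:
        # Already lowercase: serve the page normally, no redirect needed.
        return None
    # Rebuild the URL with the lowercase path, leaving the query untouched.
    target = urlunsplit(
        (parts.scheme, parts.netloc, lowered_path, parts.query, parts.fragment)
    )
    return 301, target


print(lowercase_redirect("http://domainname.com/Page1"))
print(lowercase_redirect("http://domainname.com/page1"))
```

In practice the same rule would usually live in the web server or CMS routing layer; the point is simply that every non-lowercase variant answers with a single 301 to one canonical URL.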

6. URL Extensions

We’ve discussed URL extensions previously and come to the conclusion that it doesn’t matter too much whether a URL has one or not. There are some pros and cons (listed below) but these are rather minor arguments:
Argument for using an extension: intuitive browsing. Seeing .html, people may understand that this is a page with content; seeing /, people may assume that it is a folder. Although there is no direct impact on rankings, a URL extension makes it clear both to a user and to a search bot whether this is a page or a subdirectory.
Arguments against using an extension:
  • It reduces the overall URL length. Not that the 4 or 5 characters in .html or .php really add a lot, but sometimes small things can make a difference.
  • No problems with any technology changes (moving to a new CMS, etc.): no need to redirect the old URLs to the new ones.

URL Capitalization and SEO: How Much Does It Matter?

http://www.searchenginejournal.com/url-capitalization-and-seo/12667/


Many people are really surprised to learn that URLs are actually case sensitive (unlike the actual domain name). Simply put, while it doesn’t really matter how you spell your domain name (domainname.com or DomainName.com or DOMAINNAME.com), it DOES make a huge difference how you spell your URLs (domainname.com/page1 or domainname.com/Page1).
Let’s say you have started promoting the capitalized version (domainname.com/Page1) because you think it looks prettier and is easier to remember. Let me explain what may happen (I guess a table is better to explain this because you can always use it as a cheatsheet):

| Server | Header response for domainname.com/page1 | Header response for domainname.com/Page1 | Google’s reaction |
|---|---|---|---|
| Windows-based | 200 | 200 | Both URLs will be indexed and ranked. Obviously, this will cause some duplicate content issues but Google will most likely be able to figure that out (by choosing one of them). What’s more important is that you are wasting plenty of link juice spreading it between the two versions. |
| Linux / Unix-based | 200 | 404 | Google will try to index both but will drop the one returning 404. Again, you are wasting your link juice in this situation. What’s also important, you confuse your visitors by sending them to a non-existent page. |

So what’s the best way to handle the problem?
  • While most SEOs will recommend sticking to only one version, I recommend always choosing the lowercase pattern (because there will always be people who link to the more traditional, plain-text version);
  • If for some reason you start seeing URLs with capital letters getting into the index (someone linked to one, or you changed your content management system and it capitalized some URLs), use a 301 redirect to send people, search crawlers and link value to the non-capitalized URLs and avoid any problems.
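
To spot the problem early, you can scan a list of known URLs (from a crawl or server log, for example) for entries that differ only by case. The sketch below assumes the URLs are already collected into a plain list; the function name `find_case_variants` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_case_variants(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group URLs that are identical except for letter case.

    Returns a mapping from the lowercase form to the sorted list of
    variants, keeping only groups with more than one distinct spelling.
    """
    groups = defaultdict(set)
    for url in urls:
        groups[url.lower()].add(url)
    return {
        lowered: sorted(variants)
        for lowered, variants in groups.items()
        if len(variants) > 1
    }


crawled = [
    "domainname.com/page1",
    "domainname.com/Page1",
    "domainname.com/about",
]
print(find_case_variants(crawled))
```

Each group it reports is a candidate for a 301 redirect to the lowercase form.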

SEO Best Practices for URL Structure




I’ve decided to put together a short all-in-one guide to summarize what we know about SEO for URLs. And if you have something to add, please do. So here we go:
[Google] algorithms typically will just weight those words less and just not give you as much credit.”

Here is one more piece of evidence in favor of short URLs: recent research shows that short URLs within Google SERPs get clicked twice as often as long ones. So by sticking to short URLs you get both better rankings and better click-through.
Short URLs will also help with direct type-ins of URLs (if anyone still does that instead of using Google).
  • Dashes are better than underscores. Although Google has no individual preference (meaning you won’t be penalized for either version), dashes are preferable as Google “sees” each hyphenated word as an individual one:
So if you have a URL like word1_word2, Google will only return that page if the user searches for word1_word2 (which almost never happens). If you have a URL like word1-word2, that page can be returned for the searches word1, word2, and even “word1 word2”.
  • Unlike a domain name, a URL is case sensitive, meaning that if for any reason (your choice or your CMS) you stick to an upper-case version, remember that this can cause a few issues: people are most likely to link to the standard lower-case one and you might both lose link juice and suffer from duplicate content issues.
  • Moving to a static URL structure: my (and actually not only my) favorite tactic is to use a 301 redirect only for the most powerful (in terms of linking and traffic) pages and leave all others to be handled via 404.
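
The selective-redirect tactic from the last point can be pictured as a small lookup table: only the pages worth preserving get an entry, and everything else falls through to a 404. This is a rough sketch; the function name and the example paths are made up for illustration.

```python
from typing import Dict, Optional, Tuple


def resolve_legacy_url(
    path: str, redirects: Dict[str, str]
) -> Tuple[int, Optional[str]]:
    """Return (301, new_path) for mapped legacy URLs, (404, None) otherwise."""
    target = redirects.get(path)
    if target is not None:
        return 301, target
    return 404, None


# Hypothetical map: only the most linked-to old pages are listed.
top_pages = {
    "/index.php?page=12": "/seo-best-practices-for-url-structure",
}

print(resolve_legacy_url("/index.php?page=12", top_pages))
print(resolve_legacy_url("/index.php?page=999", top_pages))
```

Keeping the map short is the whole point of the tactic: it stays easy to maintain while still passing on the links and traffic of the pages that matter.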

Saturday, June 23, 2012

Installing Nginx With PHP5 And MySQL Support On CentOS 5.6

http://www.howtoforge.com/installing-nginx-with-php5-and-mysql-support-on-centos-5.6


Maven Repositories on GitHub - By http://chkal.blogspot.com




http://chkal.blogspot.com/2010/09/maven-repositories-on-github.html

https://github.com/chkal/jsf-maven-util/tree/gh-pages/repository

http://chkal.github.com/jsf-maven-util/repository/

https://help.github.com/articles/user-organization-and-project-pages


The basic idea of hosting Maven repositories on GitHub is to use GitHub Pages. This GitHub feature offers a simple but powerful way for creating and hosting web sites on their infrastructure. Fortunately this is all we need to create Maven repositories.



The first step is to create a separate clone of your GitHub repository in a directory next to your primary local repository:

$ pwd
/home/ck/workspace/jsf-maven-util
$ cd ..
$ git clone git@github.com:chkal/jsf-maven-util.git jsf-maven-util-pages
$ cd jsf-maven-util-pages


The GitHub Pages web site must be created as a branch named gh-pages in your repository. So let's create this branch and empty it. Refer to the GitHub Pages Manual if you are interested in the exact meaning of these commands.



$ git symbolic-ref HEAD refs/heads/gh-pages
$ rm .git/index
$ git clean -fdx


We will place the Maven repository in a subdirectory of this new branch:



$ mkdir repository




We also want to have a pretty directory listing. Unfortunately GitHub Pages doesn't have native support for this. So we will create our own directory listing with a simple bash script.
Create a file named update-directory-index.sh in the root of the new branch (next to the repository directory). This script will walk recursively into the repository directory and create index.html files in each subdirectory. Please be careful when using this script as it overwrites all existing index.html files it finds.




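#!/bin/bash

# Walk every directory below ./repository and (re)generate its index.html.
for DIR in $(find ./repository -type d); do
  (
    # Page header
    echo -e "<html>\n<body>\n<h1>Directory listing</h1>\n<hr/>\n<pre>"
    # One link per entry, skipping "./" and the index.html file itself
    ls -1pa "${DIR}" | grep -v "^\./$" | grep -v "^index\.html$" | awk '{ printf "<a href=\"%s\">%s</a>\n",$1,$1 }'
    # Page footer
    echo -e "</pre>\n</body>\n</html>"
  ) > "${DIR}/index.html"
done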
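
If bash is not available (on Windows, for example), the same idea can be sketched in Python. This is an assumed equivalent rather than part of the original setup: the function name `write_directory_indexes` is hypothetical, and like the shell script it overwrites any index.html it finds.

```python
import html
import os


def write_directory_indexes(root: str) -> int:
    """Write an index.html into root and every directory below it.

    Returns the number of index files written. Existing index.html
    files are overwritten.
    """
    written = 0
    for current_dir, subdirs, files in os.walk(root):
        # Parent link first, then subdirectories, then files (minus the index).
        entries = (
            ["../"]
            + sorted(name + "/" for name in subdirs)
            + sorted(name for name in files if name != "index.html")
        )
        links = "\n".join(
            '<a href="{0}">{1}</a>'.format(
                html.escape(name, quote=True), html.escape(name)
            )
            for name in entries
        )
        page = (
            "<html>\n<body>\n<h1>Directory listing</h1>\n<hr/>\n<pre>\n"
            + links
            + "\n</pre>\n</body>\n</html>\n"
        )
        index_path = os.path.join(current_dir, "index.html")
        with open(index_path, "w", encoding="utf-8") as handle:
            handle.write(page)
        written += 1
    return written
```

Calling `write_directory_indexes("repository")` from the root of the gh-pages checkout would play the same role as running the shell script.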




Congratulations! Your repository is ready. Now you will have to modify the distributionManagement section of your pom.xml to let Maven deploy your artifacts to the new repository. Go back to your primary repository clone and edit your pom.xml:



<distributionManagement>
  <repository>
    <id>gh-pages</id>
    <url>file:///${basedir}/../jsf-maven-util-pages/repository/</url>
  </repository>
</distributionManagement>


Now you are ready to deploy your first artifact to the repository:



$ mvn -DperformRelease=true clean deploy


You will see that Maven copies the artifacts to your local checkout of the GitHub Pages branch. After Maven has finished you'll have to update the directory listings, commit the changes made to the repository and push them to GitHub:



$ cd ../jsf-maven-util-pages/
$ ./update-directory-index.sh
$ git add -A
$ git commit -m "Deployed my first artifact to GitHub"
$ git push origin gh-pages



Now let's check the result. Please note that the first publish may take some time to appear on the web server.
Looks great, doesn't it? :-)
If you want to use your repository in another project, just add the following repository entry to the pom.xml:

<repository>
  <id>jsf-maven-util-repo</id>
  <name>jsf-maven-util repository on GitHub</name>
  <url>http://chkal.github.com/jsf-maven-util/repository/</url>
</repository>