Robots.txt -- Joomla Best Practices

All sites should have a robots.txt. Is it better to block search engines from more directories or fewer? As with most things, it is about tradeoffs. Here is what we learned.

by Steven Johnson | Posted: July 28, 2015 | Updated: August 6, 2015

Today Google Webmaster Tools emailed me about some CSS and JS resources being blocked on my sites. This notice is related to the mobile-friendly (responsive) signals that Google is now giving higher priority. Google wants to be able to see the CSS and JS so it can view the site as the user does and determine whether the page is mobile friendly.

Here is a good post on the topic: https://www.ostraining.com/blog/general/google-mobile-robots/
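
If you want to spot this kind of problem before Google emails you, you can check a page's CSS and JS against your robots.txt rules yourself. Below is a minimal Python sketch that uses only the standard library. The sample HTML, the asset paths, and the three-rule robots.txt are made up for illustration, and the script only looks at root-relative URLs (ones that start with / on your own site).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class AssetCollector(HTMLParser):
    """Collect stylesheet and script URLs referenced by a page."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.assets.append(attrs["href"])


def blocked_assets(html, robots_txt, user_agent="Googlebot"):
    """Return the root-relative CSS/JS paths that robots_txt disallows."""
    collector = AssetCollector()
    collector.feed(html)

    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    blocked = []
    for url in collector.assets:
        parts = urlparse(url)
        if parts.netloc:
            # Hosted elsewhere (e.g. a CDN); our robots.txt does not apply.
            continue
        if not rules.can_fetch(user_agent, parts.path):
            blocked.append(parts.path)
    return blocked


# Hypothetical rules and page, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
"""

SAMPLE_HTML = """\
<html><head>
<link rel="stylesheet" href="/templates/mytemplate/css/template.css">
<link rel="stylesheet" href="/modules/mod_example/css/style.css">
<script src="/media/jui/js/jquery.min.js"></script>
<script src="https://cdn.example.com/lib.js"></script>
</head><body></body></html>
"""

if __name__ == "__main__":
    for path in blocked_assets(SAMPLE_HTML, SAMPLE_ROBOTS):
        print("blocked:", path)
```

Any path it prints is a CSS or JS file that a crawler obeying those rules would not be able to fetch.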

 

In the past (for example, on Joomla 2.5) we typically went with the Joomla default robots.txt. Sometimes we would allow crawling of /images/ by removing the Disallow: /images/ line from the robots.txt file.

In Joomla 3.4 many improvements have been made to the robots.txt file, including removing the /images/ rule from the defaults. If you want to see the current version of the Joomla robots.txt file, check out:

https://github.com/joomla/joomla-cms/blob/master/robots.txt.dist

 

# If the Joomla site is installed within a folder such as at
# e.g. www.example.com/joomla/ the robots.txt file MUST be
# moved to the site root at e.g. www.example.com/robots.txt
# AND the joomla folder name MUST be prefixed to the disallowed
# path, e.g. the Disallow rule for the /administrator/ folder
# MUST be changed to read Disallow: /joomla/administrator/
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/orig.html
#
# For syntax checking, see:
# http://tool.motoricerca.info/robots-checker.phtml

User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
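
The comments at the top of that file describe what to do when Joomla lives in a subfolder: move robots.txt to the site root and prefix every disallowed path with the folder name. As a rough sketch of that step (the function name is ours, and "joomla" is just the example folder from the comments), something like this would rewrite the rules:

```python
def prefix_disallow_rules(rules_text, folder):
    """Prefix every Allow/Disallow path with the subfolder Joomla is installed in."""
    prefix = "/" + folder.strip("/")
    out = []
    for line in rules_text.splitlines():
        field, sep, value = line.partition(":")
        path = value.strip()
        # Only rewrite real rule lines; comments and User-agent lines pass through.
        if field.strip().lower() in ("allow", "disallow") and path.startswith("/"):
            out.append(f"{field}{sep} {prefix}{path}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /administrator/\nDisallow: /cache/\n"
    print(prefix_disallow_rules(sample, "joomla"), end="")
```

The output still has to be saved as robots.txt in the site root, not inside the Joomla folder.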

 

After looking at several sites, we found a few directories whose Disallow rules routinely needed to be removed from the file. They are:

 

# Disallow: /bin/
# Disallow: /components/
# Disallow: /libraries/
# Disallow: /modules/
# Disallow: /plugins/ 
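
Dropping those rules by hand is easy enough on one site, but if you look after many sites a small script can help. Here is a minimal sketch, assuming you start from the Joomla default rules shown earlier; the function and constant names are our own.

```python
# Rules from the Joomla default robots.txt shown earlier in this post.
DEFAULT_RULES = """\
User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
"""

# Directories we found we routinely needed to unblock.
UNBLOCK = {"/bin/", "/components/", "/libraries/", "/modules/", "/plugins/"}


def strip_disallow_rules(rules_text, unblock):
    """Return rules_text without the Disallow lines for the given paths."""
    kept = []
    for line in rules_text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip() in unblock:
            continue  # drop this rule so crawlers can reach the directory
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    print(strip_disallow_rules(DEFAULT_RULES, UNBLOCK), end="")
```

Comment lines and the User-agent line are left alone, so the same function can be run over a full robots.txt file.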

 

Here is our cleaned-up file:

 

# If the Joomla site is installed within a folder such as at
# e.g. www.example.com/joomla/ the robots.txt file MUST be
# moved to the site root at e.g. www.example.com/robots.txt
# AND the joomla folder name MUST be prefixed to the disallowed
# path, e.g. the Disallow rule for the /administrator/ folder
# MUST be changed to read Disallow: /joomla/administrator/
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/orig.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /logs/
Disallow: /tmp/

 

 

 

 

 Helpful Resources

https://docs.joomla.org/Robots.txt_file

http://joomlaseo.com/checklist/robots-txt-for-search-engines

http://joomlaseo.com/blog/robots-txt-do-not-block-css-and-javascript

 

 

 

 

 

 

 

 

 

 

 

 

 

 
