Robots.txt, does it need preceding directory structure?

Milian

Do you need the entire preceding path in robots.txt for it to match?

e.g:

I know that if I add Disallow: /fish to robots.txt, it will block:

/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything

But would it also block the following?

en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
en/fish.php?id=anything

(The first list is taken from the Robots.txt Specifications.) I'm hoping it won't match, as that would make writing this particular robots.txt much easier!

Basically, I want to block many URLs that contain BTS-, such as:

http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob

But I have other pages, in subfolders, that also contain BTS- and that I do not want blocked, such as:

http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy

Thanks for listening

Milian

Yes, this is what I thought, but I wanted a second opinion.

Although I wouldn't actually need a wildcard after BTS-, as leaving the rule open-ended is the same as using a wildcard:

/fish*: Equivalent to "/fish" -- the trailing wildcard is ignored.

https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

Thanks for the link, I'll take a look.

PinpointDesigns

You're right that Disallow: /fish in the robots.txt file blocks all of those initial links. It won't match the URLs under /en/, though, because a rule is matched from the start of the URL path. If you wanted to block those as well, you would need Disallow: /en/fish
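To see how Disallow: /fish treats both sets of URLs, here is a minimal sketch using Python's standard-library urllib.robotparser. The two-line robots.txt and the is_blocked helper are assumptions made up for illustration, and Python's parser is not Googlebot, so treat the result as indicative rather than definitive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt containing only the rule from the question.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /fish",
])


def is_blocked(path, agent="*"):
    """Return True if the rule set disallows crawling this path."""
    # can_fetch() returns True when crawling is allowed, so invert it.
    return not rules.can_fetch(agent, "http://www.example.com" + path)


paths = [
    "/fish",
    "/fish.html",
    "/fish/salmon.html",
    "/fishheads",
    "/en/fish",
    "/en/fish.html",
    "/en/fish/salmon.html",
]

for path in paths:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

In this sketch the first four paths come out as blocked and the /en/ paths as allowed, because the rule is compared against the beginning of the path only.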

For the BTS- pages, you could use a wildcard in the robots.txt file, something along the lines of Disallow: /BTS-*

This _should_ work, but it's always worth checking with a tool to make sure it's all implemented correctly. Distilled did a post a while back about a JS bookmarklet that lets you test whether a page is blocked by robots.txt, which can be found here: http://www.distilled.net/blog/seo/js-bookmarklet-for-checking-if-a-page-is-blocked-by-robots-txt/

In addition to this, you could use the 'blocked URLs' tool in Google Webmaster Tools (GWT) to see whether the pages are successfully blocked once you've implemented the rule.
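Before reaching for those tools, the BTS- rule can be sanity-checked locally. The sketch below is one possible reading of the wildcard behaviour described in Google's robots.txt documentation ('*' matches any run of characters, a trailing '$' anchors the end, and matching starts at the beginning of the path). The rule_matches function is a hypothetical helper written for this example, not any crawler's actual implementation.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches a URL path.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end, and matching always starts at the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only succeeds if the pattern matches from position 0.
    return re.match(regex, path) is not None


paths = [
    "/BTS-something",
    "/BTS-somethingelse",
    "/BTS-thingybob",
    "/somesubfolder/BTS-thingy",
    "/anothersubfolder/BTS-otherthingy",
]

for rule in ("/BTS-", "/BTS-*"):
    for path in paths:
        verdict = "blocked" if rule_matches(rule, path) else "allowed"
        print(rule, path, verdict)
```

Under these assumptions both forms of the rule give the same verdicts: the three top-level BTS- URLs are blocked and the two subfolder URLs are left alone.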

Hope this helps!
