Are robots.txt wildcards still valid? If so, what is the proper syntax for setting this up?

mkhGT

I've got several URLs that I need to disallow in my robots.txt file. For example, I've got several documents that I don't want indexed, and filters that are getting flagged as duplicate content. Rather than typing in thousands of URLs, I was hoping that wildcards were still valid.

DarinPirkey

Great job. I just wanted to add this from Google Webmasters

http://googlewebmastercentral.blogspot.com/2008/06/improving-on-robots-exclusion-protocol.html

and this from Google Developers

https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

Adam_Cochran

Yup, wildcard syntax is indeed still valid. However, I can only confirm that the big three (Google, Yahoo and Bing) actively observe it. Other, secondary search engines may not.

In your case you are probably looking for a syntax along the lines of:

User-agent: *
Disallow: /*.pdf$

This tells every user agent to stay away from any URL that ends in .pdf. The $ ties the match to the end of the URL, so a URL ending in .pdf.txt would not be blocked in this case.

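Keep an eye on how you block them. Rules are matched as prefixes, so a rule without a trailing slash, such as Disallow: /docs, blocks the whole /docs/ directory and any other path that begins with /docs, not just a single file. Likewise, leaving off the strict end-of-URL symbol ($) means the pattern can match partway through a path, so URLs that merely contain the phrase get blocked rather than just filenames that end with it.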
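To make those pitfalls concrete, here is a minimal Python sketch of how a crawler might evaluate such patterns. It assumes the wildcard semantics the big engines document (a * matches any run of characters, a trailing $ anchors the end of the URL, and everything else is a prefix match). The helper names robots_pattern_to_regex and is_blocked are made up for illustration; this is not any search engine's actual matcher, and it ignores Allow rules and rule precedence.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the match at the end of the URL path, and anything else is
    matched literally as a prefix starting at the beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then rejoin them with '.*' where '*' stood.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + (r"\Z" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if a Disallow rule with this pattern would cover the path."""
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    checks = [
        ("/*.pdf$", "/docs/report.pdf"),       # ends in .pdf
        ("/*.pdf$", "/docs/report.pdf.txt"),   # does not end in .pdf
        ("/*.pdf", "/docs/report.pdf.txt"),    # no '$', so a partial match counts
        ("/private", "/private-files/a.html"), # no trailing slash, prefix match
        ("/private/", "/private-files/a.html"),
    ]
    for pattern, path in checks:
        print(f"Disallow: {pattern:<12} {path:<26} blocked={is_blocked(pattern, path)}")
```

Running a list of your real URLs through something like this before publishing the file is a cheap way to see whether a rule catches more (or less) than you intended. Google's own robots.txt testing tools remain the authority on how Googlebot reads the file.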

Also keep in mind that if you are using URL rewriting, this may play into how you need to block things. And remember that disallowing access in robots.txt does NOT prevent search engines from indexing the data; it is up to them whether they honor the request. So if it is very important to keep the files away from search engines, robots.txt may not be the way to do it.
