Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Your browser does not seem to support JavaScript. As a result, your viewing experience will be diminished, and you have been placed in read-only mode.

Please download a browser that supports JavaScript, or enable it if it's disabled (i.e. NoScript).

Moz Q&A is closed.

After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.

Robots.txt: Can you put a /* wildcard in the middle of a URL?

Intermediate & Advanced SEO

1028

IHSwebsite last edited by

We have noticed that Google is indexing the language/country directory versions of directories we have disallowed in our robots.txt.

For example:

Disallow: /images/ is blocked just fine

However, once you add our /en/uk/ directory in front of it, there are dozens of pages indexed.

The question is: Can I put a wildcard in the middle of the string, ex. /en/*/images/, or do I need to list out every single country for every language in the robots file. Anyone know of any workarounds?
1 Reply Last reply
Reply Quote 0
irvingw last edited by

Yes, wildcards work, thank god.
1 Reply Last reply
Reply Quote 1

Got a burning SEO question?

Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.

Start my free trial

Browse Questions

View

From

Sorted by

With category

Explore more categories

Related Questions

Robots.txt blocked internal resources Wordpress

Hi all, We've recently migrated a Wordpress website from staging to live, but the robots.txt was deleted. I've created the following new one: User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php However, in the site audit on SemRush, I now get the mention that a lot of pages have issues with blocked internal resources in robots.txt file. These blocked internal resources are all cached and minified css elements: links, images and scripts. Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO? Of course I can change the robots.txt again, but will urls like https://example.com/wp-content/cache/minify/df983.js end up in the index? Thanks for your thoughts!
Intermediate & Advanced SEO | | Mat_C

2
Disallow URLs ENDING with certain values in robots.txt?

Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?
Intermediate & Advanced SEO | | jmorehouse

0
Attack of the dummy urls -- what to do?

It occurs to me that a malicious program could set up thousands of links to dummy pages on a website: www.mysite.com/dynamicpage/dummy123 www.mysite.com/dynamicpage/dummy456 etc.. How is this normally handled? Does a developer have to look at all the parameters to see if they are valid and if not, automatically create a 301 redirect or 404 not found? This requires a table lookup of acceptable url parameters for all new visitors. I was thinking that bad url names would be rare so it would be ok to just stop the program with a message, until I realized someone could intentionally set up links to non existent pages on a site.
Intermediate & Advanced SEO | | friendoffood

1
Should I use meta noindex and robots.txt disallow?

Hi, we have an alternate "list view" version of every one of our search results pages The list view has its own URL, indicated by a URL parameter I'm concerned about wasting our crawl budget on all these list view pages, which effectively doubles the amount of pages that need crawling When they were first launched, I had the noindex meta tag be placed on all list view pages, but I'm concerned that they are still being crawled Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or, will Googlebot/Bingbot also stop crawling that page over time? I assume that noindex still means "crawl"... Thanks 🙂
Intermediate & Advanced SEO | | ntcma

0
Change url structure and keeping the social media likes/shares

Hi guys, We're thinking of changing the url structure of the tutorials (we call it knowledgebase) section on our website. We want to make it shorter URL so it be closer to the TLD. So, for the convenience we'll call them old page (www.domain.com/profiles/profile_id/kb/article_title) and new page (www.domain.com/kb/article_title) What I'm looking to do is change the url structure but keep the likes/shares we got from facebook. I thought of two ways to do it and would love to hear what the community members thinks is better. 1. Use rel=canonical I thought we might do a rel=canonical to the new page and add a "noindex" tag to the old page. In that way, the users will still be able to reach the old page, but the juice will still link to the new page and the old pages will disappear from Google SERP and the new pages will start to appear. I understand it will be pretty long process. But that's the only way likes will stay 2. Play with the og:url property Do the 301 redirect to the new page, but changing the og:url property inside that page to the old page url. It's a bit more tricky but might work. What do you think? Which way is better, or maybe there is a better way I'm not familiar with yet? Thanks so much for your help! Shaqd
Intermediate & Advanced SEO | | ShaqD

0
Weird 404 URL Problem - domain name being placed at end of urls

Hey there. For some reason when doing crawl tests I'm finding pages with the domain name being tacked on the end and causing 404 errors.
For example: http://domainname.com/page-name/http://domainname.com This is happening to all pages, posts and even category type 1. Site is in Wordpress
2. Using Yoast SEO plugin Any suggestions? Thanks!
Intermediate & Advanced SEO | | Jay328

0
Ending URLs in .html versus /

Hi there! Currently all the URLs on my website, even the home page, end it .html, such as http://www,consumerbase.com/index.html Is this bad?
Is there any benefit to this? Should I remove it and just have them end with a forward slash?
If I 301 redirect the old .html URLs to the forward slash URLs, will I lose PA? Thanks!
Intermediate & Advanced SEO | | Travis-W

0
Brackets in a URL String

Was talking with a friend about this the other day. Do Brackets and or Braces in a URL string impact SEO? (I know short human readable etc... but for the sake of conversation has anyone relaised any impacts of these particular Characters in a URL?
Intermediate & Advanced SEO | | AU-SEO

0

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Moz Q&A is closed.

Robots.txt: Can you put a /* wildcard in the middle of a URL?

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Robots.txt blocked internal resources Wordpress

Disallow URLs ENDING with certain values in robots.txt?

Attack of the dummy urls -- what to do?

Should I use meta noindex and robots.txt disallow?

Change url structure and keeping the social media likes/shares

Weird 404 URL Problem - domain name being placed at end of urls

Ending URLs in .html versus /

Brackets in a URL String

Products

Moz Solutions

Free SEO Tools

Resources

About Moz

Why Moz

Get Involved