Robots.txt blocked internal resources in WordPress

Mat_C

Hi all,

We've recently migrated a WordPress website from staging to live, but the robots.txt was deleted. I've created the following new one:

User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php

However, the Semrush site audit now warns me that a lot of pages have issues with internal resources blocked by the robots.txt file. These blocked internal resources are all cached and minified CSS elements: links, images and scripts.
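To see which URLs the rules above actually block, here is a minimal Python sketch. It assumes the longest-match precedence described in Google's robots.txt documentation and RFC 9309 (the most specific matching rule wins, and Allow wins a tie), and it only does plain prefix matching, with no support for * or $ wildcards. The names ROBOTS_RULES and is_allowed, and the theme stylesheet path, are made up for illustration.

```python
from urllib.parse import urlparse

# The rules from the robots.txt above, in file order.
ROBOTS_RULES = [
    ("allow", "/"),
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-includes/"),
    ("disallow", "/wp-content/plugins/"),
    ("disallow", "/wp-content/cache/"),
    ("disallow", "/wp-content/themes/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def is_allowed(url, rules=ROBOTS_RULES):
    """Longest matching prefix wins; Allow wins a tie; no match means allowed."""
    path = urlparse(url).path or "/"
    best_len, allowed = -1, True
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = kind == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


urls = [
    "https://example.com/wp-content/cache/minify/df983.js",
    "https://example.com/wp-content/themes/mytheme/style.css",  # hypothetical path
    "https://example.com/wp-admin/admin-ajax.php",
]
for url in urls:
    print(url, "allowed" if is_allowed(url) else "blocked")
```

Under these assumptions the minified script and the theme stylesheet come out as blocked, while admin-ajax.php stays allowed because its Allow rule is longer than the /wp-admin/ Disallow rule.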

Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO?

Of course I can change the robots.txt again, but will URLs like https://example.com/wp-content/cache/minify/df983.js end up in the index?

Thanks for your thoughts!

Mat_C

Thanks for the answer!

Last question: is /wp-admin/admin-ajax.php an important part that has to be crawled? I found this explanation: https://wordpress.stackexchange.com/questions/190993/why-use-admin-ajax-php-and-how-does-it-work/191073#191073

However, on this specific website there is no HTML at all when I check the source code of that URL, only a single line containing 0.
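A bare 0 is what the endpoint answers when a request names no action, or an action that nothing handles, so seeing it in the browser does not mean the file is unused. The following is a simplified Python model of that dispatch, not WordPress's actual PHP; handle_admin_ajax and the load_more_posts action are hypothetical names used only for illustration.

```python
def handle_admin_ajax(params, handlers):
    """Return the response body for a request to the AJAX endpoint (simplified model)."""
    action = params.get("action")
    handler = handlers.get(action) if action else None
    if handler is None:
        # No action given, or nothing registered for it: the body is just "0".
        return "0"
    return handler(params)


# Hypothetical handler that a theme or plugin might register.
handlers = {"load_more_posts": lambda params: '{"posts": []}'}

print(handle_admin_ajax({}, handlers))                             # plain visit in a browser
print(handle_admin_ajax({"action": "load_more_posts"}, handlers))  # call made by front-end script
```

Front-end scripts that call the endpoint with an action get a real response, which is why the Allow line for it is commonly kept.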

JordanLowry

I would leave all the disallows out except for the /wp-admin/ section. For example, I'd rewrite the robots.txt file to read:

User-agent: *
Disallow: /wp-admin/
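As a quick sanity check of that shorter file, here is a small sketch using Python's standard urllib.robotparser. That parser applies rules in file order rather than by longest match, which makes no difference here because there is only one rule. The crawlable helper is a made-up name.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(url, agent="Googlebot"):
    """True if the parsed robots.txt lets the given user agent fetch the URL."""
    return parser.can_fetch(agent, url)


print(crawlable("https://example.com/wp-content/cache/minify/df983.js"))  # True
print(crawlable("https://example.com/wp-admin/admin-ajax.php"))           # False
```

Note that without an Allow line, /wp-admin/admin-ajax.php falls under the /wp-admin/ rule as well.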

Also, you kind of want Google to index your cached content. In the event your servers go down it will still be able to make your content available.

I hope that helps. Let me know how that works out for you!

Mat_C

Thanks for the clear answer.

I've changed the robots.txt to:

User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php

This should avoid problems with (parts of) the cached content not being indexed.

Or should I leave all the Disallows out?

JordanLowry

Hey there --

Blocking resources with the robots.txt file prevents search engines from crawling content. The noindex tag is better suited for preventing content from being indexed.
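The difference matters because a noindex directive lives in the page itself, so a crawler has to be allowed to fetch the page before it can see it. A minimal Python sketch of reading that directive from a page's HTML follows; RobotsMetaParser and has_noindex are names invented for this example, and it only looks at the generic robots meta tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def has_noindex(html):
    """True if the HTML carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```

If the same URL were disallowed in robots.txt, the crawler would never download this HTML and the directive would go unread.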

Previous best practice was to block access to the /wp-includes/ and /wp-content/ directories, among others, but that's no longer necessary.

Today, Google will fetch all your styling and JavaScript files so they can render your pages completely. Search engines now try to understand your page's layout and presentation as a key part of how they evaluate quality.
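To illustrate what an audit tool is flagging, here is a rough Python sketch that lists the stylesheets, scripts and images a page references and marks the internal ones that sit under a disallowed path. BLOCKED_PREFIXES copies the asset-related Disallow lines from the original robots.txt and is tested as plain prefixes (Allow rules are ignored). ResourceCollector, blocked_resources and the sample file names other than df983.js are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Asset-related Disallow prefixes from the original robots.txt.
BLOCKED_PREFIXES = (
    "/wp-includes/",
    "/wp-content/plugins/",
    "/wp-content/cache/",
    "/wp-content/themes/",
)


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of stylesheets, scripts and images on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and "stylesheet" in (attr.get("rel") or "").lower().split():
            ref = attr.get("href")
        elif tag in ("script", "img"):
            ref = attr.get("src")
        else:
            ref = None
        if ref:
            self.resources.append(urljoin(self.base_url, ref))


def blocked_resources(html, base_url):
    """Return same-host resources whose path starts with a blocked prefix."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return [
        url for url in collector.resources
        if urlparse(url).netloc == host
        and urlparse(url).path.startswith(BLOCKED_PREFIXES)
    ]


sample = """
<link rel="stylesheet" href="/wp-content/cache/minify/a1b2c.css">
<script src="/wp-content/cache/minify/df983.js"></script>
<img src="/wp-content/uploads/logo.png">
"""
for url in blocked_resources(sample, "https://example.com/some-page/"):
    print("blocked:", url)
```

In this sample the minified stylesheet and script are reported, while the image under /wp-content/uploads/ is not, since no rule covers that directory.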

So yes, blocking these resources might have some impact on your SEO.

Also, if you're using a plugin to cache content, you should want Google to crawl your cached content. And in my experience, Googlebot does a good job of not indexing /wp-content/ sections.

So your example URL, https://example.com/wp-content/cache/minify/df983.js, shouldn't end up in their index.

Hope this helps some.
