Multiple robots.txt files on server

mjukhud

Hi!

I have previously hired a developer to put up my site and noticed afterwards that he did not know much about SEO. This lead me to starting to learn myself and applying some changes step by step.

One of the things I am currently doing is inserting sitemap reference in robots.txt file (which was not there before). But just now when I wanted to upload the file via FTP to my server I found multiple ones - in different sizes - and I dont know what to do with them? Can I remove them? I have downloaded and opened them and they seem to be 2 textfiles and 2 dupplicates. Names:

robots.txt (original dupplicate)
robots.txt-Original (original)
robots.txt-NEW (other content)
robots.txt-Working (other content dupplicate)

Would really appreciate help and expertise suggestions. Thanks!

peakdistrictseo

So what's the best policy if a site uses an e-commerce platform like Magento, which has a robots file, but also has a Wordpress blog installed to another folder. eg: /blog and uses a plugin like YOAST which generated a robots file of the Wordpress installation.

Then you have 2 robots files, is this detrimental or no big deal?

mjukhud

Thanks very much for the help!

mjukhud

Thanks very much for the help!

seoman10

Keep a backup and remove them.

Search engines are only going to look at the file which is exactly called robots.txt variations of file name will be ignored.

Do make sure the entries are correct in the main one though, you don't want Google crawling admin pages or other confidential areas of the site.

mjukhud

Hi, thanks for the answer and help!

Well, I only have one domain that has a webpage and no subdomains active (no blog-subdomain or similar) - so how can I configure that to the situation? Can I just remove all and upload the one I want, maybe?

Mustansar

That's a good question, EMS. The robots.txt protocol can get kind of
confusing when you think about it too long, and it sounds like you've
thought about this a bit. However, in this case, it might help to
look at robots.txt from the perspective of the spider.

When a spider finds a URL, it takes the whole domain name (everything
between 'http://' and the next '/'), then sticks a '/robots.txt' on
the end of it and looks for that file. If that file exists, then the
spider should read it to see where it is allowed to crawl.

In your case, Googlebot, or any other spider, should try to access
three URLs: domainA.com/robots.txt, domainB.domainA.com/robots.txt,
and domainB.com/robots.txt. The rules in each are treated as
separate, so disallowing robots from domainA.com/ should result in
domainA.com/ being removed from search results while
domainB.domainA.com/ remains unaffected, which does not sound like not
something you want.

The problem you might have with the setup you have described is this--
in order to keep domainB.domainA.com out of the results, you would
need to have domainB.domainA.com/robots.txt exclude robots, while
domainB.com/robots.txt welcomes them. This means that you would need
to have a way to make domainB.domainA.com/ and domainB.com/ serve
different information, and judging from what you've described, you
have not set up your server to do so yet.

Of course, it is always possible that I have assumed to much about
your situation, so it is a good idea to use Google's robots.txt
analysis tool (see http://www.google.com/support/webmasters/bin/topic.py?topic=8475
) to see if your robots.txt files already produce the results you
want.

If using robots.txt files doesn't solve the problem, and assuming that
you want to continue hosting all of your content on domainA.com, one
strategy you really should look into would be setting up a 301
redirect from the pages on domainB.domainA.com/ to domainB.com/ . If
you need more advice on how to do this with your server software, your
hosting company's tech support would definitely be the best place to
start, but this group is here to help if more isues arise.

Hope that helps!

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Moz Q&A is closed.

Multiple robots.txt files on server

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Disallow wildcard match in Robots.txt

Removing CSS & JS Files from Index

Umbrella company and multiple domains

Multiple urls for posting multiple classified ads

Should I block robots from URLs containing query strings?

301 Redirect on a PDF, DOCX files?

Keywords in file names vs folder names

Should I set up a disallow in the robots.txt for catalog search results?

Products

Moz Solutions

Free SEO Tools

Resources

About Moz

Why Moz

Get Involved