For non-sensitive information, block unwanted crawling by using robots.txt
A robots.txt file tells search engine crawlers which parts of your site they may access and crawl. The file must be named robots.txt and placed in the root directory of your site. Note that pages blocked by robots.txt can still end up indexed if other sites link to them, so for sensitive pages, use a more secure method.
# Tell Google not to crawl any URLs in the shopping cart or images in the icons folder,
# because they won’t be useful in Google Search results.
# (The directives below are illustrative; substitute your site’s actual paths.)
User-agent: googlebot
Disallow: /cart/
Disallow: /icons/
You may not want certain pages of your site crawled because they wouldn’t be useful to users if found in a search engine’s results. Note that robots.txt applies per host: if your site uses subdomains and you want certain pages on a particular subdomain not to be crawled, that subdomain needs its own robots.txt file. For more information on robots.txt, we suggest this guide on using robots.txt files.
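You can check how a compliant crawler interprets a robots.txt file with Python’s standard urllib.robotparser module. This is a minimal sketch: the rules mirror the shopping-cart example above, and the /cart/ and /icons/ paths are assumed illustrative values, not directives from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content matching the shopping-cart example
# (paths are assumptions for illustration).
rules = """\
User-agent: googlebot
Disallow: /cart/
Disallow: /icons/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved crawler checks each URL against the rules before fetching.
print(parser.can_fetch("googlebot", "https://example.com/cart/item"))      # False: under a disallowed path
print(parser.can_fetch("googlebot", "https://example.com/products/shoe"))  # True: not listed
print(parser.can_fetch("otherbot", "https://example.com/cart/item"))       # True: rules name googlebot only
```

Because the rules apply per host, a crawler would fetch https://example.com/robots.txt and https://news.example.com/robots.txt separately — which is why each subdomain needs its own file.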
Read about several other ways to prevent content from appearing in search results.
Avoid:
- Letting your internal search result pages be crawled by Google. Users dislike clicking a search engine result only to land on another search result page on your site.
- Allowing URLs created as a result of proxy services to be crawled.
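For the internal search case, the fix is usually a Disallow rule. A hedged sketch, assuming the site serves search results under a /search/ path (the path is an assumption about your URL structure, not a standard):

```
# Keep all crawlers out of internal search result pages
# (the /search/ path is an assumed example).
User-agent: *
Disallow: /search/
```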