Thursday, January 28, 2021

How to tell Google not to add certain content of your website to search results

If you don't want some of your HTML content to be scraped and shown in your page's Google search result, surround the content you want skipped with these tags:

<!--googleoff: all-->

Content we don't want Google to crawl

<!--googleon: all-->


 

This removes the unwanted content from the description shown with your page in web search results.
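If you generate pages from templates or scripts, the wrapping can be automated. The following is a minimal sketch, not an official tool: the function name `wrap_googleoff` and the `VALID_FLAGS` set are hypothetical names chosen for illustration, and the flag values are the four listed in the documentation quoted further down.

```python
# Flags described in the quoted documentation: index, anchor, snippet, all.
VALID_FLAGS = {"index", "anchor", "snippet", "all"}


def wrap_googleoff(content, flag="all"):
    """Surround content with googleoff/googleon comment tags for one flag."""
    if flag not in VALID_FLAGS:
        raise ValueError(f"unknown flag: {flag!r}")
    # The newline after the content guarantees whitespace before the
    # googleon tag, which the quoted documentation requires.
    return f"<!--googleoff: {flag}-->\n{content}\n<!--googleon: {flag}-->"


print(wrap_googleoff("Content we don't want Google to crawl"))
```

Rejecting unknown flags early is a design choice here: a misspelled flag would otherwise produce an ordinary HTML comment that silently does nothing.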

Here are the full instructions, from the Google Search Appliance documentation (https://support.google.com/gsa/answer/6329153?hl=en):

 

Excluding Unwanted Text from the Index

There may be Web pages that you want to suppress from search results when users search on certain words or phrases. For example, if a Web page consists of the text “the user conference page will be completed as soon as Jim returns from medical leave,” you might not want this page to appear in the results of a search on the terms “user conference.”

You can prevent this content from being indexed by using googleoff/googleon tags. By embedding googleon/googleoff tags with their flags in HTML documents, you can disable:

  • The indexing of a word or portion of a Web page
  • The indexing of anchor text
  • The use of text to create a snippet in search results

For details about each googleon/googleoff flag, refer to the following table.

Flag: index

Description: Words between the tags are not indexed as occurring on the current page.

Example:

fish <!--googleoff: index-->shark
<!--googleon: index-->mackerel

Results: The words fish and mackerel are indexed for this page, but the occurrence of shark is not indexed.

This page could appear in search results for the term shark only if the word appears elsewhere on the page or in anchor text for links to the page.

But the word shark could appear in a result snippet.

Hyperlinks that appear within these tags are followed.


Flag: anchor

Description: Anchor text that appears between the tags and in links to other pages is not indexed. This prevents the index from using the hyperlink to associate the link text with the target page in search results.

Example:

<!--googleoff: anchor-->
<A href=sharks_rugby.html>shark </A> <!--googleon: anchor-->

Results: The word shark is not associated with the page sharks_rugby.html. Otherwise this hyperlink would cause the page sharks_rugby.html to appear in the search results for the term shark. Hyperlinks that appear within these tags are followed, so sharks_rugby.html is still crawled and indexed.


Flag: snippet

Description: Text between the tags is not used to create snippets for search results.

Example:

<!--googleoff: snippet-->Come to the fair!
<A href=sharks_rugby.html>shark</A>
<!--googleon: snippet-->

Results: The text ("Come to the fair!" and "shark") does not appear in snippets with the search results, but the words will still be indexed and searchable. Also, the link sharks_rugby.html will still be followed. The URL sharks_rugby.html will also appear in the search results for the term shark.


Flag: all

Description: Turns off all the attributes. Text between the tags is not indexed, is not associated with anchor text, or used for a snippet.

Example:

<!--googleoff: all-->Come to the fair!
<!--googleon: all-->

Results: The text Come to the fair! is not indexed, is not associated with anchor text, and does not appear in snippets with the search results.

There must be a space or newline before the googleon tag.
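To make the difference between the index and snippet flags concrete, here is a rough model in Python. It is only an illustration of the behaviour described above, not how the search appliance is implemented; the function name `strip_disabled` is hypothetical, and the sketch assumes tags are not nested and are written exactly as in the examples.

```python
import re


def strip_disabled(html, purpose):
    """Return html with the regions turned off for `purpose` removed.

    A region counts as turned off when it sits between googleoff/googleon
    tags whose flag is either `purpose` or "all".
    """
    pattern = re.compile(
        r"<!--googleoff: (?P<flag>\w+)-->(?P<body>.*?)<!--googleon: (?P=flag)-->",
        re.DOTALL,
    )

    def replace(match):
        if match.group("flag") in (purpose, "all"):
            return " "  # drop the whole region
        return match.group("body")  # keep the text, drop only the tags

    return pattern.sub(replace, html)


page = "fish <!--googleoff: index-->shark\n<!--googleon: index-->mackerel"
indexed = strip_disabled(page, "index").split()
snippet = strip_disabled(page, "snippet").split()
print(indexed)  # ['fish', 'mackerel']
print(snippet)  # ['fish', 'shark', 'mackerel']
```

The output mirrors the index example: shark is left out of the indexed words but remains available for a snippet.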

If URL1 appears on page URL2 within googleoff and googleon tags, the search appliance still extracts the URL and adds it to the link structure. For example, the query link:URL2 still contains URL1 in the result set, but depending on which googleoff option you use, you do not see URL1 when viewing the cached version, searching using the anchor text, and so on. If you want the search appliance not to follow the links and ignore the link structure, follow the instructions in Using Robots meta Tags to Control Access to a Web Page.
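The point about link extraction can be illustrated with a small Python sketch. It assumes nothing about the appliance's internals: it only shows that an ordinary HTML parser treats the googleoff/googleon tags as comments, so a link between them is still found. The class name `LinkCollector` is hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values; HTML comments such as googleoff tags are ignored."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <A ...> arrives as "a".
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


page = (
    "<!--googleoff: all-->\n"
    "<A href=sharks_rugby.html>shark</A>\n"
    "<!--googleon: all-->"
)
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['sharks_rugby.html']
```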