Google Centerpiece Annotation – Main Page and Site Content

Martin Splitt from Google explained the concept of the centerpiece annotation, a term used inside Google for the main content of a page or site. Martin said Google is able to understand that the main topic of a page is topic A and that the rest of the content on that page may not be part of the main content. Google will then weight that content differently, Martin said.

He said this at the 28:50 mark in this Duda webinar. Here is what Martin said:

I don’t know what we’ve said publicly about this, but I think I brought it up in one of the podcast episodes, so I can probably say we have a thing called the centerpiece annotation, for example, and there are some other annotations we have, where we look at the semantic content as well as potentially the layout tree.

But basically we can already read this from the content structure in HTML, and understand that, from all the natural language processing that we’ve done on all of the textual content here, it seems to be mostly about topic A, dog food. And then there’s this other thing here that looks like links to related products, but it’s not really part of the centerpiece, it’s not really main content here, it seems like additional stuff. And then there’s a bunch of boilerplate, so, well, we figured out that the menu is pretty much the same on all these pages, and it looks pretty much like the menu we have on all the other pages of this domain, for example, or we have seen this before.

We don’t even go by area or like “oh, that looks like a menu.” We figure out what looks like boilerplate, and then that’s also weighted differently. So if you have content on a page that isn’t related to the main topic of the rest of the content, we might not give it as much attention as you might think. We still use this information for link discovery and to determine the structure of your site and all that. But if a page has 10,000 words about dog food, and then 3,000, 2,000, or 1,000 words about bikes, then that’s probably not good content for bikes.
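To make the idea concrete, here is a toy Python sketch of the kind of labeling Martin describes. It is not Google's implementation, which is not public. Everything in it is an assumption made for illustration: the `annotate` function, the `WEIGHTS` values, the two thresholds, and the simple word-count similarity that stands in for real natural language processing. A block that repeats across most pages of the site is treated as boilerplate, and the remaining blocks are compared against the page's overall vocabulary.

```python
from collections import Counter
import math
import re

# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"centerpiece": 1.0, "supplementary": 0.3, "boilerplate": 0.05}


def tokens(text):
    """Return the lowercase word tokens of a text block."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def annotate(pages, url, repeat_share=0.8, topic_threshold=0.5):
    """Label each text block of pages[url] and attach a weight."""
    # A block repeated on most pages of the site is treated as boilerplate.
    seen_on = Counter(b for blocks in pages.values() for b in set(blocks))
    boilerplate = {b for b, n in seen_on.items() if n / len(pages) >= repeat_share}

    # The term counts of the remaining text stand in for "the main topic".
    content = [b for b in pages[url] if b not in boilerplate]
    page_vec = Counter(t for b in content for t in tokens(b))

    result = []
    for block in pages[url]:
        if block in boilerplate:
            label = "boilerplate"
        elif cosine(Counter(tokens(block)), page_vec) >= topic_threshold:
            label = "centerpiece"
        else:
            label = "supplementary"
        result.append((label, WEIGHTS[label], block))
    return result


# A made-up three-page site; each page is a list of text blocks.
pages = {
    "/dog-food": [
        "Home Shop Blog Contact",
        "Dog food guide: choosing dry dog food and wet dog food for your dog",
        "Puppies need dog food with more protein than adult dog food",
        "Bikes on sale this week",
    ],
    "/about": ["Home Shop Blog Contact", "About our small family shop"],
    "/contact": ["Home Shop Blog Contact", "Email us any time"],
}

for label, weight, block in annotate(pages, "/dog-food"):
    print(f"{label:13} {weight:4} {block}")
```

On this sample site the shared menu is labeled boilerplate, the two dog food blocks are labeled centerpiece, and the bikes block is labeled supplementary, so it would count for less under these made-up weights.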

Martin did mention it briefly in the May 27 podcast episode, where he said: “A question I often get also with JavaScript is whether we treat JavaScript content differently. We have annotations for content – what we think is the centerpiece of an article or what we think to be content alongside and everything.”

Glenn Gabe summed it up on Twitter, saying: “Google has a centerpiece annotation (and such). It examines semantic content and layout tree. From NLP, G can identify a page regarding topic X and then identify additional content vs. main content, boilerplate, etc. Then it may be weighted differently by Google.”

Makes sense, I just didn’t know Google called this “centerpiece annotation” internally.

Forum discussion at Twitter.