5 ways to avoid duplicate content and indexing issues on your e-commerce site
Before a page can rank well, it needs to be crawled and indexed. Contributor Manish Dudharejia shares five tips to give your pages the best chance of getting indexed in the search results.
More than any other type of site, e-commerce sites are notorious for developing URL structures that create crawling and indexing issues with the search engines. It’s important to keep this under control in order to avoid duplicate content and crawl budget complications.
Here are five ways to keep your e-commerce site’s indexation optimal.
1. Know what’s in Google’s index
To begin with, it’s important to regularly check how many of your pages Google reports as indexed. You can do this by running a “site:example.com” search on Google, which returns an estimate of how many of your site’s pages Google is aware of.
While Google webmaster trends analyst Gary Illyes has said this number is only an estimate, it is still the easiest way to spot whether something is seriously off with your site’s indexing.
Regarding the number of pages in its index, Bing’s Stefan Weitz has likewise admitted that Bing:
…guesstimates the number, which is usually wrong…I think Google has had it for so long that people expect to see it up there.
Numbers from your content management system (CMS) or e-commerce platform, your sitemap, and your server files should match almost perfectly, or at least have any discrepancies identified and explained. Those numbers, in turn, should roughly line up with what a Google site operator search returns. A site developed with SEO in mind helps considerably here by avoiding the duplicate content and structural problems that create indexing issues.
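As a starting point for that comparison, you can count the URLs your sitemap actually declares. Here is a minimal sketch in Python that parses a standard sitemap protocol file and counts its `<url>` entries; the inline sample document and the example.com URLs are placeholders, and in practice you would load your real sitemap file instead.

```python
# Sketch: count the URLs listed in an XML sitemap so the total can be
# compared against your CMS page count and a "site:" search estimate.
# The sample sitemap below is illustrative only.
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def count_sitemap_urls(sitemap_xml: str) -> int:
    """Return the number of <url> entries in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall(f"{SITEMAP_NS}url"))

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/product-a</loc></url>
  <url><loc>https://example.com/product-b</loc></url>
</urlset>"""

print(count_sitemap_urls(sample))  # 3
```

If this count is far from what your CMS or a site operator search reports, that gap is your first clue about where indexing problems live.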
Too few results in the index is a problem, but so is too many, since an inflated count usually means duplicate content is appearing in the search results. While Illyes has confirmed that there is no “duplicate content penalty,” duplicate content still wastes your crawl budget and can dilute the authority of a page across its duplicates.
If Google returns too few results:
- Identify which pages from your sitemap are not showing up in your Google Analytics organic search traffic. (Use a long date range.)
- Search for a representative sample of these pages in Google to identify which are actually missing from the index. (You don’t need to do this for every page.)
- Identify patterns in the pages that are not indexing and address those systematically across your site to increase the chances of those pages getting indexed. Patterns to look for include duplicate content issues, a lack of inbound internal links, non-inclusion in the XML sitemap, unintentional noindexing or canonicalization, and HTML with serious validation errors.
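The first step above is a simple set comparison. This sketch shows the idea; the two hard-coded lists are placeholder data, where in practice `sitemap_urls` would come from your XML sitemap and `organic_landing_pages` from a Google Analytics landing-page export filtered to organic search over a long date range.

```python
# Sketch: find sitemap URLs that received no organic landing-page traffic.
# Both sets below are placeholder data for illustration.
sitemap_urls = {
    "/", "/product-a", "/product-b", "/product-c",
}
organic_landing_pages = {
    "/", "/product-a",
}

# Pages in the sitemap that never appeared as organic landing pages are
# candidates to spot-check against Google's index.
never_landed = sorted(sitemap_urls - organic_landing_pages)
print(never_landed)  # ['/product-b', '/product-c']
```

The output is your candidate list for the second step: search a representative sample of these URLs in Google to confirm which are actually missing from the index.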
If Google is returning too many results:
- Run a site crawl with Screaming Frog, DeepCrawl, Sitebulb, or a similar tool and identify pages with duplicate titles, since these typically indicate duplicate content.
- Determine what is causing the duplicates and remove them. The various causes and their solutions make up much of the rest of this post.
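The duplicate-title check in the first step is easy to run yourself on a crawl export. In this sketch, the `crawl` list of (URL, title) pairs stands in for a crawler export, which in reality would be a CSV with many more columns; the example.com URLs are illustrative. Grouping by title surfaces the parameterized URL variants that typically signal duplicate content on e-commerce sites.

```python
# Sketch: group crawled URLs by page title to surface likely duplicates.
# The crawl data below is placeholder data standing in for a crawler export.
from collections import defaultdict

crawl = [
    ("https://example.com/shoes", "Running Shoes | Example"),
    ("https://example.com/shoes?sort=price", "Running Shoes | Example"),
    ("https://example.com/shoes?color=red", "Running Shoes | Example"),
    ("https://example.com/hats", "Hats | Example"),
]

# Map each title to every URL that uses it.
by_title = defaultdict(list)
for url, title in crawl:
    by_title[title].append(url)

# Titles shared by more than one URL are duplicate-content candidates.
duplicates = {t: urls for t, urls in by_title.items() if len(urls) > 1}
for title, urls in duplicates.items():
    print(f"{title}: {len(urls)} URLs")
```

Note the pattern in the sample output: one product page generating several crawlable URL variants through query parameters, which is exactly the kind of cause the rest of this post addresses.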