ViewSource bug? View Source versions of Confluence pages appearing in Google Search Results

Phil Gochenour November 15, 2012

In checking Google search results for our public Confluence space today, I came across some strange results, which are shown in the attached screenshot. The base URL for all of them is:

docs.delphix.com/plugins/viewsource/viewpagesrc.action

with the page GUID appended. If you click through, you see a version of the page without the usual Confluence wrapper around it, just the content (which is formatted correctly).

This is a very small subset of the total number of topics, and I cannot find anything that they have in common.

On further investigation, I found many other examples of these search results for other publicly accessible Confluence spaces, as shown in the second screenshot (which includes the search terms). What's slightly distressing is that some of these are HTTPS sites, so this bug might let someone see a secured page, even if only in "source" format.

Can anyone explain to me why these pages are showing up in Google, and what the viewsource plugin has to do with it? I would really prefer that my content not show up in Google without the other information attached to it.

1 answer

Alejandro Conde Carrillo November 21, 2012

You can prevent search engine spiders from reaching those pages by adding a robots.txt file.

The first comment on "Prevent Search Engine Indexing Using Robots.txt" has a good example of one.
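Since the goal is to keep the space indexed while hiding only the View Source URLs, it can help to check a candidate rule locally before deploying it. The sketch below uses Python's standard `urllib.robotparser` to test a hypothetical robots.txt that disallows only the `/plugins/viewsource/` path seen in the question. The `pageId` value, the `/display/DOCS/Some+Page` URL and the `Googlebot` agent string are illustrative assumptions, and this only shows how a standard robots.txt parser reads the rules, not how Google itself will behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (served from the site root): blocks only the
# View Source plugin path, leaving the rest of the space crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /plugins/viewsource/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given user agent may fetch the URL under these rules."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    # Placeholder URLs: the pageId and the normal page path are made up.
    viewsource = "https://docs.delphix.com/plugins/viewsource/viewpagesrc.action?pageId=12345"
    normal = "https://docs.delphix.com/display/DOCS/Some+Page"
    print(is_crawlable(rules, viewsource))  # False
    print(is_crawlable(rules, normal))      # True
```

Because `Disallow` matches by path prefix, the single rule covers every `viewpagesrc.action` URL regardless of the page ID appended to it, while ordinary page URLs stay crawlable. Note that robots.txt only asks well-behaved crawlers not to fetch those URLs; it does not restrict access for anyone who requests them directly.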

Phil Gochenour November 26, 2012

But the issue is that I want my site indexed and showing up in search results; I just don't know why these particular pages are showing up with the viewsource URL. And, as I pointed out, there are random pages from other Confluence sites that you can find by searching on the View Source keywords, some of which are supposedly behind login firewalls. I don't think this is a matter of setting up a robots.txt file; I think something strange is going on in Confluence that exposes these pages in View Source mode to search engine indexing. Could it have something to do with Confluence caching pages in some way while they are being edited, and exposing those page versions to indexing?
