
How do I obtain the filename associated with a page's outgoing link

Brent Austin August 26, 2015

I have images in pages that are links to image attachments in other pages:

<ac:image>
<ri:attachment ri:filename="home.jpg">
<ri:page ri:content-title="Let's edit this page (step 3 of 9)" ri:space-key="ds" />
</ri:attachment>
</ac:image>

I can obtain the linked page's space (ds) and title (Let's edit this page (step 3 of 9)) from the OutgoingLink object but not the filename.

How can I obtain the filename (home.jpg) so that I can access and download the linked attachment?

3 answers

1 accepted


0 votes
Answer accepted
Panos
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
August 26, 2015

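The easy way, although it requires jsoup, is to render the page and read the image URLs from the resulting HTML:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// "renderer" is assumed to be a Confluence XHTML Renderer injected into your component.
Page page = pageManager.getPage(PAGEID_HERE); // load the page here
final ConversionContext conversionContext = new DefaultConversionContext(page.toPageContext());
String rendered = renderer.render(page.getBodyAsString(), conversionContext);
Document doc = Jsoup.parse(rendered);
Elements images = doc.select("img");
for (Element el : images) {
    String imageUrl = el.attr("src");
    // Do something with the image URL
}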

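The harder way:

Page p = pageManager.getPage(PAGEID_HERE); // load the page here
// Cheap pre-check: only parse the body if it contains at least one ac:image element.
Pattern p1 = Pattern.compile("<ac:image\\b[^>]*>.*?</ac:image>", Pattern.DOTALL);

Matcher m = p1.matcher(p.getBodyAsString());
if (m.find()) {
    // The body is a fragment without a single root element, so wrap it before parsing.
    Document doc = loadXMLFromString("<root>" + p.getBodyAsString() + "</root>");
    NodeList images = doc.getElementsByTagName("ac:image");
    for (int i = 0; i < images.getLength(); i++) {
        Element image = (Element) images.item(i);
        // According to the probable scenarios (1), the ri:* attributes sit on
        // child elements of ac:image rather than on ac:image itself, e.g.:
        NodeList attachments = image.getElementsByTagName("ri:attachment");
        if (attachments.getLength() > 0) {
            Element attachment = (Element) attachments.item(0);
            String filename = attachment.getAttribute("ri:filename");
        }
        // Likewise: ri:space-key and ri:content-title on ri:page, ri:value on ri:url.
    }
}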


 

private Document loadXMLFromString(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    // A character stream is already decoded, so no encoding needs to be set on the InputSource.
    InputSource is = new InputSource(new StringReader(xml));
    return builder.parse(is);
}

where (1) is https://confluence.atlassian.com/doc/confluence-storage-format-283640220.html
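To make the DOM route concrete, here is a minimal JDK-only sketch that runs outside Confluence. The names ImageLinkSketch, ImageRef and findImageRefs are made up for illustration, and it assumes the body is well-formed XML once wrapped in a synthetic root (HTML-only entities such as &nbsp; would need extra handling, since a plain XML parser rejects them).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ImageLinkSketch {

    /** Hypothetical holder for one attachment reference found inside an ac:image element. */
    static final class ImageRef {
        final String filename;   // ri:filename on ri:attachment
        final String spaceKey;   // ri:space-key on the nested ri:page, "" if absent
        final String pageTitle;  // ri:content-title on the nested ri:page, "" if absent

        ImageRef(String filename, String spaceKey, String pageTitle) {
            this.filename = filename;
            this.spaceKey = spaceKey;
            this.pageTitle = pageTitle;
        }
    }

    static List<ImageRef> findImageRefs(String storageBody) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Not namespace-aware, so "ac:image" is a plain element name and the
        // undeclared ac:/ri: prefixes do not cause a parse error.
        factory.setNamespaceAware(false);
        // Page content is user-authored, so refuse DOCTYPE declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Wrap the fragment in a synthetic root so it parses as one document.
        String wrapped = "<root>" + storageBody + "</root>";
        Document doc = builder.parse(new InputSource(new StringReader(wrapped)));

        List<ImageRef> refs = new ArrayList<>();
        NodeList images = doc.getElementsByTagName("ac:image");
        for (int i = 0; i < images.getLength(); i++) {
            Element image = (Element) images.item(i);
            NodeList attachments = image.getElementsByTagName("ri:attachment");
            if (attachments.getLength() == 0) {
                continue; // e.g. an external image referenced through ri:url
            }
            Element attachment = (Element) attachments.item(0);
            NodeList pages = attachment.getElementsByTagName("ri:page");
            Element page = pages.getLength() > 0 ? (Element) pages.item(0) : null;
            refs.add(new ImageRef(
                    attachment.getAttribute("ri:filename"),
                    page == null ? "" : page.getAttribute("ri:space-key"),
                    page == null ? "" : page.getAttribute("ri:content-title")));
        }
        return refs;
    }

    public static void main(String[] args) throws Exception {
        String body = "<p>intro</p><ac:image><ri:attachment ri:filename=\"home.jpg\">"
                + "<ri:page ri:content-title=\"Let's edit this page (step 3 of 9)\" ri:space-key=\"ds\" />"
                + "</ri:attachment></ac:image>";
        for (ImageRef ref : findImageRefs(body)) {
            System.out.println(ref.filename + " | " + ref.spaceKey + " | " + ref.pageTitle);
        }
    }
}
```

An empty spaceKey and pageTitle here would mean the attachment has no nested ri:page, which I take to mean it lives on the page being parsed.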

 

Hope it helps

Brent Austin August 27, 2015

Thanks for the information. It was very useful to see a solution based on the page data rather than on page-related objects. I went for a combination of the two solutions that you provided:

Document doc = Jsoup.parse(page.getBodyAsString(), "", Parser.xmlParser());
Elements images = doc.getElementsByTag("ac:image");
for (Element image : images) {
    Element attachment = image.getElementsByTag("ri:attachment").first();
    if (attachment != null) {
        String fn = attachment.attr("ri:filename");
        Element sourcePage = attachment.getElementsByTag("ri:page").first();
        if (sourcePage != null) {
            String ct = sourcePage.attr("ri:content-title");
            String sk = sourcePage.attr("ri:space-key");
        }
    }
}

Panos
August 27, 2015

I am glad it helped :)

0 votes
Brent Austin August 26, 2015

Hi Panos, thanks for your prompt reply.

I'm working on a Java plugin that downloads the images displayed in a page.

I can access the images directly attached to the page using attachmentManager.getLatestVersionsOfAttachments(page);

However, some images are displayed using links to attachments contained on other pages and to access these I was using page.getOutgoingLinks();

The problem is that the OutgoingLink object provides the space and title information for the referred page, but not the attachment filename.

It's the filename that I need, and I cannot see any way of obtaining it other than running a regex over the page content.
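For illustration, the regex route could look roughly like the sketch below. The class and method names are made up, it assumes attribute values are double-quoted as in the snippet in the question, and regex over XML is brittle, so treat it as a fallback rather than a recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FilenameRegexSketch {

    // Captures the inner content of each ac:image element (non-greedy, across line breaks).
    private static final Pattern IMAGE =
            Pattern.compile("<ac:image\\b[^>]*>(.*?)</ac:image>", Pattern.DOTALL);

    // Captures the ri:filename value on an ri:attachment start tag.
    private static final Pattern FILENAME =
            Pattern.compile("<ri:attachment\\b[^>]*\\bri:filename=\"([^\"]*)\"");

    /** Returns the raw (still XML-escaped) filenames of attachments shown via ac:image. */
    static List<String> imageFilenames(String storageBody) {
        List<String> names = new ArrayList<>();
        Matcher image = IMAGE.matcher(storageBody);
        while (image.find()) {
            Matcher filename = FILENAME.matcher(image.group(1));
            if (filename.find()) {
                names.add(filename.group(1));
            }
        }
        return names;
    }
}
```

Calling FilenameRegexSketch.imageFilenames(page.getBodyAsString()) would then give the filenames, though not the page each one belongs to.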

Panos
August 26, 2015

If I were you, I would follow the outgoing links and parse the ac:image macro. I happen to be working on parsing a similar situation, so if you decide to go that way I can help you further.

Brent Austin August 26, 2015

Thanks for the information, yes I would like to see how you are parsing the ac:image tag. I am using the XhtmlContent class but this does not see ac:image as a macro.

0 votes
Panos
August 26, 2015

This is the relative format, so home.jpg is attached to the page on which this macro is located.

Since you don't say how you are obtaining it (Java? JavaScript? something else?), here are some options:

1) REST call to find the URL:

Make a call to /rest/prototype/1/content/PAGEID_HERE/attachment, find the attachment with filename="home.jpg", and extract the field "link"->"href" from the JSON.

2) Use backend:

Page p=pageManager.getPage(PAGEID_HERE);
Attachment attachment = attachmentManager.getAttachment(p, "home.jpg");
attachment.getUrlPath();
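
For option 1, building the request from Java 11+ could look something like this. It is only a sketch: the class and method names and the example base URL are invented, the Accept header is my assumption for getting JSON back, and authentication is left out entirely.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class AttachmentRequestSketch {

    /** Builds (but does not send) a GET request for the attachment list of one page. */
    static HttpRequest attachmentListRequest(String baseUrl, long pageId) {
        // Drop a trailing slash so the path is not doubled up.
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        URI uri = URI.create(base + "/rest/prototype/1/content/" + pageId + "/attachment");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json") // assumption: ask for JSON explicitly
                .GET()
                .build();
    }
}
```

You would send it with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and then pick the entry for home.jpg out of the response with whatever JSON library you already use.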