How to change Confluence Lucene stop words

Jan Prokeš
Rising Star
February 19, 2016

Hello,

I would like to experiment with the stop words in the Lucene library bundled with our Confluence instance. We have Service Desk connected to a knowledge base in Confluence, and the word "How" in our language (Czech) is on the stop word list. That makes no sense to users: in Service Desk they get no results at all, and in Confluence itself they get the error message "The query could not be parsed", which is not nice at all.
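To illustrate the effect described above, here is a toy sketch of stop word filtering. It is not Lucene's analyzer, and the stop set is hypothetical (it assumes the Czech word in question is "jak"). It only shows why a query made up entirely of stop words leaves nothing to search for.

```python
def remove_stop_words(query, stop_words):
    """Toy analyzer: lowercase, split on whitespace, drop stop words."""
    return [term for term in query.lower().split() if term not in stop_words]


# Hypothetical stop set for illustration; the real list ships inside the
# Lucene analyzers jar bundled with Confluence.
STOP_WORDS = {"jak"}

print(remove_stop_words("Jak", STOP_WORDS))           # []
print(remove_stop_words("Jak tiskárna", STOP_WORDS))  # ['tiskárna']
```

With an empty term list there is nothing left to build a query from, which would be consistent with the empty result and the parse error reported here.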

confluence.atlassian.com and answers.atlassian.com both index the word "How", which makes sense: it leads users to type a more exact query. An error message does not do that, and users see it as a misbehaving application (I agree with them).

So the question is: can I change the Lucene stop word dictionary somewhere, either for my language or globally?

 

Thank you.

1 answer

3 votes
Answer accepted
Jan Prokeš
February 19, 2016

OK, I figured it out myself. Here is a step-by-step guide to changing the stop words for each language.

 

  • Stop Confluence
  • Download <install_directory>/confluence/WEB-INF/lib/lucene-analyzers-common-4.4.0-atlassian-02.jar
  • Open it (the easiest way is to change the extension to .zip)
  • Find the stopwords.txt file for your language: org\apache\lucene\analysis\[en,cz,de,fr,es,etc.]\stopwords.txt
  • Add or remove whatever words you need and save
  • Rename the file back to .jar
  • Upload it back to the same location in the Confluence install directory (the path in the second step)
  • Start Confluence
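The jar editing in the steps above can also be scripted. The following is a minimal sketch, not a tested Confluence procedure. The function name `rewrite_stopwords` is hypothetical, and it assumes the stop word file is UTF-8 with one word per line. It writes a modified copy next to the original rather than editing the jar in place, so the original stays available as a backup.

```python
import zipfile


def rewrite_stopwords(jar_path, entry_name, remove=(), add=()):
    """Write jar_path + '.new' with the stop word entry edited; return its path."""
    new_path = jar_path + ".new"
    remove = set(remove)
    with zipfile.ZipFile(jar_path) as src, \
            zipfile.ZipFile(new_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == entry_name:
                # Assumption: UTF-8 text, one stop word per line.
                lines = data.decode("utf-8").splitlines()
                kept = [ln for ln in lines if ln.strip() not in remove]
                kept.extend(w for w in add if w not in kept)
                data = ("\n".join(kept) + "\n").encode("utf-8")
            # Every other entry is copied over unchanged.
            dst.writestr(item, data)
    return new_path


# Example call (entry names inside a jar use forward slashes):
# rewrite_stopwords("lucene-analyzers-common-4.4.0-atlassian-02.jar",
#                   "org/apache/lucene/analysis/cz/stopwords.txt",
#                   remove=["jak"])
```

Run it against a copy of the jar while Confluence is stopped, then swap the `.new` file in under the original file name.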
 
I also removed <home_directory>/index and <home_directory>/journal to force a full reindex on startup.
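The index cleanup can be sketched the same way. This is only an illustration: `clear_search_index` is a hypothetical helper, it assumes Confluence is stopped, and it defaults to a dry run that merely reports which of the two directories exist.

```python
import shutil
from pathlib import Path


def clear_search_index(home_dir, dry_run=True):
    """Remove <home_dir>/index and <home_dir>/journal; return the paths found."""
    home = Path(home_dir)
    found = [p for p in (home / "index", home / "journal") if p.is_dir()]
    if not dry_run:
        for path in found:
            shutil.rmtree(path)  # Deletes the directory and all of its contents.
    return found


# Example: list what would be deleted, without deleting anything.
# print(clear_search_index("<home_directory>"))
```

Pass `dry_run=False` only after checking the reported paths.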

Enjoy...

Jan Prokeš
February 19, 2016

Let me point out one thing. I do not know where Atlassian gets these stop word files, but they contain obvious nonsense, at least for Czech. Why should words like "article", "today", or "write" be marked as stop words? Take "how to create an article": 4 of 5 words are stop words. Why is that?

The funnier part is that the list even contains nonexistent words such as re, neg, aj, pta, etc. Those do not exist in the Czech language :)
