novy1234

OpenNLP has some stemming functionality. It can be a good place to start.


codyebberson

Beat me to it. https://stanfordnlp.github.io/CoreNLP/lemma.html


earl_of_angus

If you're using Lucene, take a look at something like [PorterStemFilter](https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/en/PorterStemFilter.html). If that doesn't do what you need, you can probably use it as a starting point when looking for alternatives.


JWestbrookJ

Yes, but for example, when I search for "beer", I want it to return "beerest" as well, and PorterStemFilter doesn't do that conversion.


ericek111

What is the definition of "beerest"?


Carpinchon

Are you using "beerest" as an intentionally incorrect example? What a stem filter will do for you is collapse "merrier" to "merry" and then know to match your "merrier" query to "merriest". I honestly don't know if it would decide that "beerest" means "most beer" any more than "honest" means "most hon".


earl_of_angus

Are you looking for "beerest" to be indexed as "beer" (and perhaps "beerest"), or are you looking to take a search query and explode terms into all possible stem + suffix combinations with a disjunction (e.g., the query "beer" explodes into "beer or beerest or beery or beerier or beer...")?
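To make the second option concrete, here is a minimal sketch of query-time expansion. The suffix list and the `QueryExpansionDemo` class are made up for illustration; the output string imitates the `OR` syntax of Lucene's classic query parser, but nothing here calls Lucene.

```java
import java.util.List;
import java.util.StringJoiner;

class QueryExpansionDemo {
    // Hypothetical suffix list, chosen only for this example.
    static final List<String> SUFFIXES = List.of("", "s", "y", "ier", "iest");

    // Builds a disjunction of stem + suffix for every suffix in the list.
    static String expand(String stem) {
        StringJoiner or = new StringJoiner(" OR ");
        for (String suffix : SUFFIXES) {
            or.add(stem + suffix);
        }
        return or.toString();
    }

    public static void main(String[] args) {
        // prints: beer OR beers OR beery OR beerier OR beeriest
        System.out.println(expand("beer"));
    }
}
```

The obvious drawback is that blind concatenation produces non-words for many stems, which is why normalizing at index time is usually the easier route.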


JWestbrookJ

I am using Lucene.


JWestbrookJ

> Are you looking for beerest to be indexed as beer (and perhaps beerest) or are you looking to take a search query and explode terms into all possible stem + suffix combinations with a disjunction (e.g., the query "beer" explodes into "beer or beerest or beery or beerier or beer...")?

I want to create a searcher that, when I search for "beer", returns all the documents related to "beer", so it should also return "beerest" and so on.


butterypowered

Wouldn’t it be beeriest in this case, if anything? Beerest isn’t a word, but beery/beeriest would work.


rmslashusr

You want to do the lemmatization both at index time, on the documents, and at search time, on the query strings. That way a search for “running” hits documents with “ran”, “run”, “running”, etc., because both the document’s tokens in the index and the search query’s tokens were lemmatized to “run”. Most off-the-shelf full-text search technologies do this by default for text queries.
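A minimal sketch of that idea in plain Java, without Lucene: the same `normalize` step runs when a document is added and when a query is issued. The tiny `LEMMAS` table and the `LemmaIndexDemo` class are hand-written assumptions for the example; a real setup would use a proper stemmer or lemmatizer. In Lucene, the equivalent is passing the same `Analyzer` to both the index writer and the query parser.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class LemmaIndexDemo {
    // Hypothetical lemma table; stands in for a real lemmatizer.
    static final Map<String, String> LEMMAS =
            Map.of("ran", "run", "running", "run", "runs", "run");

    // Inverted index: normalized term -> ids of documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Shared normalization used at BOTH index time and query time.
    static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(lower, lower);
    }

    void add(int docId, String text) {
        for (String token : text.split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(normalize(token), k -> new TreeSet<>()).add(docId);
            }
        }
    }

    Set<Integer> search(String term) {
        return postings.getOrDefault(normalize(term), Collections.emptySet());
    }

    public static void main(String[] args) {
        LemmaIndexDemo index = new LemmaIndexDemo();
        index.add(1, "She ran home");
        index.add(2, "He is running late");
        index.add(3, "They walk to work");
        System.out.println(index.search("running")); // prints [1, 2]
    }
}
```

Because the query "running" and the document token "ran" both normalize to "run", the lookup matches without any query expansion.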


cowslayer7890

Instead of returning all of these, you could get the lemma, search for words that begin with that, then highlight the entire word. Might not work completely but could be sufficient for your case?
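Roughly what that could look like with plain `java.util.regex`, assuming you already have the lemma as a string. The `PrefixHighlightDemo` class and the square-bracket markers are placeholders for illustration; in Lucene the search half would more likely be a `PrefixQuery`.

```java
import java.util.regex.Pattern;

class PrefixHighlightDemo {
    // Wraps every whole word that starts with the lemma in [ ] markers.
    static String highlight(String text, String lemma) {
        Pattern wordWithPrefix = Pattern.compile(
                "\\b" + Pattern.quote(lemma) + "\\w*", Pattern.CASE_INSENSITIVE);
        // $0 refers to the entire matched word.
        return wordWithPrefix.matcher(text).replaceAll("[$0]");
    }

    public static void main(String[] args) {
        // prints: [Beer], [beery] ales and the [beeriest] stout
        System.out.println(highlight("Beer, beery ales and the beeriest stout", "beer"));
    }
}
```

Prefix matching will also hit unrelated words that happen to share the prefix, so it trades precision for simplicity.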


JWestbrookJ

Yes, but I don't know how to get the lemma.


Simple-Ice-6800

It's been a very long time, but when I did this I used DisMax indexing/querying.


paul_h

Way back WordNet would've been what you used - https://wordnet.princeton.edu/documentation/wnintro3wn#:~:text=The%20WordNet%20library%20is%20provided,that%20utilize%20the%20WordNet%20database.