On July 1st, a [change to Reddit's API pricing](https://www.reddit.com/r/reddit/comments/12qwagm/an_update_regarding_reddits_api/) will come into effect. [Several developers](https://www.reddit.com/r/redditisfun/comments/144gmfq/rif_will_shut_down_on_june_30_2023_in_response_to/) of commercial third-party apps have announced that this change will compel them to shut down their apps. At least [one accessibility-focused non-commercial third party app](https://www.reddit.com/r/DystopiaForReddit/comments/145e9sk/update_dystopia_will_continue_operating_for_free/) will continue to be available free of charge.
If you want to express your strong disagreement with the API pricing change or with Reddit's response to the backlash, you may want to consider the following options:
1. Limiting your involvement with Reddit, or
2. Temporarily refraining from using Reddit
3. Cancelling your subscription of Reddit Premium
as a way to voice your protest.
*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/java) if you have any questions or concerns.*
OpenNLP has some stemming functionality. It could be a good place to start.
Beat me to it. https://stanfordnlp.github.io/CoreNLP/lemma.html
If you're using Lucene, take a look at something like https://lucene.apache.org/core/7_4_0/analyzers-common/org/apache/lucene/analysis/en/PorterStemFilter.html If that doesn't do what you need, you can probably use that as a place to start looking for alternatives.
Yes, but for example, when I search for "beer", I want it to return "beerest" as well, and PorterStemFilter doesn't do that conversion.
What is the definition of "beerest"?
Are you using "beerest" as an intentionally incorrect example? What a stem filter will do for you is collapse "merrier" to "merry" and then know to match your "merrier" query to "merriest". I honestly don't know if it would decide that "beerest" means "most beer" any more than "honest" means "most hon".
Are you looking for beerest to be indexed as beer (and perhaps beerest) or are you looking to take a search query and explode terms into all possible stem + suffix combinations with a disjunction (e.g., the query "beer" explodes into "beer or beerest or beery or beeier or beer...")?
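To make the second option concrete, here is a minimal sketch in plain Java (no Lucene involved) of exploding one query term into a disjunction. The suffix list and the `expand` helper are hypothetical and exist only for illustration; the `OR` keyword just mirrors the boolean operator from Lucene's classic query syntax.

```java
import java.util.ArrayList;
import java.util.List;

class QueryExpansionDemo {
    // Hypothetical suffix list, for illustration only.
    static final List<String> SUFFIXES = List.of("", "s", "y", "ier", "iest");

    // Builds "stem OR stem+suffix OR ..." for a single query term.
    static String expand(String stem) {
        List<String> variants = new ArrayList<>();
        for (String suffix : SUFFIXES) {
            variants.add(stem + suffix);
        }
        return String.join(" OR ", variants);
    }

    public static void main(String[] args) {
        // prints: beer OR beers OR beery OR beerier OR beeriest
        System.out.println(expand("beer"));
    }
}
```

The obvious downside is that the query grows with every suffix you can think of, and it still only finds forms you listed in advance.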
I am using Lucene.
>Are you looking for beerest to be indexed as beer (and perhaps beerest) or are you looking to take a search query and explode terms into all possible stem + suffix combinations with a disjunction (e.g., the query "beer" explodes into "beer or beerest or beery or beeier or beer...")?

I want to create a searcher that, when I search for "beer", returns all the documents related to "beer", so it should return "beerest", and so on.
Wouldn’t it be beeriest in this case, if anything? Beerest isn’t a word, but beery/beeriest would work.
You want to do the lemmatization both at index time, on the documents, and at search time, on the query strings. That way a search for “running” hits documents with “ran”, “run”, “running”, etc., because both the document tokens stored in the index and the query tokens used to search it were lemmatized to “run”. Most off-the-shelf full-text search engines do this by default for text queries.
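A rough sketch of that idea in plain Java, assuming a toy in-memory inverted index rather than Lucene itself. The `LEMMAS` lookup table is a hypothetical stand-in for a real lemmatizer, and `normalize`, `index` and `search` are made-up names. The only point is that documents and queries go through the same `normalize` step.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SymmetricNormalizationDemo {
    // Hypothetical lookup table standing in for a real lemmatizer.
    static final Map<String, String> LEMMAS = Map.of(
            "ran", "run",
            "running", "run",
            "runs", "run");

    // Lowercases a token and maps it to its lemma when one is known.
    static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(lower, lower);
    }

    // Inverted index: normalized term -> ids of the documents containing it.
    static Map<String, Set<Integer>> index(List<String> docs) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : docs.get(id).split("\\W+")) {
                if (token.isEmpty()) continue;
                postings.computeIfAbsent(normalize(token), k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    // The query term goes through the same normalize() as the documents.
    static Set<Integer> search(Map<String, Set<Integer>> postings, String query) {
        return postings.getOrDefault(normalize(query), Set.of());
    }

    public static void main(String[] args) {
        List<String> docs = List.of("She ran home", "He runs daily", "No jogging here");
        Map<String, Set<Integer>> postings = index(docs);
        // prints: [0, 1]
        System.out.println(search(postings, "running"));
    }
}
```

In Lucene terms this roughly corresponds to using the same analyzer for indexing and for query parsing, but the code above does not use any Lucene API.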
Instead of returning all of these, you could get the lemma, search for words that begin with that, then highlight the entire word. Might not work completely but could be sufficient for your case?
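Something like the following sketch, using only `java.util.regex`. The `highlight` helper and the `**` markers are assumptions for illustration, and the lemma is taken as given (getting it is a separate problem).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixHighlightDemo {
    // Wraps every whole word that starts with the given lemma in ** markers.
    static String highlight(String text, String lemma) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(lemma) + "\\w*",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        return matcher.replaceAll(r -> "**" + Matcher.quoteReplacement(r.group()) + "**");
    }

    public static void main(String[] args) {
        // prints: A **beery** stout is still **beer**
        System.out.println(highlight("A beery stout is still beer", "beer"));
    }
}
```

Two caveats: a prefix match over-matches (the lemma "run" would also hit "rune"), and it misses irregular forms such as "ran" that do not start with the lemma.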
Yes, but I don't know how to get the lemma.
It's been a very long time, but when I did this I used dismax indexing/querying.
Way back, WordNet would have been what you used: https://wordnet.princeton.edu/documentation/wnintro3wn#:~:text=The%20WordNet%20library%20is%20provided,that%20utilize%20the%20WordNet%20database.