How to select distinct field values using Solr?
I would like to do the equivalent of this SQL but with Solr as my data store.
SELECT
DISTINCT txt
FROM
my_table;
What syntax would force Solr to only give me distinct values?
http://localhost:8983/solr/select?q=txt:?????&fl=txt
EDIT: So faceted searching seems to fit, but as I investigated it, I realized I had only detailed half of the problem.
My SQL query should have read...
SELECT
DISTINCT SUBSTR(txt,0,3)
FROM
my_table;
Any possibility of this with Solr?
Faceting would get you a results set that contains distinct values for a field.
E.g.
http://localhost:8983/solr/select/?q=*%3A*&rows=0&facet=on&facet.field=txt
You should get something back like this:
<response>
<responseHeader><status>0</status><QTime>2</QTime></responseHeader>
<result numFound="4" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="txt">
<int name="value">100</int>
<int name="value1">80</int>
<int name="value2">5</int>
<int name="value3">2</int>
<int name="value4">1</int>
</lst>
</lst>
</lst>
</response>
Check out the wiki for more information. Faceting is a really cool part of solr. Enjoy :)
http://wiki.apache.org/solr/SimpleFacetParameters#Facet_Fields
Note: Faceting will show the indexed value, I.e. after all the filters have been applied. One way to get around this is to use the copyfield method, so that you can create a facet version of the txt field. THis way your results will show the original value.
Hope that helps.. Lots of documentation on faceting available on the wiki. Or I did write some with screen shots.. which you can check out here:
http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
For the DISTINCT
part of your question, I think you may be looking for Solr's field collapsing / grouping functions. It will enable you to specify a field you want unique results from, create a group on those unique values and it will show you how many documents are that group.
You can then use the same substr
stored in a separate field, and collapse on that.
Use the StatsComponent with parameter stats.calcdistinct
to get a list of distinct values for a certain field:
Solr 7 https://lucene.apache.org/solr/guide/7_7/the-stats-component.html
Solr 6 https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
It will also give you the count of distinct values.
stats.calcdistinct
is probably available since 4.7.
http://wiki.apache.org/solr/StatsComponent
is outdated as it does not cover stats.calcdistinct
Example
/select?stats=on&stats.field=region&rows=0&stats.calcdistinct=true
"stats":{
"stats_fields":{
"region":{
"min":"GB",
"max":"GB",
"count":20276,
"missing":0,
"distinctValues":["GB"],
"countDistinct":1}}}}
Difference to Facets
In case of facets you need to know the count to request all, or you set the facet.limit to something really high and count the result yourself. Also, you need a string field for making facets work the way you need it here.