php - Zend_Lucene and wildcard operator weirdness
A quick summary of the problem: the wildcard operator doesn't seem to return the results I'm expecting. I'm testing against a keyword field.
Here is a sample showing the issue:
    include 'Zend/Loader/Autoloader.php';
    $autoloader = Zend_Loader_Autoloader::getInstance();
    $autoloader->setFallbackAutoloader(true);

    // Default analyzer: UTF-8, case-insensitive, letters only.
    Zend_Search_Lucene_Analysis_Analyzer::setDefault(
        new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());

    @mkdir('/tmp/test-lucene');
    $index = Zend_Search_Lucene::create('/tmp/test-lucene');

    // Four documents that differ only in their keyword 'path' field.
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::keyword('path', 'root/1/2/3'));
    $doc->addField(Zend_Search_Lucene_Field::unStored('contents', 'the lazy fox jump on dog bla bla bla'));
    $index->addDocument($doc);

    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::keyword('path', 'root/1'));
    $doc->addField(Zend_Search_Lucene_Field::unStored('contents', 'the lazy fox jump on dog bla bla bla'));
    $index->addDocument($doc);

    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::keyword('path', 'root/3/2/1'));
    $doc->addField(Zend_Search_Lucene_Field::unStored('contents', 'the lazy fox jump on dog bla bla bla'));
    $index->addDocument($doc);

    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::keyword('path', 'root/3/2/2'));
    $doc->addField(Zend_Search_Lucene_Field::unStored('contents', 'the lazy fox jump on dog bla bla bla'));
    $index->addDocument($doc);

    // Expected to match only root/3/2/1 and root/3/2/2.
    $hits = $index->find('path:root/3/2*');
    foreach ($hits as $hit) {
        $doc = $hit->getDocument();
        echo $doc->getFieldValue('path') . PHP_EOL;
    }
This returns the whole set of documents instead of only the last two, which is what I expected.
Output:

    root/1/2/3
    root/1
    root/3/2/1
    root/3/2/2
So here is my question: why does Lucene (Zend_Lucene in this case) match the first two documents? I thought keyword fields were not tokenized.
PS: You might want to know why I am running this test. I have an e-commerce website backed by a database, and the category table has a path field. For example, a category might have the path '/1/2/3', which means it is the category with id 3, its parent category is 2, and so on.
The problem is that when a user runs a full-text search and specifies a category, I ideally want to return results from that category and its child categories, so I need a Lucene way of doing path LIKE '/1/2%'.
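To make the intended match concrete, here is a minimal sketch in plain PHP of the "category or any of its children" test. The function name `isInCategorySubtree` is hypothetical and not part of any library. It also shows one subtlety: a bare prefix such as '/1/2%' would match '/1/20' as well, so the sketch compares against the prefix followed by a slash.

```php
<?php
// Hypothetical helper (not a Zend API): true when $path is the category
// itself or lies somewhere below it in the path hierarchy.
function isInCategorySubtree(string $path, string $categoryPath): bool
{
    if ($path === $categoryPath) {
        return true;
    }
    // Append the separator so that '/1/2' does not match '/1/20'.
    $prefix = rtrim($categoryPath, '/') . '/';
    return strncmp($path, $prefix, strlen($prefix)) === 0;
}

var_dump(isInCategorySubtree('/1/2/3', '/1/2')); // bool(true)
var_dump(isInCategorySubtree('/1/2', '/1/2'));   // bool(true)
var_dump(isInCategorySubtree('/1/20', '/1/2'));  // bool(false)
```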
One other possibility would be to merge the results of a SQL query with the Lucene hits, but I would like to avoid that if possible because it performs poorly.
If you have any ideas, they are welcome.
Use Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive, and replace the slashes with a character that does not occur in your paths and that Zend_Search_Lucene treats as a word character. I used the German ß.
    include 'Zend/Loader/Autoloader.php';
    $autoloader = Zend_Loader_Autoloader::getInstance();
    $autoloader->setFallbackAutoloader(true);

    // Utf8Num keeps digits as well as letters.
    Zend_Search_Lucene_Analysis_Analyzer::setDefault(
        new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive());

    @mkdir('/tmp/test-lucene');
    $index = Zend_Search_Lucene::create('/tmp/test-lucene');

    foreach (array('root/1/2/3', 'root/1', 'root/3/2/1', 'root/3/2/2') as $path) {
        // Store the path with every slash replaced by ß.
        $path = str_replace('/', 'ß', $path);
        $doc = new Zend_Search_Lucene_Document();
        $doc->addField(Zend_Search_Lucene_Field::keyword('path', $path));
        $index->addDocument($doc);
    }

    // Apply the same replacement to the query string.
    $hits = $index->find(str_replace('/', 'ß', 'path:root/3/2*'));
    foreach ($hits as $hit) {
        // Convert back to slashes for display.
        echo str_replace('ß', '/', $hit->getDocument()->getFieldValue('path')) . PHP_EOL;
    }
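A rough illustration of why the replacement helps, as far as I understand it: the keyword field is stored untokenized, but the query string still goes through the default analyzer, so the text in front of the `*` is split into tokens. The sketch below only approximates that splitting with regular expressions. It assumes the Utf8 analyzer keeps runs of letters and the Utf8Num analyzer keeps runs of letters and digits. It is not Zend's actual code, and the names `approximateTokens`, `encodePath` and `decodePath` are hypothetical.

```php
<?php
// Approximation (assumption): Utf8 keeps runs of letters only,
// Utf8Num keeps runs of letters and digits. Not the real analyzer code.
function approximateTokens(string $text, bool $withDigits): array
{
    $pattern = $withDigits ? '/[\p{L}\p{N}]+/u' : '/\p{L}+/u';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// Hypothetical helpers wrapping the str_replace calls from the answer.
function encodePath(string $path): string
{
    return str_replace('/', 'ß', $path);
}

function decodePath(string $encoded): string
{
    return str_replace('ß', '/', $encoded);
}

// Letters only: the digits and slashes are dropped, leaving just "root".
print_r(approximateTokens('root/3/2', false));

// Letters and digits: three separate tokens, "root", "3" and "2".
print_r(approximateTokens('root/3/2', true));

// With ß in place of the slashes the prefix stays one token, "rootß3ß2".
print_r(approximateTokens(encodePath('root/3/2'), true));
```

Under that assumption, the original query is effectively reduced to a search for `root*`, which would explain why every document matched, while the encoded prefix survives as a single term that lines up with the stored keyword values.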