java - Nutch API advice
I'm working on a project where I need a mature crawler to do some work, and I'm evaluating Nutch for this purpose. My current needs are relatively straightforward: I need a crawler that can save data to disk, and it must be able to recrawl only the updated resources of a site and skip the parts that have already been crawled. Does anyone have experience working with the Nutch code directly in Java, not via the command line? I'd like to start simple: create a crawler (or similar), configure it minimally, and start it, nothing fancy. Is there an example of this, or a resource I should be looking at? I'm going over the Nutch documentation, but most of it is about the command line, search, and other stuff. How usable is the Nutch crawling module without the need to index and search? Any help is appreciated. Thanks.
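The "recrawl only what changed, skip the rest" requirement corresponds to what Nutch keeps in its CrawlDb: each URL carries a last fetch time and a revisit interval, and the generate step selects only the URLs that are due. Below is a minimal, stdlib-only Java sketch of that selection idea. It is not the Nutch API; the names `RecrawlSketch`, `recordFetch`, `addSeed` and `dueForFetch` are hypothetical, and the 30-day interval is only an illustration (I believe Nutch's own default, `db.fetch.interval.default`, is also 30 days, but check your version's `nutch-default.xml`).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, stdlib-only illustration of "recrawl only what is due".
// This is NOT the Nutch API; Nutch stores equivalent state in its CrawlDb.
public class RecrawlSketch {

    // Per-URL bookkeeping: when the page was last fetched and how often to revisit it.
    record PageState(Instant lastFetched, Duration fetchInterval) {
        Instant nextFetch() {
            return lastFetched.plus(fetchInterval);
        }
    }

    // Insertion-ordered so the output order is predictable.
    private final Map<String, PageState> db = new LinkedHashMap<>();

    // Record (or overwrite) a successful fetch of a URL.
    void recordFetch(String url, Instant when, Duration interval) {
        db.put(url, new PageState(when, interval));
    }

    // A never-fetched seed is due immediately (epoch time, zero interval).
    void addSeed(String url) {
        db.putIfAbsent(url, new PageState(Instant.EPOCH, Duration.ZERO));
    }

    // Select URLs whose next fetch time is at or before 'now'; everything else is skipped.
    List<String> dueForFetch(Instant now) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, PageState> e : db.entrySet()) {
            if (!e.getValue().nextFetch().isAfter(now)) {
                due.add(e.getKey());
            }
        }
        return due;
    }

    public static void main(String[] args) {
        RecrawlSketch crawl = new RecrawlSketch();
        Instant now = Instant.parse("2024-01-31T00:00:00Z");
        Duration thirtyDays = Duration.ofDays(30);

        crawl.recordFetch("http://example.com/old", now.minus(Duration.ofDays(31)), thirtyDays);
        crawl.recordFetch("http://example.com/fresh", now.minus(Duration.ofDays(1)), thirtyDays);
        crawl.addSeed("http://example.com/new");

        // prints "[http://example.com/old, http://example.com/new]"
        System.out.println(crawl.dueForFetch(now));
    }
}
```

The page fetched 31 days ago and the brand-new seed are selected, while the page fetched yesterday is skipped until its interval expires. Nutch additionally adapts the interval per page when content has not changed, which this sketch leaves out.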
Nutch is probably different from anything you have used before, because it is a framework, not just a front end for query and search (although Solr seems to be a more powerful search front end than Nutch's native one). It has a crawling part and an indexing part (into a Lucene index).
If you want to use the crawled data for purposes other than search, you need to develop your own programs and be familiar with Hadoop and MapReduce programming.
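To give a feel for the MapReduce shape such a program takes, here is a plain-Java analogue that counts crawled pages per host: a "map" step turns each URL into a key, and a "reduce" step aggregates per key. This is only an in-memory illustration using `java.util.stream`; it does not use Hadoop or read Nutch's crawl output, and the class name `PagesPerHost` and its input list are hypothetical. A real job would implement Hadoop's Mapper and Reducer and run over the data Nutch writes to disk.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java analogue of a MapReduce job over crawled URLs (hypothetical input).
// Not Hadoop: a real Nutch-side job would run on a cluster over crawl data.
public class PagesPerHost {

    // "Map" phase: emit a key (the host) for each crawled URL.
    static String mapToHost(String url) {
        String host = URI.create(url).getHost();
        return host == null ? "(no host)" : host;
    }

    // "Reduce" phase: count how many URLs share each key (sorted by host).
    static Map<String, Long> countByHost(List<String> crawledUrls) {
        return crawledUrls.stream()
                .collect(Collectors.groupingBy(PagesPerHost::mapToHost,
                        TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> crawled = List.of(
                "http://example.com/a",
                "http://example.com/b",
                "http://example.org/index.html");
        // prints "{example.com=2, example.org=1}"
        System.out.println(countByHost(crawled));
    }
}
```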
I'm not sure what you want from the crawling, but Nutch may not be the right solution for you.