java - Nutch API advice

June 15, 2012

I'm working on a project that needs a mature crawler to do some work, and I'm evaluating Nutch for this purpose. My current needs are relatively straightforward: I need a crawler that can save the data to disk, and I need it to recrawl only the updated resources of a site and skip the parts that have already been crawled.

Does anyone have experience working with the Nutch code directly in Java, not via the command line? I would like to start simple: create a crawler (or similar), minimally configure it and start it, nothing fancy. Is there an example of this, or a resource I should be looking at? I'm going over the Nutch documentation, but most of it is about the command line, search and other stuff.

How usable is Nutch as a crawling module without the need to index and search? Any help is appreciated. Thanks.

Nutch is probably different from anything you have worked with before, because it is a framework that does more than provide a front end for query and search (although Solr seems more powerful than the native Nutch search front end). It also has a crawling part and an indexing part (into Lucene indexes).

If you want to use the crawled data for purposes other than search, you need to develop your own programs and be familiar with Hadoop and MapReduce programming.

I'm not sure what you want to do with the crawled data, but it doesn't sound like Nutch is the right solution.
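If you end up writing that part yourself, the "skip what hasn't changed" requirement from the question can be sketched in plain Java roughly as below. This is only an illustration using the JDK, not Nutch's API: the class name `RecrawlTracker` and the method `shouldSave` are made up for this example, and it assumes the page content has already been downloaded as a string by whatever fetcher you use.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Nutch): remembers a digest per URL between crawls. */
class RecrawlTracker {
    // URL -> hex digest of the content seen the last time the URL was crawled
    private final Map<String, String> lastSeen = new HashMap<>();

    /** Returns true if the URL is new or its content changed; records the new digest either way. */
    public boolean shouldSave(String url, String content) {
        String digest = sha256Hex(content);
        String previous = lastSeen.put(url, digest);
        return !digest.equals(previous);
    }

    private static String sha256Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RecrawlTracker tracker = new RecrawlTracker();
        System.out.println(tracker.shouldSave("http://example.com/", "v1")); // true: first visit
        System.out.println(tracker.shouldSave("http://example.com/", "v1")); // false: unchanged
        System.out.println(tracker.shouldSave("http://example.com/", "v2")); // true: content changed
    }
}
```

Note that this keeps its state in memory and still has to download a page to notice it is unchanged; a real crawler would persist the map to disk and would probably also look at HTTP headers such as `Last-Modified` or `ETag` to avoid the download.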
