python - Data recognition, parsing, filtering, and transformation -- GUI?
I'm looking for a non-cloud-based, open source app for doing data transformation; though for a killer (and I mean killer) app built just for data transformations, I might be willing to spend up to $1000.
I've looked at Perl, Kapow Katalyst, Pentaho Kettle, and more.
Perl, Python, and Ruby are languages, but I'm unable to find any frameworks/DSLs in them just for processing data; meaning they're not great development environments for this, meaning there are no built-in GUIs for building regexes or input/output (CSV, XML, JDBC, REST, etc.), and no debugger for testing rows and rows of data. They're not bad either, just not what I'm looking for, which is a GUI built for complex data transformations. That said, I'd love it if the GUI/app file were in a scripting language, and not stored in some non-human-readable XML/ASCII file.
Kapow Katalyst is made for accessing data via HTTP (HTML, CSS, RSS, JavaScript, etc.). It's got a nice GUI for transforming unstructured text, but that's not its core value offering, and it's way, way too expensive. It does an okay job of traversing document namespace paths; I'm guessing it's XPath on the back-end, since the syntax appears to be the same.
Pentaho Kettle has a nice GUI for input/output of most common data stores, and its own take on handling data processing, which is okay and has only a small learning curve. Kettle's debugger is OK, in that the data is easy to see, but the errors and exceptions are not threaded with the output, and there is no way to debug an issue; meaning you can't reload the output/error/exception, you are only able to view the system feedback. All that said, Kettle data transformation is _______ well, let's just say it left me feeling like I must be missing something, because I was puzzled by "if it's not possible, just write the transformation in JavaScript"; umm, what?
So, any suggestions? I realize I haven't spec'd out any transformations, but I figure if you use a product for data munging, I'd like to know about it; even Excel, I guess.
In general though, I'm looking for a product that's able to handle 1000-100,000 rows with 10-100 columns. It'd be super cool if it could profile data sets, which is a feature Kettle sort of has, but not super well. I'd also like built-in unit testing, meaning I'm able to build out control sets of data, and run any changes I've made against the control set.
I'd also like to be able to selectively filter out rows and columns as I build out a transformation, without altering the build; for example, I run a data set through a transformation, filter the results, and on the next run those sets are automatically blocked at the first "logical" occurrence, which in turn would mean less data to "look at" and a reduced runtime for each enhanced iteration. It'd be crazy nice if, while I'm filtering out rows/columns, the app were tracking those (and the output that was filtered out), along with the unit tested/highlighted changes. If I made a change that would affect the application logs and its ability to track the unit tests, based on me "breaking a branch", it'd give me a warning and let me dump the data stored for that branch... and/or track the primary keys for the difference in the next generation of output, or even attempt to match them using fuzzy logic.
And yes, I know this is a pipe dream, but hey, I figured I'd ask, just in case there's something out there I've just never seen.
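To make the control-set idea a bit more concrete, here is a rough Python sketch of the behaviour I have in mind; it is not taken from any of the products above. Everything in it is made up for illustration: the `transform`, `run`, and `diff_by_key` helpers, the `id` primary-key column, and the tiny inline CSV data are all assumptions. It runs a transformation over a control input, keeps track of the rows a filter dropped, and reports differences from the expected output by primary key.

```python
import csv
import io


def transform(row):
    """Hypothetical transformation under test: trim and title-case the name."""
    out = dict(row)
    out["name"] = row["name"].strip().title()
    return out


def run(rows, keep=lambda row: True):
    """Apply the transformation, tracking the rows that the filter drops."""
    kept, dropped = [], []
    for row in rows:
        (kept if keep(row) else dropped).append(row)
    return [transform(r) for r in kept], dropped


def diff_by_key(expected, actual, key="id"):
    """Compare two row sets by primary key and report what differs."""
    exp = {r[key]: r for r in expected}
    act = {r[key]: r for r in actual}
    return {
        "missing": sorted(exp.keys() - act.keys()),
        "unexpected": sorted(act.keys() - exp.keys()),
        "changed": sorted(k for k in exp.keys() & act.keys() if exp[k] != act[k]),
    }


# Made-up control set: raw input plus the output I expect from it.
CONTROL_INPUT = "id,name\n1, alice smith \n2,BOB JONES\n3,test row\n"
CONTROL_EXPECTED = "id,name\n1,Alice Smith\n2,Bob Jones\n"

rows = list(csv.DictReader(io.StringIO(CONTROL_INPUT)))
expected = list(csv.DictReader(io.StringIO(CONTROL_EXPECTED)))

# Filter out test rows, but remember which ones were filtered.
output, dropped = run(rows, keep=lambda r: not r["name"].startswith("test"))
report = diff_by_key(expected, output)

print(report)                       # empty lists mean the control set still passes
print([r["id"] for r in dropped])   # primary keys of the filtered-out rows
```

A real tool would presumably do this per transformation step and persist the dropped rows between runs; the sketch only shows the comparison against a control set and the bookkeeping for filtered rows.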
Feel free to comment; I'd be happy to answer any questions or offer any additional info.