Regex speed in Java
Some example wall-clock times for a large number of strings:
    .split("[^a-zA-Z]");   // .44 seconds
    .split("[^a-zA-Z]+");  // .47 seconds
    .split("\\b+");        // 2 seconds
Any explanations for the dramatic increase? I can imagine the [^a-zA-Z] pattern being done in the processor as a set of 4 compare operations, of which all 4 happen only in the true case. What about \b? Does anybody have anything to weigh in on that?
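For anyone who wants to reproduce the comparison, here is a minimal timing sketch. The class name, the `timeSplitNanos` helper, the sample sentence, and the input size are all made up for illustration, and a hand-rolled loop like this is only a rough indicator (a proper harness such as JMH would give more trustworthy numbers).

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitTiming {
    // Hypothetical helper: splits every input with one precompiled regex
    // and returns the elapsed wall-clock time in nanoseconds.
    static long timeSplitNanos(String regex, String[] inputs) {
        Pattern pattern = Pattern.compile(regex);
        long tokens = 0;
        long start = System.nanoTime();
        for (String s : inputs) {
            tokens += pattern.split(s).length;
        }
        long elapsed = System.nanoTime() - start;
        // Use the token count so the work cannot be optimized away.
        if (tokens < 0) {
            throw new IllegalStateException("unexpected token count");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Assumed test data: the same short sentence repeated many times.
        String[] inputs = new String[100_000];
        Arrays.fill(inputs, "The quick brown fox, jumps over 2 lazy dogs!");

        for (String regex : new String[] {"[^a-zA-Z]", "[^a-zA-Z]+", "\\b+"}) {
            timeSplitNanos(regex, inputs);              // warm-up pass
            long nanos = timeSplitNanos(regex, inputs); // measured pass
            System.out.printf("%-12s %.3f s%n", regex, nanos / 1e9);
        }
    }
}
```

The absolute numbers will vary with the JVM, the hardware, and the input, so only the relative ordering is worth reading into.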
First, it makes no sense to split on 1 or more zero-width assertions! Java’s regex is not very clever (and I’m being charitable) about sane optimizations.
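To illustrate the first point: a zero-width assertion consumes no characters, so repeating it with `+` cannot match anything that a single `\b` does not already match. The sketch below (class and method names are hypothetical) shows both patterns producing the same pieces, so the quantifier adds work without changing the result.

```java
import java.util.Arrays;

public class BoundarySplit {
    // Hypothetical helper: thin wrapper around String.split for comparison.
    static String[] splitOn(String regex, String input) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String input = "hello world";
        String[] single = splitOn("\\b", input);
        String[] repeated = splitOn("\\b+", input);

        // On Java 8 or later both lines print [hello,  , world]
        System.out.println(Arrays.toString(single));
        System.out.println(Arrays.toString(repeated));
        System.out.println("same result: " + Arrays.equals(single, repeated));
    }
}
```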
Second, never use \b in Java: it is messed up, and out of sync with \w.
For a more complete explanation of this, and how to make it work with Unicode, see this answer.
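As a rough illustration of the Unicode side of this, the sketch below checks what \w and \b do with a non-ASCII letter, with and without `Pattern.UNICODE_CHARACTER_CLASS` (available since Java 7). The class and helper names are hypothetical. By default \w only matches ASCII word characters; how \b treats non-ASCII letters in default mode may differ between Java releases, so the sketch prints that case rather than assuming a value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryCheck {
    // Hypothetical helper: does the whole string match a single \w?
    static boolean isWordChar(String s, int flags) {
        return Pattern.compile("\\w", flags).matcher(s).matches();
    }

    // Hypothetical helper: how many \b positions does the string contain?
    static int countBoundaries(String s, int flags) {
        Matcher m = Pattern.compile("\\b", flags).matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";   // e with acute accent
        String word = "\u00e9lan";  // a word starting with that letter
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;

        // Default mode: \w is ASCII-only, so this prints false.
        System.out.println("\\w default: " + isWordChar(eAcute, 0));
        // With the flag, \w follows the Unicode definition: prints true.
        System.out.println("\\w unicode: " + isWordChar(eAcute, unicode));

        // Default-mode \b behavior depends on the Java release; just print it.
        System.out.println("\\b default: " + countBoundaries(word, 0));
        // With the flag, \b agrees with \w: one boundary at each end.
        System.out.println("\\b unicode: " + countBoundaries(word, unicode));
    }
}
```

Setting the flag (or embedding `(?U)` at the start of the pattern) keeps \w and \b consistent with each other, at some cost in matching speed.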