A performance question in MySQL


I'm seeing performance behavior in mysqld that I don't understand.

I have a table t with primary key id and 4 data columns col1, …, col4.

The data arrive in 4 TSV files 'col1.tsv', …, 'col4.tsv'. The procedure I use to ingest them is:

```sql
create table t (
  id int not null,
  col1 int not null,
  col2 int not null,
  col3 int not null,
  col4 char(12) character set latin1 not null
);

load data local infile 'col1.tsv' into table t (id, col1);   # pop 1

alter table t add primary key (id);

set global hot_keys.key_buffer_size= # suitable
cache index t in hot_keys;
load index into cache t;

drop table if exists tmpt;
create table tmpt ( id int not null, val int not null );
load data local infile 'col2.tsv' into table tmpt;
insert into t (id, col2)                                     # pop 2
  select tt.id, tt.val from tmpt tt
  on duplicate key update col2=tt.val;

drop table if exists tmpt;
create table tmpt ( id int not null, val int not null );
load data local infile 'col3.tsv' into table tmpt;
insert into t (id, col3)                                     # pop 3
  select tt.id, tt.val from tmpt tt
  on duplicate key update col3=tt.val;

drop table if exists tmpt;
create table tmpt (
  id int not null,
  val char(12) character set latin1 not null
);
load data local infile 'col4.tsv' into table tmpt;
insert into t (id, col4)                                     # pop 4
  select tt.id, tt.val from tmpt tt
  on duplicate key update col4=tt.val;
```

Now here's the performance behavior I don't understand. The pop 2 and pop 3 insert … select … on duplicate key update queries sometimes run fast, with mysqld occupying 100% of a core, and at other times mysqld bogs down at 1% CPU, reading t.MYD (i.e. table t's MyISAM data file) at random offsets.

I've had a hard time isolating the circumstances in which it is fast and in which it is slow, but I have found one repeatable case:

In the above sequence, pop 2 and 3 are slow. But if I create t without col4, then pop 2 and pop 3 are fast. Why?

And if, after that, I add col4 with an alter table query, pop 4 runs fast too.

Again, when the inserts run slow, mysqld is bogged down in file I/O, reading from random offsets in table t's MyISAM data file. I don't understand why it is reading that file at all.

MySQL server version 5.0.87. OS X 10.6.4 on a Core 2 Duo iMac.


Update

I found (what I think is) the answer to my question. The mysterious difference between the inserts being slow and fast is dependent on the data.

The clue was this: when the insert is slow, mysqld is seeking on average 0.5 GB between reads on t.MYD. When it is fast, successive reads have tiny relative offsets.
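To make that clue concrete, here is a small sketch (the offsets below are made up for illustration; the real numbers came from watching mysqld's reads) of the average seek distance between successive reads:

```python
# Average absolute seek distance between successive reads of a file,
# given a list of read offsets. Offsets are hypothetical examples.
def avg_seek(offsets):
    gaps = [abs(b - a) for a, b in zip(offsets, offsets[1:])]
    return sum(gaps) / len(gaps)

random_reads = [0, 600_000_000, 100_000_000, 900_000_000]  # scattered over t.MYD
sequential_reads = [0, 4096, 8192, 12288]                  # one page after another
print(avg_seek(random_reads))      # hundreds of MB per seek
print(avg_seek(sequential_reads))  # one page per seek
```

The slow case looks like `random_reads`; the fast case looks like `sequential_reads`.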

The confusion arose because some of the 'col?.tsv' files happen to have their rows in the same order with respect to the id column, while others are randomly ordered relative to them.

I was able to drastically reduce the overall processing time by running sort(1) on the TSV files before loading and inserting them.
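As a sketch of that fix (the helper and sample rows are hypothetical; on the command line this amounts to a numeric sort on the first column of each file):

```python
# Sort TSV rows by their integer id (column 1) before loading, so that
# INSERT ... ON DUPLICATE KEY UPDATE touches t.MYD in id order rather
# than at random offsets. Mirrors `sort -n -k1 colN.tsv`.
def sort_tsv_by_id(lines):
    return sorted(lines, key=lambda line: int(line.split("\t", 1)[0]))

rows = ["42\t7\n", "3\t9\n", "17\t1\n"]
print(sort_tsv_by_id(rows))
```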

It's a pretty open question... so here's a speculative, open answer. :)

... when the inserts run slow, mysqld is bogged down in file I/O, reading from random offsets in table t's MyISAM data file. I don't understand why it is reading that file at all.

I can think of two possible explanations:

  1. Even after it knows there is a primary-key collision, it has to read the current value of the field being updated -- if that field coincidentally already holds the destination value (0, in this case), it won't perform the update, i.e. 0 rows affected.
  2. Moreover, when it does update a field, I believe MySQL re-writes the whole row to disk (if not multiple rows, due to paging), and not just that single field, as one might assume.

But if I create t without col4, pop 2 and pop 3 are fast. Why?

If it's a fixed-row-size MyISAM table, which it looks to be given the datatypes in the table, then including the char field, even when it's blank, makes each row 75% larger on disk (4 bytes per int field = 16 bytes, whereas the char(12) adds another 12 bytes). So, in theory, you'll need to read/write 75% more.
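The 75% figure is simple back-of-the-envelope arithmetic over the declared column types (this deliberately ignores MyISAM's small per-row bookkeeping overhead, so the real on-disk ratio differs slightly):

```python
# Fixed-row-size arithmetic for table t, from the declared column types only.
int_bytes = 4
without_col4 = 4 * int_bytes       # id, col1, col2, col3 = 16 bytes
with_col4 = without_col4 + 12      # char(12) always occupies 12 bytes
growth = (with_col4 - without_col4) / without_col4
print(with_col4, f"{growth:.0%}")  # 28 75%
```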

Does the dataset fit in memory? Have you considered using InnoDB or MEMORY tables?

Addendum

If the usable/active/hot dataset goes from fitting in memory to not fitting in memory, an orders-of-magnitude decrease in performance isn't unheard of. A couple of reads:

http://jayant7k.blogspot.com/2010/10/foursquare-outage-post-mortem.html

http://www.mysqlperformanceblog.com/2010/11/19/is-there-benefit-from-having-more-memory/
