C# - ETL Processing Design and Performance


I am working on an ETL process for a data warehouse using C# that supports both SQL Server and Oracle. During development I have been writing stored procedures to synchronize data from one database to another. The stored procedure code is rather ugly because it involves dynamic SQL: it needs to build SQL strings, since the database name is dynamic.

My team lead wants to use C# code for the ETL. We have code generation that automatically generates new classes when the database definition changes. That's also why I decided not to use Rhino ETL.

Here are the pros and cons:

Stored procedure:

Pros:

  • Fast loading process, since everything is handled by the database
  • Easy deployment, no compiling needed

Cons:

  • Poor readability due to dynamic SQL
  • Need to maintain both T-SQL and PL/SQL scripts when the database definition changes
  • Slow development because there is no IntelliSense when writing dynamic SQL

C# code:

Pros:

  • Easier to develop the ETL process because we get IntelliSense from our generated classes
  • Easier to maintain because of the generated classes
  • Better logging and error handling

Cons:

  • Slow performance compared to stored procedures

I prefer to use application code for the ETL process, but the performance is horrible compared to stored procedures. In one test that updates 10,000 rows, the stored procedures took 1 second, while my ETL code took 70 seconds. Even if I somehow manage to reduce the overhead, 20% of those 70 seconds is spent purely on calling the update statement from application code.

Could someone provide suggestions or comments on how to speed up the ETL process using application code?

My next idea is to try running the ETL process in parallel by opening multiple database connections and performing the updates and inserts on them.

Thanks.

You have code generation that automatically generates new classes, so why don't you have code generation that automatically generates new stored procedures?

That should give you the best of both worlds: you encapsulate it in a few nice classes that can inspect the database and update things as necessary, and you can, well, not increase readability, but at least hide it (you would not need to update the SPs manually).
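As a rough illustration of that idea, the sketch below generates the text of a sync procedure for both T-SQL and PL/SQL from one table description. Everything named here is hypothetical: `TableDef`, `SpGenerator`, the `usp_Sync_` / `sync_` procedure names, the `dbo` schema, and the shape of the generated SQL are assumptions, not the asker's actual design. The point is only that the source database name gets baked in at generation time, so the procedure body itself needs no dynamic SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical table description; a real generator would fill this
// from database metadata, like the existing class generator does.
public sealed class TableDef
{
    public string Name { get; }
    public string KeyColumn { get; }
    public IReadOnlyList<string> Columns { get; }   // non-key columns only

    public TableDef(string name, string keyColumn, IReadOnlyList<string> columns)
    {
        Name = name;
        KeyColumn = keyColumn;
        Columns = columns;
    }
}

public static class SpGenerator
{
    // T-SQL variant: UPDATE ... FROM ... JOIN against the source database.
    // Assumes both tables live in the dbo schema (an assumption of this sketch).
    public static string GenerateTSql(TableDef table, string sourceDatabase)
    {
        string assignments = string.Join(", ",
            table.Columns.Select(c => $"t.[{c}] = s.[{c}]"));
        return
            $"CREATE PROCEDURE usp_Sync_{table.Name} AS\n" +
            "BEGIN\n" +
            $"    UPDATE t SET {assignments}\n" +
            $"    FROM [{table.Name}] AS t\n" +
            $"    JOIN [{sourceDatabase}].dbo.[{table.Name}] AS s\n" +
            $"        ON s.[{table.KeyColumn}] = t.[{table.KeyColumn}];\n" +
            "END";
    }

    // PL/SQL variant: the same update expressed as a MERGE against a source schema.
    public static string GeneratePlSql(TableDef table, string sourceSchema)
    {
        string assignments = string.Join(", ",
            table.Columns.Select(c => $"t.{c} = s.{c}"));
        return
            $"CREATE OR REPLACE PROCEDURE sync_{table.Name} AS\n" +
            "BEGIN\n" +
            $"    MERGE INTO {table.Name} t\n" +
            $"    USING {sourceSchema}.{table.Name} s\n" +
            $"        ON (t.{table.KeyColumn} = s.{table.KeyColumn})\n" +
            $"    WHEN MATCHED THEN UPDATE SET {assignments};\n" +
            "END;";
    }
}
```

Calling `SpGenerator.GenerateTSql(new TableDef("Customer", "Id", new[] { "Name", "City" }), "SourceDb")` returns the script as a string, which a deployment step could then run against the target database. The sketch does no quoting or validation of identifiers, so it should only ever be fed names that come from trusted metadata.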

Also, the difference should not be that huge; it sounds as if you are not doing something right (not reusing connections, moving data unnecessarily from the server to the application, or processing data in too-small batches - row by row?).
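To make the connection-reuse and batching points concrete, here is a minimal sketch, assuming a hypothetical `Customer` table with `Id` and `Name` columns. `BatchUpdater` and its methods are illustrative names, not part of any library. It uses the provider-neutral `System.Data.Common` types so the same code can run against either database, keeps one connection open for the whole run, reuses one command per batch, and commits once per batch rather than once per row.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Common;

public static class BatchUpdater
{
    // Splits a sequence into consecutive lists of at most batchSize items.
    public static IEnumerable<List<T>> Chunk<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch;
    }

    // Runs all updates over ONE already-open connection, with one transaction
    // and one command object per batch. Returns the total rows affected.
    public static int UpdateNames(DbConnection openConnection,
                                  IEnumerable<KeyValuePair<int, string>> rows,
                                  int batchSize = 1000)
    {
        int affected = 0;
        foreach (var batch in Chunk(rows, batchSize))
        {
            using (DbTransaction tx = openConnection.BeginTransaction())
            using (DbCommand cmd = openConnection.CreateCommand())
            {
                cmd.Transaction = tx;
                // Hypothetical table and columns. "@" is the SQL Server
                // parameter marker; Oracle providers typically expect ":".
                cmd.CommandText = "UPDATE Customer SET Name = @name WHERE Id = @id";

                DbParameter name = cmd.CreateParameter();
                name.ParameterName = "@name";
                name.DbType = DbType.String;
                cmd.Parameters.Add(name);

                DbParameter id = cmd.CreateParameter();
                id.ParameterName = "@id";
                id.DbType = DbType.Int32;
                cmd.Parameters.Add(id);

                foreach (var row in batch)
                {
                    // Only the parameter values change between executions.
                    id.Value = row.Key;
                    name.Value = (object)row.Value ?? DBNull.Value;
                    affected += cmd.ExecuteNonQuery();
                }
                tx.Commit();   // one commit per batch, not per row
            }
        }
        return affected;
    }
}
```

This still makes one round trip per row, so it only removes the connection and commit overhead. A set-based approach usually goes further: bulk-load the rows into a staging table (for example with `SqlBulkCopy` on SQL Server) and then apply them with a single `UPDATE ... JOIN` or `MERGE` on the server.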

Also, regarding better logging: care to elaborate on that? You can have logging on the database layer, too, or you can design your SPs so that the application layer can still do the logging.

