python - How to get started with Big Data Analysis


I've been a long-time user of R and have recently started working with Python. Having used conventional RDBMS systems for data warehousing, and R/Python for number-crunching, I now feel the need to get my hands dirty with big data analysis.

I'd like to know how to get started with big data crunching: how to start simple with map/reduce and the use of Hadoop.

  • How I can leverage my skills in R and Python to get started with big data analysis, using the Python Disco project, for example.
  • Using the RHIPE package, and finding toy datasets and problem areas.
  • Finding the right information to allow me to decide if I need to move to NoSQL from RDBMS-type databases.

All in all, I'd like to know how to start small and gradually build up my skills and know-how in big data analysis.

Thank you for your suggestions and recommendations. I apologize for the generic nature of this query, but I'm looking to gain more perspective regarding this topic.

  • Harsh

Using the Python Disco project, for example.

Good. Play with that.
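Before installing Disco or Hadoop, it can help to see the map/reduce pattern on its own. The following is a minimal sketch in plain Python, not Disco's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the toy input are made up for illustration, and everything runs in a single process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical toy input; any small text file would do just as well.
toy_lines = ["big data is big", "small data is still data"]
counts = word_count(toy_lines)
print(counts)
```

A framework such as Disco or Hadoop distributes the same three steps across machines; the logic of the map and reduce functions stays roughly this simple.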

Using the RHIPE package, and finding toy datasets and problem areas.

Fine. Play with that, too.

Don't sweat finding "big" datasets. Even small datasets present interesting problems. Indeed, any dataset is a starting-off point.

I once built a small star schema to analyze the $60M budget of an organization. The source data was in spreadsheets, and incomprehensible. So I unloaded it into a star schema and wrote several analytical programs in Python to create simplified reports of the relevant numbers.
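To give a rough idea of what such a program can look like, here is a small sketch of a star schema held in plain Python structures: one fact table whose rows point at two dimension tables by key, plus a helper that totals the measure by any dimension attribute. All table names, column names and figures are invented for illustration; they are not the original budget data.

```python
from collections import defaultdict

# Dimension tables: surrogate key -> descriptive attributes (all values made up).
dim_department = {
    1: {"name": "Operations", "division": "Services"},
    2: {"name": "IT", "division": "Services"},
    3: {"name": "Finance", "division": "Administration"},
}
dim_period = {
    10: {"year": 2010, "quarter": "Q1"},
    11: {"year": 2010, "quarter": "Q2"},
}

# Fact table: one row per (department, period) with an additive measure.
fact_budget = [
    {"dept_key": 1, "period_key": 10, "amount": 120.0},
    {"dept_key": 1, "period_key": 11, "amount": 80.0},
    {"dept_key": 2, "period_key": 10, "amount": 50.0},
    {"dept_key": 3, "period_key": 11, "amount": 30.0},
]


def total_by(facts, dimension, key_column, attribute):
    """Sum the amount measure, grouped by one attribute of one dimension."""
    totals = defaultdict(float)
    for row in facts:
        group = dimension[row[key_column]][attribute]
        totals[group] += row["amount"]
    return dict(totals)


by_division = total_by(fact_budget, dim_department, "dept_key", "division")
by_quarter = total_by(fact_budget, dim_period, "period_key", "quarter")
print(by_division)
print(by_quarter)
```

The point of the layout is that every report is the same operation: pick a dimension attribute, follow the key from each fact row, and add up the measure.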

Finding the right information to allow me to decide if I need to move to NoSQL from RDBMS-type databases.

This is easy.

First, get a book on data warehousing (Ralph Kimball's The Data Warehouse Toolkit, for example).

Second, study the "star schema" carefully, particularly the variants and special cases that Kimball explains (in depth).

Third, realize the following: SQL is for updates and transactions.

When doing "analytical" processing (big or small) there's no update of any kind. SQL (and related normalization) don't matter much any more.

Kimball's point (and others', too) is that most of your data warehouse is not in SQL, it's in simple flat files. A data mart (for ad-hoc, slice-and-dice analysis) may be in a relational database to permit easy, flexible processing with SQL.

So the "decision" is trivial. If it's transactional ("OLTP") it must be in a relational or OO DB. If it's analytical ("OLAP") it doesn't require SQL except for slice-and-dice analytics; and even then the DB is loaded from the official files as needed.
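As an illustration of that last point, here is one possible sketch of loading a flat file into a throwaway data mart and slicing it with SQL, using only Python's standard `csv` and `sqlite3` modules. The file contents, the `sales` table and the `load_mart` helper are assumptions made up for the example; an in-memory string stands in for the official file on disk.

```python
import csv
import io
import sqlite3

# Stand-in for an "official" flat file; in practice this would be a file on disk.
flat_file = io.StringIO(
    "region,product,units\n"
    "north,widget,10\n"
    "north,gadget,5\n"
    "south,widget,7\n"
)


def load_mart(source):
    """Load a CSV flat file into a fresh in-memory SQLite data mart."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
    rows = [
        (rec["region"], rec["product"], int(rec["units"]))
        for rec in csv.DictReader(source)
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


mart = load_mart(flat_file)

# Slice-and-dice with SQL: total units per region.
units_by_region = dict(
    mart.execute("SELECT region, SUM(units) FROM sales GROUP BY region")
)
print(units_by_region)
```

The database here is disposable. The flat file remains the record of truth, and the mart is rebuilt from it whenever an ad-hoc query is needed.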

