What tools do you use?

This was my first forum discussion, started from another blog, asking the group to discuss what tools they use for data exploration, data visualization, data mining, model building and/or model validation. For the entire thread, follow this link to the forum discussion on the data-mi.ning.com site. I’m happy to say this has generated some great feedback, and it’s the most active discussion on the site with 15 posts (a few of which are mine).

-----

I thought it would be interesting to take an informal poll of the group to see what tools we use to help us do our jobs, especially around data exploration/visualization/mining and model building/validation. It would also be interesting to note your expertise and background, and what degree(s) you have.

I have a Bachelor’s in Engineering (Materials Science & Engineering), and about 10 years after I graduated I stumbled into a Business Intelligence Analyst role at an online pure-play retailer of designer fashions. I learned my way around a large data warehouse environment (Oracle) using SQL and PL/SQL, mainly as a customer/user-focused business analyst. I did a lot of work with our marketing department to understand our database of account holders and customers. I built a customer profitability score that fed our CRM production database, which helped our call-center reps make customer-related decisions. At the time, however, my notions of statistics and data mining were naive at best.

In my current role, at a B2B business directory and web services provider, I’m involved much more heavily on the ‘web analytics’ side of things. This means I have to concern myself with the minutiae of a third-party web analytics data collection and reporting tool (popular tools include Google Analytics, Omniture, Web Trends, and Coremetrics).
I maintain an analytic database and built the ETL from the primary source (our web analytics tool converts log files) in an MS SQL Server 2005 environment (including SSAS and SSIS). Much of our analysis (and revenue model) follows our heading structure, which contains about 70,000 headings and hundreds of thousands of company listings and links to company websites.

The challenge is to improve our analytic methods, so I am undergoing a self-prescribed crash course in statistics, data mining, and mathematics. (I have discovered some great resources on data mining, and thanks to the Web I can view a semester’s worth of MIT Linear Algebra lectures.) I am as interested in the educational process as anything else. The business side in today’s web environment needs to fundamentally understand important mathematical and statistical concepts, to think in probabilities, and to understand and use technology to make sense of (and, more importantly, get the most value from) today’s data-rich environment.

I’ve gone on for too long, but I wanted to give some context to the question. Currently I’ve been using some of the open source tools I mentioned here and am researching how to incorporate them into our analytic process. We currently do our heavier analysis using SPSS.

What tools do you use?