Answers by Dominic Pouzin of Data Applied

brunocm, on May 26th, 2009
Thanks again to Dominic of Data Applied. He originally responded on May 13 (sorry for the delay!) on another blog site and gave his o.k. for me to re-post his answers here:
How long did it take you to create the tool?
About 10 months.
What languages/tools did you use?
C# for the backend, Silverlight for the UI. Java would have been a better choice because it's more portable. There are ways to run C# code on Linux systems, but they are not very robust.
Can I connect to my MS SQL database?
Right now, it is necessary to import the data (e.g. export the database content to CSV, then import that file). Perhaps later, this will become easier. When running from the “cloud”, direct access to enterprise SQL databases can be a bit tricky. On the other hand, direct access to rich online sources of business data is very feasible (e.g. connect to SalesForce.com over the Internet and download business data). Once the data has been imported, it (along with analysis results) is stored in SQL.
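To make the export-to-CSV-then-import route concrete, here is a minimal sketch in Python. It is not how Data Applied does it: SQLite stands in for the SQL store, and the function name `import_csv`, the table name, and the sample columns are all made up for illustration. It also assumes the CSV header contains trusted, simple column names.

```python
import csv
import io
import sqlite3


def import_csv(conn, table, csv_text):
    """Load CSV text (first row = header) into a new SQL table.

    Hypothetical helper: table and column names are assumed to be
    trusted, since they are placed directly into the SQL statements.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row holds the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    rows = list(reader)
    # Cell values go through placeholders, so they are never parsed as SQL.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Example: an export from some other database, already saved as CSV.
conn = sqlite3.connect(":memory:")
exported = "id,amount\n1,9.5\n2,12\n"
imported = import_csv(conn, "orders", exported)
```

Everything arrives as text in this sketch; a real importer would also have to guess or be told the column types.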
How did you come up with the very robust visualizations?
I think that’s probably more art than science – inspiration is available to all of us! Technologies such as Flash or Silverlight help too.
What is the largest data set that you have analyzed using your tool?
I routinely analyze 100K+ record data sets. So not terabytes of data, but enough for many small to medium business scenarios, perhaps stretching to large marketing campaigns / individual web logs / individual product orders.
And ok, underneath it all (can you discuss?), are you relying on open source (and very powerful) tools (like Weka, R), are you using proprietary algorithms, or both?
Just robust implementations of efficient data mining algorithms found in the literature, with some tweaks to increase robustness (e.g. handle a mix of discrete / numeric / missing values) and performance (e.g. replace discrete values by hash values). There are just too many issues with using open source software in terms of commercialization.
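As an illustration of the two tweaks mentioned in that answer, here is a small Python sketch of one way to normalize a record that mixes discrete, numeric, and missing values, replacing each discrete value with an integer hash. This is an assumption about what such preprocessing could look like, not Data Applied's actual code: the names `encode_value` and `encode_row`, the list of missing-value markers, and the choice of CRC32 as the hash are all hypothetical.

```python
import math
import zlib

# Assumed markers for "no value"; real data sets use many different ones.
MISSING_MARKERS = {"", "?", "NA", "null"}


def encode_value(raw):
    """Return None for missing, a float for numeric, an int hash for discrete."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text in MISSING_MARKERS:
        return None
    try:
        number = float(text)
    except ValueError:
        # Not a number: treat as a discrete label and replace it with a
        # stable integer hash, which is cheaper to compare than a string.
        return zlib.crc32(text.encode("utf-8"))
    # Treat NaN as missing so later arithmetic is not silently poisoned.
    return None if math.isnan(number) else number


def encode_row(row):
    """Encode every cell of one record."""
    return [encode_value(cell) for cell in row]


encoded = encode_row(["red", "3.5", "", None, "red"])
```

Equal labels always map to the same integer, so comparisons between discrete values become integer comparisons. CRC32 can collide on large vocabularies, so a production version would need a wider hash or a lookup table.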