A Better Place? A Better Place!

I just saw a post by Tim O’Reilly (@timoreilly) on twitter linking to this snippet of Shai Agassi’s TED talk about his “bold plan for electric cars.” Shai is Founder and CEO of A Better Place. His solution to eliminate the world’s dependence on oil is to provide affordable zero emission vehicles (EVs), a ubiquitous infrastructure for recharging/replacing their batteries, and over time green energy farms to power the grid. Despite his tremendous success he could use our help in the U.S., to name one place. Head over to his community site, Planet Better Place, for more info.

The snippet vid is of the final 2 minutes of his presentation (the full vid is under the second link) where Shai makes his poignant closing remarks. The entire 18 minutes is also well worth viewing. I am amazed at how far he has come toward fulfilling his vision. Shai is a true agent of change. Along with an alliance with Renault-Nissan, who have committed to making several affordable models (sports cars, station wagons, etc.) of EVs, he has penetrated several markets: Israel, Australia, Denmark, Canada, Japan, and in the U.S.: SF Bay area and Hawaii. It works better in places where the government is willing to make the cost structure work. In Denmark there is a 180% tax on petrol cars, and a 0% tax on EVs.

If you’ve been under a rock (like me) and don’t know a lot about him, check out his bio on Wiki.

Hope you enjoy.  Thanks for reading.

“The future whispers, it doesn’t shout”

Paul Saffo on Embracing Uncertainty and Forecasting on FORA.tv

I read a quote on Paul Saffo’s homepage that resonated with me so I wanted to share – it’s from 1994:

“The future belongs to neither the conduit or content players, but those that control the filtering, searching and sense-making tools we will rely on to navigate through the expanses of cyberspace.” (Wired 1994)

Paul is a prominent forecaster and professor at Stanford.  I was recently introduced to his ideas on FORA.tv, a great site for ideas (like TED) and worth a visit. You can see his video here.

Some topics and more quotes from his interview at the Commonwealth Club in San Francisco are mentioned below just to whet your appetite for the 56 minute video.  Some are direct quotes by Paul, some just ideas discussed, in no particular order.   Be sure not to judge before seeing these in context! Be sure to listen to the Q&A session too…

1) “All revolutions have to start with the individual”

2) “Privacy – get over it”

3) most interesting things look like an “S” curve (Moore’s Law)

4) there is a shift toward city-states (away from nation-states)

5) Applied Biology will bring forth the next ‘revolution’.   He provides a great quick history of relevant moments between science and industry.

6) “In an uncertain environment it’s better to move fast than to manage well”

7) Clean-tech is a real trend. He presents it in the context of other evolutions of Silicon Valley. He voices concern about the lag in response time of the social movement: will public opinion change fast enough? He thinks it’ll be close but we’ll make it … (me: phew!)

8) Current photovoltaic startups in CA have close ties to the German market.

9) Solar panels take as much energy to make as they provide in their lifetime – so currently they are only energy neutral (me: not to mention cost prohibitive and impractical to the end user, argh)

10) On the US reaction to 9/11:  “The US responds really well to clear threats or clear opportunities.   We do really badly with ambiguity, we’re a young culture.”

11) “We have to get humans out of the driver’s seat.”

12) Per Sebastian Thrun of the AI lab at Stanford, by 2030 50% of all vehicle miles driven in this country will be driven by robots.

13) “Cash – use it while you can.” The US Government is working on embedding RFID chips in currency.

14) “Some of the corporations will literally be armies”

15) “The way to find your way forward these days is to embrace uncertainty…and not try to arbitrarily eliminate uncertainty”

Wolfram Alpha continued

So what do you get when you put ‘What is the ratio of GDP for the US and Japan for the last 20 years?’ into Wolfram Alpha?  (see 5/5 post)

A quick note about the layout of the, um, SERP. It works for me, but I’d like to use WA more before making a judgement. From what I have seen the results page is clean and crisp, the font is large, and the data is laid out intuitively. Results all seem to start with an ‘input interpretation’ section to be clear on how the engine interpreted your request, followed by different sections of results including data, charts, maps, diagrams, mathematical equations.

So back to my original query – the engine needs help disambiguating it, and although it catches important terms and in some cases relationships, it doesn’t quite provide you with the right alternative.  This is what WA suggests:

“Wolfram|Alpha isn’t sure what to do with your input.”

1. “ratio of GDP for US” - and if you could tell me what’s going on there I would appreciate it. This query is interpreted as an F distribution, and I won’t even try to interpret what it’s doing just yet. This result does remind you that Wolfram Mathematica provides a lot of juice to WA.

The next 4 alternatives were categorized under ‘Countries’.  You start to understand as you see some of these that there is a certain syntax that works well with the engine for country related queries (and all others, for that matter)

2. “GDP Brasil + US” Notice nothing about ‘ratio’ is captured in that query. It provides the combined GDP for the 2 countries, along with a long-term trended chart (over 20 years of data, which you get regardless of whether you include ‘for 20 years’ in the original query).

3. “Brasil, US” (same as “Brasil US”) Nothing about ratio, but a lot of country related data aligned in 2 columns for easy comparison.  If I scroll down I can calculate the ratio of GDP from the data provided.

The other alternatives under ‘Countries’ are similar and not helpful.

The last alternative (#7)  picks up the statistical concept of a ‘ratio distribution’ and provides quite a bit of mathematical data, again thanks to Mathematica.

So after all this, “GDP Brasil, US” (same as “GDP Brasil  US”) is not provided as an alternative.  It was the only query that provides you with the data to calculate the ratio of GDP of the 2 countries for 20 years.  Still, this would be a bit tedious as you would have to interpolate the values on the chart provided before doing the calculation.
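For what it’s worth, once you have read the two GDP series off the chart, the ratio itself is the easy part. Here is a minimal Python sketch of that last step. The function name and the numbers are placeholders I made up for illustration; they are not real GDP figures and not anything WA returns.

```python
def gdp_ratio(gdp_a, gdp_b):
    """Return {year: gdp_a / gdp_b} for every year present in both series."""
    shared_years = sorted(set(gdp_a) & set(gdp_b))
    return {year: gdp_a[year] / gdp_b[year] for year in shared_years}


# Placeholder values only (NOT real GDP data), just to show the input shape.
us = {2006: 200.0, 2007: 210.0, 2008: 220.0}
brasil = {2007: 70.0, 2008: 80.0, 2009: 90.0}

ratios = gdp_ratio(us, brasil)
for year, ratio in ratios.items():
    print(year, round(ratio, 2))
```

Only years that appear in both series are kept, so a gap in either country’s data simply drops that year instead of raising an error.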

I guess a bit disappointing. Did I miss the obvious here? But despite this simple example, there are a lot of cool things one can do with WA. The engine definitely has a preferred syntax, so you gotta spend time with it and learn to massage your queries. It would have been nice to have when I was an engineering student. But certainly the information it has about statistical concepts should be helpful. A great reference tool for sure, great math engine, probably. Not a Google killer and not meant to be; however, it should stir things up in the industry. This could be one of the most interesting battles we’ve seen between Ph.D.s since the Cold War. :)

Oh, so where does “ratio GDP Brasil US” get you? It gives you the alternative “GDP Brasil US”, which is not offered when you type the query in regular language, but it still doesn’t give you the right answer directly.

Answers by Dominic Pouzin of Data Applied

Thanks again to Dominic of Data Applied. He originally responded on May 13 (sorry for delay!) on another blog site and gave his o.k. for me to re-post his answers here:

How long did it take you to create the tool?
About 10 months.

What languages/tools did you use?
C# for the backend, Silverlight for the UI. Java would have been a better choice, it’s more portable. There are ways to run C# code on Linux systems, but it’s not very robust.

Can I connect to my MS SQL database?
Right now, it is necessary to import the data (ex: export database content to CSV -> import that). Perhaps later, this will become easier. When running from the “cloud”, direct access to enterprise SQL databases can be a bit tricky. On the other hand, direct access to rich online sources of business data is very feasible (ex: connect to SalesForce.com over the Internet -> download business data). Once the data has been imported, it (along with analysis results) is stored in SQL.

How did you come up with the very robust visualizations?
I think that’s probably more art than science – inspiration is available to all of us! Technologies such as Flash or Silverlight help too.

What is the largest data set that you have analyzed using your tool?
I routinely analyze 100K+ record data sets. So not terabytes of data, but enough for many small to medium business scenarios, perhaps stretching to large marketing campaigns / individual web logs / individual product orders.

And ok, underneath it all (can you discuss?), are you relying on open source (and very powerful) algorithms (like Weka, R), are you using proprietary algorithms, both?
Just robust implementations of efficient data mining algorithms found in the literature, with some tweaks to increase robustness (ex: handle a mix of discrete / numeric / missing values), and performance (ex: replace discrete values by hash values). There just are too many issues with using open source software in terms of commercialization.
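To make those two tweaks a bit more concrete for myself, here is a rough Python sketch of what handling mixed values and hashing discrete ones could look like. To be clear, this is my own guess at the general idea and not Dominic’s code: the function names, the choice of CRC32 as the hash, and the sample record are all assumptions.

```python
import math
import zlib


def encode_value(value):
    """Encode one field: None if missing, float if numeric, int hash if discrete."""
    if value is None or value == "":
        return None  # missing values stay explicitly missing
    if isinstance(value, float) and math.isnan(value):
        return None  # treat NaN as missing too
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    # Discrete value: swap the string for a stable 32-bit hash (CRC32, an
    # assumption here) so later comparisons work on integers, not strings.
    return zlib.crc32(str(value).encode("utf-8"))


def encode_record(record):
    """Encode every field of a record (a dict of field name -> raw value)."""
    return {field: encode_value(value) for field, value in record.items()}


# Hypothetical record mixing a discrete, a numeric, and a missing value.
row = {"region": "east", "orders": 12, "discount": None}
encoded = encode_record(row)
print(encoded)
```

I used CRC32 rather than Python’s built-in hash() because the built-in string hash changes from one run to the next. A 32-bit hash can collide on large sets of distinct values, so a real tool would need to deal with that.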

Data Applied continued…

Turns out this little thing called the Internet led Dominic Pouzin to my prior post about the tool he built, Data Applied, and he graciously answered the questions I posed to him towards the end of that post. Looking back I probably could have asked some meatier questions, like, for one, do you have a release date planned? :) But if you dig into his site I think you’ll be excited by what you see. Visualizations aside, if you’ve been in ‘analytics’ for a while you might get excited about the workflow management side of it all (see the Gallery), as one example.

Due to a problem with the blog, he wasn’t able to leave his reply here, but he gave his permission to copy his responses here.  I’ll do a direct paste in the post above…

bike maps & trail maps & photographs…

… and maybe someday web analytics. :) Seth Holladay is a friend of mine, a colleague in web analytics, and ex-coworker at www.bluefly.com.  He’s also an avid biker and photographer, and his blog is all the better for it.  I went over there for web analytics and I can’t blame him for spending his time on something that’s more, er well, fun.

His site has links to his photos, some picked up by the AP (right Seth?) and other outlets, that are a voyage in themselves. Maybe he’d lend me one of his photos to help me spruce up this place…

But beyond all of that, one of the greatest resources for NYC area bicyclists is accessible from his personal blog but also at www.nycbikemaps.com. Using his GPS, Seth has personally mapped enough bike paths in the tri-state area to keep you busy for a while, and made them accessible via Google street view maps, Google Earth, categorized maps by borough, a Manhattan Waterfront Greenway map, North and South County maps, I mean it’s really worth a visit. Even if you’re not into biking (um, my tires are rotted and my frame is dusty, not to mention my gut) it is a really nice example of a web site. Seth has clearly given it some TLC, kudos. And he doesn’t need me to blog about it, it’s already had over 250K visits per the site and just celebrated its 3rd anniversary.

To digest it all, you can go to his site map.

And did I mention he’s also a web analyst during the day? That’s a different story, one like mine where somewhere along the way you step in that proverbial shit-storm known as web technology and analytics and you roll with the punches. His story would be an interesting one to share, one that starts on the web technology side (managing content and publishing at bluefly) and eventually spawns his nyc bike maps site…

It’s getting late. If anyone is actually reading these posts :) you’ll know my interest in how one learns this ‘stuff’ is not new. Specifically, I’m talking about those who have fallen into web analytics roles (in its broadest sense) over the last 10 years who never went to school for it. There is the technology side of things and the data side, there are the marketing and media aspects, SEO, SEM, ecommerce. The breadth is huge.

But it’s getting late…

Wolfram Alpha – computational knowledge engine

Just a quick note since I hear Mr. Wolfram has started promoting Wolfram Alpha (May 2009), his new ‘search engine’, through seminars/webinars in NYC. One colleague has seen 2 presentations over the Web – it seems the overhead display where the show-and-tell took place was either not in frame or blurry and illegible. Hmm…

He calls it a ‘computational knowledge engine’ and by what I’ve read and examples I’ve seen it should take things in the ‘search space’ to the next level.

Imagine asking some HTML page ‘What was the ratio of GDP for the US vs Japan for the last 20 years’ and getting back a trended chart (downloadable with data table too?). Update, May 29: no downloadable data, but there are pointers to the data sources, which I need to review further.

One colleague was recounting one example, where the engine was able to tell you which brother, given 2 first names, was likely older based on historical trends (probabilities) in name popularity (in the U.S.).

Here is an unreviewed list of links for ‘wolfram alpha’ on @delicious.

Data Applied

I ‘met’ Dominic Pouzin today on Analytic Bridge where he had posted some information about a really cool data exploration tool he recently developed – Data Applied. I was blown away by the graphics and visualizations I saw, and just as importantly the user interface and usability look unique – to me this looks like a next-generation tool.

Per Dominic it is still in alpha, although by the case studies on his website it seems to have been used extensively already. He’s not sure when it’ll be released as he’s tempted to keep adding functionality. Maybe he could consider rolling out what he has now; after all, as a web-based service I wouldn’t think it would be too hard to add and upgrade features, and even offer tiered services – if that was presumptuous of me I apologize :)

I think today’s data is in dire need of this kind of visualization power. I think today’s analysts need this kind of tool to help them explore, collaborate, publish, and disseminate ideas (see The Gallery part of the tool). At the end of the day, adoption of advanced data analysis (again essential in today’s data rich world) at today’s corporations could be greatly advanced by better visualization techniques. I only say ‘could’ because so much depends on the culture in which you operate. How do you build teams and departments around analytic processes? I think organizations today are challenged to ‘take it to the next level’, and there is a certain gravity to the situation – use your data or die.

Granted I don’t understand yet what drives the algorithms (and I haven’t used the tool).  Nor do I know the price point.  But like fax machines (remember those), the transformational power of something like this will be determined by the number of people using it.  Heck at the right price-point I would even consider a personal license of something seemingly this powerful.  It would almost be career suicide not to leverage it.

As a separate discussion, it would be great to see such tools made widely available to educational institutions, and it would be interesting to see educational tools built around this kind of technology. How could we make it easier to teach these topics to today’s kids? … sorry, I digressed there for a bit.

Below are some questions I would have for Dominic and maybe he’ll answer them at some point.  Right now I just need a place to keep them…Dominic if you somehow see this feel free :)

How long did it take you to create the tool?

What languages/tools did you use?

Can I connect to my MS SQL database?

How did you come up with the very robust visualizations?

What is the largest data set that you have analyzed using your tool?

And ok, underneath it all (can you discuss?), are you relying on open source (and very powerful) algorithms (like Weka, R), are you using proprietary algorithms, both?

Ok, I’m tired.  Good night.

Statistical Modeling: The Two Cultures (Breiman 2001)

This is a link to a paper by the late Leo Breiman that I came across recently and really enjoyed – I highly recommend it to any analyst. I wish I had come across it years ago as it provides a great framework for understanding the interrelated worlds of statistics and predictive analytics. I’m sure I’ll continue to return to it in the future.

The paper itself is very thorough, a great example of a paper, as it includes comments by 5 colleagues and a final rejoinder by Breiman.

Breiman (1928-2005) was Professor Emeritus at UC Berkeley and a member of the National Academy of Sciences and had a career path that gave him a great perspective on the 2 worlds.

If you need to push through it the first time (especially in today’s bite-sized-twitter-ized world :) I think you will find it worth the effort.

updated list of tools

I decided to keep a list of open source data exploration tools on its own page here.

I updated the list with my latest download:

KNIME|Konstanz Information Miner