Unique customer records vs. email based records

This is one of my original journal entries on Toolbox.com back in December 2006.


I’m currently at the forefront of the data militia at our co., looking into a question that seems to be permeating the online ecommerce BtoC community…What do you consider a unique account? And I’m not even talking about the web analytics version of this question, which has an interesting debate between 2 camps – those that believe unique visitors (as best defined by a cookie, and at best limited by the constraints of a cookie today) are the best metric, and those that believe that total visitors is the way to go … at the risk of digressing too much on this subject, I point you to Matt Belkin’s blog at Omniture and Avinash Kaushik’s blog, Occam’s Razor. That subject is another entry altogether…

I’m talking about your customer database. In the online world, at least in my online world, our db is email based – this is important here – each email is unique and, at least until now, has been considered representative of one unique individual. I’d like to state here that at our co., by definition and by law, a new customer is associated with a new email address. We have a (low) few million email addresses, not all of which (I wish!) belong to customers. Our ‘customer’ db (a nasty misnomer) consists of everybody who has created an account with us by providing a unique email address (until recently, a person who placed an order had to have an account, or create one, to check out) – but we also have email addresses from newsletter subscribers and sweepstakes participants.

But for a long time, we have all had, and used, multiple email addresses to conduct our online business. If you step back, you have to assume this can only be a growing trend. First, as more users go online (the internet matures globally), more users will have and use more than one email address. Even I don’t like multiple addresses, yet I have 3 and wouldn’t want fewer than 3. Second, services like anonymous checkout (where a customer can place an order without providing an email address – offered to reduce the friction of having to provide an email address or create an account on the site) and Google Checkout dilute my ability to distinguish which email accounts are truly unique.

So these days, an email address might not be sufficient to distinguish a unique customer. If you’re running a recency/frequency model, this trend could dilute it: one person’s orders get split across several accounts, so each account looks less recent and less frequent than the person really is. People have been creating duplicate accounts for years – this is not a new thing. But the services mentioned above, anonymous checkout and services that might promote it like Google Checkout, could accelerate this dilution.

In the data warehouse, this creates an interesting problem, with potential for endless posts. From my angle, I might need to restate all past R/F reports to reflect this new concept of what a unique account really is…again, I find this was an issue long before recent services that promote anon. checkout came on the scene. But it warrants monitoring, and I have created a way to essentially decouple a unique email account from its email address. This topic will definitely get our co. discussing the concept of operational customer id vs. analytic customer id.
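To make the operational vs. analytic id idea concrete, here is a minimal sketch in Python. It is not the method we use at our co.; it simply assumes that two email accounts belong to the same person when their name and postal address match after light normalization. The field names (`email`, `name`, `postal_address`) and the matching rule are hypothetical, chosen only for illustration.

```python
def normalize(value):
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(value.lower().split())


def assign_analytic_ids(accounts):
    """Map each email (the operational id) to an analytic customer id.

    accounts: list of dicts with hypothetical keys 'email', 'name', 'postal_address'.
    Assumption: accounts sharing a normalized name + postal address are one person.
    """
    key_to_id = {}
    email_to_id = {}
    for account in accounts:
        key = (normalize(account["name"]), normalize(account["postal_address"]))
        if key not in key_to_id:
            # First time we see this person: hand out the next analytic id.
            key_to_id[key] = len(key_to_id) + 1
        email_to_id[account["email"]] = key_to_id[key]
    return email_to_id


# Made-up sample data: two of the three emails belong to the same person.
accounts = [
    {"email": "pat@work.example", "name": "Pat Lee", "postal_address": "1 Main St, Springfield"},
    {"email": "pat@home.example", "name": "pat  lee", "postal_address": "1 Main St,  Springfield"},
    {"email": "sam@mail.example", "name": "Sam Roy", "postal_address": "9 Oak Ave, Shelbyville"},
]
analytic_ids = assign_analytic_ids(accounts)
```

The email address stays the key for operational purposes (login, order confirmation), while reports join through the analytic id. A real matching rule would likely need to be fuzzier than exact equality on normalized strings.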

For argument’s sake, would you be concerned if you found out 5% of the 1MM customer email addresses (those email accounts that have purchased at least once) in your database were ‘dupes’ – or by your new definition – 5% were really the same person? What about 10%? etc. Obviously the answer to this question depends on a lot of things at your organization. I’m particularly aware of this issue since I run an R/F model. Also, ‘customer’ attributes are only as good as your definition of a unique customer, and consequently that affects the data you feed into your efforts to model online behaviors and pursue any kind of lifetime value analysis. If I were prepping data for data mining efforts, I would want to equate behaviors across email accounts. If I were preparing a direct mail list of names and addresses (a much more expensive communication channel in our email world), I would save money if I deduped these accounts to more accurately represent the individual.
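For scale, 5% of 1MM is 50,000 email addresses, and 10% is 100,000. The sketch below, again in Python with made-up data, shows how recency and frequency change when the same orders are rolled up by person instead of by email address. The `email_to_analytic_id` mapping is a hypothetical stand-in for whatever dedup rule you trust; nothing here reflects our actual model.

```python
from datetime import date


def recency_frequency(orders, id_of, as_of):
    """Return {customer_id: (days_since_last_order, order_count)}.

    orders: list of (email, order_date) tuples.
    id_of:  function mapping an email to the id used for grouping.
    """
    last_order = {}
    count = {}
    for email, order_date in orders:
        cid = id_of(email)
        count[cid] = count.get(cid, 0) + 1
        if cid not in last_order or order_date > last_order[cid]:
            last_order[cid] = order_date
    return {cid: ((as_of - last_order[cid]).days, count[cid]) for cid in count}


# Made-up orders: the two "pat" addresses are assumed to be one person.
orders = [
    ("pat@work.example", date(2006, 3, 1)),
    ("pat@home.example", date(2006, 11, 15)),
    ("sam@mail.example", date(2006, 10, 1)),
]
email_to_analytic_id = {
    "pat@work.example": 1,
    "pat@home.example": 1,
    "sam@mail.example": 2,
}
as_of = date(2006, 12, 1)

by_email = recency_frequency(orders, lambda email: email, as_of)
by_person = recency_frequency(orders, email_to_analytic_id.get, as_of)

print(by_email["pat@work.example"])  # (275, 1): looks like a lapsed one-time buyer
print(by_person[1])                  # (16, 2): actually a recent repeat buyer
```

Grouped by email, the work address looks like a customer who bought once and went quiet. Grouped by person, the same orders describe a repeat buyer who purchased about two weeks ago, which is the kind of shift that would force a restatement of past R/F reports.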

That’s (more than enough) for now…if you read this (wow!) I welcome your thoughts.
