by a Thinker, Sailor, Blogger, Irreverent Guy from Madras

Fudging email IDs is difficult with Big Data and Google Contacts


Remember the days when we used to make up email IDs for the junk mail you had to subscribe to in order to download something or use a forum/website?  I am talking pre-2003, the days before Disposable Email Address services like Mailinator.  We regularly used an email ID like ‘dndmax@hotmail’ or ‘midfngr2u@yahoo’ to enter such sites, and never bothered to look in its inbox!  BTW before 2003, you could *not* have had a Gmail ID. ;-)

I hear that such ‘fudged’ email IDs are common even today.  Those working in call centres create an email ID in their professional name, and use it for their ‘official’ communications – e.g. an ‘anantha.raman’ becomes ‘andy.roman@gmail’.  And everyone, including the fudger, is happy that his/her identity is safe, unknown, split from, and not associated with the actual ID.  All this has changed with Big Data and the capability of data mining.

Last week, I was looking at the new Google Contacts.  Sure enough, it informed me that there were some duplicate entries and asked whether I would like to merge them.  Expecting the usual – that is, yahoo, hotmail or outlook IDs showing up as associated with the Gmail contact – I clicked Yes.

I was shocked.  Not only did it display the associated, understandable IDs – variations in names such as anantha.raman, a.raman, ananthu.r, etc. – as duplicates, it had also data mined and associated the alternate IDs being used by the same person – like andy.roman or andy.r.

In no way could I (or even you) have guessed that anantha.raman was also andy.roman or any of the other dozen variations.  It really woke me up to the reality of big data and data mining.  Google is already doing this, and Microsoft is fast catching up with much in-built data mining in Windows 10.

Here is a screenshot of 2 such duplicates identified by Google's Big Data.  There were several such entries, but I chose these two because they did not have pictures, and had long original names with totally different associated IDs.  Such long names and IDs allow for partial redaction, so what I write about can be seen, while maintaining privacy.

So Big Data is here.  Say bye-bye to your Privacy Rights, and install the latest Kohler Numi toilets, which can sense when you enter the room and when you approach them (and maybe tell the world that you are flushing right now, with the internet of things)!

[Screenshot: google-contacts-data-mining-duplicates]

