Showing posts with label data analysis. Show all posts

26 August 2008

Reading List #8

So far the Reading Lists have always focused on printed media and taken the form of a thematic triptych. Slight change this time, as it's been a slow summer and I'm procrastinating. I've been a long-time admirer of good data visualization methods, so when you're able to accomplish that through blog feeds and a healthy dash of humor, you've won my heart and mind.

In no particular order:


There you have it. Data geek humor and reading habits. Now, it's time to change into my scatter plot jammers and get to bed.

02 February 2008

New Google Search Features

Google has introduced some new experimental search features through Google Labs. They've added:
  • right- and left-handed search navigation
  • keyboard shortcuts for search results
  • keyword suggestions
  • alternate views for search results



I like the left-handed search navigation; I was experimenting with similar layouts and widgets myself last summer while playing with the Google Web Toolkit. eBay has been doing some very similar UI work on its eBay Playground site that I've enjoyed. Amazon tends to overwhelm me at times with JSON this and AJAX that, and they can't seem to resist the urge to package the search results and product descriptions to the extreme. I guess this shouldn't surprise me so much, as they are a self-billed department store. Netflix, on the other hand, strikes the right balance for me with consistent, concise detail drill-down built on the AJAX essentials: the XMLHttpRequest object and the JavaScript onmouseover event.

I don't particularly care about keyboard shortcuts for search results. This is a personal inconsistency, however, as I don't use keyboard shortcuts in Gmail either but almost always use the keyboard to navigate between applications, tabs and the OS in general. Maybe this is my unverbalized position that I just don't like the way Google implemented keyboard shortcuts. Maybe I'm just inconsistent after all.

The keyword suggestions have been available for a while as a Google Labs offering called Google Suggest, and the search bar in Firefox provides JSON-enabled search term suggestions.

The alternative search results are a great move forward in search result presentation, specifically addressing the need for better context-based and grouped/ordered search results. I've written about this previously and was eager for new search primitives to address this perceived shortcoming or, at minimum, search options that accomplished the same thing.

At least I'm not alone in liking the latest search presentation options. Ars Technica described it simply as "awesome".

02 August 2007

Data v. Information

I had fun with a previous post regarding a data visualization tool called Many Eyes, from IBM alphaWorks Services. There are some nice graphing templates available but pretty graphs simply do not the wonderful experience make. OpenOffice Calc and Microsoft Excel can produce a multitude of graphs in a variety of canned formats, but do they really help one understand the data being presented?

Are they capable, though, as tools, of transforming data into information? The distinction may or may not be subtle, but the implications are huge. We're generally overrun with data and consider so much of it to be throw-away. Information, however (information being data with some sort of context applied to it), one holds onto as long as possible, because the context applied to the data, the transform or function applied to some data set, increases the data's value and elevates it to that of information.

Consider a couple of simple examples:

What does this string of data mean, if anything: 011903124555555
  1. Well, it could be a random string of 15 digits and not very interesting (highly likely).
  2. Out-of-country phone dialing number? (yes, US Embassy in Turkey)
  3. Credit card number? (same format for Visa/MasterCard but not a valid number)
  4. USPS/FedEx/UPS/DHL tracking number? (UPS if you drop their "1Z" prefix)
  5. US social security number? (Massachusetts SSN with some cruft appended to the end).
  6. Product SKU (I seem to recall that there are standardized SKU formats)
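
To make the ambiguity concrete, here is a minimal sketch that tests the same string against a few of those readings. The function names and the matching rules are my own loose assumptions for illustration, not anyone's official validation logic; the only standard piece is the Luhn checksum that payment card numbers must pass.

```python
def luhn_valid(number):
    """Return True if a digit string passes the Luhn checksum."""
    total = 0
    # Walk the digits right to left, doubling every second one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def plausible_readings(s):
    """List the readings a bare digit string could fit (hypothetical, loose rules)."""
    readings = []
    if s.startswith("011"):
        # 011 is the prefix for dialing out of the US.
        readings.append("international call dialed from the US")
    if len(s) >= 9:
        # Any nine leading digits could be an SSN with cruft appended.
        readings.append("SSN with trailing characters")
    if 13 <= len(s) <= 19 and luhn_valid(s):
        readings.append("payment card number")
    return readings


print(plausible_readings("011903124555555"))
```

Several readings survive and nothing in the digits themselves picks one over another, which is the point.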
We just don't know, without any context applied to it. Now, what if we thought about another string of digits in the context of identity theft:

  • 034011234,Last,First,Acct#

Huh...that looks important and maybe should be protected. Maybe it's a person with an account # and MA SSN on-record. The problem though, is that if the suspect data were changed to be:

  • Acct#,034011234,Last,First

It could become meaningless: simply re-ordering the data elements changes the transformation, and the context may no longer be identifiable, leaving the data as just data. There's a good chance, however, that in this specific case the context could still be inferred. What happens if we eliminate the comma delimiters and just spew a line of text in the hope that it will be properly caught and processed?

  • Acct#034011234LastFirst

Here we have an example where Acct# and SSN have been concatenated and probably lose meaning outside of the process that knows to stop reading the Acct# field after X characters and read the next nine characters as the SSN. First and last names can be extremely difficult to distinguish without capitalization and/or localized knowledge of standard names. Michael Smith may mean nothing to a non-English speaker.
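
A minimal sketch of that idea, assuming a hypothetical fixed-width layout in which the account field is five characters wide (the placeholder "Acct#" happens to be five) and the SSN is the next nine. The function name and the widths are illustrative assumptions only.

```python
SSN_WIDTH = 9  # an SSN is nine digits


def parse_record(line, acct_width):
    """Split an undelimited record using field widths known only to the reader."""
    acct = line[:acct_width]
    ssn = line[acct_width:acct_width + SSN_WIDTH]
    names = line[acct_width + SSN_WIDTH:]
    return {"acct": acct, "ssn": ssn, "names": names}


record = "Acct#034011234LastFirst"

# With the right width, the fields fall out cleanly.
print(parse_record(record, acct_width=5))

# With the wrong width, the same bytes yield garbage.
print(parse_record(record, acct_width=3))
```

The record itself carries no hint of where one field ends and the next begins; that knowledge lives entirely in the process that reads it. Even with the right widths, the names field still comes back as one undivided "LastFirst".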

So what does this mean from a practical point of view? Without waxing philosophical, from an information security and protection standpoint, it is an extremely compelling reason to give serious consideration to Translucent Databases, which I will post about at a future point in time.

13 July 2007

Fun With Data Visualization





My last post noted Data360.org and the impressive number of data sets they have available. I just stumbled on a new data visualization tool from IBM called Many Eyes, so...

I grabbed an oil reserve data set from the US Energy Information Administration, by way of Data360.org, massaged it a bit in OpenOffice and fed it into Many Eyes. Click, click, click and you're provided with a pretty slick visualization.

Go. Go and play.

12 July 2007

Fun Site for Data Wonks



Loads of fun. It's like Wikipedia for data and analysis.