
Q: What is big data?

  • Find FOUR really big datasets. They should be sufficiently different. Cite your sources.
  • Discuss and determine a ranking in terms of "bigness".
  • Fill the template below. Replace all (( )) with your answers.

Rank 1: Twitter

All of the tweets posted on twitter.com, along with their associated data.

There are 100K tweets generated per second.
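To make the rate above easier to compare with the other datasets, the sketch below converts a per-second rate into daily and yearly tweet counts. It assumes the 100K tweets per second stated above holds constant around the clock; real traffic varies by hour, so treat the result as a rough upper bound. The per-tweet storage size is a hypothetical value chosen only to illustrate the volume, not a measured figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def tweets_per_period(tweets_per_second: int, seconds: int) -> int:
    """Total tweets produced over `seconds` at a constant rate."""
    return tweets_per_second * seconds


# Rate taken from the text; assumed constant (an assumption, real traffic varies).
RATE = 100_000
per_day = tweets_per_period(RATE, SECONDS_PER_DAY)
per_year = tweets_per_period(RATE, 365 * SECONDS_PER_DAY)

# Hypothetical average stored size per tweet, including metadata (an assumption).
ASSUMED_BYTES_PER_TWEET = 1_000

print(f"{per_day:,} tweets/day")    # 8,640,000,000 tweets/day
print(f"{per_year:,} tweets/year")  # 3,153,600,000,000 tweets/year
print(f"~{per_day * ASSUMED_BYTES_PER_TWEET / 1e12:.2f} TB/day")  # ~8.64 TB/day
```

Even under these rough assumptions, Twitter adds billions of records per day, which is why it sits at the top of the ranking.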

Rank 2: Amazon

The data would include sales information for Amazon's more than 300M products and behavioral information for its roughly 250M active users.

Amazon has approximately as many users as there are people living in the US. Amazon probably has more information per user than the census, and the census
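As a quick check on "approximately the same number", the sketch below divides the 250M active users quoted for Amazon by the 308 million people counted in the 2010 US census (both figures are taken from this document; no other data is assumed).

```python
# Figures quoted in this document.
AMAZON_ACTIVE_USERS = 250_000_000
US_POPULATION_2010 = 308_000_000  # 2010 US census count

# Fraction of the US population that Amazon's active user base represents.
ratio = AMAZON_ACTIVE_USERS / US_POPULATION_2010
print(f"Amazon users / US population = {ratio:.2f}")  # 0.81
```

Amazon's active user base is therefore about four fifths the size of the 2010 US population, so the two datasets cover a comparable number of people.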

Rank 3: Census data

The US is constitutionally mandated to perform a census every 10 years. Census statistics are also used to apportion federal funding for many social and economic programs.

In 2010 the US census counted 308 million people.

Rank 4: Electronic Health Records

An electronic health record (EHR), or electronic medical record (EMR), refers to the systematized collection of electronically stored health information about patients and populations in a digital format.