Seth Roberts's post on Personal Science (especially the part about "data exhaust") got me thinking about big data and its implications for the self-tracking work we do. What evidence is there that big data will infiltrate self-experimentation? Under what conditions will self-tracking move from "small data" or "data poor" (a few hundred or a few thousand data points) to "big data" or "data rich" (terminology from The Coming Data Deluge)? Let me share some thoughts and get yours.
Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualization.
This identifies an important problem. While it is natural to throw all our personal data into one big database, there are costs associated with doing so. I don't mean those associated with capture (clearly we will solve the technical and cultural challenges), but the costs in sensemaking - turning data into actionable wisdom. Let's put the problem into context and assume the future for personal science looks something like this (help me here):
- Many of our personal artifacts will be instrumented to know something about us (find many body-oriented ones in Walter's Health Internet of Things), but the sky's the limit. (For some great examples of work data we might capture, see Gary's comment on my post The Quantified Worker.) The idea is that these things will be smart enough to answer questions needed for our experiments, like "How much water did I drink?", "How active was I today?", or "Did I raise my voice this week?"
- These artifacts will seamlessly transmit data to a central place that each individual owns and has complete control over. Also contributing data are medical professionals and any other person or organization that learns something about us. They will be contractually obligated to share it.
- This data is augmented by self-tracking tricorders that we may wear, which capture other personal data channels like cognitive states and life events.
- The citizen scientist will periodically reflect on and analyze all that data, prompted by triggers such as periodic reminders, natural events from experiments (e.g., when a question is answered or an experiment ends), or opportunistic situations such as encountering a problem or having a friend ask how we're doing.
- Finally, the experimenter applies the results by integrating them into new mental models or behaviors, and continues this cycle of thinking up experiments, trying them out, and learning from them.
(Note that these steps are non-linear and are happening in parallel.)
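To make the flow above a bit more concrete, here is a minimal Python sketch of the "central place" idea: instrumented artifacts report readings to one store that the individual owns, and the store answers a simple experiment question such as "How much water did I drink?". Everything in it is an assumption for illustration only: the `Reading` and `PersonalStore` names, the field layout, and the sample values are all made up, and a real system would also need to handle capture, storage, and sharing.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Reading:
    """One data point reported by an instrumented artifact (hypothetical schema)."""
    day: date
    source: str   # which artifact reported it, e.g. "water_bottle"
    metric: str   # what was measured, e.g. "water_ml"
    value: float


class PersonalStore:
    """Stand-in for the central, individually owned data store (hypothetical)."""

    def __init__(self):
        self._readings = []

    def add(self, reading):
        """Record one reading transmitted by an artifact."""
        self._readings.append(reading)

    def daily_totals(self, metric):
        """Sum one metric per day, e.g. to answer 'How much water did I drink?'"""
        totals = defaultdict(float)
        for r in self._readings:
            if r.metric == metric:
                totals[r.day] += r.value
        return dict(totals)


# Made-up sample readings, purely for illustration.
store = PersonalStore()
store.add(Reading(date(2011, 5, 1), "water_bottle", "water_ml", 500))
store.add(Reading(date(2011, 5, 1), "water_bottle", "water_ml", 250))
store.add(Reading(date(2011, 5, 2), "water_bottle", "water_ml", 400))
store.add(Reading(date(2011, 5, 1), "pedometer", "steps", 8200))

water = store.daily_totals("water_ml")
for day in sorted(water):
    print(day.isoformat(), water[day])
```

Note that the sketch only covers capture and a trivial summary. The harder steps discussed next, sensemaking and behavior change, are not something a few lines of aggregation can address.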
Given this flow, I argue that the hard work is in the final two steps - sensemaking and behavior change. Leaving the latter for now (Ian Ayres on Carrots and Sticks addresses that well), how can we make sense of our data effectively when we are collecting a lifetime's worth of it? I don't know, but a few things come to mind, including advanced statistical tools, visual analytics, and, possibly most important, collaboration. After all, successful researchers know that science works best when collaborating with others. In fact, given this possible future, our relationships with professionals may move more in this direction.
What do you think?
-  My reply: Exhaust usually means waste that's a byproduct of production. However, in our case data is the means of self-improvement. It's like a catalyst for making a change in ourselves. Plus, unlike exhaust, it has value after its use. While factories may capture waste products for other uses, they don't treat the waste as intrinsically useful. That's a big difference.
-  Two additional resources you might find helpful are Wired's The End of Theory: The Data Deluge Makes the Scientific Method Obsolete and Nature's Special on Big Data.