Information provenance - the missing link between attention, RSS feeds, and value-based filtering - The Experiment-Driven Life Blog - Matthew Cornell. Programmer, Research Software Engineer, Think, Try, Learn

Information provenance - the missing link between attention, RSS feeds, and value-based filtering

Tuesday, January 23, 2007 at 1:01AM

The current spate of RSS feed-reading tools is missing a major feature: none of them (Bloglines, Google Reader, NetNewsWire, etc.) helps answer the central focus problem, "Which feeds should I pay attention to?" They are great at collection (one of the five GTD workflow phases I teach clients): gathering new feeds and sorting them by source, date, number of unread items, etc. But that just creates a bunch of haystacks. We still have to look laboriously through each one to find the needles (i.e., to assess value).

And this leads us to the real problem:

The links from source to value are one-way, with no feedback.

Here's an example. You're reading through your feeds and find a post that takes you to an interesting idea (The Rule of Least Power, for example). You jot down some notes, save the URL, then move on. Repeat. The problem is that the feed reader doesn't know that the article did something important to you - gave you an idea, changed your perspective, made you angry, whatever.

It's like losing the chain of custody, or not knowing a painting's provenance. Without closing the loop back to the source, the reader can't filter feeds by importance, so that job is left to us to do manually. (For example, see Bob Walsh's 80/20 Your information feeds idea - though determining which 20% is valuable is hard - and Marshall Kirkpatrick's Open Sourcing My TechCrunch Work Flow.)
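To make the missing feedback link concrete, here is a minimal Python sketch. Everything in it is hypothetical (`Note`, `ProvenanceLedger`, and the choice to score a feed by the fraction of its posts that led to a note); it is not any existing reader's API. It only illustrates that once a note carries a pointer back to its post, and the post a pointer back to its feed, ranking feeds by value becomes a counting problem.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """Hypothetical note record that keeps a pointer back to the post that prompted it."""
    text: str
    post_url: str


class ProvenanceLedger:
    """Sketch of the feedback link: note -> post -> feed (all names are illustrative)."""

    def __init__(self) -> None:
        self.feed_of_post: dict[str, str] = {}        # post URL -> feed URL
        self.posts_per_feed: dict[str, int] = defaultdict(int)
        self.noted_posts: set[str] = set()            # posts that produced at least one note

    def deliver(self, feed_url: str, post_url: str) -> None:
        """Record that a feed delivered a post (the part readers already track)."""
        if post_url not in self.feed_of_post:
            self.feed_of_post[post_url] = feed_url
            self.posts_per_feed[feed_url] += 1

    def record_note(self, note: Note) -> None:
        """Close the loop: remember that this post passed the scribble test."""
        if note.post_url in self.feed_of_post:
            self.noted_posts.add(note.post_url)

    def feed_scores(self) -> dict[str, float]:
        """Fraction of each feed's delivered posts that led to a note."""
        noted: dict[str, int] = defaultdict(int)
        for post_url in self.noted_posts:
            noted[self.feed_of_post[post_url]] += 1
        return {feed: noted[feed] / total for feed, total in self.posts_per_feed.items()}


ledger = ProvenanceLedger()
ledger.deliver("feed-a", "post-a1")
ledger.deliver("feed-a", "post-a2")
ledger.deliver("feed-b", "post-b1")
ledger.record_note(Note("Rule of Least Power: revisit for my own work", "post-a1"))
print(ledger.feed_scores())  # {'feed-a': 0.5, 'feed-b': 0.0}
```

A real version would need the note-taking tool and the reader to share those identifiers, which is exactly the standards question raised below.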

Naturally, there is work going on in this area. Google has its trends feature (available to Google Reader users; found via Robert Scoble's article 25,000 items read on Google Reader), and NetNewsWire has its neat Sorting by Attention addition. But without note-taking integrated into the information stream (which suggests standards, or an OS-based solution), tools are limited to impoverished metrics (e.g., clicked-on-post, flagged-feed, sent-to-someone, bookmarked-it), so they can't do a good job for us.

How would this work? I haven't thought it through (suggestions, anyone?), but maybe a cooperative set of browser plug-ins, sort of a "Zotero meets PageAddict, gets married, has kids" mashup.

The bottom line is I want something that knows when an information stream - podcast, web page, blog post, email, or video - passes the scribble test.

Thoughts?

Reader Comments (16)

I agree with you. Blogiful.com is a site we are creating that will solve a lot of the problems with storing and filtering important RSS feeds. If you are interested, I will put up a link to become an early subscriber. In the meantime, you can contact me at info at blogiful.com.

January 23, 2007

Anonymous

Thanks, Anonymous. Please let me know when you implement some of these features; I didn't see how blogiful is different from, say, http://del.icio.us/ at this point.

January 23, 2007

Matthew Cornell

I use [ FeedDemon | http://www.newsgator.com/NGOLProduct.aspx?ProdId=FeedDemon ] for my RSS feeds, and its main "Subscription" page actually has a list of the feeds that get the most and least "attention" from me. Now, I have not dug into what the author considers attention or what is used to determine those metrics, but "attention info" seems to be a big push lately.

January 23, 2007

Dave O'Hara

Thanks, Dave - interesting. I found the following online: [ Feature Requests - Attention formula adjustment request | http://www.newsgator.com/FORUM/Topic21938-15-1.aspx ]: you can build your attention formula from a larger set of variables:

attentionRank
numUnread
numFlagged
numTotal
numVisits
numPostVisits
numPostsEverFlagged
numEnclosureVisits
numPostsAddedToNewsBins
numPostsAddedToWatches
numPostsEmailed

There's also [ FeedDemon 1.6: Attention Report | http://nick.typepad.com/blog/2005/10/feeddemon_16_at.html ] and [ Will Google Reader Give You Your Attention Data? | http://nick.typepad.com/blog/2007/01/will_google_rea.html ].

I'd be curious to hear how well this works for you.

January 23, 2007

Matthew Cornell
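As a rough illustration of how such a formula might combine those variables, here is a Python sketch of a weighted, volume-normalized score. The variable names are the FeedDemon ones quoted above, but the weights, the choice of which variables to use, and the division by `numTotal` are assumptions for illustration, not FeedDemon's actual formula.

```python
# Hypothetical weights; FeedDemon's own default formula is not reproduced here.
WEIGHTS: dict[str, float] = {
    "numPostVisits": 1.0,
    "numEnclosureVisits": 1.0,
    "numPostsAddedToNewsBins": 2.0,
    "numPostsEverFlagged": 3.0,
    "numPostsEmailed": 4.0,
}


def attention_score(stats: dict[str, int], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of engagement counters, divided by the feed's total post count."""
    total = stats.get("numTotal", 0)
    if total == 0:
        return 0.0
    raw = sum(w * stats.get(name, 0) for name, w in weights.items())
    # Normalizing keeps a high-volume feed from ranking first just by posting more.
    return raw / total


def rank_feeds(feed_stats: dict[str, dict[str, int]]) -> list[str]:
    """Feed names ordered from most to least attention."""
    return sorted(feed_stats, key=lambda f: attention_score(feed_stats[f]), reverse=True)


feeds = {
    "busy-news": {"numTotal": 100, "numPostVisits": 10, "numPostsEverFlagged": 2},
    "quiet-essays": {"numTotal": 10, "numPostVisits": 5, "numPostsEmailed": 1},
}
print(rank_feeds(feeds))  # ['quiet-essays', 'busy-news']
```

Note that every input here is still one of the "impoverished" implicit signals; a note-taking signal would just be one more weighted counter.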

I've played with an idea to rank what's not important. While you read the feeds (I'm assuming everything together, newest first), you would mark stuff that's not interesting or relevant to you, AND stuff that is interesting but that for some reason you would rather read later. I find there are lots of feeds that I'd like to think are interesting, but I actually never read anything in them.

The sorting could work by "reducing the priority" of non-interesting and read-later feeds, so that all posts in those feeds appear after posts from higher-priority feeds, regardless of date.

Not sure if that would work, but I would love to try a prototype of it. :)

January 23, 2007

Niko Nyman
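Niko's "reducing the priority" idea can be sketched in a few lines of Python. This assumes, hypothetically, that each "not interesting" or "read later" mark adds one point of penalty to the feed the post came from; posts are then ordered by penalty, and newest first within the same penalty.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    feed: str
    title: str
    published: datetime


def order_posts(posts: list[Post], marks: list[str]) -> list[Post]:
    """Newest first, except that each mark against a feed pushes its posts further down.

    `marks` holds one feed name per "not interesting" or "read later" mark.
    """
    penalty = Counter(marks)
    newest_first = sorted(posts, key=lambda p: p.published, reverse=True)
    # Python's sort is stable, so posts with equal penalty keep their newest-first order.
    return sorted(newest_first, key=lambda p: penalty[p.feed])


posts = [
    Post("tech", "t1", datetime(2007, 1, 20)),
    Post("gossip", "g1", datetime(2007, 1, 23)),
    Post("tech", "t2", datetime(2007, 1, 22)),
]
print([p.title for p in order_posts(posts, marks=["gossip"])])  # ['t2', 't1', 'g1']
```

The two-pass sort relies on stability, which keeps the date ordering intact inside each penalty level without a compound key.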

Hi Niko - Hmmm. I've been thinking of two types of posts - interesting (i.e., possibly useful), and not - so I wonder how maybe-interesting-later fits in. Thanks!

January 23, 2007

Matthew Cornell

I've actually been using some custom software I wrote for myself to get my RSS feeds into a special IMAP email box.

I read that email account with Thunderbird. I've got POPFile as a Bayesian filter trained specifically for that account. It sorts posts into "interesting" and "uninteresting" and moves them into the appropriate folders.

When I'm pressed for time, I just read the "interesting" folder. If that's empty, I move on to the unfiltered leftovers.

As I'm reading, I use the Thunderbird labels to mark those that are not only interesting, but sparked something: a blog post, an idea, a reaction. For those, I just hit "1" and it gets marked in red. For those that were still interesting, but not particularly noteworthy, I hit "2" and it gets marked in orange.

For a while, I was using "3" for posts that I left a comment on and "4" for those that I quoted in a blog post, but that eventually fell away and I pretty much just use 1 & 2 now.

I've got another set of tools I wrote recently that goes through and moves the marked posts around, including publishing a page of the noteworthy stuff*. It archives any older stuff and gets rid of uninteresting stuff that I never bothered reading.

I use this system to read something like 850+ feeds without much of a problem.

I'm thinking seriously about building a new system that would be usable by people other than me that leverages what I've cobbled together for myself, but it's still in the planning stages.

* http://www.wynia.org/personal/saved_feed_items/saved_items_highlight.php

January 24, 2007

J Wynia
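Here is a rough Python sketch of the kind of cleanup pass J Wynia describes: publish the noteworthy items, delete uninteresting ones that were never read, and archive older material. It is not his code; the `Item` fields, the rule order, and the 14-day cutoff are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

NOTEWORTHY = 1  # the "1" label key described above; "2" items fall through to the age rule


@dataclass
class Item:
    title: str
    folder: str              # "interesting" or "uninteresting", as sorted by the filter
    label: Optional[int]     # 1, 2, or None
    read: bool
    received: datetime


def triage(items: list[Item], now: datetime,
           max_age: timedelta = timedelta(days=14)) -> dict[str, list[str]]:
    """Hypothetical cleanup pass; the rule order and the 14-day cutoff are assumptions."""
    actions: dict[str, list[str]] = {"publish": [], "delete": [], "archive": [], "keep": []}
    for item in items:
        if item.label == NOTEWORTHY:
            actions["publish"].append(item.title)   # goes on the public "noteworthy" page
        elif item.folder == "uninteresting" and not item.read:
            actions["delete"].append(item.title)    # filtered out and never read
        elif now - item.received > max_age:
            actions["archive"].append(item.title)   # older material gets archived
        else:
            actions["keep"].append(item.title)
    return actions


now = datetime(2007, 1, 24)
items = [
    Item("a", "interesting", 1, True, datetime(2007, 1, 23)),
    Item("b", "uninteresting", None, False, datetime(2007, 1, 23)),
    Item("c", "interesting", 2, True, datetime(2007, 1, 1)),
    Item("d", "interesting", None, False, datetime(2007, 1, 23)),
]
print(triage(items, now))
```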

Just an idea. Try a site like http://www.netvibes.com/, which lets you put tabs at the top. On the far left, keep tabs for whatever categories you know you want to have. After those categories, on the right, keep tabs numbered 1 to 5 for potentially interesting feeds. Since each tab has a total at the top, it should be easy to see where the action is.

When you first subscribe to a feed, put it in 3. Try it for a week. Interesting items? It gets moved up. A bunch of boring stuff? Move it down. Things can move off 1 into your categories, or off 5 into the trash. You can scale it however you like (rate 1-10, start it off with a bias at 8 if you have that many feeds, etc.) and change your ratings daily or more frequently.

January 24, 2007

lydgate
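lydgate's promote/demote scheme is essentially a small state machine per feed. A minimal Python sketch, assuming a 1-to-5 scale with new feeds starting at 3 and one verdict per weekly review; the verdict strings and the "category"/"trash" end states are hypothetical labels.

```python
from typing import Union

START_TIER, TOP_TIER, BOTTOM_TIER = 3, 1, 5
PROMOTED, TRASH = "category", "trash"


def review(tier: int, verdict: str) -> Union[int, str]:
    """Apply one weekly review to a feed's tier (1 = best, 5 = worst)."""
    if verdict == "interesting":
        return PROMOTED if tier == TOP_TIER else tier - 1
    if verdict == "boring":
        return TRASH if tier == BOTTOM_TIER else tier + 1
    return tier  # no clear signal this week, so leave it where it is


def run_reviews(verdicts: list[str]) -> Union[int, str]:
    """Start a new feed at tier 3 and apply each weekly verdict in turn."""
    tier = START_TIER
    for verdict in verdicts:
        result = review(tier, verdict)
        if isinstance(result, str):  # moved into a category or the trash: stop reviewing
            return result
        tier = result
    return tier


print(run_reviews(["interesting", "interesting", "interesting"]))  # category
print(run_reviews(["boring", "mixed", "boring", "boring"]))        # trash
```

Changing `START_TIER` and `BOTTOM_TIER` gives the 1-10 variant with a starting bias at 8 that lydgate mentions.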

J Wynia, your system sounds really interesting. I'd be curious to hear what attributes your classifier is training on - bag of words, titles, URLs, etc.

I'd like to encourage you to develop the more generalized system. Maybe as a plug-in to popular existing feed readers. I saw that [ Bloglines was hiring | http://www.bloglines.com/about/jobs ]... :-)

Finally, congratulations on your weight loss program - I'm with you.

January 25, 2007

Matthew Cornell

Hi lydgate. Thanks for the suggestion - sounds like a good manual approach to try. Please let me know if you try it and how it works.

January 25, 2007

Matthew Cornell

Hello Matthew, I'm a Bloglines user and just saw that they have a brand-new forum for feature requests. Check it out.

wences.-

January 26, 2007

Anonymous

Thanks for the pointer, Anonymous - I had missed that. I put up this on the requests forum: [ value-based (or attention-based) filtering | http://www.bloglines.com/forums/read.php?8,233 ]

January 26, 2007

Matthew Cornell

Agreed. Let me offer one addition:

Why do the RSS readers insist on listing the blog feeds in alphabetical order?-)

Yes, you can play games with the way you name them, but I've avoided that.

Hmmm... maybe that is a good idea - tag each feed with something in front to group the "scan mostly" feeds together and the "take time to read" ones by themselves. (And the volume of items is usually inverse to the desire to read ;-)

Google Reader will tell you things - I've got to look into using more of the features that are there...

January 26, 2007

Steven
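Steven's grouping idea doesn't require renaming feeds if the reader can sort on a tag instead of the name. A minimal Python sketch, assuming each feed carries one hypothetical group tag such as "take time to read" or "scan mostly"; untagged feeds go last, and feeds are alphabetical within each group.

```python
# Hypothetical group tags; untagged feeds (or unknown tags) sort after all known groups.
GROUP_ORDER = {"take time to read": 0, "scan mostly": 1}


def ordered_feeds(feeds: dict[str, str]) -> list[str]:
    """Order feed titles by group tag first, then alphabetically, without renaming them."""
    last = len(GROUP_ORDER)
    return sorted(feeds, key=lambda name: (GROUP_ORDER.get(feeds[name], last), name.casefold()))


feeds = {
    "TechCrunch": "scan mostly",
    "Joel on Software": "take time to read",
    "Zed's Blog": "",
    "Paul Graham": "take time to read",
}
print(ordered_feeds(feeds))  # ['Joel on Software', 'Paul Graham', 'TechCrunch', "Zed's Blog"]
```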

Steven: Bloglines lets you re-order manually. Also, thanks for the suggestion on grouping feeds by type.

January 26, 2007

Matthew Cornell

@matthew: I don't know if you've read Paul Graham's paper on Bayesian spam filtering, but POPFile is based on that technique. It actually takes *everything* about an email (or a blog post) into account as a factor in determining the classifications: which feed it's from, the words in the title, body, etc.

The thing about Bayesian algorithms is that if you train them well enough, they can spot factors you never would have considered when doing it manually.

For instance, in Paul's paper, he discovered that the number one factor in determining his "spam" classification was the presence of "FF0000" (the hex code for red) in an HTML email. That wasn't something anyone's spam filters were looking for explicitly, but it was a HUGE predictor of spam.

So, over time, my filter has gotten to know the keywords, topics, sites and other factors that determine what I find interesting.

The biggest problem with scaling this up and out is that POPFile is geared entirely to a single user. It only knows about *1* idea of "interesting".

I'm initially planning for my software to be installed on people's servers for a few users, rather than as a single, central, portal-type web application/service.

As far as working for Bloglines, it's not likely any time soon, for several reasons, not the least of which is that I'm 100% certain they wouldn't pay what it would take to get me to live and work in California.

Thanks on the weight loss. This is actually round 2 of losing what I originally needed to. I am down 60 pounds from the original high and 13 from where I started in December.

January 27, 2007

J Wynia
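For anyone curious about the Bayesian approach described above, here is a textbook multinomial naive Bayes classifier in Python. It is not POPFile's implementation, nor the exact probability-combining formula from Paul Graham's paper; it only illustrates the idea of treating everything about a post (the feed it came from plus the words in its title and body) as evidence. The tokenizer, labels, and training examples are made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(feed: str, title: str, body: str) -> list[str]:
    """Bag of words from title and body, plus the source feed as one extra token."""
    words = re.findall(r"[a-z0-9']+", f"{title} {body}".lower())
    return [f"feed:{feed}"] + words


class NaiveBayes:
    """Textbook multinomial naive Bayes with add-one smoothing (not POPFile's code)."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()

    def train(self, toks: list[str], label: str) -> None:
        self.word_counts[label].update(toks)
        self.doc_counts[label] += 1

    def classify(self, toks: list[str]) -> str:
        total_docs = sum(self.doc_counts.values())
        if total_docs == 0:
            raise ValueError("train on at least one example first")
        vocab = set().union(*self.word_counts.values())
        n_labels = len(self.word_counts)
        scores = {}
        for label, counts in self.word_counts.items():
            # log prior + sum of log likelihoods, all add-one (Laplace) smoothed
            score = math.log((self.doc_counts[label] + 1) / (total_docs + n_labels))
            denom = sum(counts.values()) + len(vocab)
            score += sum(math.log((counts[t] + 1) / denom) for t in toks)
            scores[label] = score
        return max(scores, key=scores.get)


nb = NaiveBayes()
nb.train(tokens("lambda-the-ultimate", "Type inference in practice",
                "A deep dive into Hindley-Milner type inference"), "interesting")
nb.train(tokens("pg", "Bayesian spam filtering",
                "Using token probabilities to filter spam"), "interesting")
nb.train(tokens("gossip", "Celebrity news roundup", "Who wore what at the awards"), "uninteresting")
nb.train(tokens("gossip", "More celebrity gossip", "Awards night photos"), "uninteresting")
print(nb.classify(tokens("pg", "Type inference and Bayesian filters", "probabilities")))
```

Because the feed name is just another token, a single classifier can learn both "this source is usually good" and "this topic is usually good", which matches the everything-is-a-factor behavior described in the comment.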

Thanks very much for the details, J Wynia. Neat stuff.

January 27, 2007

Matthew Cornell
