My Big-Arse Text File - a Poor Man's Wiki+Blog+PIM - The Experiment-Driven Life Blog - Matthew Cornell. Programmer, Research Software Engineer, Think, Try, Learn

My Big-Arse Text File - a Poor Man's Wiki+Blog+PIM

Sunday, August 21, 2005 at 11:32AM

I was excited to read this article (found via the unparalleled 43 Folders) describing one user's experiment with using a single (eventually large) text file to organize his stuff. It's an interesting read for me because I've been using a plain text file for my professional log/diary/journal/notes since Thu Sep 28 10:57:09 EDT 2000. In this post I'd like to talk about how I use the file, in hopes that it will give me some motivation and ideas.

FYI, my current file (see description next) has ~14,000 lines (~0.5MB), and my previous non-wiki file had ~55,000 lines (~1.5MB).

History

I've been using a single file for my professional ProgrammersNotebook since at least 1997. Initially it was an MS Word file (back in my Windows days), but when I moved to Linux I switched over to a simple ASCII file, which I edit in Emacs. I used Word for its outliner - I organized the file by making each day a level 1 entry, and I listed the days in reverse chronological order so that I could start at the top when adding the latest entry. The outliner let me structure the file a bit by breaking multi-line activities into separate entities. (Hey - sounds like the GTD principle of making something a project if it requires more than one next action. I'm in trouble - everything has GTD overtones these days...)

I set up the first ASCII file using Emacs' Outline Mode, organized just like the Word file - reverse chronological, with structure via nesting. Incremental search let me find (sometimes painfully) items I needed, such as shell script notes, code snippets, and what I had been spending my time on. The problem was that it didn't allow linking and tagging, two of the primary ways I structure information.

So on 2004-07-19 I moved to a second format (still ASCII edited via Emacs), which I described in my post on Photo Blogs, Wikis, and Memories for Life. Briefly, the file has simple entries separated by '----' and a time-stamp at the end. For example:


  ----
  talked w/PersonOne re: Google-style undergraduate programming
  contest. not clear what the topic should be. also talked about future
  fun projects. one possibility: ProxIncrVisualization
  (2005-08-19 12:37:48)
  ----
  continued moving information over to planner. ugh- the undated pages
  are a pain! wrote PersonTwo re: help, or maybe ordering a dated
  set. PaperPlanners
  (2005-08-19 08:32:37)
  ----
  MUS: http://www.sourcewatch.org/index.php?title=SourceWatch
  CitizenshipOversightProject
  (2005-08-19 08:29:25)
  ----
  ...
  ----

The big improvement is linking and tagging via WikiCase (AKA CamelCase or WikiWords). This helps me navigate and find needed information. Of course it opens up another issue, that of consistent tagging. But we'll save that for later. The only other formatting I use in the file is a) I define an entry by placing a WikiWord on the first line by itself, and b) I have some shortcuts for words. The shortcuts are short uppercase words (two to four letters) that end with a colon (':') and start a line. My current ones include IN: (inbox), MUS: (Might Be Useful), IDEA:, COOL:, and OFF: (vacation leave). Finally, URLs are treated specially - I don't mark them up, I just paste them verbatim.

Together these merge (in a very low-cost way) some of the good ideas from Wikis, Blogs, and PIM tools, with the simplicity of a text file. (There's a nice discussion of them here.)
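To make the format concrete, here is a minimal Python sketch of how a file laid out this way could be parsed. It is an illustration only, not a tool used with the actual file: the function names, the dictionary keys, and the exact regular expressions (in particular the WikiWord pattern and the two-to-four-letter shortcut pattern) are assumptions based on the description and sample above.

```python
import re
from datetime import datetime

SEPARATOR = "----"
# Trailing time-stamp line, e.g. "(2005-08-19 12:37:48)".
TIMESTAMP = re.compile(r"^\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\)$")
URL = re.compile(r"https?://\S+")
# Assumed WikiWord shape: two or more capitalized runs, e.g. PaperPlanners.
WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")
# Assumed shortcut shape: 2-4 capitals plus a colon at the start of the entry.
SHORTCUT = re.compile(r"^([A-Z]{2,4}):(?:\s|$)")


def build_entry(lines):
    """Turn one entry's non-blank lines into a dict."""
    timestamp = None
    match = TIMESTAMP.match(lines[-1])
    if match:
        timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        lines = lines[:-1]
    body = "\n".join(lines)
    shortcut = SHORTCUT.match(body)
    return {
        "body": body,
        "timestamp": timestamp,
        "shortcut": shortcut.group(1) if shortcut else None,
        "urls": URL.findall(body),
        # Strip URLs first so CamelCase inside a URL is not counted as a tag.
        "wiki_words": WIKI_WORD.findall(URL.sub(" ", body)),
    }


def parse_entries(text):
    """Split the file text on '----' lines and parse each chunk."""
    entries, chunk = [], []
    # A trailing separator is appended so the final entry is flushed too.
    for raw in text.splitlines() + [SEPARATOR]:
        line = raw.strip()
        if line == SEPARATOR:
            if chunk:
                entries.append(build_entry(chunk))
            chunk = []
        elif line:
            chunk.append(line)
    return entries


SAMPLE = """\
----
talked w/PersonOne re: Google-style undergraduate programming
contest. not clear what the topic should be. also talked about future
fun projects. one possibility: ProxIncrVisualization
(2005-08-19 12:37:48)
----
MUS: http://www.sourcewatch.org/index.php?title=SourceWatch
CitizenshipOversightProject
(2005-08-19 08:29:25)
----
"""

for entry in parse_entries(SAMPLE):
    print(entry["timestamp"], entry["shortcut"], entry["wiki_words"])
```

Removing URLs before looking for WikiWords is a design choice in this sketch: otherwise the "SourceWatch" inside the pasted URL would show up as a tag.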

Emacs customization

Well, not much really. All I have are keystrokes that create a new time-stamped entry and grab a URL's title. In addition I use the usual Emacs features like 'occur', interactive highlighting, and especially hippie-expand. I'd like to do more, but I just haven't had time.
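For readers who don't use Emacs, the "new time-stamped entry" keystroke could be approximated outside the editor. The following is a rough Python sketch, not the Emacs code itself; the function names are hypothetical, and it assumes new entries go at the top of the file, as the descending time-stamps in the sample above suggest.

```python
from datetime import datetime


def format_entry(text, now=None):
    """Format one entry: separator, body, then a time-stamp in parentheses."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return f"----\n{text.strip()}\n({stamp})\n"


def prepend_entry(journal, text, now=None):
    """Return the journal text with a new entry added on top (newest first)."""
    return format_entry(text, now) + journal


journal = "----\nolder note. PaperPlanners\n(2005-08-19 08:32:37)\n----\n"
print(prepend_entry(journal, "IDEA: try PlannerMode"))
```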

Isn't this just a cheap RDBMS?

At first glance, yes, it's just a text-based list of free-form records, which could be stored in a Relational Database System. (Actually, I helped build a new kind of database (Proximity) that directly supports representing semi-structured information like this, but that's another story.) My main reasons for not using a database are:

Easy to set up.
Customizable editors already available (easy to view, merge, format, search, edit, etc.)
Easy to back up.
Easy to write simple external tools to analyze, view, etc.
Supports schema changes.

Analysis and future

All I'll say here is that I use the file in a few basic ways. (See The Design and Long-Term Use of a Personal Electronic Notebook: A Reflective Analysis - AKA A Personal Electronic Notebook, by Thomas Erickson - for a great analysis of a personal journal tool the author built then used.) Mostly I use it to capture ideas, notes, URLs, and work activity like tasks, coding, and email. I've made myself enter every single URL that I come across that I think might even remotely be useful, because many times I've had to spend a LONG TIME trying to find something I've seen before. (Related: Stuff I've Seen and Keeping Found Things Found.)

To do nicer navigation and browsing I wrote a simple Java program (I used Jetty) to load the file's entries into RAM, show them chronologically, allow search, and turn WikiWords and URLs into links. I've used it a bit, but haven't been motivated to do more.
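The link-generation step is simple enough to sketch. The Python below is not the Java/Jetty program, just an assumed illustration of the idea: the WikiWord pattern and the `/search?q=` route that WikiWords link to are hypothetical.

```python
import html
import re

# URLs are tried first so CamelCase inside a URL is not linked separately.
TOKEN = re.compile(
    r"(?P<url>https?://\S+)|(?P<wiki>\b(?:[A-Z][a-z0-9]+){2,}\b)"
)


def linkify(body):
    """Return entry text as HTML with URLs and WikiWords turned into links."""
    parts = []
    last = 0
    for match in TOKEN.finditer(body):
        # Escape the plain text between the previous token and this one.
        parts.append(html.escape(body[last:match.start()]))
        token = html.escape(match.group(0), quote=True)
        if match.group("url"):
            parts.append(f'<a href="{token}">{token}</a>')
        else:
            # Hypothetical route: a WikiWord links to a search for itself.
            parts.append(f'<a href="/search?q={token}">{token}</a>')
        last = match.end()
    parts.append(html.escape(body[last:]))
    return "".join(parts)


print(linkify("ordering a dated set. PaperPlanners"))
```

Pointing each WikiWord at a search rather than a dedicated page fits the file's structure, where a tag can appear in many entries.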

I think there's a great idea for a Journal Construction Kit that supports the emergence of customized specialization (see Jot for a commercial effort in this area). Here's a question: Is a general tool to support this kind of activity possible? Maybe it would be similar to JetBrains' Meta Programming System, but for information. Related: Chandler, and these two articles by Martin Fowler.

I'd love to hear from others who have created customized journal tools that support these features. I'm not excited by Emacs programming, I'm just trying to get work done. Any thoughts would be appreciated!

Reader Comments (35)

You really, really, really want to look at [ PlannerMode | http://emacswiki.org/cgi-bin/emacs/PlannerMode ].

August 22, 2005 |

genehack

Or at the very least, you should check out [ RememberMode | http://www.emacswiki.org/cgi-bin/wiki/RememberMode ], which will let you add notes to your outline file from a pop-up buffer. If you use that with [ PlannerMode | http://www.emacswiki.org/cgi-bin/wiki/PlannerMode ] (turn off day pages and plan pages to get your one-page effect), you get awesome automatic hyperlinking. =)

No Emacs programming required. We'd love to tweak it for you on the mailing list at emacs-wiki-discuss AT nongnu.org. =)

August 22, 2005 |

Sacha

Thanks genehack and sacha for the great Emacs pointers. I'll definitely check them out.

matt

August 22, 2005 |

Matthew Cornell

I have been using a single Excel worksheet as a PIM at work for over one year, and I don't think that I could function without it. I recently read GTD for the first time and I have found that many of Excel's built-in features facilitate the GTD structure.

How I use it:

Each item, thought, next action, waiting, general reference, etc. is entered on one line. An Excel worksheet gives you 65,000 lines and I have used about 3,500 lines the first year.

I like to keep the format simple. Left column = date. Second column = status, such as waiting, next action, someday, ... Third column = description or headline of the item. This is text and is written to include keywords that can be searched later. This text can be the complete thought or factoid or item, copied text from an email or website, or a descriptive reference to a paper document, website, whatever.

Column 4 = reference index. This can be a hyperlink to a file on the hard drive, a website, etc., or an index to a paper file in my file cabinet. I simply label each paper document with a pre-numbered mailing label (document 1, document 2, etc.), date it, and stick it in the file. I let Excel do the sorting for me.

I have found that Excel has a good text searching function but a very clumsy search interface. As a result I developed two search macros that really make this system work.

1) MessageBox search - enter a keyword (string), hit go, and the first hit will be displayed in a message box. The way Excel works, this will display the entire contents of the cell containing the keyword. COOL! If this is what I am looking for, I can stop there. Done. I found it, ha. If not, I hit go again and the next occurrence is displayed, and on and on. This process works from the bottom up so the most recent items are displayed first.

2) Ad-hoc report created in a new worksheet. Enter a keyword, hit go, and each ROW containing the keyword will be pasted onto the report sheet complete with formatting and hyperlinks. A search-engine-type message is included, like "124 results for Robert, 1 second". Now this report can be used as the source for another search: the next report will read "4 results for Smith, 0 seconds within 124 results for Robert, 1 second".

So I have nested reports.

This could easily be used by anybody who has ever used Excel at all. It requires zero programming knowledge to use, and I employ my 11-year-old daughter as my beta tester when I add a new feature. If she can use it, it's good for me.

Tom

August 22, 2005 |

Tom

Great comment, Tom. Thanks a bunch. It seems to cover lots of bases, and I definitely hadn't thought about using a spreadsheet for this.

matt

August 23, 2005 |

Matthew Cornell

Tom,
With Excel spreadsheets, how do you deal with hierarchy, e.g. "b and c under a", and with ordering, "b always before c"?

Or is that just not a concern for how you use it?

Matt,
Also, for the one big text file, how do you address evolution and accretion and reorganization of your info over time? Do you edit in place? Do you create a new entry with all of the edits? Do you remove previous entries? Do you just wind up searching through little bits to bring together a big picture?

I am of the small number of medium-sized files (emacs planner-mode, by the way) school of thought.

August 27, 2005 |

Case Larsen

Hi Case,

Thanks for your questions. Answers are below.

> evolution, accretion, reorganization over time?
> edit in place? create a new entry with all of the edits? remove previous entries?

An excellent question. First, I don't worry about the document's (i.e., items') history; I just try to be careful not to do too much editing (including removal) of previous content. Maintaining the wiki-style comments takes discipline. For example, the *second* time I refer to something, I try to go back to the original entry, add a (new) WikiWord, go to the new comment, and insert a 'link' to the new WikiWord. I try not to add to existing comments unless it's to correct something. Otherwise, I prefer to add a new entry with a WikiWord link to the previous one(s). I don't do any inter-entry reorganization, but I sometimes edit existing ones to add a bit of structure, usually as a list or hierarchy using '*' or 'o' chars and indentation. As I use the file I sometimes add new semantics, such as 'tags' (macros, really, though I never expand them) like 'MUS:'.

> Do you just wind up searching through little bits to bring together a big picture?

Exactly. I think Ted Nelson calls such pulling together [ Transclusion | http://en.wikipedia.org/wiki/Transclusion ]. I do it either within Emacs via 'occur' or incremental search, or via the external Java program. (I do the latter rarely because a) I apparently don't need it, and b) I'm too lazy to continue work on the program.)

> small number of medium-sized files (emacs planner-mode)

Interesting! Is this (i.e., the small number of files) the default planner-mode behavior, or is it something you worked up? I'd be curious to hear how you use your system. I really have to check planner-mode out. I see a lot of VIM love out there too.

Thanks again for the comments.

matt

August 27, 2005 |

Matthew Cornell

Hello Case,

I think that setting up a filing hierarchy on the front end has gone out the window for me. When I want to retrieve a file artifact or event, it is often in a different context than when I originally filed it and I have to try to remember how I might have filed it way back when.

I have found that the keyword searches allow me to find something from many different directions. I have all this data on a single level in this file and I use my report tools to navigate this file. Excel gives me a structure to work within, kind of like a kid playing on monkey bars. It's not the bars but what you do with them. Excel also provides lots of built-in tools like VBA, hyperlinks, colors, formatting, cell comments, insert pictures, fonts, spellcheck and somebody once told me that it can do some math too :-)

A friend who works for an evil multi-national corporation that controls programs on company PCs liked the Excel format because it has become ubiquitous and would go unnoticed by the PC cops.

I think that our language is full of three letter acronyms for a reason- you can describe nearly any pre-defined notion in three words. Jolly Green Giant for example. I wouldn't be sure how to file this guy within a hierarchy, but I could find him quickly with two or three keyword searches and the order of the searches does not even matter.

So I hope that answers your question.

BTW ditto what Matt said about editing previous comments, I also do this to correct mistakes, and add new information using the keyword "update" and appending the original text. Or new comments can be linked to previous comments for reference.

Something I forgot to mention in my first post: I have set up special reports for Next Action, Waiting and Someday with toolbar buttons. I am never more than one mouse click away from any of these reports. Got to leave the office or be away from the PC for a while? Print these off and go.

Regards,

Tom

August 27, 2005 |

Tom

Why not use an outliner?

October 16, 2005 |

engr

I implemented my own editor which I can now access from anywhere, just given a web browser:

http://www.oribasan.de

The tool could do a lot more but this version kind of does it for me. I'm planning on making it available via some mobile application but haven't had time yet.

November 11, 2005 |

Anonymous

Great idea, anonymous. Implementing this as a server makes a lot of sense. I'd even push it further by turning it into an AJAX client in order to support the kinds of Wiki features I think are important - completion of WikiWords, following links, etc. Neat; thanks for posting!

November 12, 2005 |

Matthew Cornell

Dude, check out [ TiddlyWiki | http://www.tiddlywiki.com/ ] -- no server required, just throw the files onto a USB keychain drive and you have your notebook wherever you go (and can get to a computer).

December 22, 2005 |

Anonymous

Thanks very much for the TiddlyWiki pointer, Anonymous. Looks neat! It's on my list...

December 22, 2005 |

Matthew Cornell

Hi Matthew,

Did you give a try to Microsoft OneNote? I think it's worth it.

March 16, 2006 |

Lugo

Hi Lugo,

Yes, I've tried OneNote (I was a beta tester and an early TabletPC owner), but I found it disappointing. Then again, I have some basic features I want in a PIM, and OneNote has very few of them (more about PIM ideas [ here | http://www.matthewcornell.org/blog/2006/03/wheres-ide-integrated-development.html ]).

Please tell me: What did you like about it?

March 16, 2006 |

Matthew Cornell

hi matt,

neat post. what's missing from all of these GTD and PIM discussions is the role of synthesis. these tools are just a means towards that end -- but the information scraps have little value outside of it. a single, well-written essay is worth more than a thousand casual notes.

October 29, 2006 |

MikeD

Hi Mike. Thanks for reading, and for your comment. I totally agree - I think that's why it's hard to tag appropriately: We have to imagine how we'll use something when entering it (i.e., up front) when the use downstream may be very different. Also, I think that was one of the original complaints about hypertext: The whole is more than the sum of the parts. That said, I tend to *think* of things in pieces, then do the hard work of synthesis when I'm putting things together.

Any more thoughts on it would be welcome!

October 29, 2006 |

Matthew Cornell

Check out Emacs Org mode, it sounds like what you want.
http://staff.science.uva.nl/~dominik/Tools/org/

November 8, 2006 |

Anonymous

Thanks for the pointer, Anonymous. org mode looks neat.

November 8, 2006 |

Matthew Cornell

I second the org-mode rec. A bit like planner-mode but based on an outliner rather than a wiki (I much prefer the outliner approach). Also has cool features like a built-in spreadsheet (!)

February 3, 2007 |

Anonymous

Thanks for the tip, Anonymous.

February 3, 2007 |

Matthew Cornell

Any new developments in your use of one big text file as a wiki+blog+pim? I'm considering trying it since I can't seem to find a tool I like and I prefer a text editor and regex. Thanks.

June 1, 2007 |

Clint Laskowski

Hi Clint. So far I've not converted to anything else. I did have fun playing with [ stikkit | http://www.stikkit.com/ ], which has some nice auto-connect and auto-recognize features. Cool. However, I'm comfortable with emacs. You might want to look at the emacs add-ons mentioned by others in the comments. They might save you some work...

Thanks for reading, and for your comment.

June 2, 2007 |

Matthew Cornell

I'm now on Org for Emacs and lovin' it. =)

November 29, 2007 |

Sacha Chua

I've got to look at it, Sacha. Now that I have a system that works, I'm loath to change... Thanks for the tip.

November 29, 2007 |

Matthew Cornell
