Welcome to the IdeaMatt blog!

My rebooted blog on tech, creative ideas, digital citizenship, and life as an experiment.

Wednesday
Apr 22, 2015

Smith College Guest Lecture, "How to build a simple production web site in Python"

Last week I had a fun gig giving a one-hour guest lecture for Smith College's freshman Intro Computer Science through Programming class. The topic was "How to build a simple production web site in Python," for which I was invited to share the process and technology that went into building PeepWeather.com. The rationale was that it would give the students a taste of what it is possible to create using some relatively straightforward Python frameworks and libraries, hopefully providing some motivation for those wondering what good their current and (necessarily) bounded assignments are.

I had a great time meeting the challenge of putting together a presentation that had breadth and was motivating, but at the same time understandable to the audience (with the right amount of stretching to keep them engaged). It went well, with some excellent questions during and after the talk. Here I want to share a little about the concepts and technology I covered in case you're considering giving a similar presentation of your own site. For the curious, here's the PowerPoint as a PDF (unfortunately the video did not work out).

Technologies

I named these technologies, describing each pretty briefly, except for diving deeper into Flask:

Concepts

I covered the following at various levels of detail:

The Code

I jumped into the code at the appropriate points, folding as needed so I didn't overwhelm. I stuck to high level basics, highlighting the concepts they currently know including classes and methods. Mainly the focus was on how the concepts got implemented in Python, but no significant code detail; there simply wasn't time. (Note that IntelliJ IDEA's presentation mode worked great for the projector.)

Motivation

I tried to make two points to get the students excited. First, they are programmers and so they have a powerful skill to create web sites! I suggested they pay attention to any thoughts that come up like "Wouldn't it be great if you could __?" or "Why isn't there a site to __?" Second, I pointed out that putting together even a small site is a fantastic portfolio piece that impresses prospective employers. It shows that they can have ideas and, better yet, bring them into reality on their own. I wrapped up by saying that I had a ton of fun writing PeepWeather and learning the various technologies. It's extremely gratifying to bring something into reality (if that's the right word for something as virtual as a web site :-)

Overall I enjoyed both the talk's preparation and presentation, and I hope to give it at the other Five Colleges.

Friday
Mar 27, 2015

Loving the wonderful book Code Complete 2

After two false starts [1], I am working my way through Steve McConnell's book Code Complete: A Practical Handbook of Software Construction, Second Edition (links: author, book, amazon). He describes the topic as "an extended description of the programming process for creating classes and routines." Extended is right!

There are very good reasons why it is in the top 10 in software design at Amazon: Halfway through, I've found the book to be unique and very well written. (I am not alone in this assessment [2].)

It is unique because it goes into a level of detail about coding that is specific and really deep. McConnell has given incredibly clear thought to the minutiae of programming, and then shared it with fine writing. My boss chuckled when I showed him that there are four chapters (table of contents here) on variables, one dedicated solely to naming them. Awesome! I've been writing software for a long time, and it was gratifying to find that McConnell has put into words things I learned intuitively, such as Think about efficiency:

... it's usually a waste of effort to work on efficiency at the level of individual routines. The big optimizations come from refining the high-level design, not the individual routines. You generally use micro-optimizations only when the high-level design turns out not to support the system's performance goals, and you won't know that until the whole program is done. Don't waste time scraping for incremental improvements until you know they're needed.

But beyond being reminded of these (and there are many, many of them, for example: "Initialize each variable close to where it's first used"), a big payoff is the new-to-me ones, of which there are many. The book is simply too detailed to summarize the main points, so I'll just share some tidbits that jumped out at me:

  • "Upstream Prerequisites" ("programmers are at the end of the food chain. The architect consumes the requirements, the designer consumes the architecture, and the coder consumes the design.") His point: "the overarching goal of preparation is risk reduction. By far the most common project risks in software development are poor requirements and poor project planning." (Note: I found the StackOverflow question Software design vs. software architecture answered terminology confusion I had about those terms: "I think we should use the following rule to determine when we talk about Design vs. Architecture: If the elements of a software picture you created can be mapped one to one to a programming language syntactical construction, then is Design, if not is Architecture.")
  • Programming IN a language (limit thoughts to constructs the language directly supports) vs. INTO a language (decide what thoughts you want to express then determine how to do so using the language's tools).
  • "Wicked problems" (like software design): can be clearly defined only by solving them or solving part of them.
  • "Design is a sloppy process": is about tradeoffs and priorities; involves restrictions; is nondeterministic; is a heuristic process; is emergent.
  • Software's Primary Technical Imperative: the importance of managing complexity by dividing a system into subsystems. "The goal of all software-design techniques is to break a complicated problem into simple pieces. The more independent they are the better."
  • For the sake of controlling complexity, you should maintain a heavy bias against inheritance (and prefer containment): Inherit when you want the base class to control your interface. Contain when you want to control your interface.
  • Aim for loose coupling (the manner and degree of interdependence between software modules) and strong cohesion (the degree to which the elements of a module belong together).
  • Routine length (he generalizes procedures, functions, and methods as "routines"): Allow them to grow organically up to 100-200 lines (excluding comments & blank lines). (My routines tend to be smaller - I'll have to give this more thought.)
  • Routine parameters: pass <= ~7 of them. More implies too much coupling.
  • Design Practices - Capturing Your Design Work: code documentation, wiki, email summaries, camera (whiteboard) vs. a drawing tool, flip charts, CRC cards (I love 'em), UML diagrams.
  • The Pseudocode Programming Process: It was helpful to see this named and formalized. "Once the pseudocode is written, you build the code around it and the pseudocode turns into programming-language comments. This eliminates most commenting effort. If the pseudocode follows the guidelines, the comments will be complete and meaningful." Cool!
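
To make the Pseudocode Programming Process concrete, here is a minimal Python sketch. It is an illustration only, not an example from the book; the routine name `average_minutes_per_page` and its input format are made up. The comments were written first as plain-language pseudocode, and the code was then filled in beneath each one:

```python
def average_minutes_per_page(sessions):
    """Return the overall reading pace for a list of (minutes, pages) sessions."""
    # Ignore sessions where no pages were read
    valid = [(minutes, pages) for minutes, pages in sessions if pages > 0]

    # If nothing is left, there is no pace to report
    if not valid:
        return None

    # Total the minutes and the pages across the remaining sessions
    total_minutes = sum(minutes for minutes, _ in valid)
    total_pages = sum(pages for _, pages in valid)

    # Divide total minutes by total pages to get the overall pace
    return total_minutes / total_pages


print(average_minutes_per_page([(60, 30), (45, 30), (10, 0)]))
```

Each comment states intent rather than mechanics, so it stays useful after the code beneath it changes.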

And many more. Bottom line: If you are serious about improving your craft and committed to it, read the book! How about you? What coding books have helped you be a better programmer?

[1] There were two reasons my first two attempts to read this hefty 900-pager failed. 1) I wasn't fully committed, and 2) I didn't have a concrete plan to read it. I resolved the former when I realized I'd been focusing on my work projects to the exclusion of professional development [3] (other than reading a handful of blog posts a week) and needed to shake things up. The second block was simpler to address - break the book up into manageable chunks and commit to reading one every day. I summarized this generally in Reading Books The GTD Way, but in this case I did a back-of-the-napkin analysis, working backward from an estimated chapter velocity:

900 pages, 35 chapters

  • 1 chapter/hr -> 35 hours
  • 1 hour/workday, 3 workdays/week -> 3 hours/week
  • -> ~12 weeks (~2.8 months)

I've been recording my minutes/page progress, which ranges between 1 and 2 1/2. With an average chapter being about 30 pages, this works out to about 60 minutes/chapter. Good estimate! (Actually, I'm reading more than three chapters per week, and at my current rate I will be done at the end of April.)
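
The napkin arithmetic can be checked with a few lines of Python. This is only a sketch using the figures quoted in this footnote; the variable names and the 52/12 weeks-per-month conversion are assumptions.

```python
import math

CHAPTERS = 35
HOURS_PER_CHAPTER = 1      # estimated chapter velocity
HOURS_PER_WEEK = 3         # 1 hour/workday, 3 workdays/week
WEEKS_PER_MONTH = 52 / 12  # assumption: average month length

total_hours = CHAPTERS * HOURS_PER_CHAPTER
weeks = math.ceil(total_hours / HOURS_PER_WEEK)
months = weeks / WEEKS_PER_MONTH

# Sanity check against the measured pace of 1 to 2.5 minutes/page,
# using the round figure of about 30 pages per chapter.
PAGES_PER_CHAPTER = 30
fastest = 1 * PAGES_PER_CHAPTER
slowest = 2.5 * PAGES_PER_CHAPTER

print(f"{total_hours} hours -> {weeks} weeks (~{months:.1f} months)")
print(f"{fastest:.0f} to {slowest:.0f} minutes per chapter")
```

At the measured pace, a 30-page chapter takes between 30 and 75 minutes, which brackets the one-hour-per-chapter estimate.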

[2] A helpful review was Matt Grover's blog: Code Complete Review, which pointed me to Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin (Amazon link here). That book looks excellent too, though it apparently has less detail, weighing in at a svelte 500 pages.

[3] I was surprised to find few articles on the Web on the importance of professional development for software engineers, at least via a quick Google search. The main one was 10 Professional-Development Tips for Programmers. It is structured annoyingly as a slide show, meaning you have to click through to see the content (an SEO and advertising gambit?), so I've collected the titles for you:

  • Staying Current Requires Continuous Learning
  • Problem-Solving Skills
  • Communication and People Skills
  • Networking and Personal Branding
  • Code Documentation and Neatness
  • Master Naming Functions
  • Get Familiar With Agile
  • Get Familiar With a Native Mobile Platform
  • Project Management Skills
  • JavaScript, CSS and HTML5 Skills

Searching the two programming-specific sites I like (Hacker News and reddit.com/r/programming - what are your favorites, BTW?) was more fruitful. Here are a few:

Sunday
Mar 22, 2015

A few ideas around citizen involvement and government/media oversight

While going through old files, I dug up a few single-paragraph proposals I wrote in 2004 for a research lab I worked for at the time. They did not go forward, but I thought I'd share them with you. When I discovered them, I was surprised because I'd forgotten that my interest in these areas goes back that far. Another surprise is that the algorithms and technical approaches to possibly implement some of these ideas are just now emerging from computer science research labs - [1] and [2], for example.

Right now I'm exploring ways to restart this interest, and I hope that posting these archived ideas here might jiggle something loose. Cheers!

References:

1. Increased Corporate Influence of the Federal Government

Specifically, using technology to enable citizens to monitor for abuses and actions that limit its effectiveness. The first example that comes to mind is the commercialization of legislation: Wealthy corporations and individuals who influence laws to their benefit [1]. The result is laws that ignore the greater good (which is usually in the longer term) in favor of shorter term creation of profits. What can our group do?  Let's apply our ideas to the job of examining this cycle of influence. It seems that the data are available and amenable to relational analysis. Is the problem challenging? I'm not sure; it depends on how clear the patterns are.

References:

2. Increased Bias in Consolidated Media Companies

As citizens I believe we are facing challenges to fundamental aspects of our society as a result of the recent creation of large media conglomerates that control a high percentage of mainstream media [1]. The problem here is that corporations are biasing the news in directions that favor profits over journalism. What can we do? One idea is to create a system that can indicate the bias of news stories. I imagine something like Google News that lists major news stories along with a bias indicator. The hope is that people reading their favorite writer on a topic will be able to get a summary of the story's bias. A final idea would be to look for stories that are *not* covered by mainstream media, i.e., an automated version of [2].

References:

3. Reduced Rigor in Major Media News Reports

Another problem related to 2) above is to address the current propensity of some writers to report facts as taken from second sources without investigating the original sources [1]. To solve this we might analyze stories written about a single news event in order to determine what sources they derived their information from. For example, do they all link to a single source, or are sources more diversified? Also consider the recent avalanche of stories (many misinformed) about DARPA's Total (now 'Terrorism') Information Awareness program. Could a software system using our research have been able to determine that a small number of influential stories was propagating around the net with little additional analysis performed by those referring to them? Finally, another idea is to create automated versions of some of the 'fact check' web sites currently available ([2], [3]).

References:

 

Sunday
Mar 8, 2015

The Intrinsic Poverty of Thumbs Up/Down Popularity Voting on the Web

I'm tired of the malnutrition of discourse in web sites' comments sections. Pick a controversial topic [1] on a popular site, go to the comments section, and if you're like me, you'll start feeling some emotions. Disbelief, disgust, and maybe a lack of faith in how we've (dis)organized our collective thinking ability. Comments by readers on climate change stories are perhaps the worst of the lot - name calling, propaganda, but perhaps most fundamentally, a dearth of critical thinking skills. You simply see very, very little solid argumentation taking place.

Some examples are "The GOP’s climate change skepticism, in one groan-worthy video" and "Faulted for Avoiding ‘Islamic’ Labels to Describe Terrorism, White House Cites a Strategic Logic." Even my local paper has some doozies, e.g., "Tony Robinson shooting: Protestors hit Madison, Wisconsin streets (video)" (with gun control at least being a bona fide controversy).

I am just starting to look into the research around "hindrances due to basic human limitations" (check out Table 1 [2] from A Practical Guide To Critical Thinking) and I'm undecided whether it's possible to change minds through technical means (e.g., requiring comment authors to express some kind of well-structured, if simple, argument along with each claim), but it struck me that the thumbs up/down voting feature that's common to most of these sites exacerbates the problem. Let me take a naive look at this, the most meager form of interaction (clicking is literally the least you can do to interact), and maybe re-evaluate popularity's limited usefulness on the web.

I think at its most basic, the thumbs signal expresses one person's opinion of a human artifact (person, place, thing, or utterance, say) where likability, approval, and desire are common interpretations. As a web tool, Thumbs Up/Down Style Ratings says to use it when "A user wants to express a like/dislike (love/hate) type opinion about an object (person, place or thing) they are consuming / reading / experiencing," with the value being:

these ratings, when assessed in aggregate, can quickly give a sense for the community's opinion of a rated object. They may also be helpful for drawing quick qualitative comparisons between like items (this is better than that) but this is of secondary importance with this ratings-type.

From the individual's perspective, I can see a few uses in online discussions:

  • the satisfaction of expressing one's opinion,
  • the ability to get attention from others,
  • being able to see how others voted, which might lead to
  • challenging, or more likely reinforcing, one's opinions and beliefs, and
  • finding others possibly in your tribe (and perhaps more importantly, those not in it)

I suppose that from the site's perspective, the feature's value is engagement; people are excited by the above uses.

Regarding quality of discourse, what is the value of this feature? And how does it enhance or hinder critical thinking? To my mind, its contribution is all negative because opinion does not equate to fact. (However, I think I'm out of sync on this, given the current cultural and political climate where opinion is perhaps valued higher than fact.) Let's simplify the question and phrase it in terms of popularity. What is the value of popularity in online discourse? My thinking is, none. A solid argument is not based on personal feeling; it is based on evidence, support, quality of sources, etc. This is literally (well, should be) grade school material.

It's a problem, and my question is, can we replace the thumbs signal interaction feature with a better one, something that ideally counteracts the echo chamber effect? A nascent thought I had was to keep the voting feature but apply it to portions of explicitly-presented logical arguments instead of the individual unstructured utterances that currently make up comments. Briefly, this would require that comments link to an unbiased and well-structured argument (I list some sites below [3]), and it is there, rather than the comments themselves, that people would vote. However, the difference is that they'd be voting on specific elements of the argument (such as a source's trustworthiness) which we might be able to interpret as a self-disclosed bias, perhaps proudly proclaimed.

Without getting into more detail here (I just wanted to get the thumbs up/down limitations written up here), I wonder if sharing and comparing biases instead of opinions might lead to tools to help bring a tiny fraction of people a tiny bit closer to understanding each other. Instead of "Hey John, check out the asshole liberal comments on this article," would it be useful to hear "Hey John, here's my opinion on this article's argument"?

Again, I realize I'm fighting human nature [2], but what if a computer program could analyze two opposing parties' beliefs and, for example, find an area of the argument they agree on? Or find a slightly diverse group of people whose beliefs are different but close (think of a belief search space) and somehow bring them together.
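
One naive way to picture this is to record each person's votes on named elements of an argument and compare them. The sketch below is purely illustrative: the +1/-1 vote encoding, the element names, and both functions are hypothetical, and it does not describe any existing system.

```python
def shared_ground(votes_a, votes_b):
    """Return the argument elements on which two people voted the same way."""
    common = votes_a.keys() & votes_b.keys()
    return sorted(e for e in common if votes_a[e] == votes_b[e])


def belief_distance(votes_a, votes_b):
    """Count the shared elements on which two people disagree."""
    common = votes_a.keys() & votes_b.keys()
    return sum(1 for e in common if votes_a[e] != votes_b[e])


# Hypothetical votes: +1 means "accept this element", -1 means "reject it"
alice = {"source is trustworthy": +1, "data supports claim": +1, "conclusion follows": -1}
bob = {"source is trustworthy": -1, "data supports claim": +1, "conclusion follows": -1}

print(shared_ground(alice, bob))
print(belief_distance(alice, bob))
```

A small distance would flag people whose beliefs are different but close, which is the kind of group the paragraph above imagines bringing together.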

Naive? Probably, and this post is rough, but I'd love to hear your thoughts on any of it.


[1] I should put "controversial" in quotes for those topics like climate change that are no longer controversial from the scientific - i.e., reality-based - perspective. And I am quite looking forward to watching the Merchants of Doubt documentary (Rotten Tomatoes review and Amazon book links), by the way. Seen it yet?


[2] From A Practical Guide To Critical Thinking: Hindrances Due To Basic Human Limitations:

  • Confirmation Bias & Selective Thinking
  • False Memories & Confabulation
  • Ignorance
  • Perception Limitations
  • Personal Biases & Prejudices
  • Physical & Emotional Hindrances
  • Testimonial Evidence

[3] Argument tools:

Debatewise - where great minds differ

 

Wednesday
Feb 18, 2015

Using word clouds to get a bird's-eye view of a large professional programmer's notebook

(Image courtesy of Theen Moy.)

tl;dr: Running a word cloud program on a multi-year professional log is entertaining and useful for a quick understanding of past projects.

Background: My Big-Arse Text File

In 2005 I wrote My Big-Arse Text File - a Poor Man's Wiki+Blog+PIM where I described the simple setup I use for keeping my professional ProgrammersNotebook, something I've been doing for decades. (If you're a programmer and don't keep one, I suggest you experiment with it. You don't have to go old school Emacs like I did - there exist wonderful tools like Evernote.) This practice has been undeniably helpful, especially when combined with using separate outline files for individual projects, and it facilitates my using the journal to:

  • Understand where I've been spending my time, including meetings (I use CamelCase to name people and projects, like GraphxEvaluation),
  • track the lifespan of individual projects,
  • appreciate just how much work I've done,
  • record code snippets,
  • log sites/algorithms/programs/tools that are exciting or that might be useful,
  • capture ideas (of course!),
  • save great quotes [1] I come across,
  • record account information (be careful of security, though), and especially
  • provide a master index to other project files (done simply by naming the file or by inserting a faux hypertext link where I put the relevant text in square brackets, e.g., "see [analyze log of failed graphx 5-step path]")

I find that all of this (including Emacs tools like Occur), combined with Mac OS X's Spotlight, is effective.
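
For illustration, the two naming conventions above (CamelCase names and square-bracket faux links) are easy to pull out of a log with regular expressions. This is a minimal sketch, not the Emacs setup described here; the patterns, the function `index_entry`, and the sample text (including the name JohnSmith) are assumptions.

```python
import re
from collections import Counter

# Assumption: a CamelCase name is two or more capitalized runs, e.g. GraphxEvaluation
CAMEL_CASE = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")
# Assumption: a faux link is any text wrapped in square brackets
FAUX_LINK = re.compile(r"\[([^\[\]]+)\]")


def index_entry(text):
    """Return (CamelCase name counts, list of faux link targets) for a log entry."""
    names = Counter(CAMEL_CASE.findall(text))
    links = FAUX_LINK.findall(text)
    return names, links


# Made-up log entry for demonstration
sample = (
    "Met with JohnSmith about GraphxEvaluation. "
    "GraphxEvaluation run failed; see [analyze log of failed graphx 5-step path]"
)
names, links = index_entry(sample)
print(names.most_common())
print(links)
```

Counting names per entry gives a rough answer to "where have I been spending my time" without any special tooling.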

The problem: Getting a bird's-eye view

In preparing for a projects meeting with my boss, I needed to look over the file to get a bird's-eye view of the last 3 1/2 years. But I wasn't able to get a higher-level perspective of the 2500 entries in 40K lines of text. So I tried out an idea: Would creating a tag cloud on the file help to identify useful patterns? I found the answer is yes; in just a few minutes it can remind you of what you worked on. The only catch was that a single word cloud for the whole file wasn't useful. I found I had to split the file into chunks to get the right temporal granularity, with 1000-line splits (basically monthly pieces) being about right.

Examples

Here are a few examples, followed by the steps I took to generate them.

#1 - A few main projects

Yep - I've been evaluating GraphX with its Scala API for performance on an essential path-based algorithm the lab needs. (I'll share results in a future post.) Two months earlier you'd see basically the same pattern, but with Giraph and its outline text file as the focus (yes, The Apache Software Foundation is amazing) with Giraph, Vertica, and Impala before that.

#2 - Mixture of projects

This one shows a time that was a little more diverse, including a relational data generator tool and some cluster (https://en.wikipedia.org/wiki/Computer_cluster) improvements.

#3 - A single giant focus

I really like Postgres, which has been our go-to SQL RDBMS for some time. During this time period I was writing an SQL data store for our "causal database."

#4 - Layout issues

Python development with TDD (it's how I write code - XP rocks). Obviously the layout was skewed by a single very long line - maybe from some program's output log. The text is too small to read, so I couldn't add it to the stop words file below. Unfortunately the program I used doesn't seem to have a line limit feature.

Process

Generating these was straightforward once I found a workable tool, which took about two minutes of searching. I didn't want to use an online one (Wordle is popular) for privacy reasons, but fortunately IBM Word Cloud Generator fit the bill. It's a Java program with reasonable arguments and a config file. I ran split to get the chunks then ran the jar in a Bash for loop, and that's it. All I had to do was create a stop words file to remove some distracting ones ("system", "new", "INFO", etc.)
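
The core of the process (chunk the file, drop stop words, see what dominates each chunk) can be sketched in plain Python. To be clear, this is not the IBM tool or the split/Bash pipeline described above: it prints word counts instead of drawing a cloud, and the function names and the sample log are made up for illustration.

```python
import re
from collections import Counter

# The distracting words named above, lowercased for case-insensitive matching
STOP_WORDS = {"system", "new", "info"}


def chunk_lines(lines, size=1000):
    """Yield consecutive chunks of at most `size` lines."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def top_words(lines, stop_words=STOP_WORDS, n=10):
    """Return the n most frequent words in the lines, ignoring stop words."""
    words = re.findall(r"[A-Za-z][A-Za-z0-9]+", " ".join(lines))
    counts = Counter(w for w in words if w.lower() not in stop_words)
    return counts.most_common(n)


# Tiny made-up log; a real run would read the notebook file instead
log = [
    "GraphX eval on new cluster",
    "INFO GraphX job done",
    "Postgres schema",
    "Postgres index",
]
for i, chunk in enumerate(chunk_lines(log, size=2)):
    print(i, top_words(chunk, n=2))
```

With the default `size=1000`, each chunk corresponds roughly to the monthly pieces mentioned earlier.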

What do you think? Have you had to examine your log file for projects? Cheers!

[1] A few quotes

"When you're a student, you're judged by how well you answer questions. But in life, you're judged by how good your questions are." The art of entrepreneurship

(Regarding Heinlein's quote, "Specialization is for insects.") "[companies currently] hire people specialized to know some very narrow system. They want them to come running out of the box. [disappointingly] They want to cast them aside when they’re done. [instead] We need people who can think and change and learn what they need to learn." Masterminds of Programming: Conversations with the Creators of Major Programming Languages (Amazon link).

"Then he told me, very tenderly, that it can be dangerous to believe things just because you want them to be true. You can get tricked if you don’t question yourself and others, especially people in a position of authority. He told me that anything that’s truly real can stand up to scrutiny." Lessons of Immortality and Mortality From My Father, Carl Sagan

There are a ton of thought-provoking programming quotes here, such as these (which are not jokes):

  • When in doubt, use brute force.
  • The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.
  • The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.
  • Nobody should start to undertake a large project .. start small, and think about the details. Don't think about some big picture and fancy design. If it doesn't solve some fairly immediate need, it's almost certainly over-designed.