Answers to selected problems: November 2007

Friday, November 23, 2007

Bill Gates on focused design

"The finest pieces of software are those where one individual has a complete sense of exactly how the program works. To have that, you have to really love the program and concentrate on keeping it simple, to an incredible degree."

"We're no longer in the days where everything is super well crafted. But at the heart of the programs that make it to the top, you'll find that the key internal code was done by a few people who really know what they were doing."

Unfortunately for Bill, his company is having a difficult time getting Windows back to something which resembles David Cutler's locomotive of an OS, which was itself birthed in a rather chaotic manner. How disappointing to see his company forget these essential truths in their core products over the years. I think for them to compete with Apple or Google the way they want to, they're going to have to get back to this. Iconoclasm is something they've forgotten how to do well, and they're getting their asses handed to them for it.

Saturday, November 17, 2007

Abstraction Layers, Change Streams, and Silos

As a developer, I tend to see the organization of people through the same lens as the organization of systems.

In a software system, every interface forms an abstraction layer. As this helpful video points out, every developer working within that system designs APIs. Good APIs have the qualities outlined in that video, namely being:

Easy to learn
Easy to use, even without documentation
Hard to misuse
Easy to read and maintain code that uses it
Sufficiently powerful to satisfy requirements
Easy to evolve
Appropriate to its audience

The first five items on the list are best accomplished by ensuring that the API presents a consistent level of abstraction. All its functionality should address the same set of concepts. Take PHP's Iterator interface:


current()
key()
next()
rewind()
valid()

All of these methods deal strictly with stepping through the items of an iterable object. An API that didn't do this might look like the following:


$user->getHeldItems()
$user->holdItem()
$user->syncProfileData()
$user->checkoutItems()

If you've spent more than 10 minutes of your life writing code, that third method should land with a loud thunk: syncProfileData() has nothing to do with holding or checking out items.

Making an API easy to evolve starts with encapsulating its change stream. Higher-level consumers of the API can generally ignore changes to the details of its implementation, so as each method evolves over time, its stream of changes stays hidden behind its interface.
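A minimal PHP sketch of that idea, using hypothetical names (ItemStore, ArrayItemStore, CachedItemStore, countHeld) that are not from any real codebase. The consumer is written once against an interface, and swapping in a reworked implementation leaves it untouched.

```php
<?php
// Hypothetical example: consumers depend only on the ItemStore interface.
interface ItemStore
{
    public function heldItems($userId);
}

// Version 1: items kept in a plain in-memory array.
class ArrayItemStore implements ItemStore
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function heldItems($userId)
    {
        return isset($this->rows[$userId]) ? $this->rows[$userId] : array();
    }
}

// Version 2: same interface, new implementation (memoized lookups).
// This change stays behind the interface; callers never see it.
class CachedItemStore implements ItemStore
{
    private $inner;
    private $cache = array();

    public function __construct(ItemStore $inner)
    {
        $this->inner = $inner;
    }

    public function heldItems($userId)
    {
        if (!array_key_exists($userId, $this->cache)) {
            $this->cache[$userId] = $this->inner->heldItems($userId);
        }
        return $this->cache[$userId];
    }
}

// Consumer code, written once against the interface.
function countHeld(ItemStore $store, $userId)
{
    return count($store->heldItems($userId));
}

$base = new ArrayItemStore(array(42 => array('book', 'dvd')));
echo countHeld($base, 42), "\n";                      // 2
echo countHeld(new CachedItemStore($base), 42), "\n"; // 2
```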

Organizations must also create the same kinds of abstractions, because nobody has the time to keep track of every single thing in the organization. Our office manager shouldn't need to care how the DHCP server is configured; she just needs to know she can get to the Ikea site and order us some desks. Anything else is a distraction. For a developer, this is writ large, because the tools I need are typically much more complicated than a web browser. Maintaining groupware like Exchange, for example, is not trivial. Babysitting a build process is another common chore. I need that stuff to work so I can write good code, but I should spend almost none of my time on it. So teams get created, each presenting an organizational communication interface (an "OPI"?).

That's all well and good, but what happens when that boundary cuts across a layer of the software? Say I rely on a RewriteRule that our sysadmins set up in Apache. Now I have to communicate across both an organizational abstraction layer and an application-level one. That's a problem, because any organizational abstraction necessarily limits how clearly people communicate across it. I don't work with our sysadmin on Apache most of the time, so I don't have a clear picture of how that configuration intertwines with my code. That hurts when I'm designing something that might interact with that subsystem.

To fight this, organizational abstraction layers deserve the same careful consideration as a good API, and they should meet all seven of the qualities listed at the top. Everything from process methodologies to groupware has to be evaluated this way, because to solve the communication problem that is software, ideas need to cross those boundaries as fluently as possible.

Monday, November 12, 2007

Interview pet peeves for today

Really know the language the company you're interviewing with uses. Don't think you can pull it out of your ass because you worked in it three years ago. Languages change a lot. I don't have time to teach you how to handle Exceptions in PHP 5, because frankly you aren't worth it if you can't do some homework before you come in for an interview. This goes double if the posting says "PHP Developer" right in the title.

I know we're your fifth interview this week, and frankly I don't care. I get 60 minutes to evaluate whether to give you commit access to my code, code I've spent the last year preening over and perfecting. Separately, I have to evaluate whether I want to be stuck in a room with you for 8 hours a day, potentially for the next several years. You better be on your best behavior, and you better want it, and you better be excited, because if you come in here looking like you're just looking for something to do or a buck to make, you're going right back out the door. But if you want to create, if you want to make something beautiful, you'll be on my short list of favorite people, people on whom I'm going to tell my boss to spend a lot of money.

Have a passion that isn't code. It's not a dealbreaker if you don't, but I really like to see people with multi-disciplinary lives. Back when you applied for college, there was a box labeled "Extracurricular activities". If you left that box empty back then, you should try to fill it soon, because it's there for a reason. Musicians write more interesting code; they see patterns I don't. Foodies know where all the good restaurants are. Gamers bring in their Xboxes. Diversity in a small office is vital, because we're all going to spend more time here than at home, and if all we do is our jobs, we're going to run out of things to talk about real fast.

Sunday, November 11, 2007

Extending Iterator for fun, profit, and hassles

PHP's built-in Iterator interface is lovely, as is the SPL's overcomplicated interpretation of it. But as soon as you want to do something interesting, like, say, stepping back to the previous item in an object, both fall flat. That's right: you can't call $iterator->prev() on ANY of the SPL iterator classes, because PHP's built-in Iterator interface doesn't define it! This seems to me a major oversight, especially since the array functions that power most Iterator implementations include prev() right there.
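To illustrate the point, PHP's array pointer functions (reset(), next(), prev(), end(), current()) can walk backward, while the SPL's ArrayIterator, which wraps the same kind of array, offers no prev() method. A small check:

```php
<?php
// Plain arrays: the internal pointer can move in both directions.
$items = array('a', 'b', 'c');

reset($items);               // pointer on 'a'
echo current($items), "\n";  // a
next($items);                // pointer on 'b'
next($items);                // pointer on 'c'
echo prev($items), "\n";     // b (prev() moves back, then returns that element)
echo end($items), "\n";      // c

// ArrayIterator wraps the same array but has no way to step back.
$it = new ArrayIterator($items);
var_dump(method_exists($it, 'prev')); // bool(false)
```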

The other thing you can't do is focus an iterator on a particular item. This is more understandable, since PHP's array functions offer no direct way to move the internal pointer to a given element either. Iterable objects should, though.

So here at Treemo, we use lots of list objects: lists of content, users, comments, and so on. Lists should obviously be iterable, but we'd also like to navigate them in both directions and focus on the objects we're interested in. So we cooked up a base class that does all of its Iterator management by hand, because the only way to get this functionality cleanly is to do everything yourself. Because nobody should EVER have to do this twice, and because PHP didn't do it for me the way it ought to have, I'm gifting it to you.


/**
 * A modifiable list object
 * 
 * @package Core
 */
class ObjectList implements Iterator, Countable
{
  /**
   * The internal array backing for the list
   *
   * @var Array
   */
  protected $items;
  
  /**
   * The current index of the list.
   *
   * @var int
   */
  protected $key;

  /**
   * Create the list from either nothing or an array of existing items.
   *
   * @param Array $items
   */
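  public function __construct ($items = null)
  {
    if (! is_null($items))
    {
      // Reindex so keys run 0..n-1, which next() and prev() rely on
      $this->items = array_values($items);
    } else
    {
      $this->items = array();
    }
    // Start focused on the first element
    $this->key = 0;
  }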

  /**
   * Focus the list on its first element.
   */
  public function rewind ()
  {
    $this->key = 0;
  }

  /**
   * Get the next item in the list.
   *
   * @return Object
   */
  public function next ()
  {
    if (array_key_exists($this->key + 1, $this->items))
    {
      return $this->items[++ $this->key];
    } else
    {
      $this->key = count($this->items);
      return false;
    }
  }

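  /**
   * Step back to the previous item in the list and return it, or FALSE if
   * there is no previous item.
   * 
   * @return Object
   */
  public function prev ()
  {
    if (array_key_exists($this->key - 1, $this->items))
    {
      return $this->items[-- $this->key];
    } else
    {
      // Park the index just before the start of the list, mirroring next()
      $this->key = -1;
      return false;
    }
  }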

  /**
   * Focus the list on a particular object.
   *
   * @param object $item the needle
   * @return object or FALSE if the object was not found.
   */
  public function focus ($item)
  {
    // Strict comparison, so we match this exact object rather than any equal one
    $k = array_search($item, $this->items, true);
    if ($k !== false)
    {
      $this->key = $k;
      return $this->items[$this->key];
    }
    return false;
  }

  /**
   * Focus the list on its last item and return it, or FALSE if the list is empty.
   *
   * @return object
   */
  public function end ()
  {
    if (count($this->items) == 0)
    {
      return false;
    }
    $this->key = count($this->items) - 1;
    return $this->items[$this->key];
  }

  /**
   * Return the item that the list is currently focused on, or FALSE if it is
   * focused beyond the end of the list.
   *
   * @return object
   */
  public function current ()
  {
    return (isset($this->items[$this->key]) ? $this->items[$this->key] : false);
  }

  /**
   * Return the current list index.
   *
   * @return int
   */
  public function key ()
  {
    return $this->key;
  }

  /**
   * Return whether the list's index is between 0 and the end of the list.
   *
   * @return bool
   */
  public function valid ()
  {
    return ($this->key >= 0) && ($this->key < count($this->items));
  }

  /**
   * Returns the length of the list.
   *
   * @return int
   */
  public function count ()
  {
    return count($this->items);
  }

  /**
   * Append an object onto the end of the in-memory list.
   *
   * @param object $newObject
   * @return int the new length of the list (the return value of array_push)
   */
  public function append ($newObject)
  {
    return array_push($this->items, $newObject);
  }

  /**
   * Prepend an object onto the beginning of the in-memory list.
   *
   * @param object $newObject
   * @return int the new length of the list (the return value of array_unshift)
   */
  public function prepend ($newObject)
  {
    return array_unshift($this->items, $newObject);
  }

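  /**
   * Insert an object into the list directly after $prevObject.
   *
   * @param object $newObject
   * @param object $prevObject
   * @return bool TRUE on success, FALSE if $prevObject is not in the list
   */
  public function insertAfter ($newObject, $prevObject)
  {
    // Find $prevObject (same instance, strict comparison)
    $k = array_search($prevObject, $this->items, true);
    if ($k === false)
    {
      return false;
    }

    // Splice the new object in after it; later items shift up by one
    array_splice($this->items, $k + 1, 0, array($newObject));

    // Keep the list focused on the same item it was on before the insert
    if ($this->key > $k)
    {
      $this->key++;
    }
    return true;
  }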

}

Hope this is useful. Maybe I just miss Java a lot, but it really bugs me that more of this functionality isn't baked into PHP. Anyway, this class gets really interesting when you start extending it in a database-backed way:


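class UserFriendList extends ObjectList
{
  /**
   * Objects in this list are friends of this user
   *
   * @var User
   */
  protected $owner;

  /**
   * Database handle used to load and persist the list
   *
   * @var PDO
   */
  protected $db;
  
  public function __construct( User $user, PDO $db )
  {
    parent::__construct();
    $this->owner = $user;
    $this->db = $db;
    
    // Parameterized query instead of interpolating the ID into the SQL
    $stmt = $this->db->prepare( "SELECT user_id FROM friends WHERE owner_id = ?" );
    $stmt->execute( array( $user->user_id ) );
    foreach( $stmt->fetchAll( PDO::FETCH_COLUMN ) as $friendId )
    {
      // Assumes User can be constructed from a user_id
      $this->items[] = new User( $friendId );
    }
  }
  
  public function prepend( $newObject )
  {
    // Persist the friendship first; only touch the in-memory list if that worked
    $stmt = $this->db->prepare( "INSERT INTO friends (user_id, owner_id) VALUES (?, ?)" );
    if( $stmt->execute( array( $newObject->user_id, $this->owner->user_id ) ) )
    {
      return parent::prepend( $newObject );
    }
    return false;
  }
}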

...and so on and so forth. (This is a quick example: in real code, use parameterized queries and proper error handling, and never interpolate request data into SQL.)

Good luck, and good coding!

Saturday, November 3, 2007

Cheap event handling

Websites, whether anyone realizes it or not, are basically event-driven. Request comes in, page comes out. Blogs go in, RSS goes out. Nothing changes until someone clicks a link, uploads a blog post, or submits a form.

Because pages are the only thing a user consciously requests, websites were traditionally organized on a page-by-page basis. The site was nothing without the pages. But pages make all kinds of decisions about what happens on them: which forms are available, what information is displayed, how the necessary data is grouped.

In a large-scale system like Treemo, there is much more than pages to worry about. In fact, presenting the actual web pages represents maybe 30% of the logic in the system. Deciding what to do is the real problem. Is this a request for an HTML page? An XHTML page? An SMS message? An RSS feed? A picture? An account upgrade? Once we've decided on what's being asked for, objects get inflated from the database and something happens. Then what?

Well, all kinds of things. Things happening in the system have side-effects. A user buying a paid account has their quota lifted. A photo being uploaded causes notifications to be sent. A site message being sent causes another user's inbox to have a new message.

The list is very long. Any request- or event-driven system is built on these foundations: Something Has Happened, and Important Classes Need To Know. Object-oriented GUI frameworks took this to its logical conclusion; Java's Swing, for instance, has dozens of event classes, from AncestorEvent to UndoableEditEvent. Classes can attach themselves to controls as listeners for these events and be guaranteed that whenever an event occurs, they will be notified.
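In a long-running GUI process, that pattern boils down to an in-memory map from event type to listener objects. A minimal PHP sketch, with hypothetical names (Listener, EventSource, QuotaListener) that don't come from any framework:

```php
<?php
// Minimal in-memory observer sketch; all names are hypothetical.
interface Listener
{
    public function notify($eventType, array $data);
}

class EventSource
{
    // event type => list of Listener objects, held in memory
    private $listeners = array();

    public function addListener($eventType, Listener $listener)
    {
        $this->listeners[$eventType][] = $listener;
    }

    // Notify every listener registered for $eventType; return how many ran.
    public function dispatch($eventType, array $data = array())
    {
        if (empty($this->listeners[$eventType])) {
            return 0;
        }
        foreach ($this->listeners[$eventType] as $listener) {
            $listener->notify($eventType, $data);
        }
        return count($this->listeners[$eventType]);
    }
}

// Example listener: records which users had their quota lifted.
class QuotaListener implements Listener
{
    public $lifted = array();

    public function notify($eventType, array $data)
    {
        $this->lifted[] = $data['user_id'];
    }
}

$source = new EventSource();
$quota = new QuotaListener();
$source->addListener('upgrade', $quota);
echo $source->dispatch('upgrade', array('user_id' => 7)), "\n"; // 1
```

This works within a single request, but $source and its listener list are gone once the request ends.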

On the web, for some reason, I don't see this paradigm much. One trouble spot for us as a PHP shop is that object state is guaranteed NOT to persist across requests. That's different from a native GUI application, in which all the objects sit in memory, so when an event fires, the listening object is already there, and the list of listeners is easily kept as an in-memory data structure. Not so in PHP. There you have to back your objects with a database, and listener structures are not easy to store generically.

Of course, if you were really clever, you would use an ORM tool like DB_DataObject, but we aren't quite so fortunate (DB_DataObject is also really slow, and DBDO isn't ready yet). With or without an ORM, you still need tables. How would they look?

The first time we ran into this problem, we had, as usual, far too little time to think about it, and our primary concern was that processing be asynchronous. At the time, that need was driven by regular billing for paid users: subscription billing is triggered by elapsed time, so there had to be a way to schedule future events.

Unfortunately, we designed a system that was Just Good Enough, and it has lasted all this time. It's fully asynchronous, which creates big problems in a dynamic system. State changes very rapidly, making sure you placed EVERY piece of data EXACTLY right in the queue is too much detail to keep in your head, and when you get it wrong, it fails silently. Here's how it looks now:


$extra = array(
  'message' => $_POST['message'],
  'from' => $request->viewer->user_id
);
$evt = new event('share', $extra);
$evt->fire();

The fire() method inserts a row into the database containing the event type, the extra array, and the timing data (when to process it, etc.). A daemon then consumes the rows and invokes handlers, which look something like this:


class shareHandler extends eventHandler
{
  /**
   * Handle a share event.
   *
   * extraData array:
   * 'to'      => Array of email addresses or phone numbers. Required.
   * 'from'    => The user_id of the sender. Required.
   * 'what'    => String, URL of the page that is being shared.
   * 'message' => Text of the message. Defaults to _____.tpl
   */
  public function handle()
  {
    // ...
  }
}

So this works fairly well, but its weak spot is that twiddly extraData array. What happens when the caller fails to put the right data in there? Nobody finds out until the event daemon actually picks the event off the queue, and that can happen minutes or even hours after the request that generated it.
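To make that failure mode concrete, here is a simplified, single-process sketch of the asynchronous design. Only event, fire(), eventHandler, shareHandler and handle() come from the code above. The EventQueue class (an in-memory stand-in for the events table), the "{type}Handler" naming lookup, the handler constructor and runDaemon() are all assumptions.

```php
<?php
// Assumption: an in-memory array stands in for the events table.
class EventQueue
{
    public static $rows = array();
}

class event
{
    private $type;
    private $extra;

    public function __construct($type, array $extra)
    {
        $this->type = $type;
        $this->extra = $extra;
    }

    // fire() only records the event; nothing is validated here.
    public function fire()
    {
        EventQueue::$rows[] = array(
            'type'       => $this->type,
            'extra'      => serialize($this->extra),
            'process_at' => time(),
        );
    }
}

abstract class eventHandler
{
    protected $extraData;

    public function __construct(array $extraData)
    {
        $this->extraData = $extraData;
    }

    abstract public function handle();
}

class shareHandler extends eventHandler
{
    public function handle()
    {
        if (empty($this->extraData['to'])) {
            throw new InvalidArgumentException("share event is missing 'to'");
        }
        return 'shared';
    }
}

// Hypothetical daemon loop: drain the queue, build each handler by
// naming convention, and collect failures.
function runDaemon()
{
    $errors = array();
    while ($row = array_shift(EventQueue::$rows)) {
        $class = $row['type'] . 'Handler';
        $handler = new $class(unserialize($row['extra']));
        try {
            $handler->handle();
        } catch (Exception $e) {
            $errors[] = $e->getMessage();
        }
    }
    return $errors;
}

// The request fires the event and finishes without seeing any problem.
$evt = new event('share', array('message' => 'hi', 'from' => 7));
$evt->fire();

// Later, in the daemon process, the missing 'to' finally surfaces.
$errors = runDaemon();
echo count($errors), " error(s): ", implode('; ', $errors), "\n";
```

The request that fired the event had already returned successfully long before the error was recorded.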

What we'd like is for the caller to know a lot more about what's going on. If a user sharing content types in an invalid phone number, for example, it would be best to tell them immediately. One simple way for us to do that is to move event processing right into request processing and do away with the daemon.

Much like MDB2, the event processor could be a singleton object that every class sends its events to (which, technically, it already is). So, you want to share something? Try it this way:


$evt = new ShareEvent( $_POST['message'], $request->viewer->user_id );
$evtProc = EventProcessor::singleton();
$result = $evtProc->fire( $evt );

This fixes two of the major problems with the existing events. 1) Event parameters can be type-hinted and required in the constructor, because in this scheme the events, rather than the handlers, are typed. 2) Notice that you get a $result back from firing the event? The request processor can actually act on the results of the event's handling! Event processing can even throw exceptions if it needs to. This is much better.
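Here is a sketch of the synchronous, typed approach. ShareEvent, EventProcessor::singleton() and fire() are the names used above. The Event interface, the handle() method, the third $to constructor parameter and the recipient validation rule are assumptions made for illustration.

```php
<?php
// Assumed interface: every typed event knows how to handle itself.
interface Event
{
    public function handle();
}

class ShareEvent implements Event
{
    private $message;
    private $fromUserId;
    private $to;

    // Required parameters are enforced by the constructor signature.
    public function __construct($message, $fromUserId, array $to)
    {
        $this->message = $message;
        $this->fromUserId = $fromUserId;
        $this->to = $to;
    }

    public function handle()
    {
        foreach ($this->to as $recipient) {
            // Hypothetical rule: an email address or a 10-15 digit phone number.
            $isEmail = filter_var($recipient, FILTER_VALIDATE_EMAIL) !== false;
            $isPhone = preg_match('/^\+?[0-9]{10,15}$/', $recipient) === 1;
            if (!$isEmail && !$isPhone) {
                throw new InvalidArgumentException("Invalid recipient: $recipient");
            }
        }
        return count($this->to); // number of recipients handled
    }
}

class EventProcessor
{
    private static $instance = null;
    private $log = array();

    private function __construct()
    {
    }

    public static function singleton()
    {
        if (self::$instance === null) {
            self::$instance = new EventProcessor();
        }
        return self::$instance;
    }

    // Runs the event inside the current request and returns its result;
    // exceptions thrown by handle() propagate straight to the caller.
    public function fire(Event $evt)
    {
        $this->log[] = get_class($evt); // central point for tracking
        return $evt->handle();
    }
}

$evtProc = EventProcessor::singleton();
$result = $evtProc->fire(new ShareEvent('Look!', 7, array('a@example.com')));
echo $result, "\n"; // 1

try {
    $evtProc->fire(new ShareEvent('Look!', 7, array('12ab')));
} catch (InvalidArgumentException $e) {
    echo 'Tell the user right away: ', $e->getMessage(), "\n";
}
```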

Of course, depending on your model, you might not even need the singleton object. We're going to need it for tracking purposes, but at its core this pattern is no different from a good object model for system events. To do away with the singleton, just have your Event interface expose the fire() method. Or, if you still need a central point of logic for event processing, have an abstract Event class implement fire(), and call parent::fire() in your subclasses. There's lots of room for flexibility here.
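A sketch of the singleton-free variant, where each event fires itself and the shared logic lives in an abstract base class. BaseEvent, UpgradeEvent and the static $fired tracking list are hypothetical names chosen for this example.

```php
<?php
// Abstract base class holding the logic every event goes through.
abstract class BaseEvent
{
    public static $fired = array(); // shared tracking list (hypothetical)

    public function fire()
    {
        self::$fired[] = get_class($this);
    }
}

class UpgradeEvent extends BaseEvent
{
    private $userId;

    public function __construct($userId)
    {
        $this->userId = $userId;
    }

    public function fire()
    {
        parent::fire(); // central logic first
        return "quota lifted for user {$this->userId}"; // event-specific work
    }
}

$evt = new UpgradeEvent(7);
echo $evt->fire(), "\n"; // quota lifted for user 7
```

Each subclass decides when to call parent::fire(), so common tracking stays in one place without a separate processor object.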

For more on events, and the observer pattern specifically, see the page on The Observer Pattern at ZDZ.
