Speaking of Freebase and Sunlight Labs…

A colleague of mine mentioned a post by UC Berkeley professor Raymond Yee in which he illustrates, for students of his Mixing and Remixing Information course, how he brainstorms and develops ideas for his mash-up projects.

It’s interesting to me that his post covers two topics that I’ve spent time talking about here: Freebase and Sunlight Labs.

Also worth mentioning is Sunlight Labs’ Apps for America mash-up competition.

Doodle.com application contest

Group scheduling tool Doodle.com has announced an app design competition for applications built on their API.

The winner gets a week in Zurich.

Tyler Hinman maintains winning crossword streak

Tyler Hinman was crowned champion for the fifth straight year at the annual American Crossword Puzzle Tournament held this past weekend in Brooklyn, NY. He narrowly eked out a victory over fellow National Puzzlers’ League members Trip Payne and Francis Heaney in a dramatic final match.

Favorite t-shirt series: MS Intern Game 2008

I’m exploring tweaking the rules that I’ve established for my t-shirt series. The funny thing is that I probably don’t even have to mention this, because I think I’m the only person that would even realize that I’m breaking any rules.

When I came up with this idea, I was going to wear a different t-shirt every day for a year, and document each shirt on a site dedicated to the project. I never got around to building out the domain, and I quickly discovered that shooting and editing t-shirt pictures every day was a whole lot of work for a secondary hobby.

So in order to break the logjam, I simplified. I figured I'd debut one new shirt each Monday and retire one old shirt each Thursday. But I never got around to doing the retired shirts, and besides, I kinda like the shirts that I have from before I started this project. So the new rules are that I'm going to document at least one shirt each Monday, new or old, and I reserve the right to do another later in the week.

So without further ado, here’s this week’s classic shirt:

20090302-1

As many of you know, I’m an extreme puzzle fan. I spent this weekend up in Redmond, WA competing in the 12th quasi-annual Microsoft Puzzle Hunt. I like to wear puzzle shirts to Puzzle Hunt, so I wore this shirt.

It's from last summer's instance of the Game that some of us puzzle fans write for each summer's crop of Microsoft interns. Game staff, including beta testers, each get a shirt that says “Game Control” on the front. The back documents the locations and clues that we included in the event (click the image above for a bigger version).

Microsoft Puzzle Hunt starts today!

The 12th quasi-annual Microsoft Puzzle Hunt just kicked off. The biggest change so far is that each team can choose one of two groups to participate in: COMPETITIVE or RECREATIONAL.

Teams who choose to be COMPETITIVE get the experience most like historical hunts:

COMPETITIVE teams will have an experience consistent with past Microsoft Puzzle Hunts. Puzzle Central, however, intends to offer no hints or help to individual COMPETITIVE teams.

On the other hand, teams can choose to have access to unlimited help:

RECREATIONAL teams will have more help available to them; in fact, they'll have more help than in any past Hunt. A hint database will be available to all RECREATIONAL teams. If a RECREATIONAL team gets stuck on a puzzle, they can request a hint from the automated site. If that hint doesn't unblock them, they can request another. And another.

Finally, teams have the option to change classes at will, but only in one direction:

Most importantly, any COMPETITIVE team can convert to being a RECREATIONAL team at any time during the event. RECREATIONAL teams, however, cannot become COMPETITIVE.

I’ll provide additional thoughts and analysis here throughout the weekend.

Sunlight Foundation seeks volunteers to open state legislative data

You've all heard me say before that every web site benefits from a data API. A corollary to that claim is that every site's users benefit from a data API, and there are few domains where that corollary applies more than in government.

The good news is that we lowly citizens have some strong coalitions of fellow lowly citizens advocating for our side.

The Sunlight Labs project over at the Sunlight Foundation just announced that they’re seeking volunteers to help scrape state legislative data to better enable automatic consumption of, and mashups on, government data.

If you have a couple of hours free and are proficient in some parsing library (Beautiful Soup? Perl?), check out the state legislation data project wiki, and consider helping them out with your state.
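
To give a rough sense of what that scraping work looks like, here's a minimal sketch in Python with Beautiful Soup. The URL and the table markup below are made up for illustration (every state legislature's site is structured differently), so treat it as a starting point rather than working code for any particular state:

# Minimal sketch of a state-bill scraper; the URL and markup are hypothetical,
# so adapt the selectors to whatever your state legislature's site actually serves.
import urllib.request
from bs4 import BeautifulSoup

LISTING_URL = "http://legislature.example-state.gov/bills/2009"  # hypothetical

html = urllib.request.urlopen(LISTING_URL).read()
soup = BeautifulSoup(html, "html.parser")

bills = []
for row in soup.select("table.bill-list tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) >= 3:
        bills.append({"id": cells[0], "title": cells[1], "sponsor": cells[2]})

for bill in bills:
    print(bill["id"], "-", bill["title"])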

If you love data and freedom, please spread the word!

Writing to Freebase with (Iron)Python

My mom was in town visiting this weekend, and when I went to demo Freebase for her, I asked her to name a famous person. She suggested Pope John Paul II, so we looked him up. Most of the information that you would expect Freebase to know about him, it did. But there was one glaring omission: it didn’t know that he was Catholic.

Since Freebase knows 263 people who have ever been a professional Pope, I decided that it would take a little too long to update each of them by hand. And so I set out to scribble a few lines of code using the Freebase API libs for Python.

(Note for IronPython users: For help with the minor tricky steps of getting the Freebase API libraries working with IronPython, see my write-up over on Stack Overflow.)

Since I intended to write data to Freebase, I needed an authenticated account:

import freebase.api as fb

mss = fb.HTTPMetawebSession("sandbox.freebase.com")
# use "www.freebase.com" once you've tested against the sandbox

mss.username = "user"
mss.password = "password"
mss.login()

Next I needed to get a list of all known Popes. For that, I used the following MQL query:

query = [{'profession': {'id': '/en/pope'},
          'type': '/people/person',
          'name': None,
          'id': None,
          'limit': 500}]

results = mss.mqlread(query)

This gave me an array of 263 results consisting of the English names and Freebase IDs of everyone who has ever had the profession of Pope. (I could have said ‘profession’: ‘Pope’ as the very first line of the query, but I chose to use the ID to make sure that there isn’t some other kind of Pope that I don’t know about.)
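
For comparison, the name-matching version mentioned above would look something like this (just a sketch; it matches the profession by its display name rather than its stable ID):

# Same read, but matching the profession by display name instead of by ID,
# as described above; I stuck with the ID version to avoid name collisions.
query = [{'profession': 'Pope',
          'type': '/people/person',
          'name': None,
          'id': None,
          'limit': 500}]
results = mss.mqlread(query)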

Next came the updates themselves. One safety feature built into MQL is that you can only update one link per MQL write, so I had to issue one update for each of the Popes returned from the read above:

for r in results:
    # connect each Pope to Catholicism via the /people/person/religion property
    writequery = {'id': r.id,
                  '/people/person/religion': {
                      'connect': 'insert',
                      'id': '/en/catholicism'
                  }}
    mss.mqlwrite(writequery)

Each write took slightly under a second, and since I was running this interactively, the mqlwrite() call displayed the update status for each call. For most of the Popes, Freebase had no record of their religion, and so it replied:

{'/people/person/religion': {'connect': 'inserted', 'id': '/en/catholicism'},
 'id': '/en/pope_sisinnius'}

But for those whose religion it already knew, it replied:

{'/people/person/religion': {'connect': 'present', 'id': '/en/catholicism'},
 'id': '/en/pope_sisinnius'}

App Engine good for some Real Work[tm] now

You no longer have to live in fear of the day that your application built on Google’s App Engine (GAE) actually becomes successful. Until now, if your application exceeded its daily quotas, your users would simply be turned away.

You had no option to migrate off of GAE, because its completely custom execution environment is almost impossible to duplicate outside Google, and you had no option to purchase additional resources, beyond what Google was willing to give you for free, to keep your application running.

Google just announced that this has changed. You can now buy additional resources such as CPU time, bandwidth, storage, and outgoing email if the popularity of your application causes you to exceed the quotas imposed on freeloaders.

I still have some big concerns about choosing Google App Engine as the hosting environment for a real business application:

  1. Despite attempts at replicating App Engine elsewhere, you still have no good migration path away from GAE if you ever discover a reason to do so.
  2. If you build a startup on App Engine, post-acquisition integration will be difficult unless you were acquired by Google.
  3. There are still built-in limits on your success—notice the “Billing Enabled Quota” numbers that are posted on App Engine’s Quotas page. You have to make a special request to Google to exceed these. Maybe they’ll say yes to every request they ever get. Maybe they won’t.

The bottom line is that the new billing model probably makes Google App Engine good enough now for nearly any casual web services project, but if you’re looking to build a commercial web service that you can scale to the stars, you’d still be better off considering a hosting environment such as Amazon Web Services where you have total control over the environment in which your application runs, and could migrate elsewhere (including your own data center, or that of your acquirer) should the need arise.

Cool application of the New York Times API

@StevenWalling points out a quick write-up on Online Journalism Blog about a really slick real estate mash-up, Suburbified, that has been built using the New York Times Article Search API.

To me, the coolest part about this isn't the mash-up itself—it's that a blog about journalism understands the significance of an old-school media company building a data API to give third parties easy access to their data assets. From Paul Bradshaw's Online Journalism Blog article:

So, for free*, the NYT now has a new way for people to find its articles, and a new source of traffic for its archives.

*That is, the cost of creating the database and API. Consider it a case of match-funding development costs, with the other half met by people keen to play with your ‘toys’

He’s absolutely right—the New York Times is getting volunteers to build value for them and their readers by simply providing a data API. They’re not having to spend time guessing what might work or testing prototypes—fans of their Data API are doing that for them.
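
If you're curious how little code it takes to start playing with their API, here's a rough sketch of the kind of query a mash-up like Suburbified might issue. The endpoint shown is the Article Search API's v2 URL, the search terms are made up, and you'd need your own API key from the NYT developer site:

# Rough sketch: search the NYT Article Search API for neighborhood coverage,
# the kind of lookup a real-estate mash-up might run.
# The endpoint reflects the v2 API; the key and query are placeholders.
import requests

API_KEY = "your-nyt-api-key"  # placeholder; get one at developer.nytimes.com
SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"

params = {"q": "Maplewood NJ real estate", "api-key": API_KEY}
response = requests.get(SEARCH_URL, params=params)
response.raise_for_status()

for doc in response.json()["response"]["docs"]:
    print(doc["headline"]["main"], "->", doc["web_url"])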

You should provide access to your data, too!

Favorite t-shirt series: Freebase.com

Those of you who read this blog (and my more technical blog API Guy) regularly know that I’m a huge fan of the open collaborative database Freebase.com.

This weekend, I recruited my mom (who is currently visiting from out of state) and my wife to ride along with me on the 6-hour drive (each way) from Portland, OR to Vancouver, BC for a 3-hour Freebase.com meet-up at the Irish Heather Gastropub.

I met a bunch of cool people, including James and Ben from BioVenturist, Dale McGladdery (one of the organizers of Northern Voice), fellow Freebase developer-enthusiast Jim Pick, designer and techno-philosopher Dorian Taylor, and others.

While I was there, the Freebase community director hooked me up with a t-shirt:

IMGP5140

PS—to make 12 hours of total driving for a 3-hour meeting make sense, think of it as a road trip with home as the destination and a 3-hour tourism break at some interesting point in the middle.