Human-comparison sort in Python

I recently had a need to sort a large list of items (I admit freely that this list comprised episodes of Battlestar Galactica) by a subjective criterion (my liking). This turned into a minor interface problem; how best to sort these without a) a Netflix queue interface or b) dragging Excel rows around? Also, how could I implement an efficient sorting algorithm, rather than waste time with the simplest bubble sort?

The solution I came up with was almost too simple to feel proud of. Python uses an adaptive mergesort by default, which was plenty good enough for my purposes. All I had to do, then, was have it ask a human to compare each pair of elements.


QUESTION_TEMPLATE = """
1. %s or
2. %s or
3. neither? [neither] """

def human_cmp(a, b):
answer = raw_input(
QUESTION_TEMPLATE % (a, b))
if answer=="1": return -1
elif answer=="2": return 1
else: return 0

EPISODES = open('bsg_episodes.txt').readlines()
EPISODES.sort(human_cmp)
for i, title in EPISODES:
print '%s. %s' % (i, title)

So it just uses the built-in algorithm and asks you for each comparison. Let me tell you, way better than creating a single-elimination tournament bracket in Excel.

Read More...

Upgrade bash to 4.0 in Mac OS X

bash 4.0 was released last month, and I'm only now getting around to messing with it. So far the things I use most are the '**' recursive globbing and the 'autocd' option. Here's how I upgraded my system and a few ways I use the new features.

  1. Download, build and install. Get the source tarball:
    $ wget ftp://ftp.cwru.edu/pub/bash/bash-4.0.tar.gz

    Build and install:
    $ tar xzf bash-4.0.tar.gz
    $ cd bash-4.0
    $ ./configure && make && sudo make install

    On my system (Leopard + Xcode) I had no trouble compiling. The binary is installed at /usr/local/bin/bash. You can also install via MacPorts or Fink, I imagine, but I try to do these things myself when possible, if only for the sake of transparency.

  2. Configure to use the new shell. First, register the new binary as a valid shell:
    $ sudo bash -c "echo /usr/local/bin/bash >> /private/etc/shells"

    Then change your user to use it as its shell with chsh (this is blatantly obvious, but modify the "Shell" line to point to /usr/local/bin/bash).
    Now open a new shell (restarting Terminal.app will do, or just Cmd-N for a new window) and make sure your changes took:
    [ian@iansmbp] ~/> echo $SHELL
    /usr/local/bin/bash
    [ian@iansmbp] ~/> echo $BASH_VERSION
    4.0.0(1)-release

  3. Enable the new features. There's a bunch of features in the 4.0 release that I haven't gone through yet, including coproc, improved programmable completion, a new &>> redirect operator (synonym of the old >>myfile 2>&1 pattern) and new case-insensitive expansion options. You can peruse a list of the new stuff at your leisure. For immediate gratification, however, just turn on recursive globbing and autocd:
    $ echo "shopt -s globstar autocd" >> ~/.profile
    $ source ~/.profile

    ** matches contents recursively. For example, where you might previously have recursively removed all your byte-compiled Python modules with:
    $ find . -name \*.pyc | xargs rm -f

    You can now simply:
    $ rm -f **/*.pyc

    If you've got a deep directory structure, and you want to get into a subdirectory whose name you know, you can replace:
    $ cd path/to/the/directory/i/want/named/mydir

    with:
    $ cd **/mydir

    Hey, and if those extra three characters at the beginning of that last command are too much for you, then you'll love the autocd option, which, when enabled, permits you to cd to a directory merely by typing its name:
    [ian@iansmbp] ~/src/zenoss/core/Products/> **/yui
    cd ZenWidgets/skins/zenui/yui
    [ian@iansmbp] ~/src/zenoss/core/Products/ZenWidgets/skins/zenui/yui/>


More on new features as warranted; these two, however, are those of zsh that I missed most.

Read More...

How Joseph Method Stole My Day

My friend Method shared this with me via Google Reader this morning, accompanied by the provocative message, "Here you go, Ian." The problem, in short, was to discover the first ten-digit prime in consecutive digits of e. And a day of planned work-catching-up dissolved before it began.

I had a solution in twenty minutes or so, using third-party libraries pycrypto (for isPrime) and mpmath (for e), but that made the whole enterprise trivial and didn't quite seem cricket. Also, it was slightly inefficient, because, being limited by the code of others, I had to calculate e to n digits first, and then check that for primes; I figured if I could calculate digits of e lazily I'd get an extremely minor but nonetheless pride-inducing speed increase.

To Wikipedia! where I learned scads about primality tests (luckily I didn't read the page on the constant e until later, or I would have learned that this was a common puzzle that pretty much everybody tries to solve. Also I would have learned the answer). I remembered how to calculate e using the Taylor series 1 + 1/1! + 1/2! + 1/3!..., so that wasn't a problem—until I ran into precision woes.

That led me down a dark path whereon I attempted to program arbitrary-precision arithmetic from scratch, and learned all about mantissas and bitwise operators over the course of five hours. I got pretty far with that before I realized that I was spending my entire day trying to comprehend what I imagine amounts to at least a month of college-level CS. I went to school for other things, like French poetry.

Also at this point I remembered the existence of Python's own decimal floating-point arithmetic package, decimal, which had apparently dropped completely out of my brain exactly when I needed it. Thankfully, it resurfaced before I had to give up on this psychotic enterprise altogether.

After that it was easy. I post the code below not because I believe it is in any way notable—I'm sure it's old hat to those who went to school for this—but because I just want someone to know that I actually did something today. Thanks, Method.


from random import randint
from operator import mul
from collections import deque
from decimal import Decimal, getcontext

# Allow enough precision to get an answer
getcontext().prec = 1000

def factorial(x):
if x==1: return x
return reduce(mul, xrange(2, x+1))

def taylor_series():
"""
e == 1/1! + 1/2! + 1/3! ...
"""
s = Decimal(1)
_e = Decimal(1)
while True:
_e += (Decimal(1)/factorial(s))
yield _e
s += 1

def digits_of_e():
"""
Yield digits of e lazily by noticing when
they stop changing as the Taylor series
converges.
"""
idx = 2 # skip '2.'
last = None
for iteration in taylor_series():
i = str(iteration)
try:
cur = i[idx]
except IndexError:
continue
if cur==last:
idx += 1
yield cur
last = i[idx]

def ten_digit_chunks():
"""
Collect digits of e as they are discovered
and yield them in ten-digit chunks.
"""
l = deque()
for digit in digits_of_e():
l.append(digit)
# Keep the stack at 10 by popping
# from the left
while len(l)>10:
l.popleft()
if len(l)==10:
yield ''.join(l)

def is_prime(n, k=20):
"""
Simple implementation of the Miller-Rabin
primality test. Thanks, Wikipedia!

n is the integer to be tested; k is the
precision (number of random possible
witnesses to test against)
"""
# Find s and d such that (2**s)*d==n
d, s = n-1, 0
while not d & 1: # d & 1 == d % 2, only faster
d >>= 1 # d >> 1 == d / 2, only faster
s += 1
# Test for k random numbers
for i in xrange(k):
_c = False
a = randint(2, n-2)
x = pow(a, d, n) # a**d % n
if x==1 or x==n-1:
continue
for r in xrange(1, s):
x = pow(x, 2, n) # x**2 % n
if x==1:
return False
elif x==n-1:
_c = True
break
if _c: continue
return False
return True

def main():
for chunk in ten_digit_chunks():
print "Trying %s..." % chunk
if is_prime(long(chunk)):
print "="*20
print
print "Found! First ten-digit"
print "prime in consecutive digits"
print "of e is %s!" % chunk
print
print "="*20
break

if __name__=="__main__":
main()


UPDATE: In order to get through this ridiculous set of problems to the unsatisfying and insulting end, you can use this implementation of the Brent-Salamin algorithm to find primes in digits of pi, replacing taylor_series in the digits_of_e function in the code above:

def brent_salamin():
"""
A series whose sum converges on pi
incredibly quickly.
"""
a = Decimal("1.")
b = Decimal("1.")/(Decimal("2.").sqrt())
t = Decimal("1.")/Decimal("4.")
p = Decimal("1.")
while True:
a1 = Decimal(a + b)/Decimal("2.")
b1 = Decimal(a*b).sqrt()
t1 = t - p*((a - a1)**2)
p1 = 2*p
yield ((a1+b1)**2)/(t1*Decimal("4."))
a, b, t, p = a1, b1, t1, p1

Read More...

How I Use OmniFocus

OmniFocus is a first-rate task manager. It's so flexible and powerful, however, that coming up with a workflow and a set of contexts that efficiently aid the management of one's tasks can be difficult. I've fiddled with my setup for several months, completely overhauling my contexts three times--should they be time-based (Today/Tomorrow/Soon/Next Week), resource-based (Laptop/Grocery store/Printer), state-based (Home/Work/Errands/Spare time)? I've tried out several perspectives, seeing which fit into my workflow and which are disruptive or merely unhelpful. And I've experimented with the various levels of containment--folders, projects, action groups--in order to determine where they mesh with the natural hierarchy of my own tasks.

Most helpfully, I've perused the OmniFocus forums, hoping to glean insight from the practices of other users. While much of that time was spent navigating around outbursts from orthodox GTDists, I managed to find and appropriate several ideas. In an effort to ease others' OmniFocus learning curve, here's how I have things set up.

Contexts

My first act upon installing OF was to set up a large, complete hierarchy of contexts describing many aspects of my life. It didn't take me long to realize this in no way helped me. My difficulty lay in not having a clear idea of the identity of contexts in general. Nearly everything for which I need OmniFocus involves my job, which is basically a single large context, as I'm a programmer on one project on which I work from home. Having a single context seemed like a misuse to me, but artificial distinctions between contexts was equally ridiculous. For example, there's no reason to have a "Laptop:Photoshop" context, since there's no situation in which I'll have my laptop but /not/ Photoshop.

Eventually I decided to drop all my contexts and allowed them to be created when needed, to try to figure out the contexts in which I naturally think of tasks. This led to a pretty good set -- only a few, but each had real meaning. They ended up describing a mix of states and resources:

  • Errands
    • Grocery Store

    • Home Depot

    • ...

  • Exercise

  • Computer

    • Blog

    • Zenoss

    • Internet

    • Printer

  • People

    • Gillian

    • Jason

    • ...

  • Home

  • Waiting

    • Code Review

    • Chad

    • Gillian

    • ...

  • Spare Time


These are fairly run-of-the-mill, except the "Waiting" context. I set the status of that context to "On Hold", which means that when I apply it to an action, the project is similarly on hold. It took me some time to realize that I could change the context of an action as it moved through various states; there's no need to create actions, for example, like "Send email about X", "Read response about X," and "Apply response to X", merely to use the contexts "Email", "Waiting" and "Computer"--instead, just have "X" as a task, and change its context to match the state it's in. But don't go overboard; use as few contexts as necessary. I generally just move things in and out of "Waiting" as needed, which lets me know if I can work on it at the moment or not.

Due Dates

Due dates are apparently Not Used in GTD unless a task actually has a hard date by which it must be completed. I shied away from them for that reason, attempting to use contexts, flags, then folders to indicate relative priority of tasks. Eventually I gave that up, since it seemed silly to ignore a perfectly good mechanism for recording when I planned to work on a task merely because of its connotation. Now I use due dates to indicate the time I intend to have finished a project or action; this allows me to set up perspectives sorted by due date. The thing at the top of the list is what I work on next. If priorities shift, I alter due dates accordingly.


Perspectives

Note: I've left off Duration and Flag filters from these perspectives, because I don't currently have a use for them; all perspectives use the default.

Work
View Mode: Planning
Focus: Work
Filter: Remaining
Grouping: Due
Sorting: Due
Action Filter: Available


My projects for work, grouped by due date, showing available actions only--thus, if a project is stalled by the "Waiting" context, its actions won't be visible (though the project will). Answers the question: "What will I be working on today/tomorrow/this week?"

Now
View Mode: Context
Focus: Work
Filter: Active
Grouping: Ungrouped
Sorting: Due
Action Filter: Next Action


A flat list of available next actions, sorted by due date. Answers the question: "What should I do next?"

Completed
View Mode: Context
Focus: Work
Filter: All Contexts
Grouping: Completed
Sorting: Project
Action Filter: Completed


Actions I've completed, grouped by completion date. Answers the question: "What did I do yesterday/this week/last week?"

Waiting
View Mode: Context
Focus: Work
Filter: On Hold
Grouping: Context
Sorting: Due
Action Filter: Remaining


Projects on which I'm stuck because I'm waiting for something from someone. Answers the question, "What's blocking me from finishing my tasks?"

Errands
View Mode: Context
Focus: No Focus
Filter: Remaining
Grouping: Context
Sorting: Project
Action Filter: Available


My errands, grouped by context; this gives me a nice list of what I need from each store.

During our morning standup meeting, I reference "Completed" for what I did yesterday, "Work" for what I plan to do today, and "Waiting" for what's blocking me. Then when I start work, I live in the "Now" perspective. Weekends I'll reference "Errands" (and a "Spare Time" perspective I have yet to perfect).

Template Projects

This is one I got from the forums. If you have a project with several actions that occurs every so often, but not at regular times, make yourself a template project that you can duplicate. For example, I have a "Travel to Austin" template that reminds me to pack a toothbrush, make hotel reservations, fill out an expense report, etc.; when it's time to make a new trip, I duplicate the project and set the due dates accordingly.

Another great use of this technique I found on the forums was for grocery shopping. Create template projects for recipes with actions for all the ingredients. When you plan your meals for the week, make copies of the appropriate recipes, and your "Errands : Grocery Store" context will have a grocery list all ready. Sync to your iPhone and go.

Set the start date of the template project far in the future and put it in a folder at the bottom of your projects tree, so it doesn't get in the way of your active projects. You can apply the appropriate contexts to the actions, because they'll all be pending.




Hopefully, these examples will help newcomers over the hump presented by the very, very large blank sheet of paper OmniFocus provides. As my OF configuration has evolved, I've found myself using it more frequently. When you manage to the find the groove, you'll never leave.

Read More...