How-to: Simplify Python shell scripts with setuptools

A lot of people don't know how easy it is to use setuptools to create shell scripts. I didn't know how to use it properly until a couple of weeks ago; since discovering how, I've taken to repackaging all my Python scripts into an egg.

Here's how I was doing it before. I had modified PYTHONPATH to include ~/lib/python, and PATH to include ~/bin; this is where I would put scripts and modules I'd written. Since my home directory is checked into my Subversion repository, I could access these new scripts on remote boxes by making a user and checking them out.

This works fine, except for a few things. First, the scripts aren't usable by other users without their checking out my home directory. Second, modifying paths can be tricky. Third, it's a hassle setting everything up. Fourth, managing dependencies boils down to trusting that my home directory will be set up in the right way. Finally, using environment variables limits the use of the scripts to my own environment.

Enter setuptools. If I package my scripts up into eggs, I can install them as root and make them available to everyone, other users can install the egg in whatever way they see fit, and setuptools can handle all the dependencies and PYTHONPATH hackery for me. Once I learned that setuptools will actually create shell scripts for me pointing at functions I specify and install them in the correct place on the system, I was totally sold. This pattern was so useful for me that I even wrote and released a package making it easier.

Here's how I do it. Normally, I package related shell scripts together. In this example, we'll make an egg that installs a single shell script, which does nothing but print the titles of the HTML documents at the URLs passed in. I assure you that I recognize the total lack of utility this script provides.

  1. Create the egg structure. I used to do this by hand until I discovered a utility that does it for me. Install PasteScript:

    $ sudo easy_install PasteScript

    PasteScript provides the command paster, which, among other things, will create an empty egg for you to modify. Make that egg now:

    $ paster create -t basic_package TitleGetter

    It'll ask you a bunch of questions that you can answer or not at your discretion. You can always change them later, and unless you're going to be releasing this to the world they're unnecessary. At the end, you'll have a directory with an egg structure inside!

  2. Write the code. Naturally, this is the part you'll care about most once you've got the process down. For now, it's irrelevant. Here's some code that gets a web page and pulls out a title:

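    # titles.py (Python 2: uses urllib2 and the print statement)
    import re, urllib2

    # Non-greedy (.*?) stops at the first </title> in the document
    pattern = re.compile(r'<title[^<>]*>(.*?)</title>', re.I|re.S)

    def print_titles(*urls):
        """Print each URL followed by the title of the page found there."""
        for url in urls:
            html = urllib2.urlopen(url).read()
            match = pattern.search(html)
            if match:
                title = match.group(1)
            else:
                title = "** Could not find a <title> **"
            print url, '\t', title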

    Inside your egg is a directory called "titlegetter", and inside that is where you put your code. Make a file in there called titles.py and paste the above code into it.

  3. Test the code. Normally, I write unit tests before writing the code, placing them in a tests directory inside the package. That's up to you. Some people don't write tests. setuptools does allow you to run tests on an egg easily. I use nosetests, which you can install with:

    $ sudo easy_install nose

    You can then run them by going to the egg's top directory, where setup.py is, and running:

    $ python setup.py nosetests

    That'll build the egg in place and run the tests. Right now, though, we'll just test it manually. Install a development version of the egg:

    $ sudo python setup.py develop

    Now test your function:

    $ python
    >>> from titlegetter.titles import print_titles
    >>> print_titles('http://www.google.com')
    http://www.google.com Google


  4. Have setup.py create a shell script. This is the key bit of utility setuptools provides. Open up setup.py and find the entry_points definition. Change it to look like this:

    ...
    entry_points={
        "console_scripts": [
            'gettitle = titlegetter.titles:print_titles'
        ]},
    ...

    Now re-install the code with sudo python setup.py develop. setuptools will create a script called gettitle and install it somewhere on your PATH; that script will call titlegetter.titles.print_titles. Try it out:

    $ gettitle http://www.google.com http://www.yahoo.com
    $

    Notice nothing is printed out. Why not? Well, the script that setuptools creates doesn't do any argument parsing; it just calls the function with no arguments. Since print_titles receives no URLs, it has nothing to loop over and prints nothing. This makes sense, actually; if you're doing something fancy with args, you don't want to have to override the setuptools mechanism. On the other hand, for really simple scripts, you have to plop in a bunch of sys.argv parsing overhead.

    Luckily, I wrote a decorator that handles this for you. Add cliutils as a dependency by including the string in the install_requires definition in setup.py:

    ...
    install_requires = ['cliutils'],
    ...

    Then go down into titles.py and add the decorator:

    ...
    from cliutils import cliargs

    @cliargs
    def print_titles(*urls):
    ...

    That decorator will parse the command line arguments into args and keyword args and pass the result into the decorated function. Now that that's done, re-install your development egg with sudo python setup.py develop. You'll see that cliutils is downloaded and installed. Now try your script again:

    $ gettitle http://www.google.com http://www.yahoo.com
    http://www.google.com Google
    http://www.yahoo.com Yahoo!
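
To make the decorator step a little less magical, here's a rough sketch of how a decorator along those lines could work. To be clear, this is not the cliutils source: argv_args and parse_argv are names made up for illustration, the "--key=value" convention is an assumption, and show_urls is a placeholder for print_titles that returns its arguments instead of fetching anything.

```python
import functools
import sys


def parse_argv(argv):
    """Split command-line words into positional and keyword arguments.

    Assumed convention (hypothetical): "--key=value" becomes a keyword
    argument; every other word is positional.
    """
    args, kwargs = [], {}
    for word in argv:
        if word.startswith("--") and "=" in word:
            key, value = word[2:].split("=", 1)
            kwargs[key.replace("-", "_")] = value
        else:
            args.append(word)
    return args, kwargs


def argv_args(func):
    """Hypothetical stand-in for a cliargs-style decorator."""
    @functools.wraps(func)
    def wrapper():
        # The generated script calls the entry point with no arguments,
        # so the wrapper reads sys.argv itself, skipping the script name.
        args, kwargs = parse_argv(sys.argv[1:])
        return func(*args, **kwargs)
    return wrapper


@argv_args
def show_urls(*urls):
    # Placeholder for print_titles: no network access, just echoes back.
    return list(urls)
```

The point of the design is that the wrapper takes no arguments of its own, so it is still a valid target for a console_scripts entry point, while the function underneath keeps an ordinary Python signature.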



And just like that, you've got a portable command-line utility. Check it in somewhere. If you want to build an actual egg, you can just do the usual setuptools thing and run:


    $ python setup.py bdist_egg

That'll build an egg and dump it in the dist directory.
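
Whichever way the egg gets installed, the gettitle script does roughly the same thing: it resolves the entry point declared in setup.py, imports the function it names, and calls it with no arguments. Here's a simplified sketch of that lookup. It isn't setuptools' real code, which also deals with version requirements and more elaborate specs; parse_entry_point and load_callable are made-up names, and the "cwd = os:getcwd" spec is only a stand-in so the sketch runs without TitleGetter installed.

```python
import importlib


def parse_entry_point(spec):
    """Split 'name = package.module:function' into its three parts."""
    name, target = [part.strip() for part in spec.split("=", 1)]
    module_name, func_name = target.split(":", 1)
    return name, module_name, func_name


def load_callable(spec):
    """Import the module named in the spec and return the function."""
    _, module_name, func_name = parse_entry_point(spec)
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


# Stand-in spec pointing at the standard library (an assumption made so
# the example is self-contained).
func = load_callable("cwd = os:getcwd")

# Called with no arguments, the way the generated script calls its target.
result = func()
```

Read this way, 'gettitle = titlegetter.titles:print_titles' is just a script name on the left and an importable module plus a function name on the right.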

So that's that. I have two or three eggs that I keep modifying and updating with new functions. Ones that have the same dependencies or are generally related I put all in one egg, but organization is up to you. That cliutils package has a few other good things in it; read its documentation for more info.
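
Since these eggs keep getting modified, the tests mentioned in step 3 earn their keep. Here's a minimal sketch of what one might look like, under a couple of assumptions: find_title is a hypothetical helper (the original print_titles fetches and prints in one go, so the title matching would have to be split out to test it without a network connection), and the file would live somewhere like tests/test_titles.py, a filename chosen for illustration.

```python
import re
import unittest

# A title pattern like the one in titles.py.
pattern = re.compile(r'<title[^<>]*>(.*?)</title>', re.I | re.S)


def find_title(html):
    """Hypothetical helper: return the title text, or None if absent."""
    match = pattern.search(html)
    return match.group(1) if match else None


class FindTitleTests(unittest.TestCase):
    def test_finds_title_regardless_of_case(self):
        html = "<html><head><TITLE>Google</TITLE></head></html>"
        self.assertEqual(find_title(html), "Google")

    def test_returns_none_without_title(self):
        self.assertIsNone(find_title("<html><body>nothing</body></html>"))
```

Because the test class subclasses unittest.TestCase, nose should pick it up when you run python setup.py nosetests from the egg's top directory.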
