RPM-style version comparison in Python

We've been going over some changes to our versioning scheme at Zenoss. Anything we choose has to provide a clear upgrade path on RHEL-based systems, which meant we needed to find out how RPM compares two versions to decide which is newer. Google yielded this description, which describes itself as deprecated, but I can't find anything newer. That led me to search for the comparison code itself, which I eventually tracked down. In the end, I decided the most direct path to understanding was to port it to a language in which I am moderately fluent, namely Python. You'll find the code at the bottom of this post.

As it turns out, RPM version comparison is pretty stupid. Letters are counted as older than numbers (1.2.3 > 1.2.a), which is good. Unfortunately, a run of letters also counts as a new segment, and when every shared segment is equal, the version with more segments is newer. This means that 1.2.3a > 1.2.3 (because they split into ('1', '2', '3', 'a') and ('1', '2', '3')). Thus there isn't a very good way to put descriptions of prereleases into an artifact name, at least not if you want to be able to upgrade to the final.
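To make the splitting concrete, here is a minimal sketch of it. The `segments` helper and its regular expression are my own shorthand for illustration, not anything RPM provides: runs of digits and runs of letters each become a segment, and everything else is treated as a separator.

```python
import re

# Hypothetical helper for illustration: a segment is a run of digits
# or a run of letters; any other character only separates segments.
_SEGMENT = re.compile(r'[0-9]+|[a-zA-Z]+')

def segments(version):
    return tuple(_SEGMENT.findall(version))

print(segments('1.2.3'))   # ('1', '2', '3')
print(segments('1.2.3a'))  # ('1', '2', '3', 'a')
print(segments('1.2.a'))   # ('1', '2', 'a')
```

The extra 'a' segment in the second line is what makes 1.2.3a look newer than 1.2.3.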

After reading a bunch about this (including Wikipedia's thoughts on the subject, which were pretty enlightening), I recommended we go to an odd/even scheme: odd-numbered versions are unstable, even-numbered versions are stable. This would allow us to go from 1.2.3 (the development branch) to 1.2.3a (alpha release) to 1.2.3b to 1.2.3rc1 to 1.2.4 (final release), and RPM would know how to upgrade the whole way: each prerelease only adds segments to 1.2.3, and 1.2.4 beats all of them on the third segment. Also, it's apparent from the name of the artifact how much trust one can place in it.
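As a sanity check on that upgrade chain, here is a compact sketch of the comparison rules as I've described them in this post. The `compare` function is a hypothetical stand-in, not RPM's actual code, and it deliberately ignores the release suffix after a hyphen.

```python
import re
from functools import cmp_to_key

_SEGMENT = re.compile(r'[0-9]+|[a-zA-Z]+')

def compare(a, b):
    """Hypothetical stand-in for the rules described in this post."""
    s1, s2 = _SEGMENT.findall(a), _SEGMENT.findall(b)
    for x, y in zip(s1, s2):
        if x.isdigit() != y.isdigit():
            # A numeric segment is newer than an alphabetic one
            return 1 if x.isdigit() else -1
        if x.isdigit():
            x, y = int(x), int(y)
        if x != y:
            return 1 if x > y else -1
    # Every shared segment is equal: the longer version is newer
    return (len(s1) > len(s2)) - (len(s1) < len(s2))

releases = ['1.2.4', '1.2.3rc1', '1.2.3', '1.2.3b', '1.2.3a']
print(sorted(releases, key=cmp_to_key(compare)))
# ['1.2.3', '1.2.3a', '1.2.3b', '1.2.3rc1', '1.2.4']
```

Sorting a shuffled list gives back the intended order, so under these rules each step in the chain is an upgrade.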

We'll see if it takes hold. If nothing else, we'll throw the build number in the artifact name, so 1.2.3-1234 will upgrade to 1.2.3-1250 just fine, except you'll have to know ahead of time that 1234 was the beta and 1250 the final.



import re

# Despite the name, this matches any character that is NOT a letter or
# digit; such characters act as separators between version components.
isalnum = re.compile('[^a-zA-Z0-9]')

def rpmvercmp(a, b):
    """Return 1 if a is newer than b, -1 if it is older, 0 if equal."""
    # If they're the same, we're done
    if a == b:
        return 0

    def _gen_segments(val):
        """
        Generator that splits a string into segments.
        e.g., '2xFg33.+f.5' => ('2', 'xFg', '33', 'f', '5')
        """
        for dot in isalnum.split(val):
            res = ''
            for s in dot:
                if not res:
                    res = s
                elif (res.isdigit() and s.isdigit()) or \
                     (res.isalpha() and s.isalpha()):
                    res += s
                else:
                    # Switched between digits and letters: new segment
                    yield res
                    res = s
            if res:
                yield res

    ver1, ver2 = a, b

    # Get rid of the release number (everything after the last hyphen)
    ver1_rel, ver2_rel = None, None
    if '-' in ver1:
        ver1, ver1_rel = ver1.rsplit('-', 1)
    if '-' in ver2:
        ver2, ver2_rel = ver2.rsplit('-', 1)

    l1, l2 = _gen_segments(ver1), _gen_segments(ver2)
    while True:
        # Get the next segment; None means that side has run out
        s1 = next(l1, None)
        s2 = next(l2, None)

        if s1 is None and s2 is None:
            break
        # Whichever version still has segments left is newer
        if s2 is None:
            return 1
        if s1 is None:
            return -1

        # Check for type mismatch: numbers are newer than letters
        if s1.isdigit() and not s2.isdigit():
            return 1
        if s2.isdigit() and not s1.isdigit():
            return -1

        # Cast as ints if possible
        if s1.isdigit():
            s1 = int(s1)
        if s2.isdigit():
            s2 = int(s2)

        # Same result as cmp(s1, s2), which no longer exists in Python 3
        rc = (s1 > s2) - (s1 < s2)
        if rc:
            return rc

    # If we've gotten this far, check release numbers
    if ver1_rel is not None and ver2_rel is not None:
        return rpmvercmp(ver1_rel, ver2_rel)

    return 0
