RPM-style version comparison in Python
written by Ian McCracken
at Wednesday, December 17, 2008
We've been going over some changes to our versioning scheme at Zenoss. Anything we choose has to provide a clear upgrade path on RHEL-based systems, which means we needed to check out how RPM compares two versions to see which is newer. A Google search turned up this description, which is marked as deprecated, but I can't find anything newer. That led me to search for the comparison code itself, which I eventually tracked down. In the end, I decided the most direct path to understanding involved porting it to a language in which I am moderately fluent, namely Python. You'll find the code at the bottom of this post.
As it turns out, RPM version comparison is pretty stupid. Letters are counted as older than numbers (1.2.3 > 1.2.a), which is good. Unfortunately, a run of letters also counts as a new segment of its own, and when every shared segment ties, the version with more segments is newer. This means that 1.2.3a > 1.2.3 (because it splits into ('1', '2', '3', 'a') and ('1', '2', '3')). Thus there isn't a very good way to put descriptions of prereleases into an artifact name, at least not if you want to be able to upgrade to the final.
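The two rules above are easy to try out in a few lines. This is a simplified sketch of the behavior described here, not RPM's actual code; the names `segments` and `compare` are made up for illustration, and release numbers are ignored.

```python
import re

def segments(version):
    """Split into runs of digits or runs of letters; anything else is a separator."""
    return re.findall(r'[0-9]+|[a-zA-Z]+', version)

def compare(a, b):
    """Return 1 if a is newer, -1 if b is newer, 0 if they tie."""
    sa, sb = segments(a), segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() != y.isdigit():
            # Mixed types: the numeric segment wins
            return 1 if x.isdigit() else -1
        if x.isdigit():
            # Compare numbers as numbers, so 10 beats 9
            x, y = int(x), int(y)
        if x != y:
            return 1 if x > y else -1
    # Every shared segment ties: whichever has segments left over is newer
    return (len(sa) > len(sb)) - (len(sa) < len(sb))

print(segments('1.2.3a'))          # ['1', '2', '3', 'a']
print(compare('1.2.3', '1.2.a'))   # 1
print(compare('1.2.3a', '1.2.3'))  # 1
```

The last line is the awkward case: a prerelease named 1.2.3a would be treated as newer than the 1.2.3 it is supposed to lead up to.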
After reading a bunch about this (including Wikipedia's thoughts on the subject, which were pretty enlightening), I recommended we go to an odd/even scheme: odd-numbered versions are unstable, even are stable. This would allow us to go from 1.2.3 (the development branch) to 1.2.3a (alpha release) to 1.2.3b to 1.2.3rc1 to 1.2.4 (final release), and RPM would know how to upgrade the whole way. Also, it's apparent from the name of the artifact how much trust one can place in it.
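To check that the whole chain really sorts in upgrade order, one option is to turn the comparison rules into a sort key. This is a sketch under the same simplified rules (numbers beat letters, extra segments win a tie); `sort_key` is a hypothetical helper and not part of RPM or of the port at the bottom of this post.

```python
import re

def sort_key(version):
    """Map each segment to a tuple so that plain list comparison follows the rules."""
    key = []
    for seg in re.findall(r'[0-9]+|[a-zA-Z]+', version):
        if seg.isdigit():
            key.append((1, int(seg), ''))   # numeric: ranks above any letters
        else:
            key.append((0, 0, seg))         # alphabetic: compared as a string
    return key

chain = ['1.2.3', '1.2.3a', '1.2.3b', '1.2.3rc1', '1.2.4']
shuffled = ['1.2.4', '1.2.3b', '1.2.3', '1.2.3rc1', '1.2.3a']

print(sorted(shuffled, key=sort_key) == chain)  # True
```

Python compares lists element by element and treats a longer list as greater when the shared prefix ties, which happens to match the "more segments is newer" rule.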
We'll see if it takes hold. If nothing else, we'll throw the build number in the artifact name, so 1.2.3-1234 will upgrade to 1.2.3-1250 just fine, except you'll have to know ahead of time that 1234 was the beta and 1250 the final.
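The build-number fallback can be sketched the same way: split off everything after the last hyphen as the release, then compare version first and release second. The helper names here are hypothetical. One assumption to flag loudly: this sketch treats a missing release as older than any release, which is a choice made for the example and not something the port at the bottom of this post does.

```python
import re

def segment_key(text):
    """Hypothetical helper: numeric segments outrank alphabetic ones."""
    return [(1, int(s), '') if s.isdigit() else (0, 0, s)
            for s in re.findall(r'[0-9]+|[a-zA-Z]+', text)]

def version_release_key(name):
    """Split 'version-release' on the last hyphen and key both halves."""
    version, sep, release = name.rpartition('-')
    if not sep:
        # No hyphen at all: the whole string is the version
        version, release = name, ''
    # Assumption: an empty release sorts below any real release
    return (segment_key(version), segment_key(release))

beta, final = '1.2.3-1234', '1.2.3-1250'
print(version_release_key(beta) < version_release_key(final))  # True
```

Nothing in either name says which build was the beta, which is exactly the drawback mentioned above.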
import re

# Despite the name, this matches any single NON-alphanumeric character;
# those characters act as segment separators.
isalnum = re.compile('[^a-zA-Z0-9]')

def rpmvercmp(a, b):
    """Return 1 if a is newer than b, -1 if it is older, 0 if they are equal."""
    # If they're the same, we're done
    if a == b: return 0

    def _gen_segments(val):
        """
        Generator that splits a string into segments.
        e.g., '2xFg33.+f.5' => ('2', 'xFg', '33', 'f', '5')
        """
        for dot in isalnum.split(val):
            res = ''
            for s in dot:
                if not res:
                    res += s
                elif (res.isdigit() and s.isdigit()) or \
                     (res.isalpha() and s.isalpha()):
                    res += s
                else:
                    # Switched between digits and letters: start a new segment
                    yield res
                    res = s
            if res:
                yield res

    ver1, ver2 = a, b
    # Get rid of the release number (split on the last hyphen only, so a
    # version containing several hyphens doesn't break the unpacking)
    ver1_rel, ver2_rel = None, None
    if '-' in ver1: ver1, ver1_rel = ver1.rsplit('-', 1)
    if '-' in ver2: ver2, ver2_rel = ver2.rsplit('-', 1)

    l1, l2 = map(_gen_segments, (ver1, ver2))
    while True:
        # Get the next segment; if none exists, done
        s1 = next(l1, None)
        s2 = next(l2, None)
        if s1 is None and s2 is None: break
        # Whichever side still has segments left is newer
        if s2 is None: return 1
        if s1 is None: return -1
        # Check for type mismatch (numbers beat letters)
        if s1.isdigit() and not s2.isdigit(): return 1
        if s2.isdigit() and not s1.isdigit(): return -1
        # Cast as ints if possible
        if s1.isdigit(): s1 = int(s1)
        if s2.isdigit(): s2 = int(s2)
        # Same result as cmp(s1, s2), which no longer exists in Python 3
        rc = (s1 > s2) - (s1 < s2)
        if rc: return rc

    # If we've gotten this far, check release numbers
    if ver1_rel is not None and ver2_rel is not None:
        return rpmvercmp(ver1_rel, ver2_rel)
    return 0
December 18, 2008 at 12:44 PM
Do you expect RedHat or Fedora to package and widely distribute your alphas and betas?
December 18, 2008 at 12:49 PM
No, not at all. But we'd still like to keep upgrading from an RPM as simple as possible.
January 28, 2011 at 10:19 PM
Above snippet Copyright 2008 Ian McCracken, licensed under GPLv3.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see .
January 13, 2012 at 3:17 PM
<3 I really needed this for a project. Thanks a lot for doing this work.