Python is a great language, and I use it frequently. However, I also use Ruby. One thing that recently became a bit of a problem was hashing. Python has a magic function called hash which works on strings and produces a signed integer hash. Because this was used in a script for storing files in the file-system, I've got a problem when I wanted to write a script in ruby, as it was impossible for me to calculate the Python hash in Ruby.
The basic problem was as follows. Here is the working Python code.
path = "..."
cache_path = hash(path) % 255
Here is the ruby code:
path = "..."
cache_path = ????(path) % 255
Because hash
function is specific to Python, this was a bit of a problem. Well, I found a great page explaining the python hash
function.
- Python Hash Algorithms - Python Implementation of
hash
- python hash() function - C implementation of
hash
The only complicated part in porting this code to Ruby was the signed arithmetic required. Basically, the above code relies on the behaviour of 2s-compliment overflow for 32bit signed integers. Because Ruby doesn't (as far as I know) have facilities for this, I've added in a function unsigned_to_signed
which can do the conversion for arbitrary sized integers.
#!/usr/bin/env ruby
def unsigned_to_signed(h, mask)
# h must be within the unsigned range we are dealing with, i.e. [0, mask] inclusive
h = h % (mask + 1)
# calculate the maximum positive signed value
max = (mask / 2) + 1
# if h is bigger than the maximum unsigned value, we need to wrap it around.
h > max ? (h % max) - max : h
end
def signed_32bit_multiply(a, b)
unsigned_to_signed((a * b), 0xFFFFFFFF)
end
class String
def python_hash
return 0 if size == 0
sum = self[0] << 7
(0...size).each do |i|
sum = signed_32bit_multiply(1000003, sum) ^ self[i]
end
sum = sum ^ size
return sum == -1 ? -2 : sum
end
end
This is a simple program you can use pick out some random words and compare the hash between Python and Ruby.
def check_hash(s)
puts " Hashing #{s.dump} ".center(64, "-")
py_val = `python -c 'print hash(#{s.dump})'`.strip.to_i
puts "Python hash value is: ".rjust(25) + " #{py_val}"
rb_val = s.python_hash
puts "Our hash value is: ".rjust(25) + " #{rb_val}"
if (py_val != rb_val)
puts " ERROR HASH VALUE MISMATCH ".center(64, "*")
end
end
dict = File.open("/usr/share/dict/words", "r").readlines
(0...10).each do |i|
word = dict[rand * dict.size-1].strip
check_hash(word)
end
This code will be helpful for what I am trying to achieve, however if you are storing hash functions, make sure you are using standard functions such as MD5, SHA1, etc. This way, logic is easy to port across multiple platforms and code bases.