Tuesday, March 4, 2008

Approximate Ruby Programming

What if your programming language interpreter didn't mind small spelling mistakes. Imagine that you type a method name a wee bit wrong, but the the interpreter seems to read your mind and call the correct method instead of throwing an NoMethodFound exception at you.

I've been playing a bit with this idea and found a simple but naive way this in Ruby. If you try to call a method that doesn't exist, then the method_missing method gets called. If this method_missing in turn figured out what method you really meant to call and called that instead, you where in the clear? Of course, programs can't read your mind, but a simple way of approximate this is to find the existing method with the shortest edit distance to the misspelled method and call that instead. This is pretty naive, but it will work in many cases and serve as a simple baseline. And it has a very straight-forward implementation in Ruby. Say hello to ... drumroll, please ... approximatize!.

class Example
def test(str)
puts "test method called: #{str}\n"


ex = Example.new
ex.test "a normal method call"
ex.text "Did you mean test?"
ex.tes "Did you mean test? (then you forgot a letter)"
ex.ttest "Did you mean test? (then wrote a letter to much)"

This example illustrates a simple use of approximatize. In both all cases, but the last, the test method gets called even though the method name was misspelled. However, the are no methods whose spelling resembles the last call, and thus a NoMethodError is thrown. It's possible to adjust how much error to allow, but it's recommended to keep the max_edit_distance low.

The implementation of approximatize:

def approximatize(target, max_edit_distance = 1)
target = target.class unless target.class == Class

target.class_eval do
define_method :method_missing do |*args|
meth = args.shift
similar_methods = {}

self.methods.each do |m|
dist = m.edit_distance(meth.to_s)
if dist <= max_edit_distance then
if similar_methods[dist].nil? then
similar_methods[dist] = [ m.to_s ]
similar_methods[dist] << m.to_s

# Eliminate candidates with higher edit distances than the candidates with the lowest
similar_methods = similar_methods.min.pop unless similar_methods.min.nil?

# Call method only if there is _exactly_ one element with the minimum edit distance
if similar_methods.nil? or similar_methods.size != 1 then
raise NoMethodError.new("undefined method ‘#{meth.to_s}’ for #{self}",meth,args)

Approximatize defines an method_missing method on the target class. When invoked this method runs through all the methods of the target class and calculates the edit distance of the method. It then selects the method with the lowest edit distance and invokes it (assuming there is only one with such a low edit distance and the this edit distance is lower than the allowed threshold). If no such method can be found, it will throw a NoSuchMethod exception, as would normally have happened when you call a non-existing method.

The dynamic programming version of the edit distance algorithm is implemented directly on the String class. The running time is O(m*n) so it's feasible (polynomial) even though it is executed for each method in the target class. In practice, this doesn't seem to be a problem, since method names tend to be rather short. However, it would probably be a good idea to cache the result of the calculations, instead doing them each time a non-existing method gets called.

class String
def edit_distance(other)
m = []

# create base case entries:
0.upto(size) { |i| m[i] = []; m[i][0] = i }
0.upto(other.size) { |j| m[0][j] = j }

# Fill out the rest of the matrix
1.upto size do |i|
1.upto other.size do |j|
etj = (self[i-1] == other[j-1])?0:1
m[i][j] = [ m[i-1][j-1] + etj , m[i][j-1]+1, m[i-1][j]+1 ].min

If you're the type who likes to live life dangerously and are not afraid to break things, why not approximatize your entire Ruby environment?
def approximatize_world(max_edit_distance=1)
ObjectSpace.each_object(Class) do |clazz|

But there are some obvious problems with the concept and the approach:

  • Expect the unexpected: Sometimes the wrong method gets called. One obviously dangerous case springs to mind: The way Ruby has destructive methods ending with an exclamation mark; an edit distance of one from the original name. As such, edit distance is not very clever, and it's really sensitive when it comes to short method names. It should be possible to come up with something better, but until then, edit distance serves as reasonable baseline approximation.
  • It's possible to get a list of defined methods of a class, but these doesn't include aliases for methods or methods implemented using method_missing. Actually, using approximatize on a class might break it's functionality if it depends on method_missing. This can probably be fixed with some clever aliasing though. However, using method_missing like this is asking for trouble.
  • Of course, the approach only handles method names. Syntax errors still cause the interpreter to complain like a "strict old aunt", even though it was perfectly clear what I meant ;-) I would be nice with approximate syntax, but that would require a different kind of Ruby parser
Ahh well, so it isn't very useful in practice, but the idea seems worthwhile, doesn't it? Even if it isn't very useful it is a working prototype illustrating an interesting concept (imho). And more importantly it provided me with a couple of hours of fun :-)


griff said...

A few comments.

When doing approximatize on an object the object should get a method_missing not its class. Eg:
t = Example.new

The standard alias trick is to call your method "method_missing_with_approximate", alias the old method_missing to "method_missing_without_approximate" and last alias method_missing_with_approximate to method_missing.

One way to avoid the problem with "test", "test!", "test=" and "test?" all being one distance from each other would be to simply disallow changing postfix when approximating.

Christian Theil Have said...

Thanks for the comments (and the code you just sent :-) I've posted your code in a new blog post here.