Tuesday, September 30, 2008

JAOO, day one

I am currently at the JAOO conference :-)

The conference seems to be buzzing with functional programming today. It started with the keynote by Anders Hejlsberg, where he talked about the future of programming languages, of which he claimed that one of the key components was functional programming. Concurrency was also mentioned as a key component. The talk also mentioned DSL's and used LINQ as example. It does seem nice - almost makes me want to play with C#. The F# language from Microsoft also seems like a nice alternative to C# if you wish the utilize the dot.net runtime. Very CAML/ML like.

A few other talks of the day made a great impression on me. I didn't see that many presentation, because I'm was working as crew most of the day.

I originally planned to go to the Aslaks rspec talk (I missed it a Rubyfools) but it was cancelled and replaced with a talk about meta-programming, and while it's an interesting topic, I've seen my share of those talks. Instead I decided to go the Fortress talk by Guy Steele - and it blew my mind! Fortress seems like a really interesting concurrent programming language. I will have to play around with this.

After the fortress talk I went to a talk about Scala, by Bill Venners. Scala is a functional language which runs on the JVM an integrates with java libraries. The talk it self was more of an introduction to functional programming for Java programmers and it didn't really mention any of the really interesting features Scala.

The final talk I went to was "Why Functional Programming (Still) Matters" by Erik Meijer. This was mindblowing. He talked about dealing with the "impurity" of languages introduced by side-effects by using monads. The idea of embracing side-effects - in the right way - rather than than shunning them was eye opening.

Sunday, August 31, 2008

Extracting facebook friends from public profiles

Today I just discovered a small little disclosure issue with facebook and google. Actually, the discovery was due to my girlfriend.

When you enter a public facebook profile for a person, it will display five friends of that person. Since everybody who is friends with this person also have a public profile, which also lists five friends, there is a reasonable chance that the person turns up in their profile.

Google indexes these pages, so it is possible to search these public profiles. By searching for the a particular name on facebook using google, it is possible to find a lot more friends of the person. For instance, the query

http://www.google.com/search?q=%22Christian+Theil+Have%22+site:facebook.com&hl=en&safe=off&filter=0

will list a few of my facebook friends.

I imagine that it is possible to extract quite a big portion of the facebook social network using this method.

Wednesday, April 23, 2008

NLP keyword on google

Searching on "NLP" on google seems to bring up pages and pages of stuff on neuro-linguistic programming (psychology stuff) rather than on natural language processing.

In a recent post to comp.ai.nat-lang Amnon urges NLP people "not cede the NLP keyword without a fight". I agree with him, that this is an issue. But I am not sure how to deal with it. Even though, I believe that the abbreviation "NLP" meaning natural language processing is more scientific and probably precedes the neuro-linguistic programming use of the abbreviation (I have no definite source for this), both uses of abbreviation is equally legitimate. However, there is a big overlap in terminology used eg. linguistics, language etc.

However, it's annoying for me to have a million pages neuro-lingustic programming pages pop when I search for NLP, because I am a million times more interested in natural language processing. But google neglects this fact, although this knowledge should be derivable from my previous searches. Oppositely, it's not transparent what kind of result a given query should give me. Pagerank results in an approximation which can be compared with the lowest common denominator of query results. With gradual improvement of my query I usually find what I am looking for though, but this is besides the point.

I think there is still room for other search engines out there. I could definitely use a search engine which gave me predictable result. And a search engine that took a lot more context into consideration. But google domination (and superiority in terms on amount of indexed pages) is quite a barrier. But I think it's slowly going to happen, as the quality of googles search results deteriorates because of the exceeding amount of information. One algorithm doesn't fit all.

A scary side-effect is that because of googles domination, I think googles indexing mechanisms might have some serious impacts on the development of language and ultimately the way we think. If a search term is drowned by a more popular search term, the less popular might slowly be forgotten or loose it's place in common speak. To add to seriousness of this, it's possible to affect this by means of google bombing or similar techniques. That is what SEO is mostly about.

Until we get better search engines, I guess I might as well throw in a link to the wikipedia definition of NLP.

Natural language processing: one, Neuro-lingustic programming: zero ;-)

Wednesday, March 5, 2008

Better Approximate Ruby Programming

Griff just sent me a nicer version of the approximatize function, which dynamically defines a module. It's a really cool hack: Now approximatize works for classes, modules and objects alike!

If the target of the function turns out to be an other module, it will include the new module in it and it's included method will do the aliasing magic. Should the target be a class or an object, it will extend it and is able to re-alias the method_missing method directly. Nice.


def approximatize(target, max_edit_distance = 1)
m = Module.new do
def self.included(other)
other.send :alias_method, :method_missing_without_approximate, :method_missing
other.send :alias_method, :method_missing, :method_missing_with_approximate
end

define_method :method_missing_with_approximate do |*args|
meth = args.shift
similar_methods = {}

self.methods.each do |m|
dist = m.edit_distance(meth.to_s)
if dist <= max_edit_distance then
if similar_methods[dist].nil? then
similar_methods[dist] = [ m.to_s ]
else
similar_methods[dist] << m.to_s
end
end
end

# Eliminate candidates with higher edit distances than the candidates with the lowest
similar_methods = similar_methods.min.pop unless similar_methods.min.nil?

# Call method only if there is _exactly_ one element with the minimum edit distance
if similar_methods.nil? or similar_methods.size != 1 then
method_missing_without_approximate(meth, *args)
else
self.send(similar_methods.first,*args)
end
end
end

if target.kind_of? Module
target.send :include, m
else
target.extend m
target.instance_eval do
alias :method_missing_without_approximate :method_missing
alias :method_missing :method_missing_with_approximate
end
end
end


Thanks :-)

Tuesday, March 4, 2008

Approximate Ruby Programming

What if your programming language interpreter didn't mind small spelling mistakes. Imagine that you type a method name a wee bit wrong, but the the interpreter seems to read your mind and call the correct method instead of throwing an NoMethodFound exception at you.

I've been playing a bit with this idea and found a simple but naive way this in Ruby. If you try to call a method that doesn't exist, then the method_missing method gets called. If this method_missing in turn figured out what method you really meant to call and called that instead, you where in the clear? Of course, programs can't read your mind, but a simple way of approximate this is to find the existing method with the shortest edit distance to the misspelled method and call that instead. This is pretty naive, but it will work in many cases and serve as a simple baseline. And it has a very straight-forward implementation in Ruby. Say hello to ... drumroll, please ... approximatize!.

class Example
def test(str)
puts "test method called: #{str}\n"
end
end

approximatize(Example)

ex = Example.new
ex.test "a normal method call"
ex.text "Did you mean test?"
ex.tes "Did you mean test? (then you forgot a letter)"
ex.ttest "Did you mean test? (then wrote a letter to much)"
ex.and_now_for_something_completely_different

This example illustrates a simple use of approximatize. In both all cases, but the last, the test method gets called even though the method name was misspelled. However, the are no methods whose spelling resembles the last call, and thus a NoMethodError is thrown. It's possible to adjust how much error to allow, but it's recommended to keep the max_edit_distance low.

The implementation of approximatize:

def approximatize(target, max_edit_distance = 1)
target = target.class unless target.class == Class

target.class_eval do
define_method :method_missing do |*args|
meth = args.shift
similar_methods = {}

self.methods.each do |m|
dist = m.edit_distance(meth.to_s)
if dist <= max_edit_distance then
if similar_methods[dist].nil? then
similar_methods[dist] = [ m.to_s ]
else
similar_methods[dist] << m.to_s
end
end
end

# Eliminate candidates with higher edit distances than the candidates with the lowest
similar_methods = similar_methods.min.pop unless similar_methods.min.nil?

# Call method only if there is _exactly_ one element with the minimum edit distance
if similar_methods.nil? or similar_methods.size != 1 then
raise NoMethodError.new("undefined method ‘#{meth.to_s}’ for #{self}",meth,args)
else
self.__send__(similar_methods.first,*args)
end
end
end
end

Approximatize defines an method_missing method on the target class. When invoked this method runs through all the methods of the target class and calculates the edit distance of the method. It then selects the method with the lowest edit distance and invokes it (assuming there is only one with such a low edit distance and the this edit distance is lower than the allowed threshold). If no such method can be found, it will throw a NoSuchMethod exception, as would normally have happened when you call a non-existing method.

The dynamic programming version of the edit distance algorithm is implemented directly on the String class. The running time is O(m*n) so it's feasible (polynomial) even though it is executed for each method in the target class. In practice, this doesn't seem to be a problem, since method names tend to be rather short. However, it would probably be a good idea to cache the result of the calculations, instead doing them each time a non-existing method gets called.

class String
def edit_distance(other)
m = []

# create base case entries:
0.upto(size) { |i| m[i] = []; m[i][0] = i }
0.upto(other.size) { |j| m[0][j] = j }

# Fill out the rest of the matrix
1.upto size do |i|
1.upto other.size do |j|
etj = (self[i-1] == other[j-1])?0:1
m[i][j] = [ m[i-1][j-1] + etj , m[i][j-1]+1, m[i-1][j]+1 ].min
end
end

m[size][other.size]
end
end
If you're the type who likes to live life dangerously and are not afraid to break things, why not approximatize your entire Ruby environment?
def approximatize_world(max_edit_distance=1)
ObjectSpace.each_object(Class) do |clazz|
approximatize_class(clazz,max_edit_distance)
end
end


But there are some obvious problems with the concept and the approach:

  • Expect the unexpected: Sometimes the wrong method gets called. One obviously dangerous case springs to mind: The way Ruby has destructive methods ending with an exclamation mark; an edit distance of one from the original name. As such, edit distance is not very clever, and it's really sensitive when it comes to short method names. It should be possible to come up with something better, but until then, edit distance serves as reasonable baseline approximation.
  • It's possible to get a list of defined methods of a class, but these doesn't include aliases for methods or methods implemented using method_missing. Actually, using approximatize on a class might break it's functionality if it depends on method_missing. This can probably be fixed with some clever aliasing though. However, using method_missing like this is asking for trouble.
  • Of course, the approach only handles method names. Syntax errors still cause the interpreter to complain like a "strict old aunt", even though it was perfectly clear what I meant ;-) I would be nice with approximate syntax, but that would require a different kind of Ruby parser
Ahh well, so it isn't very useful in practice, but the idea seems worthwhile, doesn't it? Even if it isn't very useful it is a working prototype illustrating an interesting concept (imho). And more importantly it provided me with a couple of hours of fun :-)

Thursday, November 1, 2007

php has problems

Arrggh! I'm coding stuff in PHP again at work. I've done a lot PHP in past times, and even though the language has improved it's still such a exemplary piece of crap. Language design at it's worst. PHP is supposed to be a recursive acronym of: PHP: Hypertext Processor, however, these acronyms seems to describe the language equally well:

PHP Has Problems
Pretty Half-baked Programming-language
PHP Hates Programmmers (and also the other way around for many...)

Feel free to add to list..


I just needed to get my frustrations out...

Sunday, September 30, 2007

RANLP conference

This week I attended the RANLP conference (Recent Advances in Natural Language Processing) in Bulgaria and it has been an amazing experience. There were many really good and interesting presentations. It seems that NLP is maturing in many respects. A mindblowing conference :-)

By the way, many of the papers from the workshops are available online on the workshop homepages.


I presented a poster,
From Use Cases to UML Class Diagrams using Logic Grammars and Constraints, with Henning Christiansen and we got some good feedback.

RANLP poster: From Use Cases to UML Class Diagrams using Logic Grammars and Constraints