Well, not much progress on the main task has occurred in recent days. I’m slowly sinking into a swamp of details around how to implement ideas in code.
There was a long trek through C#, Mono, and MySQL that turned out to be mostly a dead end. It is possible to put the WordNet data into MySQL and to access the database from C#/Mono on Linux, but it’s not easy. I think the Linux tools are not really ready to be used quite that way by people with my limited skills, and I’m not willing to lock myself into a Windows-only platform.
The current frontrunner seems to be Python with NLTK (the Natural Language Toolkit), which offers a lot of high-quality AI code that promises to be useful. It seems fairly straightforward to get the whole thing running with a web front end. Unfortunately, NLTK really wants Python 2.5+, and my server is on Ubuntu 6.06 LTS (with Python 2.4). I guess it’s time to upgrade the server to Ubuntu 8.04 LTS anyway.
Once all this is done maybe the main task can resume. I have run across a bunch of work that has been done on the Semantic Web that seems very close to what I’m trying to do. Things like the Resource Description Framework (RDF) and Web Ontology Language (OWL) look like exactly what I’m after and will likely be among the first “meaning storage” schemes I try.
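To make the “meaning storage” idea a little more concrete, here is a minimal sketch of RDF-style subject/predicate/object triples in plain Python. It is not real RDF or OWL: the TripleStore class and the predicate names (owns, hasSon, isA) are made up for illustration, and an actual project would presumably use a proper RDF library.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Hypothetical in-memory store of (subject, predicate, object) facts."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def query(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Illustrative facts only; the predicate vocabulary is an assumption.
store = TripleStore()
store.add("John", "owns", "house")
store.add("John", "hasSon", "son_of_John")
store.add("house", "isA", "building")

johns_facts = store.query(subject="John")
```

Leaving an argument as None turns it into a wildcard, so the same query method can answer “what do we know about John?” or “what things are buildings?”.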
Going back to your “I saw her duck…” example. Would there be a way to use a language, even an artificial one, without the semantic ambiguity? For example, define a language where “saw” ONLY means the past tense of “see.” “Saw”, meaning “to cut,” would be “saw*” or “saww”. Likewise, “duck” would only mean a web-footed bird. Any other definition, like “move down quickly” would require a modifier.
Obviously this would require an enormous amount of work (how many modifiers would you need for the word “lie”?), but at least it would cut out the semantics issue.
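The scheme described above amounts to a lexicon in which every token has exactly one sense. A rough sketch in Python, assuming the “*” suffix as the modifier; the entry “duck*”, the glosses, and the interpret function are invented for illustration:

```python
from typing import List, Optional, Tuple

# One token, one sense. The "*" marks an alternate sense (an assumed convention).
LEXICON = {
    "saw": "past tense of 'see'",
    "saw*": "to cut with a saw",
    "duck": "web-footed bird",
    "duck*": "to move down quickly",
}


def interpret(sentence: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each whitespace-separated token with its single sense.

    Tokens missing from the toy lexicon are paired with None.
    """
    return [(token, LEXICON.get(token)) for token in sentence.lower().split()]


reading = interpret("I saw her duck*")
```

Because each token maps to a single gloss, the lookup never has to choose between senses; the cost is that the writer must pick the right token up front.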
Yes, there is a way to do this that is in common use! Programming languages like C++, C#, Visual Basic, Fortran, etc. are designed to have no ambiguity in their syntax and semantics. Every valid statement in these languages has exactly one meaning. Everything is crystal clear.
The problem with these languages is that they are very narrow in terms of what can be expressed. There is less room for creative expression, and they do not extend well to uses beyond their original design. They evolve very slowly, if at all.
The other possibility is the Conceptual Dependency (CD) framework I mentioned in an earlier post. That was an effort to take every sentence and express its meaning in a unique and unambiguous way. I have started on the code to convert English sentences into the CD language. Here’s a taste:
The sentences:
“John owned a house. John gave the house to his son.”
Translate into:
(POSSESS (OBJECT house)(VALUE John)(TIME past))
(ATRANS
  (ACTOR John)
  (OBJECT house)
  (DIRECTION (FROM John)(TO John's son))
  (TIME past))
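For the curious, here is one way structures like these could be held in Python: nested tuples, with a small function to print them back out in the parenthesized form. This is only a sketch of the data representation, not my actual conversion code, and the helper names to_sexpr and slot are made up for illustration.

```python
def to_sexpr(node) -> str:
    """Render a nested (head, child, ...) tuple as a Lisp-style string."""
    if isinstance(node, tuple):
        return "(" + " ".join(to_sexpr(child) for child in node) + ")"
    return str(node)


def slot(node, name):
    """Return the fillers of the first slot called `name`, or None if absent."""
    for child in node[1:]:
        if isinstance(child, tuple) and child[0] == name:
            return child[1:]
    return None


# "John owned a house."
possess = ("POSSESS", ("OBJECT", "house"), ("VALUE", "John"), ("TIME", "past"))

# "John gave the house to his son."
atrans = (
    "ATRANS",
    ("ACTOR", "John"),
    ("OBJECT", "house"),
    ("DIRECTION", ("FROM", "John"), ("TO", "John's son")),
    ("TIME", "past"),
)

print(to_sexpr(possess))
print(to_sexpr(atrans))
```

Keeping the structure as plain tuples means a slot such as ACTOR or DIRECTION can be pulled out with slot(atrans, "ACTOR") without parsing any text.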
Wow. Just discovered your new blog, Rob. (Thanks for the heads up.)
You have a way of over-reaching with style, my friend. Bruce’s comment was predictable: “He has WAY too much time on his hands.” So now you’re going to solve the natural language problem in your spare time? Let’s hope Bruce is right. You’d better have LOTS of spare time!!
On a (slightly) related topic, I’d like to see a post about how you’ve set up your server. I take it you are running it on your own hardware. Is it set up in a DMZ? Are you using a dynamic DNS service? Did you lease the domain name from the same company? What’s involved in setting up a WordPress blog on your own box, and are you planning to have any non-blog web pages on your site?
Bruce apparently doesn’t read my other blog, or he’d know how this one is likely to turn out!
This blog is actually not running on my hardware. The web hosting service I’m using (also the source for the domain name) offers WordPress as one of the applications they support with the account. That makes it a lot easier because I don’t need dynamic DNS and all that stuff.
I do run my own web server in the orange zone on my Smoothwall box, but there are no “public” pages served from there (I hope). I use dyndns.com to keep tabs on the constantly changing IP address.
I eventually hope to host the results of my current project from my web server. I don’t think I can get all the code to run on the web-hosting machine.