It’s easy to get out of control on a project like this, and I think that’s where I’ve been for the last several weeks.
I’ve read more about AI and natural language processing than I ever knew existed a few months ago. This is a large and active field, and I’m in awe of the amazing work that’s going on all over the world. Places like the University of Rochester are full of smart people working long hours for years to get PhDs in this area. I have read several of the papers available on their CS department web site, and it sounds like they have already accomplished much of what I set out to do and have moved on.
Other, non-academics have made significant inroads into the problem. They are beginning to move beyond simple script-driven AIML bots to more sophisticated programs. I had added a link to one of them (Jeeney) to this post so you could see what I mean, but it’s not working. It’s not quite human-level interaction, but it’s not bad. I really like this approach and intend to follow a similar route if I can.
Which brings me back to the scope question… What am I really going to attempt here? I have asked this question several times already, and will likely do so again. Here’s my answer for today. This is heavily influenced by Homer [S. Vere and T. Bickmore, “A basic agent” (Computational Intelligence 1990)].
- Parser/Interpreter – This will tear apart the incoming text stream and convert it into some kind of standard internal form. I don’t expect this to be a flawless English parser because that’s both too difficult and not necessary. I need to extract the meaning from sentences at the level of a 3rd-grader at most. Vocabulary is pretty easy given all the work the Princeton folks have put into WordNet.
- Episodic Memory – This is what we might call our short-term memory. It’s surprisingly important in our ability to make sense of language. Basically I think of it as a list of the recent communication/event stream. I’m not sure how to “compress” this short-term memory into general memory, but it will need to be done.
- General Memory – All the stuff the computer “knows” stored in some kind of formal ontology. I have spent a lot of time worrying about how to do this, and have decided it probably doesn’t matter too much right now. If the thing ever gets really big I may have scaling problems. I don’t know how to solve or even pose them at this point.
- Planner/Reasoner – Figure out what to say, or perhaps do. This is a new idea for me, but it makes sense. Once the computer has understood the meaning of an input stream, what does it do? That’s the planner’s job. It will identify goals and come up with the steps needed to achieve the goal. I don’t think I’m too clear on how to do this.
- Executer – Figures out whether a plan step can and should be executed right now. Executes the step.
- Learner – Adds new knowledge to General Memory. No idea how to do this well. Jeeney claims to be reading through Wikipedia!
- Text generator – The opposite of the Parser/Interpreter. Converts from the standard internal form generated elsewhere in the program to simple 3rd-grade English.
I have several useful-looking snippets of Python code spread across these areas and have been trying to figure out where they go. I don’t know whether I’ll be able to tell if any of them will work before I get a basic shell constructed for each function.
Scope control… How much can one person get done in a few hours a week? Probably not as much as I’d like to. Oh well!