Now, don’t get all excited. The theme changed from one that came out in 2010 to one from 2016. It’s progress, but we’ve still got a ways to go to get “modern.” Don’t hold your breath!
After a l-o-n-g break away from this project, it seems like there may be some new activity forthcoming. I recently acquired a Raspberry Pi, and I’ve been looking for a “project” for the little beastie. Perhaps this is the right device for the task.
The Pi is one of the newer ones with a bit more memory and such. I have it running Raspbian, a Debian-based Linux distribution that has Python already installed. I suspect much of my old NLTK code will run fine, but s-l-o-w-l-y.
I’ve decided to use Twitter as the main interface to the program. Eventually, it will check Twitter every 5-10 minutes for “mentions” of its handle, parse the message, and generate a suitable response. Python Twitter Tools seems to include most of the functionality I need and is tailor-made for the job. I have the Twitter account (@braynebuddy) ready, I’ve generated the required credentials, and I am (not very) busily coding.
Stay tuned for more…
What would prompt new activity on a dead blog…? Well, I listened to a Python411 podcast recently (something I highly recommend, by the way) and learned about a project called Open Allure. It’s focused on building systems for teaching, but the things that caught my interest were that it used Python as the development language, and that it was able to have a “conversation with your computer: it talks, it listens, it watches, it responds.”
It seems to me this is a promising development. I’m inspired to try and dig up my old code out of the trash bin and start this whole project again! In six months these two (way-smarter-than-me) guys have made some amazing progress. Maybe there’s hope…?
It’s easy to get out of control on a project like this, and I think that’s where I’ve been for the last several weeks.
I’ve read more about AI and natural language processing than I ever knew existed a few months ago. This is a large and active field, and I’m in awe of the amazing work that’s going on all over the world. Places like the University of Rochester are full of smart people working long hours for years to get PhDs in this area. I have read several of the papers available on their CS department web site, and it sounds like they have already accomplished much of what I set out to do and have moved on.
Other people, non-academics, have made significant inroads into the problem. They are beginning to move beyond simple script-driven AIML bots to more sophisticated programs. I had added a link to one of them (Jeeney) to this post so you could see what I mean, but it’s not working. Jeeney is not quite at the level of human interaction, but it’s not bad. I really like this approach and intend to follow a similar route if I can.
Which brings me back to the scope question… What am I really going to attempt here? I have asked this question several times already, and will likely do so again. Here’s my answer for today. This is heavily influenced by Homer [S. Vere and T. Bickmore, “A basic agent” (Computational Intelligence 1990)].
- Parser/Interpreter – This will tear apart the incoming text stream and convert it into some kind of standard internal form. I don’t expect this to be a flawless English parser because that’s both too difficult and not necessary. I need to extract the meaning from sentences at the level of a 3rd-grader at most. Vocabulary is pretty easy given all the work the Princeton folks have put into WordNet.
- Episodic Memory – This is what we might call our short-term memory. It’s surprisingly important in our ability to make sense of language. Basically I think of it as a list of the recent communication/event stream. I’m not sure how to “compress” this short-term memory into general memory, but it will need to be done.
- General Memory – All the stuff the computer “knows” stored in some kind of formal ontology. I have spent a lot of time worrying about how to do this, and have decided it probably doesn’t matter too much right now. If the thing ever gets really big I may have scaling problems. I don’t know how to solve or even pose them at this point.
- Planner/Reasoner – Figure out what to say, or perhaps do. This is a new idea for me, but it makes sense. Once the computer has understood the meaning of an input stream, what does it do? That’s the planner’s job. It will identify goals and come up with the steps needed to achieve the goal. I don’t think I’m too clear on how to do this.
- Executer – Figures out whether a plan step can and should be executed right now. Executes the step.
- Learner – Adds new knowledge to General Memory. No idea how to do this well. Jeeney claims to be reading through Wikipedia!
- Text generator – The opposite of the Parser/Interpreter. Converts from the standard internal form generated elsewhere in the program to simple 3rd-grade English.
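A rough sketch of how the pieces in the list above might hang together is shown here. It is only a skeleton under loud assumptions: the `Agent` class, its method names, and the `(subject, verb, object)` tuple used as the "standard internal form" are all invented for illustration, and none of it comes from Homer or from working code.

```python
from collections import deque


class Agent:
    """Toy skeleton of the component list; every name here is hypothetical."""

    def __init__(self, episodic_size=10):
        self.episodic = deque(maxlen=episodic_size)  # Episodic Memory: recent events
        self.general = set()                         # General Memory: known facts

    def parse(self, text):
        """Parser/Interpreter: 'girl like boy' -> ('girl', 'like', 'boy')."""
        words = text.lower().strip(".!?").split()
        if len(words) != 3:
            return None  # anything but subject-verb-object is beyond this stub
        return tuple(words)

    def learn(self, fact):
        """Learner: add a new fact to General Memory."""
        self.general.add(fact)

    def plan(self, fact):
        """Planner/Reasoner: decide on a list of steps for this input."""
        if fact is None:
            return [("say", "I do not understand.")]
        if fact in self.general:
            return [("say", "I already know that.")]
        return [("learn", fact), ("say", self.generate(fact))]

    def execute(self, step):
        """Executer: carry out one plan step, returning any text to say."""
        action, payload = step
        if action == "learn":
            self.learn(payload)
            return None
        return payload

    def generate(self, fact):
        """Text generator: internal form back to very simple English."""
        return "OK, {} {} {}.".format(*fact)

    def respond(self, text):
        """Run one input through parse, plan, execute, and generate."""
        fact = self.parse(text)
        self.episodic.append((text, fact))
        outputs = [self.execute(step) for step in self.plan(fact)]
        return " ".join(out for out in outputs if out)
```

Calling `Agent().respond("girl like boy")` walks one sentence through every stage. The point of the stubs is that each stage can later be swapped for something smarter without touching the others.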
I have several useful-looking snippets of Python code spread across these areas and have been trying to figure out where they go. I don’t know whether I’ll be able to tell if any of them will work before I get a basic shell constructed for each function.
Scope control… How much can one person get done in a few hours a week? Probably not as much as I’d like to. Oh well!
I’ve spent the last few weeks building and deleting little snippets of code in the general vicinity of language parsing. Mostly it has been an exercise of exploring some of the things that have been learned by the serious folks in the NLP field over the past 10-20 years. Some of what I’ve tried has worked out, and some has not. Most of it will never see the light of day.
Much of what I’ve been tinkering with is code for parsing English grammar. There has been a lot of very complex work done on solving this problem, so there are a lot of things to read and understand. I’ve learned a lot about top-down, bottom-up, shift-reduce, and left-corner parsers. I’ve tinkered with taggers and chunkers and sense and valency in an effort to deal with the complexity that we build into our language.
One of the “big” things I’ve concluded from all this is that I’m not as interested as I thought in building a good parser (i.e., a parser that includes all valid sentences while excluding all invalid ones). What the machine really needs to do is extract the “meaning” from the word-stream coming in. Meaning can be gleaned from all sorts of ungrammatical constructions. For example, “In the park running the dog is” and “girl like boy” are easily understood in spite of the fact they’re nonsense grammatically. Even things like “Tihs snceente is esay to raed eevn wtih meixd up lteetrs” can be understood without much effort (try it here, and read more about this effect here). There is a lot of redundancy in our language, so grammar and spelling don’t matter as much as our teachers implied, especially since the average word runs around 4 letters or so.
This brings me back to an earlier question, “What does it mean to understand?” I’m going to start with the idea that if the machine can convert the incoming word-stream into a set of entities, the relationships among those entities, and answer questions about them requiring inference or deduction, then it has understood the statements.
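One way to make that definition concrete is a tiny fact store that holds entities and relationships as triples and answers yes/no questions by following "is_a" links. This is a minimal sketch, not a real knowledge representation: the `FactStore` class, the relation names, and the example facts are all made up for illustration.

```python
class FactStore:
    """Stores (subject, relation, object) triples; names are hypothetical."""

    def __init__(self):
        self.facts = set()

    def tell(self, subject, relation, obj):
        """Record one relationship between two entities."""
        self.facts.add((subject, relation, obj))

    def is_a(self, thing, category):
        """True if `thing` reaches `category` through a chain of is_a facts."""
        seen = set()
        frontier = [thing]
        while frontier:
            current = frontier.pop()
            if current == category:
                return True
            if current in seen:
                continue  # guards against cycles in the stored facts
            seen.add(current)
            frontier.extend(o for s, r, o in self.facts
                            if s == current and r == "is_a")
        return False

    def ask(self, subject, relation, obj):
        """Answer a yes/no question, inheriting facts from broader categories."""
        if (subject, relation, obj) in self.facts:
            return True
        if relation == "is_a":
            return self.is_a(subject, obj)
        # Inheritance: a fact about a category also holds for its members.
        return any(r == relation and o == obj and self.is_a(subject, s)
                   for s, r, o in self.facts)


store = FactStore()
store.tell("rex", "is_a", "dog")
store.tell("dog", "is_a", "animal")
store.tell("animal", "can", "breathe")

print(store.ask("rex", "is_a", "animal"))  # inferred through "dog"
print(store.ask("rex", "can", "breathe"))  # inherited from "animal"
print(store.ask("rex", "can", "fly"))      # not supported by any stored fact
```

Nothing was ever stored about Rex breathing, so a "yes" there is a small deduction and not a lookup. By the definition above, that would count as a very modest kind of understanding.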
I’m currently grappling with understanding propositional logic, first-order logic, and lambda abstraction (or λ-calculus) because I think these ideas might lead to a way of systematically encoding meaning in a form that the machine can use easily.
I’ve got such a long way to go.
These guys are really good! Their stuff actually does what I’m struggling mightily to conceive might be possible. Why isn’t this stuff already embedded in the products we use?
These guys are really smart! They’re a special interest group of the big group of smart people working on Computational Semantics.
What have I got myself into…
It has occurred to me that I have drifted from building an intelligent machine to building a working language parser. I suppose this is a necessary first step, but I must not lose sight of the real goal. There are so many fascinating distractions along the way!
In any case, the next step in this journey was to connect the parse-WordNet-concept pieces together to generate a symbolic representation of the meaning in a normal English sentence. As with much of this project, it’s easy to get a simple system going and much harder to make it work generally.
I feel like I’m “standing on the shoulders of giants” to steal a phrase from Stephen Hawking (who borrowed it from others as well). There has been an enormous amount of work done by the Natural Language Toolkit (nltk) folks to implement NLP algorithms in Python. Virtually everything I’m doing uses the software they have written. When I say, “I have built”, or “I have written” you must understand that what I really mean is that I have labored mightily to stick a couple of lines of glue code between calls to the nltk functions.
So, I have successfully connected the WordNet lexical database to a recursive-descent parser. The parser is running a simple context-free grammar (CFG) that covers a small fraction of the English language. Even so, it does surprisingly well. For example, it correctly parses “the old man the ship” as a noun phrase (the old), a verb (man), and a noun phrase (the ship):
(S (NP (Det the) (Nom (N old))) (VP (V man) (NP (Det the) (Nom (N ship)))))
This is a sentence that would not be obvious to a person but is easy for the machine because it’s not confused (yet) by the fact that “man” is generally a noun not a verb.
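For readers who want to see the mechanics, here is a small backtracking recursive-descent parser in plain Python. It is a stand-in written for illustration and is not the NLTK parser used in this project; the grammar rules, the `Adj` category, and the four-word lexicon are assumptions chosen so that the example sentence comes out with the bracketing shown above.

```python
# Hypothetical toy grammar: each nonterminal maps to its possible expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nom"]],
    "Nom": [["Adj", "Nom"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
}

# Hypothetical lexicon: a word may belong to more than one part of speech.
LEXICON = {
    "the": {"Det"},
    "old": {"Adj", "N"},
    "man": {"N", "V"},
    "ship": {"N", "V"},
}


def expand(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can match words[pos:]."""
    if symbol in GRAMMAR:
        for production in GRAMMAR[symbol]:
            for children, end in expand_sequence(production, words, pos):
                yield "({} {})".format(symbol, " ".join(children)), end
    elif pos < len(words) and symbol in LEXICON.get(words[pos], ()):
        yield "({} {})".format(symbol, words[pos]), pos + 1


def expand_sequence(symbols, words, pos):
    """Yield (list_of_trees, next_pos) for the right-hand side of a rule."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in expand(symbols[0], words, pos):
        for rest, end in expand_sequence(symbols[1:], words, mid):
            yield [tree] + rest, end


def parse(sentence):
    """Return every complete parse of `sentence` as a bracketed string."""
    words = sentence.split()
    return [tree for tree, end in expand("S", words, 0) if end == len(words)]


for tree in parse("the old man the ship"):
    print(tree)
```

The parser first tries "the old man" as the noun phrase, finds that "the" cannot start a verb phrase, and backtracks to "the old". That leaves "man" free to be the verb, which is the reading people tend to miss.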
What it doesn’t do yet is handle punctuation or capitalization. For example, it fails to parse “The” as “the” at the moment and chokes on commas, semicolons, quotes, etc. Some of these are easy to fix, others might require more low-level coding to replace the functions already part of NLTK.
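The easy part of that fix might look something like the following: lowercase everything and throw away punctuation before the words reach the parser. This is a minimal sketch using only the standard `re` module; the `normalize` function is a hypothetical helper, not part of NLTK.

```python
import re


def normalize(sentence):
    """Lowercase the text and keep only runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


print(normalize('The old man, the "ship";'))
```

The trade-off is that this throws information away. Capital letters that mark proper nouns and commas that mark clause boundaries are gone, and only the straight apostrophe is kept, so a curly one would split a contraction in two.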
The other thing that seems to be facing me is that a CFG is unlikely to be flexible enough for general language parsing. There are simply too many special cases. That’s why I need the semantic concepts data. I’m going to use a simpler, non-generic parser even though it can’t weed out nonsense sentences like, “the dog flew water.” I’ll use a semantic filter to get at the meaning of the sentence, if any. I think that will still not be enough to get rid of the ambiguity, but it might make a CFG good enough to use as a parser.
I’m slowly making progress and have a word-finder that looks up a word and gets its hypernyms from WordNet, a parser that takes a sentence and gets all the parts of speech for the words, and a concept structure for the conceptual dependency information. All three with simple web interfaces.
Now, to wire it all together!
I knew it was coming, but it’s still a bit of a shock. I hit the inevitable wall…this is indeed a “hard” problem to take on. Maybe I can’t do it after all!
The title of the post (from MIT EECS 6.864) illustrates a little corner of the problem. As an English-language sentence, what does it mean? The problem is with the ambiguity inherent in a natural language, and there isn’t a correct interpretation. The sentence is grammatically correct and semantically ambiguous. Here are the possible meanings I can think of:
- I used a telescope to observe a small web-footed broad-billed swimming bird belonging to a female person.
- I observed a small web-footed broad-billed swimming bird belonging to a female person. The bird had a telescope.
- I observed a female person move quickly downwards. The person had a telescope.
- I used a telescope to observe a female person move quickly downwards.
- I used a telescope to cut a small web-footed broad-billed swimming bird belonging to a female person.
- I used a telescope to observe heavy cotton fabric of plain weave belonging to a female person.
- I used a telescope to cut heavy cotton fabric of plain weave belonging to a female person.
There are several other possible meanings as well. The problem is that more information is needed to establish the real meaning of the sentence. It is not just world knowledge (i.e., characteristics and typical behaviors of ducks and telescopes) that’s needed. I’ve already used that kind of knowledge to generate the list of meanings above. The extra information needed comes from the context in which the sentence occurs. Were the speaker and listener just talking about odd things to see ducks doing? Is this a cartoon? Are we discussing tent-making methods?
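The readings listed above come from a handful of independent choices: what the verb means (observe or cut), what “duck” means (a bird, a quick downward movement, or a cotton fabric), and where the telescope belongs. A short sketch can enumerate the combinations. The sense labels and the single grammatical filter below are simplifications made up for illustration.

```python
from itertools import product

# Hypothetical sense inventories taken from the readings listed above.
VERB_SENSES = ["observe", "cut"]
DUCK_SENSES = ["bird", "move downwards", "cotton fabric"]
TELESCOPE_ROLES = ["held by the speaker (instrument)", "attached to the object"]


def readings():
    """Return every sense combination that is at least grammatical."""
    result = []
    for verb, duck, role in product(VERB_SENSES, DUCK_SENSES, TELESCOPE_ROLES):
        # "cut" needs a thing as its object, so it cannot pair with the
        # reading in which "duck" is an action.
        if verb == "cut" and duck == "move downwards":
            continue
        result.append((verb, duck, role))
    return result


for reading in readings():
    print(reading)
```

Twelve combinations are possible and this filter removes only two of them. Grammar and word senses alone leave ten candidates, which is why context has to do the rest of the work.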
My short-term approach is likely going to be the same. I’m going to try to use PHP or C# (maybe Python), connected to WordNet and MySQL, to build a conceptual analyzer that can gradually learn the world knowledge needed to understand what I tell it in English. There are other projects (e.g., Babel, Dashboard, Mindlog, and even Godwhale) that may be trying to do the same kind of thing, so collaboration might be in the future.
It’s going to be harder than I thought!
My limited research to date has led me to the conclusion that learning to talk is hard to do unless there is some level of intelligence behind the learner. That’s where the chatterbots fall short. They exhibit a superficial ability to talk to you, but there’s nothing behind the curtain. They respond credibly only if you stick to the kind of discussion anticipated by their creators. Any deviation produces nonsense that is obvious to the human but not to the machine.
So, teaching a machine to understand normal English is going to require some kind of intelligence. Like many of the ideas I’ve been exploring, this is a well-worn intellectual path. Most of the material on the Internet that I can actually understand is from research done in the 1970s and 1980s. Some researchers concluded that coding up a program with all of the grammar rules of a language will not allow the program to capture the meaning of even a simple sentence. Most of what we say requires some kind of “world knowledge” before you can make sense of it. You figure out what a sentence means by parsing the meaning as much as you parse the grammar, or maybe by doing both together. Some of the buzzwords I’ve come across in this regard are conceptual dependency, procedural semantics, and conceptual analysis. All have to do with knowledge representation and I won’t take the space here to define them as Google does a fine job of covering the topic. SHRDLU (and here) was a program written in 1972 that seemed to capture much of what I’d like to do, albeit in a very small domain. I can’t quite figure out why it was never extended.
I think I’m going to start with a simple program that’s based on these ideas about using world knowledge as a means of understanding language. A lot of what I’ll try first is described in online course materials from GA Tech and Cardiff Univ among others. The first decisions are going to be how to represent world knowledge, what should the machine “know” to begin with (all of WordNet, for example?), and how will the machine learn new things.
I also look over my shoulder every now and then wondering when Sarah Connor is going to show up to prevent the birth of Skynet. Is this a thing that should be done?