Monday, 16 June 2008

Narcolepsy

The electrical system in Mylne's Court, the dorm hall where I happen to reside, is getting a complete overhaul today, so all tenants have been kicked out and are not allowed to return before 8 am. As a result, I'm currently holed up in Appleton Tower at 6 am, where I've been for the past 11 hours: tired, hungry, and quite possibly the only person in this entire building.

What does that mean to you, the avid NLTK GSoC project follower? It means that dependency grammars now support explicit arity/arguments, and that the parser behaves accordingly. This wasn't actually in the proposal so I didn't want to throw too much time at it, but I think that it's an important feature to have in this particular parser.
The ChartCell has been changed to a set, and the duplicate parses that existed in the previous commit are no longer a problem. Until I figure out how to hash a list, there are still problems that could occur in more complex examples due to different spans being deemed equal by __eq__, but it's not a tough fix, and I haven't even tested to see if this error actually exists - I'm just going on a hunch. There are also grammar parsing features similar to CFG, where the user can specify the grammar in the form of a string like " 'a' --> 'b' 'c' ", and get the set of productions automatically.
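
On the list-hashing question, one common approach is to freeze the list into a tuple and build both __eq__ and __hash__ from the same key, so that a set only collapses spans that really are identical. The following is a minimal sketch under that assumption; the Span class and its fields (start, end, head, arcs) are hypothetical stand-ins for illustration, not the actual classes in the parser.

```python
class Span:
    """Hypothetical stand-in for a chart span (not the parser's real class)."""

    def __init__(self, start, end, head, arcs):
        self.start = start      # index of the first token covered
        self.end = end          # index one past the last token covered
        self.head = head        # index of the head word of the span
        self.arcs = list(arcs)  # head index for each token, -1 for the root

    def _key(self):
        # A list is unhashable, so freeze it into a tuple before hashing.
        return (self.start, self.end, self.head, tuple(self.arcs))

    def __eq__(self, other):
        if not isinstance(other, Span):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Built from the same key as __eq__, so equal spans hash equally.
        return hash(self._key())


# A chart cell modelled as a plain set of spans.
cell = set()
cell.add(Span(0, 2, 0, [-1, 0]))
cell.add(Span(0, 2, 0, [-1, 0]))  # exact duplicate: collapsed by the set
cell.add(Span(0, 2, 1, [1, -1]))  # same boundaries, different arcs: kept
```

Because the arcs take part in both the comparison and the hash, two spans that cover the same words with different attachments stay distinct, while true duplicates are dropped.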

So I don't want to say that this parser is "done", but I think it's safe now to move on to the statistical projective parser for the moment, before coming back to put the finishing touches on this one. I still haven't tested it on more elaborate sentences, but I'll do that shortly. I'll commit this once I get my laptop back on the internet, probably after a really long nap.
