wordpos/README.md

219 lines
6.0 KiB
Markdown

wordpos
=======
wordpos is a set of part-of-speech utilities for Node.js using [natural's](http://github.com/NaturalNode/natural) WordNet module.
There is no lexigraphical intelligence here (eg, see [pos-js](https://github.com/fortnightlabs/pos-js)). Only dictionary lookup.
Usage
-------
```js
var WordPOS = require('wordpos'),
wordpos = new WordPOS();
wordpos.getAdjectives('The angry bear chased the frightened little squirrel.', function(result){
console.log(result);
});
// [ 'little', 'angry', 'frightened' ]
wordpos.isAdjective('awesome', function(result){
console.log(result);
});
// true
```
See `wordpos_spec.js` for full usage.
Installation
------------
npm install wordpos
Note: `wordpos-bench.js` requires a [forked uubench](https://github.com/moos/uubench) module.
To run spec:
npm install jasmine-node -g
jasmine-node wordpos_spec.js --verbose
API
-------
Please note: all API are async since the underlying WordNet library is async.
WordPOS is a subclass of natural's [WordNet class](https://github.com/NaturalNode/natural#wordnet) and inherits all its methods.
### getX()
Get POS from text.
```
wordpos.getPOS(str, callback) -- callback receives a result object:
{
nouns:[], Array of str words that are nouns
verbs:[], Array of str words that are verbs
adjectives:[], Array of str words that are adjectives
adverbs:[], Array of str words that are adverbs
rest:[] Array of str words that are not in dict or could not be categorized as a POS
}
Note: a word may appear in multiple POS (eg, 'great' is both a noun and an adjective)
wordpos.getNouns(str, callback) -- callback receives an array of nouns in str
wordpos.getVerbs(str, callback) -- callback receives an array of verbs in str
wordpos.getAdjectives(str, callback) -- callback receives an array of adjectives in str
wordpos.getAdverbs(str, callback) -- callback receives an array of adverbs in str
```
NB: If you're only interested in a certain POS (say, adjectives), using the particular getX() is faster
than getPOS() which looks up the word in all index files.
NB: [stopwords] (https://github.com/NaturalNode/natural/blob/master/lib/natural/util/stopwords.js)
are stripped out from str before lookup.
Example:
```js
wordpos.getNouns('The angry bear chased the frightened little squirrel.', console.log)
// [ 'bear', 'squirrel', 'little', 'chased' ]
wordpos.getPOS('The angry bear chased the frightened little squirrel.', console.log)
// output:
{
nouns: [ 'bear', 'squirrel', 'little', 'chased' ],
verbs: [ 'bear' ],
adjectives: [ 'little', 'angry', 'frightened' ],
adverbs: [ 'little' ],
rest: [ 'the' ]
}
```
This has no relation to correct grammer of given sentence, where here only 'bear' and 'squirrel'
would be considered nouns. (see http://nltk.googlecode.com/svn/trunk/doc/book/ch08.html#ex-recnominals)
[pos-js](https://github.com/fortnightlabs/pos-js), e.g., shows only 'squirrel' as noun:
The / DT
angry / JJ
bear / VB
chased / VBN
the / DT
frightened / VBN
little / JJ
squirrel / NN
### isX()
Determine if a word is a particular POS.
```
wordpos.isNoun(word, callback) -- callback receives result (true/false) if word is a noun.
wordpos.isVerb(word, callback) -- callback receives result (true/false) if word is a verb.
wordpos.isAdjective(word, callback) -- callback receives result (true/false) if word is an adjective.
wordpos.isAdverb(word, callback) -- callback receives result (true/false) if word is an adverb.
```
Examples:
```js
wordpos.isVerb('fish', console.log);
// true
wordpos.isNoun('fish', console.log);
// true
wordpos.isAdjective('fishy', console.log);
// true
wordpos.isAdverb('fishly', console.log);
// false
```
### lookupX()
These calls are similar to natural's [lookup()](https://github.com/NaturalNode/natural#wordnet) call, except they can be faster if you
already know the POS of the word.
```
wordpos.lookupNoun(word, callback) -- callback receives array of lookup objects for a noun
wordpos.lookupVerb(word, callback) -- callback receives array of lookup objects for a verb
wordpos.lookupAdjective(word, callback) -- callback receives array of lookup objects for an adjective
wordpos.lookupAdverb(word, callback) -- callback receives array of lookup objects for an adverb
```
Example:
```js
wordpos.lookupAdjective('awesome', console.log);
// output:
[ { synsetOffset: 1282510,
lexFilenum: 0,
pos: 's',
wCnt: 5,
lemma: 'amazing',
synonyms: [ 'amazing', 'awe-inspiring', 'awesome', 'awful', 'awing' ],
lexId: '0',
ptrs: [],
gloss: 'inspiring awe or admiration or wonder; "New York is an amazing city"; "the Grand Canyon is an awe-inspiring
sight"; "the awesome complexity of the universe"; "this sea, whose gently awful stirrings seem to speak of some hidden s
oul beneath"- Melville; "Westminster Hall\'s awing majesty, so vast, so high, so silent" ' } ]
```
In this case only one lookup was found. But there could be several.
Or use WordNet's inherited method:
```js
wordpos.lookup('great', console.log);
// ...
```
Benchmark
----------
node wordpos-bench.js
Generally slow as it requires loading and searching large WordNet index files.
Single word lookup:
```
getPOS : 30 ops/s { iterations: 10, elapsed: 329 }
getNouns : 106 ops/s { iterations: 10, elapsed: 94 }
getVerbs : 111 ops/s { iterations: 10, elapsed: 90 }
getAdjectives : 132 ops/s { iterations: 10, elapsed: 76 }
getAdverbs : 137 ops/s { iterations: 10, elapsed: 73 }
```
128-word lookup:
```
getPOS : 0 ops/s { iterations: 1, elapsed: 2210 }
getNouns : 2 ops/s { iterations: 1, elapsed: 666 }
getVerbs : 2 ops/s { iterations: 1, elapsed: 638 }
getAdjectives : 2 ops/s { iterations: 1, elapsed: 489 }
getAdverbs : 2 ops/s { iterations: 1, elapsed: 407 }
```
On a win7/64-bit/dual-core/3GHz. getPOS() is slowest as it searches through all four index files.
There is probably room for optimization in the underlying library.
License
-------
(The MIT License)
Copyright (c) 2012, mooster@42at.com