Part-of-speech utilities for node.js based on the WordNet database.

wordpos

wordpos is a set of part-of-speech utilities for Node.js using natural's WordNet module.

There is no lexical or grammatical intelligence here (for that, see e.g. pos-js); this is dictionary lookup only.

Usage

var WordPOS = require('./wordpos'),
    wordpos = new WordPOS();

wordpos.getAdjectives('The angry bear chased the frightened little squirrel.', function(result){
    console.log(result);
});
// [ 'little', 'angry', 'frightened' ]

wordpos.isAdjective('awesome', function(result){
    console.log(result);
});
// true

See wordpos_spec.js for full usage.

Installation

Download the wordpos.js script and require it directly (an npm package may be coming), or use a git path in your package.json dependencies:

  ...
  "dependencies": {
    "wordpos": "git://github.com/moos/wordpos.git"
  },
  ...

As of version 0.1.1, the WordNet DB files are obtained offline through a dependency on the WNdb module.

Note: wordpos-bench.js requires a forked uubench module.

API

Please note: all API methods are asynchronous, since the underlying WordNet library is asynchronous.
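Because every method reports its result through a callback whose first argument is the result (not an error), Node's util.promisify, which expects error-first callbacks, does not fit directly. The following is a minimal sketch of a Promise wrapper under that assumption. The helper name toPromise and the fakeWordpos object are hypothetical and are not part of wordpos; the stub only exists so the sketch runs without the WordNet files.

```javascript
// Wrap a callback-style method (result passed as the first callback
// argument, as in the Usage examples) so that it returns a Promise.
// `toPromise` is a hypothetical helper, not part of wordpos.
function toPromise(obj, methodName) {
  return function (input) {
    return new Promise(function (resolve) {
      obj[methodName](input, resolve);
    });
  };
}

// Hypothetical stand-in so the sketch runs without WordNet files;
// in real use, pass a WordPOS instance instead.
var fakeWordpos = {
  isAdjective: function (word, callback) {
    setTimeout(function () { callback(word === 'awesome'); }, 0);
  }
};

var isAdjective = toPromise(fakeWordpos, 'isAdjective');
isAdjective('awesome').then(function (result) {
  console.log(result); // true
});
```

The same wrapper should apply to any method with the (input, callback) shape described in this section.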

WordPOS is a subclass of natural's WordNet class and inherits all its methods.

getX()

Get POS from text.

wordpos.getPOS(str, callback) -- callback receives a result object:
    {
      nouns:[],       Array of str words that are nouns
      verbs:[],       Array of str words that are verbs
      adjectives:[],  Array of str words that are adjectives
      adverbs:[],     Array of str words that are adverbs
      rest:[]         Array of str words that are not in dict or could not be categorized as a POS
    }

    Note: a word may appear in multiple POS (eg, 'great' is both a noun and an adjective)

wordpos.getNouns(str, callback) -- callback receives an array of nouns in str

wordpos.getVerbs(str, callback) -- callback receives an array of verbs in str

wordpos.getAdjectives(str, callback) -- callback receives an array of adjectives in str

wordpos.getAdverbs(str, callback) -- callback receives an array of adverbs in str

NB: If you're only interested in one POS (say, adjectives), the specific getX() call is faster than getPOS(), which looks up each word in all four index files.

NB: stopwords (https://github.com/NaturalNode/natural/blob/master/lib/natural/util/stopwords.js) are stripped out of str before lookup.

Example:

wordpos.getNouns('The angry bear chased the frightened little squirrel.', console.log)
// [ 'bear', 'squirrel', 'little', 'chased' ]

wordpos.getPOS('The angry bear chased the frightened little squirrel.', console.log)
// output:
  {
    nouns: [ 'bear', 'squirrel', 'little', 'chased' ],
    verbs: [ 'bear' ],
    adjectives: [ 'little', 'angry', 'frightened' ],
    adverbs: [ 'little' ],
    rest: [ 'the' ]
  }

These results have no relation to the grammar of the given sentence, in which only 'bear' and 'squirrel' would be considered nouns. (See http://nltk.googlecode.com/svn/trunk/doc/book/ch08.html#ex-recnominals)

pos-js, for example, tags only 'squirrel' as a noun:

The / DT
angry / JJ
bear / VB
chased / VBN
the / DT
frightened / VBN
little / JJ
squirrel / NN
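To make the contrast concrete, here is a toy sketch of what pure dictionary lookup amounts to. It is not the wordpos implementation: toyIndex and toyGetPOS are hypothetical names, the word lists are hand-picked to mirror the example output above, and the function is synchronous for brevity, whereas the real API is callback-based.

```javascript
// Toy stand-ins for the four WordNet index files (hypothetical data,
// chosen to mirror the example output above).
var toyIndex = {
  nouns:      ['bear', 'squirrel', 'little', 'chased'],
  verbs:      ['bear'],
  adjectives: ['little', 'angry', 'frightened'],
  adverbs:    ['little']
};

// Categorize each word by plain membership tests; no grammar is involved,
// so a word lands in every POS list whose index contains it.
function toyGetPOS(words) {
  var result = { nouns: [], verbs: [], adjectives: [], adverbs: [], rest: [] };
  words.forEach(function (word) {
    var found = false;
    Object.keys(toyIndex).forEach(function (pos) {
      if (toyIndex[pos].indexOf(word) !== -1) {
        result[pos].push(word);
        found = true;
      }
    });
    if (!found) result.rest.push(word); // not in any index
  });
  return result;
}

console.log(toyGetPOS(['angry', 'bear', 'chased', 'little', 'xyzzy']));
```

Here 'little' is reported as a noun, an adjective and an adverb at once, which is exactly the behaviour a context-aware tagger such as pos-js avoids.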

isX()

Determine if a word is a particular POS.

wordpos.isNoun(word, callback) -- callback receives true if word is a noun, false otherwise.

wordpos.isVerb(word, callback) -- callback receives true if word is a verb, false otherwise.

wordpos.isAdjective(word, callback) -- callback receives true if word is an adjective, false otherwise.

wordpos.isAdverb(word, callback) -- callback receives true if word is an adverb, false otherwise.

Examples:

wordpos.isVerb('fish', console.log);
// true

wordpos.isNoun('fish', console.log);
// true

wordpos.isAdjective('fishy', console.log);
// true

wordpos.isAdverb('fishly', console.log);
// false
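Since a word can belong to several POS, it can be handy to run all four checks on one word and gather the matches. The sketch below does this with a simple pending-callback counter. posTagsOf and the stub object are hypothetical; the stub merely imitates the (word, callback) signature documented above so the example runs without WordNet files.

```javascript
// Run the four isX() checks on one word and report every POS that matched.
// `checker` is any object exposing isNoun/isVerb/isAdjective/isAdverb
// with the (word, callback) signature documented above.
function posTagsOf(checker, word, done) {
  var methods = { isNoun: 'noun', isVerb: 'verb', isAdjective: 'adjective', isAdverb: 'adverb' };
  var names = Object.keys(methods);
  var tags = [];
  var pending = names.length;
  names.forEach(function (name) {
    checker[name](word, function (result) {
      if (result) tags.push(methods[name]);
      // Call back once, after the last of the four checks has answered.
      if (--pending === 0) done(tags.sort());
    });
  });
}

// Hypothetical stub standing in for a WordPOS instance.
var stub = {
  isNoun:      function (w, cb) { cb(w === 'fish'); },
  isVerb:      function (w, cb) { cb(w === 'fish'); },
  isAdjective: function (w, cb) { cb(w === 'fishy'); },
  isAdverb:    function (w, cb) { cb(false); }
};

posTagsOf(stub, 'fish', console.log);
// [ 'noun', 'verb' ]
```

The tags are sorted before being returned because, with a truly asynchronous checker, the four callbacks may arrive in any order.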

lookupX()

These calls are similar to natural's lookup() call, except they can be faster if you already know the POS of the word.

wordpos.lookupNoun(word, callback) -- callback receives array of lookup objects for a noun

wordpos.lookupVerb(word, callback) -- callback receives array of lookup objects for a verb

wordpos.lookupAdjective(word, callback) -- callback receives array of lookup objects for an adjective

wordpos.lookupAdverb(word, callback) -- callback receives array of lookup objects for an adverb

Example:

wordpos.lookupAdjective('awesome', console.log);
// output:
[ { synsetOffset: 1282510,
    lexFilenum: 0,
    pos: 's',
    wCnt: 5,
    lemma: 'amazing',
    synonyms: [ 'amazing', 'awe-inspiring', 'awesome', 'awful', 'awing' ],
    lexId: '0',
    ptrs: [],
    gloss: 'inspiring awe or admiration or wonder; "New York is an amazing city"; "the Grand Canyon is an awe-inspiring sight"; "the awesome complexity of the universe"; "this sea, whose gently awful stirrings seem to speak of some hidden soul beneath"- Melville; "Westminster Hall\'s awing majesty, so vast, so high, so silent"  ' } ]

In this case only one lookup object was returned, but there could be several.
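Because the callback receives an array, code that consumes it should loop over every entry rather than read only the first. As a small sketch, the helper below gathers the distinct synonyms across all lookup objects. collectSynonyms is a hypothetical name, and the sample data is a trimmed copy of the output shown above rather than a live lookup.

```javascript
// Sample shaped like the lookupAdjective('awesome') output above
// (gloss and other fields omitted for brevity).
var results = [
  { synsetOffset: 1282510, pos: 's', lemma: 'amazing',
    synonyms: ['amazing', 'awe-inspiring', 'awesome', 'awful', 'awing'] }
];

// Collect the distinct synonyms from every lookup object,
// leaving out the word that was looked up.
function collectSynonyms(lookupResults, word) {
  var seen = Object.create(null); // no inherited keys to collide with
  var out = [];
  lookupResults.forEach(function (entry) {
    entry.synonyms.forEach(function (syn) {
      if (syn !== word && !seen[syn]) {
        seen[syn] = true;
        out.push(syn);
      }
    });
  });
  return out;
}

console.log(collectSynonyms(results, 'awesome'));
// [ 'amazing', 'awe-inspiring', 'awful', 'awing' ]
```

With a real instance, the same function could be called from inside the lookupAdjective() callback.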

Or use WordNet's inherited method:

wordpos.lookup('great', console.log);
// ...

Benchmark

Lookups are generally slow because they require loading and searching large WordNet index files.

Single word lookup:

  getPOS : 30 ops/s { iterations: 10, elapsed: 329 }
  getNouns : 106 ops/s { iterations: 10, elapsed: 94 }
  getVerbs : 111 ops/s { iterations: 10, elapsed: 90 }
  getAdjectives : 132 ops/s { iterations: 10, elapsed: 76 }
  getAdverbs : 137 ops/s { iterations: 10, elapsed: 73 }

128-word lookup:

  getPOS : 0 ops/s { iterations: 1, elapsed: 2210 }
  getNouns : 2 ops/s { iterations: 1, elapsed: 666 }
  getVerbs : 2 ops/s { iterations: 1, elapsed: 638 }
  getAdjectives : 2 ops/s { iterations: 1, elapsed: 489 }
  getAdverbs : 2 ops/s { iterations: 1, elapsed: 407 }

Measured on a Windows 7, 64-bit, dual-core 3 GHz machine. getPOS() is slowest because it searches through all four index files.
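The ops/s figures are consistent with the rounded ratio of iterations to elapsed time, assuming elapsed is reported in milliseconds (an assumption, since the unit is not stated). This also explains the "0 ops/s" entry: one iteration in 2210 ms is about 0.45 ops/s, which rounds to 0. A short check of that arithmetic, with opsPerSecond as a hypothetical helper name:

```javascript
// ops/s = iterations / elapsed seconds, rounded to the nearest integer.
// Assumes `elapsed` in the benchmark output is in milliseconds.
function opsPerSecond(iterations, elapsedMs) {
  return Math.round(iterations / (elapsedMs / 1000));
}

console.log(opsPerSecond(10, 329));  // 30  (single-word getPOS)
console.log(opsPerSecond(10, 94));   // 106 (single-word getNouns)
console.log(opsPerSecond(1, 2210));  // 0   (128-word getPOS, about 0.45 ops/s)
console.log(opsPerSecond(1, 666));   // 2   (128-word getNouns, about 1.5 ops/s)
```

Read this way, the elapsed column is the more informative one for the 128-word runs, where the rounded ops/s values are too coarse to compare.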

There is probably room for optimization in the underlying library.

License

(The MIT License)

Copyright (c) 2012, mooster@42at.com