Part-of-speech utilities for node.js based on the WordNet database.

wordpos

wordpos is a set of part-of-speech utilities for Node.js using natural's WordNet module.

There is no lexicographical intelligence here (for that, see e.g. pos-js), only dictionary lookup.

Usage

var WordPOS = require('./wordpos'),
    wordpos = new WordPOS();

wordpos.getAdjectives('The angry bear chased the frightened little squirrel.', function(result){
    console.log(result);
});
// [ 'little', 'angry', 'frightened' ]

wordpos.isAdjective('awesome', function(result){
    console.log(result);
});
// true

See wordpos_spec.js for full usage.

Installation

Get the script wordpos.js and use it. (npm package may be coming.)

Or use a git path in your package.json dependencies:

  ...
  "dependencies": {
    "wordpos": "git://github.com/moos/wordpos.git"
  },
  ...

As of version 0.1.1, the WordNet DB files are obtained offline through the WNdb module, which is pulled in as a dependency.

Note: wordpos-bench.js requires a forked uubench module.

API

Please note: all API methods are asynchronous, since the underlying WordNet library is asynchronous.
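
As the examples in this README show, the callbacks receive the result as their only argument rather than in Node's error-first form, so a generic error-first promisifier would treat a truthy result as an error. A minimal sketch of a Promise wrapper, assuming a runtime with native Promise support; `toPromise` is a hypothetical helper, not part of wordpos:

```javascript
// Hypothetical helper (not part of wordpos): wraps a method whose callback
// receives the result as its single argument and returns a Promise instead.
function toPromise(obj, methodName) {
  return function () {
    var args = Array.prototype.slice.call(arguments);
    return new Promise(function (resolve) {
      // Append `resolve` as the callback argument of the wrapped method.
      obj[methodName].apply(obj, args.concat([resolve]));
    });
  };
}

// Usage, assuming `wordpos` is an instance created as in the Usage section:
//   var isAdjective = toPromise(wordpos, 'isAdjective');
//   isAdjective('awesome').then(console.log);
```

The wrapper never rejects, because the callback style shown here has no error channel to forward.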

WordPOS is a subclass of natural's WordNet class and inherits all its methods.

getX()

Get POS from text.

wordpos.getPOS(str, callback) -- callback receives a result object:
    {
      nouns:[],       // Array of str words that are nouns
      verbs:[],       // Array of str words that are verbs
      adjectives:[],  // Array of str words that are adjectives
      adverbs:[],     // Array of str words that are adverbs
      rest:[]         // Array of str words that are not in dict or could not be categorized as a POS
    }

    Note: a word may appear in multiple POS (eg, 'great' is both a noun and an adjective)
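
Because a word can be listed under several POS, it can help to see which words are ambiguous. A minimal sketch that works on the result object alone; `ambiguousWords` is a hypothetical helper, not part of wordpos:

```javascript
// Hypothetical helper (not part of wordpos): maps each word that appears in
// more than one POS array of a getPOS() result to the list of those arrays.
function ambiguousWords(result) {
  var categories = ['nouns', 'verbs', 'adjectives', 'adverbs'];
  var seen = Object.create(null); // no inherited keys such as 'constructor'

  categories.forEach(function (category) {
    (result[category] || []).forEach(function (word) {
      (seen[word] = seen[word] || []).push(category);
    });
  });

  var ambiguous = Object.create(null);
  Object.keys(seen).forEach(function (word) {
    if (seen[word].length > 1) {
      ambiguous[word] = seen[word];
    }
  });
  return ambiguous;
}

// Usage, assuming `wordpos` is an instance created as in the Usage section:
//   wordpos.getPOS('The angry bear chased the frightened little squirrel.',
//     function (result) { console.log(ambiguousWords(result)); });
```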

wordpos.getNouns(str, callback) -- callback receives an array of nouns in str

wordpos.getVerbs(str, callback) -- callback receives an array of verbs in str

wordpos.getAdjectives(str, callback) -- callback receives an array of adjectives in str

wordpos.getAdverbs(str, callback) -- callback receives an array of adverbs in str

NB: If you're only interested in a certain POS (say, adjectives), using the particular getX() is faster than getPOS(), which looks up each word in all four index files.

NB: [stopwords](https://github.com/NaturalNode/natural/blob/master/lib/natural/util/stopwords.js) are stripped out from str before lookup.

Example:

wordpos.getNouns('The angry bear chased the frightened little squirrel.', console.log)
// [ 'bear', 'squirrel', 'little', 'chased' ]

wordpos.getPOS('The angry bear chased the frightened little squirrel.', console.log)
// output:
  {
    nouns: [ 'bear', 'squirrel', 'little', 'chased' ],
    verbs: [ 'bear' ],
    adjectives: [ 'little', 'angry', 'frightened' ],
    adverbs: [ 'little' ],
    rest: [ 'the' ]
  }

This has no relation to the correct grammar of the given sentence, in which only 'bear' and 'squirrel' would be considered nouns. (See http://nltk.googlecode.com/svn/trunk/doc/book/ch08.html#ex-recnominals)

pos-js, for example, tags only 'squirrel' as a noun:

The / DT
angry / JJ
bear / VB
chased / VBN
the / DT
frightened / VBN
little / JJ
squirrel / NN

isX()

Determine if a word is a particular POS.

wordpos.isNoun(word, callback) -- callback receives true if word is a noun, false otherwise.

wordpos.isVerb(word, callback) -- callback receives true if word is a verb, false otherwise.

wordpos.isAdjective(word, callback) -- callback receives true if word is an adjective, false otherwise.

wordpos.isAdverb(word, callback) -- callback receives true if word is an adverb, false otherwise.

Examples:

wordpos.isVerb('fish', console.log);
// true

wordpos.isNoun('fish', console.log);
// true

wordpos.isAdjective('fishy', console.log);
// true

wordpos.isAdverb('fishly', console.log);
// false
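
To find every POS a single word belongs to, the four isX() calls can be combined with a simple pending counter. This is a minimal sketch; `getWordPOS` is a hypothetical helper, not part of wordpos, and it takes the wordpos instance as a parameter:

```javascript
// Hypothetical helper (not part of wordpos): asks all four isX() methods
// about one word and reports the matching POS names once all have answered.
function getWordPOS(wordpos, word, callback) {
  var checks = {
    noun: 'isNoun',
    verb: 'isVerb',
    adjective: 'isAdjective',
    adverb: 'isAdverb'
  };
  var names = Object.keys(checks);
  var pending = names.length;
  var flags = {};

  names.forEach(function (name) {
    wordpos[checks[name]](word, function (result) {
      flags[name] = !!result;
      pending -= 1;
      if (pending === 0) {
        // Report in a fixed order, regardless of which lookup finished first.
        callback(names.filter(function (n) { return flags[n]; }));
      }
    });
  });
}

// Usage, assuming `wordpos` is an instance created as in the Usage section:
//   getWordPOS(wordpos, 'fish', console.log);
```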

lookupX()

These calls are similar to natural's lookup() call, except they can be faster if you already know the POS of the word.

wordpos.lookupNoun(word, callback) -- callback receives array of lookup objects for a noun

wordpos.lookupVerb(word, callback) -- callback receives array of lookup objects for a verb

wordpos.lookupAdjective(word, callback) -- callback receives array of lookup objects for an adjective

wordpos.lookupAdverb(word, callback) -- callback receives array of lookup objects for an adverb

Example:

wordpos.lookupAdjective('awesome', console.log);
// output:
[ { synsetOffset: 1282510,
    lexFilenum: 0,
    pos: 's',
    wCnt: 5,
    lemma: 'amazing',
    synonyms: [ 'amazing', 'awe-inspiring', 'awesome', 'awful', 'awing' ],
    lexId: '0',
    ptrs: [],
    gloss: 'inspiring awe or admiration or wonder; "New York is an amazing city"; "the Grand Canyon is an awe-inspiring sight"; "the awesome complexity of the universe"; "this sea, whose gently awful stirrings seem to speak of some hidden soul beneath"- Melville; "Westminster Hall\'s awing majesty, so vast, so high, so silent"  ' } ]

In this case only one lookup result was found, but there could be several.

Or use WordNet's inherited method:

wordpos.lookup('great', console.log);
// ...
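
The lookup objects carry more detail than is often needed. A minimal sketch that reduces them to lemma, synonyms, and a short definition, using only the fields shown in the output above; `summarizeLookup` is a hypothetical helper, not part of wordpos, and the way it splits the gloss is an assumption:

```javascript
// Hypothetical helper (not part of wordpos): reduces each lookup object to
// its lemma, synonyms, and the definition part of the gloss.
function summarizeLookup(results) {
  return results.map(function (entry) {
    // Assumption: the definition is the gloss text before the first ';',
    // with the quoted usage examples following it.
    var definition = String(entry.gloss || '').split(';')[0].trim();
    return {
      lemma: entry.lemma,
      synonyms: entry.synonyms || [],
      definition: definition
    };
  });
}

// Usage, assuming `wordpos` is an instance created as in the Usage section:
//   wordpos.lookupAdjective('awesome', function (results) {
//     console.log(summarizeLookup(results));
//   });
```

A definition that itself contains a semicolon would be cut short by this split, so treat the result as a summary only.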

Benchmark

wordpos is generally slow, as it requires loading and searching large WordNet index files.

Single word lookup:

  getPOS : 30 ops/s { iterations: 10, elapsed: 329 }
  getNouns : 106 ops/s { iterations: 10, elapsed: 94 }
  getVerbs : 111 ops/s { iterations: 10, elapsed: 90 }
  getAdjectives : 132 ops/s { iterations: 10, elapsed: 76 }
  getAdverbs : 137 ops/s { iterations: 10, elapsed: 73 }

128-word lookup:

  getPOS : 0 ops/s { iterations: 1, elapsed: 2210 }
  getNouns : 2 ops/s { iterations: 1, elapsed: 666 }
  getVerbs : 2 ops/s { iterations: 1, elapsed: 638 }
  getAdjectives : 2 ops/s { iterations: 1, elapsed: 489 }
  getAdverbs : 2 ops/s { iterations: 1, elapsed: 407 }

Measured on a Windows 7, 64-bit, dual-core, 3 GHz machine. getPOS() is slowest as it searches through all four index files.
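
The ops/s figures are consistent with elapsed being in milliseconds and the rate being rounded to the nearest integer, which would explain why the 128-word getPOS run shows 0 ops/s. A quick check of that reading, which is inferred from the numbers above and not taken from the benchmark script; `opsPerSecond` is a hypothetical helper:

```javascript
// Hypothetical helper: operations per second from an iteration count and an
// elapsed time in milliseconds, rounded to the nearest integer.
function opsPerSecond(iterations, elapsedMs) {
  return Math.round(iterations / (elapsedMs / 1000));
}

console.log(opsPerSecond(10, 329)); // 30  (single word getPOS)
console.log(opsPerSecond(1, 2210)); // 0   (128-word getPOS)
```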

There is probably room for optimization in the underlying library.
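
Until then, callers that look up the same words repeatedly can cache results on their side. A minimal sketch, not an optimization of the library itself; `memoizeWordMethod` is a hypothetical helper, not part of wordpos:

```javascript
// Hypothetical helper (not part of wordpos): caches the result of a
// single-word method such as isNoun() so repeated words skip the lookup.
function memoizeWordMethod(wordpos, methodName) {
  var cache = Object.create(null); // no inherited keys such as 'constructor'

  return function (word, callback) {
    if (word in cache) {
      // Stay asynchronous even on a cache hit, matching the original API.
      setTimeout(function () { callback(cache[word]); }, 0);
      return;
    }
    wordpos[methodName](word, function (result) {
      cache[word] = result;
      callback(result);
    });
  };
}

// Usage, assuming `wordpos` is an instance created as in the Usage section:
//   var isNounCached = memoizeWordMethod(wordpos, 'isNoun');
//   isNounCached('fish', console.log);
```

Two calls for the same word made before the first one completes will both reach the index files; the cache only helps once a result has arrived.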

License

(The MIT License)