Major update - first v1.0 check-in, with support for Promises, removal of the natural dependency, and more.
parent 2001182b7a
commit b27c49fd01
|
@ -3,5 +3,3 @@ node_js:
|
||||||
- '5'
|
- '5'
|
||||||
- '4'
|
- '4'
|
||||||
- '0.12'
|
- '0.12'
|
||||||
before_script:
|
|
||||||
- npm install -g jasmine-node
|
|
107
README.md
|
@ -6,7 +6,7 @@ wordpos
|
||||||
|
|
||||||
wordpos is a set of *fast* part-of-speech (POS) utilities for Node.js using fast lookup in the WordNet database.
|
wordpos is a set of *fast* part-of-speech (POS) utilities for Node.js using fast lookup in the WordNet database.
|
||||||
|
|
||||||
Version 1.x is a mojor update with no direct depedence on [natural's](http://github.com/NaturalNode/natural), with support for Promises, and roughly 5x speed improvement over previous version.
|
Version 1.x is a major update with no direct dependence on [natural's](http://github.com/NaturalNode/natural), with support for [Promises](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise), and roughly 5x speed improvement over previous version.
|
||||||
|
|
||||||
**CAUTION** The WordNet database [wordnet-db](https://github.com/moos/wordnet-db) comprises [155,287 words](http://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html) (3.0 numbers) which uncompress to over **30 MB** of data in several *un*[browserify](https://github.com/substack/node-browserify)-able files. It is *not* meant for the browser environment.
|
**CAUTION** The WordNet database [wordnet-db](https://github.com/moos/wordnet-db) comprises [155,287 words](http://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html) (3.0 numbers) which uncompress to over **30 MB** of data in several *un*[browserify](https://github.com/substack/node-browserify)-able files. It is *not* meant for the browser environment.
|
||||||
|
|
||||||
|
@ -104,7 +104,7 @@ wordpos.getPOS(text, callback) -- callback receives a result object:
|
||||||
```
|
```
|
||||||
|
|
||||||
If you're only interested in a certain POS (say, adjectives), using the particular getX() is faster
|
If you're only interested in a certain POS (say, adjectives), using the particular getX() is faster
|
||||||
than getPOS() which looks up the word in all index files. [stopwords](https://github.com/moos/wordpos/lib/natural/util/stopwords.js)are stripped out from text before lookup.
|
than getPOS() which looks up the word in all index files. [stopwords](lib/natural/util/stopwords.js) are stripped out from text before lookup.
|
||||||
|
|
||||||
If `text` is an *array*, all words are looked-up -- no deduplication, stopword filtering or tokenization is applied.
|
If `text` is an *array*, all words are looked-up -- no deduplication, stopword filtering or tokenization is applied.
|
||||||
|
|
||||||
|
@ -127,8 +127,7 @@ wordpos.getPOS('The angry bear chased the frightened little squirrel.', console.
|
||||||
}
|
}
|
||||||
|
|
||||||
```
|
```
|
||||||
This has no relation to correct grammar of given sentence, where here only 'bear' and 'squirrel'
|
This has no relation to correct grammar of given sentence, where here only 'bear' and 'squirrel' would be considered nouns.
|
||||||
would be considered nouns.
|
|
||||||
|
|
||||||
#### isNoun(word, callback)
|
#### isNoun(word, callback)
|
||||||
#### isVerb(word, callback)
|
#### isVerb(word, callback)
|
||||||
|
@ -228,7 +227,33 @@ Access the array of stopwords.
|
||||||
|
|
||||||
## Promises
|
## Promises
|
||||||
|
|
||||||
TODO
|
As of v1.0, all `get`, `is`, `rand`, and `lookup` methods return a standard ES6 [Promise](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise).
|
||||||
|
|
||||||
|
```js
|
||||||
|
wordpos.isVerb('fish').then(console.log);
|
||||||
|
// true
|
||||||
|
```
|
||||||
|
|
||||||
|
Compound, with error handler:
|
||||||
|
|
||||||
|
```js
|
||||||
|
wordpos.isVerb('fish')
|
||||||
|
.then(console.log)
|
||||||
|
.then(doSomethingElse)
|
||||||
|
.catch(console.error);
|
||||||
|
```
|
||||||
|
|
||||||
|
Callbacks, if given, are executed _before_ the Promise is resolved.
|
||||||
|
|
||||||
|
```js
|
||||||
|
wordpos.isVerb('fish', console.log)
|
||||||
|
.then(console.log)
|
||||||
|
.catch(console.error);
|
||||||
|
// true 'fish' 13
|
||||||
|
// true
|
||||||
|
```
|
||||||
|
Note that the callback receives the full arguments (including profile, if enabled), while the Promise receives only the result of the call. Also, beware that an exception thrown in the _callback_ causes the Promise to be _rejected_, so it is caught by `catch()`, if provided.
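The ordering and error behavior described above can be sketched without the library. This is a minimal illustration, not wordpos's implementation: `fakeIsVerb` and its toy word list are hypothetical stand-ins, and the only behavior assumed is what this section states (callback first with full arguments, Promise fulfilled with the result only, callback exceptions turning into rejections).

```js
// Hypothetical stand-in for a wordpos-style method; NOT the library's code.
function fakeIsVerb(word, callback) {
  var verbs = { fish: true, run: true }; // toy data, not WordNet

  return Promise.resolve(Boolean(verbs[word])).then(function (result) {
    // The callback runs first and receives the full argument list...
    if (callback) callback(result, word);
    // ...while the Promise is fulfilled with the result only.
    return result;
  });
}

var log = [];

// Case 1: callback and then() both fire, callback first.
var ok = fakeIsVerb('fish', function (result, word) {
  log.push('callback: ' + result + ' ' + word);
}).then(function (result) {
  log.push('then: ' + result);
});

// Case 2: an exception in the callback rejects the Promise.
var failed = fakeIsVerb('fish', function () {
  throw new Error('boom in callback');
}).then(function () {
  log.push('then: not reached');
}).catch(function (err) {
  log.push('catch: ' + err.message);
});

// Resolves with the collected log once both cases have settled.
var demo = Promise.all([ok, failed]).then(function () {
  return log;
});

demo.then(function (entries) {
  console.log(entries.join('\n'));
});
```

In the second case the `then()` handler is skipped and control goes straight to `catch()`, which is why a throwing callback can look like a failed lookup.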
|
||||||
|
|
||||||
|
|
||||||
## Fast Index
|
## Fast Index
|
||||||
|
|
||||||
|
@ -236,7 +261,7 @@ Version 0.1.4 introduces `fastIndex` option. This uses a secondary index on the
|
||||||
|
|
||||||
Fast index improves performance **30x** over Natural's native methods. See blog article [Optimizing WordPos](http://blog.42at.com/optimizing-wordpos).
|
Fast index improves performance **30x** over Natural's native methods. See blog article [Optimizing WordPos](http://blog.42at.com/optimizing-wordpos).
|
||||||
|
|
||||||
As of version 1.0, the fast index option is always on and cannot be turned off.
|
As of version 1.0, fast index is always on and cannot be turned off.
|
||||||
|
|
||||||
## Command-line: CLI
|
## Command-line: CLI
|
||||||
|
|
||||||
|
@ -245,73 +270,15 @@ For CLI usage and examples, see [bin/README](bin).
|
||||||
|
|
||||||
## Benchmark
|
## Benchmark
|
||||||
|
|
||||||
Note: `wordpos-bench.js` requires a [forked uubench](https://github.com/moos/uubench) module.
|
See [benchmark](benchmark/README).
|
||||||
|
|
||||||
cd bench
|
|
||||||
node wordpos-bench.js
|
|
||||||
|
|
||||||
|
|
||||||
512-word corpus (< v0.1.4, comparable to Natural) :
|
|
||||||
```
|
|
||||||
getPOS : 0 ops/s { iterations: 1, elapsed: 9039 }
|
|
||||||
getNouns : 0 ops/s { iterations: 1, elapsed: 2347 }
|
|
||||||
getVerbs : 0 ops/s { iterations: 1, elapsed: 2434 }
|
|
||||||
getAdjectives : 1 ops/s { iterations: 1, elapsed: 1698 }
|
|
||||||
getAdverbs : 0 ops/s { iterations: 1, elapsed: 2698 }
|
|
||||||
done in 20359 msecs
|
|
||||||
```
|
|
||||||
|
|
||||||
512-word corpus (as of v0.1.4, with fastIndex) :
|
|
||||||
```
|
|
||||||
getPOS : 18 ops/s { iterations: 1, elapsed: 57 }
|
|
||||||
getNouns : 48 ops/s { iterations: 1, elapsed: 21 }
|
|
||||||
getVerbs : 125 ops/s { iterations: 1, elapsed: 8 }
|
|
||||||
getAdjectives : 111 ops/s { iterations: 1, elapsed: 9 }
|
|
||||||
getAdverbs : 143 ops/s { iterations: 1, elapsed: 7 }
|
|
||||||
done in 1375 msecs
|
|
||||||
```
|
|
||||||
|
|
||||||
220 words are looked-up (less stopwords and duplicates) on a win7/64-bit/dual-core/3GHz. getPOS() is slowest as it searches through all four index files.
|
|
||||||
|
|
||||||
### Version 1.0 Benchmark
|
|
||||||
|
|
||||||
Re-run v0.1.16:
|
|
||||||
```
|
|
||||||
getPOS : 11 ops/s { iterations: 1, elapsed: 90 }
|
|
||||||
getNouns : 21 ops/s { iterations: 1, elapsed: 47 }
|
|
||||||
getVerbs : 53 ops/s { iterations: 1, elapsed: 19 }
|
|
||||||
getAdjectives : 29 ops/s { iterations: 1, elapsed: 34 }
|
|
||||||
getAdverbs : 83 ops/s { iterations: 1, elapsed: 12 }
|
|
||||||
lookup : 1 ops/s { iterations: 1, elapsed: 720 }
|
|
||||||
lookupNoun : 1 ops/s { iterations: 1, elapsed: 676 }
|
|
||||||
|
|
||||||
looked up 220 words
|
|
||||||
done in 2459 msecs
|
|
||||||
```
|
|
||||||
|
|
||||||
V1.0:
|
|
||||||
```
|
|
||||||
getPOS : 14 ops/s { iterations: 1, elapsed: 73 }
|
|
||||||
getNouns : 26 ops/s { iterations: 1, elapsed: 38 }
|
|
||||||
getVerbs : 42 ops/s { iterations: 1, elapsed: 24 }
|
|
||||||
getAdjectives : 24 ops/s { iterations: 1, elapsed: 42 }
|
|
||||||
getAdverbs : 26 ops/s { iterations: 1, elapsed: 38 }
|
|
||||||
lookup : 6 ops/s { iterations: 1, elapsed: 159 }
|
|
||||||
lookupNoun : 13 ops/s { iterations: 1, elapsed: 77 }
|
|
||||||
|
|
||||||
looked up 221 words
|
|
||||||
done in 1274 msecs
|
|
||||||
```
|
|
||||||
That's roughly **2x** better across the board. Functions that read the data files see much improved performance: `lookup` about **5x** and `lookupNoun` over **8x**.
|
|
||||||
|
|
||||||
|
|
||||||
## Changes
|
## Changes
|
||||||
|
|
||||||
1.0.1
|
1.0.0
|
||||||
- Removed direct dependency on Natural. Certain modules are included in /lib.
|
- Removed npm dependency on Natural. Certain modules are included in /lib.
|
||||||
- Add support for Promises.
|
- Add support for ES6 Promises.
|
||||||
- Improved data file reads for up to **5x** performance increase.
|
- Improved data file reads for up to **5x** performance increase compared to previous version.
|
||||||
- Tests are now mocha-based with assert interface.
|
- Tests are now [mocha](https://mochajs.org/)-based with [chai](http://chaijs.com/) assert interface.
|
||||||
|
|
||||||
0.1.16
|
0.1.16
|
||||||
- Changed dependency to wordnet-db (renamed from WNdb)
|
- Changed dependency to wordnet-db (renamed from WNdb)
|
||||||
|
|
|
@ -0,0 +1,80 @@
|
||||||
|
## Benchmark
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd bench
|
||||||
|
node wordpos-bench.js
|
||||||
|
```
|
||||||
|
|
||||||
|
### Version 1.0 Benchmark
|
||||||
|
|
||||||
|
The following benchmarks were run on a Win8.1/Core i7/3.5GHz machine on a Seagate 500GB SATA II, 7200 RPM disk. The corpus was a 512-word text, with stopwords and duplicates removed, resulting in 220 words looked-up.
|
||||||
|
|
||||||
|
#### Pre v0.1.4 (comparable to Natural)
|
||||||
|
```
|
||||||
|
getPOS : 1 ops/s { iterations: 1, elapsed: 1514 }
|
||||||
|
getNouns : 2 ops/s { iterations: 1, elapsed: 409 }
|
||||||
|
getVerbs : 2 ops/s { iterations: 1, elapsed: 418 }
|
||||||
|
getAdjectives : 3 ops/s { iterations: 1, elapsed: 332 }
|
||||||
|
getAdverbs : 4 ops/s { iterations: 1, elapsed: 272 }
|
||||||
|
lookup : 1 ops/s { iterations: 1, elapsed: 1981 }
|
||||||
|
lookupNoun : 0 ops/s { iterations: 1, elapsed: 2016 }
|
||||||
|
|
||||||
|
looked up 220 words
|
||||||
|
done in 7770 msecs
|
||||||
|
```
|
||||||
|
|
||||||
|
#### v0.1.16 (with fastIndex):
|
||||||
|
```
|
||||||
|
getPOS : 11 ops/s { iterations: 1, elapsed: 90 }
|
||||||
|
getNouns : 21 ops/s { iterations: 1, elapsed: 47 }
|
||||||
|
getVerbs : 53 ops/s { iterations: 1, elapsed: 19 }
|
||||||
|
getAdjectives : 29 ops/s { iterations: 1, elapsed: 34 }
|
||||||
|
getAdverbs : 83 ops/s { iterations: 1, elapsed: 12 }
|
||||||
|
lookup : 1 ops/s { iterations: 1, elapsed: 720 }
|
||||||
|
lookupNoun : 1 ops/s { iterations: 1, elapsed: 676 }
|
||||||
|
|
||||||
|
looked up 220 words
|
||||||
|
done in 2459 msecs
|
||||||
|
```
|
||||||
|
|
||||||
|
#### v1.0:
|
||||||
|
```
|
||||||
|
getPOS : 14 ops/s { iterations: 1, elapsed: 73 }
|
||||||
|
getNouns : 26 ops/s { iterations: 1, elapsed: 38 }
|
||||||
|
getVerbs : 42 ops/s { iterations: 1, elapsed: 24 }
|
||||||
|
getAdjectives : 24 ops/s { iterations: 1, elapsed: 42 }
|
||||||
|
getAdverbs : 26 ops/s { iterations: 1, elapsed: 38 }
|
||||||
|
lookup : 6 ops/s { iterations: 1, elapsed: 159 }
|
||||||
|
lookupNoun : 13 ops/s { iterations: 1, elapsed: 77 }
|
||||||
|
|
||||||
|
looked up 221 words
|
||||||
|
done in 1274 msecs
|
||||||
|
```
|
||||||
|
|
||||||
|
Overall, v1.0 is **3.5x** faster than v0.1.16 and **15x** faster than pre v0.1.4. Functions that read the data files see the largest gains: `lookup` is about **13x** faster and `lookupNoun` **26x** faster than pre v0.1.4.
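As a worked check, the ratios can be recomputed from the elapsed times listed in the three runs above. The sketch assumes the overall figures are ratios of the summed per-test elapsed times; that aggregation is an assumption, since the text does not say how they were derived. The variable names are illustrative and not part of `wordpos-bench.js`.

```js
// Elapsed times in msecs, copied from the three benchmark runs above.
var runs = {
  // oldest run, before fastIndex (comparable to Natural)
  preFastIndex: { getPOS: 1514, getNouns: 409, getVerbs: 418, getAdjectives: 332,
                  getAdverbs: 272, lookup: 1981, lookupNoun: 2016 },
  v0116:        { getPOS: 90, getNouns: 47, getVerbs: 19, getAdjectives: 34,
                  getAdverbs: 12, lookup: 720, lookupNoun: 676 },
  v100:         { getPOS: 73, getNouns: 38, getVerbs: 24, getAdjectives: 42,
                  getAdverbs: 38, lookup: 159, lookupNoun: 77 }
};

// Sum of all per-test elapsed times in one run.
function total(run) {
  return Object.keys(run).reduce(function (sum, name) {
    return sum + run[name];
  }, 0);
}

// Speedup = old elapsed time / new elapsed time.
function speedup(oldMs, newMs) {
  return oldMs / newMs;
}

var overallVsV0116 = speedup(total(runs.v0116), total(runs.v100));
var overallVsPre = speedup(total(runs.preFastIndex), total(runs.v100));
var lookupVsPre = speedup(runs.preFastIndex.lookup, runs.v100.lookup);
var lookupNounVsPre = speedup(runs.preFastIndex.lookupNoun, runs.v100.lookupNoun);

console.log(
  overallVsV0116.toFixed(1),
  overallVsPre.toFixed(1),
  lookupVsPre.toFixed(1),
  lookupNounVsPre.toFixed(1)
);
// prints: 3.5 15.4 12.5 26.2
```

The `done in` totals give smaller ratios than these, presumably because they include time spent outside the timed tests.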
|
||||||
|
|
||||||
|
|
||||||
|
### Old benchmark
|
||||||
|
|
||||||
|
512-word corpus (< v0.1.4, comparable to Natural) :
|
||||||
|
```
|
||||||
|
getPOS : 0 ops/s { iterations: 1, elapsed: 9039 }
|
||||||
|
getNouns : 0 ops/s { iterations: 1, elapsed: 2347 }
|
||||||
|
getVerbs : 0 ops/s { iterations: 1, elapsed: 2434 }
|
||||||
|
getAdjectives : 1 ops/s { iterations: 1, elapsed: 1698 }
|
||||||
|
getAdverbs : 0 ops/s { iterations: 1, elapsed: 2698 }
|
||||||
|
done in 20359 msecs
|
||||||
|
```
|
||||||
|
|
||||||
|
512-word corpus (as of v0.1.4, with fastIndex) :
|
||||||
|
```
|
||||||
|
getPOS : 18 ops/s { iterations: 1, elapsed: 57 }
|
||||||
|
getNouns : 48 ops/s { iterations: 1, elapsed: 21 }
|
||||||
|
getVerbs : 125 ops/s { iterations: 1, elapsed: 8 }
|
||||||
|
getAdjectives : 111 ops/s { iterations: 1, elapsed: 9 }
|
||||||
|
getAdverbs : 143 ops/s { iterations: 1, elapsed: 7 }
|
||||||
|
done in 1375 msecs
|
||||||
|
```
|
||||||
|
|
||||||
|
220 words are looked-up (less stopwords and duplicates) on a win7/64-bit/dual-core/3GHz. getPOS() is slowest as it searches through all four index files.
|
||||||
|
|
|
@ -1,15 +1,23 @@
|
||||||
|
/**
|
||||||
|
* wordpos-bench.js
|
||||||
|
*
|
||||||
|
* Copyright (c) 2012-2016 mooster@42at.com
|
||||||
|
* https://github.com/moos/wordpos
|
||||||
|
*
|
||||||
|
* Released under MIT license
|
||||||
|
*/
|
||||||
|
|
||||||
var uubench = require('uubench'), // from: https://github.com/moos/uubench
|
var Bench = require('mini-bench'),
|
||||||
fs = require('fs'),
|
fs = require('fs'),
|
||||||
_ = require('underscore')._,
|
_ = require('underscore')._,
|
||||||
WordPOS = require('../src/wordpos'),
|
WordPOS = require('../src/wordpos'),
|
||||||
wordpos = new WordPOS();
|
wordpos = new WordPOS();
|
||||||
|
|
||||||
|
|
||||||
suite = new uubench.Suite({
|
suite = new Bench.Suite({
|
||||||
type: 'fixed',
|
type: 'fixed',
|
||||||
iterations: 1,
|
iterations: 1,
|
||||||
sync: true, // important!
|
async: false, // important!
|
||||||
|
|
||||||
start: function(tests){
|
start: function(tests){
|
||||||
console.log('starting %d tests', tests.length);
|
console.log('starting %d tests', tests.length);
|
||||||
|
@ -110,6 +118,7 @@ suite.section('--512 words--', function(next){
|
||||||
suite.options.iterations = 1;
|
suite.options.iterations = 1;
|
||||||
next();
|
next();
|
||||||
});
|
});
|
||||||
|
|
||||||
suite.bench('getPOS', getPOS);
|
suite.bench('getPOS', getPOS);
|
||||||
suite.bench('getNouns', getNouns);
|
suite.bench('getNouns', getNouns);
|
||||||
suite.bench('getVerbs', getVerbs);
|
suite.bench('getVerbs', getVerbs);
|
||||||
|
@ -118,6 +127,4 @@ suite.bench('getAdverbs', getAdverbs);
|
||||||
suite.bench('lookup', lookup);
|
suite.bench('lookup', lookup);
|
||||||
suite.bench('lookupNoun', lookupNoun);
|
suite.bench('lookupNoun', lookupNoun);
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
suite.run();
|
suite.run();
|
||||||
|
|
22
package.json
|
@ -1,7 +1,15 @@
|
||||||
{
|
{
|
||||||
"name": "wordpos",
|
"name": "wordpos",
|
||||||
"author": "Moos <mooster@42at.com>",
|
"author": "Moos <mooster@42at.com>",
|
||||||
"keywords": ["natural", "language", "wordnet", "adjectives", "nouns", "adverbs", "verbs"],
|
"keywords": [
|
||||||
|
"natural",
|
||||||
|
"language",
|
||||||
|
"wordnet",
|
||||||
|
"adjectives",
|
||||||
|
"nouns",
|
||||||
|
"adverbs",
|
||||||
|
"verbs"
|
||||||
|
],
|
||||||
"description": "wordpos is a set of part-of-speech utilities for Node.js using the WordNet database.",
|
"description": "wordpos is a set of part-of-speech utilities for Node.js using the WordNet database.",
|
||||||
"version": "1.0.0-RC1",
|
"version": "1.0.0-RC1",
|
||||||
"homepage": "https://github.com/moos/wordpos",
|
"homepage": "https://github.com/moos/wordpos",
|
||||||
|
@ -10,18 +18,18 @@
|
||||||
},
|
},
|
||||||
"bin": "./bin/wordpos-cli.js",
|
"bin": "./bin/wordpos-cli.js",
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
|
"commander": "^2.0.0",
|
||||||
"underscore": ">=1.3.1",
|
"underscore": ">=1.3.1",
|
||||||
"wordnet-db": "latest",
|
"wordnet-db": "latest"
|
||||||
"commander": "^2.0.0"
|
|
||||||
},
|
},
|
||||||
"devDependencies": {
|
"devDependencies": {
|
||||||
"uubench": "git://github.com/moos/uubench.git",
|
"mini-bench": "^1.0.0",
|
||||||
"chai": "*",
|
"chai": "*",
|
||||||
"mocha": "*"
|
"mocha": "*"
|
||||||
},
|
},
|
||||||
"repository" : {
|
"repository": {
|
||||||
"type" : "git",
|
"type": "git",
|
||||||
"url" : "git://github.com/moos/wordpos.git"
|
"url": "git://github.com/moos/wordpos.git"
|
||||||
},
|
},
|
||||||
"main": "./src/wordpos.js",
|
"main": "./src/wordpos.js",
|
||||||
"scripts": {
|
"scripts": {
|
||||||
|
|
|
@ -1,24 +1,41 @@
|
||||||
|
/*!
|
||||||
|
* dataFile.js
|
||||||
|
*
|
||||||
|
* Copyright (c) 2012-2016 mooster@42at.com
|
||||||
|
* https://github.com/moos/wordpos
|
||||||
|
*
|
||||||
|
* Portions: Copyright (c) 2011, Chris Umbel
|
||||||
|
*
|
||||||
|
* Released under MIT license
|
||||||
|
*/
|
||||||
|
|
||||||
var fs = require('fs'),
|
var fs = require('fs'),
|
||||||
path = require('path'),
|
path = require('path'),
|
||||||
_ = require('underscore');
|
_ = require('underscore');
|
||||||
|
|
||||||
|
|
||||||
// courtesy of natural.WordNet
|
/**
|
||||||
// TODO link
|
* parse a single data file line, returning data object
|
||||||
|
*
|
||||||
|
* @param line {string} - a single line from WordNet data file
|
||||||
|
* @returns {object}
|
||||||
|
*
|
||||||
|
* Credit for this routine to https://github.com/NaturalNode/natural
|
||||||
|
*/
|
||||||
function lineDataToJSON(line) {
|
function lineDataToJSON(line) {
|
||||||
var data = line.split('| '),
|
var data = line.split('| '),
|
||||||
tokens = data[0].split(/\s+/),
|
tokens = data[0].split(/\s+/),
|
||||||
ptrs = [],
|
ptrs = [],
|
||||||
wCnt = parseInt(tokens[3], 16),
|
wCnt = parseInt(tokens[3], 16),
|
||||||
synonyms = [];
|
synonyms = [],
|
||||||
|
i;
|
||||||
|
|
||||||
for(var i = 0; i < wCnt; i++) {
|
for(i = 0; i < wCnt; i++) {
|
||||||
synonyms.push(tokens[4 + i * 2]);
|
synonyms.push(tokens[4 + i * 2]);
|
||||||
}
|
}
|
||||||
|
|
||||||
var ptrOffset = (wCnt - 1) * 2 + 6;
|
var ptrOffset = (wCnt - 1) * 2 + 6;
|
||||||
for(var i = 0; i < parseInt(tokens[ptrOffset], 10); i++) {
|
for(i = 0; i < parseInt(tokens[ptrOffset], 10); i++) {
|
||||||
ptrs.push({
|
ptrs.push({
|
||||||
pointerSymbol: tokens[ptrOffset + 1 + i * 4],
|
pointerSymbol: tokens[ptrOffset + 1 + i * 4],
|
||||||
synsetOffset: parseInt(tokens[ptrOffset + 2 + i * 4], 10),
|
synsetOffset: parseInt(tokens[ptrOffset + 2 + i * 4], 10),
|
||||||
|
@ -51,10 +68,15 @@ function lineDataToJSON(line) {
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* read data file at location (bound to a data file).
|
||||||
|
* Reads nominal length and checks for EOL. Continue reading until EOL.
|
||||||
|
*
|
||||||
|
* @param location {Number} - seek location
|
||||||
|
* @param callback {function} - callback function
|
||||||
|
*/
|
||||||
function readLocation(location, callback) {
|
function readLocation(location, callback) {
|
||||||
//console.log('## read location ', this.fileName, location);
|
//console.log('## read location ', this.fileName, location);
|
||||||
|
|
||||||
var
|
var
|
||||||
file = this,
|
file = this,
|
||||||
str = '',
|
str = '',
|
||||||
|
@ -68,8 +90,6 @@ function readLocation(location, callback) {
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
//console.log(' read %d bytes at <%d>', count, location);
|
//console.log(' read %d bytes at <%d>', count, location);
|
||||||
//console.log(str);
|
|
||||||
|
|
||||||
callback(null, lineDataToJSON(str));
|
callback(null, lineDataToJSON(str));
|
||||||
});
|
});
|
||||||
|
|
||||||
|
@ -77,10 +97,9 @@ function readLocation(location, callback) {
|
||||||
fs.read(file.fd, buffer, 0, len, pos, function (err, count) {
|
fs.read(file.fd, buffer, 0, len, pos, function (err, count) {
|
||||||
str += buffer.toString('ascii');
|
str += buffer.toString('ascii');
|
||||||
var eol = str.indexOf('\n');
|
var eol = str.indexOf('\n');
|
||||||
|
|
||||||
//console.log(' -- read %d bytes at <%d>', count, pos, eol);
|
//console.log(' -- read %d bytes at <%d>', count, pos, eol);
|
||||||
|
|
||||||
if (eol === -1 && len < file.maxLineLength) {
|
if (eol === -1 && len < file.maxLineLength) {
|
||||||
|
// continue reading
|
||||||
return readChunk(pos + count, cb);
|
return readChunk(pos + count, cb);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -90,14 +109,19 @@ function readLocation(location, callback) {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* main lookup function
|
||||||
|
*
|
||||||
|
* @param record {object} - record to lookup, obtained from index.find()
|
||||||
|
* @param callback{function} (optional) - callback function
|
||||||
|
* @returns {Promise}
|
||||||
|
*/
|
||||||
function lookup(record, callback) {
|
function lookup(record, callback) {
|
||||||
var results = [],
|
var results = [],
|
||||||
self = this,
|
self = this,
|
||||||
offsets = record.synsetOffset;
|
offsets = record.synsetOffset;
|
||||||
|
|
||||||
return new Promise(function(resolve, reject) {
|
return new Promise(function(resolve, reject) {
|
||||||
//console.log('data lookup', record);
|
|
||||||
|
|
||||||
offsets
|
offsets
|
||||||
.map(function (offset) {
|
.map(function (offset) {
|
||||||
return _.partial(readLocation.bind(self), offset);
|
return _.partial(readLocation.bind(self), offset);
|
||||||
|
@ -109,7 +133,6 @@ function lookup(record, callback) {
|
||||||
|
|
||||||
function done(lastResult) {
|
function done(lastResult) {
|
||||||
closeFile();
|
closeFile();
|
||||||
//console.log('done promise -- ');
|
|
||||||
if (lastResult instanceof Error) {
|
if (lastResult instanceof Error) {
|
||||||
callback && callback(lastResult, []);
|
callback && callback(lastResult, []);
|
||||||
reject(lastResult);
|
reject(lastResult);
|
||||||
|
@ -129,7 +152,6 @@ function lookup(record, callback) {
|
||||||
//console.log(' ... opening', self.filePath);
|
//console.log(' ... opening', self.filePath);
|
||||||
self.fd = fs.openSync(self.filePath, 'r');
|
self.fd = fs.openSync(self.filePath, 'r');
|
||||||
}
|
}
|
||||||
|
|
||||||
// ref count so we know when to close the main index file
|
// ref count so we know when to close the main index file
|
||||||
++self.refcount;
|
++self.refcount;
|
||||||
return Promise.resolve();
|
return Promise.resolve();
|
||||||
|
@ -145,13 +167,17 @@ function lookup(record, callback) {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* turn ordinary function into a promising one!
|
||||||
|
*
|
||||||
|
* @param collect {Array} - used to collect results
|
||||||
|
* @returns {Function}
|
||||||
|
*/
|
||||||
function promisifyInto(collect) {
|
function promisifyInto(collect) {
|
||||||
return function(fn) {
|
return function(fn) {
|
||||||
return function() {
|
return function() {
|
||||||
return new Promise(function (resolve, reject) {
|
return new Promise(function (resolve, reject) {
|
||||||
fn(function (error, result) { // Note callback signature!
|
fn(function (error, result) { // Note: callback signature!
|
||||||
//console.log('cb from get', arguments)
|
|
||||||
if (error) {
|
if (error) {
|
||||||
reject(error);
|
reject(error);
|
||||||
}
|
}
|
||||||
|
@ -166,7 +192,13 @@ function promisifyInto(collect) {
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
* DataFile class
|
||||||
|
*
|
||||||
|
* @param dictPath {string} - path to dict folder
|
||||||
|
* @param name {string} - POS name
|
||||||
|
* @constructor
|
||||||
|
*/
|
||||||
var DataFile = function(dictPath, name) {
|
var DataFile = function(dictPath, name) {
|
||||||
this.dictPath = dictPath;
|
this.dictPath = dictPath;
|
||||||
this.fileName = 'data.' + name;
|
this.fileName = 'data.' + name;
|
||||||
|
@ -177,13 +209,23 @@ var DataFile = function(dictPath, name) {
|
||||||
this.refcount = 0;
|
this.refcount = 0;
|
||||||
};
|
};
|
||||||
|
|
||||||
// maximum read length at a time
|
/**
|
||||||
|
* maximum read length at a time
|
||||||
|
* @type {Number}
|
||||||
|
*/
|
||||||
var MAX_SINGLE_READ_LENGTH = 512;
|
var MAX_SINGLE_READ_LENGTH = 512;
|
||||||
|
|
||||||
//DataFile.prototype.get = get;
|
/**
|
||||||
|
* lookup
|
||||||
|
*/
|
||||||
DataFile.prototype.lookup = lookup;
|
DataFile.prototype.lookup = lookup;
|
||||||
|
|
||||||
// e.g.: wc -L data.adv as of v3.1
|
|
||||||
|
/**
|
||||||
|
* maximum line length in each data file - used to optimize reads
|
||||||
|
*
|
||||||
|
* wc -L data.adv as of v3.1
|
||||||
|
*/
|
||||||
DataFile.MAX_LINE_LENGTH = {
|
DataFile.MAX_LINE_LENGTH = {
|
||||||
noun: 12972,
|
noun: 12972,
|
||||||
verb: 7713,
|
verb: 7713,
|
||||||
|
@ -191,4 +233,5 @@ DataFile.MAX_LINE_LENGTH = {
|
||||||
adv: 638
|
adv: 638
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
||||||
module.exports = DataFile;
|
module.exports = DataFile;
|
||||||
|
|
|
@ -6,6 +6,8 @@
|
||||||
* Copyright (c) 2012-2016 mooster@42at.com
|
* Copyright (c) 2012-2016 mooster@42at.com
|
||||||
* https://github.com/moos/wordpos
|
* https://github.com/moos/wordpos
|
||||||
*
|
*
|
||||||
|
* Portions: Copyright (c) 2011, Chris Umbel
|
||||||
|
*
|
||||||
* Released under MIT license
|
* Released under MIT license
|
||||||
*/
|
*/
|
||||||
|
|
||||||
|
@ -16,6 +18,7 @@ var _ = require('underscore')._,
|
||||||
piper = require('./piper'),
|
piper = require('./piper'),
|
||||||
KEY_LENGTH = 3;
|
KEY_LENGTH = 3;
|
||||||
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* load fast index bucket data
|
* load fast index bucket data
|
||||||
*
|
*
|
||||||
|
@ -112,7 +115,7 @@ function find(search, callback) {
|
||||||
// pay the piper
|
// pay the piper
|
||||||
this.piper(task, readIndexForKey, args, context, collector);
|
this.piper(task, readIndexForKey, args, context, collector);
|
||||||
|
|
||||||
function collector(key, index, search, callback, buffer){
|
function collector(_key, index, search, callback, buffer){
|
||||||
var lines = buffer.toString().split('\n'),
|
var lines = buffer.toString().split('\n'),
|
||||||
keys = lines.map(function(line){
|
keys = lines.map(function(line){
|
||||||
return line.substring(0,line.indexOf(' '));
|
return line.substring(0,line.indexOf(' '));
|
||||||
|
@ -136,21 +139,24 @@ function find(search, callback) {
|
||||||
* @param word {string} - search word
|
* @param word {string} - search word
|
||||||
* @param callback {function} - callback function receives result
|
* @param callback {function} - callback function receives result
|
||||||
* @returns none
|
* @returns none
|
||||||
|
*
|
||||||
|
* Credit for this routine to https://github.com/NaturalNode/natural
|
||||||
*/
|
*/
|
||||||
function lookup(word, callback) {
|
function lookup(word, callback) {
|
||||||
var self = this;
|
var self = this;
|
||||||
|
|
||||||
return new Promise(function(resolve, reject){
|
return new Promise(function(resolve, reject){
|
||||||
self.find(word, function (record) {
|
self.find(word, function (record) {
|
||||||
var indexRecord = null;
|
var indexRecord = null,
|
||||||
|
i;
|
||||||
|
|
||||||
if (record.status == 'hit') {
|
if (record.status == 'hit') {
|
||||||
var ptrs = [], offsets = [];
|
var ptrs = [], offsets = [];
|
||||||
|
|
||||||
for (var i = 0; i < parseInt(record.tokens[3]); i++)
|
for (i = 0; i < parseInt(record.tokens[3]); i++)
|
||||||
ptrs.push(record.tokens[i]);
|
ptrs.push(record.tokens[i]);
|
||||||
|
|
||||||
for (var i = 0; i < parseInt(record.tokens[2]); i++)
|
for (i = 0; i < parseInt(record.tokens[2]); i++)
|
||||||
offsets.push(parseInt(record.tokens[ptrs.length + 6 + i], 10));
|
offsets.push(parseInt(record.tokens[ptrs.length + 6 + i], 10));
|
||||||
|
|
||||||
indexRecord = {
|
indexRecord = {
|
||||||
|
|
|
@ -12,7 +12,6 @@
|
||||||
|
|
||||||
var _ = require('underscore')._,
|
var _ = require('underscore')._,
|
||||||
util = require('util'),
|
util = require('util'),
|
||||||
path = require('path'),
|
|
||||||
fs = require('fs');
|
fs = require('fs');
|
||||||
|
|
||||||
/**
|
/**
|
||||||
|
@ -21,7 +20,7 @@ var _ = require('underscore')._,
|
||||||
*
|
*
|
||||||
* @param task {string} - task name unique to method!
|
* @param task {string} - task name unique to method!
|
||||||
* @param method {function} - method to execute, gets (args, ... , callback)
|
* @param method {function} - method to execute, gets (args, ... , callback)
|
||||||
* @param args {array} - args to pass to method
|
* @param args {Array} - args to pass to method
|
||||||
* @param context {object} - other params to remember and sent to callback
|
* @param context {object} - other params to remember and sent to callback
|
||||||
* @param callback {function} - result callback
|
* @param callback {function} - result callback
|
||||||
*/
|
*/
|
||||||
|
|
124
src/rand.js
|
@ -36,10 +36,10 @@ function makeRandX(pos){
|
||||||
callback = opts;
|
callback = opts;
|
||||||
}
|
}
|
||||||
|
|
||||||
index.rand(startsWith, count, function(record) {
|
return index.rand(startsWith, count, function (record) {
|
||||||
args.push(record, startsWith);
|
args.push(record, startsWith);
|
||||||
profile && args.push(new Date() - start);
|
profile && args.push(new Date() - start);
|
||||||
callback.apply(null, args);
|
callback && callback.apply(null, args);
|
||||||
});
|
});
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
@ -50,6 +50,7 @@ function makeRandX(pos){
|
||||||
* @param startsWith {string} - get random word(s) that start with this, or ''
|
* @param startsWith {string} - get random word(s) that start with this, or ''
|
||||||
* @param num {number} - number of words to return
|
* @param num {number} - number of words to return
|
||||||
* @param callback {function} - callback function, receives words array and startsWith
|
* @param callback {function} - callback function, receives words array and startsWith
|
||||||
|
* @returns Promise
|
||||||
*/
|
*/
|
||||||
function rand(startsWith, num, callback){
|
function rand(startsWith, num, callback){
|
||||||
var self = this,
|
var self = this,
|
||||||
|
@ -57,8 +58,10 @@ function rand(startsWith, num, callback){
|
||||||
trie = this.fastIndex.trie,
|
trie = this.fastIndex.trie,
|
||||||
key, keys;
|
key, keys;
|
||||||
|
|
||||||
|
return new Promise(function(resolve, reject) {
|
||||||
|
|
||||||
//console.log('-- ', startsWith, num, self.fastIndex.indexKeys.length);
|
//console.log('-- ', startsWith, num, self.fastIndex.indexKeys.length);
|
||||||
if (startsWith){
|
if (startsWith) {
|
||||||
key = startsWith.slice(0, KEY_LENGTH);
|
key = startsWith.slice(0, KEY_LENGTH);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
|
@ -67,17 +70,17 @@ function rand(startsWith, num, callback){
|
||||||
if (key.length < KEY_LENGTH) {
|
if (key.length < KEY_LENGTH) {
|
||||||
|
|
||||||
// calc trie if haven't done so yet
|
// calc trie if haven't done so yet
|
||||||
if (!trie){
|
if (!trie) {
|
||||||
trie = new Trie();
|
trie = new Trie();
|
||||||
trie.addStrings(self.fastIndex.indexKeys);
|
trie.addStrings(self.fastIndex.indexKeys);
|
||||||
this.fastIndex.trie = trie;
|
self.fastIndex.trie = trie;
|
||||||
//console.log(' +++ Trie calc ');
|
//console.log(' +++ Trie calc ');
|
||||||
}
|
}
|
||||||
|
|
||||||
try{
|
try {
|
||||||
// trie throws if not found!!!!!
|
// trie throws if not found!!!!!
|
||||||
keys = trie.keysWithPrefix( startsWith );
|
keys = trie.keysWithPrefix(startsWith);
|
||||||
} catch(e){
|
} catch (e) {
|
||||||
keys = [];
|
keys = [];
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -87,72 +90,83 @@ function rand(startsWith, num, callback){
|
||||||
nextKey = _.last(keys);
|
nextKey = _.last(keys);
|
||||||
}
|
}
|
||||||
|
|
||||||
if (!key || !(key in self.fastIndex.offsets)) return process.nextTick(function(){ callback([], startsWith) });
|
if (!key || !(key in self.fastIndex.offsets)) {
|
||||||
|
callback && callback([], startsWith);
|
||||||
|
resolve([]);
|
||||||
|
}
|
||||||
|
|
||||||
} else {
|
} else {
|
||||||
// no startWith given - random select among keys
|
// no startWith given - random select among keys
|
||||||
keys = _.sample( this.fastIndex.indexKeys, num );
|
keys = _.sample(self.fastIndex.indexKeys, num);
|
||||||
|
|
||||||
// if num > 1, run each key independently and collect results
|
// if num > 1, run each key independently and collect results
|
||||||
if (num > 1){
|
if (num > 1) {
|
||||||
var results = [], ii = 0;
|
var results = [], ii = 0;
|
||||||
_(keys).each(function(startsWith){
|
_(keys).each(function (startsWith) {
|
||||||
self.rand(startsWith, 1, function(result){
|
self.rand(startsWith, 1, function (result) {
|
||||||
results.push(result[0]);
|
results.push(result[0]);
|
||||||
if (++ii == num) {
|
if (++ii == num) {
|
||||||
callback(results, '');
|
callback && callback(results, '');
|
||||||
|
resolve(results);
|
||||||
}
|
}
|
||||||
})
|
});
|
||||||
});
|
});
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
key = keys;
|
key = keys;
|
||||||
}
|
}
|
||||||
// console.log(' using key', key, nextKey);
|
|
||||||
|
|
||||||
// prepare the piper
|
// prepare the piper
|
||||||
var args = [key, nextKey, this],
|
var args = [key, nextKey, self],
|
||||||
task = 'rand:' + key + nextKey,
|
task = 'rand:' + key + nextKey,
|
||||||
context = [startsWith, num, callback]; // last arg MUST be callback
|
context = [startsWith, num, callback]; // last arg MUST be callback
|
||||||
|
|
||||||
// pay the piper
|
// pay the piper
|
||||||
this.piper(task, IndexFile.readIndexBetweenKeys, args, context, collector);
|
self.piper(task, IndexFile.readIndexBetweenKeys, args, context, collector);
|
||||||
|
|
||||||
function collector(key, nextKey, index, startsWith, num, callback, buffer){
|
function collector(key, nextKey, index, startsWith, num, callback, buffer) {
|
||||||
var lines = buffer.toString().split('\n'),
|
var lines = buffer.toString().split('\n'),
|
||||||
matches = lines.map(function(line){
|
matches = lines.map(function (line) {
|
||||||
return line.substring(0,line.indexOf(' '));
|
return line.substring(0, line.indexOf(' '));
|
||||||
});
|
});
|
||||||
|
|
||||||
//console.log(' got lines for key ', key, lines.length);
|
//console.log(' got lines for key ', key, lines.length);
|
||||||
|
|
||||||
// we got bunch of matches for key - now search within for startsWith
|
// we got bunch of matches for key - now search within for startsWith
|
||||||
if (startsWith !== key){
|
if (startsWith !== key) {
|
||||||
|
|
||||||
// binary search for startsWith within set of matches
|
// binary search for startsWith within set of matches
|
||||||
var ind = _.sortedIndex(matches, startsWith);
|
var ind = _.sortedIndex(matches, startsWith);
|
||||||
if (ind >= lines.length || matches[ind].indexOf(startsWith) === -1){
|
if (ind >= lines.length || matches[ind].indexOf(startsWith) === -1) {
|
||||||
return callback([], startsWith);
|
callback && callback([], startsWith);
|
||||||
|
resolve([]);
|
||||||
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
// FIXME --- using Trie's new keysWithPrefix not yet pushed to npm.
|
|
||||||
// see https://github.com/NaturalNode/natural/commit/5fc86c42e41c1314bfc6a37384dd14acf5f4bb7b
|
|
||||||
|
|
||||||
var trie = new Trie();
|
var trie = new Trie();
|
||||||
|
|
||||||
trie.addStrings(matches);
|
trie.addStrings(matches);
|
||||||
//console.log('Trie > ', trie.matchesWithPrefix( startsWith ));
|
//console.log('Trie > ', trie.matchesWithPrefix( startsWith ));
|
||||||
|
matches = trie.keysWithPrefix(startsWith);
|
||||||
matches = trie.keysWithPrefix( startsWith );
|
|
||||||
}
|
}
|
||||||
|
|
||||||
var words = _.sample(matches, num);
|
var words = _.sample(matches, num);
|
||||||
callback(words, startsWith);
|
callback && callback(words, startsWith);
|
||||||
|
resolve(words);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
}); // Promise
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// relative weight of each POS word count (DB 3.1 numbers)
|
||||||
|
var POS_factor = {
|
||||||
|
Noun: 26,
|
||||||
|
Verb: 3,
|
||||||
|
Adjective: 5,
|
||||||
|
Adverb: 1,
|
||||||
|
Total: 37
|
||||||
|
};
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* rand() - for all Index files
|
* rand() - for all Index files
|
||||||
|
* @returns Promise
|
||||||
*/
|
*/
|
||||||
function randAll(opts, callback) {
|
function randAll(opts, callback) {
|
||||||
var
|
var
|
||||||
|
@ -163,12 +177,7 @@ function randAll(opts, callback) {
|
||||||
count = opts && opts.count || 1,
|
count = opts && opts.count || 1,
|
||||||
args = [null, startsWith],
|
args = [null, startsWith],
|
||||||
parts = 'Noun Verb Adjective Adverb'.split(' '),
|
parts = 'Noun Verb Adjective Adverb'.split(' '),
|
||||||
self = this,
|
self = this;
|
||||||
done = function(){
|
|
||||||
profile && (args.push(new Date() - start));
|
|
||||||
args[0] = results;
|
|
||||||
callback.apply(null, args)
|
|
||||||
};
|
|
||||||
|
|
||||||
if (typeof opts === 'function') {
|
if (typeof opts === 'function') {
|
||||||
callback = opts;
|
callback = opts;
|
||||||
|
@ -176,36 +185,45 @@ function randAll(opts, callback) {
|
||||||
opts = _.clone(opts);
|
opts = _.clone(opts);
|
||||||
}
|
}
|
||||||
|
|
||||||
// TODO -- or loop count times each time getting 1 from random part!!
|
|
||||||
// slower but more random.
|
|
||||||
|
|
||||||
// select at random a part to look at
|
return new Promise(function(resolve, reject) {
|
||||||
|
// select at random a POS to look at
|
||||||
var doParts = _.sample(parts, parts.length);
|
var doParts = _.sample(parts, parts.length);
|
||||||
tryPart();
|
tryPart();
|
||||||
|
|
||||||
function tryPart(){
|
function tryPart() {
|
||||||
var rand = 'rand' + doParts.pop();
|
var part = doParts.pop(),
|
||||||
self[ rand ](opts, partCallback);
|
rand = 'rand' + part,
|
||||||
|
factor = POS_factor[part],
|
||||||
|
weight = factor / POS_factor.Total;
|
||||||
|
|
||||||
|
// pick count according to relative weight
|
||||||
|
opts.count = Math.ceil(count * weight * 1.1); // guard against dupes
|
||||||
|
self[rand](opts, partCallback);
|
||||||
}
|
}
|
||||||
|
|
||||||
function partCallback(result){
|
function partCallback(result) {
|
||||||
if (result) {
|
if (result) {
|
||||||
results = _.uniq(results.concat(result)); // make sure it's unique!
|
results = _.uniq(results.concat(result)); // make sure it's unique!
|
||||||
}
|
}
|
||||||
|
|
||||||
//console.log(result);
|
|
||||||
if (results.length < count && doParts.length) {
|
if (results.length < count && doParts.length) {
|
||||||
// reduce count for next part -- NO! may get duplicates
|
|
||||||
// opts.count = count - results.length;
|
|
||||||
return tryPart();
|
return tryPart();
|
||||||
}
|
}
|
||||||
|
|
||||||
// trim excess
|
// final random and trim excess
|
||||||
if (results.length > count) {
|
results = _.sample(results, count);
|
||||||
results.length = count;
|
|
||||||
}
|
|
||||||
done();
|
done();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function done() {
|
||||||
|
profile && (args.push(new Date() - start));
|
||||||
|
args[0] = results;
|
||||||
|
callback && callback.apply(null, args);
|
||||||
|
resolve(results);
|
||||||
|
}
|
||||||
|
|
||||||
|
}); // Promise
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
|
|
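The `POS_factor` table and the `opts.count` line in `randAll` above split a requested word count across parts of speech in proportion to their index sizes. The arithmetic can be sketched as a standalone snippet. The weights are copied from the diff; `countForPart` is a hypothetical helper name introduced here for illustration and is not part of wordpos.

```javascript
// Weights copied from the diff above (relative word counts per POS).
var POS_factor = { Noun: 26, Verb: 3, Adjective: 5, Adverb: 1, Total: 37 };

// Hypothetical helper: how many words to request from one part of speech
// so that the combined result reaches roughly `count` words.
// The 1.1 padding mirrors the diff's "guard against dupes" factor.
function countForPart(part, count) {
  var weight = POS_factor[part] / POS_factor.Total;
  return Math.ceil(count * weight * 1.1);
}

var perPart = {};
['Noun', 'Verb', 'Adjective', 'Adverb'].forEach(function (part) {
  perPart[part] = countForPart(part, 10);
});
console.log(perPart);
```

Because each share is rounded up, the per-part counts can add up to more than the requested total, which is presumably why `partCallback` finishes with `_.sample(results, count)` to trim the excess.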
|
@ -1,4 +1,4 @@
|
||||||
/**
|
/*!
|
||||||
* wordpos.js
|
* wordpos.js
|
||||||
*
|
*
|
||||||
* Node.js part-of-speech utilities using WordNet database.
|
* Node.js part-of-speech utilities using WordNet database.
|
||||||
|
@ -149,11 +149,11 @@ function get(isFn) {
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// setImmediate executes callback AFTER promise handlers.
|
||||||
|
// Without it, exceptions in callback may be caught by Promise.
|
||||||
function nextTick(fn, args) {
|
function nextTick(fn, args) {
|
||||||
if (fn) {
|
if (fn) {
|
||||||
setImmediate(function(){
|
|
||||||
fn.apply(null, args);
|
fn.apply(null, args);
|
||||||
});
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
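The ordering claim in the comment added above `nextTick` ("setImmediate executes callback AFTER promise handlers") can be checked with a few lines of plain Node.js. This is a minimal sketch that does not depend on wordpos; the `order` array is only there to record the sequence.

```javascript
// Promise handlers are queued as microtasks and run before the
// setImmediate callback, which waits for the event loop's check phase.
var order = [];

Promise.resolve().then(function () {
  order.push('promise handler');
});

setImmediate(function () {
  order.push('setImmediate callback');
  console.log(order);
});
```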
||||||
|
@ -216,7 +216,7 @@ var wordposProto = WordPOS.prototype;
|
||||||
* lookup a word in all indexes
|
* lookup a word in all indexes
|
||||||
*
|
*
|
||||||
* @param word {string} - search word
|
* @param word {string} - search word
|
||||||
* @param callback {Functino} (optional) - callback with (results, word) signature
|
* @param callback {Function} (optional) - callback with (results, word) signature
|
||||||
* @returns {Promise}
|
* @returns {Promise}
|
||||||
*/
|
*/
|
||||||
wordposProto.lookup = function(word, callback) {
|
wordposProto.lookup = function(word, callback) {
|
||||||
|
@ -362,7 +362,17 @@ wordposProto.getVerbs = get('isVerb');
|
||||||
wordposProto.parse = prepText;
|
wordposProto.parse = prepText;
|
||||||
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
* access to WordNet DB
|
||||||
|
* @type {object}
|
||||||
|
*/
|
||||||
WordPOS.WNdb = WNdb;
|
WordPOS.WNdb = WNdb;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* access to stopwords
|
||||||
|
* @type {Array}
|
||||||
|
*/
|
||||||
WordPOS.stopwords = stopwords;
|
WordPOS.stopwords = stopwords;
|
||||||
|
|
||||||
|
|
||||||
module.exports = WordPOS;
|
module.exports = WordPOS;
|
||||||
|
|
40
test.js
|
@ -1,40 +0,0 @@
|
||||||
var
|
|
||||||
WordPOS = require('./src/wordpos'),
|
|
||||||
wordpos = new WordPOS({profile: true}),
|
|
||||||
getAllPOS = wordpos.getPOS
|
|
||||||
;
|
|
||||||
|
|
||||||
|
|
||||||
console.log(1111,
|
|
||||||
wordpos.lookup('foot')
|
|
||||||
//wordpos.getPOS('was doing the work the ashtray closer Also known as inject and foldl, reduce boils down a list of values into a single value', console.log
|
|
||||||
.then(function(result){
|
|
||||||
console.log(' xxx - ', result)
|
|
||||||
})
|
|
||||||
.catch(function(result){
|
|
||||||
console.log(' error xxx - ', result)
|
|
||||||
}));
|
|
||||||
|
|
||||||
//wordpos.rand({count: 3},console.log)
|
|
||||||
|
|
||||||
return;
|
|
||||||
|
|
||||||
|
|
||||||
//getAllPOS('se', console.log)
|
|
||||||
wordpos.getPOS('se', console.log)
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
a=wordpos.getPOS('se', function(res) {
|
|
||||||
console.log(1, res)
|
|
||||||
wordpos.getPOS('sea hey who work', function(res) {
|
|
||||||
console.log(2, res)
|
|
||||||
wordpos.getPOS('sear done work ', function(res) {
|
|
||||||
console.log(3, res)
|
|
||||||
console.log('all done');
|
|
||||||
});
|
|
||||||
});
|
|
||||||
});
|
|
||||||
|
|
||||||
console.log(a)
|
|
|
@ -1,11 +1,11 @@
|
||||||
/**
|
/**
|
||||||
* wordpos_spec.js
|
* wordpos_test.js
|
||||||
*
|
*
|
||||||
* test file for main wordpos functionality
|
* test file for main wordpos functionality
|
||||||
*
|
*
|
||||||
* Usage:
|
* Usage:
|
||||||
* npm install mocha -g
|
* npm install mocha -g
|
||||||
* mocha wordpos_spec.js --verbose
|
* mocha wordpos_test.js
|
||||||
*
|
*
|
||||||
* or
|
* or
|
||||||
*
|
*
|
||||||
|
@ -388,4 +388,29 @@ describe('Promise pattern', function() {
|
||||||
assert.equal(result, true);
|
assert.equal(result, true);
|
||||||
});
|
});
|
||||||
});
|
});
|
||||||
|
|
||||||
|
it('rand()', function () {
|
||||||
|
return wordpos.rand({count: 5}).then(function (result) {
|
||||||
|
assert.equal(result.length, 5);
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
it('randNoun()', function () {
|
||||||
|
return wordpos.randNoun().then(function (result) {
|
||||||
|
assert.equal(result.length, 1);
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
it('randNoun({count: 3})', function () {
|
||||||
|
return wordpos.randNoun({count: 3}).then(function (result) {
|
||||||
|
assert.equal(result.length, 3);
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
it('randNoun({startsWith: "foo"})', function () {
|
||||||
|
return wordpos.randNoun({startsWith: 'foo'}).then(function (result) {
|
||||||
|
assert.equal(result.length, 1);
|
||||||
|
assert.equal(result[0].indexOf('foo'), 0);
|
||||||
|
});
|
||||||
|
});
|
||||||
});
|
});
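The tests above use the Promise form of `rand()`, while the `rand.js` changes keep the callback form working through `callback && callback(...)` followed by `resolve(...)`. A minimal sketch of that dual interface follows, assuming nothing about wordpos internals: `pickRandom` and its word list are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper, not part of wordpos: picks up to `num` distinct
// words at random and reports them through both a callback and a Promise.
function pickRandom(words, num, callback) {
  return new Promise(function (resolve) {
    var pool = words.slice(), picked = [];
    while (picked.length < num && pool.length) {
      var i = Math.floor(Math.random() * pool.length);
      picked.push(pool.splice(i, 1)[0]);
    }
    // Same shape as the diff: call the callback only if one was given,
    // then resolve the Promise with the same value.
    callback && callback(picked);
    resolve(picked);
  });
}

// Callback style
pickRandom(['foot', 'food', 'fool'], 2, function (result) {
  console.log('callback got', result.length, 'words');
});

// Promise style
pickRandom(['foot', 'food', 'fool'], 2).then(function (result) {
  console.log('promise got', result.length, 'words');
});
```

One consequence of calling the callback inside the Promise executor is that an exception thrown by the callback rejects the Promise instead of surfacing as an uncaught error, which is the concern raised by the comment above `nextTick` in `wordpos.js`.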
|
|
@ -40,7 +40,7 @@
|
||||||
* read index file between the two offsets
|
* read index file between the two offsets
|
||||||
* binary search read data O(log avg)
|
* binary search read data O(log avg)
|
||||||
*
|
*
|
||||||
* Copyright (c) 2012 mooster@42at.com
|
* Copyright (c) 2012-2016 mooster@42at.com
|
||||||
* https://github.com/moos/wordpos
|
* https://github.com/moos/wordpos
|
||||||
*
|
*
|
||||||
* Released under MIT license
|
* Released under MIT license
|
||||||
|
@ -48,7 +48,7 @@
|
||||||
var
|
var
|
||||||
WNdb = require('../src/wordpos').WNdb,
|
WNdb = require('../src/wordpos').WNdb,
|
||||||
util = require('util'),
|
util = require('util'),
|
||||||
BufferedReader = require ("./buffered-reader"),
|
BufferedReader = require ('./buffered-reader'),
|
||||||
_ = require('underscore')._,
|
_ = require('underscore')._,
|
||||||
fs = require('fs'),
|
fs = require('fs'),
|
||||||
path = require('path'),
|
path = require('path'),
|
||||||
|
|
|
@ -6,7 +6,7 @@
|
||||||
* Usage:
|
* Usage:
|
||||||
* node validate index.adv
|
* node validate index.adv
|
||||||
*
|
*
|
||||||
* Copyright (c) 2012 mooster@42at.com
|
* Copyright (c) 2012-2016 mooster@42at.com
|
||||||
* https://github.com/moos/wordpos
|
* https://github.com/moos/wordpos
|
||||||
*
|
*
|
||||||
* Released under MIT license
|
* Released under MIT license
|
||||||
|
|