simple_english
v0.0.1
simple_english reduces the complexity of written English.
#Justification
A working NLP library can be satisfactory while staying breathtakingly light. By Zipf's law:
The top 10 words account for 25% of our language.
The top 100 words account for 50% of our language.
The top 50,000 words account for 95% of our language.
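As a rough sanity check on these figures, here is a minimal sketch of an idealised Zipf model, in which the word of rank r has frequency proportional to 1/r. The function names and the vocabulary size of 100,000 are assumptions made for illustration; real corpora only approximate this curve.

```javascript
// Idealised Zipf model: the word of rank r has frequency proportional to 1/r.
// The vocabulary size used below is an assumption, not a figure from this README.
function harmonic(n) {
  var sum = 0
  for (var r = 1; r <= n; r++) {
    sum += 1 / r
  }
  return sum
}

// Share of all word occurrences covered by the k most frequent words.
function zipfCoverage(k, vocabSize) {
  return harmonic(k) / harmonic(vocabSize)
}

var ranks = [10, 100, 50000]
ranks.forEach(function (k) {
  var pct = (zipfCoverage(k, 100000) * 100).toFixed(1)
  console.log("top " + k + " words: ~" + pct + "%")
})
```

Under these assumptions the model lands in the same neighbourhood as the figures above, which is all it is meant to show.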
The trade-offs for processing English are far more pronounced than the 80/20 rule. On the Penn Treebank, for example, the following part-of-speech tagging accuracy is possible:
- choosing all nouns: 33% correct
- using a 1,000-word lexicon: 45% correct
- using a 1,000-word lexicon, and falling back to nouns: 70% correct
- using a 1,000-word lexicon, common suffix regexes, and falling back to nouns: 74% correct
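The cascade in the last bullet (lexicon lookup, then suffix regexes, then a noun fallback) can be sketched roughly as follows. This is not the library's code: the tiny lexicon, the suffix rules, and the names `tagWord` and `tagSentence` are all hypothetical, chosen only to show the order of the checks.

```javascript
// Hypothetical miniature lexicon; a real one would hold about 1,000 common words.
var lexicon = {
  the: "DT",
  a: "DT",
  of: "IN",
  and: "CC",
  is: "VBZ",
  they: "PRP",
  went: "VBD"
}

// Illustrative suffix rules (Penn Treebank tags), tried in order.
var suffixRules = [
  { pattern: /ly$/, tag: "RB" },
  { pattern: /ing$/, tag: "VBG" },
  { pattern: /ed$/, tag: "VBD" },
  { pattern: /ous$/, tag: "JJ" }
]

function tagWord(word) {
  var w = word.toLowerCase()
  // 1. exact lexicon hit
  if (Object.prototype.hasOwnProperty.call(lexicon, w)) {
    return lexicon[w]
  }
  // 2. first matching suffix rule
  for (var i = 0; i < suffixRules.length; i++) {
    if (suffixRules[i].pattern.test(w)) {
      return suffixRules[i].tag
    }
  }
  // 3. fall back to noun
  return "NN"
}

function tagSentence(text) {
  return text
    .split(/\s+/)
    .map(function (word) {
      // strip punctuation, keeping apostrophes and hyphens
      return word.replace(/[^a-zA-Z'-]/g, "")
    })
    .filter(Boolean)
    .map(function (word) {
      return [word, tagWord(word)]
    })
}

console.log(tagSentence("They walked quickly."))
```

Here "They" is tagged from the lexicon, "walked" and "quickly" from the suffix rules, and any word that matches nothing, such as "blast", comes back as a noun.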
The process is to get curated data, find the patterns, and list the exceptions. Bada-bing, bada-BOOM.
#Usage
Server-side
npm install simple_english
var simple = require("simple_english") // assumes the package exports the function directly
simple("well as a matter of fact, they went at full blast.")
//"well actually, they went at top speed"
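How `simple()` works internally is not described here. Purely as an assumption, one way to get a substitution like the one above is a curated phrase table applied longest-phrase-first. The table entries and the names `phrases`, `escapeRegExp`, and `simplifyPhrases` below are hypothetical and are not part of the package.

```javascript
// Hypothetical phrase table: verbose phrase -> plainer equivalent.
var phrases = {
  "as a matter of fact": "actually",
  "at full blast": "at top speed",
  "in the event that": "if"
}

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
}

function simplifyPhrases(text) {
  // Longest phrases first, so a long phrase wins over any shorter phrase it contains.
  var keys = Object.keys(phrases).sort(function (a, b) {
    return b.length - a.length
  })
  return keys.reduce(function (out, phrase) {
    var re = new RegExp("\\b" + escapeRegExp(phrase) + "\\b", "gi")
    // A function replacer avoids "$" in a replacement being read as a pattern.
    return out.replace(re, function () {
      return phrases[phrase]
    })
  }, text)
}

console.log(simplifyPhrases("well as a matter of fact, they went at full blast."))
// "well actually, they went at top speed."
```

Unlike the output shown above, this sketch leaves the trailing period untouched; it only swaps phrases.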
Client-side
<script src="https://s3.amazonaws.com/spencermounta.in/simple_english/client_side/simple.min.js"></script>
<script>
simple("well as a matter of fact, they went at full blast.")
//"well actually, they went at top speed"
</script>
Licence
Metadata
- Unknown
- Whatever
- Spencer Kelly
- released 4/15/2014