-
Notifications
You must be signed in to change notification settings - Fork 22
Updates to Zope's keyphrase extractor (forked from 1.1.0)
turian/topia.termextract
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This package determines important terms within a given piece of content. It uses linguistic tools such as Parts-Of-Speech (POS) and some simple statistical analysis to determine the terms and their strength. NOTE: This is a fork by Joseph Turian of topia.termextract 1.1.0 CONTRIBUTIONS: * Unicode alphabetic characters are tokenized correctly. I changed TERM_SPEC in topic.termextract.tag: Old = [u'S', u'\xe3o', u'Paulo', u'was', u'home', u'to'] New = [u'S\xe3o', u'Paulo', u'was', u'home', u'to'] * extractor.extract() now has a parameter KEEP_ORIGINAL_SPACING=True, which allows you to keep the original spacing of the term: Old = [u'Mr . Smith'] New = [u'Mr. Smith'] * Fixed a bug where a term wouldn't be found if it was literally the last token of the sentence. * Fixed a bug (?) where unigram terms were included even if their tokens were part of a multiterm.
About
Updates to Zope's keyphrase extractor (forked from 1.1.0)
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published