Support of custom features #161

bratao · 2019-10-31T16:02:19Z

Hello,
Thank you for this awesome library. I'm very impressed by the quality.

My task is to extract/segment information from semi-structured text. Not only is the text itself important, but some "external features" matter as well.

For example, imagine that I want to segment this text about GitHub projects into Category, Project name, URL, and Description.

I use a BIO scheme to tag each HTML token with a category.

Where
token=NLP start_of_p=True bold=True center=True B-Category
token=Projects start_of_p=False bold=True center=True I-Category
token=Project start_of_p=True bold=True center=False B-Project-name
token=Name start_of_p=False bold=True center=False I-Project-name
token=: start_of_p=False bold=False center=False I-Project-name
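
To make the format above concrete, here is a minimal sketch in plain Python of how such lines could be read into per-token feature dictionaries and grouped into BIO segments. It does not use sticker's API; `parse_line` and `bio_spans` are hypothetical helper names, and the assumption is that every line consists of whitespace-separated `key=value` pairs followed by the tag.

```python
from typing import Dict, List, Tuple

# The annotated tokens from the example above.
LINES = [
    "token=NLP start_of_p=True bold=True center=True B-Category",
    "token=Projects start_of_p=False bold=True center=True I-Category",
    "token=Project start_of_p=True bold=True center=False B-Project-name",
    "token=Name start_of_p=False bold=True center=False I-Project-name",
    "token=: start_of_p=False bold=False center=False I-Project-name",
]


def parse_line(line: str) -> Tuple[Dict[str, object], str]:
    """Split one line into (features, tag); assumes the tag comes last."""
    *pairs, tag = line.split()
    features: Dict[str, object] = {}
    for pair in pairs:
        key, value = pair.split("=", 1)
        # Keep the token as text; treat every other feature as a boolean.
        features[key] = value if key == "token" else value == "True"
    return features, tag


def bio_spans(rows: List[Tuple[Dict[str, object], str]]) -> List[Tuple[str, List[str]]]:
    """Group consecutive B-/I- tagged tokens into (label, tokens) segments."""
    spans: List[Tuple[str, List[str]]] = []
    open_label = None
    for features, tag in rows:
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != open_label):
            # A B- tag (or a stray I- tag) opens a new segment.
            spans.append((label, [str(features["token"])]))
            open_label = label
        elif prefix == "I":
            spans[-1][1].append(str(features["token"]))
        else:
            # Anything else (e.g. "O") closes the current segment.
            open_label = None
    return spans


rows = [parse_line(line) for line in LINES]
spans = bio_spans(rows)
```

The formatting flags (`start_of_p`, `bold`, `center`) stay attached to each token in `rows`, which is the part a tagger would need to accept as extra input features alongside the token text.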

The final result is something like:

Note that some features are important here, such as text formatting (italic, bold, centered), position in the text, and more.

Is there any way to use these custom external features when training Sticker?

Thank you!

danieldk · 2019-10-31T19:29:00Z

Excellent question! This is currently not possible and would be hard to add without breaking compatibility with existing models. We are currently working towards a stable 1.0 version, which should be done in several months at most. Until 1.0 is released (and then maintained as a separate branch), compatibility is the highest priority.

Once 1.0 is branched, we can start working again on features that change the configuration format, graph placeholders, etc. One of the larger plans for sticker 2 is to make sticker more flexible in the inputs (freer-form features) and outputs (multi-task prediction) that can be configured.

This may all take a while, because we also want to investigate a switch to libtorch for sticker 2, which should probably be done before adding new features (with a TensorFlow-specific implementation).

I will start a sticker-2 milestone and add this issue to it.

danieldk added this to the sticker-2 milestone Oct 31, 2019
