This repository has been archived by the owner on Mar 19, 2021. It is now read-only.

Support of custom features #161

Open
bratao opened this issue Oct 31, 2019 · 1 comment

bratao commented Oct 31, 2019

Hello,
Thank you for this awesome library. I'm very impressed by the quality.

My task is to extract/segment information from semi-structured text. Not only is the text itself important; some "external features" are important too.

For example, imagine that I want to segment this text about GitHub projects into Category, Project name, URL and description.

I use a BIO scheme to tag each HTML token with a category. Each line below lists a token, its features, and its tag:
token=NLP start_of_p=True bold=True center=True B-Category
token=Projects start_of_p=False bold=True center=True I-Category
token=Project start_of_p=True bold=True center=False B-Project-name
token=Name start_of_p=False bold=True center=False I-Project-name
token=: start_of_p=False bold=False center=False I-Project-name
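
For illustration, here is a minimal Python sketch of how lines in this format could be read into per-token feature dictionaries and BIO tags, and how the tags could then be grouped into segments. It assumes whitespace-separated `key=value` fields with the tag as the last field, tokens without embedded whitespace, and boolean values for every feature except `token`. The helper names `parse_line` and `bio_spans` are hypothetical and are not part of sticker, which does not accept such features.

```python
from typing import Dict, List, Optional, Tuple

# The example lines from this comment, copied verbatim.
LINES = [
    "token=NLP start_of_p=True bold=True center=True B-Category",
    "token=Projects start_of_p=False bold=True center=True I-Category",
    "token=Project start_of_p=True bold=True center=False B-Project-name",
    "token=Name start_of_p=False bold=True center=False I-Project-name",
    "token=: start_of_p=False bold=False center=False I-Project-name",
]


def parse_line(line: str) -> Tuple[Dict[str, object], str]:
    """Split one line into a feature dict and its BIO tag (the last field)."""
    *fields, tag = line.split()
    features: Dict[str, object] = {}
    for field in fields:
        key, value = field.split("=", 1)
        # Assumption: only `token` is a string; all other features are booleans.
        features[key] = value if key == "token" else value == "True"
    return features, tag


def bio_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (label, start, end) spans; `end` is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        # A span boundary is a B- tag, an O tag, or an I- tag whose label
        # differs from the open span (treated leniently as a new span).
        boundary = (
            tag.startswith("B-")
            or tag == "O"
            or (tag.startswith("I-") and tag[2:] != label)
        )
        if boundary:
            if label is not None:
                spans.append((label, start, i))
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


parsed = [parse_line(line) for line in LINES]
segments = bio_spans([tag for _, tag in parsed])
```

With the five lines above, `segments` holds one `Category` span covering the first two tokens and one `Project-name` span covering the remaining three.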

The final result is something like:


Note that some features are important, such as text formatting (italic, bold, centered), position in the text, and more.

Is there any way to use those custom external features for training Sticker?

Thank you!

@danieldk
Member

Excellent question! This is currently not possible and would be hard to add without breaking compatibility with existing models. We are currently working towards a stable 1.0 version, which should be done in several months at most. Until 1.0 is released (and then maintained as a separate branch), compatibility is the highest priority.

Once 1.0 is branched, we can start working again on features that change the configuration format, graph placeholders, etc. One of the larger plans for sticker 2 is to make sticker more flexible in the inputs (freer-form features) and outputs (multi-task prediction) that can be configured.

This may all take a while, because we also want to investigate a switch to libtorch for sticker 2, which should probably be done before adding new features (with a TensorFlow-specific implementation).

I will start a sticker-2 milestone and add this issue to it.

danieldk added this to the sticker-2 milestone Oct 31, 2019