Language Specific Application

Nicolay Rusnachenko edited this page Nov 23, 2023 · 13 revisions

Russian Language 🇷🇺

Infer sentiment attitudes from a text file, then launch the D3JS-based demo:

python3 -m arelight.run.infer  \
    --sampling-framework "arekit" \
    --ner-model-name "ner_ontonotes_bert_mult" \
    --ner-types "ORG|PERSON|LOC|GPE" \
    --terms-per-context 50 \
    --sentence-parser "nltk:russian" \
    --text-b-type "nli_m" \
    --tokens-per-context 128 \
    --bert-framework "opennre" \
    --batch-size 10 \
    --stemmer "mystem" \
    --pretrained-bert "DeepPavlov/rubert-base-cased" \
    --bert-torch-checkpoint "ra4-rsr1_DeepPavlov-rubert-base-cased_cls.pth.tar" \
    --backend "d3js_graphs" \
    -o "output" \
    --from-files "<PATH-TO-TEXT-FILE>"

Sentiment Analysis Pipeline: The ARElight core is powered by the AREkit framework, which is responsible for raw text sampling. To annotate objects in text, we use BERT-based models trained on OntoNotes5 (powered by DeepPavlov). For relation annotation, we support OpenNRE BERT models. The default inference model is a pretrained BERT with transfer learning based on the RuSentRel and RuAttitudes collections, which were sampled and translated into English via arekit-ss.

Any Other Languages

It is possible to use the googletrans API wrapper to run inference on text in any language by translating it into the language of the specific model. To do so, add the following translation flags to your command:

python3 -m arelight.run.infer  \
    ... # LIST OF PARAMETERS FROM YOUR PAST SCRIPT
    --translate-framework "googletrans" \
    --translate-entity "en:ru" \
    --translate-text "en:ru"

NOTE: Words in the text and entities are translated separately, so that entities can use a different language from the rest of the text.
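Because the two translations are configured independently, it can help to keep each language pair in its own variable. The following is a minimal sketch, not part of ARElight itself: the variable names `TEXT_PAIR`, `ENTITY_PAIR`, and `TRANSLATE_FLAGS` are hypothetical, and reading each value as `<source>:<target>` is an assumption based on the `en:ru` example above. The script only prints the flags (a dry run) and does not launch inference.

```shell
#!/bin/sh
# Assumption: each value is read as "<source>:<target>"; the pairs are illustrative.
TEXT_PAIR="en:ru"      # translation applied to the words of the text
ENTITY_PAIR="en:ru"    # translation applied to entities; may differ from TEXT_PAIR

# Collect the translation flags as positional parameters (POSIX sh has no arrays).
set -- \
    --translate-framework "googletrans" \
    --translate-entity "$ENTITY_PAIR" \
    --translate-text "$TEXT_PAIR"

# Dry run: print the flags rather than launching inference.
TRANSLATE_FLAGS="$*"
printf '%s\n' "$TRANSLATE_FLAGS"
```

To apply the flags for real, pass "$@" after the parameters of your existing `python3 -m arelight.run.infer` command. Changing only `ENTITY_PAIR` would give entities a different language from the text.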