notes_on_the_data.md

To BHMA

  1. There are articles like this one, where not all of the article's content is available
  2. Not all articles have tags
  3. In articles like this, there is no clear boundary marking where an embedded tweet starts

Ta NEA

  1. There are 2323 articles for the span of a year
  2. Articles include their origin in the text, like this one --> REMOVED

SKAI

  1. Sometimes there is embedded video
  2. Sometimes the author is embedded in the text (like here)

Kathimerini

  1. We only have articles going back half a year (although there are almost 3700 of them)
  2. The subtitle of an article is not always contained in the article's main body (as in all other cases)

KontraNews

  1. Subtitle is always the first paragraph of the main text
  2. The author is inside the text and cannot be removed or extracted in any way (present in the data as NaN). Example here
  3. Articles like this and that need to be removed
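
Because the subtitle is always the first paragraph of the main text for this source, the two could be separated with a small helper. This is only a minimal sketch: `split_subtitle` is a hypothetical name, and it assumes paragraphs are separated by newlines, which may not match how the data is actually stored.

```python
def split_subtitle(main_text):
    """Return (subtitle, body), treating the first non-empty paragraph as the subtitle.

    Assumption: paragraphs are separated by newlines; adjust the
    delimiter if the scraped text uses something else.
    """
    # Drop empty lines and surrounding whitespace so blank separators are ignored.
    paragraphs = [p.strip() for p in main_text.split("\n") if p.strip()]
    if not paragraphs:
        return "", ""
    return paragraphs[0], "\n".join(paragraphs[1:])
```

An article with a single paragraph yields that paragraph as the subtitle and an empty body, which may be worth flagging for manual review.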

EFSYN

  1. In articles like this, the source of the photos appears --> SOLVED

ProtoThema

  1. There are 1551 articles for 2020

Ethnos

  1. In articles like this, the source of the photo appears --> SOLVED
  2. In articles like this, the Twitter post is included