
This repository hosts a comprehensive suite for generating graph-based entity-summarization datasets from user-selected Wikipedia pages. Using a series of interconnected modules, it leverages Wikidata and Wikipedia dumps to construct a dataset along with auto-generated ground truths.


Wiki Entity Summarization Benchmark (WikES)

This repository leverages the wiki-entity-summarization-preprocessor project to construct an entity summarization graph from a given set of seed nodes. The project aims to preserve the structure of the Wikidata knowledge graph by performing random-walk sampling with a depth of K, starting from the seed nodes after all of their summary edges have been added to the result. It then checks whether the expanded graph forms a single weakly connected component; if not, it finds B paths to connect the components. The final result is a heterogeneous graph consisting of the seed nodes, their summary edges, the (1..K)-hop neighbors of the seed nodes and their edges, and any intermediary nodes added to ensure graph connectivity. Each node and edge in the graph is enriched with metadata obtained from Wikidata and Wikipedia, including predicate information, providing additional context about the entities and their relationships.

A single root entity with its summary edges and the additional edges added by random-walk expansion

Loading the Datasets

Load Using wikes-toolkit

To load the datasets, we provide a toolkit that can be used to download, load, work with, and evaluate the 48 Wiki-Entity-Summarization datasets. The toolkit is available as a Python package and can be installed using pip:

pip install wikes-toolkit

A simple example of how to use the toolkit is as follows:

from wikes_toolkit import WikESToolkit, V1, WikESGraph

toolkit = WikESToolkit(save_path="./data")  # save_path is optional
G = toolkit.load_graph(
    WikESGraph,
    V1.WikiLitArt.SMALL,
    entity_formatter=lambda e: f"Entity({e.wikidata_label})",
    predicate_formatter=lambda p: f"Predicate({p.label})",
    triple_formatter=lambda
        t: f"({t.subject_entity.wikidata_label})-[{t.predicate.label}]-> ({t.object_entity.wikidata_label})"
)

root_nodes = G.root_entities()
nodes = G.entities()

Please refer to the Wiki-Entity-Summarization-Toolkit repository for more information.

Using mlcroissant

To load WikES datasets, you can use mlcroissant as well. You can find the metadata JSON files in the dataset details table.

Here is an example of loading our dataset using mlcroissant:

from mlcroissant import Dataset


def print_first_item(record_name):
    for record in dataset.records(record_set=record_name):
        for key, val in record.items():
            if isinstance(val, bytes):
                val = str(val, "utf-8")
            print(f"{key}=[{val}]({type(val)})", end=", ")
        break
    print()


dataset = Dataset(
    jsonld="https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s.json")

print(dataset.metadata.record_sets)

print_first_item("entities")
print_first_item("root-entities")
print_first_item("predicates")
print_first_item("triples")
print_first_item("ground-truths")
""" The output of the above code:
wikes-dataset
[RecordSet(uuid="entities"), RecordSet(uuid="root-entities"), RecordSet(uuid="predicates"), RecordSet(uuid="triples"), RecordSet(uuid="ground-truths")]
id=[0](<class 'int'>), entity=[Q6387338](<class 'str'>), wikidata_label=[Ken Blackwell](<class 'str'>), wikidata_description=[American politician and activist](<class 'str'>), wikipedia_id=[769596](<class 'int'>), wikipedia_title=[Ken_Blackwell](<class 'str'>), 
entity=[9](<class 'int'>), category=[singer](<class 'str'>), 
id=[0](<class 'int'>), predicate=[P1344](<class 'str'>), predicate_label=[participant in](<class 'str'>), predicate_desc=[event in which a person or organization was/is a participant; inverse of P710 or P1923](<class 'str'>), 
subject=[1](<class 'int'>), predicate=[0](<class 'int'>), object=[778](<class 'int'>), 
root_entity=[9](<class 'int'>), subject=[9](<class 'int'>), predicate=[8](<class 'int'>), object=[31068](<class 'int'>), 
"""

Loading the Pre-processed Databases

As described in wiki-entity-summarization-preprocessor, we imported English Wikidata items, together with their summaries, as a graph into a Neo4j database using the Wikipedia and Wikidata XML dump files. All other related metadata was imported into a Postgres database.

If you want to create your own dataset but do not want to run the pre-processor again, you can download and load the exported files from these two databases. Please refer to the release notes of the current version 1.0.0 (enwiki-2023-05-1 and wikidata-wiki-2023-05-1).
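
If you load the exported dumps into local Neo4j and Postgres instances, you can sanity-check them with a couple of generic queries. The snippet below is a minimal sketch: the hosts, ports, credentials, and database name are placeholders for your own setup, and the queries deliberately avoid assuming a particular schema.

from neo4j import GraphDatabase
import psycopg2

# Placeholder connection settings -- replace with your own.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # Count the imported graph nodes (works regardless of the label schema).
    result = session.run("MATCH (n) RETURN count(n) AS nodes")
    print("Neo4j nodes:", result.single()["nodes"])
driver.close()

conn = psycopg2.connect(host="localhost", port=5432, dbname="postgres", user="postgres", password="password")
with conn.cursor() as cur:
    # List the metadata tables available in the Postgres export.
    cur.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'")
    print("Postgres tables:", [row[0] for row in cur.fetchall()])
conn.close()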

Process Overview

1. Building the Summary Graph

  • Create a summary graph where each seed node is expanded with its summary edges.

2. Expanding the Summary Graph

  • Perform random walks starting from the seed nodes to mimic the structure of the Wikidata graph.
  • Scale the number of walks based on the degree of the seed nodes.
  • Add new edges to the graph from the random walk results.

3. Connecting Components

  • Check if the expanded graph forms a single weakly connected component.
  • If not, iteratively connect smaller components using the shortest paths until a single component is achieved.

4. Adding Metadata

  • Enhance the final graph with additional metadata for each node and edge.
  • Include labels, descriptions, and other relevant information from Wikidata, Wikipedia, and predicate information.
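
The sketch below illustrates steps 2 and 3 on networkx graphs. It is not the project's implementation, only an illustration of the procedure under stated assumptions: full_graph and summary_graph are networkx DiGraphs, the number of walks per seed is scaled by the seed's degree and clamped to the min/max parameters (the exact scaling used by the project is not shown here), and the full graph is assumed to contain a path between any two components.

import random
import networkx as nx


def expand_by_random_walks(full_graph, summary_graph, seeds, depth=3, min_walks=100, max_walks=300):
    """Illustrative expansion: random walks from each seed add edges to the summary graph."""
    for seed in seeds:
        walks = min(max_walks, max(min_walks, full_graph.degree(seed)))  # degree-scaled (assumption)
        for _ in range(walks):
            node = seed
            for _ in range(depth):
                neighbors = list(full_graph.successors(node)) + list(full_graph.predecessors(node))
                if not neighbors:
                    break
                nxt = random.choice(neighbors)
                summary_graph.add_edge(node, nxt)  # edge direction simplified for illustration
                node = nxt
    return summary_graph


def connect_components(full_graph, summary_graph):
    """Illustrative bridging: join weakly connected components with shortest paths."""
    undirected = full_graph.to_undirected()
    while True:
        components = list(nx.weakly_connected_components(summary_graph))
        if len(components) <= 1:
            break
        a, b = next(iter(components[0])), next(iter(components[1]))
        path = nx.shortest_path(undirected, a, b)
        summary_graph.add_edges_from(zip(path, path[1:]))
    return summary_graph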

Pre-generated Datasets

We have generated the datasets using the A Brief History of Human Time project. These datasets contain different sets of seed nodes, categorized by various arts and professions.

dataset (variant, size, None/train/val/test) #roots #summaries #nodes #edges #labels root category distribution Running Time (sec)
WikiLitArt-s
csv, graphml, croissant.json
494 10416 85346 136950 547 actor=150
composer=35
film=41
novelist=24
painter=59
poet=39
screenwriter=17
singer=72
writer=57
91.934
WikiLitArt-s-train
csv, graphml, croissant.json
346 7234 61885 96497 508 actor=105
composer=24
film=29
novelist=17
painter=42
poet=27
screenwriter=12
singer=50
writer=40
66.023
WikiLitArt-s-val
csv, graphml, croissant.json
74 1572 14763 20795 340 actor=23
composer=5
film=6
novelist=4
painter=9
poet=6
screenwriter=2
singer=11
writer=8
14.364
WikiLitArt-s-test
csv, graphml, croissant.json
74 1626 15861 22029 350 actor=22
composer=6
film=6
novelist=3
painter=8
poet=6
screenwriter=3
singer=11
writer=9
14.6
WikiLitArt-m
csv, graphml, croissant.json
494 10416 128061 220263 604 actor=150
composer=35
film=41
novelist=24
painter=59
poet=39
screenwriter=17
singer=72
writer=57
155.368
WikiLitArt-m-train
csv, graphml, croissant.json
346 7234 93251 155667 566 actor=105
composer=24
film=29
novelist=17
painter=42
poet=27
screenwriter=12
singer=50
writer=40
111.636
WikiLitArt-m-val
csv, graphml, croissant.json
74 1572 22214 33547 375 actor=23
composer=5
film=6
novelist=4
painter=9
poet=6
screenwriter=2
singer=11
writer=8
22.957
WikiLitArt-m-test
csv, graphml, croissant.json
74 1626 24130 35980 394 actor=22
composer=6
film=6
novelist=3
painter=8
poet=6
screenwriter=3
singer=11
writer=9
26.187
WikiLitArt-l
csv, graphml, croissant.json
494 10416 239491 466905 703 actor=150
composer=35
film=41
novelist=24
painter=59
poet=39
screenwriter=17
singer=72
writer=57
353.113
WikiLitArt-l-train
csv, graphml, croissant.json
346 7234 176057 332279 661 actor=105
composer=24
film=29
novelist=17
painter=42
poet=27
screenwriter=12
singer=50
writer=40
244.544
WikiLitArt-l-val
csv, graphml, croissant.json
74 1572 42745 71734 446 actor=23
composer=5
film=6
novelist=4
painter=9
poet=6
screenwriter=2
singer=11
writer=8
57.263
WikiLitArt-l-test
csv, graphml, croissant.json
74 1626 46890 77931 493 actor=22
composer=6
film=6
novelist=3
painter=8
poet=6
screenwriter=3
singer=11
writer=9
60.466
WikiCinema-s
csv, graphml, croissant.json
493 11750 70753 126915 469 actor=405
film=88
118.014
WikiCinema-s-train
csv, graphml, croissant.json
345 8374 52712 89306 437 actor=284
film=61
84.364
WikiCinema-s-val
csv, graphml, croissant.json
73 1650 13362 19280 305 actor=59
film=14
18.651
WikiCinema-s-test
csv, graphml, croissant.json
75 1744 14777 21567 313 actor=62
film=13
19.851
WikiCinema-m
csv, graphml, croissant.json
493 11750 101529 196061 541 actor=405
film=88
196.413
WikiCinema-m-train
csv, graphml, croissant.json
345 8374 75900 138897 491 actor=284
film=61
142.091
WikiCinema-m-val
csv, graphml, croissant.json
73 1650 19674 30152 344 actor=59
film=14
31.722
WikiCinema-m-test
csv, graphml, croissant.json
75 1744 22102 34499 342 actor=62
film=13
33.674
WikiCinema-l
csv, graphml, croissant.json
493 11750 185098 397546 614 actor=405
film=88
475.679
WikiCinema-l-train
csv, graphml, croissant.json
345 8374 139598 284417 575 actor=284
film=61
333.148
WikiCinema-l-val
csv, graphml, croissant.json
73 1650 37352 63744 412 actor=59
film=14
68.62
WikiCinema-l-test
csv, graphml, croissant.json
75 1744 43238 74205 426 actor=62
film=13
87.07
WikiPro-s
csv, graphml, croissant.json
493 9853 79825 125912 616 actor=58
football=156
journalist=14
lawyer=16
painter=23
player=25
politician=125
singer=27
sport=21
writer=28
126.119
WikiPro-s-train
csv, graphml, croissant.json
345 6832 57529 87768 575 actor=41
football=109
journalist=10
lawyer=11
painter=16
player=17
politician=87
singer=19
sport=15
writer=20
89.874
WikiPro-s-val
csv, graphml, croissant.json
74 1548 15769 21351 405 actor=9
football=23
journalist=2
lawyer=3
painter=3
player=4
politician=19
singer=4
sport=3
writer=4
21.021
WikiPro-s-test
csv, graphml, croissant.json
74 1484 15657 21145 384 actor=8
football=24
journalist=2
lawyer=2
painter=4
player=4
politician=19
singer=4
sport=3
writer=4
21.743
WikiPro-m
csv, graphml, croissant.json
493 9853 119305 198663 670 actor=58
football=156
journalist=14
lawyer=16
painter=23
player=25
politician=125
singer=27
sport=21
writer=28
208.157
WikiPro-m-train
csv, graphml, croissant.json
345 6832 86434 138676 633 actor=41
football=109
journalist=10
lawyer=11
painter=16
player=17
politician=87
singer=19
sport=15
writer=20
141.563
WikiPro-m-val
csv, graphml, croissant.json
74 1548 24230 34636 463 actor=9
football=23
journalist=2
lawyer=3
painter=3
player=4
politician=19
singer=4
sport=3
writer=4
36.045
WikiPro-m-test
csv, graphml, croissant.json
74 1484 24117 34157 462 actor=8
football=24
journalist=2
lawyer=2
painter=4
player=4
politician=19
singer=4
sport=3
writer=4
36.967
WikiPro-l
csv, graphml, croissant.json
493 9853 230442 412766 769 actor=58
football=156
journalist=14
lawyer=16
painter=23
player=25
politician=125
singer=27
sport=21
writer=28
489.409
WikiPro-l-train
csv, graphml, croissant.json
345 6832 166685 290069 725 actor=41
football=109
journalist=10
lawyer=11
painter=16
player=17
politician=87
singer=19
sport=15
writer=20
334.864
WikiPro-l-val
csv, graphml, croissant.json
74 1548 48205 74387 549 actor=9
football=23
journalist=2
lawyer=3
painter=3
player=4
politician=19
singer=4
sport=3
writer=4
84.089
WikiPro-l-test
csv, graphml, croissant.json
74 1484 47981 72845 546 actor=8
football=24
journalist=2
lawyer=2
painter=4
player=4
politician=19
singer=4
sport=3
writer=4
92.545
WikiProFem-s
csv, graphml, croissant.json
468 8338 79926 123193 571 actor=141
athletic=25
football=24
journalist=16
painter=16
player=32
politician=81
singer=69
sport=18
writer=46
177.63
WikiProFem-s-train
csv, graphml, croissant.json
330 5587 58329 87492 521 actor=98
athletic=18
football=17
journalist=9
painter=13
player=22
politician=57
singer=48
sport=14
writer=34
127.614
WikiProFem-s-val
csv, graphml, croissant.json
68 1367 14148 19360 344 actor=21
athletic=4
football=3
journalist=4
painter=1
player=5
politician=13
singer=11
sport=1
writer=5
29.081
WikiProFem-s-test
csv, graphml, croissant.json
70 1387 13642 18567 360 actor=22
athletic=3
football=4
journalist=3
painter=2
player=5
politician=11
singer=10
sport=3
writer=7
27.466
WikiProFem-m
csv, graphml, croissant.json
468 8338 122728 196838 631 actor=141
athletic=25
football=24
journalist=16
painter=16
player=32
politician=81
singer=69
sport=18
writer=46
301.718
WikiProFem-m-train
csv, graphml, croissant.json
330 5587 89922 140505 600 actor=98
athletic=18
football=17
journalist=9
painter=13
player=22
politician=57
singer=48
sport=14
writer=34
217.699
WikiProFem-m-val
csv, graphml, croissant.json
68 1367 21978 31230 409 actor=21
athletic=4
football=3
journalist=4
painter=1
player=5
politician=13
singer=11
sport=1
writer=5
46.793
WikiProFem-m-test
csv, graphml, croissant.json
70 1387 21305 29919 394 actor=22
athletic=3
football=4
journalist=3
painter=2
player=5
politician=11
singer=10
sport=3
writer=7
46.317
WikiProFem-l
csv, graphml, croissant.json
468 8338 248012 413895 722 actor=141
athletic=25
football=24
journalist=16
painter=16
player=32
politician=81
singer=69
sport=18
writer=46
768.99
WikiProFem-l-train
csv, graphml, croissant.json
330 5587 183710 297686 676 actor=98
athletic=18
football=17
journalist=9
painter=13
player=22
politician=57
singer=48
sport=14
writer=34
544.893
WikiProFem-l-val
csv, graphml, croissant.json
68 1367 46018 67193 492 actor=21
athletic=4
football=3
journalist=4
painter=1
player=5
politician=13
singer=11
sport=1
writer=5
116.758
WikiProFem-l-test
csv, graphml, croissant.json
70 1387 44193 63563 472 actor=22
athletic=3
football=4
journalist=3
painter=2
player=5
politician=11
singer=10
sport=3
writer=7
118.524

Keep in mind that by providing a new set of seed nodes, you can generate your own dataset.

Dataset Parameters

Parameter Value
Min valid summary edges 5
Random walk depth length 3
Min random walk number-small 100
Min random walk number-medium 150
Min random walk number-large 300
Max random walk number-small 300
Max random walk number-medium 600
Max random walk number-large 1800
Bridges number 5

Graph Structure

In the following, you can see samples of the graph formats (we highly recommend using our toolkit to load the datasets):

CSV Format

After unzipping the {variant}-{size}-{dataset_type}.zip file, you will find the following CSV files:

{variant}-{size}-{dataset_type}-entities.csv contains entities. An entity is a Wikidata item (node) in our dataset.

Field Description datatype
id incremental integer starting from zero int
entity Wikidata qid, e.g. Q76 string
wikidata_label Wikidata label (nullable) string
wikidata_desc Wikidata description (nullable) string
wikipedia_title Wikipedia title (nullable) string
wikipedia_id Wikipedia page id (nullable) long

{variant}-{size}-{dataset_type}-root-entities.csv contains root entities. A root entity is a seed node described previously.

Field Description datatype
entity id key in {variant}-{size}-{dataset_type}-entities.csv int
category category string

{variant}-{size}-{dataset_type}-predicates.csv contains predicates. A predicate is a Wikidata property describing the connection between two entities.

Field Description datatype
id incremental integer starting from zero int
predicate Wikidata Property id, e.g. P121 string
predicate_label Wikidata Property label (nullable) string
predicate_desc Wikidata Property description (nullable) string

{variant}-{size}-{dataset_type}-triples.csv contains triples. A triple is an edge between two entities with a predicate.

Field Description datatype
subject id key in {variant}-{size}-{dataset_type}-entities.csv int
predicate id key in {variant}-{size}-{dataset_type}-predicates.csv int
object id key in {variant}-{size}-{dataset_type}-entities.csv int

{variant}-{size}-{dataset_type}-ground-truths.csv contains ground truth triples. A ground truth triple is an edge that is marked as a summary for a root entity.

Field Description datatype
root_entity entity in {variant}-{size}-{dataset_type}-root-entities.csv int
subject id key in {variant}-{size}-{dataset_type}-entities.csv int
predicate id key in {variant}-{size}-{dataset_type}-predicates.csv int
object id key in {variant}-{size}-{dataset_type}-entities.csv int

Note: in this file, either the subject or the object column equals the root_entity.

Example of CSV Files

# entities.csv
id,entity,wikidata_label,wikidata_desc,wikipedia_title,wikipedia_id
0,Q43416,Keanu Reeves,Canadian actor (born 1964),Keanu_Reeves,16603
1,Q3820,Beirut,capital and largest city of Lebanon,Beirut,37428
2,Q639669,musician,"person who composes, conducts or performs music",Musician,38284
3,Q219150,Constantine,2005 film directed by Francis Lawrence,Constantine_(film),1210303
# root-entities.csv
entity,category
0,actor
# predicates.csv
id,predicate,predicate_label,predicate_desc
0,P19,place of birth,location where the subject was born
1,P106,occupation,"occupation of a person; see also ""field of work"" (Property:P101), ""position held"" (Property:P39)"
2,P161,cast member,"actor in the subject production [use ""character role"" (P453) and/or ""name of the character role"" (P4633) as qualifiers] [use ""voice actor"" (P725) for voice-only role]"
# triples.csv
subject,predicate,object
0,0,1
0,1,2
3,2,0
# ground-truths.csv
root_entity,subject,predicate,object
0,0,0,1
3,3,2,0
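
As a sketch of how the CSV files fit together, the following resolves the integer ids in the triples file back into readable labels using the entities and predicates files. The file names are placeholders following the {variant}-{size}-{dataset_type} pattern described above.

import pandas as pd

# Placeholder file names -- adjust to the unzipped files of your chosen dataset.
entities = pd.read_csv("WikiLitArt-s-entities.csv")
predicates = pd.read_csv("WikiLitArt-s-predicates.csv")
triples = pd.read_csv("WikiLitArt-s-triples.csv")

# Map entity and predicate ids to their Wikidata labels.
ent_label = entities.set_index("id")["wikidata_label"]
pred_label = predicates.set_index("id")["predicate_label"]

readable = pd.DataFrame({
    "subject": triples["subject"].map(ent_label),
    "predicate": triples["predicate"].map(pred_label),
    "object": triples["object"].map(ent_label),
})
print(readable.head())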

GraphML Example

The same graph can be represented in GraphML format, available in the dataset details table:

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
    <key id="d9" for="edge" attr.name="summary_for" attr.type="string"/>
    <key id="d8" for="edge" attr.name="predicate_desc" attr.type="string"/>
    <key id="d7" for="edge" attr.name="predicate_label" attr.type="string"/>
    <key id="d6" for="edge" attr.name="predicate" attr.type="string"/>
    <key id="d5" for="node" attr.name="category" attr.type="string"/>
    <key id="d4" for="node" attr.name="is_root" attr.type="boolean"/>
    <key id="d3" for="node" attr.name="wikidata_desc" attr.type="string"/>
    <key id="d2" for="node" attr.name="wikipedia_title" attr.type="string"/>
    <key id="d1" for="node" attr.name="wikipedia_id" attr.type="long"/>
    <key id="d0" for="node" attr.name="wikidata_label" attr.type="string"/>
    <graph edgedefault="directed">
        <node id="Q43416">
            <data key="d0">Keanu Reeves</data>
            <data key="d1">16603</data>
            <data key="d2">Keanu_Reeves</data>
            <data key="d3">Canadian actor (born 1964)</data>
            <data key="d4">True</data>
            <data key="d5">actor</data>
        </node>
        <node id="Q3820">
            <data key="d0">Beirut</data>
            <data key="d1">37428</data>
            <data key="d2">Beirut</data>
            <data key="d3">capital and largest city of Lebanon</data>
        </node>
        <node id="Q639669">
            <data key="d0">musician</data>
            <data key="d1">38284</data>
            <data key="d2">Musician</data>
            <data key="d3">person who composes, conducts or performs music</data>
        </node>
        <node id="Q219150">
            <data key="d0">Constantine</data>
            <data key="d1">1210303</data>
            <data key="d2">Constantine_(film)</data>
            <data key="d3">2005 film directed by Francis Lawrence</data>
        </node>
        <edge source="Q43416" target="Q3820" id="P19">
            <data key="d6">P19</data>
            <data key="d7">place of birth</data>
            <data key="d8">location where the subject was born</data>
            <data key="d9">Q43416</data>
        </edge>
        <edge source="Q43416" target="Q639669" id="P106">
            <data key="d6">P106</data>
            <data key="d7">occupation</data>
            <data key="d8">occupation of a person; see also "field of work" (Property:P101), "position held"
                (Property:P39)
            </data>
        </edge>
        <edge source="Q219150" target="Q43416" id="P106">
            <data key="d6">P161</data>
            <data key="d7">cast member</data>
            <data key="d8">actor in the subject production [use "character role" (P453) and/or "name of the character
                role" (P4633) as qualifiers] [use "voice actor" (P725) for voice-only role]
            </data>
            <data key="d9">Q43416</data>
        </edge>
    </graph>
</graphml>
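
The GraphML files can be loaded directly with networkx. A minimal sketch, assuming a local path to one of the downloaded .graphml files; the is_root node attribute and the summary_for edge attribute are the keys shown in the example above.

import networkx as nx

# Placeholder path -- point this at an unzipped .graphml file from a release.
G = nx.read_graphml("WikiLitArt-s.graphml")

# Root (seed) entities carry the is_root attribute.
roots = [n for n, data in G.nodes(data=True) if data.get("is_root")]

# Ground-truth summary edges carry summary_for, naming the root entity they summarize.
summaries = [(u, v, data) for u, v, data in G.edges(data=True) if "summary_for" in data]

print(len(roots), "root entities,", len(summaries), "summary edges")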

Usage

Generate a New Dataset

To get started with this project, first clone this repository and install the necessary dependencies using Poetry.

git clone https://github.com/yourusername/wiki-entity-summarization.git
cd wiki-entity-summarization
curl -sSL https://install.python-poetry.org | python3 -
poetry config virtualenvs.in-project true
poetry install
poetry shell

# You can set the parameters via .env file instead of providing command line arguments.
cp .env_sample .env

python3 main.py [-h] [--min_valid_summary_edges MIN_VALID_SUMMARY_EDGES] 
                [--random_walk_depth_len RANDOM_WALK_DEPTH_LEN] [--bridges_number BRIDGES_NUMBER] 
                [--max_threads MAX_THREADS] [--output_path OUTPUT_PATH] [--db_name DB_NAME] [--db_user DB_USER] 
                [--db_password DB_PASSWORD] [--db_host DB_HOST] [--db_port DB_PORT] [--neo4j_user NEO4J_USER] 
                [--neo4j_password NEO4J_PASSWORD] [--neo4j_host NEO4J_HOST] [--neo4j_port NEO4J_PORT]
                [dataset_name] [min_random_walk_number] [max_random_walk_number] [seed_node_ids] [categories]
                
        options:
                -h, --help                Show this help message and exit
                --min_valid_summary_edges Minimum number of valid summaries for a seed node
                --random_walk_depth_len   Depth length of random walks (number of nodes in each random walk)
                --bridges_number          Number of connecting path bridges between components
                --max_threads             Maximum number of threads
                --output_path             Path to save output data
                --db_name                 Database name
                --db_user                 Database user
                --db_password             Database password
                --db_host                 Database host
                --db_port                 Database port
                --neo4j_user              Neo4j user
                --neo4j_password          Neo4j password
                --neo4j_host              Neo4j host
                --neo4j_port              Neo4j port

        Positional arguments:
                dataset_name              The name of the dataset to process (required)
                min_random_walk_number    Minimum number of random walks for each seed node (required)
                max_random_walk_number    Maximum number of random walks for each seed node (required)
                seed_node_ids             Seed node ids in comma-separated format (required)
                categories                Seed node categories in comma-separated format (optional)
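
For illustration only, a hypothetical invocation could look like the following; the dataset name, walk counts, seed node ids, and categories are made-up values, and the exact identifier format expected for seed_node_ids (e.g. Wikidata QIDs vs. internal ids) should be checked against the pre-processor databases.

python3 main.py my-dataset 100 300 "Q43416,Q3820" "actor,city"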

Re-generate WikES Dataset

To reconstruct our pre-generated datasets, you can use the following command:

python3 human_history_dataset.py

This project uses our pre-processor project databases. Make sure you have loaded the data and that the databases are running properly.

Citation

If you use this project in your research, please cite the following paper:

@misc{javadi2024wiki,
    title = {Wiki Entity Summarization Benchmark},
    author = {Saeedeh Javadi and Atefeh Moradan and Mohammad Sorkhpar and Klim Zaporojets and Davide Mottin and Ira Assent},
    year = {2024},
    eprint = {2406.08435},
    archivePrefix = {arXiv},
    primaryClass = {cs.IR}
}

License

This project and its released datasets are licensed under the CC BY 4.0 License. See the LICENSE file for details.

Below you can find the licenses of the external services, libraries, and software we used. By using this project, you accept these third-party licenses.

  1. Wikipedia:
  2. Wikidata:
  3. Python:
  4. DistilBERT:
  5. Networkx:
  6. Postgres:
  7. Neo4j:
  8. Docker:
  9. PyTorch:
  10. Scikit-learn:
  11. Pandas:
  12. Numpy:
  13. Java-open:
  14. Spring framework:
  15. Other libraries:
