Make sure NetworkX can properly read/write GEXF 1.3 #16

Open
mbastian opened this issue Aug 28, 2022 · 7 comments

@mbastian
Member

mbastian commented Aug 28, 2022

NetworkX is the go-to Python library for graph manipulation and already has GEXF import/export.

Definition of done

  • Import/export is fully GEXF 1.3 compatible (as far as NetworkX internals allow)

@rjurney

rjurney commented Sep 24, 2023

@mbastian looks like NetworkX can't handle lists of floats as attributes, which were added in 1.3.

@rjurney

rjurney commented Sep 25, 2023

@mbastian to be specific about my use case: from reading the code, it looks like Gephi's format assumes that lists of properties are time series. I want to store a 384-dimensional embedding of each node's properties (computed with a paraphrase embedding model over a citation graph) on the nodes, do analysis in NetworkX, and then also use the same GEXF file in Deep Graph Library (DGL) and PyG (PyTorch Geometric).

Dataset: https://snap.stanford.edu/data/cit-HepTh.html
Embedding: https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2
Sentence Transformers: https://www.sbert.net/

The example code below JSONizes the embedding (a list of floats) as a workaround, but I'd like to be able to store it directly as a list. @mbastian, can GEXF support embeddings in the next version?

import json
import os

import numpy as np

# Embed the abstracts for GNN features. Embedding is a generic approach for retrieval as well.
# Note: NetworkX can't save lists in GEXF format, so we JSONize each list and also cache the embeddings separately.
# G, all_abstracts, file_paper_ids, file_to_networkx_ids and embed_paper_info() are defined elsewhere (not shown).
embedded_abstracts: np.ndarray
if os.path.exists("data/embedded_abstracts.npy"):
    # Reuse the cached embeddings if they were already computed
    embedded_abstracts = np.load("data/embedded_abstracts.npy")
else:
    embedded_abstracts = embed_paper_info(all_abstracts, convert_to_tensor=False)
    np.save("data/embedded_abstracts.npy", embedded_abstracts)

for paper_id, emb in zip(file_paper_ids, embedded_abstracts):
    # Each paper should have exactly one 384-dimensional vector
    assert emb.shape == (384,)

    # Gephi assumes a list of floats is a time series, so we need to convert to a string
    G.nodes[file_to_networkx_ids[paper_id]]["Embedding-JSON"] = json.dumps(emb.tolist())
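
The snippet above only covers the writing side of the workaround. A minimal sketch of the full round trip, using only the standard library, might look like the following. The helper names `encode_embedding` and `decode_embedding` are hypothetical (they are not part of NetworkX or GEXF), and the three-element vector stands in for the real 384-dimensional one.

```python
import json


def encode_embedding(vector):
    """Serialize a sequence of floats to a JSON string (hypothetical helper)."""
    return json.dumps([float(x) for x in vector])


def decode_embedding(text, expected_dim=None):
    """Parse a JSON string back into a list of floats (hypothetical helper)."""
    values = json.loads(text)
    if expected_dim is not None and len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(values)}")
    return values


# Stand-in for a node's attribute dict; in the thread's code this would be
# G.nodes[node_id], and the vector would have 384 entries.
node_attrs = {"Embedding-JSON": encode_embedding([-0.5083, -0.3573, 0.1390])}

# After reading the graph back, the attribute is a plain string, so decode it.
restored = decode_embedding(node_attrs["Embedding-JSON"], expected_dim=3)
print(restored)
```

Checking the dimension on decode is a design choice, not a requirement: it catches a truncated or mislabeled attribute early, before the vector reaches DGL or PyG.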

Example document:

------------------------------------------------------------------------------
\\
Paper: hep-th/0001001
From: Paul S. Aspinwall <[email protected]>
Date: Sat, 1 Jan 2000 00:02:31 GMT   (84kb)
Date (revised v2): Mon, 17 Jan 2000 14:52:43 GMT   (85kb)

Title: Compactification, Geometry and Duality: N=2
Authors: Paul S. Aspinwall
Comments: 82 pages, 8 figures, LaTeX2e, TASI99, refs added and some typos fixed
Report-no: DUKE-CGTP-00-01
\\
  These are notes based on lectures given at TASI99. We review the geometry of
the moduli space of N=2 theories in four dimensions from the point of view of
superstring compactification. The cases of a type IIA or type IIB string
compactified on a Calabi-Yau threefold and the heterotic string compactified on
K3xT2 are each considered in detail. We pay specific attention to the
differences between N=2 theories and N>2 theories. The moduli spaces of vector
multiplets and the moduli spaces of hypermultiplets are reviewed. In the case
of hypermultiplets this review is limited by the poor state of our current
understanding. Some peculiarities such as ``mixed instantons'' and the
non-existence of a universal hypermultiplet are discussed.
\\

Its embedding:

[-0.5083363652229309, -0.35725411772727966, 0.1389939785003662, -0.1347253918647766, -0.1535784900188446, 0.43154388666152954, 0.15374013781547546, -0.008106844499707222, -0.1662866771221161, -0.15766437351703644, 0.35521116852760315, 0.15607962012290955, 0.6218618750572205, 0.07288412749767303, -0.08790934085845947, -0.145784392952919, 0.14549043774604797, -0.03458674997091293, -0.741215705871582, 0.019919676706194878, -0.2773298919200897, -0.16332964599132538, -0.42131808400154114, 0.06080969050526619, 0.55726158618927, 0.18690286576747894, -0.19952552020549774, 0.23189248144626617, 0.39608946442604065, 0.031538791954517365, 0.4129146337509155, 0.37623560428619385, 0.16398969292640686, 0.09904278814792633, 0.5887687802314758, 0.19061870872974396, -0.020812658593058586, 0.6324356198310852, 0.005971217527985573, 0.2787822186946869, 0.20738601684570312, -1.136680006980896, 0.4140499532222748, 0.7376874685287476, 0.26450657844543457, 0.08141785860061646, -0.529627799987793, -0.07897279411554337, 0.302225261926651, 0.26963791251182556, -0.5572066307067871, 0.022079501301050186, -0.41076093912124634, -0.16617120802402496, -0.014963116496801376, 0.2403220683336258, 0.03146751970052719, -0.514580488204956, 0.02357768639922142, -0.19823256134986877, -0.1633021980524063, 0.14651842415332794, -0.5526030659675598, 0.5041884183883667, 0.20464496314525604, 0.16364993155002594, -0.0379401370882988, -0.16234970092773438, 0.273735910654068, 0.4701267182826996, 0.38202783465385437, 0.6249184608459473, -0.6957732439041138, -0.4264785051345825, 0.06444322317838669, 0.6805640459060669, -0.3116794228553772, 0.009198327548801899, -0.18131123483181, -0.4511978328227997, 0.2052099108695984, -0.7076764106750488, -0.2577372193336487, -0.11397387087345123, 0.004945039749145508, 0.29662612080574036, 0.48335978388786316, 0.16308338940143585, 0.02071310393512249, -0.06133018806576729, 0.3547375500202179, -0.015222515910863876, -0.3296150863170624, 0.27946799993515015, 0.10797177255153656, 
0.5158742070198059, 0.3182218670845032, -0.1535983383655548, 0.6189644932746887, 0.16411934792995453, -0.20841538906097412, -0.09344162046909332, -0.5550981760025024, -0.0629420131444931, -0.5624946355819702, -0.6402942538261414, -0.201442688703537, 0.18017089366912842, 0.27435120940208435, 0.18869590759277344, 0.04372529685497284, -0.3697742521762848, -0.06247770041227341, 0.14726705849170685, -0.5059475302696228, 0.17057615518569946, 0.49116864800453186, 0.303863525390625, 0.7109688520431519, -0.08683305978775024, 0.4489392042160034, 0.8849781155586243, 0.2691556513309479, 0.054163508117198944, 0.20481964945793152, -0.047171857208013535, 0.49669820070266724, 0.3995380997657776, -0.2686813771724701, -0.1840616762638092, -0.03536504507064819, -0.6438066959381104, 0.0884658545255661, -0.049895793199539185, 0.1340586543083191, 0.008303023874759674, 0.12762904167175293, 0.19640912115573883, 0.09768808633089066, -0.17605964839458466, 0.03801923617720604, 0.22554127871990204, -0.0682666227221489, -0.21554642915725708, 0.34073975682258606, -0.1460971236228943, -0.6941462755203247, 0.20569857954978943, 0.5059947967529297, -0.3478425145149231, -0.13772228360176086, -0.06816817820072174, -0.5381731390953064, 0.05074828490614891, 0.06547494232654572, -0.29076358675956726, -0.15378691256046295, 0.2487240433692932, 0.3956683874130249, 0.28119516372680664, -0.36075934767723083, -0.13970033824443817, 0.3972870111465454, 0.24897192418575287, 0.39377814531326294, 0.28017812967300415, 0.5327494740486145, -0.4372592270374298, -0.33479222655296326, 0.06613282114267349, 0.4145204424858093, -0.09375417977571487, 0.006537675857543945, 0.44525378942489624, 0.03501797467470169, -0.2608524560928345, -0.006014466285705566, -0.036333389580249786, -0.537621796131134, 0.18642160296440125, 0.07950431853532791, -0.2662293016910553, -0.24478109180927277, -0.5388363003730774, 0.0674142986536026, 0.006562564522027969, 0.13258269429206848, 0.43928781151771545, 0.14479145407676697, 
-0.6222834587097168, -0.33258986473083496, -0.6179389357566833, -0.2406272441148758, 0.014090614393353462, -0.3714263439178467, -0.412462443113327, 0.27592408657073975, 0.0349738746881485, -0.2271711528301239, 0.5821718573570251, -0.36073049902915955, -0.2708200216293335, 0.20686064660549164, -0.23197627067565918, 0.042743708938360214, 0.14470048248767853, -0.024556558579206467, -0.6748477816581726, -0.16571849584579468, 0.20108835399150848, -0.07298190146684647, -0.5514233112335205, -0.06006268784403801, -0.04524163901805878, 0.012701082974672318, 0.41854313015937805, -0.23032033443450928, -0.7118092179298401, -0.3731357455253601, -0.038922086358070374, 0.11315789818763733, -0.19573336839675903, 0.5248740911483765, -0.8068038821220398, -0.3490540087223053, 0.6316984295845032, -0.24007821083068848, 0.19816532731056213, 0.02993026375770569, -0.09062369167804718, 0.32186055183410645, 0.41794851422309875, 0.504360556602478, 0.1191108375787735, 0.3482481837272644, 0.15071724355220795, 0.05511059984564781, -0.14041967689990997, 0.18092676997184753, 0.02112441509962082, 0.1610906720161438, 0.03389054536819458, -0.15241602063179016, -0.1575293093919754, -0.12149085104465485, 0.5990638136863708, -0.7717245817184448, -0.04483901336789131, 0.19884341955184937, 0.10792878270149231, 0.10256698727607727, -0.5565033555030823, 0.029021425172686577, 0.16152621805667877, 0.3552182912826538, -0.19814762473106384, 0.19467827677726746, -0.1417803019285202, -0.4221956431865692, 0.29962822794914246, 0.6577330827713013, 0.17069461941719055, 0.28435853123664856, 0.21476049721240997, 0.8059138059616089, -0.048171523958444595, -0.16125980019569397, -0.07039059698581696, -0.09816092252731323, -0.1514281928539276, 0.24609962105751038, -0.0849226862192154, 0.09835521876811981, 0.32943952083587646, -0.25816798210144043, -0.06863641738891602, 0.049438249319791794, 0.025209199637174606, 0.08355040848255157, 0.21580441296100616, -0.41988956928253174, 0.07675647735595703, -0.14934852719306946, 
-0.4311261475086212, -0.3233030140399933, -0.19432544708251953, 0.09847439080476761, -0.24860693514347076, 0.1917468160390854, -0.04119320958852768, 0.036722056567668915, -0.21387654542922974, -0.0030690915882587433, -0.13641610741615295, 0.012929495424032211, 0.3078806400299072, -0.34233883023262024, 0.045709915459156036, 0.11729196459054947, 0.13548825681209564, -0.3334689736366272, 0.29789718985557556, 0.12125445902347565, 0.13667646050453186, -0.6150417327880859, 0.0011353977024555206, -0.012479695491492748, 0.2989681363105774, 0.3227967321872711, -0.052288718521595, 0.3666779100894928, -0.2939664423465729, 0.12823599576950073, -0.10072129964828491, -0.176337331533432, 0.2739074230194092, -0.26633912324905396, 0.43988385796546936, -0.09746330976486206, -0.2637675702571869, 0.02734220400452614, -0.20562905073165894, -0.6480699777603149, 0.1781962364912033, 0.17634740471839905, -0.07000317424535751, 0.3828813135623932, -0.6547756195068359, 0.15146368741989136, 0.03579747676849365, -0.007166197523474693, 0.15733617544174194, 0.046128399670124054, -0.7098756432533264, 0.22380834817886353, 0.3733425438404083, -0.7145859003067017, 0.18655464053153992, -0.4990553557872772, -0.2336399257183075, -0.3922877907752991, -0.12291472405195236, 0.3854149878025055, -0.3202831447124481, -0.0007252912037074566, 0.34592050313949585, -0.07235311716794968, 0.5941299796104431, -0.04594670981168747, -0.10191763192415237, 0.15881231427192688, 0.38152000308036804, 0.4613525867462158, 0.07394368201494217, -0.031655725091695786, -0.1491849571466446, -0.4769206941127777, 0.11919506639242172, 0.52707439661026, 0.12066393345594406, -0.3855656683444977, 0.0897144302725792, -0.015513844788074493, 0.8330134153366089, 0.44915086030960083, 0.07939314842224121, -0.387637197971344, 0.21580561995506287, 0.18721160292625427, -0.3700406849384308, -0.1043381541967392, 0.19310817122459412, 0.116238072514534, -0.40746667981147766, 0.7291035056114197, -0.43795716762542725, 0.22398078441619873, 
-0.24590949714183807, -0.06679191440343857, -0.5940830111503601, -0.018695345148444176, -0.33444738388061523, -0.09381847828626633, 0.18644794821739197]

@rjurney

rjurney commented Sep 25, 2023

Oh uh, supporting embeddings in Gephi is going to be essential to keeping it relevant as graph AI and visualization merge and computing becomes more GPU-centric.

@mbastian
Member Author

Thanks @rjurney, I would be happy to chat about what would make it easier to handle embeddings in Gephi. The GEXF format supports float lists, so if they are not imported properly in Gephi, it must be a bug. Compatibility with NetworkX is surely also important. Let me investigate; I bet we don't have many unit tests around list import, as it hasn't been very popular in the past.

@rjurney

rjurney commented Oct 9, 2023

Cool, I will share my notebook with you so you can see. It isn't open source at this point, but I trust you ;)

Another issue is that integer node IDs become strings. I have had to cast them back to integers. A lot of Python tools around networkx like littleballoffur (graph sampling) and karateclub (graph embeddings) won't work without integer node IDs.
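
For the node ID issue, NetworkX's `read_gexf` accepts a `node_type` argument that converts IDs while reading, which may avoid the manual cast. The sketch below assumes a small toy graph and an illustrative file name (`ids.gexf`); neither comes from the notebook discussed here.

```python
import os
import tempfile

import networkx as nx

# Toy graph with integer node IDs (illustrative only).
G = nx.Graph()
G.add_edge(1, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ids.gexf")
    nx.write_gexf(G, path)

    # Default read: IDs come back as strings, as described above.
    as_strings = nx.read_gexf(path)

    # Ask the reader to convert every node ID with int().
    as_ints = nx.read_gexf(path, node_type=int)

# Alternative for a graph that is already loaded: relabel with an explicit mapping.
relabeled = nx.relabel_nodes(as_strings, {n: int(n) for n in as_strings})

print(sorted(as_strings.nodes))
print(sorted(as_ints.nodes))
print(sorted(relabeled.nodes))
```

Converting at read time keeps the edges consistent with the node IDs in one step, while `relabel_nodes` is useful when the graph has already been loaded by other code.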

@Deanozk

Deanozk commented Jun 15, 2024

What was the result of this discussion?

@Deanozk

Deanozk commented Jun 15, 2024

Also, does the pickle file format have the same issue?
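
One way to check the pickle question is a quick round trip. This sketch assumes a toy graph with an integer ID and a list-valued attribute; the attribute name `embedding` is illustrative.

```python
import pickle

import networkx as nx

# Toy graph: integer node IDs and a list of floats as a node attribute.
G = nx.Graph()
G.add_node(1, embedding=[0.1, -0.2, 0.3])
G.add_edge(1, 2)

# Serialize and deserialize in memory with the standard pickle module.
restored = pickle.loads(pickle.dumps(G))

print(sorted(restored.nodes))
print(type(restored.nodes[1]["embedding"]))
```

Pickle stores Python objects as they are, so integer IDs and lists should survive unchanged. The trade-off is that pickle files are Python-specific, cannot be opened in Gephi, and should not be loaded from untrusted sources.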
