We conduct a systematic comparison between a weighted graph summarization model and an unweighted graph summarization model. Our contributions are as follow:
-
Systematic Comparison: coducting the first systematic comparison between two extensively-studied graph summarization models using 3 search algorithms, 8 datasets, and 5 evaluation metrics.
-
Unexpected Observation: leading to a surprising observation that unweighted models is significantly better than weighted ones in all considered aspects.
-
Improvement of the State-of-the-art Algorithm: improving a state-of-the-art graph-summarizaion algorithm for the weighted model by exploiting the observation.
The online-appendix is available at Here
The algorithms of each path is listed below.
| Path | Weighted | Unweighted |
|---|---|---|
| kGrassComparison | kGrass | kGrass (Unweighted) |
| SSumMComparison | SSumM | SSumM (Unweighted) |
| MoSSo-LossyComparison | MoSSo-Lossy | MoSSo-Lossy (Unweighted) |
| Name | #Nodes | #Edges | Summary | Source | Download |
|---|---|---|---|---|---|
| Email-Enron (EE) | 36,692 | 183,831 | SNAP | LINK | |
| DBLP (DB) | 317,080 | 1,049,866 | Collaboration | SNAP | LINK |
| Amazon0601 (A6) | 403,394 | 2,443,408 | Co-purchase | SNAP | LINK |
| CNR-2000 (C2) | 325,557 | 2,738,969 | Hyperlinks | LAW | LINK |
| Skitter (SK) | 1,696,415 | 11,095,298 | Internet | SNAP | LINK |
| LiveJournal (LJ) | 3,997,962 | 34,681,189 | Social | SNAP | LINK |
| DBPedia (DP) | 18,268,991 | 126,890,209 | Hyperlinks | KONECT | LINK |
| WebLarge (WL) | 18,483,186 | 261,787,258 | Hyperlinks | LAW | LINK |
- >= OpenJDK 12
kGrassComparison assumes that the input graph G = (V, E) is undirected without self-loops. Thus, the format of an input file is as follows. Each line represents a single edge. Each edge {u, v} ∈ E joins two distinct nodes u != v ∈ V, separated by a tab. Each node v ∈ V is assigned to a unique integer id. The format of the example file is given below.
0 1
2 3
3 4
- The example consists of 5 nodes with 3 edges.
java -jar kGrassComparison.jar [data path] [target compression ratio] [target reconstruction error] [model type]
ex) java -jar kGrassComparison.jar ./email-enron.txt 0.4 1 uwe
- data path: Path to the input text file
- target compression ratio [0,1]: The desired size of a summary graph compared relative to the input graph size in bits
- target reconstruction error {1,2}: The reconstruction error measure to be used.
- model type {we, uwe}: The model type to be used
The output file contains information about subnodes (nodes in G) belonging to each supernode s ∈ S of the output weighted summary graph G_we = (S, P, ω) and information about each superedge p ∈ P. The first integer on each line following the line "<Subnode of each supernode>" represents the id of the supernode, and the following integers separated by tabs represent the ids of the subnodes belonging to that supernode. Each line following the line "<Superedge info>" represents a single superedge. The three integers separated by tabs represent the id of the source supernode, the id of the destination supernode, and the weight of the superedge (i.e., the number of subedges belonging to the superedge).
- Output
Model Type: we
|V|: 36692
|E|: 183831
Compression Ratio 39.98541285884974%
L1 Reconstruction Error 3.8248179192134426E-4
|S| 985
|P| 61086
- Output file
<Subnode of each supernode>
5 5
9 9
27 27
45 45 39 59
.
.
<Superedge info>
.
.
36357 36357 10
36444 36444 24
36588 36588 10
36604 36604 10
The output file contains information about subnodes (nodes in G) belonging to each supernode s ∈ S of the output unweighted summary graph G_uwe = (S, P) and information about each superedge p ∈ P. The first integer on each line following the line represents the id of the supernode, and the following integers separated by tabs represent the ids of the subnodes belonging to that supernode. Each line following the line represents a single superedge. The two integers separated by tabs represent the id of the source supernode and the id of the destination supernode.
- Output
Model Type: uwe
|V|: 36692
|E|: 183831
Compression Ratio 39.991402550397524%
L1 Reconstruction Error 1.3075126127435666E-4
|S| 3380
|P| 76746
- Output file
<Subnode of each supernode>
5 5
9 9
27 27
46 46
.
.
<Superedge info>
53 46
54 53
56 5
56 27
.
.
- >= OpenJDK 12
MoSSo-LossyComparison assumes that The input file lists the additions in an undirected graph without self-loops in the order that they arrive. Each line corresponds to an edge addition. Each line consists of a source node id, a destination node id, and an indicator, which are integers separated by a tab. The format of the example file is given below.
0 1 1
2 3 1
3 4 1
- The example consists of 3 additions (i.e., 5 nodes with 3 edges).
java -jar MoSSo-LossyComparsion.jar [data path] [model type]
ex) java -jar MoSSo-LossyComparsion.jar ./email-enron.txt we
- data path: Path to the input text file
- model type {we, uwe}: The model type to be used
The output file contains information about subnodes (nodes in G) belonging to each supernode s ∈ S of the output weighted summary graph G_we = (S, P, ω) and information about each superedge p ∈ P. The first integer on each line following the line "<Subnode of each supernode>" represents the id of the supernode, and the following integers separated by tabs represent the ids of the subnodes belonging to that supernode. Each line following the line "<Superedge info>" represents a single superedge. The three integers separated by tabs represent the id of the source supernode, the id of the destination supernode, and the weight of the superedge (i.e., the number of subedges belonging to the superedge).
- Output
Model Type: we
|V| 36692
|E| 183831
|S| 4850
|P| 12841
Compression Ratio 15.428383%
L1 Reconstruction Error 4.551099050444017E-4
L2 Reconstruction Error 4.1576597176530075E-7
- Output file
<Subnode of each supernode>
0 26077 26085 26076 26084 26078 26080 26075 26086
3430 31185
6860 33327
5673 28028
.
.
<Superedge info>
31584 31584 3
31584 17215 3
29287 21040 42
.
.
The output file contains information about subnodes (nodes in G) belonging to each supernode s ∈ S of the output unweighted summary graph G_uwe = (S, P) and information about each superedge p ∈ P. The first integer on each line following the line "<Subnode of each supernode>" represents the id of the supernode, and the following integers separated by tabs represent the ids of the subnodes belonging to that supernode. Each line following the line "<Superedge info>" represents a single superedge. The two integers separated by tabs represent the id of the source supernode and the id of the destination supernode.
- Output
Model Type: uwe
|V| 36692
|E| 183831
|S| 12896
|P| 57345
Compression Ratio 37.077922%
L1 Reconstruction Error 1.1157377576891236E-4
L2 Reconstruction Error 2.8788272074421435E-7
- Output file
<Subnode of each supernode>
0 7282
6860 3685
8047 11279 4369
5673 25080 20307
.
.
<Superedge info>
12760 1296
12760 16
12760 3923
6380 2144
.
.
- >= OpenJDK 12
SSumMComparison assumes that the input graph G = (V, E) is undirected without self-loops. Thus, the format of an input file is as follows. Each line represents a single edge. Each edge {u, v} ∈ E joins two distinct nodes u != v ∈ V, separated by a tab. Each node v ∈ V is assigned to a unique integer id. The format of the example file is given below.
0 1
2 3
3 4
- The example consists of 5 nodes with 3 edges.
java -jar SSumMComparison.jar [data path] [target compression ratio] [target reconstruction error] [model type]
ex) java -jar SSumMComparison.jar ./email-enron.txt 0.4 1 uwe
- data path: Path to the input text file
- target compression ratio [0,1]: The desired size of a summary graph compared relative to the input graph size in bits
- target reconstruction error {1,2}: The reconstruction error measure to be used.
- model type {we, uwe}: The model type to be used
The output file contains information about subnodes (nodes in G) belonging to each supernode s ∈ S of the output weighted summary graph G_we = (S, P, ω) and information about each superedge p ∈ P. The first integer on each line following the line "<Subnode of each supernode>" represents the id of the supernode, and the following integers separated by tabs represent the ids of the subnodes belonging to that supernode. Each line following the line "<Superedge info>" represents a single superedge. The three integers separated by tabs represent the id of the source supernode, the id of the destination supernode, and the weight of the superedge (i.e., the number of subedges belonging to the superedge).
- Output
Model Type: we
---------------------------------------------------
|V| 36692
|E| 183831
iter: 1
iter: 2
.
.
---------------------------------------------------
|S| 14201
|P| 47942
Compression Ratio 39.832586%
L1 Reconstriction Error 1.62e-04
- Output file
<Subnode of each supernode>
29126 32989 32990
1 1
25863 22032
3 3 4
.
.
<Superedge info>
31584 31584 3
31584 17215 3
29287 21040 42
.
.
The output file contains information about subnodes (nodes in G) belonging to each supernode s ∈ S of the output unweighted summary graph G_uwe = (S, P) and information about each superedge p ∈ P. The first integer on each line following the line "<Subnode of each supernode>" represents the id of the supernode, and the following integers separated by tabs represent the ids of the subnodes belonging to that supernode. Each line following the line "<Superedge info>" represents a single superedge. The two integers separated by tabs represent the id of the source supernode and the id of the destination supernode.
- Output
Model Type: uwe
---------------------------------------------------
|V| 36692
|E| 183831
iter: 1
iter: 2
.
.
frac 0.4383425410897358
---------------------------------------------------
|S| 12909
|P| 63301
Compression Ratio 39.999781%
L1 Reconstriction Error 1.10e-04
- Output file
<Subnode of each supernode>
34787 31202
1 1
32151 25092
3 3
.
.
<Superedge info>
12824 7503
12824 8111
12824 4383
12824 4620
.
.