# HugeGraph Benchmark Performance

### 1 Test environment

#### 1.1 Hardware information

CPU | Memory | Network Card | Disk |
---|---|---|---|
48 Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 128G | 10000Mbps | 750GB SSD |

#### 1.2 Software information

##### 1.2.1 Test cases

Testing is done using the graphdb-benchmark, a benchmark suite for graph databases. This benchmark suite mainly consists of four types of tests:

- Massive Insertion (MIW), which inserts vertices and edges in batches, submitting a fixed number of vertices or edges at a time.
- Single Insertion (SIW), which inserts vertices and edges one at a time, submitting each one immediately.
- Query (QW), which covers the basic query operations of a graph database:
  - Find Neighbors, which queries the neighbors of all vertices.
  - Find Adjacent Nodes, which queries the adjacent vertices of all edges.
  - Find Shortest Path, which queries the shortest path from the first vertex to 100 random vertices.
- Clustering (CW), which is a community detection algorithm based on the Louvain Method.
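The difference between the two insertion workloads can be sketched as follows. This is a minimal illustration, not the graphdb-benchmark code: `InMemoryGraph`, its `commit` method, and the `batch_size` default are hypothetical stand-ins for a real database client.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


class InMemoryGraph:
    """Hypothetical stand-in for a graph database client; it only counts commits."""

    def __init__(self) -> None:
        self.edges: List[Edge] = []
        self.commits = 0

    def commit(self, batch: List[Edge]) -> None:
        # A real client would send one request (or transaction) per call.
        self.edges.extend(batch)
        self.commits += 1


def single_insertion(graph: InMemoryGraph, edges: Iterable[Edge]) -> None:
    """Submit every edge immediately, one commit per edge."""
    for edge in edges:
        graph.commit([edge])


def massive_insertion(graph: InMemoryGraph, edges: Iterable[Edge],
                      batch_size: int = 500) -> None:
    """Buffer edges and submit them batch_size at a time."""
    batch: List[Edge] = []
    for edge in edges:
        batch.append(edge)
        if len(batch) == batch_size:
            graph.commit(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        graph.commit(batch)
```

With 1,050 edges and a batch size of 500, the batched variant issues 3 commits where the single variant issues 1,050, which is why the two workloads are measured separately.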

##### 1.2.2 Test dataset

Tests are conducted using both synthetic and real data.

- MIW, SIW, and QW use SNAP datasets.
- CW uses synthetic data generated by the LFR-Benchmark generator.

The sizes of the datasets used in this test are as follows:

Name | Number of Vertices | Number of Edges | File Size |
---|---|---|---|
email-enron.txt | 36,691 | 367,661 | 4MB |
com-youtube.ungraph.txt | 1,157,806 | 2,987,624 | 38.7MB |
amazon0601.txt | 403,393 | 3,387,388 | 47.9MB |
com-lj.ungraph.txt | 3,997,961 | 34,681,189 | 479MB |

#### 1.3 Service configuration

HugeGraph version: 0.5.6; RestServer, Gremlin Server, and the backends all run on the same server

- RocksDB version: rocksdbjni-5.8.6

Titan version: 0.5.4, using thrift+Cassandra mode

- Cassandra version: cassandra-3.10, with the commit log and data sharing the same SSD

Neo4j version: 2.0.1

The Titan version adapted by graphdb-benchmark is 0.5.4.

### 2 Test results

#### 2.1 Batch insertion performance

Backend | email-enron (300K) | amazon0601 (3M) | com-youtube.ungraph (3M) | com-lj.ungraph (30M) |
---|---|---|---|---|
HugeGraph | 0.629 | 5.711 | 5.243 | 67.033 |
Titan | 10.15 | 108.569 | 150.266 | 1217.944 |
Neo4j | 3.884 | 18.938 | 24.890 | 281.537 |

*Explanation*

- The figure in parentheses in the table header is the data scale, measured in edges.
- The data in the table is the time taken for batch insertion, in seconds.
- For example, HugeGraph (RocksDB) took 5.711 seconds to insert the 3 million edges of the amazon0601 dataset.
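The example above can be turned into an approximate throughput and a relative slowdown. The sketch below copies the times from the batch insertion table and uses the rounded edge counts from the table header, so the results are estimates only; the function names are illustrative.

```python
# Batch insertion times in seconds, copied from the table above.
INSERT_SECONDS = {
    "HugeGraph": {"email-enron": 0.629, "amazon0601": 5.711,
                  "com-youtube.ungraph": 5.243, "com-lj.ungraph": 67.033},
    "Titan": {"email-enron": 10.15, "amazon0601": 108.569,
              "com-youtube.ungraph": 150.266, "com-lj.ungraph": 1217.944},
    "Neo4j": {"email-enron": 3.884, "amazon0601": 18.938,
              "com-youtube.ungraph": 24.890, "com-lj.ungraph": 281.537},
}

# Rounded edge counts from the table header (not the exact dataset sizes).
NOMINAL_EDGES = {
    "email-enron": 300_000,
    "amazon0601": 3_000_000,
    "com-youtube.ungraph": 3_000_000,
    "com-lj.ungraph": 30_000_000,
}


def throughput(backend: str, dataset: str) -> float:
    """Approximate edges inserted per second."""
    return NOMINAL_EDGES[dataset] / INSERT_SECONDS[backend][dataset]


def slowdown(backend: str, baseline: str, dataset: str) -> float:
    """How many times longer `backend` took than `baseline`."""
    return INSERT_SECONDS[backend][dataset] / INSERT_SECONDS[baseline][dataset]


if __name__ == "__main__":
    for name in INSERT_SECONDS:
        print(f"{name}: {throughput(name, 'amazon0601'):,.0f} edges/s, "
              f"{slowdown(name, 'HugeGraph', 'amazon0601'):.1f}x HugeGraph's time")
```

On amazon0601 this works out to roughly 525,000 edges per second for HugeGraph, with Titan taking about 19 times as long and Neo4j about 3.3 times as long.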

##### Conclusion

- The performance of batch insertion: HugeGraph(RocksDB) > Neo4j > Titan(thrift+Cassandra)

#### 2.2 Traversal performance

##### 2.2.1 Explanation of terms

- FN(Find Neighbor): Traverse all vertices, find the adjacent edges based on each vertex, and use the edges and vertices to find the other vertices adjacent to the original vertex.
- FA(Find Adjacent): Traverse all edges, get the source vertex and target vertex based on each edge.
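The two traversals can be sketched on a small in-memory graph. This is an illustration of the definitions above, not the benchmark's implementation; the `Graph` class and its `edges` and `incident` fields are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


class Graph:
    """Toy graph: edges keyed by id, plus an index of incident edges per vertex."""

    def __init__(self, edge_list: List[Tuple[int, int]]) -> None:
        self.edges: Dict[int, Tuple[int, int]] = dict(enumerate(edge_list))
        self.incident: Dict[int, List[int]] = {}
        for eid, (src, dst) in self.edges.items():
            self.incident.setdefault(src, []).append(eid)
            self.incident.setdefault(dst, []).append(eid)


def find_neighbors(g: Graph) -> Dict[int, Set[int]]:
    """FN: for each vertex, follow its incident edges to the vertex at the other end."""
    result: Dict[int, Set[int]] = {}
    for vertex, edge_ids in g.incident.items():
        others: Set[int] = set()
        for eid in edge_ids:
            src, dst = g.edges[eid]
            others.add(dst if src == vertex else src)
        result[vertex] = others
    return result


def find_adjacent(g: Graph) -> List[Tuple[int, int]]:
    """FA: for each edge, fetch its source and target vertex."""
    return [g.edges[eid] for eid in g.edges]
```

FN does one edge lookup and one vertex lookup per incident edge of every vertex, whereas FA does a single endpoint lookup per edge, which is consistent with FA scaling by edge count and FN by vertex count in the tables.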

##### 2.2.2 FN performance

Backend | email-enron (36K) | amazon0601 (400K) | com-youtube.ungraph (1.2M) | com-lj.ungraph (4M) |
---|---|---|---|---|
HugeGraph | 4.072 | 45.118 | 66.006 | 609.083 |
Titan | 8.084 | 92.507 | 184.543 | 1099.371 |
Neo4j | 2.424 | 10.537 | 11.609 | 106.919 |

*Explanation*

- The figure in parentheses in the table header is the data scale, measured in vertices.
- The data in the table is the time taken to traverse the vertices, in seconds.
- For example, HugeGraph with the RocksDB backend took a total of 45.118 seconds to traverse all vertices in amazon0601 and look up each vertex's adjacent edges and the vertex at the other end.

##### 2.2.3 FA performance

Backend | email-enron (300K) | amazon0601 (3M) | com-youtube.ungraph (3M) | com-lj.ungraph (30M) |
---|---|---|---|---|
HugeGraph | 1.540 | 10.764 | 11.243 | 151.271 |
Titan | 7.361 | 93.344 | 169.218 | 1085.235 |
Neo4j | 1.673 | 4.775 | 4.284 | 40.507 |

*Explanation*

- The figure in parentheses in the table header is the data scale, measured in edges.
- The data in the table is the time taken to traverse the edges, in seconds.
- For example, HugeGraph with the RocksDB backend took a total of 10.764 seconds to traverse all edges in the amazon0601 dataset and look up the two vertices of each edge.

###### Conclusion

- Traversal performance: Neo4j > HugeGraph(RocksDB) > Titan(thrift+Cassandra)

#### 2.3 Performance of Common Graph Analysis Methods in HugeGraph

##### Terminology Explanation

- FS (Find Shortest Path): finding the shortest path between two vertices
- K-neighbor: all vertices that can be reached within K hops of the starting vertex, that is, those reachable in 1, 2, 3, …, K-1, or K hops
- K-out: all vertices that can be reached by traversing exactly K out-edges from the starting vertex.
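The relationship between K-out and K-neighbor can be sketched with a level-by-level breadth-first search. The sketch makes two assumptions that the definitions above do not spell out: "exactly K hops" is read as a shortest distance of K, and the starting vertex is excluded from the K-neighbor result. The function names and the adjacency-dict representation are illustrative, not HugeGraph's API.

```python
from typing import Dict, Hashable, Iterable, List, Set

Adjacency = Dict[Hashable, Iterable[Hashable]]  # vertex -> out-neighbors


def bfs_levels(adj: Adjacency, start: Hashable, k: int) -> List[Set[Hashable]]:
    """Return k frontiers; levels[i] holds vertices at shortest distance i + 1."""
    visited = {start}
    frontier = {start}
    levels: List[Set[Hashable]] = []
    for _ in range(k):
        next_frontier: Set[Hashable] = set()
        for vertex in frontier:
            for neighbor in adj.get(vertex, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        levels.append(next_frontier)
        frontier = next_frontier
    return levels


def k_out(adj: Adjacency, start: Hashable, k: int) -> Set[Hashable]:
    """Vertices whose shortest distance from start is exactly k (k >= 1)."""
    return bfs_levels(adj, start, k)[-1]


def k_neighbor(adj: Adjacency, start: Hashable, k: int) -> Set[Hashable]:
    """Vertices reachable within 1..k hops, i.e. the union of all k frontiers."""
    return set().union(*bfs_levels(adj, start, k))
```

Because each frontier can be many times larger than the previous one, the memory needed to hold the visited set and frontier grows quickly with K.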

##### FS performance

Backend | email-enron (300K) | amazon0601 (3M) | com-youtube.ungraph (3M) | com-lj.ungraph (30M) |
---|---|---|---|---|
HugeGraph | 0.494 | 0.103 | 3.364 | 8.155 |
Titan | 11.818 | 0.239 | 377.709 | 575.678 |
Neo4j | 1.719 | 1.800 | 1.956 | 8.530 |

*Explanation*

- The figure in parentheses in the table header is the data scale, measured in edges.
- The data in the table is the time taken to find the shortest paths **from the first vertex to 100 randomly selected vertices**, in seconds.
- For example, HugeGraph with the RocksDB backend took a total of 0.103 seconds to find the shortest paths from the first vertex to 100 randomly selected vertices in the amazon0601 graph.
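The shape of the FS workload can be sketched as a breadth-first search from one source to a random sample of targets. This is a minimal unweighted-graph illustration, not the code used by the benchmark or by any of the tested databases; `fs_workload`, its `seed` parameter, and the adjacency-dict representation are assumptions.

```python
import random
from collections import deque
from typing import Dict, Hashable, Iterable, List, Optional

Adjacency = Dict[Hashable, Iterable[Hashable]]


def shortest_path(adj: Adjacency, source: Hashable,
                  target: Hashable) -> Optional[List[Hashable]]:
    """Breadth-first search; returns the vertex list of one shortest path, or None."""
    if source == target:
        return [source]
    parent: Dict[Hashable, Optional[Hashable]] = {source: None}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in adj.get(vertex, ()):
            if neighbor in parent:
                continue
            parent[neighbor] = vertex
            if neighbor == target:
                # Walk the parent links back to the source, then reverse.
                path = [neighbor]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


def fs_workload(adj: Adjacency, vertices: List[Hashable],
                n_targets: int = 100, seed: int = 0) -> Dict[Hashable, Optional[List[Hashable]]]:
    """Shortest paths from the first vertex to n_targets randomly chosen vertices."""
    rng = random.Random(seed)
    source, candidates = vertices[0], vertices[1:]
    targets = rng.sample(candidates, min(n_targets, len(candidates)))
    return {target: shortest_path(adj, source, target) for target in targets}
```

The reported time covers all 100 searches together, so a single long search in a densely connected graph can dominate the total.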

###### Conclusion

- In scenarios with small data size or few vertex relationships, HugeGraph outperforms Neo4j and Titan.
- As the data size increases and the degree of vertex association increases, the performance of HugeGraph and Neo4j tends to be similar, both far exceeding Titan.

##### K-neighbor Performance

Vertex | Metric | Depth 1 | Depth 2 | Depth 3 | Depth 4 | Depth 5 | Depth 6 |
---|---|---|---|---|---|---|---|
v1 | Time | 0.031s | 0.033s | 0.048s | 0.500s | 11.27s | OOM |
v111 | Time | 0.027s | 0.034s | 0.115s | 1.36s | OOM | – |
v1111 | Time | 0.039s | 0.027s | 0.052s | 0.511s | 10.96s | OOM |

*Explanation*

- HugeGraph-Server’s JVM memory is set to 32GB and may experience OOM when the data is too large.

##### K-out performance

Vertex | Metric | Depth 1 | Depth 2 | Depth 3 | Depth 4 | Depth 5 | Depth 6 |
---|---|---|---|---|---|---|---|
v1 | Time | 0.054s | 0.057s | 0.109s | 0.526s | 3.77s | OOM |
v1 | Degree | 10 | 133 | 2,453 | 50,830 | 1,128,688 | |
v111 | Time | 0.032s | 0.042s | 0.136s | 1.25s | 20.62s | OOM |
v111 | Degree | 10 | 211 | 4,944 | 113,150 | 2,629,970 | |
v1111 | Time | 0.039s | 0.045s | 0.053s | 1.10s | 2.92s | OOM |
v1111 | Degree | 10 | 140 | 2,555 | 50,825 | 1,070,230 | |

*Explanation*

- The JVM memory of HugeGraph-Server is set to 32GB, and OOM may occur when the data is too large.

###### Conclusion

- In the FS scenario, HugeGraph outperforms Neo4j and Titan.
- In the K-neighbor and K-out scenarios, HugeGraph can return results within seconds at depths of up to 5.

#### 2.4 Comprehensive Performance Test - CW

Database | Size 1000 | Size 5000 | Size 10000 | Size 20000 |
---|---|---|---|---|
HugeGraph(core) | 20.804 | 242.099 | 744.780 | 1700.547 |
Titan | 45.790 | 820.633 | 2652.235 | 9568.623 |
Neo4j | 5.913 | 50.267 | 142.354 | 460.880 |

*Explanation*

- "Size" in the table header is the number of vertices.
- The data in the table is the time required to complete community detection, in seconds. For example, HugeGraph with the RocksDB backend takes 744.780 seconds on the 10,000-vertex dataset to reach the point where the community assignments no longer change.
- The CW test is a comprehensive evaluation of CRUD operations.
- In this test, HugeGraph, like Titan, operated directly on the core rather than going through the client.

##### Conclusion

- Performance of community detection algorithm: Neo4j > HugeGraph > Titan

Last modified May 14, 2023: Update hugegraph-benchmark-0.5.6.md (#226) (2e5bf8c6)