This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

QUERY LANGUAGE

1 - HugeGraph Gremlin

Overview

HugeGraph supports Gremlin, a graph traversal query language of Apache TinkerPop3. While SQL is a query language for relational databases, Gremlin is a general-purpose query language for graph databases. Gremlin can be used to create entities (Vertex and Edge) of a graph, modify the properties of entities, delete entities, as well as perform graph queries.

Gremlin can be used to create entities (Vertex and Edge) of a graph, modify the properties of entities, and delete entities. More importantly, it can be used to perform graph querying and analysis operations.

TinkerPop Features

HugeGraph implements the TinkerPop framework, but not all TinkerPop features are implemented.

The table below lists the support status of various TinkerPop features in HugeGraph:

Graph Features

NameDescriptionSupport
ComputerDetermines if the {@code Graph} implementation supports {@link GraphComputer} based processingfalse
TransactionsDetermines if the {@code Graph} implementations supports transactions.true
PersistenceDetermines if the {@code Graph} implementation supports persisting it’s contents natively to disk.This feature does not refer to every graph’s ability to write to disk via the Gremlin IO packages(.e.g. GraphML), unless the graph natively persists to disk via those options somehow. For example,TinkerGraph does not support this feature as it is a pure in-sideEffects graph.true
ThreadedTransactionsDetermines if the {@code Graph} implementation supports threaded transactions which allow a transaction be executed across multiple threads via {@link Transaction#createThreadedTx()}.false
ConcurrentAccessDetermines if the {@code Graph} implementation supports more than one connection to the same instance at the same time. For example, Neo4j embedded does not support this feature because concurrent access to the same database files by multiple instances is not possible. However, Neo4j HA could support this feature as each new {@code Graph} instance coordinates with the Neo4j cluster allowing multiple instances to operate on the same database.false

Vertex Features

NameDescriptionSupport
UserSuppliedIdsDetermines if an {@link Element} can have a user defined identifier. Implementation that do not support this feature will be expected to auto-generate unique identifiers. In other words, if the {@link Graph} allows {@code graph.addVertex(id,x)} to work and thus set the identifier of the newly added {@link Vertex} to the value of {@code x} then this feature should return true. In this case, {@code x} is assumed to be an identifier data type that the {@link Graph} will accept.false
NumericIdsDetermines if an {@link Element} has numeric identifiers as their internal representation. In other words,if the value returned from {@link Element#id()} is a numeric value then this method should be return {@code true}. Note that this feature is most generally used for determining the appropriate tests to execute in the Gremlin Test Suite.false
StringIdsDetermines if an {@link Element} has string identifiers as their internal representation. In other words, if the value returned from {@link Element#id()} is a string value then this method should be return {@code true}. Note that this feature is most generally used for determining the appropriate tests to execute in the Gremlin Test Suite.false
UuidIdsDetermines if an {@link Element} has UUID identifiers as their internal representation. In other words,if the value returned from {@link Element#id()} is a {@link UUID} value then this method should be return {@code true}.Note that this feature is most generally used for determining the appropriate tests to execute in the Gremlin Test Suite.false
CustomIdsDetermines if an {@link Element} has a specific custom object as their internal representation.In other words, if the value returned from {@link Element#id()} is a type defined by the graph implementations, such as OrientDB’s {@code Rid}, then this method should be return {@code true}.Note that this feature is most generally used for determining the appropriate tests to execute in the Gremlin Test Suite.false
AnyIdsDetermines if an {@link Element} any Java object is a suitable identifier. TinkerGraph is a good example of a {@link Graph} that can support this feature, as it can use any {@link Object} as a value for the identifier. Note that this feature is most generally used for determining the appropriate tests to execute in the Gremlin Test Suite. This setting should only return {@code true} if {@link #supportsUserSuppliedIds()} is {@code true}.false
AddPropertyDetermines if an {@link Element} allows properties to be added. This feature is set independently from supporting “data types” and refers to support of calls to {@link Element#property(String, Object)}.true
RemovePropertyDetermines if an {@link Element} allows properties to be removed.true
AddVerticesDetermines if a {@link Vertex} can be added to the {@code Graph}.true
MultiPropertiesDetermines if a {@link Vertex} can support multiple properties with the same key.false
DuplicateMultiPropertiesDetermines if a {@link Vertex} can support non-unique values on the same key. For this value to be {@code true}, then {@link #supportsMetaProperties()} must also return true. By default this method, just returns what {@link #supportsMultiProperties()} returns.false
MetaPropertiesDetermines if a {@link Vertex} can support properties on vertex properties. It is assumed that a graph will support all the same data types for meta-properties that are supported for regular properties.false
RemoveVerticesDetermines if a {@link Vertex} can be removed from the {@code Graph}.true

Edge Features

NameDescriptionSupport
UserSuppliedIdsDetermines if an {@link Element} can have a user defined identifier. Implementation that do not support this feature will be expected to auto-generate unique identifiers. In other words, if the {@link Graph} allows {@code graph.addVertex(id,x)} to work and thus set the identifier of the newly added {@link Vertex} to the value of {@code x} then this feature should return true. In this case, {@code x} is assumed to be an identifier data type that the {@link Graph} will accept.false
NumericIdsDetermines if an {@link Element} has numeric identifiers as their internal representation. In other words,if the value returned from {@link Element#id()} is a numeric value then this method should be return {@code true}. Note that this feature is most generally used for determining the appropriate tests to execute in the Gremlin Test Suite.false
StringIdsDetermines if an {@link Element} has string identifiers as their internal representation. In other words, if the value returned from {@link Element#id()} is a string value then this method should be return {@code true}. Note that this feature is most generally used for determining the appropriate tests to execute in the Gremlin Test Suite.false
UuidIdsDetermines if an {@link Element} has UUID identifiers as their internal representation. In other words,if the value returned from {@link Element#id()} is a {@link UUID} value then this method should be return {@code true}.Note that this feature is most generally used for determining the appropriate tests to execute in the Gremlin Test Suite.false
CustomIdsDetermines if an {@link Element} has a specific custom object as their internal representation.In other words, if the value returned from {@link Element#id()} is a type defined by the graph implementations, such as OrientDB’s {@code Rid}, then this method should be return {@code true}.Note that this feature is most generally used for determining the appropriate tests to execute in the Gremlin Test Suite.false
AnyIdsDetermines if an {@link Element} any Java object is a suitable identifier. TinkerGraph is a good example of a {@link Graph} that can support this feature, as it can use any {@link Object} as a value for the identifier. Note that this feature is most generally used for determining the appropriate tests to execute in the Gremlin Test Suite. This setting should only return {@code true} if {@link #supportsUserSuppliedIds()} is {@code true}.false
AddPropertyDetermines if an {@link Element} allows properties to be added. This feature is set independently from supporting “data types” and refers to support of calls to {@link Element#property(String, Object)}.true
RemovePropertyDetermines if an {@link Element} allows properties to be removed.true
AddEdgesDetermines if an {@link Edge} can be added to a {@code Vertex}.true
RemoveEdgesDetermines if an {@link Edge} can be removed from a {@code Vertex}.true

Data Type Features

NameDescriptionSupport
BooleanValuestrue
ByteValuestrue
DoubleValuestrue
FloatValuestrue
IntegerValuestrue
LongValuestrue
MapValuesSupports setting of a {@code Map} value. The assumption is that the {@code Map} can contain arbitrary serializable values that may or may not be defined as a feature itselffalse
MixedListValuesSupports setting of a {@code List} value. The assumption is that the {@code List} can contain arbitrary serializable values that may or may not be defined as a feature itself. As this{@code List} is “mixed” it does not need to contain objects of the same type.false
BooleanArrayValuesfalse
ByteArrayValuestrue
DoubleArrayValuesfalse
FloatArrayValuesfalse
IntegerArrayValuesfalse
LongArrayValuesfalse
SerializableValuesfalse
StringArrayValuesfalse
StringValuestrue
UniformListValuesSupports setting of a {@code List} value. The assumption is that the {@code List} can contain arbitrary serializable values that may or may not be defined as a feature itself. As this{@code List} is “uniform” it must contain objects of the same type.false

Gremlin Steps

HugeGraph supports all steps of Gremlin. For complete reference information about Gremlin, please refer to the Gremlin official website.

StepDescriptionDocumentation
addEAdd an edge between two vertices.addE step
addVadd vertices to graph.addV step
andMake sure all traversals return values.and step
asStep modulator for assigning variables to the step’s output.as step
byStep Modulators used in conjunction with group and order.by step
coalesceReturns the first traversal that returns a result.coalesce step
constantReturns a constant value. Used in conjunction with coalesce.constant step
countReturns a count from the traversal.count step
dedupReturns values with duplicates removed.dedup step
dropDiscards a value (vertex/edge).drop step
foldActs as a barrier for computing aggregated values from results.fold step
groupGroups values based on specified labels.group step
hasUsed to filter properties, vertices, and edges. Supports hasLabel, hasId, hasNot, and has variants.has step
injectInjects values into the stream.inject step
isUsed to filter by a Boolean expression.is step
limitUsed to limit the number of items in a traversal.limit step
localLocally wraps a part of a traversal, similar to a subquery.local step
notUsed to generate the negation result of a filter.not step
optionalReturns the result of a specified traversal if it generates any results, otherwise returns the calling element.optional step
orEnsures that at least one traversal returns a value.or step
orderReturns results in the specified order.order step
pathReturns the full path of the traversal.path step
projectProjects properties as a map.project step
propertiesReturns properties with specified labels.properties step
rangeFilters based on a specified range of values.range step
repeatRepeats a step a specified number of times. Used for looping.repeat step
sampleUsed to sample results returned by the traversal.sample step
selectUsed to project the results returned by the traversal.select step
storeThis step is used fo.r non-blocking aggregation of results returned by traversalstore step
treeAggregate the paths in vertices into a tree.tree step
unfoldUnfolds an iterator as a step.unfold step
unionMerge the results returned by multiple traversals.union step
VThese are the steps required for traversing between vertices and edges: V, E, out, in, both, outE, inE, bothE, outV, inV, bothV, and otherV.order step
whereUsed to filter the results returned by a traversal. Supports eq, neq, lt, lte, gt, gte, and between operators.where step

2 - HugeGraph Examples

1 概述

本示例将TitanDB Getting Started 为模板来演示HugeGraph的使用方法。通过对比HugeGraph和TitanDB,了解HugeGraph和TitanDB的差异。

1.1 HugeGraph与TitanDB的异同

HugeGraph和TitanDB都是基于Apache TinkerPop3框架的图数据库,均支持Gremlin图查询语言,在使用方法和接口方面具有很多相似的地方。然而HugeGraph是全新设计开发的,其代码结构清晰,功能较为丰富,接口更为友好等特点。

HugeGraph相对于TitanDB而言,其主要特点如下:

  • HugeGraph目前有HugeGraph-API、HugeGraph-Client、HugeGraph-Loader、HugeGraph-Studio、HugeGraph-Spark等完善的工具组件,可以完成系统集成、数据载入、图可视化查询、Spark 连接等功能;
  • HugeGraph具有Server和Client的概念,第三方系统可以通过jar引用、client、api等多种方式接入,而TitanDB仅支持jar引用方式接入。
  • HugeGraph的Schema需要显式定义,所有的插入和查询均需要通过严格的schema校验,目前暂不支持schema的隐式创建。
  • HugeGraph充分利用后端存储系统的特点来实现数据高效存取,而TitanDB以统一的Kv结构无视后端的差异性。
  • HugeGraph的更新操作可以实现按需操作(例如:更新某个属性)性能更好。TitanDB的更新是read and update方式。
  • HugeGraph的VertexId和EdgeId均支持拼接,可实现自动去重,同时查询性能更好。TitanDB的所有Id均是自动生成,查询需要经索引。

1.2 人物关系图谱

本示例通过Property Graph Model图数据模型来描述希腊神话中各人物角色的关系(也被成为人物关系图谱),具体关系详见下图。

image

其中,圆形节点代表实体(Vertex),箭头代表关系(Edge),方框的内容为属性。

该关系图谱中有两类顶点,分别是人物(character)和位置(location)如下表:

名称类型属性
charactervertexname,age,type
locationvertexname

有六种关系,分别是父子(father)、母子(mother)、兄弟(brother)、战斗(battled)、居住(lives)、拥有宠物(pet) 关于关系图谱的具体信息如下:

名称类型source vertex labeltarget vertex label属性
fatheredgecharactercharacter-
motheredgecharactercharacter-
brotheredgecharactercharacter-
petedgecharactercharacter-
livesedgecharacterlocationreason

在HugeGraph中,每个edge label只能作用于一对source vertex label和target vertex label。也就是说,如果一个图内定义了一种关系father连接character和character,那farther就不能再连接其他的vertex labels。

因此本例子将原TitanDB中的monster, god, human, demigod均使用相同的vertex label: character来表示, 同时增加属性type来标识人物的类型。edge label与原TitanDB保持一致。当然为了满足edge label约束,也可以通过调整edge labelname来实现。

2 Graph Schema and Data Ingest Examples

HugeGraph需要显示创建Schema,因此需要依次创建PropertyKey、VertexLabel、EdgeLabel,如果有需要索引还需要创建IndexLabel。

2.1 Graph Schema

schema = hugegraph.schema()

schema.propertyKey("name").asText().ifNotExist().create()
schema.propertyKey("age").asInt().ifNotExist().create()
schema.propertyKey("time").asInt().ifNotExist().create()
schema.propertyKey("reason").asText().ifNotExist().create()
schema.propertyKey("type").asText().ifNotExist().create()

schema.vertexLabel("character").properties("name", "age", "type").primaryKeys("name").nullableKeys("age").ifNotExist().create()
schema.vertexLabel("location").properties("name").primaryKeys("name").ifNotExist().create()

schema.edgeLabel("father").link("character", "character").ifNotExist().create()
schema.edgeLabel("mother").link("character", "character").ifNotExist().create()
schema.edgeLabel("battled").link("character", "character").properties("time").ifNotExist().create()
schema.edgeLabel("lives").link("character", "location").properties("reason").nullableKeys("reason").ifNotExist().create()
schema.edgeLabel("pet").link("character", "character").ifNotExist().create()
schema.edgeLabel("brother").link("character", "character").ifNotExist().create()

2.2 Graph Data

// add vertices
Vertex saturn = graph.addVertex(T.label, "character", "name", "saturn", "age", 10000, "type", "titan")
Vertex sky = graph.addVertex(T.label, "location", "name", "sky")
Vertex sea = graph.addVertex(T.label, "location", "name", "sea")
Vertex jupiter = graph.addVertex(T.label, "character", "name", "jupiter", "age", 5000, "type", "god")
Vertex neptune = graph.addVertex(T.label, "character", "name", "neptune", "age", 4500, "type", "god")
Vertex hercules = graph.addVertex(T.label, "character", "name", "hercules", "age", 30, "type", "demigod")
Vertex alcmene = graph.addVertex(T.label, "character", "name", "alcmene", "age", 45, "type", "human")
Vertex pluto = graph.addVertex(T.label, "character", "name", "pluto", "age", 4000, "type", "god")
Vertex nemean = graph.addVertex(T.label, "character", "name", "nemean", "type", "monster")
Vertex hydra = graph.addVertex(T.label, "character", "name", "hydra", "type", "monster")
Vertex cerberus = graph.addVertex(T.label, "character", "name", "cerberus", "type", "monster")
Vertex tartarus = graph.addVertex(T.label, "location", "name", "tartarus")

// add edges
jupiter.addEdge("father", saturn)
jupiter.addEdge("lives", sky, "reason", "loves fresh breezes")
jupiter.addEdge("brother", neptune)
jupiter.addEdge("brother", pluto)
neptune.addEdge("lives", sea, "reason", "loves waves")
neptune.addEdge("brother", jupiter)
neptune.addEdge("brother", pluto)
hercules.addEdge("father", jupiter)
hercules.addEdge("mother", alcmene)
hercules.addEdge("battled", nemean, "time", 1)
hercules.addEdge("battled", hydra, "time", 2)
hercules.addEdge("battled", cerberus, "time", 12)
pluto.addEdge("brother", jupiter)
pluto.addEdge("brother", neptune)
pluto.addEdge("lives", tartarus, "reason", "no fear of death")
pluto.addEdge("pet", cerberus)
cerberus.addEdge("lives", tartarus)

2.3 Indices

HugeGraph默认是自动生成Id,如果用户通过primaryKeys指定VertexLabelprimaryKeys字段列表后,VertexLabel的Id策略将会自动切换到primaryKeys策略。 启用primaryKeys策略后,HugeGraph通过vertexLabel+primaryKeys拼接生成VertexId ,可实现自动去重,同时无需额外创建索引即可以使用primaryKeys中的属性进行快速查询。 例如 “character” 和 “location” 都有primaryKeys("name")属性,因此在不额外创建索引的情况下可以通过g.V().hasLabel('character') .has('name','hercules')查询vertex 。

3 Graph Traversal Examples

3.1 Traversal Query

1. Find the grandfather of hercules

g.V().hasLabel('character').has('name','hercules').out('father').out('father')

也可以通过repeat方式:

g.V().hasLabel('character').has('name','hercules').repeat(__.out('father')).times(2)

2. Find the name of hercules’s father

g.V().hasLabel('character').has('name','hercules').out('father').value('name')

3. Find the characters with age > 100

g.V().hasLabel('character').has('age',gt(100))

4. Find who are pluto’s cohabitants

g.V().hasLabel('character').has('name','pluto').out('lives').in('lives').values('name')

5. Find pluto can’t be his own cohabitant

pluto = g.V().hasLabel('character').has('name', 'pluto')
g.V(pluto).out('lives').in('lives').where(is(neq(pluto)).values('name')

// use 'as'
g.V().hasLabel('character').has('name', 'pluto').as('x').out('lives').in('lives').where(neq('x')).values('name')

6. Pluto’s Brothers

pluto = g.V().hasLabel('character').has('name', 'pluto').next()
// where do pluto's brothers live?
g.V(pluto).out('brother').out('lives').values('name')

// which brother lives in which place?
g.V(pluto).out('brother').as('god').out('lives').as('place').select('god','place')

// what is the name of the brother and the name of the place?
g.V(pluto).out('brother').as('god').out('lives').as('place').select('god','place').by('name')

推荐使用HugeGraph-Studio 通过可视化的方式来执行上述代码。另外也可以通过HugeGraph-Client、HugeApi、GremlinConsole和GremlinDriver等多种方式执行上述代码。

3.2 总结

HugeGraph 目前支持 Gremlin 的语法,用户可以通过 Gremlin / REST-API 实现各种查询需求。