Open Access
Wuhan Univ. J. Nat. Sci.
Volume 28, Number 1, February 2023
Page(s) 45 - 52
Published online 17 March 2023

© Wuhan University 2023

Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

0 Introduction

Knowledge Graphs (KGs) such as DBpedia and Freebase encode statements about the world around us. They have attracted increasing attention in multiple fields, including question answering, knowledge inference, and recommendation systems. By their very nature, KGs are far from complete, because the world evolves continuously. This motivates work on automatically predicting new knowledge from known facts. Among inference tasks, KG completion has become a major focus of statistical feature learning.

A KG encodes structured information about entities and their relationships as directed labeled edges, possibly in multiple languages. Many researchers encode entities and relations into low-dimensional vector spaces[1]. Various efforts have attempted to address the challenge of sparse data representation. Several studies have shown that reducing data variability improves the convergence of global models. However, these approaches typically require modifying the embedding, which may lose important information about the intrinsic diversity of the local distribution. Some methods therefore stabilize translation-based embedding by adjusting distances and the global model bias across the parameter space, such as TransE[2], TransD[3], and TransR[4]. Other studies, such as RESCAL[5], DistMult[6], and ComplEx[7], improve the generalization ability of the model through bilinear semantic matching. In addition, some studies treat a relation as a rotation from the head entity to the tail entity; in this line, RotatE[8] was proposed to handle relation patterns such as symmetry/antisymmetry. We note that the optimum of each individual party (language-specific KG) in a heterogeneous multilingual setting is fundamentally at odds with the global optimum. Consequently, multilingual KG completion is a non-trivial technical challenge.

In view of the above, we propose a dual-branch learning framework called Triple Encoder-Scoring Module (TEsm), a personalized joint learning framework. Specifically, we apply a convolutional network that decouples the target graph into a base encoder, which participates in collaborative training, and a locally preserved personalization layer. The base encoder layer learns global feature knowledge, while the personalization layer retains translation information to resist gradient leakage of the embeddings. Each party's local training is corrected from a global perspective through self-learning of global class centers, where a global center is defined as the average vector of each shared representation. Further, inspired by Chen[9], TEsm greatly reduces computational complexity by alternating iteratively between the two branches during training, instead of repeatedly processing sample pairs and large batches through joint learning. TEsm significantly outperforms state-of-the-art algorithms on multilingual KG completion. For instance, TEsm achieves 83.3% Hits@10 (the proportion of test queries whose correct entity is among the top 10 candidates) on the Japanese dataset, while the best corresponding result of existing studies is 66.1%, a relative improvement of about 26%. The main contributions of this paper are as follows:

1) We propose a new personalized learning framework for distributed embedding, which mitigates the conflict between local and global optima by introducing a global dual-branch model with joint knowledge and alignment learning to correct local training.

2) We design a local layered network architecture that effectively learns global knowledge through self-learning and a joint loss function; a classifier ensemble is then used to "filter" candidate entities.

3) We implement TEsm and conduct extensive experiments on different datasets. The results show that TEsm outperforms state-of-the-art methods regarding inference accuracy and computational efficiency.

1 Related Work

1.1 Representation Learning

KG embedding derives from the idea of word embedding: each entity and relation is mapped into a low-dimensional vector space in which the plausibility of triples can be measured. For instance, in TransE, proposed by Bordes et al[2], a relation-specific vector acts as a translation, and the plausibility of a triple is measured by the distance between the translated head entity vector $\mathbf{h}+\mathbf{r}$ and the tail entity vector $\mathbf{t}$. Besides, RotatE[8] uses a complex embedding space to model each relation as a rotation from the head entity to the tail entity. In addition, models that exploit unstructured information, such as KDCoE[10], AttrE[11], MultiKE[12], and HMAN[13], use supplementary information such as Wikipedia entity descriptions to improve representation learning. Later, many modifications of the above models achieved satisfactory results with respect to data sparsity. Since there is already a large body of work in this area, we provide only a brief summary.

1.2 Graph Neural Network

Graph Neural Network (GNN) learns node vectors by recursively aggregating the representations of neighboring nodes. GNNs have been applied to various natural language processing tasks, such as semantic role labeling[14] and machine translation[15]. Many GNN variants have emerged, such as the Graph Convolutional Network (GCN)[16], the Relational Graph Convolutional Network (R-GCN)[17], and the Graph Attention Network (GAT)[18]. Owing to the directionality of edges and the structural aggregation from a single node to its first-order neighbors, these models allow a KG to capture more facts and hence be completed more effectively. Because of their strong ability to model graph structure, these models are used to obtain vector representations, and the additional feature aggregation yields more accurate representations.

1.3 Integrated Inference

Integrated (ensemble) inference trains multiple learners and combines their outcomes according to a certain strategy, which can improve both learning performance and the generalization ability of the algorithm. One of the most common families of methods is Boosting[19], including AdaBoost[20] and RankBoost[21]. AdaBoost starts from a pool of weak classifiers and, at each iteration, selects the best one under the current sample weights. The final classifier is a linear combination of the selected weak classifiers, each weighted according to its performance. At each iteration, the sample weights are updated based on the selected classifier, so that subsequent classifiers focus more on hard samples. In multilingual KGs, fused information from the source graphs helps the target graph predict missing facts. In this paper, we extend AdaBoost to combine the ranking results of multiple KG embedding models, which is effective for selecting candidate entities.
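The AdaBoost loop described above can be sketched as follows. This is a minimal illustration of classic binary AdaBoost over a fixed pool of weak classifiers, not the ranking extension used in TEsm; the function names `adaboost_select` and `adaboost_predict` and the +1/-1 label encoding are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_select(
    pool: Sequence[Sequence[int]],
    labels: Sequence[int],
    rounds: int,
) -> List[Tuple[int, float]]:
    """Classic binary AdaBoost over a fixed pool of weak classifiers (illustrative sketch).

    pool[c][i] is the +1/-1 prediction of weak classifier c on sample i,
    labels[i] is the true +1/-1 label. Returns one (classifier index, alpha) pair per round.
    """
    n = len(labels)
    weights = [1.0 / n] * n  # uniform sample weights at the start
    selected: List[Tuple[int, float]] = []
    for _ in range(rounds):
        # Weighted error of every weak classifier under the current sample weights.
        errors = [
            sum(w for w, p, y in zip(weights, preds, labels) if p != y)
            for preds in pool
        ]
        best = min(range(len(pool)), key=lambda c: errors[c])
        err = min(max(errors[best], 1e-10), 1 - 1e-10)  # clip to avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # lower error gives a larger classifier weight
        selected.append((best, alpha))
        # Misclassified samples (y * p = -1) are up-weighted, correct ones are down-weighted.
        weights = [w * math.exp(-alpha * y * p) for w, p, y in zip(weights, pool[best], labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return selected


def adaboost_predict(pool: Sequence[Sequence[int]], selected: List[Tuple[int, float]], i: int) -> int:
    """Sign of the alpha-weighted vote of the selected weak classifiers on sample i."""
    score = sum(alpha * pool[c][i] for c, alpha in selected)
    return 1 if score >= 0 else -1
```

For example, with three weak classifiers that are each wrong on a different sample, no single classifier is perfect, yet after three rounds the weighted vote classifies every sample correctly, because the re-weighting forces each round to focus on the sample the previous choice got wrong.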

2 Method

A KG serves as the data source and is denoted as $G=(E,R,T)$. It is composed of a set of triples $T$, in which each triple $(h,r,t)$ contains a head entity $h\in E$ and a tail entity $t\in E$, and $r\in R$ represents the relation between them.

2.1 Preliminary Knowledge

This paper builds on previous work on multilingual KG completion[9]. The framework is depicted in Fig. 1 and consists of the following three main components:

Fig. 1 Network design (TEsm)

1-5 indicate entities in the graph, and the lines between Source 1 and 2 indicate aligned seeds

1) Joint operators fuse local structural information to enable interaction between triples.

2) Dual-branch structure. The Knowledge Model (KM) learns semantic vectors to encode entities and relations, while the Alignment Model (AM) uses a supervised method to obtain sets of corresponding entities across different graphs.

3) A local layered network architecture that effectively learns global knowledge through self-learning and a joint loss function.

2.2 Triple Encoder

Graph convolutional encoding produces vertex representations by integrating, in the spatial domain, the features of relations between adjacent vertices. The process uses both semantic knowledge and structural data to provide rich vector representations for knowledge learning and alignment learning (overlapping parts are shared). The local aggregation is depicted in Fig. 2. Two components are learned jointly:

Fig. 2 Local encoder

1) Neighbor aggregation. GCN encodes local features to better capture entity proximity information.

2) Relationship model. A relation is modeled as an arithmetic operation in the vector space. The goal is to fully exploit structural information and entity proximity for fine-grained entity embedding.

To achieve the above, the encoder stacks GCN layers over the KG. The output $H^{(l+1)}$ of layer $l+1$ is calculated as:

$$H^{(l+1)}=\sigma\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H^{(l)}W^{(l)}\right) \tag{1}$$

where $\tilde{D}$ represents the diagonal (degree) matrix of $\tilde{A}$, $H^{(l)}$ is the input feature matrix of the nodes at layer $l$, $\tilde{A}=A+I$ is the sum of the adjacency matrix $A$ and the identity matrix $I$, and $W^{(l)}$ is the weight matrix of layer $l$. The result of the convolution is passed through the activation function $\sigma(\cdot)$ to the next network layer. The initial features $H^{(0)}$ of an entity can be properties of the entity itself or randomly generated. $H^{(l)}$ is the output of the previous layer, and the output of the last layer represents the entity embedding.
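The layer-wise propagation rule can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: the graph is treated as undirected (symmetric adjacency matrix), $\tilde{D}$ is taken to be the degree matrix of $\tilde{A}=A+I$ as in the standard GCN of Kipf and Welling, ReLU is assumed as the activation $\sigma$, and the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a symmetric adjacency matrix A."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # diagonal entries of D~ (self-loops included)
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in m]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One propagation step H^{(l+1)} = ReLU(A_hat H^{(l)} W^{(l)})."""
    return relu(matmul(matmul(a_hat, h), w))
```

Precomputing `a_hat` once and reusing it for every stacked layer mirrors the fact that only $H^{(l)}$ and $W^{(l)}$ change from layer to layer in the propagation rule.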

2.3 Knowledge Model (KM)

$\mathcal{J}_K^{(i)}$ represents the loss function (2) of the knowledge model for language $i$ in knowledge graph $G_i$:

$$\mathcal{J}_K^{(i)}=\sum_{(h,r,t)\in T_i}\ \sum_{(h',r,t')\in T_i'}\left[f(h',r,t')-f(h,r,t)+\gamma\right]_+ \tag{2}$$

where $[x]_+=\max(x,0)$, and $f(h,r,t)$ is a specific triple score function, instantiated by ① and ② below. $\gamma$ is a margin hyperparameter, $(h,r,t)$ is a positive sample, and $T_i$ is the set of actual triples in $G_i$. A negative sample $(h',r,t')$ is generated by randomly replacing either the head entity or the tail entity of a true triple, and $T_i'$ is the set of such negative samples. Notably, this procedure requires positive and negative samples to lie in a uniform vector space, and we mainly consider two representative triple scoring functions.
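The margin-based objective with random head/tail corruption can be sketched as follows. It assumes the TransE plausibility $f(h,r,t)=-\|\mathbf{h}+\mathbf{r}-\mathbf{t}\|_2$, one negative sample per positive, and plain Python lists as embeddings; the names are hypothetical, and the default margin of 0.5 follows the value reported in the experimental setting. It is an illustration, not the authors' TensorFlow implementation.

```python
import math
import random
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Vec = List[float]


def transe_score(h: Vec, r: Vec, t: Vec) -> float:
    """Plausibility f(h, r, t) = -||h + r - t||_2 (higher means more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def corrupt(triple: Triple, entities: List[str], known: Set[Triple], rng: random.Random) -> Triple:
    """Replace the head or the tail with a random entity, skipping true triples.

    Assumes at least one corruption that is not a known triple exists.
    """
    h, r, t = triple
    while True:
        e = rng.choice(entities)
        neg = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if neg not in known:
            return neg


def margin_loss(triples: List[Triple], ent_emb: Dict[str, Vec], rel_emb: Dict[str, Vec],
                gamma: float = 0.5, seed: int = 0) -> float:
    """Sum over positives of [f(neg) - f(pos) + gamma]_+ with one sampled negative each."""
    rng = random.Random(seed)
    known = set(triples)
    entities = sorted(ent_emb)
    loss = 0.0
    for pos in triples:
        neg = corrupt(pos, entities, known, rng)
        f_pos = transe_score(ent_emb[pos[0]], rel_emb[pos[1]], ent_emb[pos[2]])
        f_neg = transe_score(ent_emb[neg[0]], rel_emb[neg[1]], ent_emb[neg[2]])
        loss += max(0.0, f_neg - f_pos + gamma)  # hinge: zero once the negative is gamma worse
    return loss
```

When every negative already scores at least $\gamma$ below its positive, the hinge is inactive and the loss is exactly zero, so training effort concentrates on negatives that are still too plausible.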

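① TransE-GCN. TransE models each relation as a translation from the head entity to the tail entity in Euclidean space. The formula is as follows:

$$f(h,r,t)=-\left\|\mathbf{h}^{(L)}+\mathbf{r}-\mathbf{t}^{(L)}\right\|_2 \tag{3}$$

where $\mathbf{h}^{(L)}$ and $\mathbf{t}^{(L)}$ are the entity embeddings output by the last graph convolution layer $L$, and $\mathbf{r}$ is the relation vector.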

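② RotatE-GCN. RotatE treats relations as rotations in a complex space: the tail entity is obtained by rotating the head entity through the relation in the vector space. The formula is as follows:

$$f(h,r,t)=-\left\|\mathbf{h}^{(L)}\circ\mathbf{r}-\mathbf{t}^{(L)}\right\| \tag{4}$$

where $\circ$ represents the Hadamard (element-wise) product in the complex vector space, and each element of $\mathbf{r}\in\mathbb{C}^d$ has unit modulus, $|r_k|=1$, so that it acts as a rotation.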
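RotatE's score can be illustrated with Python's built-in complex numbers. In this sketch (an illustration, not the paper's code) each relation is parameterized by phases $\theta_k$ so that $r_k=e^{i\theta_k}$ has unit modulus, and the Euclidean norm of $\mathbf{h}\circ\mathbf{r}-\mathbf{t}$ is assumed; other norms are also used in practice. The function name is hypothetical.

```python
import cmath
import math
from typing import List


def rotate_score(h: List[complex], phases: List[float], t: List[complex]) -> float:
    """f(h, r, t) = -||h o r - t||_2 with r_k = exp(i * phase_k), so |r_k| = 1."""
    r = [cmath.exp(1j * p) for p in phases]  # unit-modulus relation: a pure rotation per dimension
    return -math.sqrt(sum(abs(hk * rk - tk) ** 2 for hk, rk, tk in zip(h, r, t)))
```

A relation whose phases are all $\pi$ is symmetric: rotating by $\pi$ twice returns to the starting point, so if $\mathbf{t}=-\mathbf{h}$ then both $(h,r,t)$ and $(t,r,h)$ reach the maximum score of 0. A pure translation with a nonzero $\mathbf{r}$ cannot satisfy both directions at once, which is why the rotation view helps with symmetric relations.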

2.4 Alignment Model (AM)

Alignment information between each pair of embeddings is connected through additional property labels of the graph structure, and entity alignment is achieved by self-supervised learning. Given graphs $G_i$ and $G_j$ in different languages, the entity correspondence between them must be matched based on a small number of seed entities. The alignment function (5) is defined using cosine similarity:

$$\mathcal{J}_A^{(i,j)}=\sum_{(e_i,e_j)\in\Gamma_{ij}}\left(1-\cos\left(\mathbf{e}_i,\mathbf{e}_j\right)\right) \tag{5}$$

where $\Gamma_{ij}$ denotes the seed entity pairs between the two graphs. In particular, Nearest Neighbor (NN)[22] adaptive learning is used during alignment to narrow down the candidates after each iteration.
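The cosine-based alignment objective and the nearest-neighbor candidate lookup can be sketched as follows. The exact form of the loss is an assumption here (a sum of $1-\cos$ over seed pairs, which is zero when every seed pair points in the same direction), and the entity names and dictionary-of-vectors layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def alignment_loss(seeds: List[Tuple[str, str]], emb_i: Dict[str, Vec], emb_j: Dict[str, Vec]) -> float:
    """Sum of (1 - cos) over seed pairs (e_i in G_i, e_j in G_j)."""
    return sum(1.0 - cosine(emb_i[a], emb_j[b]) for a, b in seeds)


def nearest_neighbor(entity: str, emb_i: Dict[str, Vec], emb_j: Dict[str, Vec]) -> str:
    """Entity of G_j whose embedding is most cosine-similar to `entity` of G_i."""
    return max(emb_j, key=lambda e: cosine(emb_i[entity], emb_j[e]))
```

Because cosine similarity ignores vector length, two embeddings that differ only in scale count as perfectly aligned; this is one reason a cosine criterion is convenient when the language-specific KGs are embedded with different norms.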

2.5 Joint Model

The above model is a two-branch structure that optimizes a separate loss function for each branch, and multiple KGs are trained jointly. The formula is as follows:

$$\mathcal{J}=\sum_{i=1}^{M}\mathcal{J}_K^{(i)}+\lambda\sum_{1\le i<j\le M}\mathcal{J}_A^{(i,j)} \tag{6}$$

where $\mathcal{J}_K^{(i)}$ is the loss of the knowledge model defined in Eq. (2), $\mathcal{J}_A^{(i,j)}$ is the alignment loss in Eq. (5), and $\lambda$ is a positive hyperparameter that weights the two models. $\mathcal{J}$ is not optimized directly. Instead, training iterates over different batches and optimizes each loss separately. $M=5$ is the number of language-specific KGs, corresponding to the five datasets used in this paper. In addition, L2 regularization is used to prevent overfitting.

Stable alignment within a few iterations can be maintained only when the initial cosine similarity between aligned entities is very high. In this way, the above learning scheme can be implemented efficiently as an iterative process that trains the two branches jointly on batches of equal size.
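The alternating scheme can be sketched as the list of optimization steps in one training round: one knowledge-model (KM) step per KG, then one alignment-model (AM) step per KG pair. The function name and the assumption that every pair of KGs shares seed alignments are illustrative; in an actual run each step would update the corresponding loss on its own batch, with the alignment losses scaled by the weight between the two branches.

```python
from itertools import combinations
from typing import List, Tuple, Union

Step = Tuple[str, Union[str, Tuple[str, str]]]


def alternating_schedule(languages: List[str]) -> List[Step]:
    """One training round: a KM step for each KG, then an AM step for each unordered KG pair."""
    steps: List[Step] = [("KM", lang) for lang in languages]
    steps += [("AM", pair) for pair in combinations(languages, 2)]
    return steps
```

For the M = 5 language graphs of DBP-5L, this yields 5 KM steps and 10 AM steps per round, matching the two sums over $i$ and over pairs $i<j$ in the joint objective.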

2.6 Integrated Reasoning

Knowledge transfer and fact completion are based on the results of the above model. The target query is transferred to each source KG, and the results are transferred back to the target graph through the alignment information. These results are then combined by a weighted sum whose weights are learned with an ensemble classifier algorithm.

As shown in Fig. 3, given an incomplete triple $(h,r,?)$, the completion goal is to accurately find the tail entity that forms a new triple in the target graph. A weighted evaluation scheme is used to systematically rank and validate the candidate entities.

Fig. 3 Integrated reasoning

A query (h, r, ?) is mapped to the corresponding entities in the two source graphs, which rank their candidate triples; the final target candidate entity is then determined by the transfer.

First, the aligned entity pairs between the source and target graphs are predicted, and the query is then transferred from the target graph to the sources based on the alignment results. The information from Sources 1 and 2 is used to complete the target graph, yielding candidate results in the target graph. The final candidate entities are predicted by weighting[21]. The formula is as follows:

$$s(e)=\sum_{f\in\mathcal{F}}w_f(e)\,N_f(e) \tag{7}$$

where $e$ represents an entity in the target graph, $\mathcal{F}$ is the set of KG embedding models being combined, and $w_f(e)$ is the entity-specific weight of model $f$. The AdaBoost classifier is used to learn each specific weight. $N_f(e)$ is 1 if $e$ is in the top-$k$ candidates of model $f$ (where $k$ is 1, 3, or 10) and 0 otherwise.
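The weighted top-k vote can be sketched as below. The model names, the fixed example weights, and the dictionary layout are hypothetical; in TEsm the weights are learned with AdaBoost, whereas here they are simply given as input.

```python
from collections import defaultdict
from typing import Dict, List


def ensemble_scores(rankings: Dict[str, List[str]],
                    weights: Dict[str, Dict[str, float]],
                    k: int) -> Dict[str, float]:
    """s(e) = sum_f w_f(e) * N_f(e), where N_f(e) = 1 iff e is in model f's top-k."""
    scores: Dict[str, float] = defaultdict(float)
    for model, ranked in rankings.items():
        for e in ranked[:k]:  # only top-k candidates get N_f(e) = 1; the rest contribute 0
            scores[e] += weights[model].get(e, 0.0)
    return dict(scores)
```

Because each model only votes for its top-k candidates, a candidate ranked moderately high by several models can overtake one ranked first by a single model, which is the intended effect of combining evidence from several source graphs.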

3 Experiment and Result Analysis

3.1 Experimental Setting

This section evaluates the contribution of each module of TEsm. A detailed study demonstrates the effectiveness and generalization ability of the model.

3.1.1 Experimental dataset

All experiments in this paper are based on the DBP-5L dataset. Multilingual alignment is a relatively recent task, and only a few suitable datasets provide seed alignments between entity and relation pairs. DBP-5L consists of five KGs derived from DBpedia, in English (En), Greek (El), Spanish (Es), Japanese (Ja), and French (Fr), as listed in Table 1. In DBP-5L, about 40% of the entities in one language are aligned with entities in another language. The relations of the five graphs are represented in a uniform schema, consistent with the problem definition.

Triples are collected gradually from all language graphs, which are aligned with one another. Entity alignment covers approximately 40% of the entities between any two graphs. Given the same set of seed entities, the El KG contains significantly fewer entities than the other four. The dataset is divided into training, validation, and test sets at a ratio of 60:30:10. Relation labels account for only 8% of the actual triples, whereas almost 80% of the genuine triples express relations that appear in all languages. The study also randomly selects half of the aligned seed entities for training and evaluation.
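A reproducible 60:30:10 split can be sketched as follows; the fixed seed and the integer rounding (any remainder goes to the test set) are assumptions, since the paper does not specify them. The function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_60_30_10(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed and split into train/validation/test at 60:30:10."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10  # integer arithmetic avoids float rounding surprises
    n_valid = len(shuffled) * 3 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching Python's global random state.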

Table 1

Statistics of DBP-5L dataset

3.1.2 Evaluation method

For each test triple, all other entities in the graph are treated as wrong "candidates" that replace its head or tail entity. Following previous work, the "filtered" setting is used: candidate triples that already exist in the KG are excluded from the candidate space. The goal of the model is to rank the correct triple highest under this filtered setting. Each test triple is treated as a query, and the top-k predictions are retrieved. In this paper, the Hits@k metric is the proportion of test triples whose correct entity is ranked within the top k in link prediction, with k typically set to 1, 3, or 10. A larger k is a more lenient criterion, so Hits@10 is always at least as high as Hits@1.
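The filtered ranking protocol and Hits@k can be sketched as follows for tail prediction. Ties are resolved optimistically here (only strictly higher-scoring candidates push the correct entity down), which is an assumption, since evaluation code differs in how it treats ties; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def filtered_rank(query: Triple, scores: Dict[str, float], known: Set[Triple]) -> int:
    """Rank of the true tail among all candidate tails, in the filtered setting."""
    h, r, t = query
    true_score = scores[t]
    rank = 1
    for e, s in scores.items():
        if e == t or (h, r, e) in known:
            continue  # other correct answers already in the KG are not counted as errors
        if s > true_score:
            rank += 1
    return rank


def hits_at_k(ranks: List[int], k: int) -> float:
    """Proportion of test queries whose correct entity is ranked within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)
```

In the test example the raw rank of the true tail would be 3, but the higher-scoring candidate `x` is itself a known true triple, so the filtered rank is 2.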

3.1.3 Experimental environment

The experiments were run on a server with Ubuntu 18.04 and an NVIDIA RTX 8000 graphics card. The algorithm is implemented in Python 3.6 with the TensorFlow 1.10 framework. During training, Adam[22] is used as the optimizer, and the hyperparameters are tuned by grid search on Hits@k: learning rate lr∈{0.1, 0.01, 0.001, 0.000 1}, dimension d∈{32, 64, 128, 200, 300}, and batch size b∈{64, 256, 512, 1 024}. These parameters are selected by accuracy on the validation set, and the L2 regularization coefficient is fixed at 0.000 1. The margin is set to 0.5. The best setting for TEsm (TransE) is {lr = 0.001, d = 300, b = 256}, and that for TEsm (RotatE) is {lr = 0.01, d = 200, b = 512}.
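The grid search over the listed ranges can be sketched as follows. `evaluate` is a hypothetical callback that would train a model with the given setting and return its validation Hits@k; the sketch only shows the enumeration and selection logic over the 4 × 5 × 4 = 80 configurations.

```python
from itertools import product
from typing import Callable, Optional, Tuple

# Search space as listed in the experimental setting.
LEARNING_RATES = [0.1, 0.01, 0.001, 0.0001]
DIMENSIONS = [32, 64, 128, 200, 300]
BATCH_SIZES = [64, 256, 512, 1024]

Config = Tuple[float, int, int]


def grid_search(evaluate: Callable[[float, int, int], float]) -> Tuple[Optional[Config], float]:
    """Return (best_config, best_score), maximizing the validation metric returned by `evaluate`."""
    best_config: Optional[Config] = None
    best_score = float("-inf")
    for lr, d, b in product(LEARNING_RATES, DIMENSIONS, BATCH_SIZES):
        score = evaluate(lr, d, b)
        if score > best_score:  # keep the first configuration reaching the best score
            best_config, best_score = (lr, d, b), score
    return best_config, best_score
```

Exhaustive enumeration is affordable here because the grid is small; each of the 80 evaluations, however, implies a full training run.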

3.2 Accuracy Results

3.2.1 Main results

The results on the datasets are reported in Table 2. Results of the baseline models TransE, RotatE, DistMult, TransD, and HolE are taken from Ref. [8], and those of RotatE-booting and TransE-booting are taken from Ref. [9]. Combining the GCN encoder of TEsm with ensemble inference improves Hits@10 by 1.1% to 13.0%.

When the Ja dataset is used as the target graph and the other four as source graphs, RotatE-GCN improves over the baseline by 48%, with gains of 15% on Hits@3 and 0.4% on Hits@1. When En is the target graph, TransE-GCN improves the results by 26%, with a 30% gain on Hits@1, and RotatE-GCN improves Hits@1 by 14%. When Es is the target graph, RotatE-GCN improves the baseline by 26% on Hits@10, 10% on Hits@3, and 35% on Hits@1. When Fr is the target graph, RotatE-GCN improves Hits@10 by 24% and Hits@3 by 14% over the TransE baseline. When El is the target graph, TransE-GCN improves Hits@10 by 31% over the TransE baseline. On sparse data, the performance of simple models does not degrade greatly relative to their accuracy on the full data. This raises the question of whether simple models are sufficient for some simple entities and whether complex models actually overfit the data. As can be observed, TEsm performs best on datasets with varying levels of heterogeneity and outperforms the base models. Regarding sparse semantic interference, TEsm yields larger improvements on sparse graphs than on dense ones. In particular, on the Greek KG, which has a small number of entities, Hits@1 increases by 13% over the baseline. This finding suggests that KGs with low knowledge coverage and sparse structure benefit more from additional knowledge.
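Percentage improvements can be read either as relative gains or as absolute differences in percentage points, and the two differ considerably. The helpers below (hypothetical names) compute both; for the Japanese figures quoted in the Introduction (83.3% versus 66.1%), the absolute gain is 17.2 points while the relative gain is about 26%, which is consistent with the 26% stated there being a relative improvement.

```python
def absolute_gain(new: float, baseline: float) -> float:
    """Difference in percentage points between two metric values given in %."""
    return new - baseline


def relative_gain(new: float, baseline: float) -> float:
    """Improvement expressed as a percentage of the baseline value."""
    return (new - baseline) / baseline * 100.0
```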

It is worth elaborating on the comparison between our model and the bootstrap model that inspired our work. The experimental results show that both of our models (TransE-GCN and RotatE-GCN) consistently yield better results, with improvements of 10.8% and 6.7%, respectively, on the five multilingual datasets. We believe that the improvement is attributable to two reasons. First, by exploiting the heterogeneous features of KGs, the semantic embedding captures both local structural information, by considering entities and relations in the neighborhood, and the semantic information residing within the transformation operator. Second, in the dual-branch model, each relation in a KG is modeled only once. Replacing previously entity-specific matrices with shared ones may facilitate the encoding of more complex latent information. As a result, fewer parameters need to be learned in our model, which helps alleviate overfitting.

To explore the contribution of each module, extensive ablation experiments are performed. KM denotes the knowledge learning model in Section 2.3, and AM denotes the alignment model in Section 2.4. The specific ablation methods are as follows:

1) Remove the components of Eq. (1) from the full TEsm framework to estimate the importance of structural information.

2) Combine the GCN encoder with different models to obtain triple embeddings; some combinations yield only suboptimal results.

3) Measure the accuracy when either TransE or RotatE is combined with Eq. (2).

The results are presented in Table 3. The accuracy of the proposed model differs from that of the base translation models: both the knowledge model and the alignment model improve over the baseline.

These results indicate that both models better learn the internal information of triples during the fusion of local knowledge. The full multilingual Hits@10 improves by 30%, which supports the generalization ability and robustness of GCN (TEsm).

Table 2

Completion results for different languages (unit: %)

Table 3

Generalization performance of GCN with RotatE and TransE models (unit: %)

4 Conclusion

In this paper, we propose a unified framework for simultaneous knowledge acquisition and alignment. To handle the characteristic features of KGs when using traditional translation models, we develop a novel approach that transforms a neighborhood into a homogeneous neighborhood via GCN. The embedding space captures both structured and unstructured information based on local features. In addition, the AdaBoost classifier selects auxiliary inputs from the source graphs to aid completion of the target graph. Experimental results demonstrate the superiority of the TEsm model and its effectiveness in KG completion.

In the future, beyond graph structure, multimodal learning methods that incorporate text, speech, images, and videos can be explored. Realizing cross-modal knowledge transfer will be quite challenging.


  1. Dettmers T, Minervini P, Stenetorp P, et al. Convolutional 2D knowledge graph embeddings [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 1811-1818.
  2. Bordes A, Usunier N, Duran A G, et al. Translating embeddings for modeling multi-relational data [J]. Advances in Neural Information Processing Systems, 2013, 26(1282): 2787-2795.
  3. Ji G L, He S Z, Xu L H, et al. Knowledge graph embedding via dynamic mapping matrix [C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2015: 687-696.
  4. Lin Y K, Liu Z Y, Luan H B, et al. Modeling relation paths for representation learning of knowledge bases [EB/OL]. [2022-08-05].
  5. Nickel M, Tresp V, Kriegel H P. A three-way model for collective learning on multi-relational data [C]//Proceedings of the 28th International Conference on Machine Learning. New York: ACM, 2011: 809-816.
  6. Yang B S, Yih W T, He X D, et al. Embedding entities and relations for learning and inference in knowledge bases [EB/OL]. [2022-08-29].
  7. Trouillon T, Gaussier É, Bouchard G. Complex embeddings for simple link prediction [C]//International Conference on Machine Learning. New York: ACM, 2016: 2071-2080.
  8. Sun Z Q, Deng Z H, Nie J Y, et al. RotatE: Knowledge graph embedding by relational rotation in complex space [EB/OL]. [2022-08-29].
  9. Chen X. Multilingual knowledge graph completion via ensemble knowledge transfer [EB/OL]. [2022-06-08].
  10. Trisedya B D, Qi J Z, Zhang R. Entity alignment between knowledge graphs using attribute embeddings [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 297-304.
  11. Zhang Q H, Sun Z Q, Qu Y Z. Multi-view knowledge graph embedding for entity alignment [EB/OL]. [2022-08-06].
  12. Yang H W, Zou Y Y, Shi P, et al. Aligning cross-lingual entities with multi-aspect information [EB/OL]. [2022-08-20].
  13. Marcheggiani D, Titov I. Encoding sentences with graph convolutional networks for semantic role labeling [EB/OL]. [2022-07-30].
  14. Bastings J, Titov I, Aziz W, et al. Graph convolutional encoders for syntax-aware neural machine translation [EB/OL]. [2022-08-18].
  15. Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks [EB/OL]. [2022-08-22].
  16. Schlichtkrull M, Kipf T N, Bloem P, et al. Modeling relational data with graph convolutional networks [C]//European Semantic Web Conference. Cham: Springer International Publishing, 2018: 593-607.
  17. Veličković P, Cucurull G, Casanova A, et al. Graph attention networks [EB/OL]. [2022-08-26].
  18. Conneau A, Lample G, Jégou H. Word translation without parallel data [EB/OL]. [2022-08-30].
  19. Kingma D P, Ba J. Adam: A method for stochastic optimization [EB/OL]. [2022-07-30].
  20. Freund Y, Schapire R E. A decision-theoretic generalization of online learning and an application to boosting [J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
  21. Freund Y, Iyer R D, Schapire R E, et al. An efficient boosting algorithm for combining preferences [J]. Journal of Machine Learning Research, 2003, 4: 933-969.
  22. Artetxe M, Labaka G, Agirre E. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings [EB/OL]. [2022-08-26].
