Constrained anomaly detection in heterogeneous information networks with rich attributes

Authors
Zhang R. [1, 2, 3]
Zhang G. [1]
Guo J. [1]
Jiang H. [4]
Affiliations
[1] School of Computer Science and Technology, Wuhan University of Technology, Wuhan
[2] Hubei Key Laboratory of Transportation Internet of Things, Wuhan University of Technology, Wuhan
[3] Hubei Key Laboratory of Inland Shipping Technology, Wuhan University of Technology, Wuhan
[4] School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan
Source
Journal of Huazhong University of Science and Technology, Vol. 45, 2017
Keywords
Anomaly detection; Heterogeneous information network; Meta path; Rich attribute; Similarity
DOI
10.13245/j.hust.171205
Abstract
In heterogeneous information networks, detecting anomalous vertices from network structure alone can distort the results or make them overly complicated. To address this problem, an algorithm for constrained anomaly detection in attributed heterogeneous information networks (CADAHIN) is proposed. Interactive data with rich information is modeled as an attributed heterogeneous information network, in which users specify attributes and subspaces through attributed meta paths, and the outlierness of vertices is evaluated in terms of both network structure and attribute content. On this basis, a framework for constrained anomaly detection is presented. Experiments were conducted on the real-world Arxiv dataset. Under constraints specified by attributed meta paths over author, paper, title, and abstract, the queries return a top-k list of anomalous vertices and a set of anomalous vertices in the constraint domain. The results show that the proposed method outperforms baseline algorithms that consider only network structure or only attribute content by at least 12.95%. © 2017, Editorial Board of Journal of Huazhong University of Science and Technology. All rights reserved.
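The abstract does not give the scoring formulas, so the following is only a rough sketch of the general idea (rank vertices by an outlierness score that mixes structure and attribute content), not the CADAHIN algorithm itself. It assumes a single Author-Paper-Author meta path scored with PathSim, a bag-of-words cosine similarity over paper titles as the attribute side, and a hypothetical mixing weight `alpha`; the toy data and all function names are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy network: author -> papers, paper -> title text.
AUTHOR_PAPERS = {
    "a1": ["p1", "p2"],
    "a2": ["p1", "p2"],
    "a3": ["p2", "p3"],
    "a4": ["p4"],
}
PAPER_TITLES = {
    "p1": "graph anomaly detection",
    "p2": "graph outlier mining",
    "p3": "graph similarity search",
    "p4": "protein folding simulation",
}


def pathsim(x, y, author_papers):
    """PathSim along Author-Paper-Author, assuming binary author-paper links."""
    px, py = set(author_papers[x]), set(author_papers[y])
    denom = len(px) + len(py)
    return 2 * len(px & py) / denom if denom else 0.0


def title_profile(author, author_papers, paper_titles):
    """Bag of title words over all papers of one author."""
    words = Counter()
    for paper in author_papers[author]:
        words.update(paper_titles[paper].split())
    return words


def cosine(u, v):
    """Cosine similarity between two word-count profiles."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def outlierness(author, author_papers, paper_titles, alpha=0.5):
    """Assumed score: 1 minus the mean combined similarity to all other authors."""
    others = [a for a in author_papers if a != author]
    if not others:
        return 0.0
    me = title_profile(author, author_papers, paper_titles)
    total = 0.0
    for other in others:
        structural = pathsim(author, other, author_papers)
        content = cosine(me, title_profile(other, author_papers, paper_titles))
        total += alpha * structural + (1 - alpha) * content
    return 1.0 - total / len(others)


def top_k_outliers(author_papers, paper_titles, k, alpha=0.5):
    """Return the k authors with the highest assumed outlierness score."""
    scores = [
        (a, outlierness(a, author_papers, paper_titles, alpha))
        for a in author_papers
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

On this toy data, `top_k_outliers(AUTHOR_PAPERS, PAPER_TITLES, k=1)` ranks `a4` first, since that author shares neither papers nor title words with the others. Setting `alpha` to 1 or 0 reduces the score to a structure-only or content-only baseline of the kind the abstract compares against.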
Pages: 26-31
Page count: 6