Issue | Wuhan Univ. J. Nat. Sci., Volume 28, Number 5, October 2023
---|---
Page(s) | 451 - 460
DOI | https://doi.org/10.1051/wujns/2023285451
Published online | 10 November 2023
Information Technology
CLC number: TP301.6
Improved Hybrid Collaborative Filtering Algorithm Based on Spark Platform
1 College of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, Jiangxi, China
2 National-Level International Science and Technology Cooperation Base of Networked Supporting Software, Nanchang 330022, Jiangxi, China
Received: 1 July 2022
An improved Hybrid Collaborative Filtering algorithm (H-CF) is proposed to address the data sparsity, low recommendation accuracy, and poor scalability of traditional collaborative filtering algorithms. The core of H-CF is a linear weighted hybrid of the Latent Factor Model (LFM) and the Improved Item Clustering and Similarity Calculation Collaborative Filtering Algorithm (ITCSCF). First, the items are clustered based on their attribute dimension, which accelerates the computation of the nearest neighbor set. Next, H-CF refines the formula for scoring similarity by penalizing popular items and optimizing unpopular items, which makes the scoring similarity more reasonable and reduces the impact of data sparsity. A weighting function then combines the improved algorithms, and its balance factor is dynamically adjusted to obtain the optimal recommendation list. To address real-time and scalability concerns, the algorithm runs on the Spark big data distributed cluster computing framework. Experiments were conducted on the public MovieLens dataset, comparing the improved algorithm against the algorithm before enhancement and against the algorithm running on a single machine. The experimental results show that the improved algorithm performs better in terms of handling data sparsity, recommendation personalization, accuracy, recall, and efficiency.
Key words: recommendation algorithm / collaborative filtering / latent factor model / score weighting / item clustering / Spark / similarity calculation
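The linear weighted hybrid described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting function or say how the balance factor is adjusted, so everything here is an assumption made for illustration: the names `blend`, `rmse`, and `tune_alpha` are hypothetical, the balance factor `alpha` is taken to weight the LFM score against the ITCSCF score, and a simple grid search over validation RMSE stands in for the dynamic adjustment.

```python
from typing import Dict, Tuple

# (user_id, item_id) -> predicted or observed rating
Ratings = Dict[Tuple[int, int], float]


def blend(lfm_score: float, itcscf_score: float, alpha: float) -> float:
    """Linear weighted hybrid: alpha weights LFM, (1 - alpha) weights ITCSCF."""
    return alpha * lfm_score + (1.0 - alpha) * itcscf_score


def rmse(pred: Ratings, truth: Ratings) -> float:
    """Root mean squared error over the (user, item) pairs present in truth."""
    squared_errors = [(pred[key] - truth[key]) ** 2 for key in truth]
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


def tune_alpha(lfm_pred: Ratings, itcscf_pred: Ratings, truth: Ratings,
               steps: int = 10) -> Tuple[float, float]:
    """Assumed tuning rule: try alpha = 0, 1/steps, ..., 1 and keep the
    value with the lowest validation RMSE."""
    best_alpha, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        alpha = i / steps
        blended = {key: blend(lfm_pred[key], itcscf_pred[key], alpha)
                   for key in truth}
        err = rmse(blended, truth)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err


# Toy validation data (made up for illustration, not from MovieLens)
lfm_pred = {(1, 1): 4.0, (1, 2): 2.0}
itcscf_pred = {(1, 1): 2.0, (1, 2): 4.0}
truth = {(1, 1): 3.0, (1, 2): 3.0}
best_alpha, best_err = tune_alpha(lfm_pred, itcscf_pred, truth)
```

In this toy case the two predictors err in opposite directions, so an even blend recovers the held-out ratings; on real data the selected balance factor would depend on the validation split.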
Biography: YOU Zhen, female, Associate professor, research direction: software formalization, concurrent distributed computing, virtual reality, big data algorithm. E-mail: youzhenjxnu@163.com
Foundation item: Supported by the Natural Science Foundation of Jiangxi Province (20212BAB202018), Provincial Virtual Simulation Experiment Education Project of Jiangxi Education Department (2020-2-0048) and the Science and Technology Research Project of Jiangxi Province Educational Department (GJJ210333)
© Wuhan University 2023
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.