Paper Title

Shadfa 0.1: The Iranian Movie Knowledge Graph and Graph-Embedding-Based Recommender System

Authors

Pouyan, Rayhane; Kalamati, Hadi; Ebrahimian, Hannane; Karrabi, Mohammad; Akbarzadeh-T, Mohammad-R.

Abstract

Movies are a major source of entertainment, but finding the desired content becomes harder as the volume of available titles grows significantly every year. Recommender systems (RSs) provide algorithms that address this problem. The content-based technique has become popular because user data is unavailable in most cases. Content-based recommender systems rely on the similarity of items' demographic information; Term Frequency-Inverse Document Frequency (TF-IDF) and Knowledge Graph Embedding (KGE) are two approaches used to vectorize data so that these similarities can be calculated. In this paper, we propose a weighted content-based movie RS that combines TF-IDF, an appropriate approach for embedding textual data such as the plot or description, with KGE, which is used to embed named entities such as the director's name. The weights between features are determined using a genetic algorithm. Additionally, an Iranian movie dataset is created by scraping data from movie-related websites. This dataset and the structure of the FarsBase KG are used to create the MovieFarsBase KG, a component in the implementation of the proposed content-based RS. Using precision, recall, and F1 score metrics, this study shows that the proposed approach outperforms the conventional approach that uses TF-IDF to embed all attributes.
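
The weighted combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy plots, the two-dimensional director vectors (stand-ins for real KGE output), the function names, and the hand-picked weights are all assumptions made for illustration. In the paper the weights are tuned with a genetic algorithm, which is not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_sparse(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_dense(u, v):
    """Cosine similarity between two dense vectors stored as lists."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_similarity(i, j, text_vecs, entity_vecs, w_text, w_entity):
    """Weighted sum of text (TF-IDF) and entity (embedding) similarity."""
    return (w_text * cosine_sparse(text_vecs[i], text_vecs[j])
            + w_entity * cosine_dense(entity_vecs[i], entity_vecs[j]))


def recommend(query, text_vecs, entity_vecs, w_text, w_entity, k):
    """Return the indices of the k items most similar to the query item."""
    scores = [
        (weighted_similarity(query, j, text_vecs, entity_vecs, w_text, w_entity), j)
        for j in range(len(text_vecs)) if j != query
    ]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


# Toy data (hypothetical): tokenized plots and made-up director embeddings.
plots = [
    ["war", "family", "tehran"],
    ["war", "soldier", "tehran"],
    ["comedy", "wedding", "family"],
]
entity_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # stand-ins for KGE vectors
text_vecs = tfidf_vectors(plots)

# Hand-picked weights; the paper determines these with a genetic algorithm.
print(recommend(0, text_vecs, entity_vecs, w_text=0.5, w_entity=0.5, k=2))
```

Changing `w_text` and `w_entity` changes which neighbor ranks first, which is why the choice of weights matters and why an optimizer such as a genetic algorithm is a natural fit for selecting them.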
